Prerequisites
LEN
Eliminating Duplicates
Columns From Text
CONCATENATE
SEARCH/FIND
Being able to work with larger datasets and automate tedious tasks are just a couple of the many advantages of learning to code, whether with Python, JavaScript, or another programming language.
Despite those advantages, I completely understand why many SEO specialists haven't made the switch. We're all busy, and coding isn't a prerequisite for SEO.
It may feel like you're reinventing the wheel if you need to do a task quickly and you already know how to do it in Excel or Google Sheets.
It took me a while to get to the point where Python is my default option for data processing because when I first started coding, I mainly used it for things that Excel couldn't do.
Looking back, I'm very glad I persisted, even though it was difficult at times and meant spending hours scouring Stack Overflow threads.
This post is meant to help other SEO professionals avoid the same fate.
In it, we'll discuss the Python counterparts of the most popular Excel formulas and tools for analysing SEO data; all of these tools are accessible in the Google Colab notebook that is referenced in the summary.
You will discover the equivalents of:
- LEN.
- Eliminate Duplicates.
- Columns from Text.
- SEARCH/FIND.
- CONCATENATE.
- Find and Replace.
- LEFT/MID/RIGHT.
- IF.
- IFS.
- VLOOKUP.
- COUNTIF/SUMIF/AVERAGEIF.
- Pivot tables.
Amazingly, we'll rely primarily on Pandas to do all of this, with a bit of help from its big brother, NumPy.
Prerequisites
We won't be discussing a few items today due to time constraints, including:
- Python installation.
- Simple Pandas functions like filtering, previewing data frames, and importing CSVs.
If you have any questions, Hamlet's introduction to Python data analysis for SEO is the best resource.
Without further ado, let's get started.
LEN
LEN returns the number of characters in a text string.
A typical SEO use case is measuring length to check whether title tags or meta descriptions will be truncated in search results.
In Excel, to count the characters in the second cell of column A, we would type:
=LEN(A2)
Python is not too dissimilar: we can use the built-in len function together with Pandas' loc[] to access a specific row within a column:
len(df['Title'].loc[0])
In this example, we're getting the length of the first row in the "Title" column of our dataframe.
However, knowing the length of a single cell isn't very helpful for SEO. Normally, we'd want to apply the function to every row in a column.
In Excel, this is done by selecting the formula cell and either double-clicking or dragging its bottom-right corner downward.
With a Pandas dataframe, we can use str.len to calculate the length of every row in a series and then store the results in a new column:
df['Length'] = df['Title'].str.len()
str.len is a "vectorized" operation, designed to be applied to a whole series of values at once. Because these operations nearly always end up being faster than a loop, we'll use them a lot in this article.
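As a minimal sketch of the column-wide approach, the snippet below builds a small dataframe and flags titles that exceed a length limit. The sample titles and the 60-character threshold are assumptions made purely for illustration; real truncation in search results depends on pixel width rather than a fixed character count.

```python
import pandas as pd

# Hypothetical sample data; in practice this would come from a crawl export.
df = pd.DataFrame({
    "Title": [
        "Python for SEO",
        "How to Replace Common Excel Formulas With Python and Pandas for SEO Analysis",
    ]
})

# Vectorized length of every title in the column.
df["Length"] = df["Title"].str.len()

# Assumed threshold, for illustration only.
MAX_TITLE_LENGTH = 60
df["Too Long"] = df["Length"] > MAX_TITLE_LENGTH

print(df[["Length", "Too Long"]])
```

Comparing the whole "Length" column against a number is itself vectorized, so no loop is needed to build the Boolean flag.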
LEN is frequently used in conjunction with SUBSTITUTE to count the number of words in a cell:
=LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1
In Pandas, we can achieve this by combining the str.split and str.len functions:
df['No. Words'] = df['Title'].str.split().str.len()
More specifically, str.split breaks each string apart wherever whitespace appears, and str.len then counts the number of component parts.
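To make the word count concrete, here is a small sketch on made-up titles (the sample data is an assumption). It also shows why no TRIM equivalent is needed: calling str.split with no arguments splits on any run of whitespace.

```python
import pandas as pd

# Hypothetical titles, one with messy spacing.
df = pd.DataFrame({"Title": ["Python for SEO", "  Excel   vs Pandas  "]})

# str.split() with no arguments splits on any run of whitespace, so leading,
# trailing, and repeated spaces do not inflate the count.
df["No. Words"] = df["Title"].str.split().str.len()

print(df["No. Words"].tolist())
```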
Eliminating Duplicates
Excel's "Remove Duplicates" tool makes it simple to remove duplicate values from a dataset, either by deleting entirely duplicate rows (when all columns are selected) or by removing rows with the same values in particular columns.
In Pandas, this functionality is provided by drop_duplicates.
To drop duplicate rows from a dataframe, type:
df.drop_duplicates(inplace=True)
Include the subset parameter to remove rows based on duplication in a single column:
df.drop_duplicates(subset='column', inplace=True)
Or specify multiple columns in a list:
df.drop_duplicates(subset=['column', 'column2'], inplace=True)
One addition above that deserves special attention is the inplace parameter. Including inplace=True lets us overwrite our existing dataframe without having to create a new one.
Of course, there are situations when we want to keep the raw data. If so, we can designate a different variable to which we can assign our deduped data frame:
df2 = df.drop_duplicates(subset='column')
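The difference between these variants is easier to see on a tiny example. The sketch below uses invented URLs and keywords (assumptions for illustration) and leaves out inplace so the original dataframe is preserved.

```python
import pandas as pd

# Hypothetical data with one fully duplicated row and one repeated URL.
df = pd.DataFrame({
    "URL": ["/a", "/a", "/b", "/b"],
    "Keyword": ["shoes", "shoes", "boots", "hats"],
})

# Fully duplicated rows only: the second "/a" row is dropped.
exact = df.drop_duplicates()

# Duplicates judged on one column: the first row per URL is kept by default.
by_url = df.drop_duplicates(subset="URL")

# df itself is untouched because inplace was not used.
print(len(df), len(exact), len(by_url))
```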
Columns From Text
The "text to columns" feature, another indispensable tool, allows you to divide a text string based on a delimiter like a slash, comma, or whitespace.
One example is splitting a URL into its domain and individual subfolders.
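One way to approach this in Pandas is str.split with expand=True, sketched below. This is a hedged illustration rather than a definitive recipe: the sample URLs and the "Domain" and "Folder" column names are assumptions.

```python
import pandas as pd

# Hypothetical URLs without a protocol, so the first part is the domain.
df = pd.DataFrame({"URL": ["example.com/blog/python-seo", "example.com/shop"]})

# expand=True returns one column per split part instead of a list per row.
parts = df["URL"].str.split("/", expand=True)

# Rows with fewer parts are padded with missing values.
df["Domain"] = parts[0]
df["Folder 1"] = parts[1]
df["Folder 2"] = parts[2]

print(df)
```

If the URLs include a protocol such as https://, the extra slashes produce additional (partly empty) columns, so the positions of the domain and folders would shift.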
CONCATENATE
The CONCAT function lets users combine multiple text strings, for example when generating a list of keywords by adding different modifiers.
In this case, we're adding "mens" and whitespace to the list of product categories in column A:
=CONCAT($F$1," ",A2)
If we're working with strings, Python's + arithmetic operator can accomplish the same thing:
df['Combined'] = 'mens' + ' ' + df['Keyword']
Or specify multiple columns of data:
df['Combined'] = df['Subdomain'] + df['URL']
Pandas also has a dedicated concat function, but it is more useful for combining multiple dataframes with the same columns.
For instance, if our preferred link analysis tool had produced multiple exports:
df = pd.read_csv('data.csv')
df2 = pd.read_csv('data2.csv')
df3 = pd.read_csv('data3.csv')
dflist = [df, df2, df3]
df = pd.concat(dflist, ignore_index=True)
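For readers without CSV files to hand, the same idea can be sketched with small in-memory dataframes. The column names and values below are assumptions standing in for real exports.

```python
import pandas as pd

# Stand-ins for two separate exports with identical columns (hypothetical).
df_a = pd.DataFrame({"URL": ["/a", "/b"], "Links": [10, 3]})
df_b = pd.DataFrame({"URL": ["/c"], "Links": [7]})

combined = pd.concat([df_a, df_b], ignore_index=True)

# ignore_index=True renumbers the rows 0..n-1 rather than keeping
# each dataframe's own index, which would otherwise repeat.
print(combined.index.tolist())
```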
SEARCH/FIND
The SEARCH and FIND formulas provide a way of locating a substring within a text string.
These formulas are commonly combined with ISNUMBER to create a Boolean column that helps filter a dataset, which can be very useful for tasks like log file analysis, as shown in this article. E.g.:
=ISNUMBER(SEARCH("searchthis",A2))
The difference between SEARCH and FIND is that FIND is case-sensitive.
The equivalent Pandas function, str.contains, is case-sensitive by default:
df['Journal'] = df['URL'].str.contains('engine', na=False)
Setting the case argument to False will enable case insensitivity:
df['Journal'] = df['URL'].str.contains('engine', case=False, na=False)
In either case, adding na=False will stop null values from being returned within the Boolean column.
One big advantage of using Pandas here is that this function supports regex natively, unlike Excel (Google Sheets offers it via REGEXMATCH).
Chain multiple substrings together using the pipe character, also known as the OR operator:
df['Journal'] = df['URL'].str.contains('engine|search', na=False)
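Putting the three variants side by side makes the effect of each argument easier to see. The URLs and output column names below are made up for illustration, and the missing value is included deliberately to show what na=False does.

```python
import pandas as pd

# Hypothetical URLs, including a missing value.
df = pd.DataFrame({"URL": [
    "https://example.com/search-engine",
    "https://example.com/Engine-guide",
    "https://example.com/about",
    None,
]})

# Case-sensitive by default; na=False turns missing values into False.
df["Engine"] = df["URL"].str.contains("engine", na=False)

# case=False also matches "Engine".
df["Engine Any Case"] = df["URL"].str.contains("engine", case=False, na=False)

# The pattern is treated as regex, so the pipe means "engine" OR "guide".
df["Engine or Guide"] = df["URL"].str.contains("engine|guide", na=False)

# The Boolean column can then be used directly as a filter.
print(df[df["Engine Any Case"]])
```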
Is Python a more effective data analysis tool than Excel?
Python has gained popularity as more people have become aware of its powers and potential, despite the fact that it technically offers different functionality than Excel. Many developers and the larger data science community believe it to be a superior data analysis tool.
Can Python take the place of Excel?
Python is a programming language, and it may be used to create a wide range of programmes in addition to data management. It goes without saying that learning to code is a prerequisite for using Python.