This post is only for educational and non-commercial purposes.
Many times, we have to check check how our website ranks for a particular query in Google/Yahoo/Bing. Ideally, we should be there on the first page. But if we are competing against established websites, on a general and competitive query, then we might come on the first page as soon as we publish our content, targeting that query.
In this project, my objective is to create a dashboard, which displays:
- How many times does my site appears for a particular query on Google/Yahoo/Bing?
- On pages does my site appears?
- What keywords from my content is Google/Yahoo/Bing matching with the query?
- What all sites are coming on the first page? and what keywords are they using?
- How are my site’s search results improving?
How many times does my site appears for a particular query on Google/Yahoo/Bing?
I have taken care of this problem (as well as point#2) by writing a simple script that searches a domain name against a query on Google.
For Google, we can also use Google Search API. However I have tried to write a common script for all search engines, and for educating readers about simple Python modules.
E.g. I want to search “Best hotel prices” and I want to see how many times “tripadvisor.com” appears in first 10 pages of Google search.
The result looks like this:
And following is the code outline:
1) Import modules: I have used Requests and Beautiful Soup. Though I was able to get the output from these two, I also tried Re (Regular Expression) and Selenium to create a backup plan.
2) I used input to enter the domain and Query, and use Requests to fetch the URL of Google, Yahoo and Bing search results for the query.
3) I have used Beautiful Soup to parse the HTML of first page, followed by a For-loop and simple text search to search my domain in the search results.
4) Then I selected the links of next nine pages from footer (class:f1), and converted them into URL by adding the search engine’s domain (https://www.google.com in case of Google).
5) Finally a For-loop was used to parse each of the nine pages, which in turn contained a nested loop for searching my domain with the search results. Alternately, I could have created a function while searching the first page, and called it here as well.
Next phase of this project is using NLTK to find the keywords that are helping the search results on the first page of search engines.