A Web Scraper to reduce your manual work - Yes please!
- supriyamalla
- May 5, 2022
- 2 min read
Have you or your team ever faced a business problem where a client asks you to look up information about companies/individuals on Google/LinkedIn?
And the number of such companies wasn't 10 or 20 or 100, but 3,000!
Say whaaaaat?
What if I tell you that I have a solution to your problem?
When my team told me that they were thinking of doing this manually, I was taken aback! I knew the solution was to build a web scraper, but I didn't have the bandwidth to actually do it during the weekdays. BUT I knew that doing this manually was never a solution, so I enthusiastically spent 3 sleepless nights just to make it work, and voilà!
My approach:
First, scrape Google for the URLs of the companies I had to find information on.
Next, scrape the links obtained from Google (step 1) and pull out the overview information.
Honestly, building it wasn't as easy as coming up with the algorithm.
I ran into problems like:
The Selenium package wasn't working properly
The object parsed from Beautiful Soup was not found, i.e. the tag I asked for came back empty (the biggest issue I faced)
LinkedIn is smart (obviously, duh!) - its HTML elements have dynamic IDs that change on every refresh, so I couldn't select elements by ID directly (see the sketch after this list)
Not finding the correct link to the profile, which then redirected to the LinkedIn home page and left the scraper stuck in infinite scrolling
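For the two parsing headaches above, the fix boiled down to defensive lookups: select by something more stable than an ID, and treat a missing element as "skip this record" rather than a crash. Here is a minimal sketch of that idea, not the repo's exact code; the h1 tag and the "top-card-layout__title" class are illustrative assumptions, since LinkedIn's markup changes over time.

```python
from bs4 import BeautifulSoup

def extract_company_name(page_source):
    soup = BeautifulSoup(page_source, "html.parser")

    # Don't select by id: LinkedIn regenerates those on every refresh.
    # Select by a (hopefully) stable class instead, and tolerate its absence.
    heading = soup.find("h1", class_="top-card-layout__title")
    if heading is None:
        # the "object not found" case: skip this record instead of crashing
        return None
    return heading.get_text(strip=True)
```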
Finally, after 3 sleepless nights, I wrote code that tries to address most of these exceptions; it's on GitHub here
Getting Linkedin URLs from Google.py: This code scrapes LinkedIn URLs from Google by grabbing the first anchor tag whose href has "url" in it (see the first sketch below)
Scraping Linkedin Information from URLs.py: This code visits the links found in step 1 and, if the vendors are organisations, appends "about" to the links before pulling the overview information (see the second sketch below)
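For a flavour of what the first script does, here is a rough sketch of the idea, not the repo's code: search Google for the company name plus "linkedin", take the first anchor whose href contains "url" (Google result links look like "/url?q=<target>&..."), and pull the LinkedIn address out of it. The query format, the requests-based fetch and the urllib clean-up are my assumptions; Google rate-limits plain HTTP scrapers, which is one reason a Selenium-driven browser can be the more reliable route.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import parse_qs, urlparse

def get_linkedin_url(company):
    resp = requests.get(
        "https://www.google.com/search",
        params={"q": f"{company} linkedin"},
        headers={"User-Agent": "Mozilla/5.0"},  # a browser-ish UA; Google may still block
        timeout=10,
    )
    soup = BeautifulSoup(resp.text, "html.parser")

    # First anchor whose href has "url" in it and points at LinkedIn
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if "url" in href and "linkedin.com" in href:
            # "/url?q=https://www.linkedin.com/company/acme&sa=..." -> take the q parameter
            target = parse_qs(urlparse(href).query).get("q", [href])
            return target[0]
    return None
```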
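And a sketch of the second script's idea: for organisation pages, append "about" to the company URL and read the overview text. The Chrome driver, the fixed sleep and the "core-section-container" class are assumptions for illustration; LinkedIn's markup (and its login wall) will differ, which is exactly why the real code needs so much exception handling.

```python
from selenium import webdriver
from bs4 import BeautifulSoup
import time

def get_overview(linkedin_url):
    driver = webdriver.Chrome()  # assumes a local ChromeDriver setup
    try:
        driver.get(linkedin_url.rstrip("/") + "/about")
        time.sleep(5)  # crude wait for the page to render
        soup = BeautifulSoup(driver.page_source, "html.parser")
        section = soup.find("section", class_="core-section-container")  # illustrative class name
        return section.get_text(" ", strip=True) if section else None
    finally:
        driver.quit()
```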
Check it out and please do share feedback! If you have any questions, don't forget to ask me in the comments :)