
Data Science Project from Scratch - #2

  • Writer: supriyamalla
  • Jan 21, 2022

Data Collection


Now that we have settled on building a data science project around job listings and salaries, it is time to scrape the data! Our main goal is to predict salary from different features of a job.


We can go about getting the data via two routes:

  1. Scraping with Beautiful Soup: parses the page's HTML and lets you select specific elements.

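  2. Scraping with Selenium: a bot that drives a real web browser and clicks through the elements on the page the way a human user would, while our script copies the data into a dataframe. Because it mimics a human user, it can also reach content that only appears after clicking or that loads dynamically. This is how we will scrape the data.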
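To make route 1 concrete, here is a minimal sketch of "going through the HTML and selecting elements". It uses Python's built-in `html.parser` instead of Beautiful Soup so it runs without extra installs; with Beautiful Soup the same selection would be roughly `BeautifulSoup(html, "html.parser").select("div.job")`. The sample markup, the `job`/`title`/`salary` class names and the `scrape_jobs` helper are all hypothetical, since a real job site will use its own structure.

```python
from html.parser import HTMLParser

# Hypothetical markup: a real job board's HTML will use different tags and class names.
SAMPLE_HTML = """
<div class="job"><span class="title">Data Scientist</span><span class="salary">$95K-$120K</span></div>
<div class="job"><span class="title">Data Analyst</span><span class="salary">$60K-$75K</span></div>
"""


class JobParser(HTMLParser):
    """Collect the text of <span class="title"> and <span class="salary"> inside each job card."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per job card, ready to become a dataframe row
        self._field = None  # which field the current <span> holds, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "job" in classes:
            self.rows.append({})  # a new job card starts a new row
        elif tag == "span" and self.rows:
            for name in ("title", "salary"):
                if name in classes:
                    self._field = name

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Only keep text that sits inside a title or salary span
        if self._field:
            self.rows[-1][self._field] = data.strip()


def scrape_jobs(html):
    """Hypothetical helper: return a list of {"title": ..., "salary": ...} dicts."""
    parser = JobParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    print(scrape_jobs(SAMPLE_HTML))
    # → [{'title': 'Data Scientist', 'salary': '$95K-$120K'}, {'title': 'Data Analyst', 'salary': '$60K-$75K'}]
```

The list of dicts maps directly onto a table, so `pandas.DataFrame(scrape_jobs(html))` would turn it into a dataframe. A Selenium scraper would collect the same kind of rows, but by driving a browser rather than parsing a static HTML string.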



I spent a LOT of time trying to install Selenium and just couldn't get it to work. Then it struck me: I needed to run "pip install selenium" in the Anaconda Prompt (not the regular CMD or the Python shell)!
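One common cause of this kind of install trouble (an assumption on my part, not something I confirmed) is that `pip` installs the package into a different Python than the one running your script or notebook. This small standard-library check, with a hypothetical `selenium_status` helper, prints which interpreter is active and whether it can see `selenium`. If it reports `False`, run `pip install selenium` from a prompt tied to that same interpreter, such as the Anaconda Prompt.

```python
import importlib.util
import sys


def selenium_status():
    """Report which Python interpreter is running and whether it can import Selenium."""
    return {
        # Full path of the running interpreter, e.g. inside your Anaconda folder if run from Anaconda
        "python": sys.executable,
        # find_spec returns None when the package is not installed for this interpreter
        "selenium_installed": importlib.util.find_spec("selenium") is not None,
    }


if __name__ == "__main__":
    status = selenium_status()
    print("Interpreter:", status["python"])
    print("Selenium importable:", status["selenium_installed"])
```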


Update: for some reason my web scraper didn't work, so I have downloaded the data instead. I am planning to buy a new laptop and will try scraping again then.



- Download this data set for the next step!







©2020 by Learn Data Science with me. Proudly created with Wix.com
