Data Science Project from Scratch - #2
- supriyamalla
- Jan 21, 2022
Data Collection
Now that we have aligned on building a data science project around jobs and salaries, it is time to scrape the data! Our main goal is to predict salary based on different features.
We can go about getting the data via two routes:
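Scraping using Beautiful Soup - parses the page's HTML and lets you pull out specific elements from it.
Scraping using Selenium - automates a real web browser, acting like a bot that clicks through the elements on the page the way a human user would; our code then reads the data off the page and stores it in a dataframe. It is slower than Beautiful Soup because it drives a full browser, but it can handle content that only appears after a click or after JavaScript runs. This is how we will scrape the data.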
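Beautiful Soup's strength is pulling specific elements out of HTML that is already present in the page source. As a rough, dependency-free sketch of that idea, the snippet below uses Python's built-in `html.parser` (a stand-in, not Beautiful Soup itself) to collect the text of every element carrying a given CSS class. The markup and the class name `job-title` are hypothetical. If a site builds its listings with JavaScript after the page loads, that content is not in the raw HTML at all, which is the situation Selenium handles by driving a real browser.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class attribute contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while we are inside a matching element
        self.current = []   # text fragments of the element being read
        self.results = []   # finished text values, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            # A tag nested inside a matching element: track it so we know when the match closes.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace into single spaces.
            self.results.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_by_class(html, target_class):
    """Return the text of all elements with the given CSS class (assumes well-formed HTML)."""
    parser = ClassTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup standing in for a job-listings page.
page = """
<ul>
  <li><a class="job-title">Data Scientist</a> <span class="company">Company A</span></li>
  <li><a class="job-title">Data Analyst</a> <span class="company">Company B</span></li>
</ul>
"""
print(extract_by_class(page, "job-title"))  # ['Data Scientist', 'Data Analyst']
```

With Beautiful Soup installed, the same selection is roughly `[el.get_text(strip=True) for el in BeautifulSoup(page, "html.parser").select(".job-title")]`, which also copes with messier, not-quite-valid HTML.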
Here's the code we can leverage to scrape data: https://towardsdatascience.com/selenium-tutorial-scraping-glassdoor-com-in-10-minutes-3d0915c6d905
I spent a LOT of time trying to install Selenium and just couldn't get it to work. Then it struck me: all I had to do was run the command "pip install selenium" in the Anaconda Prompt (not the regular CMD/Python shell)!
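Two things usually explain this kind of trouble. `pip` is a command-line tool, so typing `pip install selenium` at the Python `>>>` prompt gives a `SyntaxError`. And on a machine with more than one Python, plain CMD may run a `pip` that installs into a different Python than the one Anaconda uses. A minimal sketch for checking, from the Python you actually work in (for example a Jupyter notebook), whether Selenium is visible to it; the helper name `is_installed` is just illustrative:

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if package_name can be imported by the interpreter running this code."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # sys.executable shows which Python (e.g. the Anaconda one) is running this script.
    print("Interpreter:", sys.executable)
    print("selenium installed:", is_installed("selenium"))
```

If this reports `False`, running `python -m pip install selenium` from the same prompt you start Python from installs the package into that exact interpreter.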
Update: for some reason my web scraper didn't work, so I downloaded the data instead. I am planning to buy a new laptop, so I will try scraping again then.
- Download this data set for the next step!
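Once the data set is downloaded, a quick sanity check is to load it and look at the row count and column names. The sketch below uses Python's built-in `csv` module so it runs without extra installs; in a typical data science workflow `pandas.read_csv` does the same job and returns a DataFrame. The file name `glassdoor_jobs.csv` is a hypothetical placeholder, not the actual name of the downloaded file.

```python
import csv


def load_rows(path):
    """Read a CSV file into a list of dicts, one per row, keyed by the header line."""
    # "utf-8-sig" also strips a byte-order mark if the file was saved with one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def summarize(rows):
    """Return (number of rows, column names) as a quick check of what was downloaded."""
    columns = list(rows[0].keys()) if rows else []
    return len(rows), columns


# Example (hypothetical file name):
# rows = load_rows("glassdoor_jobs.csv")
# print(summarize(rows))
```

If the row count or column names look wrong, it is easier to catch here than halfway through data cleaning.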