Data Science Project from Scratch - #2

supriyamalla
Jan 21, 2022

Data Collection

Now that we have decided to build a job and salary data science project, it is time to scrape the data! Our main goal is to predict salary based on different job features.

We can go about getting the data via two routes:

Scraping using Beautiful Soup - parses the HTML of a page and lets you select specific elements.
Scraping using Selenium - a browser-automation tool that works like a bot, clicking through the elements on the page the way a human user would, so our script can copy the data into a DataFrame. Because it can reach content that only appears after clicking, this is the approach we will use to scrape the data.
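To make the Beautiful Soup route concrete, here is a minimal sketch that parses a small, made-up HTML snippet. The markup and the class names `job`, `title` and `salary` are hypothetical stand-ins, not Glassdoor's real page structure, and the `parse_jobs` helper is illustrative only. It assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML snippet standing in for a downloaded job-listings page.
# The class names ("job", "title", "salary") are hypothetical, not Glassdoor's real markup.
SAMPLE_HTML = """
<ul>
  <li class="job"><span class="title">Data Scientist</span><span class="salary">$90K-$120K</span></li>
  <li class="job"><span class="title">Data Analyst</span><span class="salary">$60K-$80K</span></li>
</ul>
"""


def parse_jobs(html: str) -> list[dict[str, str]]:
    """Return one dict per job listing found in static HTML (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    # CSS selectors pick out exactly the elements we care about.
    for item in soup.select("li.job"):
        jobs.append({
            "title": item.select_one(".title").get_text(strip=True),
            "salary": item.select_one(".salary").get_text(strip=True),
        })
    return jobs


if __name__ == "__main__":
    for job in parse_jobs(SAMPLE_HTML):
        print(job)
```

Beautiful Soup only sees the HTML it is handed. If a listing's details show up only after a click, that content is not in the downloaded HTML, which is the gap Selenium fills by driving a real browser.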

Here's the code we can leverage to scrape data: https://towardsdatascience.com/selenium-tutorial-scraping-glassdoor-com-in-10-minutes-3d0915c6d905

I spent a LOT of time trying to install Selenium and just couldn't get it to work. Then it struck me: I needed to run the command "pip install selenium" in the Anaconda Prompt (not the regular CMD or Python shell)!
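An install problem like this usually comes down to which Python environment pip installs into. A quick check is a small sketch like the one below (the `package_available` helper is hypothetical, built on the standard library's `importlib.util.find_spec`): run it the same way you run your scraper, and it prints which interpreter is active and whether that interpreter can see Selenium.

```python
import importlib.util
import sys


def package_available(name: str) -> bool:
    """True if `name` can be imported by *this* Python interpreter (hypothetical helper)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The executable path reveals which environment (e.g. Anaconda vs. another install) is running.
    print("Python executable:", sys.executable)
    print("selenium importable here:", package_available("selenium"))
```

If it reports `False`, running `python -m pip install selenium` with that same interpreter installs the package into the environment that is actually being used.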

Update: for some reason my web scraper didn't work, so I downloaded the data instead. I am planning to buy a new laptop and will try scraping again then.

Download this dataset for the next step:

https://github.com/PlayingNumbers/ds_salary_proj/blob/master/glassdoor_jobs.csv
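Here is a minimal sketch for loading the dataset with pandas, assuming the raw-file address below (derived from the GitHub link above using the usual `raw.githubusercontent.com` pattern) still serves the CSV. The `load_jobs` helper and the two-row demo CSV are hypothetical and only illustrate the call; no column names from the real file are assumed.

```python
import io

import pandas as pd

# Assumed raw-file URL for the CSV linked above (same owner/repo/branch/path as the
# GitHub "blob" page, served from raw.githubusercontent.com).
DATA_URL = (
    "https://raw.githubusercontent.com/PlayingNumbers/"
    "ds_salary_proj/master/glassdoor_jobs.csv"
)


def load_jobs(source=DATA_URL) -> pd.DataFrame:
    """Read job listings into a DataFrame from a URL, file path or file-like object."""
    df = pd.read_csv(source)
    # Print the size as a quick sanity check before any cleaning.
    print(f"{df.shape[0]} rows x {df.shape[1]} columns")
    return df


if __name__ == "__main__":
    # Offline demo with a tiny made-up CSV; call load_jobs() with no argument to fetch the real file.
    demo = io.StringIO("title,salary\nData Scientist,$90K-$120K\n")
    print(load_jobs(demo).head())
```

Passing a local file path works the same way, which is handy once the CSV has been downloaded.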
