Importance of Big Data for Economic Analysis Part 2
This is the continuation of Part 1, which was published earlier.
An API (Application Programming Interface) is another way to extract data besides web scraping. The difference is that web scraping pulls data from the front end, that is, the HTML document of a website, whereas an API is usually published by the company itself so that users can access its data directly, without going through the web page. One example is the Google Maps API.
The Google Maps API gives users access to Google Maps geolocation services, which is useful for analysis that requires precise mapping.
An API can be free to use within certain limits, or it may require payment. The Google Maps API, for example, requires payment once we exceed the usage quota.
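To make the difference from web scraping concrete, here is a minimal sketch of how a request to the Google Maps Geocoding API could be prepared and how its JSON answer could be read. The API key is a placeholder, the helper names build_geocode_url and extract_coordinates are our own, and the sample response is hand-written for illustration rather than real API output. No request is actually sent here.

```python
import json
from urllib.parse import urlencode

# Public endpoint of the Geocoding API; a valid key is needed for real requests
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build the request URL for a single address lookup."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def extract_coordinates(response_text):
    """Return (lat, lng) of the first result, or None if there is no result."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written sample in the shape of a geocoding response (not real output)
sample_response = json.dumps({
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": -6.2, "lng": 106.8}}}],
})

print(build_geocode_url("Jakarta", "YOUR_API_KEY"))
print(extract_coordinates(sample_response))
```

Unlike scraped HTML, the answer is already structured, so we read fields by name instead of searching through page markup.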
Data Cleaning
The data we gather from websites is often so messy that it is difficult to gain insight from it. The data may also come in the wrong data type: for example, “4” may be stored as a string instead of a numeric type such as int32. The collected data may also contain empty rows that must be cleaned before we proceed to the analysis.
Python has a library called “Pandas” that lets us open data in many formats, such as XLSX or CSV. Pandas has several features for cleaning data, such as dropping unnecessary columns or deleting rows that contain empty values. It can even fill empty values with a value that we choose.
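The cleaning steps above can be sketched in a few lines of Pandas. The column names and values below are invented for illustration, and the CSV text stands in for a real file on disk. Filling a missing rating with 0 is just one possible choice of fill value.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path
csv_text = """product_id,name,price,rating
101,Phone A,5000000,4.5
102,Phone B,6000000,
103,,7000000,4.0
"""

# dtype=str mimics scraped data where every value arrives as text
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Drop a column we do not need for the analysis
df = df.drop(columns=["product_id"])

# Delete rows that have no product name
df = df.dropna(subset=["name"])

# Convert text to numbers; values that cannot be parsed become NaN
df["price"] = pd.to_numeric(df["price"], errors="coerce").astype("int32")

# Fill the remaining empty ratings with a value that we choose
df["rating"] = pd.to_numeric(df["rating"], errors="coerce").fillna(0.0)

print(df)
print(df.dtypes)
```

For Excel files, pd.read_excel plays the same role as pd.read_csv.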
Web Scraping with Google Chrome Extension
Google Chrome has a good extension for scraping data from websites called “Instant Data Scraper”. The extension automatically detects the area of the page that we want to scrape, and after a few adjustments it can run and start collecting the data contained in the website. Below is an example of using Instant Data Scraper to scrape iPhone products on Bukalapak.
The gathered data can be cleaned first by dropping unnecessary columns such as the product ID. Text data usually needs more complex cleaning, but in this case we only want to find out the price movement of the iPhone in the marketplace.
The data can be visualized with Google Data Studio, a free data visualization platform provided by Google. The data we collected earlier is imported into Google Data Studio, where we can use it to create a dashboard for the iPhone data.
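As a rough sketch of this cleaning step, the snippet below drops the product ID column and turns price text into numbers. The column names, product titles, and the "Rp7.500.000" price format are assumptions about what the export might look like, not the actual output of Instant Data Scraper. Tracking real price movement would also need a scrape date for each row, which this toy table does not have, so it only compares average prices per product.

```python
import pandas as pd

# Hypothetical rows in the shape a scraper export might have
scraped = pd.DataFrame({
    "product_id": ["a1", "b2", "c3"],
    "title": ["iPhone 11 64GB", "iPhone 12 128GB", "iPhone 11 64GB"],
    "price": ["Rp7.500.000", "Rp10.250.000", "Rp7.300.000"],
})

# The product ID is not needed for a price analysis
cleaned = scraped.drop(columns=["product_id"])

# Keep only the digits, e.g. "Rp7.500.000" becomes 7500000
cleaned["price"] = (
    cleaned["price"].str.replace(r"\D", "", regex=True).astype("int64")
)

# Average listed price per product title
average_price = cleaned.groupby("title")["price"].mean()
print(average_price)
```

The cleaned table can then be saved with cleaned.to_csv so that it is ready to import into a visualization tool.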