Web Scraping Workshop - Tools & Techniques [Recap]

Code for Ghana organized a Data Scraping Workshop on 29th August at the Mobile Web Ghana lab. The workshop was facilitated by David Selassie Opoku, a fellow at School of Data and the Meltwater Entrepreneurial School of Technology (MEST), who has vast experience in data cleaning and analysis. The purpose of the event was to introduce participants to the various tools and techniques available for web scraping.

The participants came from a variety of backgrounds (journalism, statistics, engineering, software development), and all of them were keen to master web scraping tools.

Adams Agalic, from the Code for Ghana team, gave a brief introductory speech. He spoke about the Code for Ghana project and the impact it is making, and highlighted one of its most important aims: establishing a data-journalism culture in the country. This calls for capacity building, hence the various workshops organized by Code for Ghana, the most recent being the OpenStreetMap workshop. Our aim is to equip ordinary citizens with knowledge of simple tools that will help improve data-consciousness in the country. The more conscious we are of the data produced by events and trends in the country, the more likely we are to represent those data in reusable and understandable forms. This is the very essence of the Code for Ghana project and other Open Data projects in the country and beyond. Ultimately, we seek to stimulate an active interest in the activities of government and other public institutions, so as to curb corruption and demand the best service delivery.

Before his session, David stated categorically that ‘web scraping is not scary’, a remark intended to dispel the myths surrounding the practice. Many people assume that mastering a particular programming language is the only way to scrape data. This is not entirely true: there are simple online tools that can be used to scrape data from the web and other platforms.

David then took participants through the hands-on use of some web scraping tools. First he taught them how to scrape data from PDF files using a simple tool called Tabula. The procedure was not complicated, and many of the participants got it right at their first attempt; those who needed help raised their hands and the facilitator lent a hand. Next, he walked the participants through scraping data from Wikipedia using the webscraper.io platform, which is also straightforward to use. Unfortunately, time ran out before we could cover some of the other interesting techniques David had prepared for the workshop. For those who would like to go a step further with code, a rough sketch follows below.
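
As a complement to the point-and-click tools above, here is a minimal Python sketch of the same Wikipedia-scraping idea, using the requests and BeautifulSoup libraries. This was not part of the workshop material; the page URL and the table chosen are illustrative assumptions.

    # A minimal sketch of scraping a table from a Wikipedia page in Python.
    # The target URL and table are illustrative assumptions, not the exact
    # examples used during the workshop.
    import requests
    from bs4 import BeautifulSoup

    URL = "https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)"

    # Fetch the page and parse the HTML
    response = requests.get(URL, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Wikipedia marks most data tables with the "wikitable" class
    table = soup.find("table", class_="wikitable")

    # Collect the text of every cell, row by row
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)

    # Show the header row and the first few data rows
    for row in rows[:5]:
        print(row)

PDF tables can be pulled out programmatically in a similar spirit with the tabula-py wrapper around Tabula, for example tabula.read_pdf("report.pdf", pages="all") (the file name here is just a placeholder), though the workshop itself used the Tabula desktop tool.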

At the end of the day, the participants were grateful for the learning experience. Most of them requested the slides of the facilitator’s presentation and asked that another workshop be organized soon.

You can find the slides here.
