Workshop on web content exploration

A workshop for students on exploring and processing online content was held at the Poznań University of Economics and Business. The session was led by Mateusz Kuczyński, a student at our university who combines his master’s studies in Informatics and Econometrics with work in the field of data exploration.

The meeting covered the theoretical and practical fundamentals needed to start acquiring data from websites independently. Topics included downloading and processing data in HTML format, as well as best practices and potential challenges. Students learned how to use the built-in browser developer tools, which allow inspection of the HTML structure, CSS styles, and network requests, to identify the elements to process further. The session also presented methods for efficiently parsing websites, retrieving their content, and saving the results in formats suitable for further data analysis, using Python libraries such as bs4 (BeautifulSoup), requests, and pandas. Participants could follow each implementation step in real time, ask questions, and discuss potential issues related to data selection or technical constraints.
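
The workflow described above can be illustrated with a minimal sketch. It is not the code shown at the workshop: the sample HTML, the table id `prices`, and the helper names `fetch_html` and `parse_table` are all hypothetical and chosen only for illustration. The sketch parses an inline HTML snippet, so it runs without network access; `fetch_html` shows how a real page might be downloaded with requests.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical sample page; a real run would obtain HTML via fetch_html().
SAMPLE_HTML = """
<table id="prices">
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Notebook</td><td>12.50</td></tr>
  <tr><td>Pen</td><td>3.20</td></tr>
</table>
"""


def fetch_html(url: str) -> str:
    """Download a page and return its HTML text (not called in this sketch)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def parse_table(html: str) -> pd.DataFrame:
    """Extract the rows of the (hypothetical) 'prices' table into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select("table#prices tr")
    # The first row holds the column names; the remaining rows hold the data.
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    frame = pd.DataFrame(records, columns=header)
    frame["Price"] = frame["Price"].astype(float)  # text to numeric
    return frame


df = parse_table(SAMPLE_HTML)
# With no path argument, to_csv returns the CSV content as a string.
csv_text = df.to_csv(index=False)
```

In practice, the CSS selector would be chosen after inspecting the target page in the browser's developer tools, and `df.to_csv("prices.csv", index=False)` would write the result to a file instead of a string.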

The workshop took place on December 19, 2024, and was organized by SRG “Data Science”.