Important sources of information in different topics and language versions of Wikipedia

A scientific paper on the automatic identification of important information sources on specific topics in multilingual Wikipedia, based on an analysis of more than 230 million references, has been published on the Elsevier website. The study presents several models for automatically evaluating information sources. These models take into account how frequently a source is used and how popular the content is from the point of view of both readers and Wikipedia editors.

Wikipedia articles were divided into 70 topics at different levels of abstraction, covering areas such as culture, geography, history, society, science, technology, engineering and mathematics. Using information about references extracted from individual Wikipedia articles, it is possible to examine how well each topic is supported by verifiable information in different language versions of Wikipedia. The figure below shows reference density values (RpA, References per Article) for each of the 70 topics and 42 language versions of Wikipedia.
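To make the RpA measure concrete, here is a minimal sketch of how a references-per-article value could be computed for each language and topic pair. The record layout, the `references_per_article` function name and the toy numbers are assumptions made for illustration only; they are not taken from the paper or its dataset.

```python
from collections import defaultdict

# Hypothetical toy records: (language, topic, number of references in one article).
# These values are invented for illustration and are not results from the study.
articles = [
    ("en", "History", 12),
    ("en", "History", 8),
    ("en", "Science", 30),
    ("pl", "History", 4),
    ("pl", "History", 0),
    ("pl", "Science", 9),
]


def references_per_article(records):
    """Return {(language, topic): mean number of references per article}."""
    ref_totals = defaultdict(int)
    article_counts = defaultdict(int)
    for language, topic, ref_count in records:
        key = (language, topic)
        ref_totals[key] += ref_count
        article_counts[key] += 1
    # RpA is the total number of references divided by the number of articles.
    return {key: ref_totals[key] / article_counts[key] for key in ref_totals}


rpa = references_per_article(articles)
for (language, topic), value in sorted(rpa.items()):
    print(f"{language} {topic}: RpA = {value:.1f}")
```

Articles without any references are counted in the denominator here, so a topic with many unreferenced articles gets a lower RpA. Whether the study treats such articles the same way is not stated in this text.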

Additionally, scientific sources of information were identified. This made it possible to determine how the language versions of Wikipedia differ in terms of the Sci score. For example, in English Wikipedia, the most developed language version, scientific sources account for about 2.6% of information sources, compared with 0.76% in the Polish version.
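The share of scientific sources can be illustrated with a short sketch. It assumes that each reference has already been flagged as scientific or not, and that the share is simply the percentage of flagged references in a language version. The paper's exact definition of the Sci score and its method for identifying scientific sources may differ. The `scientific_share` function and the toy data are hypothetical.

```python
# Hypothetical toy references: (language, is_scientific).
# The flags are invented for illustration; they do not reproduce the 2.6% or
# 0.76% figures reported in the study.
references = [
    ("en", True),
    ("en", False),
    ("en", False),
    ("en", False),
    ("pl", True),
    ("pl", False),
    ("pl", False),
    ("pl", False),
    ("pl", False),
]


def scientific_share(records):
    """Return {language: percentage of references flagged as scientific}."""
    totals = {}
    scientific = {}
    for language, is_scientific in records:
        totals[language] = totals.get(language, 0) + 1
        if is_scientific:
            scientific[language] = scientific.get(language, 0) + 1
    # Assumed definition: scientific references as a percentage of all references.
    return {
        language: 100.0 * scientific.get(language, 0) / totals[language]
        for language in totals
    }


shares = scientific_share(references)
for language, share in sorted(shares.items()):
    print(f"{language}: {share:.1f}% scientific sources")
```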

The results of the scientific research were presented at the KES 2022 conference. The publication is available at: doi.org/10.1016/j.procs.2022.09.387