Companies in Multilingual Wikipedia: Articles Quality and Important Sources of Information

Research by members of our Department appears in the Springer monograph “Information Technology for Management: Approaches to Improving Business and Society”. The research concerns the automatic assessment of the quality of Wikipedia articles and the reliability of sources of information about companies in different languages.

As part of this work, over half a million Wikipedia articles on companies in 310 languages were identified using data from DBpedia and Wikidata. The references contained in these articles were then extracted and analyzed to find important sources of information. Three different models were used to assess the sources, which made it possible to rank the most relevant sources of information on companies for each language version of Wikipedia.
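To make the idea of ranking sources more concrete, here is a minimal Python sketch. It does not reproduce any of the three models from the publication, which are not described here. It assumes a much simpler measure: the number of articles in which a website's domain appears at least once. The function names `source_domain` and `rank_sources` and the sample URLs are hypothetical and serve only as illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def source_domain(url: str) -> str:
    """Return the lowercased host of a reference URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host


def rank_sources(references_by_article: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank domains by the number of articles that cite them at least once.

    Assumption: this simple frequency count stands in for a real
    source-assessment model; it is not the method used in the publication.
    """
    counts: Counter[str] = Counter()
    for urls in references_by_article.values():
        # A set ensures each domain is counted once per article.
        domains = {source_domain(u) for u in urls}
        domains.discard("")  # ignore references without a usable host
        counts.update(domains)
    # Most cited first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical references extracted from two articles.
sample = {
    "Article A": [
        "https://www.example.com/report",
        "https://example.com/press",
        "http://news.example.org/story",
    ],
    "Article B": ["https://example.com/about"],
}
ranking = rank_sources(sample)
```

Counting each domain once per article keeps a single article with many links to the same site from dominating the ranking. In practice, such a ranking would be computed separately for each language version.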

The publication “Companies in Multilingual Wikipedia: Articles Quality and Important Sources of Information” is available on the Springer website. Authors of the publication: Włodzimierz Lewoniewski, Krzysztof Węcel, Witold Abramowicz.

Quality of Wikipedia articles and sources of information

The open nature of Wikipedia allows anyone to add and edit content. This has both advantages and disadvantages. Because articles can be edited quickly, Wikipedia is able to react to current events and provide information on new topics almost as soon as they appear. Open editing also makes it possible to create articles on a wide variety of topics, including those that may not be covered in traditional encyclopedias. However, the same openness carries a risk that some information may be untrue, biased or misleading. Additionally, the quality of Wikipedia articles can vary greatly. Some are well written and based on reliable sources, while others may be incomplete, out of date, subjective, or poorly worded. That is why assessing the quality of information and verifying information sources is so important.

Evaluating the quality of information together with its sources can not only help maintain high-quality content on Wikipedia, but can also be crucial for companies that want to manage their image and public relations effectively. The approach presented in the publication can also help Wikipedia volunteer editors select articles that need improvement. In addition, the presented models for assessing the reliability of sources can indicate websites that provide valuable information about companies.

DBpedia and Wikidata

Semantic databases, such as Wikidata and DBpedia, provide a wide range of possibilities, especially for people involved in data analysis, scientific research, artificial intelligence, and data-driven application development. Semantic databases make it easy to link data from different sources, so information from different domains can be combined to create rich new data sets. Additionally, thanks to the explicit semantic relationships between data items, these databases can capture the context of the data. For example, if two objects are connected by a “parent” relation, the database “understands” that there is a specific kind of relationship between the objects.
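The “parent” example can be illustrated with a small Python sketch. It assumes a toy in-memory set of subject-predicate-object triples rather than a real connection to Wikidata or DBpedia. The entity names, the `industry` predicate, and the helper functions `objects` and `ancestors` are all hypothetical.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts about made-up companies.
TRIPLES: set[Triple] = {
    ("ExampleSubsidiary", "parent", "ExampleHolding"),
    ("ExampleHolding", "parent", "ExampleGroup"),
    ("ExampleSubsidiary", "industry", "Software"),
}


def objects(triples: set[Triple], subject: str, predicate: str) -> set[str]:
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def ancestors(triples: set[Triple], subject: str, predicate: str = "parent") -> set[str]:
    """Follow `predicate` links repeatedly to collect all indirect parents."""
    found: set[str] = set()
    frontier = [subject]
    while frontier:
        current = frontier.pop()
        for parent in objects(triples, current, predicate):
            if parent not in found:  # also guards against cycles
                found.add(parent)
                frontier.append(parent)
    return found


direct = objects(TRIPLES, "ExampleSubsidiary", "parent")
all_parents = ancestors(TRIPLES, "ExampleSubsidiary")
```

Because the relation is named explicitly, a query can follow it across several steps and find that the subsidiary ultimately belongs to the group, even though no single fact states this directly.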

DBpedia, Wikidata, and other semantic databases are open to the public and contain data on a variety of topics. Such resources are an extremely valuable source of information for researchers and application developers. Semantic databases are also particularly useful in the field of artificial intelligence, especially in machine learning and natural language processing. They can be used to train models, build recommendation systems, recognize entities, answer questions, and even create chatbots.

OpenFact

This research is supported by the project “OpenFact – artificial intelligence tools for verification of the veracity of information sources and fake news detection” (INFOSTRATEG-I/0035/2021-00), granted within the INFOSTRATEG I program of the National Center for Research and Development, under the topic: Verifying information sources and detecting fake news.