Identifying Reliable Sources of Information about Companies in Multilingual Wikipedia

IEEE has published a paper on automatically identifying reliable sources of information about companies in multilingual Wikipedia. The source assessment models presented in the paper can help Internet users find valuable sources of information about companies using open data from Wikipedia, DBpedia and Wikidata.

Authors of the publication: Włodzimierz Lewoniewski, Krzysztof Węcel, Witold Abramowicz. The research findings were presented at the FedCSIS 2022 conference.

To select Wikipedia articles about companies, data from two semantic knowledge bases were used: DBpedia[1] and Wikidata[2].

The most commonly used values[3] for property P31 (“instance of”) in Wikidata items related to one or more Wikipedia articles. Source: own calculations in 2022.

Wikidata is a semantic knowledge base that works much like Wikipedia, with one significant difference: facts about objects are recorded as statements with properties and values rather than as sentences in natural language. Each Wikidata item contains a collection of statements arranged in the form “Subject-Predicate-Object”. For example, information about the Poznań University of Economics and Business can be found on a separate item page in Wikidata.
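To make the “Subject-Predicate-Object” form concrete, the following minimal sketch models statements as simple Python tuples. It is only an illustration: `Q_EXAMPLE` is a hypothetical placeholder rather than the real identifier of the university's item, and the value identifiers (Q3918 for “university”, Q36 for “Poland”) as well as the property P17 (“country”) are assumptions about Wikidata that should be checked against the live site.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One Wikidata-style statement: subject, predicate (property), object (value)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical placeholder, NOT the real Q-identifier of the university item.
ITEM = "Q_EXAMPLE"

statements = [
    Statement(ITEM, "P31", "Q3918"),  # "instance of" -> assumed ID of "university"
    Statement(ITEM, "P17", "Q36"),    # "country" -> assumed ID of "Poland"
]


def values_of(items, subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return [s.obj for s in items if s.subject == subject and s.predicate == predicate]


print(values_of(statements, ITEM, "P31"))
```

Keeping each statement as a separate triple means that an item can carry several values for the same property, which is how one item can be an instance of more than one class.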

Wikidata is also considered to be the central data management platform for Wikipedia and most of its sister projects. This means that via Wikidata, we can find links to Wikipedia articles in different languages describing the same object. Thus, having a list of Wikidata items of a certain type (e.g. companies), we can also find corresponding Wikipedia article names.
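As a rough sketch of this step, the code below reads the `sitelinks` section of an item in the shape of Wikidata's entity JSON and keeps only the links that point to Wikipedia language editions. The sample entity is hand-written for illustration and was not fetched from Wikidata; `Q_EXAMPLE`, the Commons title, and the `NON_WIKIPEDIA` exclusion set are assumptions of this sketch, and the set is not exhaustive.

```python
# Site keys that end in "wiki" but are not Wikipedia language editions.
# Assumption for this sketch; the list is not exhaustive.
NON_WIKIPEDIA = {
    "commonswiki", "specieswiki", "metawiki",
    "wikidatawiki", "mediawikiwiki", "sourceswiki",
}

# Hand-written sample in the shape of a Wikidata entity (illustrative only).
SAMPLE_ENTITY = {
    "id": "Q_EXAMPLE",  # hypothetical placeholder identifier
    "sitelinks": {
        "enwiki": {"site": "enwiki",
                   "title": "Poznań University of Economics and Business"},
        "plwiki": {"site": "plwiki",
                   "title": "Uniwersytet Ekonomiczny w Poznaniu"},
        "commonswiki": {"site": "commonswiki", "title": "Category:Example"},
    },
}


def wikipedia_titles(entity):
    """Map language code -> article title, keeping only Wikipedia sitelinks."""
    titles = {}
    for site, link in entity.get("sitelinks", {}).items():
        # Wikipedia keys look like "enwiki"; sister projects use other
        # suffixes (e.g. "enwikiquote") or the special keys excluded above.
        if site.endswith("wiki") and site not in NON_WIKIPEDIA:
            language = site[:-len("wiki")].replace("_", "-")
            titles[language] = link["title"]
    return titles


print(wikipedia_titles(SAMPLE_ENTITY))
```

Applied to every item of a chosen type, such a function would yield the list of article names per language version.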

Currently Wikidata has over 100 million items[4] (described subjects), while the number of Wikipedia articles in all languages is around 60 million[5]. Thus, not every Wikidata item has a corresponding Wikipedia article on its topic.

The most used classes from the DBpedia ontology[6]. Source: own calculations in 2022.

DBpedia is a semantic knowledge base that is automatically enriched with structured information from Wikipedia articles in different languages. The knowledge acquired on a given topic is available on a separate page. For example, semantic data on the Poznań University of Economics and Business, extracted from the English Wikipedia[7], is available as a separate DBpedia resource.

On such DBpedia pages, among the various properties, we can also find information about the type(s) of the described object. For our example, DBpedia indicates that the object belongs to classes such as dbo:Organisation, dbo:EducationalInstitution, dbo:University and others. Knowing the names of the classes we are interested in, we can find all objects of a certain type within DBpedia.
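One possible way to list all objects of a given class is a SPARQL query against DBpedia. The sketch below only builds the query text and a request URL and does not contact any server. The endpoint address `https://dbpedia.org/sparql` and the `format` request parameter are assumptions about the public DBpedia service, and the helper names are hypothetical; this is not necessarily the procedure used in the publication.

```python
from urllib.parse import urlencode

# Assumed address of the public DBpedia SPARQL endpoint.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def class_members_query(dbo_class, limit=100):
    """Build a SPARQL query listing resources typed with a DBpedia ontology class."""
    # Accept only plain alphanumeric class names so the query text stays well-formed.
    if not dbo_class.isalnum():
        raise ValueError(f"unexpected class name: {dbo_class!r}")
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?resource WHERE {\n"
        f"  ?resource a dbo:{dbo_class} .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(query):
    """Encode the query as a GET request URL (the 'format' parameter is assumed)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_SPARQL + "?" + urlencode(params)


query = class_members_query("University", limit=5)
print(query)
print(request_url(query))
```

The same helper could be called with another class name, for example one describing companies, provided that class exists in the DBpedia ontology.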

The publication is available on the IEEE and ACSIS websites.

Notes

  1. DBpedia website: www.dbpedia.org
  2. Wikidata website: www.wikidata.org
  3. The following values have been excluded within the given chart (word cloud): Q4167410 (“Wikimedia disambiguation page”), Q13406463 (“Wikimedia list article”), Q22808320 (“Wikimedia human name disambiguation page”), Q18340514 (“events in a specific year or time period”)
  4. Wikidata statistics: www.wikidata.org/wiki/Special:Statistics
  5. List of language versions of Wikipedia: meta.wikimedia.org/wiki/List_of_Wikipedias
  6. More information on DBpedia ontology can be found at: dbpedia.org/resources/ontology/
  7. An article of English Wikipedia about the Poznań University of Economics and Business is available at: en.wikipedia.org/wiki/Poznań_University_of_Economics_and_Business