Wikipedia, a widely available source of information in the digital era, attaches great importance to the verifiability of its content, which is fundamental to its credibility and to the trust readers place in it. Its verifiability rules require that all information, especially information that is contentious or likely to be challenged, be supported by reliable, published sources. This ensures that the content of Wikipedia articles is not based on personal opinion or original research. However, credibility is a subjective concept, and its assessment depends on many factors (including the language version and the topic). As a result, editors may find it difficult to select appropriate sources of information.
Given the huge number of websites (currently over a billion), assessing the credibility of each source individually is a challenge for Wikipedia editors. Various language versions of Wikipedia have detailed guidelines that define what counts as a reliable source. However, there is no comprehensive list of websites or other sources of information that can be considered reliable across the many topics covered on Wikipedia. Moreover, the credibility and reputation of websites may change over time, and evaluation criteria may differ between language versions and topic areas, so any such list needs regular updates. For this reason, a comprehensive and regularly updated list of reliable sources would be very helpful not only to Wikipedia editors but also to readers looking for accurate and reliable information.
An analysis of over 60 million Wikipedia articles made it possible to extract information about over 330 million references (footnotes that cite sources of information). This in turn made it possible to identify the best information sources on Wikipedia using different assessment models. The table below shows the results of reference extraction for selected language versions, together with the number of unique websites, as of October 2023:
Wiki | Language Version | Number of Articles | Number of References | Unique Websites |
---|---|---|---|---|
ar | Arabic | 1,219,168 | 6,355,164 | 294,089 |
ca | Catalan | 735,551 | 3,895,389 | 197,470 |
cs | Czech | 532,602 | 2,752,877 | 119,313 |
de | German | 2,839,878 | 14,473,501 | 622,551 |
en | English | 6,722,214 | 79,687,819 | 1,942,579 |
es | Spanish | 1,833,749 | 12,558,623 | 509,313 |
fa | Persian | 975,931 | 2,477,763 | 133,634 |
fi | Finnish | 559,931 | 3,371,084 | 138,320 |
fr | French | 2,557,559 | 19,455,752 | 576,523 |
he | Hebrew | 342,285 | 1,867,068 | 103,848 |
hi | Hindi | 162,954 | 496,057 | 47,617 |
hu | Hungarian | 530,977 | 2,545,152 | 124,536 |
id | Indonesian | 661,844 | 2,672,604 | 162,924 |
it | Italian | 1,829,095 | 8,856,574 | 278,232 |
ja | Japanese | 1,388,532 | 14,684,917 | 359,446 |
ko | Korean | 646,717 | 1,885,878 | 91,918 |
nl | Dutch | 2,133,536 | 3,010,002 | 112,318 |
no | Norwegian | 616,624 | 2,102,507 | 107,343 |
pl | Polish | 1,583,919 | 8,847,928 | 242,835 |
pt | Portuguese | 1,110,209 | 7,692,600 | 319,534 |
ru | Russian | 1,940,113 | 15,461,960 | 454,351 |
sv | Swedish | 2,572,575 | 11,791,609 | 134,081 |
th | Thai | 158,905 | 1,010,438 | 70,395 |
tr | Turkish | 533,201 | 2,773,455 | 146,854 |
uk | Ukrainian | 1,289,727 | 5,455,954 | 217,787 |
vi | Vietnamese | 1,288,093 | 3,796,577 | 147,041 |
zh | Chinese | 1,379,496 | 8,130,187 | 283,516 |
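The following Python code is a rough illustration of how figures like the reference and unique-website counts in the table can be obtained. It scans the wikitext of articles for `<ref>...</ref>` footnotes, takes the first URL in each one, and tallies its hostname. This is a minimal sketch under stated assumptions, not the pipeline that produced the table. It assumes articles are available as raw wikitext and treats a "website" as the URL hostname without a leading `www.`. It also skips reused named references (`<ref name="..." />`) and ignores references generated by templates. The function names and sample data are hypothetical.

```python
import re
from collections import Counter
from typing import Optional
from urllib.parse import urlparse

# An opening <ref ...> tag that is not self-closing, up to the next </ref>.
REF_PATTERN = re.compile(r"<ref(?:\s[^>]*)?(?<!/)>(.*?)</ref\s*>", re.DOTALL | re.IGNORECASE)
# A bare http(s) URL, stopping at whitespace and common wikitext delimiters.
URL_PATTERN = re.compile(r"https?://[^\s|\]}<]+", re.IGNORECASE)


def first_domain(ref_body: str) -> Optional[str]:
    """Hostname of the first URL in a reference (assumed to be the cited source)."""
    match = URL_PATTERN.search(ref_body)
    if match is None:
        return None  # e.g. a book citation without a URL
    host = urlparse(match.group(0)).hostname  # lowercased by urllib
    if host is None:
        return None
    return host.removeprefix("www.")


def summarize(articles: dict[str, str]) -> tuple[int, Counter]:
    """Count references and how many of them point to each domain."""
    total_refs = 0
    domains: Counter = Counter()
    for wikitext in articles.values():
        for ref_body in REF_PATTERN.findall(wikitext):
            total_refs += 1  # references without a URL still count as references
            domain = first_domain(ref_body)
            if domain is not None:
                domains[domain] += 1
    return total_refs, domains


# Hypothetical sample wikitext for two articles.
sample = {
    "Example": (
        'Text.<ref>{{cite web |url=https://www.bbc.co.uk/news/1 |title=A}}</ref> '
        'More.<ref name="n">[https://example.org/page Page]</ref> '
        'Again.<ref name="n" /> '
        'Book.<ref>{{cite book |title=B |year=2020}}</ref>'
    ),
    "Other": "X.<ref>https://BBC.co.uk/sport</ref>",
}

total_refs, domains = summarize(sample)
print(total_refs, len(domains), domains.most_common(1))  # 4 2 [('bbc.co.uk', 2)]
```

The number of unique websites is then simply `len(domains)`. A real pipeline would need more careful handling of archive links, citation templates and domain normalization.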
During the webinar, Dr. Włodzimierz Lewoniewski presented methods for identifying the information sources of Wikipedia articles in different language versions and for automatically assessing their importance. In the practical part, he demonstrated some of the capabilities of the BestRef tool, which presents the results of evaluating millions of web sources cited in Wikipedia articles, separately for each language version.
The webinar took place on November 23, 2023. It was organized by Wikimedia Polska, which supports and promotes Wikipedia and its sister projects (such as Wikidata, Wiktionary, Wikinews and Wikisource).
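The next Python example makes the idea of automatically assessing the importance of a source more concrete. It ranks domains with two simple scores. The first is the number of references citing a domain. The second is popularity-weighted: every reference in an article receives an equal share of that article's page views. Both scores, the `Article` structure and the sample data are hypothetical illustrations, not the models implemented in BestRef. The code assumes that per-article page views and normalized cited domains are already available.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical input: page views of an article and the domain of each of its references."""
    title: str
    pageviews: int
    cited_domains: list[str]  # one entry per reference


def frequency_score(articles: list[Article]) -> Counter:
    """Number of references pointing to each domain."""
    scores: Counter = Counter()
    for article in articles:
        scores.update(article.cited_domains)
    return scores


def popularity_score(articles: list[Article]) -> dict[str, float]:
    """Each reference adds pageviews / (number of references in the article),
    so a popular article shares its weight among all the sources it cites."""
    scores: dict[str, float] = {}
    for article in articles:
        n_refs = len(article.cited_domains)
        if n_refs == 0:
            continue  # articles without references contribute nothing
        share = article.pageviews / n_refs
        for domain in article.cited_domains:
            scores[domain] = scores.get(domain, 0.0) + share
    return scores


articles = [
    Article("A", pageviews=10_000, cited_domains=["nature.com", "bbc.co.uk"]),
    Article("B", pageviews=500, cited_domains=["blog.example", "blog.example", "nature.com"]),
    Article("C", pageviews=200, cited_domains=["blog.example"]),
]

print(frequency_score(articles).most_common())
# [('blog.example', 3), ('nature.com', 2), ('bbc.co.uk', 1)]

popularity = popularity_score(articles)
print(sorted(popularity, key=popularity.get, reverse=True))
# ['nature.com', 'bbc.co.uk', 'blog.example']
```

With this toy data, `blog.example` is cited most often but ranks last once article popularity is taken into account. This shows why different assessment models can produce different rankings of the same sources.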
More information about research on the analysis of information sources on Wikipedia can be found in the following scientific publications:
- Companies in Multilingual Wikipedia: Articles Quality and Important Sources of Information (2023)
- Identification of Important Web Sources of Information on Wikipedia across various Topics and Languages (2022)
- Reliability in Time: Evaluating the Web Sources of Information on COVID-19 in Wikipedia across Various Language Editions from the Beginning of the Pandemic (2022)
- Identifying Reliable Sources of Information about Companies in Multilingual Wikipedia (2022)
- Modeling Popularity and Reliability of Sources in Multilingual Wikipedia (2020)