A scientific article by members of our Department, entitled “Sentence Embeddings and Semantic Entity Extraction for Identification of Topics of Short Fact-Checked Claims”, has been published in open access. The paper describes an approach to assigning topics to claims verified by fact-checking agencies. The authors are Prof. Krzysztof Węcel, Marcin Sawiński, Dr. Włodzimierz Lewoniewski, Dr. Milena Stróżyna, Ewelina Księżniak, and Prof. Witold Abramowicz.
In the digital age, misinformation and fake news present significant challenges. To effectively combat them, it’s crucial to accurately assign topics to claims that have been debunked by fact-checking agencies. Traditional classification methods often rely on simple categories that don’t capture the full context or complexity of issues.
A team of researchers from the Department of Information Systems has developed a method that uses natural language processing techniques. Sentence embeddings transform each claim into a numerical vector, so that claims can be analyzed and compared by similarity. The team also used UMAP, a dimensionality reduction technique, together with clustering algorithms such as HDBSCAN and K-means to group similar claims.
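The idea of grouping claims by their embeddings can be illustrated with a minimal sketch. It is not the pipeline from the paper: the three-dimensional vectors are hand-written stand-ins for real sentence embeddings, the claim labels are invented, and a small plain-Python K-means with a deterministic start replaces the library implementations of HDBSCAN, UMAP, and K-means that a real system would likely use.

```python
import math

def normalize(vector):
    """Scale a vector to unit length so distance reflects direction only."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain K-means with a deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(farthest)

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                new_centroids.append(
                    [sum(column) / len(members) for column in zip(*members)]
                )
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels

# ASSUMPTION: invented claims with hand-written toy vectors. A real system
# would obtain high-dimensional vectors from a sentence-embedding model.
TOY_EMBEDDINGS = {
    "claim about vaccines A": [0.9, 0.1, 0.0],
    "claim about vaccines B": [0.8, 0.2, 0.1],
    "claim about elections A": [0.1, 0.9, 0.0],
    "claim about elections B": [0.0, 0.8, 0.2],
}

claims = list(TOY_EMBEDDINGS)
vectors = [normalize(TOY_EMBEDDINGS[claim]) for claim in claims]
labels = kmeans(vectors, k=2)

for claim, label in zip(claims, labels):
    print(f"cluster {label}: {claim}")
```

With these toy vectors the two vaccine claims fall into one cluster and the two election claims into the other. The cluster numbers themselves carry no meaning; only the grouping does.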
A key aspect is the extraction of semantic entities from the claims, which involves identifying and matching specific concepts and topics from knowledge bases like Wikidata, DBpedia, Wikipedia, and YAGO. This allows topics to be represented hierarchically, facilitating navigation and understanding of the relationships between them.
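As a rough illustration of the hierarchical topic representation described above, the sketch below walks from an extracted entity up to a broader topic. The parent links and claim-to-entity assignments are invented for the example and are not taken from Wikidata, DBpedia, Wikipedia, or YAGO; the function names are hypothetical as well.

```python
from collections import Counter

# ASSUMPTION: a hypothetical child -> parent mapping standing in for the
# hierarchy that a real knowledge base would provide.
PARENT = {
    "COVID-19 vaccine": "vaccine",
    "vaccine": "health",
    "postal voting": "election",
    "election": "politics",
}

def topic_path(entity):
    """Return the chain from an entity up to its broadest known topic."""
    path = [entity]
    seen = {entity}
    while path[-1] in PARENT:
        parent = PARENT[path[-1]]
        if parent in seen:  # guard against cycles in the mapping
            break
        path.append(parent)
        seen.add(parent)
    return path

def top_level_topics(claim_entities):
    """Count how many claims touch each top-level topic."""
    counts = Counter()
    for entities in claim_entities.values():
        # A topic is counted once per claim, however many entities lead to it.
        roots = {topic_path(entity)[-1] for entity in entities}
        counts.update(roots)
    return counts

# ASSUMPTION: invented entity-extraction output for three claims.
CLAIM_ENTITIES = {
    "claim 1": ["COVID-19 vaccine"],
    "claim 2": ["vaccine", "postal voting"],
    "claim 3": ["election"],
}

topic_counts = top_level_topics(CLAIM_ENTITIES)
print(topic_path("COVID-19 vaccine"))
print(dict(topic_counts))
```

Storing only parent links keeps the structure simple: a specific entity can be reported at whatever level of generality a reader needs by truncating its path.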
The method was evaluated by comparing the results with existing annotations from professional fact-checkers. The promising outcomes suggest that this approach can significantly improve the process of identifying and classifying topics in the context of fake news.
This research is supported by the project “OpenFact – artificial intelligence tools for verification of the veracity of information sources and fake news detection” (INFOSTRATEG-I/0035/2021-00), granted within the INFOSTRATEG I program of the National Center for Research and Development, under the topic: Verifying information sources and detecting fake news.