Information retrieval deals with the representation, storage and organisation of information, and with access to it. The representation and organisation of information items should give users easy access to the information they are interested in.
Project Mission
The general aim of the SENECA project is to improve the precision and recall of information retrieval tasks such as searching, document matching, and the classification or identification of web services. Relevance can be improved by using knowledge representation structures: thesauri, semantic networks and ontologies.
The availability of these structures is limited, however, so one of our goals is to create them automatically through concept hierarchy derivation, word disambiguation and term clustering.
Effective text matching using semantic knowledge can help protect corporate intellectual property, for example through plagiarism detection, filtering of corporate document streams, or matching of patent specifications.
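To illustrate how a thesaurus might affect precision and recall, the following is a minimal sketch, not the project's actual method. The documents, the thesaurus entry, the relevance judgements and all function names are hypothetical and chosen only for illustration. Retrieval here is plain term overlap, and the thesaurus is used for simple query expansion.

```python
from typing import Dict, Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a single query."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def search(query_terms: Set[str], docs: Dict[str, Set[str]]) -> Set[str]:
    """Return ids of documents that share at least one term with the query."""
    return {doc_id for doc_id, terms in docs.items() if terms & query_terms}


def expand(query_terms: Set[str], thesaurus: Dict[str, Set[str]]) -> Set[str]:
    """Add the thesaurus synonyms of each query term to the query."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= thesaurus.get(term, set())
    return expanded


# Hypothetical toy data: four documents represented as sets of index terms.
DOCS = {
    "d1": {"car", "engine", "repair"},
    "d2": {"automobile", "maintenance"},
    "d3": {"bicycle", "repair"},
    "d4": {"vehicle", "insurance"},
}
THESAURUS = {"car": {"automobile"}}  # assumed synonym relation
RELEVANT = {"d1", "d2"}              # assumed relevance judgements

query = {"car"}
plain = precision_recall(search(query, DOCS), RELEVANT)
expanded = precision_recall(search(expand(query, THESAURUS), DOCS), RELEVANT)

print("without thesaurus:", plain)
print("with thesaurus:   ", expanded)
```

On this toy data the plain query finds only d1, so recall is 0.5 at precision 1.0. The expanded query also finds d2, raising recall to 1.0 without lowering precision. Real corpora will rarely behave this cleanly, since expansion can also pull in irrelevant documents.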
Project Objectives
- Automated concept disambiguation
- Automated domain knowledge representation (ontology building) using web sources
- Evaluation of ontologies on selected domain document corpora
- Using ontologies to improve the effectiveness of searching techniques
- Intellectual property protection and plagiarism detection
- Ontology application in classification of web services
- Identifying business processes to match appropriate web services