The goal of this department-internal project is to build a model of context-aware indexing and filtering of Polish natural language documents.

The first phase focuses on the identification and disambiguation of entities in free text. The categories of entities we consider constitute five basic dimensions of a document’s context, i.e.:

  1. location,
  2. organization,
  3. person,
  4. product and
  5. time.

The second phase will result in a model of a multi-dimensional index of the content of free text documents.
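
As a rough illustration of what a multi-dimensional index over the five context dimensions might look like, the sketch below keeps one posting set per (dimension, entity) pair and filters documents by intersecting those sets. It is not the project's model: the names `Dimension`, `ContextIndex`, `add` and `filter`, as well as the sample documents and entities, are hypothetical and were chosen only for this example. It also assumes that entities have already been identified and disambiguated in the first phase.

```python
from collections import defaultdict
from enum import Enum


class Dimension(Enum):
    """The five context dimensions named in the project description."""
    LOCATION = "location"
    ORGANIZATION = "organization"
    PERSON = "person"
    PRODUCT = "product"
    TIME = "time"


class ContextIndex:
    """Hypothetical inverted index: (dimension, entity) -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, dimension, entity):
        """Record that a disambiguated entity occurs in a document."""
        self._postings[(dimension, entity)].add(doc_id)

    def filter(self, **constraints):
        """Return ids of documents that match every dimension=entity constraint.

        With no constraints the result is an empty set.
        """
        result = None
        for name, entity in constraints.items():
            docs = self._postings.get((Dimension(name), entity), set())
            result = set(docs) if result is None else result & docs
        return result if result is not None else set()


# Sample data, invented for illustration only.
index = ContextIndex()
index.add("doc1", Dimension.LOCATION, "Poznań")
index.add("doc1", Dimension.TIME, "2010")
index.add("doc2", Dimension.LOCATION, "Poznań")

matches = index.filter(location="Poznań", time="2010")  # {"doc1"}
```

Treating each dimension as a separate key keeps queries on different dimensions independent of one another. A real index would probably also need ranges for time and hierarchies for locations, which this sketch leaves out.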

The results of the project will be used by a system supporting

  1. real estate value estimation and
  2. information fusion for public relations research.

The figure shows the position of the Contexts in the first example application.