Sensum

Internal Research Project

Semantic disambiguation is motivated by the fact that current language processing models are strongly affected by the sparseness of training data. Current solutions, such as class-based approaches, do not provide appropriate information: the semantic nature and linguistic expressiveness of automatically derived word classes are unclear. Many of these limitations arise because fine-grained automatic sense disambiguation is not yet applicable on a large scale. Sensum studies a weakly supervised method for sense modeling (i.e. reducing the possible word senses in corpora according to their genre) and its application to a very large corpus, with the aim of coarsely sense-disambiguating it. This can be viewed as an incremental step towards fine-grained sense disambiguation. The sense catalogue adopted in the experiments is WordNet. The resulting semantic repository and the techniques developed will be made available as resources for future work on language modeling, semantic acquisition for text extraction, question answering, summarization, and many other natural language processing tasks.

Involved People

Roberto Basili
Marco Cammisa
Fabio Massimo Zanzotto
Marco Pennacchiotti

Sensum Internal page
Sensum Demo

Specific Publications

Marco Cammisa, "Tecniche di CO-training per la Word Sense Disambiguation", Master's thesis (Laurea Specialistica) in Computer Engineering, academic year 2002/2003, Università di Roma Tor Vergata
Louise Guthrie, Roberto Basili, Fabio Massimo Zanzotto et al., "Final Technical Report", WS'03, Baltimore, July-August 2003
Roberto Basili, Marco Cammisa, Fabio Massimo Zanzotto, "A semantic similarity measure for Semantic Tagging", LREC'04, Lisbon, Portugal, June 2004
Roberto Basili, Marco Cammisa, "Unsupervised Semantic Disambiguation", LREC'04 Workshop on "Beyond Named Entity Recognition: Semantic Labelling for NLP Tasks", Lisbon, Portugal, June 2004

Start Date: September 2003
Status: in progress

Roberto Basili's Home, 2005