JMWNL configuration and use for EWN Lexical Resource

EuroWordNet Parameters

These are the parameters that are in the conversion_properties_EWN.xml file:
sourcedir: directory of EuroWordNet original files
sourcefiles: list of EuroWordNet original files
verboseconversion: printout of the relations present in the parsed dictionary
encoding: the encoding used to read and write the resource (ISO 8859-1 for most Western European languages)
dictionary_path: the directory where the converted resource is stored
index_path: the directory where the Lucene index is stored
language_resource: the language of the resource
language_properties: the language of the .properties file (normally en)

This is the new parameter in file_properties_EWN.xml that is not present in the original file_properties:
encoding: the data encoding (ISO 8859-1 for most Western European languages)
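
The encoding parameter matters because the same bytes decode to different text under different charsets. The following is a minimal, self-contained sketch in plain Java and is not part of JMWNL: the class name EncodingDemo and the sample word are illustrative assumptions. It shows an accented Italian word stored as ISO 8859-1 being read back correctly with that charset and incorrectly when UTF-8 is assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not part of JMWNL): shows why the configured
// encoding must match the encoding of the resource files.
class EncodingDemo {

    // Decodes raw bytes with the given charset.
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    // Bytes of the sample word "citta" with a grave accent on the last
    // letter (U+00E0), as they would be stored in an ISO 8859-1 file.
    static byte[] sampleBytes() {
        return "citt\u00e0".getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = sampleBytes();
        String right = decode(raw, StandardCharsets.ISO_8859_1);
        String wrong = decode(raw, StandardCharsets.UTF_8);
        // The accented letter survives only with the matching charset.
        System.out.println("ISO 8859-1 round trip intact: " + right.equals("citt\u00e0"));
        System.out.println("UTF-8 decoding intact:        " + wrong.equals("citt\u00e0"));
    }
}
```

If accented characters in a converted resource look corrupted, a mismatch of this kind between the files and the encoding parameter is a likely cause to check first.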

Convert the Resource

To generate the EWN dictionary files for the Italian resource, run the following command in a Windows shell:
java -Xmx200M -cp .;jmwnl.jar it.uniroma2.art.jmwnl.ewn.conv.EWN2PrincetonFormatConverter conversion_properties_EWNItalian.xml

or launch the batch file with the properties file as its parameter:
Conversion.bat conversion_properties_EWNItalian.xml

On Linux, run the following command in a shell:
java -Xmx200M -cp .:jmwnl.jar it.uniroma2.art.jmwnl.ewn.conv.EWN2PrincetonFormatConverter conversion_properties_EWNItalian.xml

or launch the shell script with the properties file as its parameter:
Conversion.sh conversion_properties_EWNItalian.xml

To convert a resource in another language, use the same command but specify the appropriate conversion_properties_EWN file.

Generate the Lucene Index

To generate the Lucene index for the Italian resource, run the following command in a Windows shell:
java -Xmx200M -cp .;jmwnl.jar;lib\lucene-core-2.1.0.jar it.uniroma2.art.jmwnl.idx.LuceneIndexingUtility conversion_properties_EWNItalian.xml

or launch the batch file with the properties file as its parameter:
Indexing.bat conversion_properties_EWNItalian.xml

On Linux, run the following command in a shell (note the forward slash in the path to the Lucene jar):
java -Xmx200M -cp .:jmwnl.jar:lib/lucene-core-2.1.0.jar it.uniroma2.art.jmwnl.idx.LuceneIndexingUtility conversion_properties_EWNItalian.xml

or launch the shell script with the properties file as its parameter:
Indexing.sh conversion_properties_EWNItalian.xml

To generate the Lucene index for a resource in a different language, use the same command but specify the appropriate conversion_properties_EWN file.
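
The Windows and Linux commands differ only in the classpath separator (; versus :) and in the directory separator inside the path to the Lucene jar. As a rough sketch, and assuming one wanted to launch the indexer from Java rather than from the provided scripts, the command line could be assembled portably as below. The class IndexingCommand and its build method are hypothetical helpers, not part of JMWNL; only the jar names, JVM option, and main class are taken from the commands above.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of JMWNL): assembles the indexing command
// with the separators that fit the target platform.
class IndexingCommand {

    // pathSep joins classpath entries (";" on Windows, ":" on Linux);
    // fileSep joins directory names ("\" on Windows, "/" on Linux).
    static List<String> build(String pathSep, String fileSep, String propertiesFile) {
        String luceneJar = "lib" + fileSep + "lucene-core-2.1.0.jar";
        String classpath = String.join(pathSep, ".", "jmwnl.jar", luceneJar);
        return Arrays.asList(
                "java", "-Xmx200M", "-cp", classpath,
                "it.uniroma2.art.jmwnl.idx.LuceneIndexingUtility",
                propertiesFile);
    }

    public static void main(String[] args) {
        String properties = args.length > 0
                ? args[0]
                : "conversion_properties_EWNItalian.xml";
        // File.pathSeparator and File.separator give the values
        // for the platform this program is running on.
        List<String> command = build(File.pathSeparator, File.separator, properties);
        System.out.println(String.join(" ", command));
    }
}
```

The resulting list can be passed to java.lang.ProcessBuilder to actually start the process; the sketch only prints the command so it can be compared with the ones given above.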

Convert and Generate the Lucene Index

To convert the Italian resource and generate its Lucene index in one step, run in a Windows shell:
Conv_Idx.bat conversion_properties_EWNItalian.xml

or on Linux:
Conv_Idx.sh conversion_properties_EWNItalian.xml

For a different language, use the same command but specify the appropriate conversion_properties_EWN file.

Using the library

The library expects the resource to be in WordNet format, so if you are using EuroWordNet, remember to convert the resource to WordNet format with the library first.

This library should be used in the same way as the original library (JWNL 1.4 rc2). The main difference is that you call JMWNL.initialize instead of JWNL.initialize.

Original JWNL Documentation

For more information about JWNL please refer to the original documentation:

Notes

Notes on: file_properties_EWN and conversion_properties_EWN

Every language has two properties files (both XML files).
The first (file_properties_EWN) holds the information the library needs to access the selected resource. The second (conversion_properties_EWN) is used only during the conversion and the creation of the Lucene index.

To use the library, edit (or create) the properties file file_properties_EWNLANGUAGE.xml for the language you wish to use.
E.g. to use the Italian resource, edit file_properties_EWNItalian.xml.

Notes on: format transformation from EWN to WN 2.1/3.0

The conversion and the creation of the index should each take from under a minute to a couple of minutes, depending on the CPU.