JMWNL configuration and use for EWN Lexical Resource
Index
- EuroWordNet Parameters
- Convert the Resource
- Generate the Lucene Index
- Using the library
- Original JWNL Documentation
- Notes
EuroWordNet Parameters
The conversion_properties_EWN.xml file contains the following parameters:
sourcedir: directory of EuroWordNet original files
sourcefiles: list of EuroWordNet original files
verboseconversion: printout of the relations present in the parsed dictionary
encoding: the encoding used to read and write the resource (ISO 8859-1 for most West European languages)
dictionary_path: the directory where the converted resource is stored
index_path: the directory where the lucene index is stored
language_resource: the language of the resource
language_properties: the language of the .properties file; normally this should be en
The file_properties_EWN.xml file has one new parameter that is not present in the original file_properties:
encoding: the data encoding (ISO 8859-1 for most West European languages)
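The effect of the encoding parameter can be illustrated with a small, self-contained Java sketch. It does not use JMWNL at all; the class name EncodingDemo and the sample word are hypothetical and only show what happens when ISO 8859-1 bytes are decoded with the right and with the wrong charset.

```java
import java.nio.charset.Charset;

public class EncodingDemo {
    // Decode raw bytes with the named charset, as a reader configured through
    // an "encoding" setting would do.
    public static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The Italian word "citt\u00E0" as stored in an ISO 8859-1 file:
        // the accented final letter is the single byte 0xE0.
        byte[] raw = {'c', 'i', 't', 't', (byte) 0xE0};
        String expected = "citt\u00E0";

        // Decoding with the matching charset recovers the word.
        System.out.println("ISO-8859-1 matches: "
                + decode(raw, "ISO-8859-1").equals(expected));
        // Decoding the same bytes as UTF-8 does not, because a lone 0xE0 byte
        // is not a valid UTF-8 sequence.
        System.out.println("UTF-8 matches: "
                + decode(raw, "UTF-8").equals(expected));
    }
}
```

If accented characters in the converted dictionary look corrupted, a mismatch of this kind between the encoding parameter and the actual encoding of the source files is a likely cause.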
Convert the Resource
To generate the EWN dictionary files for the Italian resource, run the following command in a Windows shell:
java -Xmx200M -cp .;jmwnl.jar it.uniroma2.art.jmwnl.ewn.conv.EWN2PrincetonFormatConverter conversion_properties_EWNItalian.xml
or launch the batch file with the properties file as its parameter:
Conversion.bat conversion_properties_EWNItalian.xml
On Linux, run the following command in a shell:
java -Xmx200M -cp .:jmwnl.jar it.uniroma2.art.jmwnl.ewn.conv.EWN2PrincetonFormatConverter conversion_properties_EWNItalian.xml
or launch the shell script with the properties file as its parameter:
Conversion.sh conversion_properties_EWNItalian.xml
To convert the resource of another language, use the same command but specify the corresponding conversion_properties_EWN file.
Generate the Lucene Index
To generate the Lucene index for the Italian resource, run the following command in a Windows shell:
java -Xmx200M -cp .;jmwnl.jar;lib\lucene-core-2.1.0.jar it.uniroma2.art.jmwnl.idx.LuceneIndexingUtility conversion_properties_EWNItalian.xml
or launch the batch file with the properties file as its parameter:
Indexing.bat conversion_properties_EWNItalian.xml
On Linux, run the following command in a shell:
java -Xmx200M -cp .:jmwnl.jar:lib/lucene-core-2.1.0.jar it.uniroma2.art.jmwnl.idx.LuceneIndexingUtility conversion_properties_EWNItalian.xml
or launch the shell script with the properties file as its parameter:
Indexing.sh conversion_properties_EWNItalian.xml
To generate the Lucene index for a resource in a different language, use the same command but specify the corresponding conversion_properties_EWN file.
Convert and Generate the Lucene Index
To convert the Italian resource and generate its Lucene index in one step, run in a Windows shell:
Conv_Idx.bat conversion_properties_EWNItalian.xml
or in Linux:
Conv_Idx.sh conversion_properties_EWNItalian.xml
For a different language, use the same command but specify the corresponding conversion_properties_EWN file.
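The Windows and Linux commands in the sections above differ in the classpath separator (";" versus ":") and in the directory separator used for the Lucene jar. As a rough sketch, the two command lines can be assembled in a platform-independent way from Java. The class JmwnlCommandBuilder and its build method are hypothetical helpers, not part of JMWNL; the main class names, jar names and the -Xmx200M option are copied from the commands above. That the Conv_Idx scripts run the converter first and the indexer second is an assumption based on their name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class JmwnlCommandBuilder {
    // Main classes exactly as they appear in the commands of this document.
    static final String CONVERTER =
            "it.uniroma2.art.jmwnl.ewn.conv.EWN2PrincetonFormatConverter";
    static final String INDEXER =
            "it.uniroma2.art.jmwnl.idx.LuceneIndexingUtility";

    // Builds the java command line for one tool. pathSep separates classpath
    // entries (";" on Windows, ":" on Linux); fileSep separates directories.
    static List<String> build(String mainClass, String propertiesFile,
                              String pathSep, String fileSep) {
        List<String> classpath = new ArrayList<>();
        classpath.add(".");
        classpath.add("jmwnl.jar");
        if (INDEXER.equals(mainClass)) {
            // Only the indexing command lists the Lucene jar.
            classpath.add("lib" + fileSep + "lucene-core-2.1.0.jar");
        }
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-Xmx200M");
        command.add("-cp");
        command.add(String.join(pathSep, classpath));
        command.add(mainClass);
        command.add(propertiesFile);
        return command;
    }

    public static void main(String[] args) {
        String properties = args.length > 0
                ? args[0] : "conversion_properties_EWNItalian.xml";
        // Assumed order: convert first, then index, with the separators of
        // the platform this program runs on.
        for (String mainClass : new String[] {CONVERTER, INDEXER}) {
            List<String> command = build(mainClass, properties,
                    File.pathSeparator, File.separator);
            System.out.println(String.join(" ", command));
        }
    }
}
```

The sketch only prints the two commands. To actually run them, each list could be handed to java.lang.ProcessBuilder, for example with new ProcessBuilder(command).inheritIO().start().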
Using the library
The library expects the resource in the WordNet format, so if you are using EuroWordNet, remember to convert the resource to the WordNet format with the library first.
This library should be used in the same way as the original library (JWNL 1.4 rc2). The main difference is that you call JMWNL.initialize instead of JWNL.initialize.
Original JWNL Documentation
For more information about JWNL please refer to the original documentation:
Notes
Notes on: file_properties_EWN and conversion_properties
Every language has two properties files (both XML files).
One (file_properties_EWN) holds the information the library needs to access the selected resource; the other (conversion_properties_EWN) is used only during the conversion and the creation of the Lucene indexes.
To use the library, edit (or create) the properties file file_properties_EWNLANGUAGE.xml for the language you wish to use.
For example, to use the Italian resource, edit file_properties_EWNItalian.xml.
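The naming convention for the two per-language files can be sketched as a small Java helper. The class PropertyFileNames and its methods are hypothetical and not part of JMWNL. The pattern for the conversion file is inferred from the file name conversion_properties_EWNItalian.xml used in the commands of this document.

```java
import java.io.File;

public class PropertyFileNames {
    // File used by the library to access the converted resource.
    static String accessFile(String language) {
        return "file_properties_EWN" + language + ".xml";
    }

    // File used only for the conversion and the Lucene index creation.
    static String conversionFile(String language) {
        return "conversion_properties_EWN" + language + ".xml";
    }

    public static void main(String[] args) {
        String language = args.length > 0 ? args[0] : "Italian";
        String[] names = {accessFile(language), conversionFile(language)};
        for (String name : names) {
            // Report whether each file is present in the working directory.
            boolean found = new File(name).isFile();
            System.out.println(name + (found ? " (found)" : " (missing)"));
        }
    }
}
```

Running it with a language name as its argument lists the two expected file names and shows which of them still has to be created.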
Notes on: format transformation EWN to WN 2.1/3.0
The conversion and the creation of the indexes should each take from less than a minute to a couple of minutes, depending on the CPU.