Indexing Dictionaries

To speed up dictionary lookups, Babbletower uses index files. You need to create an index file for each of your dictionaries. Babbletower comes with a command line tool for this task:

java -classpath babbletower.jar Index dictionary_file encoding index_file index_depth

Note: An index file must be placed in the same directory as its accompanying dictionary file and must bear the same name plus the ending .idx.
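The naming rule in the note can be sketched in Java as follows. This is only an illustration, not part of Babbletower: the class IndexFileName, the method indexPathFor and the file name en-de.txt are hypothetical, and the sketch assumes that "the same name plus the ending .idx" means that .idx is appended to the full dictionary file name, including any existing extension.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class IndexFileName {
    // Hypothetical helper: returns the path at which the index for the given
    // dictionary would be expected, i.e. same directory, same file name,
    // with ".idx" appended (assumption: the existing extension is kept).
    static Path indexPathFor(Path dictionaryFile) {
        String name = dictionaryFile.getFileName().toString();
        return dictionaryFile.resolveSibling(name + ".idx");
    }

    public static void main(String[] args) {
        // "dictionaries/en-de.txt" is a made-up example path.
        Path dict = Paths.get("dictionaries", "en-de.txt");
        System.out.println(indexPathFor(dict));
    }
}
```

The resulting path could then be passed as the index_file argument of the Index tool.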

The meaning of the parameters:

dictionary_file Name of the dictionary file to index.
encoding Text encoding of the dictionary file.
index_file Name of the file into which to save the index. If this file exists it will be overwritten without warning!
index_depth The depth of the index. This is the maximum number of significant characters that the indexer uses when indexing words. For example, with a depth of 4 the indexer 'looks' only at the first four letters of each word, so the words conference and confederation are put into the same index entry. This does not mean, however, that looking up conference would also return confederation as a search result: lookup results do not depend on the depth of an index. The index depth merely determines the space vs. time tradeoff of index files. A 'shallow' index is smaller but slower, while a 'deep' index is faster but bigger.
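The effect of the depth can be illustrated with a small Java sketch. It is not Babbletower's actual indexer or file format, which this manual does not describe. It assumes an in-memory map from prefix to word list and an exact-match lookup, and all names in it (IndexDepthSketch, key, build, lookup) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IndexDepthSketch {
    // Index key of a word: at most `depth` leading characters.
    static String key(String word, int depth) {
        return word.length() <= depth ? word : word.substring(0, depth);
    }

    // Groups all words that share the same key into one index entry.
    static Map<String, List<String>> build(List<String> words, int depth) {
        Map<String, List<String>> index = new TreeMap<>();
        for (String w : words) {
            index.computeIfAbsent(key(w, depth), k -> new ArrayList<>()).add(w);
        }
        return index;
    }

    // Lookup (assumed exact match): jump to the entry for the query's key,
    // then scan only that entry. A shallow index means longer scans.
    static List<String> lookup(Map<String, List<String>> index, String query, int depth) {
        List<String> hits = new ArrayList<>();
        for (String w : index.getOrDefault(key(query, depth), List.of())) {
            if (w.equals(query)) {
                hits.add(w);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> words = List.of("conference", "confederation", "concert");
        for (int depth : new int[] {4, 7}) {
            Map<String, List<String>> index = build(words, depth);
            System.out.println("depth " + depth + ": " + index.size()
                    + " entries, lookup -> " + lookup(index, "conference", depth));
        }
    }
}
```

With a depth of 4, conference and confederation share the entry conf, so the sketch builds 2 entries. With a depth of 7 it builds 3. In both cases the lookup for conference returns only conference, which matches the statement that results do not depend on the depth.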

For a dictionary that mostly carries words written with Latin letters, i.e. a small alphabet, a depth of 6 or 7 is recommended. For example, quite a few English words start with conf, so a depth of 4 could lead to long lookup times when searching for a word with a very common prefix. For dictionaries that carry only words from a language with a large 'alphabet', on the other hand, a smaller depth may be sufficient. For a monolingual Japanese dictionary, for example, a depth of 4 should be sufficient. However, the impact of the depth also depends on the size of the dictionary: for smaller dictionaries, a shallow index may still be fast enough.
