ESAlib

An implementation of Explicit Semantic Analysis (ESA) in Java using SQLite/MySQL

What is this good for?

When you need to find out how semantically similar two pieces of text are (whether they are two single words or two whole articles).

ESA is currently the state-of-the-art method for comparing semantic similarity of texts.

How do I get ESA running for English in 2 minutes?

  1. Download the repo with the pre-built jar and database

    $ git clone https://github.com/ticcky/esalib.git
    $ cd esalib
    
  2. Create a symbolic link to the sample database

    $ ln -s example/esa_en.db esa_db.db
    
  3. Get the relatedness estimate of two texts:

    $ ./run_analyzer "computer" "apple"
    

WARNING!

The tool is verified to yield good results (meaning correlation with human judgement as reported in the original ESA paper) with the provided prebuilt English Wikipedia ESA background from 2005. I have not had success building the ESA background from the recent dumps of Wikipedia. Please let me know if you manage.

What to do if you are struggling with something?

Just email me for help ;)

Problems and solutions

Occasionally I get questions about ESAlib by email. In this section I publish the ones that I think other ESAlib users might find useful.

Is it possible to get the ESA vector of each text as output, and not only the relatedness estimate between them?

Yes, it is. Look into src/clldsystem/esa/IConceptVector.java, which is the interface for the object that represents the ESA vector. An instance of it is returned by the getVector method in src/clldsystem/esa/ESAAnalyzer.java, which gets the ESA vector for the given piece of text. getVector is also used internally by the similarity calculator. With the default ESA background that I provided, the dimensions correspond to page IDs on Wikipedia (e.g. dimension 11400 -> http://en.wikipedia.org/wiki/?curid=11400).
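To illustrate what such a vector looks like, here is a minimal standalone sketch. It does not use the ESAlib classes: a plain Map from page ID to weight stands in for IConceptVector, the class and method names (ConceptVectorSketch, cosine, wikipediaUrl) are made up for this example, and the page IDs and weights in main are invented. It assumes relatedness is the cosine of the two vectors, as in the original ESA paper; the similarity calculator in ESAlib may differ in details.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of ESAlib.
public class ConceptVectorSketch {

    // Cosine similarity of two sparse vectors keyed by concept id (Wikipedia page ID).
    public static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Iterate over the smaller vector and look up matches in the larger one.
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double w = large.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        // An empty or all-zero vector has no direction; report zero relatedness.
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Euclidean length of a sparse vector.
    public static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Maps a dimension (page ID) back to the Wikipedia article it stands for.
    public static String wikipediaUrl(int pageId) {
        return "http://en.wikipedia.org/wiki/?curid=" + pageId;
    }

    public static void main(String[] args) {
        // Invented weights; real vectors come from the ESA background.
        Map<Integer, Double> first = new HashMap<>();
        first.put(11400, 0.8);
        first.put(856, 0.6);
        Map<Integer, Double> second = new HashMap<>();
        second.put(856, 1.0);

        System.out.println("relatedness = " + cosine(first, second));
        for (int pageId : first.keySet()) {
            System.out.println(pageId + " -> " + wikipediaUrl(pageId));
        }
    }
}
```

Only dimensions present in both vectors contribute to the dot product, so two texts that activate no common Wikipedia concepts get a relatedness of zero.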

Assertion failed: (jBlob) ....

    Assertion failed: (jBlob), function Java_org_sqlite_NativeDB_column_1blob, file ../src/main/java/org/sqlite/NativeDB.c, line 513.
    ./run_analyzer: line 7: 4119 Abort trap: 6 java -cp lib/*:esalib.jar clldsystem.esa.ESAAnalyzer "$1" "$2"

Check that the background database (.db) file resides in a directory where you have write permissions.
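If you want to verify this programmatically, the following is a small sketch of such a check. The class and method names (DbDirCheck, parentDirWritable) are made up for this example and are not part of ESAlib. It assumes the database is reached through the esa_db.db link from the quick start, so the link is resolved first and the directory of the real file is tested.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of ESAlib.
public class DbDirCheck {

    // Returns true if the directory that really holds the database file is writable.
    public static boolean parentDirWritable(Path db) {
        try {
            // toRealPath follows symbolic links (e.g. esa_db.db -> example/esa_en.db)
            // and fails if the file does not exist.
            Path real = db.toRealPath();
            Path dir = real.getParent();
            return dir != null && Files.isWritable(dir);
        } catch (IOException e) {
            // A missing or unreadable file counts as a failed check.
            return false;
        }
    }

    public static void main(String[] args) {
        Path db = Paths.get(args.length > 0 ? args[0] : "esa_db.db");
        if (parentDirWritable(db)) {
            System.out.println("OK: the directory of " + db + " is writable");
        } else {
            System.out.println("Problem: " + db + " is missing or its directory is not writable");
        }
    }
}
```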

Credits

Source code is partially based on the original implementation of Evgeniy Gabrilovich.

I was working under supervision of Petr Knoth.