This is an example project of the SIGIR 2016 tutorial Succinct Data Structures in Information Retrieval: Theory and Practice presented by Simon Gog and Rossano Venturini.
The example shows how the Succinct Data Structure Library (SDSL) can be used to implement a space-efficient top-k query completion system. The final result is an almost state-of-the-art system implemented in fewer than 300 lines of code.
Here is an example of our final system. The index is built over titles and click counts of Wikipedia pages.
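To make "top-k query completion" concrete, here is a minimal C++ sketch of the general approach. It uses only the standard library, not SDSL. The dictionary is kept sorted so that all strings sharing a prefix form one contiguous range. The k heaviest strings in that range are then reported by repeatedly taking a range maximum and splitting the range around it. The class `TopKCompleter` and its methods are hypothetical, and the linear-scan `range_max` is an uncompressed stand-in. A space-efficient system would instead use compressed string storage and a succinct range-maximum structure, such as the ones SDSL provides. This is not the project's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, uncompressed baseline for top-k prefix completion (illustration only).
class TopKCompleter {
public:
    explicit TopKCompleter(std::vector<std::pair<std::string, std::uint64_t>> entries) {
        // Sort lexicographically so all strings sharing a prefix are contiguous.
        std::sort(entries.begin(), entries.end());
        for (const auto& e : entries) {
            strings_.push_back(e.first);
            weights_.push_back(e.second);
        }
    }

    // Returns up to k strings starting with `prefix`, highest weight first.
    std::vector<std::string> complete(const std::string& prefix, std::size_t k) const {
        // Step 1: find the contiguous range [lo, hi) of strings that start with prefix.
        auto first = std::lower_bound(strings_.begin(), strings_.end(), prefix);
        auto last = std::partition_point(first, strings_.end(), [&](const std::string& s) {
            return s.compare(0, prefix.size(), prefix) == 0;
        });
        std::size_t lo = static_cast<std::size_t>(first - strings_.begin());
        std::size_t hi = static_cast<std::size_t>(last - strings_.begin());

        // Step 2: each heap item is the maximum of a disjoint sub-range.
        struct Item { std::uint64_t w; std::size_t pos, lo, hi; };
        auto cmp = [](const Item& a, const Item& b) {
            // Max-heap on weight; ties go to the lexicographically smaller string.
            return a.w != b.w ? a.w < b.w : a.pos > b.pos;
        };
        std::priority_queue<Item, std::vector<Item>, decltype(cmp)> heap(cmp);
        auto push_range = [&](std::size_t l, std::size_t h) {
            if (l < h) {
                std::size_t p = range_max(l, h);
                heap.push({weights_[p], p, l, h});
            }
        };
        push_range(lo, hi);

        std::vector<std::string> result;
        while (result.size() < k && !heap.empty()) {
            Item top = heap.top();
            heap.pop();
            result.push_back(strings_[top.pos]);
            push_range(top.lo, top.pos);      // part of the range left of the maximum
            push_range(top.pos + 1, top.hi);  // part of the range right of the maximum
        }
        return result;
    }

private:
    // Naive O(h - l) range-maximum query; returns the leftmost maximum in [l, h).
    std::size_t range_max(std::size_t l, std::size_t h) const {
        assert(l < h);
        std::size_t best = l;
        for (std::size_t i = l + 1; i < h; ++i)
            if (weights_[i] > weights_[best]) best = i;
        return best;
    }

    std::vector<std::string> strings_;     // sorted dictionary
    std::vector<std::uint64_t> weights_;   // weights_[i] belongs to strings_[i]
};
```

For instance, with the entries amsterdam (50), amersfoort (30) and amstel (10), `complete("am", 2)` returns amsterdam followed by amersfoort. Because every pop pushes at most two sub-ranges, the heap holds at most k + 1 items, so the cost per query is dominated by the range-maximum queries.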
To build the examples, run:
./install.sh
cd build
cmake ..
make
CMake parses the index.config file and generates binaries for each index defined there. The index name is the prefix of the corresponding executables.
For example, to run index1 on the example input file:
./index1-main ../data/stops_nl.txt
The binary generates an index, then waits for user input and answers queries (one per line) interactively. The index is stored in ../data/stops_nl.txt.index1.sdsl and a visualization of its memory consumption is available at stops_nl.txt.index1.html. In general, each executable IDX-* stores the generated index at file.IDX.sdsl and its space visualization at file.IDX.html.
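The naming convention above boils down to a few string operations. The following C++ sketch is illustrative only. The helper names `index_name`, `index_file` and `html_file` are hypothetical. It assumes the index name is everything before the last '-' in the executable name (as in index1-main or index4ci-webserver) and follows the general rule file.IDX.sdsl / file.IDX.html stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: derive IDX from an executable name such as "./index1-main".
std::string index_name(const std::string& executable) {
    // Drop any leading directory (npos + 1 wraps to 0, keeping the whole string).
    std::string base = executable.substr(executable.find_last_of('/') + 1);
    // Assumption: the suffix (e.g. "main", "webserver") follows the last '-'.
    std::size_t dash = base.rfind('-');
    assert(dash != std::string::npos && "expected an executable named IDX-<suffix>");
    return base.substr(0, dash);
}

// Hypothetical helper: path of the serialized index, file.IDX.sdsl.
std::string index_file(const std::string& input, const std::string& idx) {
    return input + "." + idx + ".sdsl";
}

// Hypothetical helper: path of the space visualization, file.IDX.html.
std::string html_file(const std::string& input, const std::string& idx) {
    return input + "." + idx + ".html";
}
```

For example, `index_name("./index1-main")` yields "index1", and `index_file("../data/stops_nl.txt", "index1")` yields "../data/stops_nl.txt.index1.sdsl", matching the example above.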
./index1-webserver ../data/stops_nl.txt 8000
The binary generates an index and starts a webserver that listens on the specified port (8000 in this example).
To run the Wikipedia demo:
- Change into the build directory.
- Download the Wikipedia titles by calling make download
- Build the executable by calling make index4ci-webserver
- Generate the index and start the webserver by calling ./index4ci-webserver ../data/enwiki-20160601-all-titles
- Access the demo at http://127.0.0.1:8000
- Thanks to Sascha Witt for preparing the example input file, which contains pairs of Dutch train stations and their numbers of daily train stops.
- Thanks to all contributors to the SDSL project.