Benchmarking operation extract on FM-indexes

Methodology

Explored dimensions:

  • text type
  • instance size (just adjust the test_case.config file for this)
  • suffix array sampling density
  • index implementations

Interval selection:

We use the methodology of Ferragina et al. (Section 5.4), i.e., 10,000 substrings of length 512 starting at random text positions are extracted.
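The following is a minimal sketch of how such an interval file could be produced. The 10,000 intervals of length 512 follow the methodology above, but the command-line interface and the one-interval-per-line output format are assumptions, not necessarily what the genintervals tool actually does.

    // Sketch: draw 10,000 random extract intervals of length 512 (assumed I/O conventions).
    #include <cstdint>
    #include <iostream>
    #include <random>
    #include <string>

    int main(int argc, char* argv[]) {
        const uint64_t num_intervals = 10000;  // number of substrings (Ferragina et al., Sec. 5.4)
        const uint64_t len = 512;              // length of each extracted substring
        // Hypothetical interface: the text size is passed as the first argument.
        if (argc < 2) { std::cerr << "usage: " << argv[0] << " text_size\n"; return 1; }
        const uint64_t text_size = std::stoull(argv[1]);
        if (text_size < len) return 1;

        std::mt19937_64 rng(4711);             // fixed seed so the interval set is reproducible
        std::uniform_int_distribution<uint64_t> pos(0, text_size - len);
        for (uint64_t i = 0; i < num_intervals; ++i) {
            const uint64_t b = pos(rng);
            std::cout << b << " " << (b + len - 1) << "\n";  // inclusive interval [b, b+511]
        }
        return 0;
    }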

Directory structure

  • bin: Contains the executables of the project.

    • build_idx_* generates the indexes.
    • query_idx_* executes the extract experiments (a minimal sketch of such a program is shown after this list).
    • info_* outputs the space breakdown of an index.
    • genintervals generates the extract intervals.
  • indexes: Contains the generated indexes.

  • intervals: Contains the generated intervals.

  • results: Contains the results of the experiments.

  • src: Contains the source code of the benchmark.

  • visualize: Contains an R script which generates a report in LaTeX format.

Some of the files included in this archive originate from the Pizza&Chili website.
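To illustrate what the query_idx_* executables measure, here is a minimal sketch of an extract benchmark written against sdsl-lite. The concrete index type (csa_wt<wt_huff<>,32,64>), the interval file format, and the command-line arguments are illustrative assumptions; the actual executables are generated from the entries in the configuration files described below.

    // Sketch: load an FM-index, read intervals, and time sdsl::extract().
    #include <sdsl/suffix_arrays.hpp>
    #include <chrono>
    #include <cstdint>
    #include <fstream>
    #include <iostream>

    int main(int argc, char* argv[]) {
        if (argc < 3) {
            std::cerr << "usage: " << argv[0] << " index_file interval_file\n";
            return 1;
        }
        // FM-index over a Huffman-shaped wavelet tree; the SA/ISA sampling rates
        // 32/64 are only one possible choice (cf. sample.config).
        sdsl::csa_wt<sdsl::wt_huff<>, 32, 64> csa;
        if (!sdsl::load_from_file(csa, argv[1])) {
            std::cerr << "could not load index\n";
            return 1;
        }

        std::ifstream in(argv[2]);
        uint64_t b = 0, e = 0, chars = 0;
        auto start = std::chrono::high_resolution_clock::now();
        while (in >> b >> e) {                  // one inclusive interval per line (assumed format)
            auto s = sdsl::extract(csa, b, e);  // decode text[b..e] from the index
            chars += s.size();
        }
        auto stop = std::chrono::high_resolution_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
        std::cout << "extracted " << chars << " characters in " << us << " microseconds\n";
        return 0;
    }

The benchmark's Makefile compiles the actual query programs; when building a sketch like this by hand, it needs a C++11 compiler and has to be linked against libsdsl and libdivsufsort (-lsdsl -ldivsufsort -ldivsufsort64).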

Prerequisites

  • For the visualization you need the following software:
    • R with the package tikzDevice. You can install it together with its dependency filehash by calling install.packages("filehash", repos="http://cran.r-project.org") and install.packages("tikzDevice", repos="http://R-Forge.R-project.org") in R.
    • Compressors xz and gzip are used to get compression baselines.
    • pdflatex to generate the pdf reports.
  • The construction of the 200MB indexes requires about 1GB of RAM.

Usage

  • make timing compiles the programs, downloads the 200MB Pizza&Chili test cases, builds the indexes, runs the performance tests, and generates a report located at visualize/extract.pdf. The raw timing numbers can be found in results/all.txt. Indexes and temporary files are stored in the directories indexes and tmp. For the 5 x 200 MB of Pizza&Chili data the project will produce about 36 GB of additional data. On my machine (MacBook Pro Retina, 2.6 GHz Intel Core i7, 16 GB 1600 MHz DDR3, SSD) the benchmark, triggered by make timing, took about 2 hours (excluding the time to download the test instances). Have a look at the generated report.
  • All created indexes and test results can be deleted by calling make cleanall.

Customization of the benchmark

The project contains several configuration files:

  • index.config: Specify a data structure's ID, sdsl class, and LaTeX name for the report.
  • test_case.config: Specify a test case's ID, path, LaTeX name for the report, and download URL.
  • sample.config: Specify a sampling's ID, SA sampling rate, and ISA sampling rate.

Note that the benchmark will execute every combination of the configured indexes, test cases, and sampling rates.
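As an illustration, entries in these files could look like the lines below. The field meanings are the ones listed above, but the semicolon-separated layout and the concrete values are assumptions; the comments in the shipped configuration files define the authoritative syntax.

    # index.config entry (assumed layout): ID;SDSL-CLASS;LATEX-NAME
    FM_HUFF;csa_wt<wt_huff<>>;FM-HUFF
    # sample.config entry (assumed layout): ID;SA-RATE;ISA-RATE
    S32_64;32;64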

Finally, the visualization can also be configured: