CENDARI/dblookup: Create an elasticsearch index containing dbpedia entities from dbpedia dumps.

This Python project builds an ElasticSearch index containing all the DBpedia entities that are useful to the Cendari project.

DBpedia has no decent web service for autocomplete, so loading the data into a local elasticsearch database makes it easy to autocomplete locally, with all the required information at hand.
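As an illustration of the kind of autocomplete query a local index enables, the following Python 3 sketch builds a prefix search against a node on localhost using only the standard library. The index name dbpedia and the field name label are hypothetical placeholders, not taken from this project's mapping.json; adjust them to the real mapping. The request is only built, not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Hypothetical values: adjust to the host, index and field used by your setup.
ES_URL = "http://localhost:9200"
INDEX = "dbpedia"
LABEL_FIELD = "label"


def autocomplete_request(prefix, size=10):
    """Build (but do not send) a search request matching labels that start with prefix."""
    body = {
        "size": size,
        "query": {
            # match_phrase_prefix treats the last word of the input as a prefix,
            # a simple way to get type-ahead search without mapping changes.
            "match_phrase_prefix": {LABEL_FIELD: prefix}
        },
    }
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = autocomplete_request("Versail")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # With a running node: print(urllib.request.urlopen(req).read().decode("utf-8"))
```

match_phrase_prefix is used here because it needs no special mapping; a completion suggester or an edge n-gram analyzer would usually be faster on a large index, but both require mapping changes.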

Installation

Python 2.7 is required, as well as bzip2 and wget. On the Python side, pip must be installed, along with Fabric and virtualenv.

To create the environment, type:

fab setup

To download all the dbpedia dump files, type:

fab download_dbpedia

This can take a while (about one hour on a home network) and needs disk space (1 GB).

To compute the index file, type:

fab create_index

It creates a large compressed file called dbpedia-<date>.json.bz2; this takes around one hour, depending on your machine.
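To check the result before indexing it, you can stream a few lines out of the compressed file without decompressing it to disk. This Python 3 sketch assumes the file holds one JSON object per line, as the Elasticsearch bulk format does; that layout is inferred from the way the file is later sent to elasticsearch in chunks, not documented by the project, and the function name peek is hypothetical.

```python
import bz2
import itertools
import json


def peek(path, n=4):
    """Parse the first n lines of a .json.bz2 file as JSON (one object per line assumed)."""
    # bz2.open in text mode decompresses on the fly, so the large file is never
    # fully extracted; islice stops reading after n lines.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in itertools.islice(f, n) if line.strip()]
```

Call it with the path of your generated dbpedia-<date>.json.bz2 file and print the returned objects to see what will be sent to elasticsearch.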

To create the index in elasticsearch, run:

./create_index.sh
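A Python 3 equivalent of this step might look like the following sketch. It assumes, without having checked create_index.sh, that the script PUTs the contents of mapping.json to a node on localhost:9200 under an index named dbpedia, and that mapping.json is a valid body for that create-index call; the function name and both defaults are hypothetical. The request is built but not sent.

```python
import json
import urllib.request


def create_index_request(mapping_path, host="http://localhost:9200", index="dbpedia"):
    """Build (but do not send) a PUT request creating `index` from a mapping file.

    The host and index defaults are hypothetical; check create_index.sh for
    the values the project actually uses.
    """
    with open(mapping_path, encoding="utf-8") as f:
        body = json.load(f)  # fails early if mapping.json is not valid JSON
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the result to urllib.request.urlopen sends it; elasticsearch rejects the call if the index already exists, which is presumably what delete_index.sh is for.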

To send the prepared data to elasticsearch, use the shell script:

./big_bulk_index.sh dbpedia-<date>.json.bz2

It creates a directory called split in the current directory (arguably it should be in /tmp), splits the dump file into 1000-line chunks, and sends them all to elasticsearch on localhost. Edit the script if you want to change the target index or host.
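The splitting and uploading done by big_bulk_index.sh can be sketched in Python 3 as follows. This is an illustrative equivalent, not the script's actual code: it assumes each action line in the dump names its target index (otherwise post to /<index>/_bulk instead), and the helper names are hypothetical. The 1000-line chunk size matches the script; because it is even, each action line stays in the same chunk as its document line, provided every action is followed by exactly one source line (true for index actions).

```python
import bz2
import itertools
import urllib.request

CHUNK_LINES = 1000  # same chunk size as big_bulk_index.sh


def chunks(lines, size=CHUNK_LINES):
    """Yield lists of at most `size` consecutive lines from an iterable."""
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_request(chunk, host="http://localhost:9200"):
    """Build (but do not send) a POST of one chunk to the _bulk endpoint."""
    # The bulk API requires every line, including the last, to end with a newline.
    payload = "".join(line if line.endswith("\n") else line + "\n" for line in chunk)
    return urllib.request.Request(
        url=f"{host}/_bulk",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def bulk_requests(path, host="http://localhost:9200"):
    """Stream a .json.bz2 dump and yield one _bulk request per chunk."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for chunk in chunks(f):
            yield bulk_request(chunk, host)
```

To actually send the data, pass each request to urllib.request.urlopen and keep the reply body; in the shell version those replies end up in the .out files.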

In the end, the split directory is kept for inspection. For each chunk file with a generated name (e.g. xzcyg), the reply from elasticsearch is stored in a file with the same name plus a .out suffix (e.g. xzcyg.out). Near the start of each reply is the error flag, which should read "errors":false.
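Rather than opening each .out file by hand, a short Python 3 script can list the chunks whose reply does not report "errors":false. It assumes each .out file contains exactly the JSON body returned by the bulk endpoint; the function name failed_chunks is hypothetical. An empty or unparsable reply (for example after a connection failure) is reported as well.

```python
import json
import pathlib


def failed_chunks(split_dir="split"):
    """Return the names of .out files whose bulk reply is not '"errors": false'."""
    failed = []
    for out in sorted(pathlib.Path(split_dir).glob("*.out")):
        try:
            reply = json.loads(out.read_text(encoding="utf-8"))
        except ValueError:  # empty or truncated reply
            failed.append(out.name)
            continue
        if not isinstance(reply, dict) or reply.get("errors") is not False:
            failed.append(out.name)
    return failed
```

If it returns an empty list, every chunk was accepted and the split directory can be removed.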

Once you have inspected the files, you can get rid of the directory: rm -rf split

Keep the JSON dump file if you may need to reinstall everything after a crash; otherwise it will take a long time to rebuild.

Repository files

.gitattributes
.gitignore
LICENSE
README.md
__init__.py
big_bulk_index.sh
bulk_index.sh
create_index.sh
dbpedia-2016-02-21T09-40-30.json.bz2
delete_dbpedia.py
delete_index.sh
fabfile.py
mapping.json
ontology.py
populate.py
requirements.txt