Repository for (almost) all your document search needs.

Part of the Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development.

DocUVerse is a public open-source repository that lets researchers and developers quickly experiment with various search engines (such as Elasticsearch, ChromaDB, Milvus, PrimeQA, and FAISS) in both direct search and reranking scenarios. With DocUVerse, a researcher can replicate the experiments described in a paper from a recent NLP conference, download pre-trained models from an online repository, and run them on their own custom data. DocUVerse is built on top of the Transformers, PrimeQA, and Elasticsearch toolkits and uses datasets and models that are directly downloadable.
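To make the difference between direct search and reranking concrete, here is a minimal, self-contained sketch of a two-stage pipeline. It does not use the DocUVerse API: the toy corpus, the function names `first_stage` and `rerank`, and the scoring functions (term overlap, then cosine similarity over term counts) are all assumptions chosen for illustration only.

```python
import math
from collections import Counter

# Toy corpus; passage IDs and texts are made up for this illustration.
CORPUS = {
    "p1": "Milvus is a vector database for similarity search",
    "p2": "Elasticsearch supports BM25 keyword search",
    "p3": "FAISS is a library for dense vector similarity search",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def first_stage(query, corpus, top_k=2):
    """Direct search: rank every passage by term overlap with the query."""
    query_terms = set(tokenize(query))
    scored = [
        (pid, len(query_terms & set(tokenize(text))))
        for pid, text in corpus.items()
    ]
    # Highest overlap first; ties are broken by passage ID for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [pid for pid, _ in scored[:top_k]]


def rerank(query, candidates, corpus):
    """Reranking: rescore only the first-stage candidates with cosine similarity."""
    query_counts = Counter(tokenize(query))

    def cosine(text):
        doc_counts = Counter(tokenize(text))
        dot = sum(query_counts[t] * doc_counts[t] for t in query_counts)
        norm = math.sqrt(sum(v * v for v in query_counts.values())) * math.sqrt(
            sum(v * v for v in doc_counts.values())
        )
        return dot / norm if norm else 0.0

    return sorted(candidates, key=lambda pid: (-cosine(corpus[pid]), pid))


candidates = first_stage("dense vector similarity search", CORPUS)
final_ranking = rerank("dense vector similarity search", candidates, CORPUS)
```

In a real setup the first stage would typically be one of the engines listed above and the second stage a stronger, more expensive model; the point of the sketch is only that the reranker sees the short candidate list rather than the whole corpus.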

Design

The following code snippet shows how to ingest a new corpus (create an index for a specific engine), read the query file, run the search, compute the evaluation scores, and print them:

from docuverse import SearchEngine

# Build the engine from a YAML configuration file
engine = SearchEngine(config_or_path="data/clapnq_small/milvus-test.yaml")

# Read the ClapNQ dataset
data = engine.read_data()  # or engine.read_data(engine.config.input_passages)
# Ingest the data (create the index)
engine.ingest(data)

# Read the queries
queries = engine.read_questions()  # or engine.read_questions(engine.config.input_queries)
# Run the retrieval
results = engine.search(queries)
# Evaluate the results
scores = engine.compute_score(queries, results)

# Print the evaluation results in a human-readable format
print(f"Results:\n{scores}")
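The scoring step above is a single call, so it may help to see what a retrieval evaluation typically computes. The sketch below is not the `compute_score` implementation; it is a generic illustration that assumes results and gold labels are plain dictionaries mapping a query ID to a list of passage IDs, and the metric choice (recall@k and mean reciprocal rank) and all function names are hypothetical.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold passages that appear among the top-k ranked results."""
    gold = set(gold_ids)
    if not gold:
        return 0.0
    return len(set(ranked_ids[:k]) & gold) / len(gold)


def reciprocal_rank(ranked_ids, gold_ids):
    """1 / rank of the first gold passage, or 0.0 if none was retrieved."""
    gold = set(gold_ids)
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid in gold:
            return 1.0 / rank
    return 0.0


def evaluate(results, gold, ks=(1, 3)):
    """Average each metric over all queries present in `results`."""
    n = len(results)
    scores = {}
    for k in ks:
        total = sum(recall_at_k(results[q], gold.get(q, []), k) for q in results)
        scores[f"recall@{k}"] = total / n if n else 0.0
    total_rr = sum(reciprocal_rank(results[q], gold.get(q, [])) for q in results)
    scores["mrr"] = total_rr / n if n else 0.0
    return scores


# Made-up example: q1 finds its gold passage at rank 2, q2 never finds it.
example_results = {"q1": ["p3", "p1", "p2"], "q2": ["p2", "p4", "p1"]}
example_gold = {"q1": ["p1"], "q2": ["p9"]}
example_scores = evaluate(example_results, example_gold)
print(example_scores)
```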

✔️ Getting Started

Installation

Installation doc

# cd to project root

# If you want to run on GPU make sure to install torch appropriately

# Install as editable (-e) or non-editable using pip, with extras (e.g. tests) as desired
# Example installation commands:

# Minimal install (non-editable)
pip install .

# Full install (editable)
pip install -e .[all]

# Install milvus and/or elastic dependencies, and the pyizumo library (if you have access to it)
pip install -r requirements-milvus.txt
pip install -r requirements-elastic.txt
pip install -r requirements_extra.txt

Please note that dependencies (specified in setup.py) are pinned to provide a stable experience. They can be modified when installing from source, but this is not officially supported.
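After installing, a quick sanity check can confirm which optional pieces are importable and whether a GPU is visible to torch. This is a sketch using only the standard library plus an optional torch import; the module names `pymilvus` and `elasticsearch` are assumptions about what the requirements files install, and `environment_report` is a hypothetical helper, not part of DocUVerse.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def environment_report(modules=("docuverse", "torch", "pymilvus", "elasticsearch")):
    """Map each module name to its availability, plus a CUDA flag."""
    report = {name: is_installed(name) for name in modules}
    cuda_available = False
    if report.get("torch"):
        # Import torch only when present, so the check works on minimal installs.
        import torch

        cuda_available = torch.cuda.is_available()
    report["cuda_available"] = cuda_available
    return report


if __name__ == "__main__":
    for name, available in environment_report().items():
        print(f"{name}: {available}")
```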

🔭 Learn more (not yet working)

Section	Description
📒 Documentation	Start API documentation and tutorials
📓 Tutorials: Jupyter Notebooks	Notebooks to get started on QA tasks
🤗 Model sharing and uploading	Upload and share your fine-tuned models with the community
✅ Pull Request	PrimeQA Pull Request
📄 Generate Documentation	How Documentation works


❤️ DocUVerse collaborators include: Sara Rosenthal, Parul Awasthy, Scott McCarley, Jatin Ganhotra, and Radu Florian.
