
primeqa

Repository for (almost) *all* your document search needs.

Part of the Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development.

DocUVerse is a public, open-source repository that enables researchers and developers to quickly experiment with various search engines (such as Elasticsearch, ChromaDB, Milvus, PrimeQA, and FAISS), both for direct search and for reranking. With DocUVerse, a researcher can replicate the experiments from a paper published at a recent NLP conference, and can also download pre-trained models (from an online repository) and run them on their own custom data. DocUVerse is built on top of the Transformers, PrimeQA, and Elasticsearch toolkits and uses datasets and models that are directly downloadable.
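To make the difference between the two scenarios concrete, here is a minimal, self-contained sketch that does not use DocUVerse at all. In direct search, a first-stage scorer ranks the whole corpus; in reranking, a second, finer-grained scorer reorders only the top candidates from the first stage. Both scorers below are toy lexical stand-ins (real rerankers are typically neural models such as cross-encoders), and the helper names `first_stage` and `rerank` are hypothetical, not DocUVerse APIs.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (toy stand-in for a real analyzer).
    return text.lower().split()


def first_stage(query: str, docs: dict[str, str], k: int) -> list[str]:
    """Direct search: score every document by distinct query-term overlap, keep top k."""
    q_terms = set(tokenize(query))
    scored = [(len(q_terms & set(tokenize(text))), doc_id) for doc_id, text in docs.items()]
    # Highest score first; ties broken by document id so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def rerank(query: str, candidates: list[str], docs: dict[str, str]) -> list[str]:
    """Reranking: rescore only the first-stage candidates with a finer signal."""
    q_counts = Counter(tokenize(query))

    def score(doc_id: str) -> float:
        d_counts = Counter(tokenize(docs[doc_id]))
        overlap = sum(min(n, d_counts[t]) for t, n in q_counts.items())
        # Length normalization penalizes long documents that match mostly by size.
        return overlap / math.sqrt(sum(d_counts.values()) or 1)

    return sorted(candidates, key=lambda doc_id: (-score(doc_id), doc_id))


docs = {
    "d1": "milvus is a vector database",
    "d2": "elasticsearch is a search engine built on lucene and it is a popular search engine",
    "d3": "faiss is a library for vector search",
    "d4": "chromadb stores embeddings",
}
candidates = first_stage("vector search engine", docs, k=3)
print(candidates)                                         # ['d2', 'd3', 'd1']
print(rerank("vector search engine", candidates, docs))  # ['d3', 'd2', 'd1']
```

The reranker only ever sees the k candidates, which is what makes it affordable to use a much more expensive scoring model in that stage.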

Design

The following code snippet shows how to ingest a new corpus (i.e., create an index for a specific engine), read the query file, run the search, and compute and print the evaluation scores:

from docuverse import SearchEngine
engine = SearchEngine(config_or_path="data/clapnq_small/milvus-test.yaml")

# Read the ClapNQ dataset
data = engine.read_data() # or engine.read_data(engine.config.input_passages)
# Ingest the data
engine.ingest(data)

# Read the queries
queries = engine.read_questions() # or engine.read_questions(engine.config.input_queries)
# Run the retrieval
results = engine.search(queries)
# Evaluate the results
scores = engine.compute_score(queries, results)

# Print the evaluation results in a human-readable format.
print(f"Results:\n{scores}")
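The snippet does not say which metrics `compute_score` reports. Purely as an illustration, and not DocUVerse's implementation, here is a minimal sketch of two common retrieval metrics, recall@k and MRR@k. It assumes `results` maps each query id to a ranked list of passage ids and `gold` maps each query id to its set of relevant passage ids; both layouts and the function names are hypothetical.

```python
def recall_at_k(results: dict[str, list[str]], gold: dict[str, set[str]], k: int) -> float:
    """Mean over queries of the fraction of relevant passages found in the top k."""
    per_query = []
    for qid, relevant in gold.items():
        if not relevant:
            continue  # skip queries with no gold passages
        retrieved = set(results.get(qid, [])[:k])
        per_query.append(len(retrieved & relevant) / len(relevant))
    return sum(per_query) / len(per_query) if per_query else 0.0


def mrr_at_k(results: dict[str, list[str]], gold: dict[str, set[str]], k: int) -> float:
    """Mean reciprocal rank of the first relevant passage within the top k (0 if none)."""
    total, n = 0.0, 0
    for qid, relevant in gold.items():
        if not relevant:
            continue
        n += 1
        for rank, doc_id in enumerate(results.get(qid, [])[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / n if n else 0.0


results = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d5", "d9"]}
gold = {"q1": {"d1"}, "q2": {"d4", "d9"}}
print(recall_at_k(results, gold, k=3))  # 0.75
print(mrr_at_k(results, gold, k=3))     # (1/2 + 1/3) / 2 = 5/12
```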

✔️ Getting Started

Installation

Installation doc

# cd to project root

# If you want to run on a GPU, make sure to install a GPU-enabled build of PyTorch first

# Install with pip, either editable (-e) or non-editable, with extras (e.g. tests) as desired
# Example installation commands:

# Minimal install (non-editable)
pip install .

# Full install (editable)
pip install -e ".[all]"  # quotes keep shells such as zsh from expanding the brackets

# Install Milvus and/or Elasticsearch dependencies, and the pyizumo library (if you have access to it)
pip install -r requirements-milvus.txt
pip install -r requirements-elastic.txt
pip install -r requirements_extra.txt

Please note that dependencies (specified in setup.py) are pinned to provide a stable experience. When installing from source, these pins can be modified, but doing so is not officially supported.
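After installing, it can be handy to confirm which optional backends are importable. A minimal sketch, assuming the Milvus and Elasticsearch requirements files install the `pymilvus` and `elasticsearch` client packages (check the files themselves); the backend-to-module mapping and the `available_backends` helper are hypothetical, not part of DocUVerse.

```python
import importlib.util

# Assumed import names of the optional client libraries; adjust to match
# what the requirements files actually install.
OPTIONAL_BACKENDS = {
    "milvus": "pymilvus",
    "elastic": "elasticsearch",
}


def available_backends(backends: dict[str, str] = OPTIONAL_BACKENDS) -> dict[str, bool]:
    """Map each backend name to whether its client module can be found (without importing it)."""
    return {name: importlib.util.find_spec(module) is not None for name, module in backends.items()}


if __name__ == "__main__":
    for name, ok in available_backends().items():
        print(f"{name:8s} {'installed' if ok else 'missing'}")
```

`find_spec` only locates the module, so the check is cheap and does not trigger any heavy imports.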

🔭 Learn more (not yet working)

| Section | Description |
| --- | --- |
| 📒 Documentation | Start API documentation and tutorials |
| 📓 Tutorials: Jupyter Notebooks | Notebooks to get started on QA tasks |
| 🤗 Model sharing and uploading | Upload and share your fine-tuned models with the community |
| Pull Request | PrimeQA Pull Request |
| 📄 Generate Documentation | How Documentation works |

❤️ DocUVerse collaborators include: Sara Rosenthal, Parul Awasthy, Scott McCarley, Jatin Ganhotra, and Radu Florian.