GitHub - wizrds/haystack: Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want to perform question answering or semantic document search, you can use the state-of-the-art NLP models in Haystack to provide unique search experiences and allow your users to query in natural language. Haystack is built in a modular fashion so that you can combine the best technology from other open-source projects like Hugging Face's Transformers, Elasticsearch, or Milvus.

What to build with Haystack

Ask questions in natural language and find granular answers in your documents.
Perform semantic search and retrieve documents according to meaning, not keywords.
Use off-the-shelf models or fine-tune them to your domain.
Use user feedback to evaluate, benchmark, and continuously improve your live models.
Leverage existing knowledge bases and better handle the long tail of queries that chatbots receive.
Automate processes by automatically applying a list of questions to new documents and using the extracted answers.

Core Features

Latest models: Use the latest transformer-based models (e.g., BERT, RoBERTa, MiniLM) for extractive QA, generative QA, and document retrieval.
Modular: Multiple choices to fit your tech stack and use case. Pick your favorite database, file converter, or modeling framework.
Pipelines: The Node and Pipeline design of Haystack allows for custom routing of queries to only the relevant components.
Open: 100% compatible with Hugging Face's model hub. Tight interfaces to other frameworks (e.g., Transformers, FARM, sentence-transformers).
Scalable: Scale to millions of docs via retrievers, production-ready backends like Elasticsearch / FAISS, and a FastAPI REST API.
End-to-End: All tooling in one place: file conversion, cleaning, splitting, training, eval, inference, labeling, etc.
Developer friendly: Easy to debug, extend and modify.
Customizable: Fine-tune models to your domain or implement your custom DocumentStore.
Continuous Learning: Collect new training data via user feedback in production and improve your models continuously.
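
The routing idea behind the Pipelines feature can be illustrated with a toy sketch in plain Python. This is not Haystack's API: `ToyPipeline`, `classify_query`, and the stand-in node functions are hypothetical names chosen for illustration, and the "ends with a question mark" rule is only an assumed routing criterion.

```python
def classify_query(query):
    """Hypothetical router: treat text ending in '?' as a natural-language question."""
    return "question" if query.strip().endswith("?") else "keyword"


def keyword_search(query):
    """Stand-in for a keyword retriever node."""
    return [f"keyword hit for '{query}'"]


def semantic_search(query):
    """Stand-in for an embedding-based retriever node."""
    return [f"semantic hit for '{query}'"]


def extract_answers(documents):
    """Stand-in for a reader node that turns documents into answers."""
    return [f"answer from {doc}" for doc in documents]


class ToyPipeline:
    """Runs only the nodes on the branch selected by the router."""

    def __init__(self, router, branches):
        self.router = router      # callable: query -> branch name
        self.branches = branches  # dict: branch name -> ordered list of nodes

    def run(self, query):
        branch = self.router(query)
        data = query
        for node in self.branches[branch]:
            data = node(data)  # each node consumes the previous node's output
        return branch, data


pipeline = ToyPipeline(
    router=classify_query,
    branches={
        "keyword": [keyword_search],
        "question": [semantic_search, extract_answers],
    },
)
```

Calling `pipeline.run("jon snow")` only touches the keyword branch, while `pipeline.run("Who is Jon Snow?")` runs the retriever and reader stand-ins on the question branch.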


📒 Docs	Overview, Components, Guides, API documentation
💾 Installation	How to install Haystack
🎓 Tutorials	See what Haystack can do with our Notebooks & Scripts
🔰 Quick Demo	Deploy a Haystack application with Docker Compose and a REST API
🖖 Community	Slack, Twitter, Stack Overflow, GitHub Discussions
❤️ Contributing	We welcome all contributions!
📊 Benchmarks	Speed & Accuracy of Retriever, Readers and DocumentStores
🔭 Roadmap	Public roadmap of Haystack
📰 Blog	Read our articles on Medium
☎️ Jobs	We're hiring! Have a look at our open positions

💾 Installation

If you're interested in learning more about Haystack and using it as part of your application, we offer several options.

1. Installing from a package

You can install Haystack by using pip.

    pip3 install farm-haystack

Please check our page on PyPI for more information.

2. Installing from GitHub

You can also clone the repository from GitHub if you'd like to work with the master branch and try the latest features:

    git clone https://github.com/deepset-ai/haystack.git
    cd haystack
    pip install --editable .

To update your installation, do a git pull. Because of the --editable flag, the pulled changes take effect immediately without reinstalling.

Note that this command will install the base version of the package, which includes only the Elasticsearch document store and the most commonly used components.

For a complete installation that includes all optional components, please run instead:

    git clone https://github.com/deepset-ai/haystack.git
    cd haystack
    pip install --upgrade pip
    pip install --editable .[all]   # or 'all-gpu' to get the GPU-enabled dependencies

Do not forget to upgrade pip before performing the installation: pip versions below 21.3.1 might enter an infinite loop due to a bug. If you encounter such a loop, either upgrade pip or replace [all] with [docstores,crawler,preprocessing,ocr,ray,rest,ui,dev,onnx].

For a complete list of the available dependency groups, have a look at the setup.cfg file.
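
The pip version requirement mentioned above can be checked before installing. The following is a minimal sketch using only the Python standard library; the helper names `parse_version` and `pip_needs_upgrade` are hypothetical and not part of Haystack or pip, and the parser only looks at the leading numeric part of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum pip version named in the installation note.
MIN_PIP = (21, 3, 1)


def parse_version(text):
    """Turn '21.2.4' into (21, 2, 4), ignoring any non-numeric suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def pip_needs_upgrade(pip_version):
    """Return True if the given pip version is older than 21.3.1."""
    parsed = parse_version(pip_version)
    # Pad short versions so that '22.0' compares as (22, 0, 0).
    padded = parsed + (0,) * (len(MIN_PIP) - len(parsed))
    return padded < MIN_PIP


if __name__ == "__main__":
    try:
        installed = version("pip")
    except PackageNotFoundError:
        print("pip is not installed in this environment")
    else:
        if pip_needs_upgrade(installed):
            print(f"pip {installed} is older than 21.3.1; upgrade it first")
        else:
            print(f"pip {installed} is recent enough")
```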

3. Installing on Windows

On Windows, you might need:

    pip install farm-haystack -f https://download.pytorch.org/whl/torch_stable.html

🎓 Tutorials

Follow our introductory tutorial to set up a question answering system using Python and start performing queries! Explore the rest of our tutorials to learn how to tweak pipelines, train models and perform evaluation.

Tutorial 1 - Basic QA Pipeline: Jupyter notebook | Colab | Python
Tutorial 2 - Fine-tuning a model on own data: Jupyter notebook | Colab | Python
Tutorial 3 - Basic QA Pipeline without Elasticsearch: Jupyter notebook | Colab | Python
Tutorial 4 - FAQ-style QA: Jupyter notebook | Colab | Python
Tutorial 5 - Evaluation of the whole QA-Pipeline: Jupyter notebook | Colab | Python
Tutorial 6 - Better Retrievers via "Dense Passage Retrieval": Jupyter notebook | Colab | Python
Tutorial 7 - Generative QA via "Retrieval-Augmented Generation": Jupyter notebook | Colab | Python
Tutorial 8 - Preprocessing: Jupyter notebook | Colab | Python
Tutorial 9 - DPR Training: Jupyter notebook | Colab | Python
Tutorial 10 - Knowledge Graph: Jupyter notebook | Colab | Python
Tutorial 11 - Pipelines: Jupyter notebook | Colab | Python
Tutorial 12 - Long-Form Question Answering: Jupyter notebook | Colab | Python
Tutorial 13 - Question Generation: Jupyter notebook | Colab | Python
Tutorial 14 - Query Classifier: Jupyter notebook | Colab | Python
Tutorial 15 - TableQA: Jupyter notebook | Colab | Python

🔰 Quick Demo

Hosted

Try out our hosted Explore The World live demo! Ask any question about countries or capital cities and let Haystack return the answers to you.

Local

Start up a Haystack service via Docker Compose. With this you can begin calling it directly via the REST API or even interact with it using the included Streamlit UI.

Step-by-step guide

1. Update/install Docker and Docker Compose, then launch Docker

    # On Debian/Ubuntu, the Docker engine is packaged as docker.io
    apt-get update && apt-get install docker.io docker-compose
    service docker start

2. Clone Haystack repository

    git clone https://github.com/deepset-ai/haystack.git

3. Pull images & launch demo app

    cd haystack
    docker-compose pull
    docker-compose up
    
    # Or on a GPU machine: docker-compose -f docker-compose-gpu.yml up

You should be able to see the following in your terminal window as part of the log output:

    ..
    ui_1             |   You can now view your Streamlit app in your browser.
    ..
    ui_1             |   External URL: http://192.168.108.218:8501
    ..
    haystack-api_1   | [2021-01-01 10:21:58 +0000] [17] [INFO] Application startup complete.

4. Open the Streamlit UI for Haystack by pointing your browser to the "External URL" from above.

The demo UI should load in your browser. You can then try different queries against a predefined set of indexed articles related to Game of Thrones.
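
Queries can also be sent to the REST API directly instead of going through the UI. The sketch below uses only the Python standard library and rests on assumptions this README does not confirm: that the API is reachable at `http://localhost:8000`, and that it exposes a `POST /query` endpoint accepting a JSON body with a `query` field. The helper names `build_query_request` and `ask` are hypothetical; check the API documentation of your running service for the actual paths and payloads.

```python
import json
import urllib.request

# Assumptions: the demo API listens on localhost:8000 and offers POST /query
# with a JSON body of the form {"query": "..."}. Neither is confirmed here.
BASE_URL = "http://localhost:8000"


def build_query_request(question, base_url=BASE_URL):
    """Build (but do not send) a POST request carrying the question as JSON."""
    payload = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url=BASE_URL, timeout=30):
    """Send the question to the running demo and return the decoded JSON reply."""
    request = build_query_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the demo running, `ask("Who is the father of Arya Stark?")` would return the service's JSON response. Its exact structure is not described in this README, so print it to inspect the fields.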

Note: The following containers are started as a part of this demo:

Haystack API: listens on port 8000
DocumentStore (Elasticsearch): listens on port 9200
Streamlit UI: listens on port 8501
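
To confirm which of these ports are actually accepting connections, a small standard-library check can help. This is a sketch, not part of Haystack: the helper names `port_is_open` and `check_demo_ports` are hypothetical, and the port numbers are taken from the list above.

```python
import socket

# Ports taken from the container list above.
DEMO_PORTS = {
    "Haystack API": 8000,
    "DocumentStore (Elasticsearch)": 9200,
    "Streamlit UI": 8501,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_demo_ports(host="localhost"):
    """Map each demo service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in DEMO_PORTS.items()}


if __name__ == "__main__":
    for name, is_open in check_demo_ports().items():
        print(f"{name}: {'reachable' if is_open else 'not reachable'}")
```

Running the same check from another machine, with `host` set to the address of the machine running the demo, shows which ports are reachable from outside.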

Please note that the demo will publish the container ports to the outside world. We suggest that you review the firewall settings depending on your system setup and the security guidelines.

🖖 Community

There is a vibrant and active community around Haystack, and we interact with it regularly. If you have a feature request or a bug report, feel free to open an issue on GitHub. We check these regularly, and you can expect a quick response. If you'd like to discuss a topic or get more general advice on how to make Haystack work for your project, you can start a thread in GitHub Discussions or our Slack channel. We also check Twitter and Stack Overflow.

❤️ Contributing

We welcome contributions from the community, whether a quick typo fix or a completely new feature! You don't need to be a Haystack expert to provide meaningful improvements. To learn how to get started, check out our Contributor Guidelines first. You can also find instructions for running the tests locally there.

Thanks so much to all those who have contributed to our project!

