
Commit 748ada7

add streaming example code (langchain-ai#11)

* add streaming example code
* cleanup
* add gif to readme
* update readme
* consolidate
* fix readme
* address comments
* format
* update requirements

1 parent c2b10c3 commit 748ada7

18 files changed: +635 -106 lines

.gitignore (new file, +139)

```gitignore
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# JetBrains
.idea

*.db

.DS_Store

vectorstore.pkl
langchain.readthedocs.io/
```

Makefile (new file, +8)

```make
.PHONY: start
start:
	uvicorn main:app --reload --port 9000

.PHONY: format
format:
	black .
	isort .
```
README.md (+18 -10)

```diff
@@ -1,14 +1,26 @@
-# ChatLangChain
+# 🦜️🔗 ChatLangChain
 
-This repo is an implementation of a chatbot specifically focused on question answering over the [LangChain documentation](https://langchain.readthedocs.io/en/latest/).
+This repo is an implementation of a locally hosted chatbot specifically focused on question answering over the [LangChain documentation](https://langchain.readthedocs.io/en/latest/).
+Built with [LangChain](https://github.com/hwchase17/langchain/) and [FastAPI](https://fastapi.tiangolo.com/).
+
+The app leverages LangChain's streaming support and async API to update the page in real time for multiple users.
+
+## ✅ To run:
+1. Install dependencies: `pip install -r requirements.txt`
+1. Run `ingest.sh` to ingest LangChain docs data into the vectorstore (only needs to be done once).
+1. You can use other [Document Loaders](https://langchain.readthedocs.io/en/latest/modules/document_loaders.html) to load your own data into the vectorstore.
+1. Run the app: `make start`
+1. To enable tracing, make sure `langchain-server` is running locally and pass `tracing=True` to `get_chain` in `main.py`.
+1. Open [localhost:9000](http://localhost:9000) in your browser.
 
 ## 🚀 Important Links
 
-Website: [chat.langchain.dev](https://chat.langchain.dev)
+Deployed version (to be updated soon): [chat.langchain.dev](https://chat.langchain.dev)
 
-Hugging Face Space: [huggingface.co/spaces/hwchase17/chat-langchain](https://huggingface.co/spaces/hwchase17/chat-langchain)
+Hugging Face Space (to be updated soon): [huggingface.co/spaces/hwchase17/chat-langchain](https://huggingface.co/spaces/hwchase17/chat-langchain)
 
-Blog Post: [blog.langchain.dev/langchain-chat/](https://blog.langchain.dev/langchain-chat/)
+Blog Posts:
+* [blog.langchain.dev/langchain-chat/](https://blog.langchain.dev/langchain-chat/)
 
 ## 📚 Technical description
 
@@ -21,12 +33,8 @@ Ingestion has the following steps:
 3. Split documents with LangChain's [TextSplitter](https://langchain.readthedocs.io/en/latest/modules/utils/combine_docs_examples/textsplitter.html)
 4. Create a vectorstore of embeddings, using LangChain's [vectorstore wrapper](https://langchain.readthedocs.io/en/latest/modules/utils/combine_docs_examples/vectorstores.html) (with OpenAI's embeddings and Weaviate's vectorstore).
 
-Question-Answering has the following steps:
+Question-Answering has the following steps, all handled by [ChatVectorDBChain](https://langchain.readthedocs.io/en/latest/modules/chains/combine_docs_examples/chat_vector_db.html):
 
 1. Given the chat history and new user input, determine what a standalone question would be (using GPT-3).
 2. Given that standalone question, look up relevant documents from the vectorstore.
 3. Pass the standalone question and relevant documents to GPT-3 to generate a final answer.
-
-## 🧠 How to Extend to your documentation?
-
-Coming soon.
```
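The three Question-Answering steps in the README can be sketched as a plain-Python data flow. This is an illustrative stand-in only: `condense_question`, `retrieve_docs`, and `generate_answer` are hypothetical stubs for the GPT-3 and vectorstore calls that ChatVectorDBChain makes internally.

```python
# Hypothetical stand-ins for the LLM and vectorstore calls; illustration only.

def condense_question(chat_history, user_input):
    # Step 1: rewrite a follow-up into a standalone question (GPT-3 in the real app).
    if not chat_history:
        return user_input
    prior = "; ".join(q for q, _ in chat_history)
    return f"{user_input} (in the context of: {prior})"


def retrieve_docs(question, vectorstore):
    # Step 2: look up relevant documents (Weaviate similarity search in the real app).
    words = {w.strip("?.,!").lower() for w in question.split()}
    return [doc for doc in vectorstore if words & set(doc.lower().split())]


def generate_answer(question, docs):
    # Step 3: the LLM writes a final answer from the retrieved context.
    return f"Answer to {question!r} based on {len(docs)} document(s)."


def answer(chat_history, user_input, vectorstore):
    question = condense_question(chat_history, user_input)
    docs = retrieve_docs(question, vectorstore)
    return generate_answer(question, docs)


store = ["LangChain supports streaming", "FastAPI serves websockets"]
print(answer([], "What is streaming?", store))
```

In the real chain each stub is an LLM or vectorstore call; only the wiring between the three steps is shown here.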

app.py renamed to archive/app.py (+1 -2)

```diff
@@ -4,9 +4,8 @@
 import gradio as gr
 import langchain
 import weaviate
-from langchain.vectorstores import Weaviate
-
 from chain import get_new_chain1
+from langchain.vectorstores import Weaviate
 
 WEAVIATE_URL = os.environ["WEAVIATE_URL"]
```
chain.py renamed to archive/chain.py (-1)

```diff
@@ -19,7 +19,6 @@
 
 
 class CustomChain(Chain, BaseModel):
-
     vstore: Weaviate
     chain: BaseCombineDocumentsChain
     key_word_extractor: Chain
```

archive/ingest.py (new file, +92)

```python
"""Load html from files, clean up, split, ingest into Weaviate."""
import os
from pathlib import Path

import weaviate
from bs4 import BeautifulSoup
from langchain.text_splitter import CharacterTextSplitter


def clean_data(data):
    soup = BeautifulSoup(data)
    text = soup.find_all("main", {"id": "main-content"})[0].get_text()
    return "\n".join([t for t in text.split("\n") if t])


docs = []
metadatas = []
for p in Path("langchain.readthedocs.io/en/latest/").rglob("*"):
    if p.is_dir():
        continue
    with open(p) as f:
        docs.append(clean_data(f.read()))
        metadatas.append({"source": p})


text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)

documents = text_splitter.create_documents(docs, metadatas=metadatas)


WEAVIATE_URL = os.environ["WEAVIATE_URL"]
client = weaviate.Client(
    url=WEAVIATE_URL,
    additional_headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
)

client.schema.delete_class("Paragraph")
client.schema.get()
schema = {
    "classes": [
        {
            "class": "Paragraph",
            "description": "A written paragraph",
            "vectorizer": "text2vec-openai",
            "moduleConfig": {
                "text2vec-openai": {
                    "model": "ada",
                    "modelVersion": "002",
                    "type": "text",
                }
            },
            "properties": [
                {
                    "dataType": ["text"],
                    "description": "The content of the paragraph",
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False,
                        }
                    },
                    "name": "content",
                },
                {
                    "dataType": ["text"],
                    "description": "The link",
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": True,
                            "vectorizePropertyName": False,
                        }
                    },
                    "name": "source",
                },
            ],
        },
    ]
}

client.schema.create(schema)

with client.batch as batch:
    for text in documents:
        batch.add_data_object(
            {"content": text.page_content, "source": str(text.metadata["source"])},
            "Paragraph",
        )
```
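The `chunk_size`/`chunk_overlap` parameters in ingest.py control how each document is cut into overlapping windows before embedding. A minimal sketch of that behavior, assuming a hypothetical `split_with_overlap` helper (this is not LangChain's actual CharacterTextSplitter, which also splits on the separator first):

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    # Slide a chunk_size window forward by (chunk_size - chunk_overlap) each
    # step, so consecutive chunks share chunk_overlap characters of context.
    step = chunk_size - chunk_overlap
    return [
        text[i : i + chunk_size]
        for i in range(0, max(len(text) - chunk_overlap, 1), step)
    ]


chunks = split_with_overlap("a" * 2500, chunk_size=1000, chunk_overlap=200)
print([len(c) for c in chunks])  # → [1000, 1000, 900]
```

The overlap keeps a sentence that straddles a chunk boundary fully inside at least one chunk, at the cost of embedding some characters twice.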

archive/ingest.sh (new file, +6)

```sh
# Bash script to ingest data
# This involves scraping the data from the web and then cleaning up and putting in Weaviate.
set -eu
wget -r -A.html https://langchain.readthedocs.io/en/latest/
python3 ingest.py
python3 ingest_examples.py
```
File renamed without changes.

archive/requirements.txt (new file, +9)

```
langchain==0.0.64
beautifulsoup4
weaviate-client
openai
black
isort
Flask
transformers
gradio
```

assets/images/Chat_Your_Data.gif (new file, 274 KB)

callback.py (new file, +33)

```python
"""Callback handlers used in the app."""
from typing import Any, Dict, List

from langchain.callbacks.base import AsyncCallbackHandler

from schemas import ChatResponse


class StreamingLLMCallbackHandler(AsyncCallbackHandler):
    """Callback handler for streaming LLM responses."""

    def __init__(self, websocket):
        self.websocket = websocket

    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        resp = ChatResponse(sender="bot", message=token, type="stream")
        await self.websocket.send_json(resp.dict())


class QuestionGenCallbackHandler(AsyncCallbackHandler):
    """Callback handler for question generation."""

    def __init__(self, websocket):
        self.websocket = websocket

    async def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> None:
        """Run when LLM starts running."""
        resp = ChatResponse(
            sender="bot", message="Synthesizing question...", type="info"
        )
        await self.websocket.send_json(resp.dict())
```
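The streaming-callback pattern in callback.py can be exercised without a browser by substituting a fake websocket. The sketch below is illustrative and deliberately avoids the `langchain` and `schemas` imports: `FakeWebSocket`, `StreamingHandler`, and `fake_llm_stream` are hypothetical stand-ins that mirror the shape of `StreamingLLMCallbackHandler`.

```python
import asyncio


class FakeWebSocket:
    """Hypothetical stand-in for FastAPI's WebSocket; records sent payloads."""

    def __init__(self):
        self.sent = []

    async def send_json(self, payload):
        self.sent.append(payload)


class StreamingHandler:
    """Mirrors StreamingLLMCallbackHandler without the langchain dependency."""

    def __init__(self, websocket):
        self.websocket = websocket

    async def on_llm_new_token(self, token):
        # Forward each token to the client as it arrives, exactly as the
        # real handler does with a ChatResponse of type "stream".
        await self.websocket.send_json(
            {"sender": "bot", "message": token, "type": "stream"}
        )


async def fake_llm_stream(handler, tokens):
    # Stands in for an LLM emitting tokens one at a time.
    for token in tokens:
        await handler.on_llm_new_token(token)


ws = FakeWebSocket()
asyncio.run(fake_llm_stream(StreamingHandler(ws), ["Hello", ",", " world"]))
print([m["message"] for m in ws.sent])  # → ['Hello', ',', ' world']
```

Because the handler only awaits `send_json`, each token reaches the page as soon as the LLM emits it, which is what lets the app update in real time for multiple users.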
