add docs for chroma persistance #1202

Merged
merged 1 commit into from
Feb 21, 2023
115 changes: 110 additions & 5 deletions docs/modules/indexes/vectorstore_examples/chroma.ipynb
@@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"id": "aac9563e",
"metadata": {},
"outputs": [],
@@ -25,7 +25,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
@@ -41,7 +41,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 3,
"id": "5eabdb75",
"metadata": {},
"outputs": [
@@ -63,7 +63,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 4,
"id": "4b172de8",
"metadata": {},
"outputs": [
@@ -89,10 +89,115 @@
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "8061454b",
"metadata": {},
"source": [
"## Persistence\n",
"\n",
"The steps below cover how to persist a ChromaDB instance."
]
},
{
"cell_type": "markdown",
"id": "2b76db26",
"metadata": {},
"source": [
"### Initialize PersistedChromaDB\n",
"Create embeddings for each chunk and insert them into the Chroma vector database. The persist_directory argument tells ChromaDB where to store the database when it's persisted.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "cdb86e0d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"No existing DB found in db, skipping load\n",
"No existing DB found in db, skipping load\n"
]
}
],
"source": [
"# Embed and store the texts\n",
"# Supplying a persist_directory will store the embeddings on disk\n",
"persist_directory = 'db'\n",
"\n",
"embedding = OpenAIEmbeddings()\n",
"vectordb = Chroma.from_documents(documents=docs, embedding=embedding, persist_directory=persist_directory)"
]
},
{
"cell_type": "markdown",
"id": "f568a322",
"metadata": {},
"source": [
"### Persist the Database\n",
"In a notebook, we should call persist() to ensure the embeddings are written to disk. This isn't necessary in a script, where the database is automatically persisted when the client object is destroyed."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "74b08cb4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Persisting DB to disk, putting it in the save folder db\n",
"PersistentDuckDB del, about to run persist\n",
"Persisting DB to disk, putting it in the save folder db\n"
]
}
],
"source": [
"vectordb.persist()\n",
"vectordb = None"
]
},
{
"cell_type": "markdown",
"id": "cc9ed900",
"metadata": {},
"source": [
"### Load the Database from disk, and create the chain\n",
"Be sure to pass the same persist_directory and embedding_function as you did when you instantiated the database. Initialize the chain we will use for question answering."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "31fecfe9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"loaded in 4 embeddings\n",
"loaded in 1 collections\n"
]
}
],
"source": [
"# Now we can load the persisted database from disk, and use it as normal. \n",
"vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a359ed74",
"id": "4dde7a0d",
"metadata": {},
"outputs": [],
"source": []
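The persist-then-reload flow these notebook cells document can be sketched without Chroma or an OpenAI key. The block below is a minimal illustration under stated assumptions, not Chroma's implementation: `ToyVectorStore`, `toy_embedding`, and the `store.json` file name are hypothetical stand-ins invented for this sketch. It only aims to show why the same `persist_directory` and embedding function should be supplied when reloading: the stored vectors are only comparable to query vectors produced by the same function.

```python
import json
import math
import tempfile
from pathlib import Path


def toy_embedding(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: a normalized letter-frequency vector."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


class ToyVectorStore:
    """Illustration only; this is NOT Chroma's API or on-disk format."""

    def __init__(self, persist_directory, embedding_function):
        self._path = Path(persist_directory) / "store.json"  # hypothetical file name
        self._embed = embedding_function
        self._records = []
        # Load previously persisted records if the directory already holds a store.
        if self._path.exists():
            self._records = json.loads(self._path.read_text())

    def add_texts(self, texts):
        # Records live in memory only until persist() is called.
        for text in texts:
            self._records.append({"text": text, "vector": self._embed(text)})

    def persist(self):
        # Write the in-memory records to disk explicitly.
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(json.dumps(self._records))

    def similarity_search(self, query, k=1):
        # Rank stored texts by dot product with the query embedding.
        q = self._embed(query)
        ranked = sorted(
            self._records,
            key=lambda r: sum(a * b for a, b in zip(q, r["vector"])),
            reverse=True,
        )
        return [r["text"] for r in ranked[:k]]


def demo():
    with tempfile.TemporaryDirectory() as persist_directory:
        store = ToyVectorStore(persist_directory, toy_embedding)
        store.add_texts(["apple banana", "zebra zoo"])
        store.persist()
        store = None  # drop the in-memory instance, as the notebook does

        # Reload with the same directory and the same embedding function.
        reloaded = ToyVectorStore(persist_directory, toy_embedding)
        return reloaded.similarity_search("zoo zebra", k=1)
```

Calling `demo()` mirrors the three notebook steps (initialize with a directory, persist, reload). In this toy, a store reloaded with a different embedding function would compare query vectors against vectors from another space and return meaningless rankings.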