Feature/vectorstore myscale #1

Merged
merged 15 commits into from
Apr 21, 2023
31 changes: 31 additions & 0 deletions docs/ecosystem/myscale.md
@@ -0,0 +1,31 @@
# MyScale

This page covers how to use the MyScale vector database within LangChain.
It is broken into two parts: installation and setup, and references to specific MyScale wrappers.

## Installation and Setup
- Install the Python SDK with `pip install clickhouse-connect`
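
Besides the client package, the wrapper needs connection parameters. The accompanying notebook describes setting them through environment variables prefixed with `MYSCALE_`. A minimal sketch of doing the same from Python before creating the vectorstore, assuming placeholder values and assuming the host variable is named `MYSCALE_HOST` after the `host` attribute of `MyScaleSettings`:

```python
import os

# Placeholder connection values; replace them with your own cluster details.
# MYSCALE_HOST is an assumption derived from the `MYSCALE_` + attribute-name rule.
myscale_env = {
    "MYSCALE_HOST": "localhost",
    "MYSCALE_PORT": "8123",
    "MYSCALE_USERNAME": "<your-username>",
    "MYSCALE_PASSWORD": "<your-password>",
}

# setdefault keeps any value that was already exported in the shell.
for name, value in myscale_env.items():
    os.environ.setdefault(name, value)
```

Using `setdefault` rather than plain assignment means values exported in the shell take precedence over the placeholders in the script.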

## Wrappers
Supported functions:
- `add_texts`
- `add_documents`
- `from_texts`
- `from_documents`
- `similarity_search`
- `asimilarity_search`
- `similarity_search_by_vector`
- `asimilarity_search_by_vector`
- `similarity_search_with_relevance_scores`

### VectorStore

A wrapper around the MyScale database lets you use it as a vectorstore,
whether for semantic search or for retrieving similar examples.

To import this vectorstore:
```python
from langchain.vectorstores import MyScale
```

For a more detailed walkthrough of the MyScale wrapper, see [this notebook](../modules/indexes/vectorstores/examples/myscale.ipynb)
215 changes: 215 additions & 0 deletions docs/modules/indexes/vectorstores/examples/myscale.ipynb
@@ -0,0 +1,215 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# MyScale\n",
"\n",
"This notebook shows how to use functionality related to the MyScale vector database."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aac9563e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import MyScale\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a9d16fa3",
"metadata": {},
"source": [
"## Setting up environments\n",
"\n",
"There are two ways to set up parameters for the MyScale index.\n",
"\n",
"1. Environment Variables\n",
"\n",
" Before you run the app, set the environment variables with `export`:\n",
"\n",
" `export MYSCALE_HOST='<your-endpoints-url>' MYSCALE_PORT=<your-endpoints-port> MYSCALE_USERNAME=<your-username> MYSCALE_PASSWORD=<your-password> ...`\n",
"\n",
" Every attribute under `MyScaleSettings` can be set with the prefix `MYSCALE_`; the names are case-insensitive.\n",
"\n",
"2. Create a `MyScaleSettings` object with parameters\n",
"\n",
"\n",
" ```python\n",
" from langchain.vectorstores import MyScale, MyScaleSettings\n",
" config = MyScaleSettings(host=\"localhost\", port=8123, ...)\n",
" index = MyScale(embedding_function, config)\n",
" index.add_documents(...)\n",
" ```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e104aee",
"metadata": {},
"outputs": [],
"source": [
"for d in docs:\n",
" d.metadata = {'some': 'metadata'}\n",
"docsearch = MyScale.from_documents(docs, embeddings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c608226",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e3a8b105",
"metadata": {},
"source": [
"## Show data schema"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "69996818",
"metadata": {},
"outputs": [],
"source": [
"print(str(docsearch))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f59360c0",
"metadata": {},
"source": [
"## Filtering\n",
"\n",
"You have direct access to the MyScale SQL `WHERE` statement and can write a `WHERE` clause following standard SQL.\n",
"\n",
"**NOTE**: Be aware of SQL injection; this interface must not be called directly by end users.\n",
"\n",
"If you customized your `column_map` in your settings, you can search with a filter like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "232055f6",
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import MyScale, MyScaleSettings\n",
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"\n",
"for d in docs:\n",
" d.metadata = {'some': 'metadata'}\n",
"\n",
"docsearch = MyScale.from_documents(docs, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ddbcee77",
"metadata": {},
"outputs": [],
"source": [
"meta = docsearch.metadata_column\n",
"docs = docsearch.similarity_search_with_relevance_scores('What did the president say about Ketanji Brown Jackson?', \n",
" where_str=f\"{meta}.some='metadata'\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a359ed74",
"metadata": {},
"source": [
"## Deleting your data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fb6a9d36",
"metadata": {},
"outputs": [],
"source": [
"docsearch.drop()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "00e85477",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
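
The filtering cell of the notebook above passes a raw `WHERE` clause through `where_str`, and the note there warns about SQL injection. One way to reduce that risk when filter values come from users is to build the clause from validated column names and escaped literals. The sketch below is an illustration only: `quote_literal` and `build_where_str` are hypothetical helpers, not part of LangChain or MyScale, and the escaping assumes ClickHouse-style string literals in which a backslash and a single quote are each escaped with a backslash.

```python
import re

# Hypothetical helpers for illustration; they are not part of LangChain or MyScale.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_literal(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_where_str(metadata_column: str, filters: dict) -> str:
    """Join equality filters on metadata keys into one WHERE clause body."""
    if not _IDENTIFIER.fullmatch(metadata_column):
        raise ValueError(f"invalid column name: {metadata_column!r}")
    clauses = []
    for key, value in filters.items():
        # Keys are interpolated unquoted, so only plain identifiers are accepted.
        if not _IDENTIFIER.fullmatch(key):
            raise ValueError(f"invalid metadata key: {key!r}")
        clauses.append(f"{metadata_column}.{key}={quote_literal(str(value))}")
    return " AND ".join(clauses)


where_str = build_where_str("metadata", {"some": "metadata"})
print(where_str)  # metadata.some='metadata'
```

With the notebook's objects, the result could be passed as `docsearch.similarity_search_with_relevance_scores(query, where_str=build_where_str(docsearch.metadata_column, {'some': 'metadata'}))`. Escaping narrows the attack surface but does not replace access control, so the note about keeping this interface away from end users still applies.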
2 changes: 2 additions & 0 deletions langchain/vectorstores/__init__.py
@@ -12,6 +12,7 @@
from langchain.vectorstores.qdrant import Qdrant
from langchain.vectorstores.supabase import SupabaseVectorStore
from langchain.vectorstores.weaviate import Weaviate
from langchain.vectorstores.myscale import MyScale, MyScaleSettings

__all__ = [
"ElasticVectorSearch",
@@ -26,5 +27,6 @@
"AtlasDB",
"DeepLake",
"Annoy",
"MyScale",
"SupabaseVectorStore",
]