Skip to content

Feature/vectorstore myscale #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Apr 21, 2023
31 changes: 31 additions & 0 deletions docs/ecosystem/myscale.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
# MyScale

This page covers how to use the MyScale ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific MyScale wrappers.

## Installation and Setup
- Install the Python SDK with `pip install clickhouse-connect`

## Wrappers
supported functions:
- `add_texts`
- `add_documents`
- `from_texts`
- `from_documents`
- `similarity_search`
- `asimilarity_search`
- `similarity_search_by_vector`
- `asimilarity_search_by_vector`
- `similarity_search_with_relevance_scores`

### VectorStore

There exists a wrapper around MyScale database, allowing you to use it as a vectorstore,
whether for semantic search or example selection.

To import this vectorstore:
```python
from langchain.vectorstores import MyScale
```

For a more detailed walkthrough of the MyScale wrapper, see [this notebook](../modules/indexes/vectorstores/examples/myscale.ipynb)
247 changes: 247 additions & 0 deletions docs/modules/indexes/vectorstores/examples/myscale.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,247 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# MyScale\n",
"\n",
"This notebook shows how to use functionality related to the MyScale vector database."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "aac9563e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import MyScale\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a9d16fa3",
"metadata": {},
"source": [
"## Setting up envrionments\n",
"\n",
"There are two ways to set up parameters for myscale index.\n",
"\n",
"1. Environment Variables\n",
"\n",
" Before you run the app, please set the environment variable with `export`:\n",
"\n",
" `export myscale_host='<your-endpoints-url>' myscale_port=<your-endpoints-port> myscale_username=<your-username> ...`\n",
"\n",
" Every attributes under `MyScaleSettings` can be set with prefix `myscale` and is case insensitive.\n",
"\n",
"2. Create `MyScaleSettings` object with parameters\n",
"\n",
"\n",
" ```python\n",
" from langchain.vectorstores import MyScale, MyScaleSettings\n",
" config = MyScaleSetting(host=\"localhost\", port=8123, ...)\n",
" index = MyScale(embedding_function, config)\n",
" index.add_documents(...)\n",
" ```"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6e104aee",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Inserting data...: 100%|██████████| 42/42 [00:22<00:00, 1.90it/s]\n"
]
}
],
"source": [
"docsearch = MyScale.from_documents(docs, embeddings.embed_query)\n",
" \n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9c608226",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e3a8b105",
"metadata": {},
"source": [
"## Show data schema"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "69996818",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[92m\u001b[1mdefault.vector_table @ msc-3dfcbf0c.ap-southeast-1.aws.dev.myscale.cloud:8443\u001b[0m\n",
"\n",
"\u001b[1musername: mpsk\u001b[0m\n",
"\n",
"Table Schema:\n",
"---------------------------------------------------\n",
"|\u001b[94mid \u001b[0m|\u001b[96mString \u001b[0m|\n",
"|\u001b[94mtext \u001b[0m|\u001b[96mString \u001b[0m|\n",
"|\u001b[94mvector \u001b[0m|\u001b[96mArray(Float32) \u001b[0m|\n",
"|\u001b[94mmetadata \u001b[0m|\u001b[96mObject('json') \u001b[0m|\n",
"---------------------------------------------------\n",
"\n"
]
}
],
"source": [
"print(str(docsearch))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f59360c0",
"metadata": {},
"source": [
"## Filtering\n",
"\n",
"You can have direct access to myscale SQL where statement. You can write `WHERE` clause following standard SQL.\n",
"\n",
"**NOTE**: Please be aware of SQL injection, this interface must not be directly called by end-user.\n",
"\n",
"If you custimized your `column_map` under your setting, you search with filter like this:\n",
"```python\n",
"from langchain.vectorstores import MyScale, MyScaleSettings\n",
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"\n",
"docsearch = MyScale.from_documents(docs, embeddings.embed_query)\n",
"docsearch.similarity_search_with_relevance_scores('What did the president say about Ketanji Brown Jackson?', \n",
" where_str=f\"{docsearch.config.column_map['metadata']}.attribute>10 and \\\n",
" {docsearch.config.column_map['metadata']}.another_attribute='another string'\")\n",
"```"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a359ed74",
"metadata": {},
"source": [
"## On deleting your data"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "fb6a9d36",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"''"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docsearch.client.command(f'DROP TABLE {docsearch.config.database}.{docsearch.config.table}')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "00e85477",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
2 changes: 2 additions & 0 deletions langchain/vectorstores/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
from langchain.vectorstores.pinecone import Pinecone
from langchain.vectorstores.qdrant import Qdrant
from langchain.vectorstores.weaviate import Weaviate
from langchain.vectorstores.myscale import MyScale, MyScaleSettings

__all__ = [
"ElasticVectorSearch",
Expand All @@ -25,4 +26,5 @@
"AtlasDB",
"DeepLake",
"Annoy",
"MyScale"
]
Loading