Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TablestoreVectorStore. #25767

Merged
merged 21 commits into from
Dec 13, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions docs/docs/integrations/providers/alibaba_cloud.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -89,3 +89,11 @@ See [installation instructions and a usage example](/docs/integrations/vectorsto
```python
from langchain_community.vectorstores import Hologres
```

### Tablestore

See [installation instructions and a usage example](/docs/integrations/vectorstores/tablestore).

```python
from langchain_community.vectorstores import TablestoreVectorStore
```
385 changes: 385 additions & 0 deletions docs/docs/integrations/vectorstores/tablestore.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,385 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# TablestoreVectorStore\n",
"\n",
"> [Tablestore](https://www.aliyun.com/product/ots) is a fully managed NoSQL cloud database service that enables storage of a massive amount of structured\n",
"and semi-structured data.\n",
"\n",
"This notebook shows how to use functionality related to the `Tablestore` vector database.\n",
"\n",
"To use Tablestore, you must create an instance.\n",
"Here are the [creating instance instructions](https://help.aliyun.com/zh/tablestore/getting-started/manage-the-wide-column-model-in-the-tablestore-console)."
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": "## Setup"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-community tablestore"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Initialization"
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2024-08-20T11:10:04.469458Z",
"start_time": "2024-08-20T11:09:49.541150Z"
},
"pycharm": {
"is_executing": true,
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"end_point\"] = getpass.getpass(\"Tablestore end_point:\")\n",
"os.environ[\"instance_name\"] = getpass.getpass(\"Tablestore instance_name:\")\n",
"os.environ[\"access_key_id\"] = getpass.getpass(\"Tablestore access_key_id:\")\n",
"os.environ[\"access_key_secret\"] = getpass.getpass(\"Tablestore access_key_secret:\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Create vector store. "
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2024-08-20T11:10:07.911086Z",
"start_time": "2024-08-20T11:10:07.351293Z"
}
},
"outputs": [],
"source": [
"import tablestore\n",
"from langchain_community.embeddings import FakeEmbeddings\n",
"from langchain_community.vectorstores import TablestoreVectorStore\n",
"from langchain_core.documents import Document\n",
"\n",
"test_embedding_dimension_size = 4\n",
"embeddings = FakeEmbeddings(size=test_embedding_dimension_size)\n",
"\n",
"store = TablestoreVectorStore(\n",
" embedding=embeddings,\n",
" endpoint=os.getenv(\"end_point\"),\n",
" instance_name=os.getenv(\"instance_name\"),\n",
" access_key_id=os.getenv(\"access_key_id\"),\n",
" access_key_secret=os.getenv(\"access_key_secret\"),\n",
" vector_dimension=test_embedding_dimension_size,\n",
" # metadata mapping is used to filter non-vector fields.\n",
" metadata_mappings=[\n",
" tablestore.FieldSchema(\n",
" \"type\", tablestore.FieldType.KEYWORD, index=True, enable_sort_and_agg=True\n",
" ),\n",
" tablestore.FieldSchema(\n",
" \"time\", tablestore.FieldType.LONG, index=True, enable_sort_and_agg=True\n",
" ),\n",
" ],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Manage vector store"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Create table and index."
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2024-08-20T11:10:10.875422Z",
"start_time": "2024-08-20T11:10:10.566400Z"
}
},
"outputs": [],
"source": [
"store.create_table_if_not_exist()\n",
"store.create_search_index_if_not_exist()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Add documents."
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2024-08-20T11:10:14.974253Z",
"start_time": "2024-08-20T11:10:14.894190Z"
},
"pycharm": {
"is_executing": true,
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['1', '2', '3', '4', '5']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.add_documents(\n",
" [\n",
" Document(\n",
" id=\"1\", page_content=\"1 hello world\", metadata={\"type\": \"pc\", \"time\": 2000}\n",
" ),\n",
" Document(\n",
" id=\"2\", page_content=\"abc world\", metadata={\"type\": \"pc\", \"time\": 2009}\n",
" ),\n",
" Document(\n",
" id=\"3\", page_content=\"3 text world\", metadata={\"type\": \"sky\", \"time\": 2010}\n",
" ),\n",
" Document(\n",
" id=\"4\", page_content=\"hi world\", metadata={\"type\": \"sky\", \"time\": 2030}\n",
" ),\n",
" Document(\n",
" id=\"5\", page_content=\"hi world\", metadata={\"type\": \"sky\", \"time\": 2030}\n",
" ),\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": "Delete document."
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2024-08-20T11:10:17.408739Z",
"start_time": "2024-08-20T11:10:17.269593Z"
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.delete([\"3\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": "Get documents."
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Query vector store"
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2024-08-20T11:10:19.379617Z",
"start_time": "2024-08-20T11:10:19.339970Z"
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(id='1', metadata={'embedding': '[1.3296732307905934, 0.0037521341868022385, 0.9821875819319514, 2.5644103644492393]', 'time': 2000, 'type': 'pc'}, page_content='1 hello world'),\n",
" None,\n",
" Document(id='5', metadata={'embedding': '[1.4558082172139821, -1.6441137122167426, -0.13113098640337423, -1.889685473174525]', 'time': 2030, 'type': 'sky'}, page_content='hi world')]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.get_by_ids([\"1\", \"3\", \"5\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Similarity search."
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2024-08-20T11:10:21.306717Z",
"start_time": "2024-08-20T11:10:21.284244Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(id='1', metadata={'embedding': [1.3296732307905934, 0.0037521341868022385, 0.9821875819319514, 2.5644103644492393], 'time': 2000, 'type': 'pc'}, page_content='1 hello world'),\n",
" Document(id='4', metadata={'embedding': [-0.3310144199800685, 0.29250046478723635, -0.0646862290377582, -0.23664360156781225], 'time': 2030, 'type': 'sky'}, page_content='hi world')]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.similarity_search(query=\"hello world\", k=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Similarity search with filters. "
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2024-08-20T11:10:23.231425Z",
"start_time": "2024-08-20T11:10:23.213046Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(id='5', metadata={'embedding': [1.4558082172139821, -1.6441137122167426, -0.13113098640337423, -1.889685473174525], 'time': 2030, 'type': 'sky'}, page_content='hi world'),\n",
" Document(id='4', metadata={'embedding': [-0.3310144199800685, 0.29250046478723635, -0.0646862290377582, -0.23664360156781225], 'time': 2030, 'type': 'sky'}, page_content='hi world')]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.similarity_search(\n",
" query=\"hello world\",\n",
" k=10,\n",
" tablestore_filter_query=tablestore.BoolQuery(\n",
" must_queries=[tablestore.TermQuery(field_name=\"type\", column_value=\"sky\")],\n",
" should_queries=[tablestore.RangeQuery(field_name=\"time\", range_from=2020)],\n",
" must_not_queries=[tablestore.TermQuery(field_name=\"type\", column_value=\"pc\")],\n",
" ),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage for retrieval-augmented generation\n",
"\n",
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
"\n",
"- [Tutorials](/docs/tutorials/)\n",
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `TablestoreVectorStore` features and configurations head to the API reference:\n",
" https://python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.tablestore.TablestoreVectorStore.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
Loading
Loading