-
Notifications
You must be signed in to change notification settings - Fork 16.6k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
community[minor]: Add TablestoreVectorStore (#25767)
Thank you for contributing to LangChain! - [x] **PR title**: community: add TablestoreVectorStore - [x] **PR message**: - **Description:** add TablestoreVectorStore - **Dependencies:** none - [x] **Add tests and docs**: If you're adding a new integration, please include 1. a test for the integration: yes 2. an example notebook showing its use: yes If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Bagatur <[email protected]> Co-authored-by: Bagatur <[email protected]>
- Loading branch information
1 parent
86b3c6e
commit b0a2988
Showing
7 changed files
with
1,057 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,385 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"pycharm": { | ||
"name": "#%% md\n" | ||
} | ||
}, | ||
"source": [ | ||
"# TablestoreVectorStore\n", | ||
"\n", | ||
"> [Tablestore](https://www.aliyun.com/product/ots) is a fully managed NoSQL cloud database service that enables storage of a massive amount of structured\n", | ||
"and semi-structured data.\n", | ||
"\n", | ||
"This notebook shows how to use functionality related to the `Tablestore` vector database.\n", | ||
"\n", | ||
"To use Tablestore, you must create an instance.\n", | ||
"Here are the [creating instance instructions](https://help.aliyun.com/zh/tablestore/getting-started/manage-the-wide-column-model-in-the-tablestore-console)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"pycharm": { | ||
"name": "#%% md\n" | ||
} | ||
}, | ||
"source": "## Setup" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install --upgrade --quiet langchain-community tablestore" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": "## Initialization" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2024-08-20T11:10:04.469458Z", | ||
"start_time": "2024-08-20T11:09:49.541150Z" | ||
}, | ||
"pycharm": { | ||
"is_executing": true, | ||
"name": "#%%\n" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import getpass\n", | ||
"import os\n", | ||
"\n", | ||
"os.environ[\"end_point\"] = getpass.getpass(\"Tablestore end_point:\")\n", | ||
"os.environ[\"instance_name\"] = getpass.getpass(\"Tablestore instance_name:\")\n", | ||
"os.environ[\"access_key_id\"] = getpass.getpass(\"Tablestore access_key_id:\")\n", | ||
"os.environ[\"access_key_secret\"] = getpass.getpass(\"Tablestore access_key_secret:\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": "Create vector store. " | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2024-08-20T11:10:07.911086Z", | ||
"start_time": "2024-08-20T11:10:07.351293Z" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import tablestore\n", | ||
"from langchain_community.embeddings import FakeEmbeddings\n", | ||
"from langchain_community.vectorstores import TablestoreVectorStore\n", | ||
"from langchain_core.documents import Document\n", | ||
"\n", | ||
"test_embedding_dimension_size = 4\n", | ||
"embeddings = FakeEmbeddings(size=test_embedding_dimension_size)\n", | ||
"\n", | ||
"store = TablestoreVectorStore(\n", | ||
" embedding=embeddings,\n", | ||
" endpoint=os.getenv(\"end_point\"),\n", | ||
" instance_name=os.getenv(\"instance_name\"),\n", | ||
" access_key_id=os.getenv(\"access_key_id\"),\n", | ||
" access_key_secret=os.getenv(\"access_key_secret\"),\n", | ||
" vector_dimension=test_embedding_dimension_size,\n", | ||
" # metadata mapping is used to filter non-vector fields.\n", | ||
" metadata_mappings=[\n", | ||
" tablestore.FieldSchema(\n", | ||
" \"type\", tablestore.FieldType.KEYWORD, index=True, enable_sort_and_agg=True\n", | ||
" ),\n", | ||
" tablestore.FieldSchema(\n", | ||
" \"time\", tablestore.FieldType.LONG, index=True, enable_sort_and_agg=True\n", | ||
" ),\n", | ||
" ],\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": "## Manage vector store" | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": "Create table and index." | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2024-08-20T11:10:10.875422Z", | ||
"start_time": "2024-08-20T11:10:10.566400Z" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"store.create_table_if_not_exist()\n", | ||
"store.create_search_index_if_not_exist()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": "Add documents." | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2024-08-20T11:10:14.974253Z", | ||
"start_time": "2024-08-20T11:10:14.894190Z" | ||
}, | ||
"pycharm": { | ||
"is_executing": true, | ||
"name": "#%%\n" | ||
} | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"['1', '2', '3', '4', '5']" | ||
] | ||
}, | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"store.add_documents(\n", | ||
" [\n", | ||
" Document(\n", | ||
" id=\"1\", page_content=\"1 hello world\", metadata={\"type\": \"pc\", \"time\": 2000}\n", | ||
" ),\n", | ||
" Document(\n", | ||
" id=\"2\", page_content=\"abc world\", metadata={\"type\": \"pc\", \"time\": 2009}\n", | ||
" ),\n", | ||
" Document(\n", | ||
" id=\"3\", page_content=\"3 text world\", metadata={\"type\": \"sky\", \"time\": 2010}\n", | ||
" ),\n", | ||
" Document(\n", | ||
" id=\"4\", page_content=\"hi world\", metadata={\"type\": \"sky\", \"time\": 2030}\n", | ||
" ),\n", | ||
" Document(\n", | ||
" id=\"5\", page_content=\"hi world\", metadata={\"type\": \"sky\", \"time\": 2030}\n", | ||
" ),\n", | ||
" ]\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"pycharm": { | ||
"name": "#%% md\n" | ||
} | ||
}, | ||
"source": "Delete document." | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2024-08-20T11:10:17.408739Z", | ||
"start_time": "2024-08-20T11:10:17.269593Z" | ||
}, | ||
"pycharm": { | ||
"name": "#%%\n" | ||
} | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"True" | ||
] | ||
}, | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"store.delete([\"3\"])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"pycharm": { | ||
"name": "#%% md\n" | ||
} | ||
}, | ||
"source": "Get documents." | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": "## Query vector store" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2024-08-20T11:10:19.379617Z", | ||
"start_time": "2024-08-20T11:10:19.339970Z" | ||
}, | ||
"pycharm": { | ||
"name": "#%%\n" | ||
} | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[Document(id='1', metadata={'embedding': '[1.3296732307905934, 0.0037521341868022385, 0.9821875819319514, 2.5644103644492393]', 'time': 2000, 'type': 'pc'}, page_content='1 hello world'),\n", | ||
" None,\n", | ||
" Document(id='5', metadata={'embedding': '[1.4558082172139821, -1.6441137122167426, -0.13113098640337423, -1.889685473174525]', 'time': 2030, 'type': 'sky'}, page_content='hi world')]" | ||
] | ||
}, | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"store.get_by_ids([\"1\", \"3\", \"5\"])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": "Similarity search." | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2024-08-20T11:10:21.306717Z", | ||
"start_time": "2024-08-20T11:10:21.284244Z" | ||
} | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[Document(id='1', metadata={'embedding': [1.3296732307905934, 0.0037521341868022385, 0.9821875819319514, 2.5644103644492393], 'time': 2000, 'type': 'pc'}, page_content='1 hello world'),\n", | ||
" Document(id='4', metadata={'embedding': [-0.3310144199800685, 0.29250046478723635, -0.0646862290377582, -0.23664360156781225], 'time': 2030, 'type': 'sky'}, page_content='hi world')]" | ||
] | ||
}, | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"store.similarity_search(query=\"hello world\", k=2)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": "Similarity search with filters. " | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"metadata": { | ||
"ExecuteTime": { | ||
"end_time": "2024-08-20T11:10:23.231425Z", | ||
"start_time": "2024-08-20T11:10:23.213046Z" | ||
} | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[Document(id='5', metadata={'embedding': [1.4558082172139821, -1.6441137122167426, -0.13113098640337423, -1.889685473174525], 'time': 2030, 'type': 'sky'}, page_content='hi world'),\n", | ||
" Document(id='4', metadata={'embedding': [-0.3310144199800685, 0.29250046478723635, -0.0646862290377582, -0.23664360156781225], 'time': 2030, 'type': 'sky'}, page_content='hi world')]" | ||
] | ||
}, | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"store.similarity_search(\n", | ||
" query=\"hello world\",\n", | ||
" k=10,\n", | ||
" tablestore_filter_query=tablestore.BoolQuery(\n", | ||
" must_queries=[tablestore.TermQuery(field_name=\"type\", column_value=\"sky\")],\n", | ||
" should_queries=[tablestore.RangeQuery(field_name=\"time\", range_from=2020)],\n", | ||
" must_not_queries=[tablestore.TermQuery(field_name=\"type\", column_value=\"pc\")],\n", | ||
" ),\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Usage for retrieval-augmented generation\n", | ||
"\n", | ||
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n", | ||
"\n", | ||
"- [Tutorials](/docs/tutorials/)\n", | ||
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n", | ||
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## API reference\n", | ||
"\n", | ||
"For detailed documentation of all `TablestoreVectorStore` features and configurations head to the API reference:\n", | ||
" https://python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.tablestore.TablestoreVectorStore.html" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 1 | ||
} |
Oops, something went wrong.