Issue with Metadata Filtering on Lambda for LangChain & Pinecone Integration #29539

Open
4 tasks done
vaishnav-mk opened this issue Feb 2, 2025 · 2 comments
Labels
Ɑ: vector store Related to vector store module

Comments

@vaishnav-mk

Discussed in #29537

Originally posted by vaishnav-mk February 2, 2025

Checked other resources

  • I added a very descriptive title to this question.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.

Commit to Help

  • I commit to help with one of those options 👆

Example Code

retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 5, "score_threshold": 0.5},
)

@tool
def projectTool(query: str) -> str:
    """
    Project tool:
        Handle queries about projects and project details.
    """
    response = retriever.invoke(
        f"{prompts_dict['projectPrompt']}",
        filter={"table_name": "Project", "project_id": projectId},
        k=1,
    )
    logger.info(response)
    return response

Description

I'm using LangChain with Pinecone for vector retrieval and metadata filtering. The filtering works as expected locally, but when I deploy the same code to AWS Lambda, it fails to retrieve the correct data based on the metadata, even though the project_id and table_name are the same.

What I’m doing:
I have a vector store retriever that is configured to filter results using metadata, specifically the table_name and project_id. Here’s an example of how I'm invoking the retriever:

retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 5, "score_threshold": 0.5},
)

@tool
def projectTool(query: str) -> str:
    """
    Project tool:
        Handle queries about projects and project details.
    Examples:
        - Show me the project details
        - Get the project details
        - Show me the project information
    """
    response = retriever.invoke(
        f"{prompts_dict['projectPrompt']}",
        filter={"table_name": "Project", "project_id": projectId},
        k=1,
    )
    logger.info(response)
    return response

@tool
def materialTool(query: str) -> str:
    """
    Material tool:
        Use ONLY for queries about materials
    Examples:
        - What materials do we have in stock?
        - Show me the material catalog
        - Get material specifications
    DO NOT use for expense-related queries
    """
    response = retriever.invoke(
        f"{prompts_dict['materialPrompt']}", filter={"table_name": "Material", "project_id": projectId}
    )
    logger.info(response)
    return response

What I expect:
The retriever should successfully fetch the project details based on the project_id and table_name metadata filters.

What happens instead:
Locally, everything works as expected, and I can retrieve project data. However, when deploying to Lambda and using the same project_id and table_name, the retrieval fails to return any data.

On Lambda, all the other tools work fine; only the project tool fails, for some reason.
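
Since the same code behaves differently in the two environments, one cheap check is whether both environments run the same package versions. The System Info below lists langchain_core but no Pinecone-related packages, so a mismatch cannot be ruled out from the report alone. The following is a minimal sketch, not a confirmed diagnosis; the distribution names in `PACKAGES` are assumptions and should be adjusted to whatever the deployment actually installs.

```python
from importlib import metadata

# Assumed distribution names; edit this list to match the real requirements.
PACKAGES = [
    "langchain",
    "langchain-core",
    "langchain-pinecone",
    "pinecone",
    "pinecone-client",
]


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run once locally and once inside the Lambda handler, then compare the output.
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Logging this dictionary from the Lambda handler and comparing it with the local output shows whether the two environments resolve to different releases.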

Data stored in the vector store:
The metadata for the project is stored as follows:

{
  "last_updated": 1738430342,
  "project_id": "18422398-bd99-4228-9337-c0664c9c5045",
  "table_name": "Project",
  "text": "{\"project_summary\": \"This is a detailed summary of the project 'Hello Kitty House'.\\nThe project was initiated on January 29, 2025 and was last updated on January 30, 2025.\\n\\nProject Overview:\\ {...} stages.\"}"
}

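A filter of the form `{"table_name": "Project", "project_id": projectId}` is an equality match on each key, so it helps to log the exact value and type of `projectId` in Lambda before the call. The sketch below is a simplified local model of exact-match filtering, not Pinecone's implementation; `matches_filter` is a hypothetical helper, and the stored values are copied from the record above. It shows how a value that looks the same in a log line can still fail an equality filter.

```python
import uuid


def matches_filter(metadata: dict, flt: dict) -> bool:
    """Simplified exact-match model: every filter key must exist in the
    metadata with a value of the same type and equal content."""
    return all(
        key in metadata
        and type(metadata[key]) is type(expected)
        and metadata[key] == expected
        for key, expected in flt.items()
    )


# Metadata values copied from the stored record (the "text" field is omitted).
stored = {
    "last_updated": 1738430342,
    "project_id": "18422398-bd99-4228-9337-c0664c9c5045",
    "table_name": "Project",
}

exact = {
    "table_name": "Project",
    "project_id": "18422398-bd99-4228-9337-c0664c9c5045",
}
wrong_case = {**exact, "table_name": "project"}  # lowercase table name
padded_id = {**exact, "project_id": exact["project_id"] + "\n"}  # trailing newline
uuid_object = {**exact, "project_id": uuid.UUID(exact["project_id"])}  # UUID, not str

for label, flt in [
    ("exact", exact),
    ("wrong_case", wrong_case),
    ("padded_id", padded_id),
    ("uuid_object", uuid_object),
]:
    print(label, matches_filter(stored, flt))
```

In this model only the `exact` filter matches. Logging `repr(projectId)` and `type(projectId)` in the Lambda handler would reveal whether the deployed value differs from the stored string in one of these ways.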
System Info

System Information
------------------
> OS:  Linux
> OS Version:  #202408141037 SMP PREEMPT_DYNAMIC Wed Aug 14 14:47:07 UTC 2024
> Python Version:  3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0]

Package Information
-------------------
> langchain_core: 0.3.29
> langsmith: 0.2.10
> langchain_cli: 0.0.35
> langserve: 0.3.1

Other Dependencies
------------------
> fastapi: 0.115.6
> gitpython: 3.1.37
> gritql: 0.2.0
> httpx: 0.28.1
> jsonpatch: 1.33
> langserve[all]: Installed. No version info available.
> langsmith-pyo3: Installed. No version info available.
> orjson: 3.10.13
> packaging: 24.0
> pydantic: 2.10.4
> PyYAML: 6.0.1
> requests: 2.32.2
> requests-toolbelt: 1.0.0
> sse-starlette: 1.8.2
> tenacity: 8.2.2
> tomlkit: 0.13.2
> typer[all]: Installed. No version info available.
> typing-extensions: 4.12.2
> uvicorn: 0.34.0
> zstandard: 0.22.0
@dosubot dosubot bot added the Ɑ: vector store Related to vector store module label Feb 2, 2025
@keenborder786
Contributor

It's very difficult to answer your question. From the looks of it, you seem to be passing the filter argument correctly. One thing you can try: rather than passing the filter during the invoke call, pass it when the retriever is created:

retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 5, "score_threshold": 0.5, "filter": {"table_name": "Material", "project_id": projectId}},
)
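
Because `projectId` presumably changes per request and each tool filters on a different `table_name`, the suggestion above can be wrapped in a small factory that builds a retriever per call. This is a sketch under those assumptions: `make_filtered_retriever` is a hypothetical helper name, and the only library call used is `vector_store.as_retriever(...)` with the same arguments shown in this thread.

```python
from typing import Any


def make_filtered_retriever(vector_store: Any, table_name: str, project_id: str):
    """Build a retriever whose metadata filter is fixed at construction time."""
    if not project_id:
        # Fail loudly instead of silently querying with an empty project_id.
        raise ValueError("project_id is empty")
    return vector_store.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={
            "k": 5,
            "score_threshold": 0.5,
            "filter": {"table_name": table_name, "project_id": project_id},
        },
    )
```

Inside `projectTool`, this would be used as `make_filtered_retriever(vector_store, "Project", projectId).invoke(...)`, so that no filter or `k` needs to be passed at invoke time. If the Lambda result changes with this variant, that points toward a difference in how invoke-time keyword arguments are handled between the two environments.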

@vaishnav-mk
Author

vaishnav-mk commented Feb 2, 2025 via email
