-
Notifications
You must be signed in to change notification settings - Fork 16.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Self-query with generic query constructor #3607
Conversation
return Comparison(comparator=comp, attribute=attr.strip("\"'"), value=val) | ||
|
||
|
||
def parse_filter(_filter: str) -> Union[Operation, Comparison]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about refactor using lark
or pyparsing
or something similar? They should make it to extend the code and handle a large class of errors from the get-go (e.g., checking for rule ambiguity etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just for sake of time, ok if we leave this as todo for refactor in sep pr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah def
LTE = "lte" | ||
|
||
|
||
class Comparison(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two suggestions:
- Introduce a common type hierarchy for "operators" / "directives" whatever we choose to call them. So we don't have to do
List[Union[Comparison, Operation]]
, but instead can doSequence[Directive]
. - Could we remove everything from the schema file except for the schema, so it's easy to see the type hierarchy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does ir.py
currently contain what you were imagining or did you want to separate out even more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably makes sense to move out the visitor and keep only the ast typedefs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would you deal with circular import (since ast's define method that takes visitor and visitor defines methods that take in ast's)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could always type something as Any
arbitrary_types_allowed = True | ||
|
||
|
||
def format_attribute_info(info: List[AttributeInfo]) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion to favor Sequence
over List
on inputs -- Sequences
accept tuples/lists and the collection is immutable
def format_attribute_info(info: List[AttributeInfo]) -> str: | |
def format_attribute_info(info: Sequence[AttributeInfo]) -> str: |
def parse(self, text: str) -> StructuredQuery: | ||
try: | ||
expected_keys = ["query", "filter"] | ||
parsed = parse_json_markdown(text, expected_keys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the output is modified to be something like
<command>
request(..., ...)
</command>
Then after extracting the content of command, the string can be fed into
ast_parse` without having to deal with parsing JSON and dealing with query and filter separately
HOT HOT HOT 🔥 🔥 🔥 |
🙃 🙃 🙃 🙃 |
Alternate implementation of #3452 that relies on a generic query constructor chain and language and then has vector store-specific translation layer. Still refactoring and updating examples but general structure is there and seems to work s well as #3452 on exampels --------- Co-authored-by: Harrison Chase <[email protected]>
Alternate implementation of langchain-ai#3452 that relies on a generic query constructor chain and language and then has vector store-specific translation layer. Still refactoring and updating examples but general structure is there and seems to work s well as langchain-ai#3452 on exampels --------- Co-authored-by: Harrison Chase <[email protected]>
expected_keys = ["query", "filter"] | ||
parsed = parse_json_markdown(text, expected_keys) | ||
if len(parsed["query"]) == 0: | ||
parsed["query"] = " " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you fallback to " "
when the returned string is empty?
I am also noticing that in the case In other words, while the self query is supposed to be a superset of the standard querying (structured + unstructured), the unstructured querying seems inferior to using the vector store directly. docsearch = ....
retriever = docsearch.as_retriever()
sq_retriever = SelfQueryRetriever.from_llm(
llm, docsearch, document_content_description, metadata_field_info, verbose=True
)
# this uses the entire phrase for similarity match
retriever.get_relevant_documents("What is the capital city of Vermont?")
# whereas assuming there is no structured operator here, the phrase sent to the vector store might just be "Vermont"
sq_retriever.get_relevant_documents("What is the capital city of Vermont?") |
Alternate implementation of #3452 that relies on a generic query constructor chain and language and then has vector store-specific translation layer. Still refactoring and updating examples but general structure is there and seems to work s well as #3452 on exampels