-
Notifications
You must be signed in to change notification settings - Fork 521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(wren-ai-service): retrieval improvement #599
Conversation
08297f9
to
1a96ec5
Compare
99417d4
to
71e5765
Compare
71e5765
to
1b628ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
should_force_deploy=bool(os.getenv("SHOULD_FORCE_DEPLOY", "")), | ||
column_indexing_batch_size=( | ||
int(os.getenv("COLUMN_INDEXING_BATCH_SIZE")) | ||
if os.getenv("COLUMN_INDEXING_BATCH_SIZE") | ||
else 50 | ||
), | ||
table_retrieval_size=( | ||
int(os.getenv("TABLE_RETRIEVAL_SIZE")) | ||
if os.getenv("TABLE_RETRIEVAL_SIZE") | ||
else 10 | ||
), | ||
table_column_retrieval_size=( | ||
int(os.getenv("TABLE_COLUMN_RETRIEVAL_SIZE")) | ||
if os.getenv("TABLE_COLUMN_RETRIEVAL_SIZE") | ||
else 1000 | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just mention first, and I don't think it need to be done in this PR. we can consider to use a config class to initial all env and it will make the code more concise.
indexing pipeline:
COLUMN_INDEXING_BATCH_SIZE
which users can decide how many columns to index in one document at one timeretrieval pipeline:
TABLE_RETRIEVAL_SIZE
) tables based on table name and table descriptions (table_descriptions collection)TABLE_COLUMN_RETRIEVAL_SIZE
) tables and columns based on previous results (db_schma)we also expose two env vars for table and column selection:
TABLE_RETRIEVAL_SIZE
andTABLE_COLUMN_RETRIEVAL_SIZE