chore: add row_limit for text generators #1054
Is this to serve customer feedback or to support our test infra?
LGTM with 2 nit comments.
bigframes/ml/llm.py
Outdated
@@ -598,6 +598,8 @@ class TextEmbeddingGenerator(base.BaseEstimator):
connection_name (str or None):
Connection to connect with remote service. str of the format <PROJECT_NUMBER/PROJECT_ID>.<LOCATION>.<CONNECTION_ID>.
If None, use default connection in session context.
row_limit (int or None):
nit: max_cols, which is used in other pandas methods.
Done
bigframes/ml/llm.py
Outdated
@@ -598,6 +598,8 @@ class TextEmbeddingGenerator(base.BaseEstimator):
connection_name (str or None):
Connection to connect with remote service. str of the format <PROJECT_NUMBER/PROJECT_ID>.<LOCATION>.<CONNECTION_ID>.
If None, use default connection in session context.
row_limit (int or None):
The maximum number of rows that this model can process.
nit: The maximum number of rows that this model can process per call. The model can be called multiple times.
SG!
max_rows (int or None):
The maximum number of rows that this model can process per call.
My feeling is this is better as a bigframes-wide setting. Alternatively, if we are just focusing on sem-join cardinality explosion, we can apply a narrow limit just for that case.
Added
This helps prevent an unexpectedly large amount of data from being processed by the LLM.
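For illustration only, a per-call row limit of the kind discussed above could be sketched as follows. This is not the code from the PR: the class RowLimitedGenerator and its predict method are hypothetical stand-ins, and raising an error (rather than, say, truncating the input) is an assumption about how such a limit might behave.

```python
from typing import List, Optional, Sequence


class RowLimitedGenerator:
    """Hypothetical stand-in for a text generator; not the bigframes class."""

    def __init__(self, max_rows: Optional[int] = None):
        # None means no limit is enforced.
        self.max_rows = max_rows

    def predict(self, rows: Sequence[str]) -> List[str]:
        # The limit applies per call, so it is checked against this input only.
        # Assumption: oversized inputs are rejected before any model call is made.
        if self.max_rows is not None and len(rows) > self.max_rows:
            raise ValueError(
                f"Input has {len(rows)} rows, exceeding max_rows={self.max_rows}."
            )
        # Placeholder for the real (costly) model invocation.
        return [f"generated:{row}" for row in rows]
```

Because the check is per call, a caller can still process more rows in total by calling predict several times, each time with an input at or under the limit.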