chore: add row_limit for text generators #1054

Closed · wants to merge 2 commits
Conversation

sycai (Contributor) commented on Oct 4, 2024:

This helps prevent an unexpectedly large amount of data from being processed by the LLM.
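For context, a guard like the one this PR proposes could look roughly as follows. This is a minimal sketch, not the code from the diff; the class name, attribute handling, and error message are illustrative assumptions.

```python
# Minimal sketch of a per-model row cap, assuming a `row_limit` constructor
# argument as proposed in this PR; names and messages are illustrative only.
from __future__ import annotations

import pandas as pd


class _TextGeneratorSketch:
    def __init__(self, row_limit: int | None = None) -> None:
        # None means "no cap", matching the docstring proposed in the diff below.
        self.row_limit = row_limit

    def predict(self, X: pd.DataFrame):
        # Refuse to send an unexpectedly large input to the remote LLM.
        if self.row_limit is not None and len(X) > self.row_limit:
            raise ValueError(
                f"Input has {len(X)} rows, which exceeds row_limit={self.row_limit}."
            )
        ...  # the real estimator would call the remote model here
```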

sycai requested review from a team as code owners on October 4, 2024 23:01
sycai requested a review from chelsea-lin on October 4, 2024 23:01
product-auto-label bot added the "size: m" label (Pull request size is medium) on Oct 4, 2024
product-auto-label bot added the "api: bigquery" label (Issues related to the googleapis/python-bigquery-dataframes API) on Oct 4, 2024
shobsi (Contributor) commented on Oct 4, 2024:

> This helps prevent an unexpectedly large amount of data from being processed by the LLM.

Is this to serve customer feedback, or to support our test infra?

sycai requested a review from TrevorBergeron on October 4, 2024 23:09
chelsea-lin (Contributor) left a review:

LGTM with 2 nit comments.

@@ -598,6 +598,8 @@ class TextEmbeddingGenerator(base.BaseEstimator):
         connection_name (str or None):
             Connection to connect with remote service. str of the format <PROJECT_NUMBER/PROJECT_ID>.<LOCATION>.<CONNECTION_ID>.
             If None, use default connection in session context.
+        row_limit (int or None):

chelsea-lin (Contributor) commented on the diff:

nit: max_cols, which is used in other pandas methods.

sycai (Contributor, Author) replied:

Done

@@ -598,6 +598,8 @@ class TextEmbeddingGenerator(base.BaseEstimator):
         connection_name (str or None):
             Connection to connect with remote service. str of the format <PROJECT_NUMBER/PROJECT_ID>.<LOCATION>.<CONNECTION_ID>.
             If None, use default connection in session context.
+        row_limit (int or None):
+            The maximum number of rows that this model can process.

chelsea-lin (Contributor) commented on the diff:

nit: The maximum number of rows that this model can process per call. The model can be called multiple times.

sycai (Contributor, Author) replied:

SG!
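Had this change landed, constructing a model with the proposed cap might have looked like the sketch below. The parameter was never merged (the PR was later closed), and the connection string and limit are placeholders.

```python
# Hypothetical usage of the proposed parameter; `max_rows` never shipped
# because this PR was closed, and the connection name is a placeholder.
from bigframes.ml.llm import TextEmbeddingGenerator

model = TextEmbeddingGenerator(
    connection_name="my-project.us.my-connection",  # placeholder connection
    max_rows=10_000,  # reject inputs with more rows than this per predict() call
)
```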

Comment on lines +601 to +602:

        max_rows (int or None):
            The maximum number of rows that this model can process per call.

A reviewer (Contributor) commented:

My feeling is that this is better as a bigframes-wide setting. Alternatively, if we are just focusing on sem-join cardinality explosion, we can apply a narrow limit just for that case.
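To illustrate the alternative the reviewer raises, a library-wide cap would live on a global options object that every LLM entry point consults, rather than on each estimator. The sketch below is hypothetical; bigframes does not expose an option with this name.

```python
# Hypothetical bigframes-wide cap, sketched as a plain options object.
# `max_llm_rows` is invented here to illustrate the suggestion; it is not
# an existing bigframes option.
from dataclasses import dataclass


@dataclass
class ComputeOptions:
    max_llm_rows: int | None = None  # None disables the check


options = ComputeOptions(max_llm_rows=50_000)


def _check_llm_input(num_rows: int) -> None:
    # Every text generator / embedding call would consult the same global cap.
    limit = options.max_llm_rows
    if limit is not None and num_rows > limit:
        raise ValueError(f"{num_rows} rows exceeds the global LLM cap of {limit}.")
```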

chelsea-lin added the "do not merge" label (Indicates a pull request not ready for merge, due to either quality or timing) on Oct 4, 2024
chelsea-lin (Contributor) commented:

Added "do not merge" for further API review.

sycai closed this on Dec 13, 2024
sycai deleted the sycai_llm_limit branch on December 13, 2024 at 22:22
Labels
api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API)
do not merge (Indicates a pull request not ready for merge, due to either quality or timing)
size: m (Pull request size is medium)
5 participants