Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add filter and timestamp splits #627

Merged
merged 11 commits into from
Aug 18, 2021

Conversation

ivanmkc
Copy link
Contributor

@ivanmkc ivanmkc commented Aug 13, 2021

Changes to training splits:

  • Defers default splits to the server instead of setting in client
  • Raises error when multiple split types passed
  • Adds filter splits (b/172365904) to image, text and video
  • Adds timestamp split (b/172368070) to tabular
  • Added tests for all split types

Future:

  • next PR will change forecasting training jobs to extend tabular; therefore, adding timestamp and predefined split to forecasting

Continued from: #549

@product-auto-label product-auto-label bot added the api: aiplatform Issues related to the AI Platform API. label Aug 13, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Aug 13, 2021
@ivanmkc ivanmkc force-pushed the imkc--filter_timestamp_splits branch 2 times, most recently from b7eac85 to 91d9c20 Compare August 17, 2021 05:16
training_fraction_split: float = 0.8,
validation_fraction_split: float = 0.1,
test_fraction_split: float = 0.1,
training_fraction_split: Optional[float] = None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved default setting responsibility to server-side instead of in multiple places in the client.

@ivanmkc ivanmkc force-pushed the imkc--filter_timestamp_splits branch from 6558de2 to 07b8843 Compare August 17, 2021 06:41
training_fraction_split=0.8,
validation_fraction_split=0.1,
test_fraction_split=0.1,
training_fraction_split=None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch to None, hence deferring to server.

@ivanmkc ivanmkc force-pushed the imkc--filter_timestamp_splits branch 2 times, most recently from 9b512fd to 4032b82 Compare August 17, 2021 07:27
google/cloud/aiplatform/training_jobs.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/training_jobs.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/training_jobs.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/training_jobs.py Outdated Show resolved Hide resolved
@ivanmkc ivanmkc force-pushed the imkc--filter_timestamp_splits branch 3 times, most recently from 705e976 to 114560f Compare August 17, 2021 18:43
@@ -5091,13 +5672,22 @@ def _run(
model_tbt.display_name = model_display_name or self._display_name
model_tbt.labels = model_labels or self._labels

# AutoMLVideo does not support validation, so pass in '-' if any other filter split is provided.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a weird thing we have to do for AutoMLVideo, pending discussion with Video team.

@@ -261,18 +258,11 @@ def test_run_call_pipeline_service_create(
if not sync:
model_from_job.wait()

true_fraction_split = gca_training_pipeline.FractionSplit(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed since we test splits separately later.

Copy link
Member

@sasha-gitg sasha-gitg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ivan! LGTM and left a few comments.

google/cloud/aiplatform/training_jobs.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/training_jobs.py Outdated Show resolved Hide resolved
of data will be used for training, 10% for validation, and 10% for test.

Data filter splits:
Assigns input data to training, validation, and test sets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Assigns input data to training, validation, and test sets
If using filter splits all of ``training_filter_split``, ``validation_filter_split`` and
``test_filter_split`` must be provided.
Assigns input data to training, validation, and test sets

Please apply this to class docstrings that support filter splits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

google/cloud/aiplatform/training_jobs.py Outdated Show resolved Hide resolved
Supported only for tabular Datasets.

Timestamp splits:
Assigns input data to training, validation, and test sets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Assigns input data to training, validation, and test sets
If using timestamp split please provide ```timestamp_split_column_name```
Any of ``training_fraction_split``, ``validation_fraction_split`` and
``test_fraction_split`` may optionally be provided.
Assigns input data to training, validation, and test sets.

Please apply this to class docstrings that support timestamp splits.

google/cloud/aiplatform/training_jobs.py Outdated Show resolved Hide resolved
google/cloud/aiplatform/training_jobs.py Outdated Show resolved Hide resolved
@ivanmkc ivanmkc force-pushed the imkc--filter_timestamp_splits branch from 95b4229 to c07d7fa Compare August 17, 2021 21:09
@ivanmkc
Copy link
Contributor Author

ivanmkc commented Aug 18, 2021

Will lint and merge.

@ivanmkc ivanmkc merged commit 1a13577 into googleapis:master Aug 18, 2021
@ivanmkc
Copy link
Contributor Author

ivanmkc commented Aug 18, 2021

Merged without passing 'Sample Lint' check as I didn't modify sample code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: aiplatform Issues related to the AI Platform API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants