
Allow non-existent checkpoint location path in index validation #313

Merged
dai-chen merged 1 commit into opensearch-project:main from dai-chen:fix-non-existing-checkpoint-location-support
Apr 19, 2024

@dai-chen dai-chen commented Apr 18, 2024

Description

The pre-validation of checkpoint locations, introduced in PR #297, requires the specified checkpoint path to exist. However, Spark automatically creates any missing sub-folders when a streaming job starts, so this requirement is stricter than necessary. This PR relaxes the validation to accept a checkpoint path that does not exist yet, which is more convenient for users and preserves backward compatibility.
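To illustrate the idea, here is a minimal sketch of a relaxed check. It is not the code in this PR: it uses Python's `pathlib` on a local filesystem as a stand-in for the storage layer, and the names `nearest_existing_ancestor` and `is_valid_checkpoint_location` are hypothetical. The assumption is that a location is acceptable when its closest existing ancestor is an accessible directory, since the missing sub-folders can be created later.

```python
import os
from pathlib import Path


def nearest_existing_ancestor(path: Path) -> Path:
    """Walk up from `path` until a path that exists is found."""
    current = path
    while not current.exists():
        parent = current.parent
        if parent == current:  # reached the filesystem root, stop walking
            break
        current = parent
    return current


def is_valid_checkpoint_location(location: str) -> bool:
    """Relaxed check (illustrative only): the location itself may be missing,
    but its nearest existing ancestor must be a readable, writable directory."""
    ancestor = nearest_existing_ancestor(Path(location))
    return ancestor.is_dir() and os.access(ancestor, os.R_OK | os.W_OK | os.X_OK)
```

Under this sketch, a path such as `<existing-dir>/validation-test/subtest1` passes even when neither sub-folder exists, while a path nested under a regular file, or under a directory the caller cannot access, is rejected.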

Testing

For a checkpoint location without access permission, the behavior is the same as before:

spark-sql> CREATE SKIPPING INDEX ON ds_tables.http_logs 
...                (clientip VALUE_SET) 
...                WITH (
...                  auto_refresh = true,
...                  checkpoint_location = 's3://test/test'
...                );

java.lang.IllegalArgumentException: requirement failed: 
  No permission to access the checkpoint location s3://test/test

For a checkpoint location with non-existent sub-folders, validation now passes and the Spark streaming job creates the folders when it starts:

# validation-test folder doesn't exist
spark-sql> CREATE SKIPPING INDEX ON ds_tables.http_logs 
...                (clientip VALUE_SET) 
...                WITH (
...                  auto_refresh = true,
...                  checkpoint_location = 's3://daichen/validation-test'
...                );
Time taken: 11.97 seconds

# neither the validation-test nor the validation-test/subtest1 folder exists
spark-sql> CREATE SKIPPING INDEX ON ds_tables.http_logs 
...                (clientip VALUE_SET) 
...                WITH (
...                  auto_refresh = true,
...                  checkpoint_location = 's3://daichen/validation-test/subtest1'
...                );
Time taken: 12.978 seconds

Issues Resolved

#65

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen added bug Something isn't working 0.4 labels Apr 18, 2024
@dai-chen dai-chen self-assigned this Apr 18, 2024
@dai-chen dai-chen marked this pull request as ready for review April 18, 2024 22:30
@seankao-az seankao-az left a comment

Thanks for the change!

@dai-chen dai-chen merged commit b5ab7bd into opensearch-project:main Apr 19, 2024
@dai-chen dai-chen deleted the fix-non-existing-checkpoint-location-support branch April 19, 2024 16:54