HADOOP-16085: S3Guard to use object version or etags (interim PR) #803
Closed
steveloughran wants to merge 3 commits into apache:trunk from steveloughran:s3/HADOOP-16085-s3guard-versioning
commit ae876ab2df46c68ddd923edf8dd1d314191fcc94 Merge: 2e0254e 6a42745 Author: Ben Roling <[email protected]> Date: Thu May 2 10:14:10 2019 -0500 Merge branch 'trunk' into HADOOP-16085-squashed-2
commit 2e0254e Author: Ben Roling <[email protected]> Date: Thu Apr 18 12:13:40 2019 -0500 Remove unused import
commit d1275e4 Merge: 450ba66 df76cdc Author: Ben Roling <[email protected]> Date: Thu Apr 18 12:10:01 2019 -0500 Merge branch 'trunk' into HADOOP-16085-squashed
commit 450ba66 Author: Ben Roling <[email protected]> Date: Thu Apr 18 11:45:41 2019 -0500 Improvements to TestObjectChangeDetectionAttributes, AbstractS3AMockTest
commit 408af6c Author: Ben Roling <[email protected]> Date: Thu Apr 18 10:29:05 2019 -0500 Use HttpStatus code constant instead of magic number
commit 5f0532b Author: Ben Roling <[email protected]> Date: Thu Apr 18 10:02:50 2019 -0500 Update core-default.xml
commit 3488b20 Author: Ben Roling <[email protected]> Date: Wed Apr 17 16:14:43 2019 -0500 Fix runaround of creating FileStatus and then calling fromFileStatus()
commit 90d5c9c Author: Ben Roling <[email protected]> Date: Wed Apr 17 15:45:51 2019 -0500 Fix minor nits
commit 3ff59e4 Author: Ben Roling <[email protected]> Date: Wed Apr 17 15:07:02 2019 -0500 Mutate S3AFileStatus instead of creating new instance
commit 13fab97 Author: Ben Roling <[email protected]> Date: Wed Apr 17 14:30:55 2019 -0500 Rename S3LocatedFileStatus to S3ALocatedFileStatus
commit bee4e52 Author: Ben Roling <[email protected]> Date: Wed Apr 17 14:25:17 2019 -0500 Stop pretending to support group and permission attributes on S3AFileStatus
commit 807e13b Author: Ben Roling <[email protected]> Date: Wed Apr 17 14:20:14 2019 -0500 Add serialVersionUID to S3LocatedFileStatus
commit 9974cec Author: Ben Roling <[email protected]> Date: Mon Apr 8 13:34:38 2019 -0500 Fix missed group or owner tweak
commit 708c001 Author: Ben Roling <[email protected]> Date: Mon Apr 8 12:58:14 2019 -0500 Fix S3AFileStatus group handling
    ITestS3AConfiguration.testUsernameFromUGI was failing, expecting the user to be copied into the group. Strict copying of user into group causes TestLocalMetadataStore.testPutNew() to fail since it expects the group to be preserved from the original FileStatus. This change copies user into group when group is null/empty. With this change, all existing tests pass.
commit 5239a9f Author: Ben Roling <[email protected]> Date: Thu Apr 4 16:38:31 2019 -0500 Skip tests that require versionId when bucket doesn't have versioning enabled
commit 4c6331e Author: Ben Roling <[email protected]> Date: Mon Apr 1 13:58:24 2019 -0500 Fix broken TestObjectChangeDetectionAttributes
commit 8a19c42 Author: Ben Roling <[email protected]> Date: Mon Apr 1 10:08:33 2019 -0500 Squashed commit of the following:
commit 9f4ad88 Author: Ben Roling <[email protected]> Date: Mon Apr 1 09:29:35 2019 -0500 Add test for 412 response
commit dc0a3fb Author: Ben Roling <[email protected]> Date: Thu Mar 28 16:53:46 2019 -0500 Update tests that started failing due to HADOOP-15999
commit 5e1f3e3 Author: Ben Roling <[email protected]> Date: Thu Mar 28 15:49:26 2019 -0500 Speed up ITestS3ARemoteFileChanged
commit 1b6be40 Author: Ben Roling <[email protected]> Date: Thu Mar 28 14:23:53 2019 -0500 Skip invalid test when object versioning enabled
commit 8597d2e Merge: 2d235f8 b5db238 Author: Ben Roling <[email protected]> Date: Thu Mar 28 11:54:50 2019 -0500 Merge remote-tracking branch 'apache/trunk' into HADOOP-16085
commit 2d235f8 Author: Ben Roling <[email protected]> Date: Thu Mar 28 11:51:26 2019 -0500 Fix typo
commit dc83cef Author: Ben Roling <[email protected]> Date: Thu Mar 28 10:28:09 2019 -0500 Generalize TestObjectETag to cover versionId and test overwrite
commit 0d71f32 Author: Ben Roling <[email protected]> Date: Thu Mar 28 08:45:42 2019 -0500 Fix trailing whitespace
commit 324be6d Author: Ben Roling <[email protected]> Date: Wed Mar 27 22:00:57 2019 -0500 S3GuardTool updates to correct ETag or versionId metadata
commit 2a2bba7 Author: Ben Roling <[email protected]> Date: Wed Mar 27 21:27:27 2019 -0500 Clarify log message
commit 6e62a3a Author: Ben Roling <[email protected]> Date: Wed Mar 27 21:17:48 2019 -0500 Documentation updates per PR feedback
commit 1ff8bef Author: Ben Roling <[email protected]> Date: Wed Mar 27 16:05:59 2019 -0500 check version.required on CopyResult
commit e296275 Author: Ben Roling <[email protected]> Date: Wed Mar 27 16:04:50 2019 -0500 Minor javadoc improvements from PR review
commit 3e9ea19 Author: Ben Roling <[email protected]> Date: Wed Mar 27 13:15:58 2019 -0500 Skip tests that aren't applicable with change.detection.source=versionId
commit ddbf68b Author: Ben Roling <[email protected]> Date: Wed Mar 27 11:56:38 2019 -0500 Add tests of case where no version metadata is present
commit 21d37dd Author: Ben Roling <[email protected]> Date: Wed Mar 27 09:25:46 2019 -0500 Fix compiler deprecation warning
commit b8e1569 Author: Ben Roling <[email protected]> Date: Wed Mar 27 09:19:46 2019 -0500 Fix license issue
commit 33bb5f9 Author: Ben Roling <[email protected]> Date: Wed Mar 27 09:19:32 2019 -0500 Fix findbugs issue
commit 5b7fadb Author: Ben Roling <[email protected]> Date: Wed Mar 27 09:00:39 2019 -0500 Fix checkstyle issues
commit 6110a11 Author: Ben Roling <[email protected]> Date: Wed Mar 27 08:28:37 2019 -0500 Remove trailing whitespace
commit d82069b Author: Ben Roling <[email protected]> Date: Tue Mar 26 16:05:01 2019 -0500 Improve S3Guard doc
commit ca2f0e9 Author: Ben Roling <[email protected]> Date: Tue Mar 26 14:29:03 2019 -0500 Fix ITestS3ARemoteFileChanged
commit 1e4fa85 Author: Ben Roling <[email protected]> Date: Tue Mar 26 11:37:48 2019 -0500 Increase local metastore cache timeout
commit 34b0c80 Author: Ben Roling <[email protected]> Date: Tue Mar 26 11:35:34 2019 -0500 Fix isEmptyDir inconsistency
commit bbf8365 Author: Ben Roling <[email protected]> Date: Mon Mar 25 16:55:24 2019 -0500 TestPathMetadataDynamoDBTranslation tests null etag, versonId
commit 2ae7d16 Author: Ben Roling <[email protected]> Date: Mon Mar 25 16:54:49 2019 -0500 Add constants in TestDirListingMetadata
commit 068a55d Author: Ben Roling <[email protected]> Date: Mon Mar 25 15:43:45 2019 -0500 Add copy exception handling
commit 0eca6f3 Author: Ben Roling <[email protected]> Date: Mon Mar 25 12:43:51 2019 -0500 Don't process response from copy
commit ad9e152 Author: Ben Roling <[email protected]> Date: Mon Feb 25 16:41:54 2019 -0600 HADOOP-16085-003.patch
    Rebase of previous work after merge of HADOOP-15625.
    Includes retries for regular reads, select(), and rename()
    +add stevel review (primarily of tests)
    Change-Id: I75a3b70917eefc0a0ec3190ca1de527e2081551e
Contributor
The changes here looked good to me and I pulled this into #794 as mentioned here:
Contributor
Author
thanks, I'll move onto that again with my ongoing work.
shanthoosh added a commit to shanthoosh/hadoop that referenced this pull request Oct 15, 2019
Samza users may need to increase the partition count of the input streams of their stateful Samza jobs. For example, Kafka needs to limit the maximum size of each partition to scale up its performance. Thus the number of partitions of a Kafka topic needs to be expanded to reduce the partition size if the average byte-in-rate or retention time of the Kafka topic has doubled.

In order to perform a join between streams, stateful jobs generally have to route the partitions from the different input streams to the same task of a container. However, when an input stream is repartitioned, the key space of each partition gets redistributed, which causes the stateful jobs to produce erroneous results. So if the partition count of an input stream is increased, users have to manually purge the changelog topics and the local RocksDB state of their stateful jobs. This results in increased operational complexity and data loss.

This patch takes a first stab at solving the above problem and comprises the following changes:
* Introduce a new group method in the `SystemStreamPartitionGrouper` interface to generate task assignments that factor in the partition expansion of input streams.
* Introduce a `StreamPartitionMapper` abstraction to allow the user to plug in the input stream partitioning function.
* Fix the existing unit tests and add new unit tests to validate the new grouper changes.

In a follow-up PR, these grouper changes will be integrated with `JobModelManager` (waiting for PR 790 to land, since it made significant changes to `JobModelManager`).

Author: Shanthoosh Venkataraman <[email protected]>
Reviewers: Prateek M <[email protected]>, Ray Matharu <[email protected]>, Daniel Nishimura <[email protected]>

Closes apache#803 from shanthoosh/SEP-5
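The idea of grouping with partition expansion in mind can be illustrated with a minimal sketch. It is not the Samza implementation: the class and method names below are hypothetical, and it assumes keys are assigned by hash modulo the partition count and that the new partition count is an integer multiple of the previous one. Under those assumptions, every key in expanded partition `p` previously lived in partition `p % previousCount`, so routing expanded partitions to the task that owned that previous partition keeps existing local state valid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; not Samza's grouper or StreamPartitionMapper API. */
final class PartitionExpansionSketch {

    /** Partition a key lands in, assuming hash-modulo partitioning (an assumption). */
    static int partitionFor(int keyHash, int partitionCount) {
        return Math.floorMod(keyHash, partitionCount);
    }

    /** Partition that held the same keys before the expansion. */
    static int previousPartition(int currentPartition, int previousCount) {
        return currentPartition % previousCount;
    }

    /**
     * Groups the expanded partitions by the partition that owned their keys
     * before the expansion, so each group can stay on the same task.
     */
    static Map<Integer, List<Integer>> groupByPreviousPartition(int previousCount, int currentCount) {
        if (previousCount <= 0 || currentCount % previousCount != 0) {
            throw new IllegalArgumentException(
                "sketch only handles expansion by an integer multiple");
        }
        Map<Integer, List<Integer>> tasks = new TreeMap<>();
        for (int p = 0; p < currentCount; p++) {
            tasks.computeIfAbsent(previousPartition(p, previousCount), k -> new ArrayList<>())
                 .add(p);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Expanding a stream from 4 to 8 partitions.
        System.out.println(groupByPreviousPartition(4, 8));
    }
}
```

A pluggable partitioning function, as the commit message describes, would replace the fixed modulo rule in `previousPartition`; the sketch hard-codes it only to keep the example short.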
This is #794 with my edits added.