Producer, consumer design for having listing and renaming in parallel. #52

Merged
saxenapranav merged 75 commits into ABFSDriver:ABFS_3.3.2_dev from saxenapranav:ABFS_3.3.2_dev_rename_improvements
Jun 23, 2023

@saxenapranav saxenapranav commented May 15, 2023

Details:

  1. Run list and rename in parallel. The List API returns at most 5000 results per call, so the client has to call the server again with a continuation token. Listing everything first means a series of sequential calls to the server before the rename can start, which is not an optimized use of resources: it increases rename latency in the current dev code and raises the chance of OOM, because info for all blobs is kept in memory before renaming. In this PR, the rename starts as soon as the client has received some blobs from the server. The producer calls the list API and populates a queue, which the consumer dequeues. The producer pauses producing until the consumer lag is back under control. This producer-consumer design will be used in the deleteDir API changes as well.
  2. Lease control on files being renamed in an atomic directory. The source dir is leased until the whole rename is completed. The blobs in the directory are leased when their copy has to start. The reasoning behind this:
/*
               * Conditionally get a lease on the source blob to prevent other writers
               * from changing it. This is used for correctness in HBase when log files
               * are renamed. It generally should do no harm other than take a little
               * more time for other rename scenarios. When the HBase master renames a
               * log file folder, the lease locks out other writers.  This
               * prevents a region server that the master thinks is dead, but is still
               * alive, from committing additional updates.  This is different than
               * when HBase runs on HDFS, where the region server recovers the lease
               * on a log file, to gain exclusive access to it, before it splits it.
               */

This helps prevent a parallel process from appending to a blob whose copy is in progress, or a parallel rename on the same sourceDir.
  3. RenamePendingJSON no longer contains the fileList. When we need to resume, we list again (the listing returns only the blobs that are still present in the srcDir) and rename.
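
The producer-consumer flow in point 1 can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `ListRenamePipeline`, `Page`, `Lister`, `Renamer` and `maxQueued` are hypothetical names, and a bounded `ArrayBlockingQueue` is assumed as the way the producer gets paused when the consumer lags.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; none of these names come from the ABFS driver.
class ListRenamePipeline {
  // Stand-in for one page of a paginated list call.
  static final class Page {
    final List<String> blobs;
    final String nextToken; // null when there are no more pages

    Page(List<String> blobs, String nextToken) {
      this.blobs = blobs;
      this.nextToken = nextToken;
    }
  }

  interface Lister {
    Page list(String continuationToken) throws Exception;
  }

  interface Renamer {
    void rename(String blob) throws Exception;
  }

  // Lists in a background thread while the calling thread renames.
  // Returns the number of blobs renamed.
  static int renameAll(Lister lister, Renamer renamer, int maxQueued) throws Exception {
    // Bounded queue: put() blocks once maxQueued entries are waiting,
    // which pauses the producer until the consumer catches up.
    BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(maxQueued);
    AtomicReference<Exception> listFailure = new AtomicReference<>();

    Thread producer = new Thread(() -> {
      try {
        try {
          String token = null;
          do {
            Page page = lister.list(token);
            for (String blob : page.blobs) {
              queue.put(Optional.of(blob));
            }
            token = page.nextToken;
          } while (token != null);
        } catch (InterruptedException ie) {
          throw ie; // the consumer gave up; skip the end marker
        } catch (Exception e) {
          listFailure.set(e); // reported to the caller after the queue drains
        }
        queue.put(Optional.empty()); // end marker for the consumer
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    });
    producer.start();

    int renamed = 0;
    try {
      Optional<String> item;
      while ((item = queue.take()).isPresent()) {
        renamer.rename(item.get());
        renamed++;
      }
    } finally {
      // If a rename failed, the producer may be blocked in put(); unblock it.
      producer.interrupt();
      producer.join();
    }
    if (listFailure.get() != null) {
      throw listFailure.get();
    }
    return renamed;
  }
}
```

With this shape, memory use is bounded by `maxQueued` rather than by the size of the directory, and the first rename can start after the first list page arrives. Under the same assumptions, resuming from a RenamePendingJSON (point 3) would amount to calling `renameAll` again against the source directory.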


:::: AGGREGATED TEST RESULT ::::

HNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 113, Failures: 0, Errors: 0, Skipped: 2
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:356 » TestTimedOut test timed o...
[ERROR] ITestAzureBlobFileSystemOauth.testBlobDataContributor:80 » AccessDenied Operat...
[ERROR] ITestAzureBlobFileSystemOauth.testBlobDataReader:137 » AccessDenied Operation ...
[INFO]
[ERROR] Tests run: 726, Failures: 0, Errors: 3, Skipped: 224
[INFO] Results:
[INFO]
[WARNING] Tests run: 273, Failures: 0, Errors: 0, Skipped: 53

HNS-SharedKey

[INFO] Results:
[INFO]
[WARNING] Tests run: 113, Failures: 0, Errors: 0, Skipped: 3
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAbfsListStatusRemoteIterator.testWithAbfsIteratorDisabled:168 [After removing every iterm found from the iterator, there should be no more elements in the fileNames] expected:<[0]> but was:<[1]>
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:356 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 726, Failures: 1, Errors: 1, Skipped: 180
[INFO] Results:
[INFO]
[WARNING] Tests run: 273, Failures: 0, Errors: 0, Skipped: 41

NonHNS-SharedKey

[INFO] Results:
[INFO]
[WARNING] Tests run: 113, Failures: 0, Errors: 0, Skipped: 3
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAbfsStatistics.testCreateStatistics:108->AbstractAbfsIntegrationTest.assertAbfsStatistics:505->Assert.assertEquals:647->Assert.failNotEquals:835->Assert.fail:89 Mismatch in directories_created expected:<2> but was:<1>
[ERROR] ITestAzureBlobFileSystemCheckAccess.testCheckAccessForAccountWithoutNS:191 Expecting org.apache.hadoop.security.AccessControlException with text "This request is not authorized to perform this operation using this permission.", 403 but got : "void"
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:356 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 726, Failures: 2, Errors: 1, Skipped: 278
[INFO] Results:
[INFO]
[WARNING] Tests run: 273, Failures: 0, Errors: 0, Skipped: 45

NonHNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 113, Failures: 0, Errors: 0, Skipped: 3
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAbfsStatistics.testCreateStatistics:108->AbstractAbfsIntegrationTest.assertAbfsStatistics:505->Assert.assertEquals:647->Assert.failNotEquals:835->Assert.fail:89 Mismatch in directories_created expected:<2> but was:<1>
[ERROR] ITestAzureBlobFileSystemCheckAccess.testCheckAccessForAccountWithoutNS:191 Expecting org.apache.hadoop.security.AccessControlException with text "This request is not authorized to perform this operation using this permission.", 403 but got : "void"
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemFlush.testTracingHeaderForAppendBlob:319 » IO AppendBl...
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:356 » TestTimedOut test timed o...
[ERROR] ITestAzureBlobFileSystemOauth.testBlobDataContributor:80 » AbfsRestOperation O...
[ERROR] ITestAzureBlobFileSystemOauth.testBlobDataReader:137 » AccessDenied Operation ...
[INFO]
[ERROR] Tests run: 726, Failures: 2, Errors: 4, Skipped: 284
[INFO] Results:
[INFO]
[WARNING] Tests run: 273, Failures: 0, Errors: 0, Skipped: 57

AppendBlob-HNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 113, Failures: 0, Errors: 0, Skipped: 2
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAbfsListStatusRemoteIterator.testAbfsIteratorWithHasNext:88 [After removing every iterm found from the iterator, there should be no more elements in the fileNames] expected:<[0]> but was:<[1]>
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:356 » TestTimedOut test timed o...
[ERROR] ITestAzureBlobFileSystemOauth.testBlobDataContributor:80 » AccessDenied Operat...
[ERROR] ITestAzureBlobFileSystemOauth.testBlobDataReader:137 » AccessDenied Operation ...
[INFO]
[ERROR] Tests run: 726, Failures: 1, Errors: 3, Skipped: 224
[INFO] Results:
[INFO]
[WARNING] Tests run: 273, Failures: 0, Errors: 0, Skipped: 53

Time taken: 46 mins 23 secs.

…d after all child are renamed -> tests needs small change
…ileSystem, need to check if current file given is not dir or present. mkdirs to be ignored if coming from createNonRecursive
… of copy) when rename on an atomicDir going on.
@saxenapranav saxenapranav marked this pull request as ready for review May 16, 2023 08:47
@saxenapranav saxenapranav requested review from anmolanmol1234, snvijaya and sreeb-msft and removed request for anmolanmol1234 and sreeb-msft May 16, 2023 08:47
@saxenapranav
Collaborator Author


:::: AGGREGATED TEST RESULT ::::

HNS-OAuth

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting:181->fuzzyValidate:64 The actual value 13 is not within the expected range: [5.60, 8.40].
[INFO]
[ERROR] Tests run: 113, Failures: 1, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:371 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 759, Failures: 0, Errors: 1, Skipped: 170
[INFO] Results:
[INFO]
[WARNING] Tests run: 278, Failures: 0, Errors: 0, Skipped: 44

HNS-SharedKey

[INFO] Results:
[INFO]
[WARNING] Tests run: 113, Failures: 0, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAzureBlobFileSystemRandomRead.testValidateSeekBounds:275->Assert.assertTrue:42->Assert.fail:89 There should not be any network I/O (elapsedTimeMs=106).
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:383 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 759, Failures: 1, Errors: 1, Skipped: 170
[INFO] Results:
[INFO]
[WARNING] Tests run: 278, Failures: 0, Errors: 0, Skipped: 44

NonHNS-SharedKey

[INFO] Results:
[INFO]
[WARNING] Tests run: 113, Failures: 0, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAbfsStatistics.testCreateStatistics:108->AbstractAbfsIntegrationTest.assertAbfsStatistics:510->Assert.assertEquals:647->Assert.failNotEquals:835->Assert.fail:89 Mismatch in directories_created expected:<2> but was:<1>
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemExplictImplicitRename.testRenameImplicitDirectoryToNonExistentDstWithParentIsFile:345->explicitImplicitDirectoryRenameTest:758->createSourcePaths:835->AbstractAbfsIntegrationTest.createAzCopyDirectory:524 » IO
[INFO]
[ERROR] Tests run: 759, Failures: 1, Errors: 1, Skipped: 275
[INFO] Results:
[INFO]
[WARNING] Tests run: 278, Failures: 0, Errors: 0, Skipped: 45

NonHNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 113, Failures: 0, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAbfsStatistics.testCreateStatistics:108->AbstractAbfsIntegrationTest.assertAbfsStatistics:510->Assert.assertEquals:647->Assert.failNotEquals:835->Assert.fail:89 Mismatch in directories_created expected:<2> but was:<1>
[INFO]
[ERROR] Tests run: 759, Failures: 1, Errors: 0, Skipped: 275
[INFO] Results:
[INFO]
[WARNING] Tests run: 278, Failures: 0, Errors: 0, Skipped: 45

AppendBlob-HNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 113, Failures: 0, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemDelegationSASForBlobEndpoint.setup:71->AbstractAbfsIntegrationTest.setup:178->AbstractAbfsIntegrationTest.createFileSystem:309 » TokenAccessProvider
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:371 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 759, Failures: 0, Errors: 2, Skipped: 169
[INFO] Results:
[INFO]
[WARNING] Tests run: 278, Failures: 0, Errors: 0, Skipped: 44

Time taken: 49 mins 0 secs.

@saxenapranav
Collaborator Author

What has changed in the PR since the previous review:

  1. All directory-related code has been grouped together.
  2. If we don't get any hierarchy, we now just start the rename process. We no longer check whether the path exists, is a directory, or is a file. For copy/delete, it doesn't matter whether it is a marker or non-marker blob; if the path doesn't exist, the server returns an exception.
  3. In a directory rename, if any blob rename fails and leads to failure of the dir rename, we release the lease on the srcDirectory marker blob.
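
The lease handling in point 3 could look roughly like the sketch below. It is illustrative only: `DirRenameWithLease`, `LeaseClient` and `BlobRenamer` are hypothetical names, not ABFS driver classes. Releasing the lease after a successful rename is also an assumption of the sketch; the description above only covers the failure path.

```java
import java.util.List;

// Hypothetical sketch; these types are not part of the ABFS driver.
class DirRenameWithLease {
  interface LeaseClient {
    String acquire(String path) throws Exception;

    void release(String path, String leaseId) throws Exception;
  }

  interface BlobRenamer {
    void rename(String blob) throws Exception;
  }

  static void renameDir(String srcDirMarker, List<String> blobs,
      LeaseClient leases, BlobRenamer renamer) throws Exception {
    // Hold a lease on the source directory marker for the whole rename.
    String leaseId = leases.acquire(srcDirMarker);
    try {
      for (String blob : blobs) {
        renamer.rename(blob);
      }
    } catch (Exception renameFailure) {
      // A failed blob rename fails the directory rename: release the lease
      // so that a later attempt is not blocked by it.
      try {
        leases.release(srcDirMarker, leaseId);
      } catch (Exception releaseFailure) {
        renameFailure.addSuppressed(releaseFailure);
      }
      throw renameFailure;
    }
    // Assumption: the lease is also given up once every blob is renamed.
    leases.release(srcDirMarker, leaseId);
  }
}
```

Attaching a release failure with `addSuppressed` keeps the original rename failure as the exception the caller sees.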

Collaborator

@snvijaya snvijaya left a comment

After the planned review of the test plan with the team, proceed with check-in.

@saxenapranav
Collaborator Author


:::: AGGREGATED TEST RESULT ::::

HNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 140, Failures: 0, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:397->lambda$testAcquireRetry$6:403 » TestTimedOut
[INFO]
[ERROR] Tests run: 760, Failures: 0, Errors: 1, Skipped: 171
[INFO] Results:
[INFO]
[WARNING] Tests run: 278, Failures: 0, Errors: 0, Skipped: 44

HNS-SharedKey

[INFO] Results:
[INFO]
[WARNING] Tests run: 140, Failures: 0, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:366 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 760, Failures: 0, Errors: 1, Skipped: 171
[INFO] Results:
[INFO]
[WARNING] Tests run: 278, Failures: 0, Errors: 0, Skipped: 44

NonHNS-SharedKey

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting:181->fuzzyValidate:64 The actual value 14 is not within the expected range: [5.60, 8.40].
[INFO]
[ERROR] Tests run: 140, Failures: 1, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAbfsStatistics.testCreateStatistics:108->AbstractAbfsIntegrationTest.assertAbfsStatistics:510->Assert.assertEquals:647->Assert.failNotEquals:835->Assert.fail:89 Mismatch in directories_created expected:<2> but was:<1>
[INFO]
[ERROR] Tests run: 760, Failures: 1, Errors: 0, Skipped: 275
[INFO] Results:
[INFO]
[WARNING] Tests run: 278, Failures: 0, Errors: 0, Skipped: 45

NonHNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 140, Failures: 0, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAbfsStatistics.testCreateStatistics:108->AbstractAbfsIntegrationTest.assertAbfsStatistics:510->Assert.assertEquals:647->Assert.failNotEquals:835->Assert.fail:89 Mismatch in directories_created expected:<2> but was:<1>
[ERROR] ITestAzureBlobFileSystemRandomRead.testValidateSeekBounds:276->Assert.assertTrue:42->Assert.fail:89 There should not be any network I/O (elapsedTimeMs=85).
[INFO]
[ERROR] Tests run: 760, Failures: 2, Errors: 0, Skipped: 275
[INFO] Results:
[INFO]
[WARNING] Tests run: 278, Failures: 0, Errors: 0, Skipped: 45

AppendBlob-HNS-OAuth

[INFO] Results:
[INFO]
[WARNING] Tests run: 140, Failures: 0, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAzureBlobFileSystemRandomRead.testValidateSeekBounds:276->Assert.assertTrue:42->Assert.fail:89 There should not be any network I/O (elapsedTimeMs=97).
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testAcquireRetry:366 » TestTimedOut test timed o...
[INFO]
[ERROR] Tests run: 760, Failures: 1, Errors: 1, Skipped: 171
[INFO] Results:
[INFO]
[WARNING] Tests run: 278, Failures: 0, Errors: 0, Skipped: 44

@saxenapranav saxenapranav merged commit fbfe195 into ABFSDriver:ABFS_3.3.2_dev Jun 23, 2023