
@neils-dev
Contributor

What changes were proposed in this pull request?

To let FsShell rm commands (e.g. bin/ozone fs -rm o3fs://bucket.volume/key) delete items from the filesystem Trash without the -skipTrash flag, this patch changes the Ozone filesystem specific TrashPolicy. It is implemented by overriding the TrashPolicy moveToTrash method in TrashPolicyOzone.

With this change, if the item to be deleted with the FsShell rm command is located in the key root trash directory, it is deleted from the filesystem directly. Before this patch, items removed through the rm command were moved to the root Trash directory whenever trash was enabled. The patch enacts the same behavior as the FsShell rm command with the -skipTrash flag specified; behavior with the -skipTrash flag enabled is unaffected.

Necessary for this patch is configuring the Hadoop filesystem to use the Ozone implementation specific TrashPolicyOzone through core-site.xml, as in:

<property>
  <name>fs.trash.classname</name>
  <value>org.apache.hadoop.ozone.om.TrashPolicyOzone</value>
</property>
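For illustration only, the delete-vs-move decision introduced by the override can be sketched as a standalone check. The class, method names, and paths below are hypothetical; the real implementation overrides moveToTrash in TrashPolicyOzone and works on Path objects, not plain strings.

```java
// Hypothetical standalone sketch of the patch's decision logic; the real
// code lives in TrashPolicyOzone#moveToTrash and uses Path objects.
public class TrashDecisionSketch {

  static final String TRASH_PREFIX = ".Trash";

  // Returns true when the key should be deleted directly (it is, or lives
  // under, the trash root); false when it should be moved to trash instead.
  static boolean deleteDirectly(String key, String trashRootKey) {
    // key is inside the trash root (key at least as long as trashRootKey)
    if (key.startsWith(trashRootKey)) {
      return true;
    }
    // key is a prefix of the trash root (e.g. the .Trash directory itself);
    // the trash prefix must be present so an unrelated short key is not
    // deleted by accident
    return key.contains(TRASH_PREFIX) && trashRootKey.startsWith(key);
  }

  public static void main(String[] args) {
    String trashRoot = "/vol1/bucket1/.Trash/user1";
    System.out.println(deleteDirectly("/vol1/bucket1/.Trash/user1/key1", trashRoot)); // true
    System.out.println(deleteDirectly("/vol1/bucket1/.Trash", trashRoot));            // true
    System.out.println(deleteDirectly("/vol1/bucket1/key1", trashRoot));              // false
  }
}
```

With this decision in place, rm on a path already under .Trash skips the move-to-trash step entirely, which is what makes the -skipTrash flag unnecessary for trash contents.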

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4043

How was this patch tested?

Patch was tested through both integration tests and manual tests.

Integration test implemented in the ozone shell integration test TestOzoneShellHA.java. Tested through hadoop-ozone/integration-test$ mvn -Dtest=TestOzoneShellHA#testDeleteTrashNoSkipTrash test


Test set: org.apache.hadoop.ozone.shell.TestOzoneShellHA

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.099 s - in org.apache.hadoop.ozone.shell.TestOzoneShellHA

Manual test through docker cluster deployment:

before patch ->

  1. configure core-site.xml to use TrashPolicyDefault
     <property>
       <name>fs.trash.classname</name>
       <value>org.apache.hadoop.fs.TrashPolicyDefault</value>
     </property>
     OR leave unchanged (default behavior), and enable Trash:
     <property>
       <name>fs.trash.interval</name>
       <value>60</value>
     </property>
  2. run shell commands to create volume, bucket and key (e.g. /vol1/bucket1/key1)
  3. run shell command to delete key (e.g. bin/ozone fs -rm -R o3fs://bucket1.vol1/key1)
  4. run shell command to delete trash (e.g. bin/ozone fs -rm -R o3fs://bucket1.vol1/.Trash)
     Trash directory remains.
     (screenshot: deleteTrashProblem)

patch ->

  1. configure core-site.xml to use TrashPolicyOzone
     <property>
       <name>fs.trash.classname</name>
       <value>org.apache.hadoop.ozone.om.TrashPolicyOzone</value>
     </property>
     and enable Trash:
     <property>
       <name>fs.trash.interval</name>
       <value>60</value>
     </property>
  2. run shell commands to create volume, bucket and key (e.g. /vol1/bucket1/key1)
  3. run shell command to delete key (e.g. bin/ozone fs -rm -R o3fs://bucket1.vol1/key1)
  4. run shell command to delete trash (e.g. bin/ozone fs -rm -R o3fs://bucket1.vol1/.Trash)
     Trash directory is deleted.

(screenshot: deleteTrashFix)

…tems WITHOUT need of -skipTrash flag. Ozone specific trash policy affected. Integration test, testDeleteTrashNoSkipTrash, created to test and validate ozone trash policy changes for FsShell rm command deleting Trash items without -skipTrash flag.

This reverts commit ebfa5ea.
…tems WITHOUT need of -skipTrash flag. Ozone specific trash policy affected. Integration test, testDeleteTrashNoSkipTrash, created to test and validate ozone trash policy changes for FsShell rm command deleting Trash items without -skipTrash flag.
Contributor

@sadanand48 sadanand48 left a comment

Thanks @neils-dev for the good work. I have left a few minor comments. Apart from that, the patch looks good to me.

public boolean moveToTrash(Path path) throws IOException {
this.fs.getFileStatus(path);
Path trashRoot = this.fs.getTrashRoot(path);
Path trashCurrent = new Path(trashRoot, CURRENT);
Contributor

Please remove trashCurrent variable as it is unused

Contributor Author

Removed unused variable.

throw new InvalidPathException("Invalid path Name " + key);
}

if (trashRootKey.startsWith(key) || key.startsWith(trashRootKey)) {
Contributor

I think only key.startsWith(trashRootKey) check would suffice. Do we need the OR condition?

Contributor Author

Thanks for asking to clarify. Added comments in the code to explain. In the process, I also improved the conditional to satisfy an edge condition where key = .Tra (while the trash prefix is .Trash), which would otherwise cause an unintentional action by deleting instead of moving to trash.

    // first condition tests when length key is <= length trash
    // and second when length key > length trash
    if ((key.contains(this.fs.TRASH_PREFIX)) && (trashRootKey.startsWith(key))
            || key.startsWith(trashRootKey)) {

Contributor

@sadanand48 sadanand48 Apr 9, 2021

Thanks for explaining it in the comment. Although, when the length of key < length of trashroot, the key would not be inside the .Trash directory.
Suppose key = vol/buck/.Trash/user/randomKey; the trashroot here would be vol/buck/.Trash.
So the length of key is always greater than or equal to the length of trashroot, and the first condition would not be required.

Contributor Author

Thanks @sadanand48, yes, what you described is true. The trashroot is derived from the bucket under the key: trashroot = /vol/bucket/.Trash/username. One case where length of key < length of trashroot is when key = /vol/bucket/.Trash; here trashroot is /vol/bucket/.Trash/username.
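The edge case discussed in this thread can be reproduced with a small standalone comparison of the two conditionals. Everything below is illustrative: plain String stand-ins for the Path-derived keys in TrashPolicyOzone, with hypothetical class and method names.

```java
// Hypothetical standalone check of the old vs. improved conditional
// discussed above (plain String stand-ins for the real Path-derived keys).
public class TrashEdgeCaseSketch {

  static final String TRASH_PREFIX = ".Trash";

  // old condition: symmetric prefix test only
  static boolean oldCondition(String key, String trashRootKey) {
    return trashRootKey.startsWith(key) || key.startsWith(trashRootKey);
  }

  // improved condition: the short-key branch also requires the trash prefix
  static boolean newCondition(String key, String trashRootKey) {
    return (key.contains(TRASH_PREFIX) && trashRootKey.startsWith(key))
        || key.startsWith(trashRootKey);
  }

  public static void main(String[] args) {
    String trashRoot = "/vol/bucket/.Trash/username";
    // key shorter than trashroot but NOT a trash path:
    String oddKey = "/vol/bucket/.Tra";
    System.out.println(oldCondition(oddKey, trashRoot)); // true  -> deleted by mistake
    System.out.println(newCondition(oddKey, trashRoot)); // false -> moved to trash
    // key shorter than trashroot and a genuine trash path:
    System.out.println(newCondition("/vol/bucket/.Trash", trashRoot)); // true
  }
}
```

The added key.contains(TRASH_PREFIX) guard is what separates "/vol/bucket/.Tra" (a legitimate key that merely shares a prefix with .Trash) from "/vol/bucket/.Trash" (the trash directory itself).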

Contributor

Thanks for clarifying this @neils-dev. I forgot to consider the username in trashroot. It makes sense now. Thanks.

…olicyOzone.java; added additional log messages to integration test TestOzoneShellHA; added small modification to conditional statement in TrashPolicyOzone to clarify the statement and satisfy edge conditions for trash removal.
@neils-dev
Contributor Author

Thanks @sadanand48 for reviewing the PR and the great comments! The latest commit includes changes addressing the comments.

// Test delete from Trash directory removes item from filesystem

// setup configuration to use TrashPolicyOzone
// (default is TrashPolicyDefault)
Contributor

@sadanand48 sadanand48 Apr 10, 2021

@neils-dev Please use getClientConfForOFS(hostPrefix,conf) for initialising clientConf here. The integration test is failing because of this

 // Test delete from Trash directory removes item from filesystem

    // setup configuration to use TrashPolicyOzone
    // (default is TrashPolicyDefault)
    final String hostPrefix = OZONE_OFS_URI_SCHEME + "://" + omServiceId;
    OzoneConfiguration clientConf = getClientConfForOFS(hostPrefix,conf);
    clientConf.setInt(FS_TRASH_INTERVAL_KEY, 60);
    clientConf.setClass("fs.trash.classname", TrashPolicyOzone.class,
                        TrashPolicy.class);
    OzoneFsShell shell = new OzoneFsShell(clientConf);

Contributor Author

Thanks for the comment @sadanand48. Committed a change to set the configuration in a function that sets TrashPolicyOzone after getClientConfForOFS, as suggested.

Contributor

@sadanand48 sadanand48 Apr 14, 2021

Thanks @neils-dev for making the change. Sorry, I wasn't clear. I meant to use the same URI as that of testDeleteToTrashOrSkipTrash, i.e. to include the serviceId as part of the URI. Since this is an HA setup, it's required to pass the serviceId. Please use the same URI as that of testDeleteToTrashOrSkipTrash, i.e.

final String hostPrefix = OZONE_OFS_URI_SCHEME + "://" + omServiceId;
OzoneConfiguration clientConf = getClientConfForOFS(hostPrefix,conf);

instead of
OzoneConfiguration clientConf = getClientConfForOzoneTrashPolicy("ofs://localhost", conf);

I tried running the individual test in IDE and it's failing. The test only passes when the entire test file is run which is why the integration test passed.

Contributor Author

Thanks for your comment @sadanand48. Committed a change for the service id as suggested. Thanks for following up. BTW, the host set to localhost, as was previously used for the clientConf, should not be a problem for the integration test. It was tested both individually (hadoop-ozone/integration-test$ mvn -Dtest=TestOzoneShellHA#testDeleteTrashNoSkipTrash test) and with the entire suite (hadoop-ozone/integration-test$ mvn -Dtest=TestOzoneShellHA test). See attached image.

hadoop-ozone/integration-test$ mvn -Dtest=TestOzoneShellHA#testDeleteTrashNoSkipTrash test
(screenshot: skipTrash)

… conf with TrashPolicyOzone - fixes for CI integration test run.
…nging conf to use test suite serviceid omserviceId.
Contributor

@sadanand48 sadanand48 left a comment

Thanks @neils-dev for addressing the comments. +1. LGTM

@elek
Member

elek commented May 4, 2021

Thanks for the patch @neils-dev. It looks good to me, but I tried to run the unit test locally before the commit.

When I try to debug it, the unit test frequently fails:

java.lang.AssertionError: 
Expected :0
Actual   :1


	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:633)
	at org.apache.hadoop.ozone.shell.TestOzoneShellHA.testDeleteTrashNoSkipTrash(TestOzoneShellHA.java:635)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
	at java.util.concurrent.FutureTask.run(FutureTask.java)
	at java.lang.Thread.run(Thread.java:748)

So I tried to run the unit test with IntelliJ's "repeat until fail" functionality, but it always failed the second time.

Do you have the same behavior? Can you run it multiple times without any error?

I would like to be sure that it's not an intermittent test.

@neils-dev
Contributor Author

neils-dev commented May 7, 2021

Thanks for looking into potential problems with the integration test @elek. I can confirm the error you see in your local test environment; however, the messages I see when running the test with 'repeat until fail' are timeout errors: 2021-05-07 15:51:56,695 [Time-limited test] ERROR ha.OMFailoverProxyProvider (OMFailoverProxyProvider.java:getRetryAction(294)) - Failed to connect to OMs: [nodeId=omNode-3,nodeAddress=127.0.0.1:13660, nodeId=omNode-1,nodeAddress=127.0.0.1:13648, nodeId=omNode-2,nodeAddress=127.0.0.1:13654]. Attempted 15 failovers. Got following exceptions during retries: ; followed by the assertion error you posted.

In addition, I tried the same test setup with the "repeat until failure" option in IntelliJ on org.apache.hadoop.ozone.shell.TestOzoneShellHA.testDeleteToTrashOrSkipTrash, with the same error.

It does not appear to be intermittent, but rather a problem with running the tests continuously. It may come from parts of the test being initialized globally for org.apache.hadoop.ozone.shell.TestOzoneShellHA (@Before, @After, @BeforeClass ...), and hence not reinitialized on each invocation of a specific test such as testDeleteToTrashOrSkipTrash. I will look further into this and update.

@neils-dev
Contributor Author

The problem indeed involves running the TestOzoneShellHA trash tests repeatedly. The test cluster is initialized at the start of the test suite with the OM RPC server and OM RPC client in sync; however, when the test is repeated, as in this case, the RPC client becomes out of sync and uses the wrong port to access the OM RPC server. This occurs with shell RPC submitRequest calls. The problem is identified; will investigate further.

neils-dev added 2 commits May 24, 2021 21:41
…NoSkipTrash correcting shutdown and initializing error with OmTransport RPC for shell. FileSystem needs to be closed on shutdown of test cases to properly close OmTransport between shell and om (close() method for instantiated fs for cleanup).
@neils-dev
Contributor Author

The recent commit contains a fix for the problem of running the trash service integration tests consecutively (repeat until failure). The OmTransport RPC connection between the shell and the OM was not closed at the end of a test, leaving improper cleanup before the start of the next test. Fixed by calling the close() method of the filesystem class implementation.
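The cleanup pattern behind the fix can be sketched with a plain AutoCloseable stand-in. The MockTransport class below is hypothetical; the real fix closes the instantiated FileSystem in the test teardown so the shell-to-OM OmTransport is torn down between test runs.

```java
// Hypothetical stand-in for the leaked resource: without the explicit
// close() in teardown, the next test run reuses a stale transport.
public class TeardownSketch {

  // Mock of the shell-side connection that the real fix closes
  // (the FileSystem instance wrapping the OmTransport RPC client).
  static class MockTransport implements AutoCloseable {
    boolean open = true;
    @Override
    public void close() { open = false; }
  }

  static MockTransport fs;  // analogous to the test's FileSystem field

  static void setUp() { fs = new MockTransport(); }

  // The fix: teardown must close the client (JUnit would do this in @After);
  // otherwise the next setUp() talks to an out-of-sync RPC endpoint.
  static void tearDown() { fs.close(); }

  public static void main(String[] args) {
    setUp();
    tearDown();
    System.out.println(fs.open ? "leaked" : "closed"); // prints "closed"
  }
}
```

The same shape applies to any per-test client resource: create it in setup, close it in teardown, so state never leaks into the next repetition.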

@sadanand48
Contributor

Thanks @neils-dev for identifying the issue. @elek Can you please take a look and merge the change?

@elek
Member

elek commented May 27, 2021

Thanks for the review @sadanand48 and the patch @neils-dev. Merging it now.

@elek elek merged commit 8586815 into apache:master May 27, 2021
errose28 added a commit to errose28/ozone that referenced this pull request Jun 1, 2021
…ing-upgrade-master-merge

* upstream/master: (76 commits)
  HDDS-4043. allow deletion from Trash directory without -skipTrash option (apache#2110)
  ...

Conflicts:
	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java
	hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java
	hadoop-hdds/interface-admin/src/main/proto/ScmAdminProtocol.proto
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
	hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/MockNodeManager.java
	hadoop-ozone/dist/src/main/compose/testlib.sh
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
	hadoop-ozone/interface-client/src/main/proto/OmClientProtocol.proto
	hadoop-ozone/ozone-manager/pom.xml
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/utils/OzoneManagerRatisUtils.java
	hadoop-ozone/s3gateway/pom.xml
bharatviswa504 pushed a commit to bharatviswa504/hadoop-ozone that referenced this pull request Jul 25, 2021
…ion (apache#2110)

(cherry picked from commit 8586815)
Change-Id: I6ea9af9ab59b5384e991dd3c6e2245ba1bf245b7