Skip to content

Conversation

@smengcl
Copy link
Contributor

@smengcl smengcl commented Jul 17, 2020

What changes were proposed in this pull request?

A proper server-side trash cleanup solution (HDDS-3915) might not land any time soon.

This jira aims to completely disable "move to trash" when a client is deleting files and {{fs.trash.interval > 0}} by intercepting the deprecated {{fs.rename(src, dst, options)}} call used by {{TrashPolicyDefault#moveToTrash}}.

This should be reverted when trash cleanup is implemented.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3982

How was this patch tested?

  1. Added new tests.

  2. Also tested manually:

As usual, compile -> launch docker-compose cluster -> exec bash on om

$ mvn clean install -Pdist -DskipTests -e -Dmaven.javadoc.skip=true -DskipShade
$ cd hadoop-ozone/dist/target/ozone-*-SNAPSHOT/compose/ozone
$ docker-compose up -d
$ docker-compose exec om /bin/bash

Add fs.trash.interval to client config:

$ vi /etc/hadoop/core-site.xml
# Add a line in <configuration> tag and save:
<property><name>fs.trash.interval</name><value>1</value></property>

o3fs

$ ozone sh volume create volume1
$ ozone sh bucket create /volume1/bucket2
$ ozone fs -mkdir -p o3fs://bucket2.volume1.om/dir3
$ ozone fs -touch o3fs://bucket2.volume1.om/dir3/key5
$ ozone fs -rm o3fs://bucket2.volume1.om/dir3/key5
2020-07-17 19:22:29,778 [main] INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2020-07-17 19:22:29,867 [main] INFO ozone.BasicOzoneFileSystem: Move to trash is disabled for o3fs, deleting instead: o3fs://bucket2.volume1.om/dir3/key5
2020-07-17 19:22:29,881 [main] INFO fs.TrashPolicyDefault: Moved: 'o3fs://bucket2.volume1.om/dir3/key5' to trash at: /.Trash/hadoop/Current/dir3/key5

Verification: key is deleted, but trash directory is created:

$ ozone fs -ls -R o3fs://bucket2.volume1.om/.Trash/hadoop/Current/
drwxrwxrwx   - hadoop hadoop          0 2020-07-17 19:22 o3fs://bucket2.volume1.om/.Trash/hadoop/Current/dir3

OFS

$ ozone fs -mkdir -p ofs://om/volume1/bucket2/dir3
$ ozone fs -touch ofs://om/volume1/bucket2/dir3/key5
$ ozone fs -rm ofs://om/volume1/bucket2/dir3/key5
2020-07-17 19:26:59,193 [main] INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2020-07-17 19:26:59,306 [main] INFO ozone.BasicRootedOzoneFileSystem: Move to trash is disabled for ofs, deleting instead: ofs://om/volume1/bucket2/dir3/key5
2020-07-17 19:26:59,334 [main] INFO fs.TrashPolicyDefault: Moved: 'ofs://om/volume1/bucket2/dir3/key5' to trash at: ofs://om/volume1/bucket2/.Trash/hadoop/Current/volume1/bucket2/dir3/key5

Verification: key is deleted, but trash directory is created:

$ ozone fs -ls -R ofs://om/volume1/bucket2/.Trash/hadoop/Current/
drwxrwxrwx   - hadoop hadoop          0 2020-07-17 19:26 ofs://om/volume1/bucket2/.Trash/hadoop/Current/volume1
drwxrwxrwx   - hadoop hadoop          0 2020-07-17 19:26 ofs://om/volume1/bucket2/.Trash/hadoop/Current/volume1/bucket2
drwxrwxrwx   - hadoop hadoop          0 2020-07-17 19:26 ofs://om/volume1/bucket2/.Trash/hadoop/Current/volume1/bucket2/dir3

Known Issues

  • TrashPolicyDefault already creates the empty directories in trash before calling fs.rename(src, dst, options).
    • These are empty dirs so they shouldn't be a big problem.
    • We could do something about fs.mkdirs as well to solve this but I'm not sure if we should go down this path.
  • TrashPolicyDefault still prints "Moved ... to trash" message, as shown above, which may be confusing to users.

BasicRootedOzoneFileSystem#rename(src, dst, options)
@smengcl smengcl added the client label Jul 17, 2020
@smengcl smengcl self-assigned this Jul 17, 2020
@arp7 arp7 requested a review from umamaheswararao July 17, 2020 19:24
@umamaheswararao
Copy link
Contributor

umamaheswararao commented Jul 18, 2020

@smengcl Thank you for working on it.
This is a good idea.

  • But the below log message from could confuse people?
 fs.rename(path, trashPath,
            Rename.TO_TRASH);
        LOG.info("Moved: '" + path + "' to trash at: " + trashPath);

I don't see a way to avoid though. :-(
Probably we will say: A generic statement in previous log "We will not retain any data in trash. This may just reduce confusion that, they will think data moved but deleted immediately. :-)

Probably your logs will looks like:
INFO ozone.BasicOzoneFileSystem: Move to trash is disabled for o3fs, deleting instead: o3fs://bucket2.volume1.om/dir3/key5.
Files/dirs will not be retained in Trash.
INFO fs.TrashPolicyDefault: Moved: 'o3fs://bucket2.volume1.om/dir3/key5' to trash at: /.Trash/hadoop/Current/dir3/key5
I am not this is confusing more. Think about some generic message to convey that next message is just fake.

  • I think we need to add test case to make sure no files moving under trash folder.

One another thought could be that: ( I am not proposing it to do it now, just for discussion):
Currently in Hadoop side, the trash policy config is common for all kinds of fs.

 Class<? extends TrashPolicy> trashClass = conf.getClass(
        "fs.trash.classname", TrashPolicyDefault.class, TrashPolicy.class);
    TrashPolicy trash = ReflectionUtils.newInstance(trashClass, conf);
    trash.initialize(conf, fs); // initialize TrashPolicy
    return trash

If user wants different policies based on their FS behaviors, that's not possible. Probably if we have confg like

fs.<scheme>.trash.classname

and by default we can use TrashPolicyDefault.class. if user wants, they can configure per fs specific policy.
So, If config was defined this way, our life would have been easier now. In our case we could have configured our own policy and simply delete the files in it.
Anyway what we are doing is temporary until we have proper cleanup. So, this option will not help as we need changes in Hadoop side.

@smengcl
Copy link
Contributor Author

smengcl commented Jul 18, 2020

@umamaheswararao Thanks for the review.

One question about the fs.trash.classname though: I actually thought about overriding it. But once this is configured to a custom class on the client it seems it will use that Trash Policy class for ALL HCFS? In this case we need to implement different moveToTrash behavior for o3fs/OFS and other HCFSes which the client will access.

@umamaheswararao
Copy link
Contributor

umamaheswararao commented Jul 18, 2020

@smengcl yes, fs.trash.classname configuration is for all HCFS.
If we wanted to have different impls for HCFSs, we may need to introduce new advanced config, that should be like
'fs.SCHEME.trash.classname'.

The impl can be something like below:

public static TrashPolicy getInstance(Configuration conf, FileSystem fs) {
  Class<? extends TrashPolicy> defaultTrashClass = conf.getClass(
      "fs.trash.classname", TrashPolicyDefault.class, TrashPolicy.class);
  Class<? extends TrashPolicy>  trashClass = conf.getClass(String.format(
      "fs.%s.trash.classname",
      fs.getScheme()), defaultTrashClass, TrashPolicy.class);
  TrashPolicy trash = ReflectionUtils.newInstance(trashClass, conf);
  trash.initialize(conf, fs); // initialize TrashPolicy
  return trash;
}
@Test
  public void testPluggableFileSystemSpecificTrash() throws IOException {
    Configuration conf = new Configuration();
    // Test plugged TrashPolicy
    conf.setClass("fs.file.trash.classname", TestTrashPolicy.class, TrashPolicy.class);
    Trash trash = new Trash(conf);
    assertTrue(trash.getTrashPolicy().getClass().equals(TestTrashPolicy.class));
  }

So, by no change in behavior. If some one wants to override the behaviors for a specific fs, provide your TrashClass name in config. Ex: if you want to provide a trashPolicy for ofs separately, then configure like:
fs.ofs.trash.classname = OfsTrashPolicyDefault.class.getName();
Otherwise same TrashPolicyDefault.class will be used.

smengcl added 4 commits July 20, 2020 11:03
Change-Id: I594717fd5895efb59ff0f9646fcee328122747c6
Change-Id: I347583f47a4f4f9a3558243f9fea72403631c653
Change-Id: Ie5a4a7f622561aaae0baf1e045508041cf616d26
Change-Id: I9e37b8a1d6e18f236cc3e861cdb9ae33ca84ca43
@smengcl
Copy link
Contributor Author

smengcl commented Jul 20, 2020

Tests added.

// intercept when TO_TRASH is found
LOG.info("Move to trash is disabled for ofs, deleting instead: {}. "
+ "Files or directories will NOT be retained in trash. "
+ "Ignore the following TrashPolicyDefault message, if any.", src);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing: Currently we are straight a way deleting files. Since users coming from Hadoop shell, there could be assumptions that files may move to trash if no -skipTrash flag. But we will be deleting instead of trashing them.
User may realize after looking at this log message, but by this time already things happened.
There should be a way to shout out this behavior to users. One way is documenting and any other better way?

Change-Id: I29fcc402b03b8abc2fc0325c9daf568b6e6c2329
@umamaheswararao
Copy link
Contributor

Changes looks good to me. +1
Pending CI-Checks.

@umamaheswararao umamaheswararao self-requested a review July 21, 2020 00:01
@smengcl smengcl merged commit fbd125c into apache:master Jul 21, 2020
@smengcl
Copy link
Contributor Author

smengcl commented Jul 21, 2020

Merged to master. Thanks @umamaheswararao for reviewing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants