Skip to content

Conversation

@bharatviswa504
Copy link
Contributor

What changes were proposed in this pull request?

HDDS-3612 introduced bucket links.
After this feature now we don't need this parameter, any volume/bucket can be exposed to S3 via using bucket links.

ozone bucket link srcvol/srcbucket destvol/destbucket

So now to expose any ozone bucket to S3G

For example, the user wants to expose a bucket named bucket1 under volume1 to S3G, they can run below command

ozone bucket link volume1/bucket1 s3v/bucket2
Now, the user can access all the keys in volume/bucket1 using s3v/bucket2 and also ingest data to the volume/bucket1 using using s3v/bucket2

This Jira is opened to remove the config from ozone-default.xml
And also log a warning message to use bucket links, when it does not have default value s3v.

With this configuration it causes trouble to users, to figure which volume is used for their buckets whenever this configuration is changed.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4044

How was this patch tested?

Added UT

Copy link
Contributor

@avijayanhwx avijayanhwx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1.
Minor comment inline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. space before 'configurations'.
'configurations' --> 'configuration'
'all S3Gateway use' --> 'all S3Gateway buckets use' ?

Copy link
Contributor Author

@bharatviswa504 bharatviswa504 Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reasons for this configuration is to use different volumeName across different S3G's. So if someone is Using the configuration with that intention, the logline was to say If different S3G's use different volumes, make sure user creates those volumes before performing those operations. (So, no mention of the bucket in the log)

@elek
Copy link
Member

elek commented Jul 29, 2020

Let's say I use Mapreduce and ofs. I create a /home volume for all the users which can hold the home buckets.

With configuring s3g to use /home instead of /s3v I can expose all the home buckets via s3g without creating a link for each of the home buckets.

I think it's reasonable to keep the configuration even if we have the bucket link functionality.

@arp7
Copy link
Contributor

arp7 commented Jul 29, 2020

I think it's reasonable to keep the configuration even if we have the bucket link functionality.

More flexibility usually comes with more complexity. :) The example Bharat highlighted seems valid - different S3 gateways using a different config key. I think in the future we can support volume level links which will allow pointing /s3g to /home. I am also in favor of at least making this config key undocumented to discourage its use.

@elek
Copy link
Member

elek commented Jul 30, 2020

More flexibility usually comes with more complexity. :)

In general, I can agree with this statement, but in this specific case: the additional complexity of one configuration key seems to be minimal.

The example Bharat highlighted seems valid - different S3 gateways using a different config key. I think in the future we can support volume level links which will allow pointing /s3g to /home. I am also in favor of at least making this config key undocumented to discourage its use.

"Different S3 gateways using a different config key" --> It's also a feature which can be used in some specific cases.

But I agree, flexibility vs complexity can be a balance. Sometimes It can be hard to choose between them without being opinionated.

Here, I feel I have two different views:

When it's part of an open source project: I would prefer to keep the project flexible (within boundaries !!!) but use smart defaults.

When it's part of distribution of a vendor: I see it more reasonable to restrict specific usage patterns. But these restrictions can be specific a usage (vendor/distribution) and others may use the open source project in a different way.

But again, it's more like a philosophy, hard to use technical arguments here.

If you really like to remove the documentation of this configuration (It means to remove the 3rd sentence from here: https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/docs/content/interface/S3.md, I guess), let's do that.

(I am +0 about changing only the docs: I don't like it, but I can accept it if it's your strong opinion)

In general, I prefer to explain the risk and teach the user, instead of hiding information.

@xiaoyuyao
Copy link
Contributor

xiaoyuyao commented Jul 31, 2020

+1 for this change. Keep exposing this can be problematic after OM create default s3v volume based on the same configuration upon start. If this got changed and then all the buckets in the old s3v volume will not be available until manually linked. We may revisit this when the volume level link is added.

@timmylicheng
Copy link
Contributor

How do we handle legacy volumes and buckets?
We used to have volumes mapping to s3 bucket names? So after this change, if a cluster want to support both new buckets and legacy buckets, how does it work?

@bharatviswa504
Copy link
Contributor Author

As mentioned in the PR description user can use bucket links.

For more information refer HDDS-3612.

This feature is totally not removed, making this hidden, as this config is added, at that time we don't have support for bucket link.

@timmylicheng
Copy link
Contributor

As mentioned in the PR description user can use bucket links.

For more information refer HDDS-3612.

This feature is totally not removed, making this hidden, as this config is added, at that time we don't have support for bucket link.

Get it. So not really deprecating it as the title says? @bharatviswa504

@bharatviswa504
Copy link
Contributor Author

As mentioned in the PR description user can use bucket links.
For more information refer HDDS-3612.
This feature is totally not removed, making this hidden, as this config is added, at that time we don't have support for bucket link.

Get it. So not really deprecating it as the title says? @bharatviswa504

yes, we want new users to not use this configuration. As the bucket link feature is available now. As this can use some additional complications, as mentioned in the PR description again :)

@elek
Copy link
Member

elek commented Aug 1, 2020

As the bucket link feature is available now.

As I wrote in my previous comment, bucket link is not a replacement of this configuration. This configuration can expose an other volume and all buckets.

With bucket links you should updated your links in case of the creation of any new bucket.

Get it. So not really deprecating it as the title says? @bharatviswa504

It is deprecating but not removing:

    if (!volumeName.equals(OZONE_S3_VOLUME_NAME_DEFAULT)) {
      LOG.warn(OZONE_S3_VOLUME_NAME + " is deprecated. Consider using bucket " +
          "link command to access volume/buckets created via ozone shell " +
          "through S3Gateway and also to access buckets created by S3Gateway " +
          "version less than or equal to 0.5.0");
    }

Which means that a warning will be printed out without alternatives (linking new bucket every day is not the same). As this feature is used by our early adopters (who started to use s3g with the old scheme) I think it's important to keep this functionality as is, without warning.

As I wrote, I can accept the consensus proposal from @arp7: Let's remove it from the s3g doc page but keep the same code. (Later we can create a separated page about powerful but risky configuration, and it can be added back to there...)

@bharatviswa504
Copy link
Contributor Author

bharatviswa504 commented Aug 1, 2020

As I wrote in my previous comment, bucket link is not a replacement of this configuration. This configuration can expose > >an other volume and all buckets.

Let's say when s3G started all buckets are created under volume s3v.

Now different S3G's started using different volumes, now when the user wants to use the bucket in o3fs, now he is not sure which volume to use. As said in the PR description this can cause this kind of issue. I understood the use case of you, have mentioned, and also we are not totally eliminating this, if someone is advanced user and knows what he is doing, they can still use this, but they will see the warning. (But it is not documented, I think this can be addressed with a new page with your proposal)

And also one more point, using a single volume for S3G buckets came to eliminate a command which was there before ozone s3 path, thinking that is an extra burden and now the user needs to know the volume name from this command. (And also volume names are cryptic) now with this feature, we will be back to the same issue.

So, to avoid this, might be admin need to give one S3G URL and the volume used underneath it to users. (And admin can go ahead and create the volumes and provide appropriate permissions). And also admin needs to use separate config for each S3G.

Okay coming back to the comment, to get this in are you proposing we need to remove the warning which we have added in the code?

      LOG.warn(OZONE_S3_VOLUME_NAME + " is deprecated. Consider using bucket " +
          "link command to access volume/buckets created via ozone shell " +
          "through S3Gateway and also to access buckets created by S3Gateway " +
          "version less than or equal to 0.5.0");

@elek
Copy link
Member

elek commented Aug 2, 2020

Let's say when s3G started all buckets are created under volume s3v.

Now different S3G's started using different volumes, now when the user wants to use the bucket in o3fs, now he is not sure which volume to use.

I might have a different view this, because I think here are multiple, different roles, which are mixed here.

  • The administrators can modify the volume mounting.
  • Users will use s3g

I think it's acceptable, if the administrator modifies some fundamental part of the environment setup, the user should be notified. It's very similar to a DNS name refactor. There are cases where the knowledge of the users should be updated.

I think it's acceptable, and this is the responsibility of the administrator to judge if it's ok or not in a specific environment / company culture.

There are also cases when the administrator would like to start with /home instead of s3v volume names from the day 1. The volume name won't be changed anymore, but in this setup, a different

And also one more point, using a single volume for S3G buckets came to eliminate a command which was there before ozone s3 path, ... now with this feature, we will be back to the same issue.

Similar, but not the same. There are significant differences:

  1. volume names are not cryptic anymore
  2. all buckets use the same volume name (don't need to check for every bucket, enough to remember to the configuration)

Okay coming back to the comment, to get this in are you proposing we need to remove the warning which we have added in the code?

Actually it was suggested by @arp7, but yes. I am fine with that. I think we shold keep it in ozone-default.xml, too. We can assume that if somebody checks the ozone-default.xml, they are power users or developers.

@elek
Copy link
Member

elek commented Aug 2, 2020

Thinking a little more: if I understoof well, you are worried about admins who will start multiple s3g servers with different configs. On the other hand I would prefer to suport customizable volume name (but practically the same for all s3g instances).

The problem is that we couldn't check if different services are started with different settings or not.

We had earlier a plan to do some kind of configuration download during the service startup to simplify the configuration of the services. With such approach, you can be sure that all the services use the same configs (but power users can do any evil actions, anyway...)

@bharatviswa504
Copy link
Contributor Author

bharatviswa504 commented Aug 4, 2020

Removed all the code changes, in the latest commit updated only documentation.
Thank You @elek for the review and @arp7 for offline discussion and consensus with this approach.

@arp7
Copy link
Contributor

arp7 commented Aug 4, 2020

Leaving this key easily discoverable is a bad idea and will result in users unintentionally shooting themselves in the foot. However I don't have the energy to argue this ad-infinitum so in the interests of making forward progress, let's just go ahead.

@ChenSammi ChenSammi merged commit d7ea496 into apache:master Aug 6, 2020
llemec pushed a commit to llemec/hadoop-ozone that referenced this pull request Aug 7, 2020
ChenSammi pushed a commit that referenced this pull request Aug 12, 2020
rakeshadr pushed a commit to rakeshadr/hadoop-ozone that referenced this pull request Sep 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants