Skip to content

Conversation

@xichen01
Copy link
Contributor

What changes were proposed in this pull request?

This PR introduces "feature bitmap" support in OzoneManagerVersion, enabling the Client check OM feature use the "feature bitmap" instead of the version which was the OM latest version.

Issue

If only the latest version is used for feature checks, there is a risk of misjudging compatibility when versions are not sequential. For example, if certain intermediate versions are skipped (e.g., [1, 2, 3, 5]), relying on simple version comparisons might incorrectly assume support for all earlier versions.

We need backport this #7161 patch to ozone-1.4 branch, #7161 introduce the OzoneManagerVersion#LIGHTWEIGHT_LIST_STATUS(8), if we backport the #7161, then the ozone-1.4 OzoneManagerVersion will be [1,2,3,4,8], no the feature [5, 6, 7]. Current logic which base on the latest OzoneManagerVersion#version to judge whether OM support a specific feature, will cause the client think the OM support the features [5, 6, 7], but ozone-1.4 not support.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11596

How was this patch tested?

Existing tests. Newly added unit test.

@xichen01
Copy link
Contributor Author

@adoroszlai @tanvipenumudy @fapifta PTAL thanks.

@adoroszlai
Copy link
Contributor

adoroszlai commented Oct 18, 2024

I think #7161 shouldn't have added a new version. It should have used the first version that was added after #5275 (i.e. OBJECT_TAG(5, ...)). An "alias" could be introduced for clarity:

  public static final OzoneManagerVersion LIGHTWEIGHT_LIST_STATUS = OBJECT_TAG; // alias

Since OBJECT_TAG(5) is not available in 1.4.1, we'd need to backport the check using numbers, or could check against FUTURE_VERSION.

@adoroszlai
Copy link
Contributor

Version numbering is sequential. Limiting to 63 versions, especially encoding this limitation in the proto message, sounds wrong.

If we want to support subsets of features, drop numeric scheme and use feature flags.

In any case, I don't think this is something we can do in a patch release, even if the goal is to make it possible to backport a fix.

Please see my previous comment on how to resolve this both for 1.4.1 and 2.0.0.

@adoroszlai
Copy link
Contributor

Proposed fixes:

I'll open PRs if runs are successful.

@adoroszlai
Copy link
Contributor

Since OBJECT_TAG(5) is not available in 1.4.1, we'd need to backport the check using numbers, or could check against FUTURE_VERSION.

This would fix the compatibility issue, but would disable listStatusLight for client 1.4.1+ and server 1.4.0+ combination.

@xichen01
Copy link
Contributor Author

Since OBJECT_TAG(5) is not available in 1.4.1, we'd need to backport the check using numbers, or could check against FUTURE_VERSION.

This would fix the compatibility issue, but would disable listStatusLight for client 1.4.1+ and server 1.4.0+ combination.

Do you mean that we should set the version of LIGHTWEIGHT_LIST_STATUS to 5 (to be the same as OBJECT_TAG(5)) in the master and ozone-1.4 branches?

but would disable listStatusLight for client 1.4.1+ and server 1.4.0+ combination.

For this issue, user can upgrade the server side to ozone-1.4.1 to fix the issue.

@xichen01
Copy link
Contributor Author

@adoroszlai However, for the current version checking logic to work accurately, it is a prerequisite that the versions must be consecutive, but this is not always guaranteed.

  • If we need to backport a new feature to a specific branch, but some of the previous features have not been backported, then the version numbers may not be consecutive, such as we need backport HBASE_SUPPORT(7) to ozone-1.4.

  • Or in the private Ozone branch maintained by many companies, in order to prevent the proto field id from conflicting with the community branch, they often set a special id, often the proto field id will start from 1000 or 10000, and then the id is not consecutive. This can also happen with service version

I think this PR is a solution to the above problem, for the ID is limited to a maximum of 63, we should be able to achieve this by transmitting multiple bitmaps

@adoroszlai
Copy link
Contributor

Since OBJECT_TAG(5) is not available in 1.4.1, we'd need to backport the check using numbers, or could check against FUTURE_VERSION.

This would fix the compatibility issue, but would disable listStatusLight for client 1.4.1+ and server 1.4.0+ combination.

Do you mean that we should set the version of LIGHTWEIGHT_LIST_STATUS to 5 (to be the same as OBJECT_TAG(5)) in the master and ozone-1.4 branches?

Yes, I thought we could make the feature version backportable that way, but it has the problem I mentioned:

but would disable listStatusLight for client 1.4.1+ and server 1.4.0+ combination.

For this issue, user can upgrade the server side to ozone-1.4.1 to fix the issue.

OM 1.4.x cannot report it is at OzoneManagerVersion = 5: that would fool newer clients to believe OBJECT_TAG(5) is supported. So it must keep reporting version = 4. If client only looks at OM version, it cannot distinguish between 1.4.0 and 1.4.1+ OM.

So we have two choices:

  1. Accept that client uses listStatus instead of listStatusLight for certain versions, despite both client and server having implementation for the latter.
  2. Introduce a new way to indicate supported features.

If we need to backport a new feature to a specific branch

We shouldn't backport features, only bugfixes. LIGHTWEIGHT_LIST_STATUS is a special case: feature is already in 1.4.0, only the version increment was missed.

@xichen01
Copy link
Contributor Author

xichen01 commented Oct 21, 2024

OM 1.4.x cannot report it is at OzoneManagerVersion = 5: that would fool newer clients to believe OBJECT_TAG(5) is supported. So it must keep reporting version = 4. If client only looks at OM version, it cannot distinguish between 1.4.0 and 1.4.1+ OM.

Therefore, whether we set LIGHTWEIGHT_LIST_STATUS to version 4 or version 5, there will be compatibility issues.

@sumitagrawl
Copy link
Contributor

OM 1.4.x cannot report it is at OzoneManagerVersion = 5: that would fool newer clients to believe OBJECT_TAG(5) is supported. So it must keep reporting version = 4. If client only looks at OM version, it cannot distinguish between 1.4.0 and 1.4.1+ OM.

Therefore, whether we set LIGHTWEIGHT_LIST_STATUS to version 4 or version 5, there will be compatibility issues.

If Server and client is already upgraded to 1.4.0 and LIGHTWEIGHT_LIST_STATUS feature is already present, then there is no useful of adding that check explicitly now.

For specific compatibility fix, a client of 1.4.1 can take a decision if there is a failure, fall back to older way as that client specific code for compatibility.

Bitwise map of feature --

  • it will be complex to manage master feature, sub-branch release if its implemented with conflicting index, chance of misuse is there.
  • This is rare case of conflict, and if any such case happens, a specific code is better to manage for that client version.

IMO, keeping versioning simple will help reduce complexity of getting bigger issue later on.

@errose28
Copy link
Contributor

+1 to Sumit's comment. We discussed this at the US community sync yesterday. This issue shows that we already sometimes make mistakes with our current cross compatibility model, so the solution should not be to make it more complicated. Sequential versioning is much easier to manage since changes can depend on each other, and greatly reduces the test matrix.

The simplest fix for this problem looks to be to patch the 1.4.1 client to retry using the old list keys API if list keys light fails.

@adoroszlai
Copy link
Contributor

Closing based on discussion.

@adoroszlai adoroszlai closed this Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants