Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Update index.knn setting to be immutable on restore snapshot #17019

Open
anntians opened this issue Jan 14, 2025 · 7 comments
Open

[BUG] Update index.knn setting to be immutable on restore snapshot #17019

anntians opened this issue Jan 14, 2025 · 7 comments
Assignees
Labels
bug Something isn't working Storage:Snapshots

Comments

@anntians
Copy link

Describe the bug

This issue builds on top of this KNN issue: opensearch-project/k-NN#2334. Today, user can update index.knn flag even after an index is created which should not be allowed. For instance, users can close an index, update index.knn, and reopen index. This bug has be resolved by this PR: opensearch-project/k-NN#2348. However, another loophole is users can take a snapshot of an index, delete index, and update index.knn on restore from snapshot.

The focus of this issue is for the bug case when restoring from snapshot. There is an open PR: #16957 attempting to fix the bug, but there are a few implementation considerations that need to be discussed. The current implementation in the PR is not best practice as it introduces plugin-specific settings in core Opensearch. So below are two suggestions on implementation:

  1. Expand functionality of RestoreInProgress class to include settings that are updated from the restore request

Currently in execute of restoreService we store a custom object RestoreInProgress in the clusterState here. We could update RestoreInProgress to include a diff of index settings changed, so that once the clusterState is passed downstream to the IndexEventListener callback methods, we can perform validation checks from within the KNN plugin.

Pro: Perform validation within KNN plugin. Straightforward to expand on an existing custom class that is part of clusterState. Faster implementation/merge timeline.

Con: Doesn't offer great extensibility if in the future we want to add more use cases to access data/updates during restore operations.

  1. Add a new callback method beforeIndexRestore() to IndexEventListener and trigger it within restoreService.

Currently, the reason we cannot use the existing callback methods in IndexEventListener is because they do not have access to the prev clusterState to know which settings were updated. Further the callback methods can be triggered from other non-restore API calls. However, if we create a new callback method that will strictly be called in the restoreService, pass it clusterState and diff of changed settings, then we can perform validation from within the KNN plugin.

Pro: Creates better extensibility for other use cases to build off of this new callback method in the future. There have been several requests for additional access for restore related updates

Con: Will have to go through several approvals and discussions to make this change, as it updates the core repository and affects future customers/contributors use cases. Will be a longer completion date compared to option 1.

I would like to start a discussion on which option to execute on and additional context that may help resolve this bug.

Related component

Storage:Snapshots

To Reproduce

  1. Register Snapshot directory
  2. Create Index
  3. Take Snapshot of Index
  4. Delete Index
  5. Restore Snapshot of index with index.knn setting updated. Will restore index with the updated setting

Expected behavior

index.knn should be immutable on restore. An error should be thrown if the setting is updated on restore api call

Additional Details

Additional context
Original issue: opensearch-project/k-NN#2334
PR for original issue: opensearch-project/k-NN#2348
Open PR for this issue: #16957

@kotwanikunal
Copy link
Member

Thanks @anntians for the detailed description. I think we would benefit with some hooks into the restore and snapshot mechanism.
@andrross / @dblock / @reta - Thoughts here?

@jed326
Copy link
Collaborator

jed326 commented Jan 14, 2025

Thanks @anntians. I have only looked into this briefly, but at a high level I'm unsure any listener hook sort of pattern is the right approach. It looks like today what happens is if we try to update one of these settings then we get a SnapshotRestoreException:

throw new SnapshotRestoreException(
snapshot,
"cannot remove setting [" + ignoredSetting + "] on restore"
);

However, if you were to use a callback listener, then

  1. Most callbacks are not expected to throw exceptions and if they do the exceptions are not supposed to interrupt the execution flow, for example:

@Override
public void beforeIndexCreated(Index index, Settings indexSettings) {
for (IndexEventListener listener : listeners) {
try {
listener.beforeIndexCreated(index, indexSettings);
} catch (Exception e) {
logger.warn("failed to invoke before index created callback", e);
throw e;
}
}
}

  1. If you do throw an exception in a callback listener to interrupt execution then you may end up failing the shard instead of failing the snapshot restore

Taking a step back, it seems like this is really a feature request for making the notion of USER_UNMODIFIABLE_SETTINGS pluggable. Would that be possible through a new Setting property instead?

/**
* Property of the setting
*
* @opensearch.api
*/
@PublicApi(since = "1.0.0")
public enum Property {

@jed326
Copy link
Collaborator

jed326 commented Jan 14, 2025

Going one step further I would actually argue the javadoc for the Final property implies that indices being restored via snapshot should not be able to modify these properties so I could see this more as fixing RestoreService to correctly honor that.

/**
* mark this setting as final, not updateable even when the context is not dynamic
* ie. Setting this property on an index scoped setting will fail update when the index is closed
*/
Final,

@kotwanikunal
Copy link
Member

However, if you were to use a callback listener, then

Most callbacks are not expected to throw exceptions and if they do the exceptions are not supposed to interrupt the execution flow, for example:

@jed326 - That's a fair callout. I was initially thinking of failing the shard, but we still need a way to clean up / prevent the restore completely.

Going one step further I would actually argue the javadoc for the Final property implies that indices being restored via snapshot should not be able to modify these properties so I could see this more as fixing RestoreService to correctly honor that.

This actually makes a lot more sense. It will be easier to fail the request at validation if a final property is being overridden by the user. @anntians Can you give this a try?

@neetikasinghal
Copy link
Contributor

neetikasinghal commented Jan 14, 2025

Going one step further I would actually argue the javadoc for the Final property implies that indices being restored via snapshot should not be able to modify these properties so I could see this more as fixing RestoreService to correctly honor that.

OpenSearch/server/src/main/java/org/opensearch/common/settings/Setting.java

Lines 128 to 132 in 8347d0e

     /** 
      * mark this setting as final, not updateable even when the context is not dynamic 
      * ie. Setting this property on an index scoped setting will fail update when the index is closed 
      */ 
     Final,

+1, I agree with this.

@sohami
Copy link
Collaborator

sohami commented Jan 15, 2025

@jed326 Good callouts.

Going one step further I would actually argue the javadoc for the Final property implies that indices being restored via snapshot should not be able to modify these properties so I could see this more as fixing RestoreService to correctly honor that.

I don't think the javadoc makes any claim on the behavior of RestoreService. From RestoreService perspective there can be 2 cases a) Creating a new index from a snapshot b) Restoring to an existing closed index. Providing an override for a final setting in restore request for case a) should be fine vs case b), since in former we are not really changing setting for an existing context, it is a new context altogether.

Also the change for behavior of final setting in 2.x will not be bwc. If existing final setting was allowed to be updated in restore flow, it could be in use by any plugin (though seems unlikely) and will result in failure with new change.

Instead I was thinking if we introduce another property like UnmodifiableOnRestore similar to NotCopyableOnResize which can explicitly be added to the index settings and during restore doesn't allow override of that setting in both the cases above. Usually such settings will be related to data or partition (like number of shards) which cannot be changed by restoring. Will let others chime in as well.

@anntians
Copy link
Author

Thank you for commenting, I had a few questions following up:

  1. @sohami 's comment mentioned instead of enforcing the final setting behavior on restore, we should add a new property UnmodifiableOnRestore which prevents a setting from updates in both cases of restore: create new index from snapshot, and restore existing closed index. However, the current implementation of restore already enforces that behavior for some final settings such as SETTING_NUMBER_OF_SHARDS via Sets like USER_UNMODIFIABLE_SETTINGS, meaning for both cases if I try to change SETTING_NUMBER_OF_SHARDS when create new index from snapshot and restore an existing closed index, it fails. Meanwhile, for other final settings such as SETTING_REPLICATION_TYPE , immutability is only enforced on restore for certain conditions, meaning if I don't trigger the condition, I can update that setting for both cases of restore. It seems that there isn't a clear/consistent way of enforcing immutability of final settings in restore right now. So is that the main reason for creating a new property instead of building off of final setting? Since if we enforce all cases for final setting, it would break existing restore.

  2. If we go ahead with adding a new property UnmodifiableOnRestore, which of the existing final settings in core opensearch should we add UnmodifiableOnRestore to? Currently the way a few final settings are enforced in restoreService is through these Sets USER_UNMODIFIABLE_SETTINGS, which are checked during the runtime of restore. I'm not sure why the existing implementation doesn't directly check whether settings have Final property with indexScopedSettings.isFinalSetting(setting), but if we were to add UnmodifiableOnRestore to existing core settings and then also add a check with something like indexScopedSettings.isUnmodifiableOnRestoreSetting(setting) then that might create duplicate checks on restore and might cause confusion in the future. If we do not add the new property UnmodifiableOnRestore to existing core settings, how would we add tests for the new property, since core doesn't recognize plugin-specific settings in the tests.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Storage:Snapshots
Projects
Status: 🆕 New
Development

No branches or pull requests

5 participants