-
Notifications
You must be signed in to change notification settings - Fork 3k
API: Add SupportsRecoveryOperations mixin for FileIO #10711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, | ||
| * software distributed under the License is distributed on an | ||
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| * KIND, either express or implied. See the License for the | ||
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ | ||
| package org.apache.iceberg.io; | ||
|
|
||
| /** | ||
| * This interface is intended as an extension for FileIO implementations to provide additional | ||
| * best-effort recovery operations that can be useful for repairing corrupted tables where there are | ||
| * reachable files missing from disk. (e.g. a live manifest points to data file entry which no | ||
| * longer exists on disk) | ||
| */ | ||
| public interface SupportsRecoveryOperations { | ||
amogh-jahagirdar marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| /** | ||
| * Perform a best-effort recovery of a file at a given path | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How recovery is performed is left up to implementations but I think the one thing this interface is prescriptive on is that the expectation that it is best effort. It's not guaranteed that just because FileIO implementations implement the mixin that they will recover the file no matter what (e.g. there's a limit to versioning etc); as a result the interface doesn't surface an exception, rather just a boolean indicating success or not. I guess we may want to qualify what it means for a file to be recovered (it may be obvious but since it's an interface, it's better to be clear): It means that assuming no concurrent deletes of a file at that path, If that definition makes sense to folks: I can update @RussellSpitzer @danielcweeks
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Other thought I had: We could offer another method which uplevels the manifestFile and attempts to recover the manifest and if that's successful, attempt recovery of the data files. I discarded this case since someone could just traverse the metadata tree on their own and just use
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would want to add to the notes that implementations should do their best not to throw any runtime exceptions and that this method is essentially assumed to be a no-op on false or something like that. I think we also need to describe how this method will be used. For example, "When reading a file if any exception is thrown this method will be called in an attempt to remedy the exception." Although if we use that description should we also passthrough a Throwable? A bigger question I have is whether or not this is needed as part of the general API or if it should be part of a Repair action that we talked about before.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's an interesting point @RussellSpitzer, I think initially we were planning to use this in something like the proposed repair manifests procedure. I would be a little concerned about putting it in the normal read path because it could hide more serious issues if we're trying to repair table state on the fly.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree on the comment that we should note that impls should do their best to avoid throwing runtime exceptions.
I actually don't think this should be recommended for the read path like @danielcweeks said performing recovery's on the fly probably will mask that the table was corrupted in the first place; I feel a user should always determine that their table is corrupted and then explicitly opt-in to recover the file by running the repair procedure. This procedure would indicate "Hey file A was missing but I recovered it". The user can then reason about better how their dataset may change after the recovery.
On the repair action, I'm not sure functions could technically reside specifically because the challenge is the actual implementations will look different for different FileIOs which is what is driving defining an interface. To me even if the interface itself is used primarily in the context of repair actions, there's enough merit to generalizing the interface in the public API because different FileIO's will have different ways of recovering a given file so we probably want to standardize the method for that. |
||
| * | ||
| * @param path Absolute path of file to attempt recovery for | ||
| * @return true if recovery was successful, false otherwise | ||
| */ | ||
| boolean recoverFile(String path); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you have any examples for scenarios specific to
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any underlying storage that supports some sort of versioning (e.g. S3) or trash concept (HDFS) can provide the ability to restore a file. The interface is necessary because these implementations are somewhat specific to the storage implementation.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. https://github.com/apache/iceberg/pull/10721/files#diff-322d2e2be6cbad87106f05c0456cf953d99f396c71c99a1d8c7bc281022f8b6bR432 for an example with s3
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any thoughts on having the signature be
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a specific use case you had in mind here? In practice, I've always found that either you have the versioned object to recover or you don't. There may be some cases where there's a permission issue or a reason that the operation can't be invoked, but that tends to be rare.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nothing in mind, just feels like it would be better to add it now than later. I was thinking the easiest thing would be being able to fail fast if the error is something the IO knows it can't recover from. If no one uses it then it shouldn't be a huge expense either.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think the intent is to use this in a normal read path, more like a procedure (the rational I cover in an earlier comment). Since this is a "best effort" operation, I feel it would be adding unnecessary complexity that we can address later if there's a need (e.g. we can always extend with default implementations in the future).
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm just thinking in generalities since I am not familiar with the planned implementations so I'm ok with that.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, I think this interface generalizes in the sense that it provides a common method for recovering a file that different FileIOs can implement. Now does it generalize beyond the Repair action itself? I can see the interface being used as a primitive if someone doesn't want to do the full repair action. For instance if a user already knows a single corrupted manifest with entries of data files some of which may/may not be on disk, then they can just write their own logic to recover it via this interface. |
||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example you mention (manifest points to non existing data files) isn't specific to a
FileIOimplementation, but eachFileIOimplementation would have to implement it. Such a scenario can be handled in places that know how to deal with manifests, manifest-lists, data-files.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue isn't specific to the
FileIO, but the remediation is specific to theFileIOand each requires a specific implementation.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snazy basically what Dan said, for example S3FileIO would have it's own implementation for remediation by going through s3 object versions, same principle for GCS/Azure/HDFS. The example provided is providing an example from the perspective of a user of the interfcace, like a repair function.
For example, with this interface:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, can you clarify the javadoc saying that the function's purpose is to do "its best" to recover a single file/object. It can be misinterpreted in a way that one can pass in a manifest-file to repair/recover data files referenced from that manifest-file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure that sounds reasonable to me!