-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Core: fix bug that HadoopFileIO cannot be dynamically loaded #2333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } catch (NoSuchMethodException e) { | ||
| throw new IllegalArgumentException(String.format( | ||
| "Cannot initialize FileIO, missing no-arg constructor: %s", impl), e); | ||
| LOG.error("No-arg constructor not found for {}, try to use Hadoop Configuration constructor", impl, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try -> will attempt to use
Just because the current wording makes it sound like the user needs to try to do something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also shouldn't this be "info"? It's not necessarily an error right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated
| LOG.error("No-arg constructor not found for {}, try to use Hadoop Configuration constructor", impl, e); | ||
| try { | ||
| ctor = DynConstructors.builder(Catalog.class).impl(impl, Configuration.class).buildChecked(); | ||
| useNoArgConstructor = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just invoke the constructor where we load it and skip the state variable passing through here?
Something like
try {
ctor = ..
fileIO = ctor.newInstance
} catch {
ctor = ... single are ctor
fileIo = ctor.newInstance(hadoopConf)
}To avoid try nesting we could extract that to a helper function and then have something like
try {
fileIO = loadFileIO(impl)
} catch {
Cannot find a 0-arg or 1-arg constructor to initalize FileImplemenation
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One additional question, do we always want the 0 arg constructor to get priority over a 1-arg? Or should we throw an error if you provide both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we always want the 0 arg constructor to get priority over a 1-arg
I think it is better to let 0 arg constructor to get priority, so it does not block any FileIO that might have both constructors.
To avoid try nesting we could extract that to a helper function
Yeah I tried about that but there was duplicated logic for finding ClassCastException. Using a helper function is probably a way to solve it, let me update with that, thank you!
|
@RussellSpitzer updated based on all the comments, please take a look when you have time, thanks! |
| * a Hadoop config will be passed using {@link Configurable#setConf(Configuration)}. | ||
| * {@link FileIO#initialize(Map properties)} is called to complete the initialization. | ||
| * | ||
| * For compatibility with HadoopFileIO, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we starts to support 1-arg constructor as a valid case in the code and prints error messages about it, should we mention this as a fully supported case instead of a backward compatibility support? Especially that the reason for you to choose this option is performance concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I also noticed this is a bit annoying to see a stack trace printed when trying to load it, let me just not print the error.
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
| Configuration configuration = new Configuration(); | ||
| configuration.set("key", "val"); | ||
| FileIO fileIO = CatalogUtil.loadFileIO( | ||
| TestFileIOHadoopConstructor.class.getName(), Maps.newHashMap(), configuration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just load the HadoopFileIO directly ? Here we are trying to load an TestFileIOHadoopConstructor, I'm concerning that we could not detect the breaking case if someone change the HadoopFileIO constructor....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that would be possible, since HadoopFileIO is public and the constructor is also public, we cannot simply change it. But this is a good point, I can just load it instead of having this dummy test class.
| * If the class implements {@link Configurable}, | ||
| * a Hadoop config will be passed using {@link Configurable#setConf(Configuration)}. | ||
| * {@link FileIO#initialize(Map properties)} is called to complete the initialization. | ||
| * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Paragraphs should be separated by <p>.
| } | ||
| } catch (NoSuchMethodException e) { | ||
| LOG.info("0-arg constructor not found for {}, will attempt to use Hadoop Configuration constructor", impl); | ||
| ctor = DynConstructors.builder(Catalog.class).impl(impl, Configuration.class).buildChecked(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be collapsed into a single DynConstructors call that lists both implementations. The first one that is found will be selected. And the newInstance method should discard arguments that are not needed so you can use the same call for both.
|
@jackye1995, can you elaborate on your performance concerns if the configuration isn't final? I'm not sure that would be a problem. |
|
Sorry, I missed this PR. I can hold 0.11.1 RC for a little longer. |
The issue with that approach is that I still need a boolean to check if I need to call the
I would actually prefer to use this approach of adding no-arg consturctor for |
Thank you Anton! |
|
@jackye1995, let's go with the no-arg constructor and remove |
|
@rdblue I was just going to reply, I did some benchmarking, as expected there is a difference but it was at microsecond level, so I think it should be fine to just remove the final. I think I was overly conservative on this, thanks for the suggestion! |
|
Thanks everyone! I am going to include this change in 0.11.1. |
@aokolnychyi this should probably go to 0.11.1
HadoopFileIO currently does not have a no-arg constructor and could not be loaded by the dynamic FileIO loader. But our documentation claims it can be loaded, so we need to fix it asap. There are 2 ways to fix it,
setConfmethod.To do approach 1, we need to change HadoopFileIO class variable to be not final, and I am a bit hesitated to do that for performance concerns, so I went with approach 2 for now.