HBASE-25902 HMaster failed to start with NoSuchColumnFamilyException during upgrade from HBase 1.x to HBase 2.x #3287
…during upgrade from HBase 1.x to HBase 2.x
🎊 +1 overall
This message was automatically generated.
 * List of column families that cannot be deleted from the hbase:meta table.
 * They are critical to cluster operation. This is a bit of an odd place to
 * keep this list but then this is the tooling that does add/remove. Keeping
 * it local!
This comment is wrong, right? We don't have any tooling in HConstants.
This should go where table descriptors are defined and managed.
Ok, will move it back to ModifyTableProcedure
// Default meta table descriptor, will be used by RegionServer during rolling upgrade until
// HMaster write latest 2.x meta table descriptor
private TableDescriptor defaultMetaTableDesc = null;
What happens when using this "default descriptor" the regionserver attempts to update meta? Can that happen? We are assuming the master will rewrite soon, but is that valid? Regionservers don't run masters, they rely on an operator to do that. Who knows what the operator is doing.
Would it be better to address specific fallbacks where something is missing or not expected?
RegionServer can't update the meta descriptor; see org.apache.hadoop.hbase.regionserver.HRegionServer#canUpdateTableDescriptor.
TableDescriptor td = getTableDescriptorFromFs(fs, rootdir, TableName.META_TABLE_NAME);
validateMetaTableDescriptor(td);
return td;
} catch (TableInfoMissingException | NoSuchColumnFamilyException e) {
This makes sense. The table might be missing altogether (what the current code handles), or it might be missing a column family due to legacy (what this change handles).
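The two cases named above can be illustrated with a minimal, self-contained sketch. It does not use any HBase classes: the nested exception types only mirror the names in the diff, and `REQUIRED`, `validate`, `load`, and the family names are hypothetical stand-ins chosen for illustration, with a plain `Set<String>` of family names standing in for a table descriptor.

```java
import java.util.Set;

/** Illustrative only: hypothetical stand-ins, not the HBase implementation. */
class MetaDescriptorFallbackSketch {
  // Local stand-ins that mirror the exception names used in the diff.
  static class TableInfoMissingException extends Exception {}
  static class NoSuchColumnFamilyException extends Exception {}

  // Families the running code is assumed to expect (names are illustrative).
  static final Set<String> REQUIRED = Set.of("info", "table");

  /** Throws if any required family is absent, mimicking the validation step. */
  static void validate(Set<String> families) throws NoSuchColumnFamilyException {
    if (!families.containsAll(REQUIRED)) {
      throw new NoSuchColumnFamilyException();
    }
  }

  /**
   * Returns the stored descriptor when it exists and validates,
   * otherwise the supplied default. A null `stored` models a
   * missing table info file.
   */
  static Set<String> load(Set<String> stored, Set<String> defaultDesc) {
    try {
      if (stored == null) {
        throw new TableInfoMissingException();
      }
      validate(stored);
      return stored;
    } catch (TableInfoMissingException | NoSuchColumnFamilyException e) {
      // Missing file or legacy layout: fall back to the default.
      return defaultDesc;
    }
  }
}
```

Both failure modes take the same path here, which is the behavior the multi-catch in the diff appears to aim for; whether the default is the right thing to return is what the rest of the thread debates.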
} catch (TableInfoMissingException | NoSuchColumnFamilyException e) {
  // Meta is still in old format, return the default meta table descriptor util we have meta
  // descriptor in HBase 2.x format
  return defaultMetaTableDesc;
This seems wrong.
createMetaTableDescriptorBuilder should return a builder for what the current version of the code expects.
If we need fallbacks to ride over an upgrade case, those fallbacks should be implemented where the errors are happening.
> This seems wrong.
> createMetaTableDescriptorBuilder should return a builder for what the current version of the code expects.
> If we need fallbacks to ride over an upgrade case, those fallbacks should be implemented where the errors are happening.
In the current version of the code, createMetaTableDescriptorBuilder is used only during HMaster startup, via InitMetaProcedure, when the table info file is missing.
Agreed, it is a hack (defaultMetaTableDesc = null): RegionServer will get the default meta descriptor (from createMetaTableDescriptorBuilder) until HMaster rewrites the proper one.
I think the goal of HBASE-23055 is to give us a way to avoid always having to upgrade the region servers first when there is a meta schema change, so I think we should try to do this in a less hacky way.
I think it would be better to catch
Closing as resolved by #3417. Thanks, @pankaj72981.
HMaster should validate the meta table descriptor during startup and rewrite it if there is any mismatch; meanwhile, RegionServer should read the default meta table descriptor to avoid inconsistencies.
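As a rough illustration of that split of responsibilities, the sketch below models the stored descriptor with an in-memory map. Everything in it is an assumption made for the example: the class, the `store` map, the method names, and the family names are hypothetical and are not HBase APIs, and the mismatch check is simplified to "a required family is missing".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative only: hypothetical model of the master/regionserver split. */
class MetaDescriptorStartupSketch {
  static final String META = "hbase:meta";

  // Families the running code is assumed to expect (names are illustrative).
  static final Set<String> EXPECTED = Set.of("info", "table");

  // In-memory stand-in for the table info file on the filesystem.
  final Map<String, Set<String>> store = new HashMap<>();

  private boolean storedIsValid() {
    Set<String> stored = store.get(META);
    return stored != null && stored.containsAll(EXPECTED);
  }

  /** Master side: rewrite on a missing or outdated descriptor. Returns true if it rewrote. */
  boolean masterValidateAndRewrite() {
    if (storedIsValid()) {
      return false;
    }
    store.put(META, EXPECTED);
    return true;
  }

  /** RegionServer side: read only, falling back to the default without writing. */
  Set<String> regionServerRead() {
    return storedIsValid() ? store.get(META) : EXPECTED;
  }
}
```

In this model the regionserver never mutates the store, so an old-format descriptor stays on disk until the master starts and rewrites it, which is the window the reviewers question above.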