[bug] Changing type(s) in a log schema will break historical search against data using old schema #1208

Open
chunyong-lin opened this issue Mar 24, 2020 · 0 comments

chunyong-lin commented Mar 24, 2020

Background

If historical search is enabled and file_format is set to parquet, changing the type(s) in a log schema breaks searches over historical data. Queries that use the changed schema across all partitions in the table fail with a HIVE_PARTITION_SCHEMA_MISMATCH error.

For example, if we change the type of the following timestamp field from float to string, the partitions of the carbonblack_alert_watchlist_hit_feedsearch_bin table will break.

"timestamp": "float",

If we never changed a schema, life would be happy! Unfortunately, that is not the reality 😢
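To make the failure mode concrete, here is a minimal sketch of how a type change between two versions of a log schema could be detected before it is deployed. It assumes schemas are flat dicts of field name to type name, as in the `"timestamp": "float"` example above; `changed_types` is a hypothetical helper, not something that exists in the project.

```python
def changed_types(old_schema, new_schema):
    """Return {field: (old_type, new_type)} for fields whose type differs.

    Only fields present in both schemas are compared; added or removed
    fields are ignored in this sketch.
    """
    return {
        key: (old_schema[key], new_schema[key])
        for key in old_schema.keys() & new_schema.keys()
        if old_schema[key] != new_schema[key]
    }


# Hypothetical before/after versions of a schema.
old = {"timestamp": "float", "host": "string"}
new = {"timestamp": "string", "host": "string"}

print(changed_types(old, new))  # {'timestamp': ('float', 'string')}
```

Any non-empty result would mean that existing parquet partitions were written with a different type than the table will now declare.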

Desired Change

A couple of things we can improve:

  1. Standardize everything on string
    String has a larger memory footprint, but it is the most permissive to future changes.

  2. Have a script that can fix this quickly
    The script should drop the target table(s), rebuild them using the new schemas, and recreate the partitions. It may also need to fix the underlying data (which might be hard).

  3. Or other solutions we haven't thought about.
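As a rough sketch of option 2, the snippet below only builds the Athena statements that such a script might run: drop the table, recreate it from the new schema, and reload partitions. Everything here is an assumption for illustration: the `rebuild_statements` helper, the type mapping, the `dt` partition column, and the S3 location are hypothetical, and `MSCK REPAIR TABLE` only discovers partitions laid out in Hive style (for example `dt=.../`). The `all_strings` flag shows what option 1 would look like in the table definition.

```python
# Assumed mapping from schema type names to Athena column types.
ATHENA_TYPES = {
    "string": "string",
    "integer": "bigint",
    "float": "double",
    "boolean": "boolean",
}


def rebuild_statements(table, schema, location, all_strings=False):
    """Return the DDL statements to drop and rebuild `table`.

    schema:      flat dict of field name -> type name (new schema)
    location:    S3 prefix holding the parquet data (hypothetical)
    all_strings: declare every column as string (option 1)
    """
    columns = ", ".join(
        "`{}` {}".format(name, "string" if all_strings else ATHENA_TYPES[kind])
        for name, kind in sorted(schema.items())
    )
    return [
        "DROP TABLE IF EXISTS {}".format(table),
        (
            "CREATE EXTERNAL TABLE {} ({}) "
            "PARTITIONED BY (dt string) "
            "STORED AS PARQUET LOCATION '{}'"
        ).format(table, columns, location),
        # Re-registers partitions found under the table location.
        "MSCK REPAIR TABLE {}".format(table),
    ]


statements = rebuild_statements(
    "carbonblack_alert_watchlist_hit_feedsearch_bin",
    {"timestamp": "string"},
    "s3://example-bucket/parquet/carbonblack_alert_watchlist_hit_feedsearch_bin/",
)
for statement in statements:
    print(statement)
```

Dropping an external table removes only the table metadata, not the files in S3, so this does nothing about parquet files already written with the old type. Rewriting that data is the hard part mentioned in option 2 and is not covered here.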

@chunyong-lin chunyong-lin added this to the 3.1.1 milestone Mar 24, 2020
@ryandeivert ryandeivert modified the milestones: 3.1.1, 3.1.2, 3.1.3 Mar 31, 2020
@ryandeivert ryandeivert modified the milestones: 3.1.3, 3.3.0 Apr 9, 2020
@ryandeivert ryandeivert modified the milestones: 3.3.0, 3.4.0 Aug 4, 2020
@ryandeivert ryandeivert modified the milestones: 3.4.0, 4.1.0 Aug 26, 2020