bugfix: format schema cache is not cleared after the schema is dropped #781

Merged: 9 commits, Jul 5, 2024

Collaborator

@zliang-min zliang-min commented Jun 25, 2024

PR checklist:

  • Did you run ClangFormat?
  • Did you separate headers to a different section in existing community code base?
  • Did you surround proton: starts/ends for new code in existing community code base?

Please write user-readable short description of the changes:

In this PR, I ported a few changes related to ProtobufSchemas from the CH repo that clean up the Protobuf schema cache (please check the Commits tab for what was ported). Based on those changes, I added logic to DROP FORMAT SCHEMA and CREATE OR REPLACE FORMAT SCHEMA to clear the Protobuf schema cache.

Here is a use case that is blocked by this bug:

  1. Create a Protobuf schema, for example:
    CREATE FORMAT SCHEMA foo as '
    syntax = "proto3";
    
    message Foo {
      string a = 1;
      int64 b = 2;
    }
    ' TYPE Protobuf;
  2. Use the schema with a Kafka external stream:
    CREATE EXTERNAL STREAM foo (
    a string,
    b int64
    ) SETTINGS type='kafka',...,data_format='ProtobufSingle',format_schema='foo:Foo';
    
    SELECT * FROM foo;
    This works fine.
  3. Replace the schema with a new one (for example, because the schema has evolved):
    CREATE FORMAT SCHEMA foo as '
    syntax = "proto3";
    
    message Foo {
      string a = 1;
      int64 b = 2;
      string c = 3;
    }
    ' TYPE Protobuf;
    In this example, we added a new field c.
  4. Use the new schema with an evolved external stream:
    CREATE EXTERNAL STREAM foo (
    a string,
    b int64,
    c string
    ) SETTINGS type='kafka',...,data_format='ProtobufSingle',format_schema='foo:Foo';
    
    SELECT * FROM foo;
    This time, the values of column c are always empty: the parsed schema is cached, so even after the schema is re-created, the stream still uses the old cached version.
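
To make the failure mode concrete, here is a minimal Python sketch of the caching behavior described in the steps above. It is only a toy model, not the actual Proton or ClickHouse code: the class `FormatSchemaStore`, its methods, and the `invalidate_on_change` flag are hypothetical names chosen for illustration, and a schema is reduced to a list of field names.

```python
class FormatSchemaStore:
    """Toy model: named schema definitions plus a cache of parsed schemas.

    All names here are hypothetical; this is not the real implementation.
    """

    def __init__(self, invalidate_on_change):
        self._sources = {}  # schema name -> field names (stands in for the schema text)
        self._cache = {}    # schema name -> parsed schema (stands in for parsed descriptors)
        self._invalidate_on_change = invalidate_on_change

    def create(self, name, fields, or_replace=False):
        """Models CREATE [OR REPLACE] FORMAT SCHEMA."""
        if name in self._sources and not or_replace:
            raise ValueError(f"format schema {name!r} already exists")
        self._sources[name] = list(fields)
        if self._invalidate_on_change:
            # The fix: forget any parsed copy of the previous definition.
            self._cache.pop(name, None)

    def drop(self, name):
        """Models DROP FORMAT SCHEMA."""
        del self._sources[name]
        if self._invalidate_on_change:
            # The fix: a dropped schema must not survive in the cache.
            self._cache.pop(name, None)

    def get_parsed(self, name):
        """Return the parsed schema, parsing only on a cache miss."""
        if name not in self._cache:
            self._cache[name] = tuple(self._sources[name])
        return self._cache[name]


def evolve_schema(store):
    """Replay the use case: create, read, drop, re-create with field c, read again."""
    store.create("foo", ["a", "b"])
    store.get_parsed("foo")  # first read populates the cache
    store.drop("foo")
    store.create("foo", ["a", "b", "c"])
    return store.get_parsed("foo")


stale = evolve_schema(FormatSchemaStore(invalidate_on_change=False))
fresh = evolve_schema(FormatSchemaStore(invalidate_on_change=True))
print(stale)  # old definition: field c is missing
print(fresh)  # new definition: field c is present
```

Without invalidation, the second read returns the definition cached before the drop, which matches the symptom of column c always being empty. Clearing the cache entry on both drop and replace is the behavior this PR describes adding.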

@zliang-min zliang-min added the bug Something isn't working label Jun 25, 2024
@zliang-min zliang-min self-assigned this Jun 25, 2024
@zliang-min zliang-min marked this pull request as ready for review June 26, 2024 10:15
@chenziliang
Collaborator

LGTM in general

@zliang-min zliang-min force-pushed the bugfix/schema-cache-invalidation branch from b340636 to 8d9829e Compare July 4, 2024 22:56
@zliang-min zliang-min merged commit 2b723fb into develop Jul 5, 2024
21 checks passed
@zliang-min zliang-min deleted the bugfix/schema-cache-invalidation branch July 5, 2024 05:27
@jovezhong
Contributor

(Jove Github Bot) added it to the current sprint.

@jovezhong
Contributor

(Jove Github Bot) moved this ticket out of the GitHub project (up to 1200 tickets for one project).
