Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental table creation has unneccessary indexes, write temp table as a HEAP #180

Open
alittlesliceoftom opened this issue Jul 10, 2023 · 1 comment

Comments

@alittlesliceoftom
Copy link

When you create an incremental table, this is generally a 2 step process:

  1. Create the initial table as a table.
  2. Create the incremental updates as new row inserts (possibly with a delete afterwards).

For both of these the adapter breaks this up into a few steps:

  1. Build a view for what's going on
  2. Create a temp table (not in tempdb)
  3. Move that table to be the final table/ insert from that table into the final table
  4. Clean up.

For the incremental steps, the temp table is created with the index and distribution settings of the final target table.

The best practice advice for transient data is to use a HEAP index. We should test if using this for the intermediate table would make the process faster. As currently there's a slow store to CCI (possibly ordered), but then that's ready and slowly stored to the second CCI.

I think this needs testing, but will potentially be a large speed up for users:

Example step that needs change:

  CREATE TABLE "database"."schema"."users_table_name__dbt_tmp"
    WITH(
      DISTRIBUTION = HASH(HashColumn),
      CLUSTERED COLUMNSTORE INDEX ORDER(OrderColumn)
      )
    AS (SELECT * FROM [schema].[users_table_name__dbt_tmp_temp_view])

Should be:

  CREATE TABLE "database"."schema"."users_table_name__dbt_tmp"
    WITH(
      DISTRIBUTION = /*Unsure here - could take the dist or do RR, RR can load the DMS*/
      HEAP
      )
    AS (SELECT * FROM [schema].[users_table_name__dbt_tmp_temp_view])

@alittlesliceoftom
Copy link
Author

I'd love to understand the necessity for a temp table at all a bit better, why is this needed vs pulling straight from the view.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant