Skip to content

Conversation

emirkmo
Copy link

@emirkmo emirkmo commented Aug 14, 2025

Summary

Fix persist docs for replicated cluster setups. (Solves #295)

Split persist docs macro into two alter statements to handle cases where on cluster wouldn't persist comments on other replicas.

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG

Note for maintainers

Note that current github integrations tests do not use the cluster test_replica and hence didn't see the issue. The added tests only fail & pass using test_replica cluster (i.e. replicated (and sharded if needed) cluster setups).

Explanation

The previous version of this macro blended the MODIFY COMMENT command for the table comment with the COMMENT COLUMN commands for the column comments, inside the same ALTER TABLE.

This combined syntax does not work with ON CLUSTER for replicated setups. (This was thought to be a clickhouse bug, but that bug is fixed and the behavior remains.)

The revised macro now sends a separate ALTER TABLE for the table comment MODIFY COMMENT, and a separate one for all the COMMENT COLUMN commands.

I've split it up into 2 helper macros that accomplish each task for maintainability. This also works on non-cluster setups so there is no need to further complicate it by combining table and column comment alterations into a single ALTER.

clickhouse__persist_docs now runs the relation (table comment) helper macro, if persist_relations_docs() is set and descriptions exist for the relation, and runs the column comment helper macro if persist_column_docs() is set and columns are defined in the model.

Additionally, the formatting was cleaned up for multiple column comments so that comments for a different column are not appearing on the same line. Just makes it easier to read in clickhouse/dbt logs.

Example:

Previous macro produced comments like:

alter table <table>
[ON CLUSTER] modify comment
$dbt_comment_literal_block$<table comment>,
$dbt_comment_literal_block$, comment column `col1_name`
$dbt_comment_literal_block$<column comment>
$dbt_comment_lieteral_block$, comment column `col2_name`

...

A bit messy, but works on a single node.

Although ON CLUSTER is correctly added, this syntax is incorrect in a replicated setup and ONLY produces table and column comments for the initiator node. Now it does:

alter table <table>
[ON CLUSTER] modify comment
$dbt_comment_literal_block$ <table comment>
$dbt_comment_literal_block$

alter table <table>
[ON CLUSTER]
comment column `col1_name`
 $dbt_comment_literal_block$<column multi
linecomment>$dbt_comment_literal_block$,
comment column `col2_name`
 $dbt_comment_literal_block$ <column comment>$dbt_comment_literal_block$,

...

Edit: Removed dev test leftovers.
Fixes: #295

@CLAassistant
Copy link

CLAassistant commented Aug 14, 2025

CLA assistant check
All committers have signed the CLA.

@emirkmo emirkmo changed the title Fix persist docs for Replicated cluster setups Fix: persist docs for Replicated cluster setups Aug 14, 2025
@emirkmo
Copy link
Author

emirkmo commented Aug 14, 2025

Note: for the added test to catch regressions in CI you should add the following to test_matrix.yaml:

      - name: Run HTTP tests on replicated cluster
        env:
          DBT_CH_TEST_CLUSTER: test_replica
          DBT_CH_TEST_CLUSTER_MODE: True

        run: |
          PYTHONPATH="${PYTHONPATH}:dbt"
          pytest tests/  
        # or just run the comment test: 
        # pytest tests/integration/adapter/clickhouse/test_clickhouse_comments.py

failure requires `test_replica` cluster to be used.
The previous version of this macro blended the
`MODIFY COMMENT` command for the table comment
with the `COMMENT COLUMN` commands for the column comments,
inside the same `ALTER TABLE`.

This combined syntax **does not work** with `ON CLUSTER`.
(This was thought to be a clickhouse bug, but that bug is fixed
and the behavior remains.)

The revised macro now sends a separate `ALTER TABLE` for
the table comment `MODIFY COMMENT`, and a separate one for
all the `COMMENT COLUMN` commands.

I've split it up into 2 helper macros that accomplish each
task for maintainability. This also works on non-cluster
setups so there is no need to further complicate it by combining
table and column comment alterations into a single ALTER.

`clickhouse__persist_docs` now runs the relation (table comment)
helper macro if `persist_relations_docs()` is set and descriptions exist
for the relation and the column comment helper macro if
`persist_column_docs()` and columns are defined in the model.

Additionally, the formatting was cleaned up for multiple column
comments so that comments for a different column are not appearing
on the same line. Just makes it easier to read in clickhouse/dbt logs.

Example:

It produced comments like:

```sql
alter table <table>
[ON CLUSTER] modify comment
$dbt_comment_literal_block$<table comment>
$dbt_comment_literal_block$, comment column `col1_name`
$dbt_comment_literal_block$<column comment>
$dbt_comment_lieteral_block$, comment column `col2_name`

...
```

A bit messy, but works on a single node.

Although `ON CLUSTER` is correctly added, this syntax is
incorrect and ONLY produces table and column comments for
the initiator node. Now it does:

```sql
alter table <table>
[ON CLUSTER] modify comment
$dbt_comment_literal_block$ <table comment>
$dbt_comment_literal_block$

alter table <table>
[ON CLUSTER]
comment column `col1_name`
 $dbt_comment_literal_block$<column multi
linecomment>$dbt_comment_literal_block$,
comment column `col2_name`
 $dbt_comment$ <column comment>$dbt_comment$,

...
```
@emirkmo emirkmo force-pushed the fix-persist-docs-for-cluster branch from 4841279 to 671f7d0 Compare August 14, 2025 16:21
@emirkmo emirkmo marked this pull request as draft August 28, 2025 11:56
@emirkmo emirkmo marked this pull request as ready for review August 28, 2025 11:56
@emirkmo
Copy link
Author

emirkmo commented Aug 28, 2025

Tried to retrigger it for attention. Could I please get the workflows approved or a review?

@emirkmo
Copy link
Author

emirkmo commented Sep 6, 2025

@BentsiLeviav Sorry for tagging, but any chance you're willing to approve the workflows so I can get a review on this PR? You were the only one showing any maintainer activity recently.

It solves a bug that's been around for a while (and imo adds better separation of concerns to the macro). I've shown with tests why the bug was missed and how the change resolves it.

Github Copilot Review

This pull request introduces significant refactoring and improvements to the way documentation comments are persisted in ClickHouse tables and columns within the dbt-clickhouse project. Here’s a concise review, highlighting the main changes, positives, and some considerations:


Summary of Changes

1. Refactoring persist_docs Macro

  • The macro clickhouse__persist_docs is refactored to delegate table and column comment persistence to new helper macros:
    • _persist_table_comment: Handles escaping and altering table-level comments.
    • _persist_column_comments: Handles escaping and altering all eligible column comments in a batch.
  • This improves modularity, readability, and maintainability.

2. Implementation Improvements

  • Escaping: Ensures comments are escaped for ClickHouse compatibility.
  • Selective Execution: Only attempts to alter comments for columns that exist in the table.
  • Batching: Groups column comment changes into a single ALTER TABLE statement when possible.

3. Test Suite Enhancements

  • New Test Model: Adds replicated_table_comment.sql to cover ReplicatedMergeTree engine with comments.
  • Expanded YAML: Updates schema.yml to define comments for the new model and its columns.
  • Test Coverage: Updates test parameterization to include the new model.

Strengths

  • Cleaner Logic: Extracting logic into helper macros makes the code easier to read and extend.
  • DRY Principle: Reduces code duplication for comment persistence.
  • Correctness: Uses adapter.get_columns_in_relation to avoid trying to comment on non-existent columns.
  • Better Test Coverage: Tests now include replicated tables, not just standard tables and views.

Potential Considerations

  1. Atomicity and Error Handling
    • The macro batches column comment changes, but if one fails, the whole statement may fail. Is this the desired behavior?
  2. Performance
    • Batching is good for performance, but very large tables with many columns might hit statement size limits—unlikely, but worth noting.
  3. Backward Compatibility
    • The macro signatures remain unchanged, so there should be no breaking changes for downstream macros/projects.
  4. Test Matrix
    • The PR adds coverage for replicated tables, which is good. Consider edge cases (e.g., columns present in YAML but missing in SQL model).

Style & Clarity

  • The use of Jinja comments ({# ... #}) is helpful.
  • Helper macro names are clear and self-explanatory.
  • The logic for escaping and building statements is explicit and robust.

Summary

This is a well-structured refactor that improves the maintainability and extensibility of documentation persistence in dbt-clickhouse, with thoughtful updates to tests. The approach reduces duplication, handles edge cases, and should be safe for consumers.

Recommendation:
Approve, pending any project-specific code review requirements or CI results.

If you’d like a deeper dive on a specific part or want to see generated SQL for a sample model, let me know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Table and Column Comments set up misbehaviour
2 participants