feat(core): introduce data control attributes for wren MDL base by goldmedal · Pull Request #1014 · Canner/wren-engine

goldmedal · 2024-12-26T04:04:05Z

Describe

Introduce `RowLevelSecurity` and `ColumnLevelSecurity` for MDL.
A column in the MDL JSON can now be defined like this:

        {
          "name": "rls_orderkey",
          "type": "integer",
          "expression": "o_orderkey",
          "rls": {
            "name": "SESSION_STATUS",
            "operator": "EQUALS"
          }
        },
        {
          "name": "cls_orderkey",
          "type": "integer",
          "expression": "o_orderkey",
          "cls": {
            "name": "SESSION_LEVEL",
            "operator": "EQUALS",
            "threshold": "'NORMAL'"
          }
        }
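
As a rough illustration of how a `cls` entry like the one above might be evaluated, here is a minimal Rust sketch. It is not the code in this PR: the `Operator` enum, the `allows` and `column_visible` helpers, and the choice to compare the session property to `threshold` as a raw string are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical operator set; only `EQUALS` appears in the PR description.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    Equals,
    NotEquals,
}

/// Mirrors the shape of the `cls` JSON object: property name, operator, threshold.
#[derive(Debug, Clone)]
struct ColumnLevelSecurity {
    name: String,
    operator: Operator,
    threshold: String,
}

impl ColumnLevelSecurity {
    /// Compare a session property value against the threshold (assumed semantics).
    fn allows(&self, session_value: &str) -> bool {
        match self.operator {
            Operator::Equals => session_value == self.threshold,
            Operator::NotEquals => session_value != self.threshold,
        }
    }
}

/// Look up the property named by the rule; a missing property denies access.
fn column_visible(cls: &ColumnLevelSecurity, session: &HashMap<String, String>) -> bool {
    session.get(&cls.name).map_or(false, |value| cls.allows(value))
}
```

Denying access when the session property is absent is a design choice of this sketch; the planner work listed under TODO may settle on different behavior.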

TODO

Implement the data control functionality in the planner.

coderabbitai · 2024-12-26T04:04:12Z

Walkthrough

The pull request introduces a comprehensive enhancement to the data model security and expression normalization in the Rust codebase. New procedural macros and structs are added to support row-level and column-level security features. The changes include the implementation of comparison operators, normalized expression handling, and methods for evaluating security constraints. These modifications enable more granular access control and expression validation within the data modeling framework, with support for both Python and non-Python configurations.

Changes

| File | Change Summary |
| --- | --- |
| `wren-core-base/manifest-macro/src/lib.rs` | Added procedural macros for row/column level security and normalized expressions |
| `wren-core-base/src/mdl/builder.rs` | Added methods `row_level_security` and `column_level_security` to `ColumnBuilder` |
| `wren-core-base/src/mdl/cls.rs` | Implemented `ColumnLevelSecurity` evaluation and `NormalizedExpr` with comparison methods |
| `wren-core-base/src/mdl/manifest.rs` | Registered new macros for Python and non-Python bindings |
| `wren-core-base/src/mdl/mod.rs` | Added public module declaration for `cls` |
| `wren-core-base/tests/data/mdl.json` | Added new models with security features and relationships |

Sequence Diagram

sequenceDiagram
    participant ColumnBuilder
    participant NormalizedExpr
    participant ColumnLevelSecurity

    ColumnBuilder->>NormalizedExpr: Create normalized expression
    NormalizedExpr-->>ColumnBuilder: Return normalized expr
    ColumnBuilder->>ColumnLevelSecurity: Set security parameters
    ColumnLevelSecurity->>ColumnLevelSecurity: Evaluate input
    ColumnLevelSecurity-->>ColumnBuilder: Return evaluation result

Poem

🐰 Secure and swift, our data flows,
With macros dancing in neat rows,
Row and column, now controlled,
Security's story sweetly told!
Rabbit's code, a fortress bold! 🔒

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (8)

wren-core-base/src/mdl/cls.rs (2)

26-37: Consider handling parse errors gracefully.
Right now, numeric parsing inside eval can panic if input_expr or the threshold cannot be converted to a float. This might be okay for strictly validated inputs, but in production, it’s safer to handle potential parsing errors.

72-82: Avoid using unwrap() in Numeric comparisons.
Using unwrap() may cause unwanted panics on invalid numeric inputs. Consider using parse().ok() to handle errors gracefully or to return false when parsing fails.
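
A minimal sketch of what this suggestion could look like, assuming both operands arrive as strings. The function names `compare_numeric`, `numeric_gt`, and `numeric_lt` are hypothetical and do not come from `cls.rs`.

```rust
use std::cmp::Ordering;

/// Parse both operands as f64 and compare them.
/// Returns None instead of panicking when either side is not numeric.
fn compare_numeric(lhs: &str, rhs: &str) -> Option<Ordering> {
    let l: f64 = lhs.trim().parse().ok()?;
    let r: f64 = rhs.trim().parse().ok()?;
    // partial_cmp yields None for NaN, which the callers below treat as false.
    l.partial_cmp(&r)
}

/// True only when both sides parse and lhs > rhs.
fn numeric_gt(lhs: &str, rhs: &str) -> bool {
    compare_numeric(lhs, rhs) == Some(Ordering::Greater)
}

/// True only when both sides parse and lhs < rhs.
fn numeric_lt(lhs: &str, rhs: &str) -> bool {
    compare_numeric(lhs, rhs) == Some(Ordering::Less)
}
```

With this shape, an unparsable input falls through to `false` rather than aborting the evaluation.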

wren-core-base/src/mdl/builder.rs (2)

210-216: Encourage stronger validation for rls config.
The row_level_security method sets name and operator; consider verifying that name is not empty if that is a requirement.

218-230: Check threshold usage in column_level_security.
Currently, NormalizedExpr::new(threshold) will panic on empty strings. Consider handling an empty or invalid numeric threshold gracefully.
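
One way to avoid the panic is a fallible constructor. The sketch below uses a stand-in `Threshold` type and a `ThresholdError` enum, both hypothetical; the real `NormalizedExpr` in this PR may carry more state.

```rust
/// Hypothetical error type for rejected thresholds.
#[derive(Debug, PartialEq)]
enum ThresholdError {
    Empty,
}

/// Stand-in for a normalized threshold expression.
#[derive(Debug, PartialEq)]
struct Threshold(String);

impl Threshold {
    /// Trim the input and reject empty strings instead of panicking.
    fn try_new(raw: &str) -> Result<Self, ThresholdError> {
        let trimmed = raw.trim();
        if trimmed.is_empty() {
            return Err(ThresholdError::Empty);
        }
        Ok(Threshold(trimmed.to_string()))
    }
}
```

A builder method could then propagate the `Result` to its caller, or map the error to whatever failure mode the manifest loader already uses.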

wren-core-base/tests/data/mdl.json (4)

67-71: Consider performance implications of nested aggregation

The totalcost calculation performs a nested aggregation through relationships (sum(customer.orders.o_totalprice)). This could impact performance with large datasets. Consider:

- Adding appropriate indexes
- Implementing materialization if the calculation is frequently used
- Adding filters to limit the aggregation scope

113-117: Document the purpose of hash_orderkey

The purpose of hashing the order key isn't clear. If this is for security purposes, MD5 might not be the best choice as it's cryptographically broken.

Consider adding a comment explaining the intended use case.

156-159: Consider explicit column selection in view

Using SELECT * is generally discouraged as it:

- Makes the view sensitive to schema changes
- Might expose unnecessary columns
- Could impact performance

Consider explicitly listing required columns.

161-162: Consider specifying MySQL version compatibility

Adding a minimum supported MySQL version would help ensure compatibility with features used in the schema (especially for security features and functions like MD5).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f79314f and ea8bc72.

📒 Files selected for processing (6)

wren-core-base/manifest-macro/src/lib.rs (2 hunks)
wren-core-base/src/mdl/builder.rs (6 hunks)
wren-core-base/src/mdl/cls.rs (1 hunks)
wren-core-base/src/mdl/manifest.rs (3 hunks)
wren-core-base/src/mdl/mod.rs (1 hunks)
wren-core-base/tests/data/mdl.json (1 hunks)

🔇 Additional comments (14)

wren-core-base/src/mdl/cls.rs (3)

40-59: Validate string detection logic.
is_string relies on single quotes as a strict delimiter. This could be prone to edge cases (e.g., escaped quotes). Consider allowing or validating possible alternative string formats, or re-checking if this is the intended strict usage.
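
To make the edge case concrete, here is a sketch of a stricter check. It assumes the SQL convention of escaping an inner quote by doubling it (`''`); that convention and the name `is_quoted_string` are assumptions, not something taken from the PR.

```rust
/// Returns true when `expr` is a single-quoted literal whose inner quotes,
/// if any, are escaped by doubling (assumed convention).
fn is_quoted_string(expr: &str) -> bool {
    let s = expr.trim();
    if s.len() < 2 || !s.starts_with('\'') || !s.ends_with('\'') {
        return false;
    }
    // Both delimiters are one byte each, so this slice is on char boundaries.
    let inner = &s[1..s.len() - 1];
    let mut chars = inner.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\'' {
            // A quote inside the literal must be followed by a second quote.
            if chars.peek() == Some(&'\'') {
                chars.next();
            } else {
                return false;
            }
        }
    }
    true
}
```

Under this rule `'it''s'` counts as a string while `'it's'` does not, which is stricter than checking only the first and last characters.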

96-102: Check logical consistency of gte and lte.
gte invokes gt or eq; lte invokes lt or eq. This is correct but verify that the type mismatch logic aligns with your overall error handling strategy (i.e., no parse fallback or different data types yield false).
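
The composition described here can be sketched as follows, with mismatched or unparsable operands yielding `false` for every operator. The `Value` enum and the `classify` and `compare` helpers are hypothetical names chosen for this illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical classification of an operand: quoted text or a number.
enum Value {
    Text(String),
    Number(f64),
}

/// Classify an operand; None when it is neither quoted text nor numeric.
fn classify(raw: &str) -> Option<Value> {
    let s = raw.trim();
    if s.len() >= 2 && s.starts_with('\'') && s.ends_with('\'') {
        return Some(Value::Text(s[1..s.len() - 1].to_string()));
    }
    s.parse::<f64>().ok().map(Value::Number)
}

/// Compare two operands of the same kind; mismatched kinds give None.
fn compare(lhs: &str, rhs: &str) -> Option<Ordering> {
    match (classify(lhs)?, classify(rhs)?) {
        (Value::Text(a), Value::Text(b)) => Some(a.cmp(&b)),
        (Value::Number(a), Value::Number(b)) => a.partial_cmp(&b),
        _ => None,
    }
}

fn eq(lhs: &str, rhs: &str) -> bool {
    compare(lhs, rhs) == Some(Ordering::Equal)
}

fn gt(lhs: &str, rhs: &str) -> bool {
    compare(lhs, rhs) == Some(Ordering::Greater)
}

fn lt(lhs: &str, rhs: &str) -> bool {
    compare(lhs, rhs) == Some(Ordering::Less)
}

/// gte is gt or eq, so a type mismatch is false for both branches.
fn gte(lhs: &str, rhs: &str) -> bool {
    gt(lhs, rhs) || eq(lhs, rhs)
}

/// lte is lt or eq, with the same mismatch behavior.
fn lte(lhs: &str, rhs: &str) -> bool {
    lt(lhs, rhs) || eq(lhs, rhs)
}
```

Because every operator routes through `compare`, the "different data types yield false" rule lives in one place and cannot drift between `gte` and `lte`.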

122-264: Thorough test coverage.
The test suite appears robust, covering string vs. numeric types, boundary conditions, and scenarios with mismatched types. This is a good practice to ensure correctness.

wren-core-base/src/mdl/manifest.rs (2)

Line range hint 27-52: Macros usage looks consistent.
All newly introduced macros (e.g., column_level_operator, column_level_security) align with the existing pattern. Their usage in Python and non-Python bindings is coherent.

47-52: Validate serde_with usage.
DeserializeFromStr and SerializeDisplay are used for advanced serialization logic. Be sure to confirm that it doesn’t conflict with existing custom deserializers.

wren-core-base/manifest-macro/src/lib.rs (3)

172-173: New fields rls and cls in Column.
Adding optional security fields is a minimally intrusive approach. Confirm that existing code handles these new fields gracefully in older manifests or partial updates.

348-367: Row-level security macro.
Generates a straightforward struct with name and operator. Implementation looks consistent with the pattern used in other macros.

447-467: NormalizedExpr macro approach is flexible.
Using SerializeDisplay and DeserializeFromStr in the generated struct ensures smooth serialization and deserialization. Good job.

wren-core-base/src/mdl/builder.rs (2)

398-399: Test coverage for row_level_security.
The test test_column_roundtrip includes a row-level security scenario. This further confirms that RLS is integrated into the serialization/deserialization flow.

693-708: Full integration test for RLS and CLS.
The test against mdl.json exercises row-level and column-level security together with expression-based logic. This is a comprehensive integration test.

wren-core-base/src/mdl/mod.rs (1)

21-21: Public cls module export.
Exposing cls publicly is consistent with the code changes for column-level security.

wren-core-base/tests/data/mdl.json (3)

1-4: LGTM!

The schema configuration follows standard patterns.

12-41: LGTM!

The customer model is well-structured with:

- Appropriate column types
- Clear calculated column definition
- Well-documented properties

141-154: LGTM!

Relationships are well-defined with:

- Clear join conditions using primary keys
- Appropriate join types (ONE_TO_MANY, ONE_TO_ONE)

wren-core-base/tests/data/mdl.json

add rls and cls to wren-core mdl base

ea8bc72

coderabbitai bot reviewed Dec 26, 2024

goldmedal requested a review from wwwy3y3 December 26, 2024 05:07

wwwy3y3 approved these changes Dec 26, 2024

goldmedal merged commit cd89fd1 into Canner:main Dec 26, 2024

goldmedal deleted the feature/add-cls-rls branch December 26, 2024 05:19

coderabbitai bot mentioned this pull request Dec 27, 2024

feat(core): sync DataFusion for the latest SQL syntax #1017

Merged

coderabbitai bot mentioned this pull request Apr 30, 2025

feat(core): introduce the row-level access control for the model #1161

Merged

coderabbitai bot mentioned this pull request Jul 21, 2025

feat(core): apply default nulls last policy for ordering #1262

Merged

coderabbitai bot mentioned this pull request Mar 6, 2026

feat(mcp-server): embed MCP server in Docker image with skills and quickstart guide #1425

Merged
