feat(core): introduce data control attributes for wren MDL base#1014
feat(core): introduce data control attributes for wren MDL base#1014goldmedal merged 1 commit intoCanner:mainfrom
Conversation
WalkthroughThe pull request introduces a comprehensive enhancement to the data model security and expression normalization in the Rust codebase. New procedural macros and structs are added to support row-level and column-level security features. The changes include the implementation of comparison operators, normalized expression handling, and methods for evaluating security constraints. These modifications enable more granular access control and expression validation within the data modeling framework, with support for both Python and non-Python configurations. Changes
Sequence DiagramsequenceDiagram
participant ColumnBuilder
participant NormalizedExpr
participant ColumnLevelSecurity
ColumnBuilder->>NormalizedExpr: Create normalized expression
NormalizedExpr-->>ColumnBuilder: Return normalized expr
ColumnBuilder->>ColumnLevelSecurity: Set security parameters
ColumnLevelSecurity->>ColumnLevelSecurity: Evaluate input
ColumnLevelSecurity-->>ColumnBuilder: Return evaluation result
Poem
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (8)
wren-core-base/src/mdl/cls.rs (2)
26-37: Consider handling parse errors gracefully.
Right now, numeric parsing insideevalcan panic ifinput_expror the threshold cannot be converted to a float. This might be okay for strictly validated inputs, but in production, it’s safer to handle potential parsing errors.
72-82: Avoid usingunwrap()inNumericcomparisons.
Usingunwrap()may cause unwanted panics on invalid numeric inputs. Consider usingparse().ok()to handle errors gracefully or to returnfalsewhen parsing fails.wren-core-base/src/mdl/builder.rs (2)
210-216: Encourage stronger validation for rls config.
Therow_level_securitymethod setsnameandoperator; consider verifying thatnameis not empty if that is a requirement.
218-230: Check threshold usage incolumn_level_security.
Currently,NormalizedExpr::new(threshold)will panic on empty strings. Consider handling an empty or invalid numeric threshold gracefully.wren-core-base/tests/data/mdl.json (4)
67-71: Consider performance implications of nested aggregationThe
totalcostcalculation performs a nested aggregation through relationships (sum(customer.orders.o_totalprice)). This could impact performance with large datasets. Consider:
- Adding appropriate indexes
- Implementing materialization if the calculation is frequently used
- Adding filters to limit the aggregation scope
113-117: Document the purpose of hash_orderkeyThe purpose of hashing the order key isn't clear. If this is for security purposes, MD5 might not be the best choice as it's cryptographically broken.
Consider adding a comment explaining the intended use case.
156-159: Consider explicit column selection in viewUsing
SELECT *is generally discouraged as it:
- Makes the view sensitive to schema changes
- Might expose unnecessary columns
- Could impact performance
Consider explicitly listing required columns.
161-162: Consider specifying MySQL version compatibilityAdding a minimum supported MySQL version would help ensure compatibility with features used in the schema (especially for security features and functions like MD5).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
wren-core-base/manifest-macro/src/lib.rs(2 hunks)wren-core-base/src/mdl/builder.rs(6 hunks)wren-core-base/src/mdl/cls.rs(1 hunks)wren-core-base/src/mdl/manifest.rs(3 hunks)wren-core-base/src/mdl/mod.rs(1 hunks)wren-core-base/tests/data/mdl.json(1 hunks)
🔇 Additional comments (14)
wren-core-base/src/mdl/cls.rs (3)
40-59: Validate string detection logic.
is_string relies on single quotes as a strict delimiter. This could be prone to edge cases (e.g., escaped quotes). Consider allowing or validating possible alternative string formats, or re-checking if this is the intended strict usage.
96-102: Check logical consistency of gte and lte.
gte invokes gt or eq; lte invokes lt or eq. This is correct but verify that the type mismatch logic aligns with your overall error handling strategy (i.e., no parse fallback or different data types yield false).
122-264: Thorough test coverage.
The test suite appears robust, covering string vs. numeric types, boundary conditions, and scenarios with mismatched types. This is a good practice to ensure correctness.
wren-core-base/src/mdl/manifest.rs (2)
Line range hint 27-52: Macros usage looks consistent.
All newly introduced macros (e.g., column_level_operator, column_level_security) align with the existing pattern. Their usage in Python and non-Python bindings is coherent.
47-52: Validate serde_with usage.
DeserializeFromStr and SerializeDisplay are used for advanced serialization logic. Be sure to confirm that it doesn’t conflict with existing custom deserializers.
wren-core-base/manifest-macro/src/lib.rs (3)
172-173: New fields rls and cls in Column.
Adding optional security fields is a minimal intrusiveness approach. Confirm that existing code handles these new fields gracefully in older manifests or partial updates.
348-367: Row-level security macro.
Generates a straightforward struct with name and operator. Implementation looks consistent with the pattern used in other macros.
447-467: NormalizedExpr macro approach is flexible.
Using SerializeDisplay and DeserializeFromStr in the generated struct ensures smooth serialization and deserialization. Good job.
wren-core-base/src/mdl/builder.rs (2)
398-399: Test coverage for row_level_security.
The test test_column_roundtrip includes a row-level security scenario. This further confirms that RLS is integrated into the serialization/deserialization flow.
693-708: Full integration test for RLS and CLS.
Verifying the behavior in mdl.json merges row-level and column-level security with expression-based logic. This is a comprehensive integration test.
wren-core-base/src/mdl/mod.rs (1)
21-21: Public cls module export.
Exposing cls publicly is consistent with the code changes for column-level security.
wren-core-base/tests/data/mdl.json (3)
1-4: LGTM!
The schema configuration follows standard patterns.
12-41: LGTM!
The customer model is well-structured with:
- Appropriate column types
- Clear calculated column definition
- Well-documented properties
141-154: LGTM!
Relationships are well-defined with:
- Clear join conditions using primary keys
- Appropriate join types (ONE_TO_MANY, ONE_TO_ONE)
Describe
Introduce the
RowLevelSecurityandColumnLevelSecurityfor MDL.Now, we can define a column in the MDL JSON like
{ "name": "rls_orderkey", "type": "integer", "expression": "o_orderkey", "rls": { "name": "SESSION_STATUS", "operator": "EQUALS" } }, { "name": "cls_orderkey", "type": "integer", "expression": "o_orderkey", "cls": { "name": "SESSION_LEVEL", "operator": "EQUALS", "threshold": "'NORMAL'" } }TODO works