feat(core): implement column-level access control for the model#1211
feat(core): implement column-level access control for the model#1211douenergy merged 11 commits intoCanner:mainfrom
Conversation
WalkthroughThis update introduces column-level access control (CLAC) based on session properties throughout the codebase. It extends model definitions, context creation, and SQL transformation logic to support CLAC rules, propagates session properties, and updates test cases to validate CLAC enforcement. Several APIs and function signatures are modified to accept and utilize session properties. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API/Router
participant Rewriter
participant SessionContext
participant Core/Manifest
Client->>API/Router: POST /query (with optional session headers)
API/Router->>Rewriter: rewrite(sql, properties)
Rewriter->>SessionContext: get_session_context(manifest, path, properties)
SessionContext->>Core/Manifest: analyze(manifest, properties)
Core/Manifest->>SessionContext: analyzed_mdl
Rewriter->>SessionContext: transform_sql(sql)
SessionContext->>Core/Manifest: transform_sql_with_ctx(sql, properties)
Core/Manifest->>Core/Manifest: Filter columns via CLAC rules (using properties)
Core/Manifest->>SessionContext: rewritten_sql
SessionContext->>Rewriter: rewritten_sql
Rewriter->>API/Router: rewritten_sql
API/Router->>Client: Query result (columns filtered by CLAC)
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (5)
wren-core-py/tests/test_modeling_core.py (1)
354-366: Effective CLAC test implementation with minor style issues.The test correctly verifies that the
c_namecolumn is filtered out whensession_levelis "2" (not equal to the threshold "1"). The assertion confirms that onlyc_custkeyremains in the rewritten SQL.Consider addressing the static analysis issues:
+def test_clac(): + """Test column-level access control filtering based on session properties.""" headers = { "session_level": "2", } properties_hashable = frozenset(headers.items()) if headers else None session_context = SessionContext(manifest_str, None, properties_hashable) sql = "SELECT * FROM my_catalog.my_schema.customer" rewritten_sql = session_context.transform_sql(sql) + expected_sql = ( + "SELECT customer.c_custkey FROM (SELECT customer.c_custkey FROM " + "(SELECT __source.c_custkey AS c_custkey FROM main.customer AS __source) " + "AS customer) AS customer" + ) - assert ( - rewritten_sql - == "SELECT customer.c_custkey FROM (SELECT customer.c_custkey FROM (SELECT __source.c_custkey AS c_custkey FROM main.customer AS __source) AS customer) AS customer" - ) + assert rewritten_sql == expected_sql🧰 Tools
🪛 Pylint (3.3.7)
[convention] 365-365: Line too long (172/100)
(C0301)
[convention] 354-354: Missing function or method docstring
(C0116)
ibis-server/app/mdl/rewriter.py (2)
136-149: LGTM! Correct implementation for session properties handling.The conversion to
frozensetof tuples is appropriate for passing to the RustSessionContext. Consider adding a docstring to explain the return type change.Add a docstring to clarify the purpose and return type:
def get_session_properties(self, properties: dict) -> frozenset | None: + """ + Extract and process session properties from the request headers. + + Filters properties with X_WREN_VARIABLE_PREFIX and returns them as a + frozenset of (key, value) tuples for compatibility with Rust bindings. + + Args: + properties: Dictionary of request properties + + Returns: + frozenset of (key, value) tuples or None if no properties + """ if properties is None: return None🧰 Tools
🪛 Pylint (3.3.7)
[convention] 136-136: Missing function or method docstring
(C0116)
133-134: Consider explicit exception chaining for better debugging.The static analysis tool correctly identifies that exception context would be preserved with explicit chaining.
except Exception as e: - raise RewriteError(str(e)) + raise RewriteError(str(e)) from e🧰 Tools
🪛 Pylint (3.3.7)
[warning] 134-134: Consider explicitly re-raising using 'raise RewriteError(str(e)) from e'
(W0707)
wren-core-py/src/context.rs (1)
125-159: Well-implemented properties parsing with good error handling.The conversion from Python
frozensetto RustHashMapis handled correctly with proper validation of tuple structure. The error messages are clear and helpful.Consider extracting the properties parsing logic into a separate helper function for better testability and reusability:
fn parse_properties_from_frozenset( py: Python, properties: Option<PyObject> ) -> PyResult<HashMap<String, Option<String>>> { // Move lines 126-158 here }wren-core/core/src/mdl/mod.rs (1)
194-210: Consider simplifying the column filtering logicWhile the current implementation is correct, you could simplify it by combining the filtering operations:
-let available_columns = model - .columns - .iter() - .map(|column| { - if validate_clac_rule(column, &properties)? { - Ok(Some(Arc::clone(column))) - } else { - Ok(None) - } - }) - .collect::<Result<Vec<_>>>()?; -let fields: Vec<_> = available_columns - .into_iter() - .filter(|c| c.is_some()) - .filter_map(|column| { - Self::infer_source_column(&column.unwrap()).ok().flatten() - }) +let fields: Vec<_> = model + .columns + .iter() + .filter_map(|column| { + match validate_clac_rule(column, &properties) { + Ok(true) => Self::infer_source_column(column).ok().flatten(), + _ => None, + } + })However, this would lose error propagation. The current approach correctly handles errors from
validate_clac_rule.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
wren-core-py/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (25)
ibis-server/app/mdl/core.py(1 hunks)ibis-server/app/mdl/rewriter.py(1 hunks)ibis-server/tests/routers/v3/connector/postgres/test_query.py(2 hunks)wren-core-base/Cargo.toml(1 hunks)wren-core-base/manifest-macro/src/lib.rs(3 hunks)wren-core-base/src/mdl/builder.rs(4 hunks)wren-core-base/src/mdl/cls.rs(2 hunks)wren-core-base/src/mdl/manifest.rs(5 hunks)wren-core-py/Cargo.toml(1 hunks)wren-core-py/src/context.rs(7 hunks)wren-core-py/tests/test_modeling_core.py(3 hunks)wren-core/benchmarks/src/tpch/run.rs(2 hunks)wren-core/core/src/logical_plan/analyze/access_control.rs(8 hunks)wren-core/core/src/logical_plan/analyze/plan.rs(8 hunks)wren-core/core/src/logical_plan/context_provider.rs(0 hunks)wren-core/core/src/logical_plan/mod.rs(0 hunks)wren-core/core/src/logical_plan/utils.rs(0 hunks)wren-core/core/src/mdl/context.rs(6 hunks)wren-core/core/src/mdl/mod.rs(71 hunks)wren-core/core/src/mdl/utils.rs(9 hunks)wren-core/wren-example/examples/calculation-invoke-calculation.rs(2 hunks)wren-core/wren-example/examples/datafusion-apply.rs(1 hunks)wren-core/wren-example/examples/plan-sql.rs(1 hunks)wren-core/wren-example/examples/row-level-access-control.rs(1 hunks)wren-core/wren-example/examples/view.rs(1 hunks)
💤 Files with no reviewable changes (3)
- wren-core/core/src/logical_plan/utils.rs
- wren-core/core/src/logical_plan/mod.rs
- wren-core/core/src/logical_plan/context_provider.rs
🧰 Additional context used
🧠 Learnings (1)
wren-core/core/src/logical_plan/analyze/access_control.rs (1)
Learnt from: goldmedal
PR: Canner/wren-engine#1161
File: ibis-server/app/routers/v3/connector.py:78-83
Timestamp: 2025-05-05T02:27:29.829Z
Learning: The row-level access control implementation in Wren Engine filters headers with the prefix `X_WREN_VARIABLE_PREFIX` in `EmbeddedEngineRewriter.get_session_properties` and validates session property expressions in `access_control.rs` to ensure they only contain literal values, preventing SQL injection.
🧬 Code Graph Analysis (6)
wren-core/benchmarks/src/tpch/run.rs (2)
wren-core/core/src/mdl/mod.rs (5)
mdl(189-216)new(121-176)new(445-447)analyze(73-79)default(61-69)wren-core/benchmarks/src/tpch/mod.rs (1)
tpch_manifest(35-154)
wren-core/wren-example/examples/calculation-invoke-calculation.rs (2)
wren-core-py/src/context.rs (1)
new(79-186)wren-core/core/src/logical_plan/analyze/plan.rs (6)
new(70-83)new(110-125)new(704-706)new(821-915)new(969-1016)new(1075-1077)
wren-core-py/tests/test_modeling_core.py (2)
wren-core-py/src/context.rs (1)
transform_sql(190-209)wren-core/core/src/mdl/mod.rs (1)
transform_sql(351-365)
wren-core/core/src/mdl/utils.rs (2)
wren-core/core/src/mdl/context.rs (1)
new(257-274)wren-core/core/src/mdl/mod.rs (5)
new(121-176)new(445-447)analyze(73-79)mdl(189-216)default(61-69)
ibis-server/app/mdl/rewriter.py (3)
wren-core/core/src/mdl/context.rs (1)
properties(74-80)ibis-server/app/mdl/core.py (1)
get_session_context(7-10)wren-core-py/src/context.rs (1)
transform_sql(190-209)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (2)
ibis-server/tests/conftest.py (1)
client(18-23)ibis-server/tests/routers/v3/connector/postgres/conftest.py (1)
connection_info(37-44)
🪛 Pylint (3.3.7)
wren-core-py/tests/test_modeling_core.py
[convention] 365-365: Line too long (172/100)
(C0301)
[convention] 354-354: Missing function or method docstring
(C0116)
ibis-server/app/mdl/rewriter.py
[warning] 134-134: Consider explicitly re-raising using 'raise RewriteError(str(e)) from e'
(W0707)
[convention] 136-136: Missing function or method docstring
(C0116)
ibis-server/tests/routers/v3/connector/postgres/test_query.py
[convention] 544-544: Missing function or method docstring
(C0116)
[warning] 544-544: Redefining name 'manifest_str' from outer scope (line 123)
(W0621)
🔇 Additional comments (42)
wren-core-py/Cargo.toml (1)
12-12: LGTM! Consistent dependency version update.This version update aligns with the same change in
wren-core-base/Cargo.toml, maintaining consistency across the codebase for the CLAC feature support.wren-core/wren-example/examples/datafusion-apply.rs (1)
81-82: LGTM! Correct adaptation to new session properties type.The change from
HashMap::new()toHashMap::new().into()properly converts the empty map to the expectedSessionPropertiesReftype (likelyArc<HashMap<_, _>>) required by the updatedtransform_sql_with_ctxfunction. This aligns with the CLAC feature implementation that uses reference-counted session properties.wren-core/wren-example/examples/calculation-invoke-calculation.rs (1)
84-84: LGTM! Consistent session properties type conversion.Both calls to
transform_sql_with_ctxcorrectly useHashMap::new().into()to convert empty maps to the expectedSessionPropertiesReftype. This maintains consistency with the CLAC feature's session properties handling and aligns with similar changes across other example files.Also applies to: 110-110
wren-core/wren-example/examples/view.rs (2)
15-18: LGTM! Session properties correctly added to analyze call.The addition of
Arc::new(HashMap::default())as the second parameter toAnalyzedWrenMDL::analyzeproperly supports the new session properties parameter for column-level access control. Using an empty HashMap is appropriate for this basic example.
26-26: Proper conversion for updated transform_sql_with_ctx signature.The
.into()conversion correctly adapts to the updatedtransform_sql_with_ctxfunction signature that now expects session properties in a wrapped form (likelySessionPropertiesRef).wren-core/wren-example/examples/row-level-access-control.rs (1)
114-115: Correct API update for session properties handling.The change from passing
propertiesdirectly to usingproperties.into()properly adapts to the updatedtransform_sql_with_ctxsignature. This maintains the existing row-level access control functionality while supporting the new column-level access control infrastructure.wren-core/benchmarks/src/tpch/run.rs (2)
53-56: Proper session properties integration for benchmark code.The addition of
Arc::new(HashMap::default())correctly provides empty session properties to theAnalyzedWrenMDL::analyzecall. This is appropriate for TPC-H benchmark scenarios that don't require access control policies.
68-68: Correct API adaptation for transform_sql_with_ctx.The
.into()conversion properly adapts to the updated function signature that expects session properties in wrapped form, maintaining benchmark functionality while supporting the new CLAC infrastructure.wren-core/wren-example/examples/plan-sql.rs (2)
13-16: Consistent session properties integration.The update to include
Arc::new(HashMap::default())as the second parameter maintains consistency with other example files and properly supports the new session properties requirement for column-level access control.
24-24: Proper function signature adaptation.The
.into()conversion correctly handles the updatedtransform_sql_with_ctxsignature, ensuring the example works with the new CLAC-enabled API while maintaining its core functionality.wren-core/core/src/logical_plan/analyze/plan.rs (3)
10-11: LGTM: Clean type alias to avoid naming conflicts.The renaming of
ColumntoDFColumnclearly distinguishes between DataFusion'sColumntype and the customColumntype used in the manifest. This improves code clarity and prevents naming conflicts.
237-241: Consistent usage of the renamed type alias.The update to use
DFColumn::from_qualified_nameis consistent with the import alias change and maintains the same functionality.
406-406: Function signature updated consistently.All function signatures that previously used
&Columnhave been correctly updated to use&DFColumn, maintaining consistency with the type alias change.ibis-server/app/mdl/core.py (1)
7-10: Well-designed extension for session properties support.The addition of the
propertiesparameter with a default value ofNonemaintains backward compatibility while enabling session-based access control. Usingfrozensetis appropriate since the function is cached and requires hashable parameters.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 7-7: Missing function or method docstring
(C0116)
wren-core-base/src/mdl/manifest.rs (2)
28-31: Consistent CLAC integration across binding types.The addition of
column_level_access_controlimport and macro invocation is properly implemented for both non-Python and Python bindings, ensuring feature parity across different usage contexts.Also applies to: 66-69
300-306: Well-implemented accessor method for CLAC.The
column_level_access_control()method follows Rust conventions by returningOption<Arc<ColumnLevelAccessControl>>and properly handles the optional field with appropriate cloning of the Arc.wren-core-py/tests/test_modeling_core.py (2)
29-43: Comprehensive CLAC configuration in test manifest.The column-level access control configuration is well-structured with:
- Clear naming (
c_name_access)- Proper session property definition
- Appropriate operator (
EQUALS) and threshold ("1")This configuration effectively demonstrates the CLAC feature.
312-315: Improved session properties handling.The change from passing
headersdirectly to usingfrozenset(headers.items())and removing the redundantheadersparameter fromtransform_sqlis correct and aligns with the new session properties architecture.wren-core-base/src/mdl/cls.rs (1)
19-21: LGTM: Import update correctly includes ColumnLevelAccessControl.The import statement properly includes the new
ColumnLevelAccessControltype alongside the existing types.wren-core/core/src/mdl/utils.rs (9)
46-46: LGTM: Explicit handling of ControlFlow return value.Good practice to explicitly ignore the
ControlFlowreturn value fromvisit_expressionsrather than letting it be implicitly dropped.
117-117: LGTM: Consistent explicit handling of ControlFlow return value.Consistently applying the same pattern of explicitly ignoring
ControlFlowreturn values across the codebase.
184-184: LGTM: Consistent explicit handling of ControlFlow return value.Maintaining consistency in explicitly ignoring
ControlFlowreturn values.
248-248: LGTM: Consistent explicit handling of ControlFlow return value.Final instance of the consistent pattern for handling
ControlFlowreturn values.
261-261: LGTM: Added HashMap import for session properties.Correctly added import for
HashMapto support the default session properties parameter.
281-282: LGTM: Updated function signature to include session properties.Correctly updated the test to pass an empty
HashMapas session properties to match the newAnalyzedWrenMDL::analyzesignature. This aligns with the broader CLAC implementation that requires session properties for analysis.
310-311: LGTM: Consistent session properties parameter.Maintaining consistency across all test functions by providing the required session properties parameter.
360-361: LGTM: Consistent session properties parameter.Another instance of the consistent pattern for providing session properties to the analyze function.
381-382: LGTM: Consistent session properties parameter.Final test function correctly updated to include the required session properties parameter.
wren-core-base/manifest-macro/src/lib.rs (3)
187-189: LGTM: Proper migration strategy from CLS to CLAC.The deprecation of the existing
clsfield while introducing the newcolumn_level_access_controlfield provides a clear migration path. The new field correctly usesArc<ColumnLevelAccessControl>for efficient memory sharing.
471-471: LGTM: Consistent deprecation of legacy macro.Properly marking the old
column_level_securitymacro as deprecated to guide users toward the new CLAC implementation.
493-517: LGTM: Well-structured CLAC macro implementation.The new
column_level_access_controlmacro follows the established patterns and correctly includes:
- Conditional Python binding support
- Proper serde configuration with camelCase
- All required fields:
name,required_properties,operator, andthreshold- Consistent structure with other access control macros
The inclusion of
required_propertiesasVec<SessionProperty>is crucial for the session-based access control functionality.ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)
73-87: LGTM: Well-structured CLAC configuration.The
columnLevelAccessControlconfiguration is properly structured with:
- Clear naming (
c_name_access)- Correct session property specification (
session_level)- Appropriate operator (
EQUALS) and threshold (1)- Proper boolean value for
required: FalseThis configuration will effectively test the CLAC functionality where access to the
c_namecolumn depends on thesession_levelsession property being equal to "1".wren-core-base/src/mdl/builder.rs (2)
32-32: LGTM! Clean implementation of CLAC builder support.The changes follow the established builder pattern consistently. The
column_level_access_controlfield is properly initialized asNoneand the builder method correctly creates anArc<ColumnLevelAccessControl>with all necessary parameters.Also applies to: 210-210, 271-286
460-465: Good test coverage for CLAC builder.The test properly verifies that columns with CLAC can be serialized and deserialized correctly.
wren-core/core/src/logical_plan/analyze/access_control.rs (3)
273-305: Solid CLAC validation implementation.The validation logic correctly handles session properties and default expressions. However, the limitation to one required property per CLAC rule (lines 285-290) needs clarification.
Is the single property limitation a design requirement? If so, consider documenting this constraint in the PR description or code comments to help future maintainers understand this architectural decision.
248-250: Good improvement to error messages.The more concise error messages are clearer and easier to read.
Also applies to: 397-397, 407-407, 417-417, 532-532
41-41: Correct handling of visitor return values.Properly ignoring the
ControlFlowreturn values from visitor functions since they're not needed in this context.Also applies to: 131-131
wren-core-py/src/context.rs (2)
50-50: LGTM! Properties field properly integrated.The
propertiesfield is correctly added asArc<HashMap<String, Option<String>>>and properly initialized in the default implementation.Also applies to: 65-65
189-209: Good simplification of transform_sql method.Removing the
propertiesparameter and using the stored field simplifies the API. The method correctly passes the stored properties totransform_sql_with_ctx.Is the
env_logger::try_init()call (line 194) intentionally placed here? It seems unrelated to the CLAC changes and might be better placed in module initialization.wren-core/core/src/mdl/context.rs (2)
72-81: Good practice: Normalizing property keys to lowercaseThis ensures case-insensitive handling of session properties, which improves robustness and user experience.
257-274: Well-implemented CLAC filtering logicThe column filtering implementation correctly:
- Validates each column against CLAC rules using the provided session properties
- Preserves Arc references for valid columns
- Properly handles error propagation
- Efficiently filters out None values using
flatten()wren-core/core/src/mdl/mod.rs (1)
2328-2551: Excellent test coverage for CLAC functionalityThe test suite comprehensively covers:
- Required session properties validation and error handling
- Optional properties with default values
- CLAC rules on calculated fields with proper dependency handling
- Column visibility based on session property values
The use of
assert_snapshot!makes the expected SQL transformations clear and maintainable.
|
Thanks @goldmedal |
Description
This PR introduces the column-level access control(CLAC) for the model in the Rust implementation. CLAC is used to apply a rule to control whether a column is accessible according to the session properties.
Spec
{ "name": "c_name", "type": "varchar", "columnLevelAccessControl": { "name": "c_name_access", "requiredProperties": [ { "name": "session_level", "required": False, } ], "operator": "EQUALS", "threshold": "1", }, },columnLevelAccessControlis an attribute of a column.name: The display name of the rule.requiredProperties: The required properties for this rule. It's the same with feat(core): introduce the row-level access control for the model #1161.operator: The operator is used for the rule. The supported operators:EQUALSNOT_EQUALSGREATER_THANLESS_THANGREATER_THAN_OR_EQUALSLESS_THAN_OR_EQUALSthreshold: The threshold of this rule.Example
The above case shows how to define a CLAC for the column
c_name. If the session properties isWren core will check the following formula:
If it's passed, the column is accessible. Otherwise, the column won't be found.
Summary by CodeRabbit