Skip to content

feat(core): introduce the row-level access control for the model#1161

Merged
douenergy merged 30 commits intoCanner:mainfrom
goldmedal:feat/rls-refactor
May 6, 2025
Merged

feat(core): introduce the row-level access control for the model#1161
douenergy merged 30 commits intoCanner:mainfrom
goldmedal:feat/rls-refactor

Conversation

@goldmedal
Copy link
Copy Markdown
Contributor

@goldmedal goldmedal commented Apr 24, 2025

Description

This PR introduces the row-level access control (RLAC) for the model in the Rust implementation. RLAC is used to apply some filters to a model. According to the carried session properties, we can change the filter condition dynamically.

Spec

[
	{ // model object
		"name": "user",
		"columns": [ .. ],
		"rowLevelAccessControls": [
			{
			    "name": "level rolw region rule",
			    "requiredProperties": [
			        {
			            "name": "SESION_LEVEL",
			            "required": false,
			            "default_expr": "1"
			        },
			        {
			            "name": "SESSION_ROLE",
			            "required": true
			        },
			        {
			            "name": "SESSION_REGION",
			            "required": false
			        }
			    ],
			    "condition": "level = @SESSION_LEVEL AND role = @SESSION_ROLE OR region = @SESSION_REGION"
			},
		]
	}
]
  • name : the display rule name
    • To carry the session property, the user needs to carry the prefix X-Wren-Variable- for it.
  • requiredProperties : the required properties of this rule.
    • name : The name of the property. It’s case-insensitive.
    • required :
      • If true, it’s a required session property. The request will be rejected without the property.
      • If false, it’s an optional session property. The request will be ignored without the property.
    • default_expr : The default value of the property. If it’s an optional property, the default value will be applied if the session property is missing.
    • If any required property is missing in the list, the request is rejected.
    • If any optional property without the default is missing in the list, the rule is ignored.
  • condition: The condition will be applied to the model. It should be a boolean expression. It will be appended to the WHERE clause of the model.
    • Any identifier is started with @ means that it’s a session property. The engine will replace the value in the header for it.

Session Property Header

  • To carry the session property, the user must carry the prefix X-Wren-Variable-. For example, given a required session property called SESSION_LEVEL. The input header should be named X-WREN-VARIABLE-SESSION_LEVEL.
  • The value is only allowed to be a literal expression.

Example

There are some examples to show how it works

All property

X-Wren-Variable-SESION_LEVEL: 2
X-Wren-Variable-SESSION_ROLE: 'manager'
X-Wren-Variable-SESSION_REGION: 'US'

SELECT * FROM orders
# Result: 
SELECT * FROM orders WHERE level = 2 AND role = 'manager' OR region = 'US'

Missing Optional with default

apply the default value

X-Wren-Variable-SESSION_ROLE: 'manager'
X-Wren-Variable-SESSION_REGION: 'US'

SELECT * FROM orders
# Result: 
SELECT * FROM orders WHERE level = 1 AND role = 'manager' OR region = 'US'

Missing Optional without default

  • ignore the rule
X-Wren-Variable-SESION_LEVEL: 2
X-Wren-Variable-SESSION_ROLE: 'manager'


SELECT * FROM orders
# Result: 
SELECT * FROM orders

Missing the required property

  • The request fails
X-Wren-Variable-SESION_LEVEL: 2
X-Wren-Variable-SESSION_REGION: 'US'

SELECT * FROM orders
# Result:
sql failure

The Rust example

Check wren-example/examples/row-level-access-control.rs for an executable example.

cargo run --package wren-example --example row-level-access-control

Known Issues

  • The condition should consider CompoundIdentifier and CompoundFieldAccess

Summary by CodeRabbit

  • New Features

    • Introduced row-level access control (RLAC) for models, enabling fine-grained data filtering based on session properties.
    • Added support for defining RLAC rules and session properties in manifests.
    • Session properties from request headers are now integrated into the query rewriting and execution pipeline to enforce RLAC.
    • Added a new API header prefix to pass session variables for RLAC.
    • Provided a new example demonstrating multi-tenant RLAC usage with DataFusion.
  • Bug Fixes

    • Enhanced handling of case-insensitive session property names in RLAC.
  • Tests

    • Added extensive tests validating RLAC behavior with required and optional session properties and verifying correct SQL rewriting.
  • Documentation

    • Updated manifests and examples to illustrate RLAC configuration and usage.
  • Chores

    • Updated dependencies and test utilities to support new RLAC features.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 24, 2025

Walkthrough

This set of changes introduces comprehensive support for row-level access control (RLAC) based on session properties throughout the Wren modeling and query stack. The Rust modeling core, procedural macros, and builder patterns are extended to allow models to declare RLAC rules with required or optional session properties and filter conditions. The logical plan analysis and generation layers are refactored to propagate session properties, validate RLAC rules, and inject filter expressions into query plans. Python bindings and server-side components are updated to pass header-derived session properties into the rewriting and transformation pipeline. Extensive tests and new examples are added to verify RLAC behavior, including error cases, property case-insensitivity, and integration with DataFusion. Supporting utilities and constants are introduced in both Python and Rust codebases.

Changes

File(s) Change Summary
ibis-server/app/mdl/rewriter.py, ibis-server/app/routers/v3/connector.py, ibis-server/app/dependencies.py Extended rewriter and router logic to accept and propagate session properties from headers. Added utility for detecting RLAC headers.
ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py, ibis-server/tests/routers/v3/connector/postgres/test_query.py Added tests for RLAC header presence and RLAC query filtering with session properties in v3 API.
wren-core-base/manifest-macro/src/lib.rs, wren-core-base/src/mdl/builder.rs, wren-core-base/src/mdl/manifest.rs Introduced RLAC structures, macros, and builder methods. Marked old RLS macros as deprecated. Added accessors for RLAC rules.
wren-core-base/tests/data/mdl.json Added RLAC rules to the "customer" model for test coverage.
wren-core-py/src/context.rs, wren-core-py/src/manifest.rs Updated session context and SQL transformation to handle optional session properties. Adjusted tests for new RLAC field.
wren-core-py/tests/test_modeling_core.py Added RLAC test using session properties and validated SQL rewriting.
wren-core/Cargo.toml, wren-core/core/Cargo.toml Added insta as a dev dependency for snapshot testing.
wren-core/benchmarks/src/tpch/run.rs, wren-core/sqllogictest/src/test_context.rs, wren-core/wren-example/examples/calculation-invoke-calculation.rs, wren-core/wren-example/examples/datafusion-apply.rs, wren-core/wren-example/examples/plan-sql.rs, wren-core/wren-example/examples/to-many-calculation.rs, wren-core/wren-example/examples/view.rs Updated function calls to pass empty session properties as required by new signatures.
wren-core/wren-example/examples/row-level-access-control.rs Added new example demonstrating RLAC in a multi-tenant scenario.
wren-core/core/src/logical_plan/analyze/access_control.rs New module implementing RLAC condition parsing, validation, and filter expression construction. Includes comprehensive tests.
wren-core/core/src/logical_plan/analyze/mod.rs Declared new access_control module.
wren-core/core/src/logical_plan/analyze/model_anlayze.rs, wren-core/core/src/logical_plan/analyze/model_generation.rs, wren-core/core/src/logical_plan/analyze/plan.rs, wren-core/core/src/logical_plan/analyze/relation_chain.rs Refactored to propagate session properties, validate RLAC rules, and inject RLAC filters into logical plans.
wren-core/core/src/mdl/context.rs, wren-core/core/src/mdl/mod.rs Added session properties type and propagated through context creation and SQL transformation. Added RLAC tests using snapshot assertions.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API_Server
    participant Rewriter
    participant EmbeddedEngine
    participant SessionContext
    participant WrenEngine

    Client->>API_Server: POST /query with headers (session props)
    API_Server->>Rewriter: rewrite(sql, properties=dict(headers))
    Rewriter->>EmbeddedEngine: rewrite(manifest, sql, properties)
    EmbeddedEngine->>SessionContext: transform_sql(sql, session_properties)
    SessionContext->>WrenEngine: transform_sql_with_ctx(..., session_properties)
    WrenEngine-->>SessionContext: rewritten SQL (with RLAC filter)
    SessionContext-->>EmbeddedEngine: rewritten SQL
    EmbeddedEngine-->>Rewriter: rewritten SQL
    Rewriter-->>API_Server: rewritten SQL
    API_Server-->>Client: Query result
Loading

Suggested reviewers

  • wwwy3y3

Poem

In the warren where the data flows,
A new control on rows now grows—
With session keys and headers neat,
Only the right rabbits get their treat.
RLAC hops in, so clever and spry,
Guarding the carrots as queries go by!
🥕✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 15e307b and 1c2b05d.

📒 Files selected for processing (1)
  • ibis-server/app/routers/v3/connector.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ibis-server/app/routers/v3/connector.py
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: cargo check
  • GitHub Check: test
  • GitHub Check: ci
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added core ibis dependencies Pull requests that update a dependency file rust Pull requests that update Rust code python Pull requests that update Python code labels Apr 24, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🔭 Outside diff range comments (4)
wren-core/core/src/mdl/context.rs (1)

67-69: ⚠️ Potential issue

Fix cargo fmt error.

The pipeline reports a formatting issue on this line.

Run cargo fmt on this file to fix the formatting issue.

🧰 Tools
🪛 GitHub Actions: Rust

[error] 69-69: cargo fmt check failed due to formatting differences. Code style does not match expected formatting.

ibis-server/app/routers/v3/connector.py (1)

304-313: 🛠️ Refactor suggestion

Missing RLAC fallback check in model_substitute endpoint.

Unlike other endpoints, the model_substitute endpoint doesn't check for RLAC headers before falling back to v2.

Add the same RLAC fallback check logic to maintain consistency:

         try:
             sql = ModelSubstitute(data_source, dto.manifest_str, headers).substitute(
                 dto.sql
             )
             Connector(data_source, dto.connection_info).dry_run(
                 await Rewriter(
                     dto.manifest_str,
                     data_source=data_source,
                     experiment=True,
+                    properties=dict(headers),
                 ).rewrite(sql)
             )
             return sql
         except Exception as e:
+            # because the v2 API doesn't support row-level access control,
+            # we don't fallback to v2 if the header include row-level access control properties.
+            if exist_wren_variables_header(headers):
+                raise e
+            else:
                 logger.warning(
                     "Failed to execute v3 model-substitute, fallback to v2: {}", str(e)
                 )
                 return await v2.connector.model_substitute(
                     data_source, dto, headers, java_engine_connector
                 )
wren-core-base/manifest-macro/src/lib.rs (1)

1-555: ⚠️ Potential issue

cargo fmt is failing – please format the macro crate

The Rust workflow reports a cargo fmt mismatch.
Run cargo fmt --all before merging to unblock the pipeline.

🧰 Tools
🪛 GitHub Actions: Rust

[error] 358-360: cargo fmt check failed due to formatting differences. Code style does not match expected formatting.

wren-core/core/src/logical_plan/analyze/access_control.rs (1)

1-544: ⚠️ Potential issue

cargo fmt is failing for this file

CI indicates formatting differences.
Run cargo fmt --all to keep the style consistent.

🧰 Tools
🪛 GitHub Actions: Rust

[error] 93-93: cargo fmt check failed due to formatting differences. Code style does not match expected formatting.

🧹 Nitpick comments (16)
wren-core/wren-example/examples/plan-sql.rs (1)

17-24: Updated function call to support session properties.

The transform_sql_with_ctx call has been properly updated to include the new HashMap parameter for session properties, which is required for the row-level access control feature. This empty map is appropriate for this example since no specific session properties are needed.

Would it be helpful to add an example that demonstrates the use of session properties with row-level access control? This would provide developers with a clear usage pattern for the new RLAC feature.

wren-core-py/tests/test_modeling_core.py (1)

290-300: Test provides good coverage of RLAC functionality.

This test validates that row-level access control conditions are correctly applied during SQL transformation, verifying that the condition "c_name = 'test_user'" is added to the WHERE clause. The test confirms RLAC is properly integrated with the SQL transformation pipeline.

However, consider adding a negative test case where the session_user header is missing to verify the behavior when optional RLAC properties are not provided.

+ def test_rlac_without_properties():
+     session_context = SessionContext(manifest_str, None)
+     sql = "SELECT * FROM my_catalog.my_schema.customer"
+     rewritten_sql = session_context.transform_sql(sql, {})
+     assert (
+         rewritten_sql
+         == "SELECT customer.c_custkey, customer.c_name FROM (SELECT customer.c_custkey, customer.c_name FROM (SELECT __source.c_custkey AS c_custkey, __source.c_name AS c_name FROM main.customer AS __source) AS customer) AS customer"
+     )
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)

432-463: Thorough RLAC integration testing with header case sensitivity checks.

The test effectively validates that:

  1. RLAC filters are properly applied when querying data via the API
  2. The headers are case-insensitive (testing both lowercase and uppercase variants)
  3. The filtered data contains only the matching rows

This provides excellent coverage of the RLAC functionality at the API level.

Consider adding a test case for when the session variable is not provided to verify the system's behavior with optional RLAC properties.

+ async def test_rlac_query_without_session(client, manifest_str, connection_info):
+     response = await client.post(
+         url=f"{base_url}/query",
+         json={
+             "connectionInfo": connection_info,
+             "manifestStr": manifest_str,
+             "sql": "SELECT c_name FROM customer",
+         },
+     )
+     # Since session_user is optional, the query should return all rows
+     assert response.status_code == 200
+     result = response.json()
+     assert len(result["data"]) > 1
wren-core/wren-example/examples/row_level_access_control.rs.rs (3)

75-85: Consider simplifying session property setup.

The session properties setup is verbose with repeated insertion patterns.

Consider using a more concise approach:

-    let mut properties = HashMap::new();
-    properties.insert(
-        "session_tenant_id".to_string(),
-        Some("'1acdef01-aaaa-aaaa-aaaa-aaaaaaaaaaaa'".to_string()),
-    );
-    properties.insert(
-        "session_department".to_string(),
-        Some("'engineering'".to_string()),
-    );
-    properties.insert("session_user_id".to_string(), Some("'1003-u3'".to_string()));
-    properties.insert("session_role".to_string(), Some("'ADMIN'".to_string()));
+    let properties: HashMap<String, Option<String>> = [
+        ("session_tenant_id", "'1acdef01-aaaa-aaaa-aaaa-aaaaaaaaaaaa'"),
+        ("session_department", "'engineering'"),
+        ("session_user_id", "'1003-u3'"),
+        ("session_role", "'ADMIN'"),
+    ]
+    .iter()
+    .map(|(k, v)| (k.to_string(), Some(v.to_string())))
+    .collect();

87-111: Extract property printing to a helper function.

The code for printing session properties is repetitive.

Consider refactoring to a helper function:

-    println!("#####################");
-    println!(
-        "session_tenant_id: {}",
-        &properties
-            .get("session_tenant_id")
-            .unwrap()
-            .clone()
-            .unwrap()
-    );
-    println!(
-        "session_department: {}",
-        &properties
-            .get("session_department")
-            .unwrap()
-            .clone()
-            .unwrap()
-    );
-    println!(
-        "session_user_id: {}",
-        &properties.get("session_user_id").unwrap().clone().unwrap()
-    );
-    println!(
-        "session_role: {}",
-        &properties.get("session_role").unwrap().clone().unwrap()
-    );
+    println!("#####################");
+    for key in &["session_tenant_id", "session_department", "session_user_id", "session_role"] {
+        println!(
+            "{}: {}",
+            key,
+            properties.get(*key).unwrap().clone().unwrap()
+        );
+    }

1-1: File name has duplicate extension.

The file is named "row_level_access_control.rs.rs" with a duplicate ".rs" extension.

Rename the file to "row_level_access_control.rs" to follow standard Rust naming conventions.

wren-core/core/src/logical_plan/analyze/model_generation.rs (1)

86-108: Combine RLAC filters with Iterator::flatten().reduce() for clearer intent

The current reduce chain works, but the nested Option handling and the unwrap() call make the control-flow a bit hard to follow and slightly brittle. A more idiomatic approach is to:

  1. filter_map away the Nones first,
  2. then call reduce(|acc, f| acc.and(f)).

That avoids the manual is_none/unwrap juggling and the risk of accidentally moving values out of the accumulator.

-let filters: Vec<Option<Expr>> = model_plan
-    .model
-    .row_level_access_controls()
-    .iter()
-    .map(|rule| {
-        self.generate_row_level_access_control_filter(
-            Arc::clone(&model_plan.model),
-            rule,
-        )
-    })
-    .collect::<Result<_>>()?;
-let rls_filter = filters
-    .into_iter()
-    .reduce(|acc, filter| {
-        if acc.is_none() {
-            filter
-        } else if let Some(filter) = filter {
-            Some(acc.unwrap().and(filter))
-        } else {
-            acc
-        }
-    })
-    .flatten();
+let rls_filter = model_plan
+    .model
+    .row_level_access_controls()
+    .iter()
+    .filter_map(|rule| {
+        // `ok()` converts Err to None, so error is still bubbled up by `?`
+        self.generate_row_level_access_control_filter(
+            Arc::clone(&model_plan.model),
+            rule,
+        ).transpose()
+    })
+    .flatten()
+    .reduce(|acc, f| acc.and(f));

Cleaner, easier to reason about, and no unwrap().

wren-core/core/src/mdl/mod.rs (2)

358-366: Property key normalisation happens late – guard against mismatched cases up-front

properties_ref is constructed and passed into create_ctx_with_mdl,
which performs normalisation. Any code paths that access
SessionPropertiesRef before this normalisation (now or in future
refactors) would silently fail look-ups due to case mismatch.

Mitigation idea: normalise the HashMap right here (e.g. convert all
keys to lowercase) so every subsequent consumer gets a consistent view
regardless of call site.


2125-2133: Tiny helper can be simplified with collect()

build_headers manually populates a HashMap. Using iterator tooling
keeps it concise and avoids an intermediate mutable map:

-fn build_headers(
-    field: &[(String, Option<String>)],
-) -> HashMap<String, Option<String>> {
-    let mut headers = HashMap::new();
-    for (key, value) in field {
-        headers.insert(key.clone(), value.clone());
-    }
-    headers
-}
+fn build_headers(
+    fields: &[(String, Option<String>)],
+) -> HashMap<String, Option<String>> {
+    fields
+        .iter()
+        .cloned()
+        .collect()
+}

Nip-level, but reduces repetitive code.

wren-core/core/src/logical_plan/analyze/plan.rs (1)

376-392: Duplicate required_fields entries may inflate planning time

add_required_columns_from_session_properties blindly extends the
vector with every column mentioned in RLAC conditions. Although later
deduplication via a BTreeSet eventually removes duplicates, the
intermediate vectors can grow quadratically when many overlapping RLAC
rules exist.

A low-cost improvement is to insert into a HashSet first or to
retain only new expressions before extending, e.g.:

let mut seen = HashSet::new();
for expr in collect_condition(model, &rule.condition)?.0 {
    if seen.insert(expr.to_string()) {   // or a custom key
        required_fields.push(expr);
    }
}

This shaves unnecessary Expr cloning and stringification work during
large manifests.

wren-core-base/manifest-macro/src/lib.rs (2)

360-383: Missing serde defaults & potential deserialization pitfalls

  1. RowLevelAccessControl.required_properties already has #[serde(default)], great.
    However, in many manifests the condition field will be the only mandatory one. To guard against accidental omission of required_properties you may also want
    #[serde(default)] pub required_properties: Vec<SessionProperty>, in the macro for completeness (mirrors other structs).

  2. SessionProperty struct (see below) does not mark default_expr as #[serde(default)]. Deserialising a manifest that omits that field will fail unless the user provides null. Consider:

-pub default_expr: Option<String>,
+#[serde(default)]
+pub default_expr: Option<String>,
  1. Remember to re-run cargo fmt; the CI formatting check is currently failing.

386-406: Expose helper ctors inside the macro output

The tests rely on SessionProperty::new_required/optional, but these helpers are not generated here.
If they live elsewhere, ignore this note; otherwise, add simple inherent impls to make authoring manifests easier:

impl SessionProperty {
    pub fn new_required(name: impl Into<String>) -> Self {
        Self { name: name.into(), required: true, default_expr: None }
    }
    pub fn new_optional(name: impl Into<String>, default_expr: Option<String>) -> Self {
        Self { name: name.into(), required: false, default_expr }
    }
}

This keeps the public API symmetric with the newly introduced struct.

wren-core/core/src/logical_plan/analyze/access_control.rs (4)

31-32: Typo in variable name

seesion_propertiessession_properties. While harmless, typos hinder readability and can propagate to log messages.

-let mut seesion_properties: HashSet<String> = HashSet::new();
+let mut session_properties: HashSet<String> = HashSet::new();

49-54: Potential duplicate column expressions

When the same column appears multiple times in a condition, it is pushed repeatedly into conditions.
Using a HashSet avoids redundant work later:

-let mut conditions = vec![];
+let mut seen = HashSet::new();
+let mut conditions = Vec::new();
 ...
-if !value.starts_with("@") {
-    ...
-    conditions.push(Expr::Column(...));
+if !value.starts_with("@") {
+    ...
+    if seen.insert(value) {
+        conditions.push(Expr::Column(...));
+    }

93-140: Repeated parsing of identical session values

parse_expr(property_value) is invoked for every occurrence of the same @property token, causing unnecessary work.
Caching the parsed Expr once per property will speed up large expressions:

-visit_expressions_mut(&mut expr, |expr| {
+let mut cache = HashMap::new();
+visit_expressions_mut(&mut expr, |expr| {
     ...
-    match parse_expr(property_value) {
+    let parsed = cache.entry(property_name.clone())
+        .or_insert_with(|| parse_expr(property_value));
+    match parsed {
🧰 Tools
🪛 GitHub Actions: Rust

[error] 93-93: cargo fmt check failed due to formatting differences. Code style does not match expected formatting.


150-152: Double parsing round-trip may be avoidable

expr.to_string()create_logical_expr() re-parses the SQL.
If datafusion::sql::planner::SqlToRel exposes a method that converts an AST Expr directly, you could pass the already-parsed tree and skip one parse cycle, improving performance.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c31174 and 997c425.

⛔ Files ignored due to path filters (4)
  • wren-core-py/Cargo.lock is excluded by !**/*.lock
  • wren-core/wren-example/data/company/documents.csv is excluded by !**/*.csv
  • wren-core/wren-example/data/company/tenants.csv is excluded by !**/*.csv
  • wren-core/wren-example/data/company/users.csv is excluded by !**/*.csv
📒 Files selected for processing (30)
  • ibis-server/app/mdl/rewriter.py (4 hunks)
  • ibis-server/app/routers/v3/connector.py (6 hunks)
  • ibis-server/app/util.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (2 hunks)
  • wren-core-base/manifest-macro/src/lib.rs (6 hunks)
  • wren-core-base/src/mdl/builder.rs (8 hunks)
  • wren-core-base/src/mdl/manifest.rs (5 hunks)
  • wren-core-base/tests/data/mdl.json (1 hunks)
  • wren-core-py/src/context.rs (3 hunks)
  • wren-core-py/src/manifest.rs (2 hunks)
  • wren-core-py/tests/test_modeling_core.py (2 hunks)
  • wren-core/Cargo.toml (1 hunks)
  • wren-core/benchmarks/src/tpch/run.rs (2 hunks)
  • wren-core/core/Cargo.toml (1 hunks)
  • wren-core/core/src/logical_plan/analyze/access_control.rs (1 hunks)
  • wren-core/core/src/logical_plan/analyze/mod.rs (1 hunks)
  • wren-core/core/src/logical_plan/analyze/model_anlayze.rs (5 hunks)
  • wren-core/core/src/logical_plan/analyze/model_generation.rs (7 hunks)
  • wren-core/core/src/logical_plan/analyze/plan.rs (15 hunks)
  • wren-core/core/src/logical_plan/analyze/relation_chain.rs (6 hunks)
  • wren-core/core/src/mdl/context.rs (7 hunks)
  • wren-core/core/src/mdl/mod.rs (39 hunks)
  • wren-core/sqllogictest/src/test_context.rs (2 hunks)
  • wren-core/wren-example/examples/calculation-invoke-calculation.rs (2 hunks)
  • wren-core/wren-example/examples/datafusion-apply.rs (1 hunks)
  • wren-core/wren-example/examples/plan-sql.rs (2 hunks)
  • wren-core/wren-example/examples/row_level_access_control.rs.rs (1 hunks)
  • wren-core/wren-example/examples/to-many-calculation.rs (1 hunks)
  • wren-core/wren-example/examples/view.rs (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (9)
wren-core-py/src/manifest.rs (1)
wren-core-base/src/mdl/manifest.rs (1)
  • row_level_access_controls (271-273)
wren-core/benchmarks/src/tpch/run.rs (1)
wren-core/core/src/mdl/mod.rs (3)
  • transform_sql_with_ctx (350-386)
  • new (117-172)
  • new (426-428)
wren-core/wren-example/examples/view.rs (1)
wren-core/core/src/mdl/mod.rs (3)
  • transform_sql_with_ctx (350-386)
  • new (117-172)
  • new (426-428)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (2)
ibis-server/tests/conftest.py (1)
  • client (18-23)
ibis-server/tests/routers/v3/connector/postgres/conftest.py (1)
  • connection_info (37-44)
wren-core/core/src/logical_plan/analyze/model_anlayze.rs (2)
wren-core/core/src/mdl/context.rs (1)
  • properties (72-78)
wren-core/sqllogictest/src/test_context.rs (1)
  • analyzed_wren_mdl (104-106)
wren-core/core/src/logical_plan/analyze/relation_chain.rs (4)
wren-core/core/src/mdl/context.rs (1)
  • properties (72-78)
wren-core/core/src/mdl/utils.rs (1)
  • quoted (209-211)
wren-core/core/src/logical_plan/analyze/model_generation.rs (1)
  • model_plan (86-96)
wren-core/core/src/logical_plan/analyze/plan.rs (1)
  • plan_name (84-86)
wren-core-base/src/mdl/builder.rs (3)
wren-core-base/src/mdl/manifest.rs (4)
  • name (245-247)
  • name (278-280)
  • name (289-291)
  • name (295-297)
wren-core/core/src/logical_plan/analyze/access_control.rs (1)
  • required_properties (173-190)
wren-core-base/src/mdl/cls.rs (1)
  • new (41-55)
wren-core/core/src/mdl/context.rs (3)
wren-core/core/src/logical_plan/analyze/model_generation.rs (1)
  • new (38-48)
wren-core/core/src/logical_plan/analyze/plan.rs (6)
  • new (69-82)
  • new (109-124)
  • new (649-651)
  • new (766-860)
  • new (914-961)
  • new (1020-1022)
wren-core/core/src/logical_plan/analyze/model_anlayze.rs (2)
  • new (69-79)
  • new (794-802)
wren-core-base/manifest-macro/src/lib.rs (2)
wren-core-base/src/mdl/manifest.rs (5)
  • row_level_access_controls (271-273)
  • name (245-247)
  • name (278-280)
  • name (289-291)
  • name (295-297)
wren-core/core/src/logical_plan/analyze/access_control.rs (1)
  • required_properties (173-190)
🪛 GitHub Actions: Rust
wren-core/core/src/mdl/context.rs

[error] 69-69: cargo fmt check failed due to formatting differences. Code style does not match expected formatting.

wren-core/core/src/logical_plan/analyze/access_control.rs

[error] 93-93: cargo fmt check failed due to formatting differences. Code style does not match expected formatting.

wren-core-base/manifest-macro/src/lib.rs

[error] 358-360: cargo fmt check failed due to formatting differences. Code style does not match expected formatting.

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: test
  • GitHub Check: ci
🔇 Additional comments (61)
wren-core/core/Cargo.toml (1)

27-27: Good addition of snapshot testing capability.

Adding the insta crate is appropriate for implementing snapshot tests for the new row-level access control features. This will enable more robust testing of the RLAC functionality.

wren-core/core/src/logical_plan/analyze/mod.rs (1)

1-1: Appropriate module addition for row-level access control.

Adding the access_control module is a clean way to encapsulate the new row-level access control functionality within the logical plan analysis system.

wren-core-py/src/manifest.rs (1)

46-47: Updated test models with row-level access control field.

The test models have been correctly updated to include empty vectors for the new row_level_access_controls field, ensuring the tests remain compatible with the updated Model struct definition.

Also applies to: 57-58

wren-core/wren-example/examples/plan-sql.rs (1)

2-2: Added HashMap import for session properties support.

Good addition of the HashMap import to support the new session properties parameter in RLAC-related functions.

ibis-server/app/util.py (1)

110-115: Well-implemented utility function for RLAC header detection

The new exist_wren_variables_header function correctly checks for the presence of row-level access control variable headers. The implementation is simple and efficient, first checking if headers are None before using a generator expression with any() to search for keys with the "x-wren-variables-" prefix.

wren-core/wren-example/examples/calculation-invoke-calculation.rs (2)

84-84: Correctly updated function call to support RLAC

The function call has been properly updated to include the new empty HashMap parameter required for the row-level access control feature. This matches the updated signature of transform_sql_with_ctx as shown in the relevant code snippets.


110-110: Correctly updated function call to support RLAC

The function call has been properly updated to include the new empty HashMap parameter required for the row-level access control feature. This matches the updated signature of transform_sql_with_ctx.

wren-core/benchmarks/src/tpch/run.rs (2)

6-6: Correctly added HashMap import

The import for HashMap has been properly added to support the usage of HashMap::new() in the updated function call.


61-68: Correctly updated transform_sql_with_ctx call to support RLAC

The function call has been properly updated to include the new empty HashMap parameter required for the row-level access control feature. The update maintains the existing functionality while adapting to the new function signature.

wren-core/wren-example/examples/datafusion-apply.rs (1)

81-82: Correctly updated transform_sql_with_ctx call to support RLAC

The function call has been properly updated to include the new empty HashMap parameter required for the row-level access control feature. The code has been properly reformatted across two lines to maintain readability.

wren-core/wren-example/examples/to-many-calculation.rs (1)

79-80: Function call updated to support row-level access control.

The call to create_ctx_with_mdl has been properly modified to pass an empty HashMap of session properties, which is required for the new row-level access control (RLAC) functionality. This change ensures the example code aligns with the updated API while maintaining its original behavior.

ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py (1)

160-170: LGTM - Good test coverage for RLAC headers.

This test effectively verifies that fallback to v2 API is prevented when RLAC headers are present, confirming proper isolation of row-level access control functionality. The test structure is consistent with other tests in the file and correctly asserts the expected 422 status code.

wren-core/wren-example/examples/view.rs (2)

1-1: Appropriate import added for new HashMap parameter.

The HashMap import is correctly added to support the updated function call signature.


19-26: Function call updated to support row-level access control.

The call to transform_sql_with_ctx has been properly updated to include the new HashMap parameter required for row-level access control support. This change aligns with the function signature modification shown in the relevant code snippet from wren-core/core/src/mdl/mod.rs.

wren-core/sqllogictest/src/test_context.rs (2)

304-310: Function call updated to support row-level access control.

The call to create_ctx_with_mdl in the register_ecommerce_mdl function has been properly updated with an empty Arc<HashMap> as the session properties parameter. This modification correctly implements the interface changes for RLAC support while preserving test behavior.


540-546: Function call updated to support row-level access control.

The call to create_ctx_with_mdl in the register_tpch_mdl function has been properly updated with an empty Arc<HashMap> as the session properties parameter, consistent with the changes throughout the codebase for RLAC support.

wren-core-py/tests/test_modeling_core.py (1)

28-39: Row-level access controls are well configured in the manifest.

The RLAC configuration properly defines a named rule "customer_access" with a condition that filters rows based on the session_user variable. The required property is correctly marked as optional (required: False), allowing queries to execute even when session_user isn't provided.

wren-core-base/tests/data/mdl.json (1)

37-69: Comprehensive RLAC rule definitions with good test coverage.

The implementation includes three different RLAC rules demonstrating various configurations:

  1. Rule1: With a required property (mandatory)
  2. Rule2: With an optional property
  3. Rule3: With an optional property and a default expression

This provides excellent test coverage for different RLAC scenarios, particularly the default expression handling which is important for fallback behavior.

wren-core-base/src/mdl/manifest.rs (4)

23-24: Appropriate deprecation handling with compiler directives.

The #[allow(deprecated)] annotations are correctly applied to suppress warnings when using deprecated features. This is good practice during transitions while maintaining backward compatibility.

Also applies to: 60-61


29-31: Macro imports properly organized.

The imports of the new macros row_level_access_control and session_property are well-organized alongside related macros, making the code organized and maintainable.

Also applies to: 66-68


49-51: Consistent macro invocation pattern.

The macro invocations follow the same pattern as other macro calls, with boolean parameters indicating whether Python bindings should be generated. This maintains consistency with the codebase's established patterns.

Also applies to: 88-90


271-273: Public accessor for RLAC follows established patterns.

The row_level_access_controls() method adheres to the existing accessor pattern in the Model implementation, providing a consistent API for retrieving RLAC rules. This ensures proper encapsulation while exposing the necessary functionality.

ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)

81-92: Well-defined RLAC configuration for integration testing.

The RLAC configuration in the test manifest properly defines a rule with a condition that filters customer rows based on the session_user variable. This provides a good foundation for testing the end-to-end RLAC functionality with the Postgres connector.

wren-core/core/src/logical_plan/analyze/model_anlayze.rs (5)

3-3: Adding SessionPropertiesRef for row-level access control support

This import introduces the SessionPropertiesRef type, which is a key component for implementing row-level access control (RLAC) in the modeling engine.


37-37: New properties field for session-level RLAC

Adding the properties field to ModelAnalyzeRule enables the rule to access and apply session-specific access control policies during model analysis.


72-72: Constructor updated to accept session properties

The constructor now correctly initializes the properties field from the provided parameter, maintaining the pattern of using Arc for shared ownership.

Also applies to: 77-77


456-457: Passing session properties to ModelPlanNode in table scan analysis

Session properties are now passed to ModelPlanNode during table scan analysis, ensuring that row-level access controls will be applied.


513-514: Passing session properties to ModelPlanNode in subquery alias model analysis

Session properties are consistently passed to ModelPlanNode in the subquery alias analysis path, ensuring that row-level access controls will be applied regardless of how the model is referenced.

wren-core-py/src/context.rs (4)

23-23: Add HashMap import for session properties

This import supports the new session properties functionality which uses a HashMap to store key-value pairs.


127-134: Initialize context with empty session properties

The context creation now includes an empty HashMap wrapped in an Arc for session properties, aligning with the updated core API.


145-153: Update transform_sql signature to support session properties

The method now accepts optional SQL and session properties, with proper null checking for the SQL parameter. This enables passing RLAC properties for SQL transformation.


161-162: Pass session properties to transform_sql_with_ctx

Session properties are now correctly passed to the core transformation function, with a sensible default to an empty map when not provided.

wren-core/core/src/logical_plan/analyze/relation_chain.rs (5)

9-9: Add SessionPropertiesRef import for RLAC support

This import adds the session properties reference type needed for row-level access control.


67-67: Suppress too_many_arguments warning

Appropriate use of the Clippy allowance for a method with many parameters that are all necessary for the function's purpose.


76-76: Add SessionPropertiesRef parameter to with_chain method

This parameter enables passing session properties through the relation chain construction, necessary for RLAC enforcement.


108-109: Pass session properties to ModelPlanNode

Session properties are now correctly passed when constructing model plan nodes in the relation chain, ensuring RLAC rules will be applied.


203-204: Update to use plan_name() method instead of direct field access

This reflects an architectural improvement where ModelPlanNode now holds a reference to the full Model object rather than just storing its name as a string.

Also applies to: 253-254

wren-core-base/src/mdl/builder.rs (6)

25-29: Add deprecated annotations for security-related imports

Appropriately marking deprecated security components while transitioning to the new RLAC model.


114-114: Initialize row_level_access_controls in Model struct

Adding the new field for storing RLAC rules in the Model, initialized as an empty vector.


154-167: Add row-level access control builder method

This method enables adding RLAC rules to a model with a clear API for specifying the rule name, required properties, and condition.


174-189: Implement SessionProperty constructors

These constructors provide a clean API for creating required and optional session properties, with support for default expressions.


495-512: Add RLAC rules to test model

The test now includes multiple RLAC rules demonstrating different property configurations: required, optional without default, and optional with default.


700-717: Add RLAC rules to JSON serialization test

The test now verifies that RLAC rules with different property types are correctly serialized and deserialized in the JSON format.

wren-core/core/src/mdl/context.rs (5)

45-45: Proper type alias definition for session properties.

This new type definition appropriately uses Arc to enable thread-safe sharing of session properties across contexts. The Option value allows representing both present and absent values.


71-78: Good practice to normalize property keys to lowercase.

This ensures case-insensitive lookups for session properties, which is important for consistent user experience when dealing with HTTP headers that might arrive with different casing.


48-53: API extension to support session properties.

Well-structured function signature update to include the new session properties parameter for row-level access control.


104-108: Consistent propagation of session properties.

The updated function signature ensures that session properties are passed through the analyzer rule creation pipeline.


136-140: Consistent parameter extension across related functions.

Good consistency in propagating the session properties to both local runtime and unparsing analyzer rules.

ibis-server/app/mdl/rewriter.py (4)

27-41: Store session properties in the Rewriter class.

Good addition of the properties parameter to the constructor. This allows for session context to be carried through the rewriting process.


59-59: Pass properties to the underlying rewriter.

The Rewriter.rewrite method correctly forwards the properties to the underlying rewriter implementation.


120-129: EmbeddedEngineRewriter correctly handles session properties.

The function properly passes filtered session properties to the transform_sql method.


133-143: Well-implemented session properties filtering.

Good implementation of property filtering to extract RLAC-specific headers. The method handles null properties and focuses only on headers with the "x-wren-variables-" prefix.

It would be helpful to add a docstring explaining that this method:

  1. Filters headers starting with "x-wren-variables-"
  2. Removes this prefix from the keys before passing them to the engine
wren-core/wren-example/examples/row_level_access_control.rs.rs (2)

1-131: Well-structured example demonstrating row-level access control.

This example clearly shows how to implement RLAC in a multi-tenant scenario. It demonstrates:

  1. Setting up session properties for tenant, department, user, and role
  2. Applying RLAC rules based on these properties
  3. Transforming and executing SQL with access controls applied

167-177: Clear row-level access control rule definitions.

The RLAC rules are well-defined with clear comments explaining their purpose:

  1. Multi-tenant isolation ensuring users only see documents in their tenant
  2. Role-based access control with different permissions for members vs. admins

The use of SessionProperty objects to define required and optional parameters is well-structured.

ibis-server/app/routers/v3/connector.py (5)

71-75: Pass request headers as properties to Rewriter.

The Rewriter class now receives the request headers as properties, enabling session context to influence query rewriting.


142-162: Prevent fallback to v2 API when RLAC headers are present.

Good conditional logic to prevent fallback to v2 API when row-level access control headers are present, as the v2 API doesn't support this feature.


175-177: Consistent properties propagation across endpoints.

The dry_plan endpoint correctly passes headers as properties to the Rewriter.


207-212: Properties passed to Rewriter in dry_plan_for_data_source.

The endpoint properly initializes Rewriter with session properties from headers.


246-251: Properties passed to Rewriter in validate endpoint.

The validate endpoint correctly initializes the Rewriter with properties from headers.

wren-core/core/src/logical_plan/analyze/model_generation.rs (1)

110-119: Potential double–aliasing: confirm that duplicate aliases can’t leak upstream

builder.alias(model_plan.plan_name())? is only executed when an RLAC
filter exists.
If the same model is referenced multiple times (e.g. in sub-queries that
also add an alias with the same plan_name()), DataFusion may raise
“duplicate relation” errors.

Please double-check complex query plans where:

  1. the same model appears multiple times and
  2. at least one occurrence carries an RLAC rule.

If you spot any collisions, consider appending a monotonically-increasing
suffix from AliasGenerator.

wren-core/core/src/logical_plan/analyze/access_control.rs (1)

127-140: No sanitisation of session-supplied SQL fragments

property_value is fed straight into the SQL parser.
While DataFusion’s parser will error on multiple statements, malicious input could still lead to confusing planner errors.

If possible, restrict property_value to literals (numbers, quoted strings) before parsing, e.g.:

if !matches!(property_value.chars().next(), Some('\'' | '\"' | '0'..='9')) {
    error = Some(plan_err!("Only literal values are allowed for session property {property_name}"));
    return ControlFlow::Break(());
}

@goldmedal goldmedal marked this pull request as draft April 24, 2025 09:22
@github-actions github-actions bot added the ci label Apr 24, 2025
@goldmedal goldmedal marked this pull request as ready for review April 25, 2025 04:51
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
ibis-server/app/dependencies.py (2)

19-24: Consider renaming the function for grammatical correctness.

The function implementation correctly and efficiently checks for the presence of RLAC headers with proper null-safety. However, the function name exist_wren_variables_header should be exists_wren_variables_header (with an 's') to be grammatically correct since it's checking if at least one header with the prefix exists.

-def exist_wren_variables_header(
+def exists_wren_variables_header(
    headers: Headers,
) -> bool:
    if headers is None:
        return False
    return any(key.startswith(X_WREN_VARIABLE_PREFIX) for key in headers.keys())

22-24: Consider adding case-insensitive header matching.

Since HTTP headers are case-insensitive, it would be more robust to perform case-insensitive matching when checking for headers with the RLAC prefix. The current implementation assumes headers are already lowercase, which might not always be the case depending on how clients send them.

    if headers is None:
        return False
-    return any(key.startswith(X_WREN_VARIABLE_PREFIX) for key in headers.keys())
+    return any(key.lower().startswith(X_WREN_VARIABLE_PREFIX) for key in headers.keys())
wren-core/wren-example/examples/row-level-access-control.rs (1)

87-112: Reduce repetitive println! blocks

The four back-to-back println! calls repeat the same lookup / clone / unwrap pattern and will panic if the map changes in the future.

-println!("session_tenant_id: {}", properties.get("session_tenant_id").unwrap().as_deref().unwrap());
-println!("session_department: {}", properties.get("session_department").unwrap().as_deref().unwrap());
-println!("session_user_id: {}", properties.get("session_user_id").unwrap().as_deref().unwrap());
-println!("session_role: {}", properties.get("session_role").unwrap().as_deref().unwrap());
+for (k, v) in &properties {
+    println!("{k}: {}", v.as_deref().unwrap_or("<unset>"));
+}

Benefits:
• Eliminates four panics → one safe path.
• Printing automatically adapts to added/removed properties.

wren-core/core/src/logical_plan/analyze/access_control.rs (1)

154-156: Parsing expression via expr.to_string() causes double parsing & loses SQL dialect nuances

create_logical_expr(&expr.to_string(), …) serialises the already-parsed AST back to SQL only to parse it again. This:

• adds unnecessary CPU cost,
• may re-introduce quoting/dialect edge-cases (the unparsers are lossy for some dialect-specific constructs).

Instead pass the Expr directly:

-    session_state.read().create_logical_expr(&expr.to_string(), &df_schema)
+    session_state.read().create_logical_expr_from_ast(expr.clone(), &df_schema)

(DataFusion exposes create_logical_expr_from_ast behind the planner module; if unavailable, consider extending the API or keeping the original Expr until plan creation.)

This keeps the pipeline purely AST-based and avoids fragile round-tripping.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 997c425 and 5f29c50.

📒 Files selected for processing (13)
  • .github/actions/rust/setup-rust-runtime/action.yaml (0 hunks)
  • ibis-server/app/dependencies.py (2 hunks)
  • ibis-server/app/mdl/rewriter.py (5 hunks)
  • ibis-server/app/routers/v3/connector.py (6 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py (2 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (3 hunks)
  • wren-core-base/manifest-macro/src/lib.rs (6 hunks)
  • wren-core-base/src/mdl/builder.rs (8 hunks)
  • wren-core-base/src/mdl/manifest.rs (5 hunks)
  • wren-core/core/Cargo.toml (1 hunks)
  • wren-core/core/src/logical_plan/analyze/access_control.rs (1 hunks)
  • wren-core/core/src/mdl/context.rs (7 hunks)
  • wren-core/wren-example/examples/row-level-access-control.rs (1 hunks)
💤 Files with no reviewable changes (1)
  • .github/actions/rust/setup-rust-runtime/action.yaml
✅ Files skipped from review due to trivial changes (1)
  • wren-core/core/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (8)
  • ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py
  • wren-core-base/src/mdl/manifest.rs
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py
  • ibis-server/app/mdl/rewriter.py
  • ibis-server/app/routers/v3/connector.py
  • wren-core-base/src/mdl/builder.rs
  • wren-core-base/manifest-macro/src/lib.rs
  • wren-core/core/src/mdl/context.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
wren-core/wren-example/examples/row-level-access-control.rs (3)
wren-core/core/src/mdl/mod.rs (2)
  • transform_sql_with_ctx (350-386)
  • analyze_with_tables (77-90)
wren-core/core/src/mdl/context.rs (1)
  • properties (73-79)
wren-core-base/src/mdl/builder.rs (2)
  • new_required (175-181)
  • new_optional (182-188)
🔇 Additional comments (1)
ibis-server/app/dependencies.py (1)

7-7: Good choice for header prefix constant.

Using a dedicated constant for the session property header prefix is a good practice for maintainability. Since HTTP headers are case-insensitive, using lowercase for the constant value aligns with common practices in HTTP header handling.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
wren-core/core/src/logical_plan/analyze/access_control.rs (1)

102-108: Optional property fallback ignores empty default values

The fallback fetches default_expr for optional properties but does not verify that the default itself is Some(...)/non-empty before passing it further.
If the manifest mistakenly sets default_expr: None, the rule silently behaves as required and triggers the not found error – surprising for users.

-                        .map(|r| &r.default_expr)
+                        .filter_map(|r| r.default_expr.as_ref())

This guarantees we only replace missing properties with existing defaults.

🧹 Nitpick comments (4)
wren-core/core/src/logical_plan/analyze/access_control.rs (4)

34-34: Typo: seesion_properties should be session_properties

There's a typo in the variable name that might cause confusion when reading the code.

-let mut seesion_properties: HashSet<String> = HashSet::new();
+let mut session_properties: HashSet<String> = HashSet::new();

60-62: Same typo in variable name used consistently

The same typo appears here. Make sure to fix it in all occurrences.

-                if !seesion_properties.contains(&session_property) {
-                    seesion_properties.insert(session_property);
+                if !session_properties.contains(&session_property) {
+                    session_properties.insert(session_property);

74-74: Typo in variable name

The final reference to the misspelled variable name needs to be fixed as well.

-        seesion_properties.into_iter().collect::<Vec<_>>(),
+        session_properties.into_iter().collect::<Vec<_>>(),

217-222: Potential inconsistency in optional property validation

The validation for optional properties checks if the property exists in headers or if it has a default expression, but doesn't verify if the default expression itself is a valid, non-empty string. This could lead to inconsistent behavior if a default expression is set to Some("") or similar.

For better consistency, consider enhancing the validation to also check the content of the default expression:

-            if exist || property.default_expr.is_some() {
+            if exist || property.default_expr.as_ref().is_some_and(|expr| !expr.is_empty()) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f29c50 and 5cb6f53.

📒 Files selected for processing (2)
  • wren-core/core/src/logical_plan/analyze/access_control.rs (1 hunks)
  • wren-core/wren-example/examples/row-level-access-control.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren-core/wren-example/examples/row-level-access-control.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
wren-core/core/src/logical_plan/analyze/access_control.rs (7)
wren-core-base/src/mdl/builder.rs (3)
  • condition (298-301)
  • new_required (175-181)
  • new_optional (182-188)
wren-core/core/src/logical_plan/analyze/model_generation.rs (3)
  • new (38-48)
  • name (303-305)
  • analyze (290-301)
wren-core/core/src/mdl/context.rs (2)
  • new (255-258)
  • properties (73-79)
wren-core/core/src/logical_plan/analyze/plan.rs (11)
  • new (69-82)
  • new (109-124)
  • new (649-651)
  • new (766-860)
  • new (914-961)
  • new (1020-1022)
  • name (43-47)
  • name (710-712)
  • name (870-872)
  • name (971-973)
  • name (1032-1034)
wren-core/core/src/mdl/mod.rs (5)
  • new (117-172)
  • new (426-428)
  • analyze (71-75)
  • build_headers (2125-2133)
  • default (59-67)
wren-core/core/src/logical_plan/analyze/model_anlayze.rs (2)
  • expr (589-599)
  • session_state (81-83)
wren-core-base/src/mdl/manifest.rs (4)
  • name (245-247)
  • name (278-280)
  • name (289-291)
  • name (295-297)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: test
  • GitHub Check: cargo test (win64)
  • GitHub Check: ci
  • GitHub Check: cargo check
🔇 Additional comments (6)
wren-core/core/src/logical_plan/analyze/access_control.rs (6)

167-193: Good security practice for session property validation

The prevent_invalid_expr function is a strong security measure that restricts session property values to only literals and structured data like arrays and maps, preventing injection of arbitrary code or functions. This is an excellent approach for maintaining the security boundaries of the system.


229-239: Good case-insensitive property matching

The implementation properly handles case-insensitive property matching by converting property names to lowercase before lookup. This is a good user experience enhancement that makes the system more forgiving of case variations in property names.


593-613: Comprehensive validation of valid property values

The test covers a wide range of valid property value types including integers, strings, floats, booleans, arrays, maps, and intervals. This thorough testing ensures the system correctly handles various data types that might be used as session properties.


615-641: Strong security validation in tests

The test thoroughly validates that potentially dangerous expressions like function calls, subqueries, operations, and other non-literal values are properly rejected. This is crucial for preventing SQL injection and maintaining the security of the system.


551-576: Effective case-insensitive property matching test

The test verifies that property names are matched case-insensitively, which is important for user experience and consistency. The system correctly handles properties regardless of the case used in the condition or in the headers.


42-43:

Details

✅ Verification successful

TODO comment about CompoundIdentifier

The current implementation only handles simple identifiers and not compound identifiers or field access patterns. While the TODO comment acknowledges this limitation, it's worth noting that this might cause issues with more complex SQL expressions that reference fields from nested structures.

Consider addressing this in a follow-up task to ensure proper handling of all identifier types.


🏁 Script executed:

#!/bin/bash
# Check if there are any compound identifiers used in existing model conditions
# that would not be correctly handled by the current implementation

echo "Searching for potential compound identifiers in model definitions..."
rg -A 3 "condition:" --type-add 'rust:*.rs' --type rust | rg -A 2 "CompoundIdentifier|CompoundFieldAccess|\."

Length of output: 5492


Acknowledge simple‐identifier limitation
The current matcher only handles ast::Expr::Identifier and won’t cover CompoundIdentifier or CompoundFieldAccess patterns (e.g. table.column). We searched existing model conditions and found no compound identifiers in use today, so this does not break any current rules.

• File: wren-core/core/src/logical_plan/analyze/access_control.rs at the if let ast::Expr::Identifier… check
• No occurrences of CompoundIdentifier or CompoundFieldAccess found in model definitions

Consider scheduling a follow-up task to add full support for nested/compound identifiers when needed.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
wren-core/core/src/logical_plan/analyze/access_control.rs (1)

102-108: 🛠️ Refactor suggestion

Optional property handling improvement needed

The fallback for optional properties has a potential issue. When a property is missing, you're trying to get its default_expr, but you're not checking if the default itself exists before passing it further. If default_expr is None for optional properties, the code will proceed and then fail at line 116 with a confusing "property not found" error rather than correctly skipping the rule.

- .map(|r| &r.default_expr)
+ .filter_map(|r| r.default_expr.as_ref())

This ensures we only pass through non-null default values.

🧹 Nitpick comments (1)
wren-core/core/src/logical_plan/analyze/access_control.rs (1)

216-223: Rule validation logic could be more explicit

The code on line 218 combines two conditions (property existence vs default value existence) in a way that might be confusing to future maintainers.

- if exist || property.default_expr.as_ref().is_some_and(|expr| !expr.is_empty()) {
+ // If property exists in headers or has a valid default value, the rule applies
+ let has_valid_default = property.default_expr.as_ref().is_some_and(|expr| !expr.is_empty());
+ if exist || has_valid_default {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5cb6f53 and c6bbe2d.

📒 Files selected for processing (2)
  • wren-core/core/src/logical_plan/analyze/access_control.rs (1 hunks)
  • wren-core/core/src/mdl/mod.rs (39 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: cargo check
  • GitHub Check: ci
  • GitHub Check: test
🔇 Additional comments (8)
wren-core/core/src/logical_plan/analyze/access_control.rs (3)

169-193: Security validation looks good

The function correctly restricts session property expressions to literal values only, preventing SQL injection and other security risks through user-provided values. The recursive validation ensures even nested literals in arrays and objects are safe.


229-239: Clean property presence check with case-insensitive handling

The function effectively handles case-insensitive property matching and correctly validates both the existence and non-emptiness of values.


551-576: Comprehensive case-insensitivity testing

The test correctly verifies that property names are matched case-insensitively, which is important for a robust user experience when working with headers that might come from different systems with varying casing conventions.

wren-core/core/src/mdl/mod.rs (5)

362-365: Clean property passing implementation

The implementation effectively wraps the properties in an Arc and passes them to the context creation function, which is a clean way to integrate RLAC support without disrupting the existing code flow.


1590-1782: Comprehensive RLAC testing with required properties

The test suite thoroughly covers multiple scenarios for required properties, including:

  • Header presence verification
  • Multiple rule application
  • Cross-model conditions
  • Error cases for missing properties
  • Case sensitivity handling

This ensures the RLAC implementation is robust and behaves as expected across a variety of use cases.


1784-1945: Well-structured optional property RLAC testing

The tests comprehensively verify optional property behavior including:

  1. Default value fallback when properties are missing
  2. Correct handling when defaults are missing or empty
  3. Mixed required and optional property scenarios

This ensures that users have flexibility in defining access rules with varying levels of strictness.


1947-2068: Good testing of complex calculated field scenarios

The tests verify RLAC works correctly with calculated fields in relationships, which is a more complex scenario. The TODO comments around line 2014 correctly note future improvements for propagating RLAC rules through relationships.

Consider creating a tracking issue for the TODO comment about applying RLAC rules to models used in calculated fields via relationships, as this would enhance the security model's completeness.


2173-2181: Useful test helper functions

These functions make the tests more readable and maintainable by abstracting common operations.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
wren-core/core/src/mdl/mod.rs (1)

2251-2267: Consider extracting build_headers/batches_to_string into a reusable helper to avoid test-only duplication

Both helpers are now implemented here and in
logical_plan/analyze/access_control.rs test-suite.
Duplicating even small utilities makes maintenance harder (one side may evolve
silently and the other side will miss bug-fixes, e.g. the lower-case
normalisation logic).

Suggestion:

-// repeated in two different test modules
-fn build_headers(...) -> HashMap<_, _> { ... }
-fn batches_to_string(...) -> String { ... }
+// tests/common.rs (or crate::test_util)
+pub(crate) fn build_headers(...) -> HashMap<_, _> { ... }
+pub(crate) fn batches_to_string(...) -> String { ... }

Then #[cfg(test)]-modules can simply use crate::test_util::*;.

wren-core/core/src/logical_plan/analyze/access_control.rs (2)

169-193: prevent_invalid_expr error message prints the entire expression – can leak PII

When a user supplies an invalid literal (e.g. a phone number or e-mail) the
error returned embeds the literal verbatim:

The session property  allow only literal value

Two follow-ups:

  1. Include the property name but not the full literal to avoid accidental
    PII leakage in logs.
  2. Consider a dedicated AccessControlError::InvalidLiteral enum so callers
    can pattern-match instead of string-matching error text.

453-461: Helper duplication again (build_headers) – extract to shared test util

Same note as in mdl/mod.rs: the exact same build_headers helper now lives
in two test modules. Moving it to a central tests/util.rs reduces drift and
keeps test code DRY.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6bbe2d and 2cb2e26.

⛔ Files ignored due to path filters (3)
  • wren-core/wren-example/data/company/documents.csv is excluded by !**/*.csv
  • wren-core/wren-example/data/company/tenants.csv is excluded by !**/*.csv
  • wren-core/wren-example/data/company/users.csv is excluded by !**/*.csv
📒 Files selected for processing (30)
  • ibis-server/app/dependencies.py (2 hunks)
  • ibis-server/app/mdl/rewriter.py (5 hunks)
  • ibis-server/app/routers/v3/connector.py (6 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py (2 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (3 hunks)
  • wren-core-base/manifest-macro/src/lib.rs (6 hunks)
  • wren-core-base/src/mdl/builder.rs (8 hunks)
  • wren-core-base/src/mdl/manifest.rs (5 hunks)
  • wren-core-base/tests/data/mdl.json (1 hunks)
  • wren-core-py/src/context.rs (3 hunks)
  • wren-core-py/src/manifest.rs (2 hunks)
  • wren-core-py/tests/test_modeling_core.py (2 hunks)
  • wren-core/Cargo.toml (1 hunks)
  • wren-core/benchmarks/src/tpch/run.rs (2 hunks)
  • wren-core/core/Cargo.toml (1 hunks)
  • wren-core/core/src/logical_plan/analyze/access_control.rs (1 hunks)
  • wren-core/core/src/logical_plan/analyze/mod.rs (1 hunks)
  • wren-core/core/src/logical_plan/analyze/model_anlayze.rs (5 hunks)
  • wren-core/core/src/logical_plan/analyze/model_generation.rs (7 hunks)
  • wren-core/core/src/logical_plan/analyze/plan.rs (15 hunks)
  • wren-core/core/src/logical_plan/analyze/relation_chain.rs (6 hunks)
  • wren-core/core/src/mdl/context.rs (7 hunks)
  • wren-core/core/src/mdl/mod.rs (40 hunks)
  • wren-core/sqllogictest/src/test_context.rs (2 hunks)
  • wren-core/wren-example/examples/calculation-invoke-calculation.rs (2 hunks)
  • wren-core/wren-example/examples/datafusion-apply.rs (1 hunks)
  • wren-core/wren-example/examples/plan-sql.rs (2 hunks)
  • wren-core/wren-example/examples/row-level-access-control.rs (1 hunks)
  • wren-core/wren-example/examples/to-many-calculation.rs (1 hunks)
  • wren-core/wren-example/examples/view.rs (2 hunks)
✅ Files skipped from review due to trivial changes (2)
  • wren-core/Cargo.toml
  • wren-core/core/src/logical_plan/analyze/model_anlayze.rs
🚧 Files skipped from review as they are similar to previous changes (26)
  • wren-core/core/Cargo.toml
  • wren-core-py/src/manifest.rs
  • wren-core/wren-example/examples/plan-sql.rs
  • wren-core/wren-example/examples/datafusion-apply.rs
  • wren-core-py/src/context.rs
  • wren-core/wren-example/examples/to-many-calculation.rs
  • wren-core/wren-example/examples/calculation-invoke-calculation.rs
  • wren-core-base/src/mdl/manifest.rs
  • wren-core/core/src/logical_plan/analyze/model_generation.rs
  • wren-core-base/tests/data/mdl.json
  • wren-core/benchmarks/src/tpch/run.rs
  • wren-core-base/src/mdl/builder.rs
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py
  • ibis-server/app/dependencies.py
  • wren-core/core/src/logical_plan/analyze/relation_chain.rs
  • wren-core/wren-example/examples/view.rs
  • wren-core/core/src/logical_plan/analyze/mod.rs
  • wren-core/sqllogictest/src/test_context.rs
  • ibis-server/app/routers/v3/connector.py
  • ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py
  • ibis-server/app/mdl/rewriter.py
  • wren-core/core/src/mdl/context.rs
  • wren-core/core/src/logical_plan/analyze/plan.rs
  • wren-core-py/tests/test_modeling_core.py
  • wren-core/wren-example/examples/row-level-access-control.rs
  • wren-core-base/manifest-macro/src/lib.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
wren-core/core/src/mdl/mod.rs (8)
wren-core/core/src/mdl/context.rs (3)
  • properties (73-79)
  • new (255-258)
  • create_ctx_with_mdl (48-103)
wren-core-py/src/context.rs (5)
  • runtime (169-175)
  • new (75-142)
  • remote_functions (81-84)
  • functions (93-96)
  • default (58-64)
wren-core/core/src/logical_plan/analyze/model_generation.rs (2)
  • new (38-48)
  • analyze (290-301)
wren-core/core/src/logical_plan/analyze/plan.rs (6)
  • new (69-82)
  • new (109-124)
  • new (649-651)
  • new (766-860)
  • new (914-961)
  • new (1020-1022)
wren-core/core/src/logical_plan/analyze/model_anlayze.rs (3)
  • new (69-79)
  • new (794-802)
  • analyze (47-61)
wren-core/core/src/mdl/dialect/wren_dialect.rs (2)
  • new (99-103)
  • default (93-95)
wren-core-base/src/mdl/builder.rs (3)
  • default (38-40)
  • new_required (175-181)
  • new_optional (182-188)
wren-core/core/src/logical_plan/analyze/access_control.rs (1)
  • build_headers (453-461)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: test
  • GitHub Check: cargo check
  • GitHub Check: ci

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
wren-core/core/src/logical_plan/analyze/access_control.rs (2)

102-126: Nested-Option handling remains hard to read (see previous review)

The double let Some(property_value) pattern and two-level Option type make this block difficult to follow and easy to miss error paths.
A flatter flow also allows a single ownership of the value and eliminates the second match.

The same refactor was suggested earlier and is still applicable.


41-66: Compound identifiers and field access still unsupported

collect_condition only matches ast::Expr::Identifier, meaning conditions
such as nation.id = @x or "user"."role" = @role are silently ignored.
This can lead to incorrect “column not found” errors or, worse, missed column
references.

A TODO comment exists, but given RLAC is security-sensitive, consider handling
CompoundIdentifier and CompoundFieldAccess sooner rather than later.

🧹 Nitpick comments (3)
wren-core/core/src/mdl/mod.rs (2)

330-345: Consider accepting &HashMap/Arc<HashMap> instead of taking ownership

transform_sql / transform_sql_with_ctx now take properties: HashMap<String, Option<String>>, which is immediately wrapped into an Arc (Arc::new(properties)).
For a call-site this means:

  1. A full clone of the map is performed each time (potentially large).
  2. The API is now move-only, forcing callers to relinquish ownership or clone.

Passing impl Into<Arc<HashMap<_, _>>> or simply Arc<HashMap<_, _>> keeps the allocation cost at one place and avoids an extra copy per invocation.

-pub fn transform_sql(
-    analyzed_mdl: Arc<AnalyzedWrenMDL>,
-    remote_functions: &[RemoteFunction],
-    properties: HashMap<String, Option<String>>,
+pub fn transform_sql(
+    analyzed_mdl: Arc<AnalyzedWrenMDL>,
+    remote_functions: &[RemoteFunction],
+    properties: Arc<HashMap<String, Option<String>>>,

Call-sites that have a plain HashMap can still call Arc::new(hmap).


363-365: Redundant allocation when wrapping properties into Arc

let properties_ref = Arc::new(properties);

Because HashMap was moved in, Arc::new(...) is unavoidable, but keep in mind that the entire table is cloned earlier if the caller already owns an Arc<HashMap<_,_>> (see previous comment).
Adopting an Arc in the function signature will eliminate both the clone and this extra allocation.

wren-core/core/src/logical_plan/analyze/access_control.rs (1)

187-193: Nit: typo & wording in error message

  1. filedsfields
  2. "The session property {} allow only literal value"
    should be “allows”.
- ast::Expr::Dictionary(fileds) => {
+ ast::Expr::Dictionary(fields) => {
     for field in fields {
-         prevent_invalid_expr(&field.value)?;
+         prevent_invalid_expr(&field.value)?;
     }
     …
- _ => plan_err!("The session property {} allow only literal value", expr),
+ _ => plan_err!("The session property {} allows only literal values", expr),
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2cb2e26 and dede1c4.

📒 Files selected for processing (2)
  • wren-core/core/src/logical_plan/analyze/access_control.rs (1 hunks)
  • wren-core/core/src/mdl/mod.rs (42 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: cargo check
  • GitHub Check: test
  • GitHub Check: ci

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🔭 Outside diff range comments (2)
ibis-server/app/routers/v3/connector.py (2)

346-351: ⚠️ Potential issue

Missing properties parameter in model_substitute.

Unlike all other endpoints, the Rewriter in model_substitute doesn't pass the properties from headers, which will prevent RLAC from working here.

Add the properties parameter to be consistent with other endpoints:

            Connector(data_source, dto.connection_info).dry_run(
                await Rewriter(
                    dto.manifest_str,
                    data_source=data_source,
                    experiment=True,
+                   properties=dict(headers),
                ).rewrite(sql)
            )

358-359: ⚠️ Potential issue

Missing RLAC fallback check in model_substitute.

Unlike all other endpoints, the model_substitute endpoint is missing the check for RLAC headers when determining whether to fall back to v2 API. This will cause inconsistent behavior.

Add the check for RLAC headers:

            is_fallback_disable = bool(
                headers.get(X_WREN_FALLBACK_DISABLE)
                and safe_strtobool(headers.get(X_WREN_FALLBACK_DISABLE, "false"))
            )
-           if is_fallback_disable:
+           # because the v2 API doesn't support row-level access control,
+           # we don't fallback to v2 if the header include row-level access control properties.
+           if is_fallback_disable or exist_wren_variables_header(headers):
                raise e
♻️ Duplicate comments (2)
wren-core/core/src/logical_plan/analyze/access_control.rs (2)

234-241: ⚠️ Potential issue

Case-insensitive lookup relies on pre-lowercased header keys.

The function only lowercases the property name but not the keys in the headers map. This means if headers contain uppercase keys, the case-insensitive lookup will fail.

- .get(&property_name.to_lowercase())
+ .iter()
+     .find(|(k, _)| k.eq_ignore_ascii_case(&property_name))
+     .map(|(_, v)| v)

126-133: ⚠️ Potential issue

Inconsistent whitespace handling between validation functions.

This function uses trim().is_empty() while is_property_present on line 239 uses just is_empty(). This inconsistency means a property with only whitespace will be rejected here but might pass validation in validate_rule.

// Also update is_property_present on line 239 to match:
- .map(|v| v.as_ref().is_some_and(|value| !value.is_empty()))
+ .map(|v| v.as_ref().is_some_and(|value| !value.trim().is_empty()))
🧹 Nitpick comments (1)
wren-core/core/src/logical_plan/analyze/access_control.rs (1)

102-116: Consider additional error handling for session property variables in SQL injection scenarios.

The code correctly handles missing properties by checking against both direct properties and default values from required_properties. However, consider adding a safelist validation for property names to prevent potential SQL injection vectors through user-controlled session property names.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dede1c4 and 1c7c615.

⛔ Files ignored due to path filters (3)
  • wren-core/wren-example/data/company/documents.csv is excluded by !**/*.csv
  • wren-core/wren-example/data/company/tenants.csv is excluded by !**/*.csv
  • wren-core/wren-example/data/company/users.csv is excluded by !**/*.csv
📒 Files selected for processing (31)
  • ibis-server/app/dependencies.py (2 hunks)
  • ibis-server/app/mdl/rewriter.py (5 hunks)
  • ibis-server/app/routers/v3/connector.py (8 hunks)
  • ibis-server/app/util.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py (2 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (3 hunks)
  • wren-core-base/manifest-macro/src/lib.rs (6 hunks)
  • wren-core-base/src/mdl/builder.rs (8 hunks)
  • wren-core-base/src/mdl/manifest.rs (5 hunks)
  • wren-core-base/tests/data/mdl.json (1 hunks)
  • wren-core-py/src/context.rs (3 hunks)
  • wren-core-py/src/manifest.rs (2 hunks)
  • wren-core-py/tests/test_modeling_core.py (2 hunks)
  • wren-core/Cargo.toml (1 hunks)
  • wren-core/benchmarks/src/tpch/run.rs (2 hunks)
  • wren-core/core/Cargo.toml (1 hunks)
  • wren-core/core/src/logical_plan/analyze/access_control.rs (1 hunks)
  • wren-core/core/src/logical_plan/analyze/mod.rs (1 hunks)
  • wren-core/core/src/logical_plan/analyze/model_anlayze.rs (5 hunks)
  • wren-core/core/src/logical_plan/analyze/model_generation.rs (7 hunks)
  • wren-core/core/src/logical_plan/analyze/plan.rs (15 hunks)
  • wren-core/core/src/logical_plan/analyze/relation_chain.rs (6 hunks)
  • wren-core/core/src/mdl/context.rs (7 hunks)
  • wren-core/core/src/mdl/mod.rs (42 hunks)
  • wren-core/sqllogictest/src/test_context.rs (2 hunks)
  • wren-core/wren-example/examples/calculation-invoke-calculation.rs (2 hunks)
  • wren-core/wren-example/examples/datafusion-apply.rs (1 hunks)
  • wren-core/wren-example/examples/plan-sql.rs (2 hunks)
  • wren-core/wren-example/examples/row-level-access-control.rs (1 hunks)
  • wren-core/wren-example/examples/to-many-calculation.rs (1 hunks)
  • wren-core/wren-example/examples/view.rs (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • wren-core/core/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (25)
  • wren-core/core/src/logical_plan/analyze/mod.rs
  • wren-core/wren-example/examples/plan-sql.rs
  • wren-core/Cargo.toml
  • wren-core/benchmarks/src/tpch/run.rs
  • wren-core/wren-example/examples/to-many-calculation.rs
  • wren-core/wren-example/examples/calculation-invoke-calculation.rs
  • wren-core/wren-example/examples/view.rs
  • wren-core-py/src/manifest.rs
  • wren-core/core/src/logical_plan/analyze/model_anlayze.rs
  • wren-core/sqllogictest/src/test_context.rs
  • wren-core/wren-example/examples/datafusion-apply.rs
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py
  • ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py
  • wren-core-base/src/mdl/manifest.rs
  • wren-core-base/tests/data/mdl.json
  • wren-core-py/src/context.rs
  • wren-core-base/src/mdl/builder.rs
  • wren-core/core/src/logical_plan/analyze/relation_chain.rs
  • wren-core/core/src/logical_plan/analyze/model_generation.rs
  • ibis-server/app/mdl/rewriter.py
  • wren-core/core/src/logical_plan/analyze/plan.rs
  • wren-core/core/src/mdl/context.rs
  • wren-core/wren-example/examples/row-level-access-control.rs
  • wren-core-py/tests/test_modeling_core.py
  • wren-core-base/manifest-macro/src/lib.rs
🧰 Additional context used
🧠 Learnings (1)
wren-core/core/src/logical_plan/analyze/access_control.rs (2)
Learnt from: goldmedal
PR: Canner/wren-engine#1161
File: wren-core/core/src/logical_plan/analyze/access_control.rs:0-0
Timestamp: 2025-04-30T01:15:15.009Z
Learning: In the row-level access control implementation, separate error checks are maintained for different failure modes (missing property vs null vs empty) to provide more precise and actionable error messages, even if it means slightly more verbose code with multiple Option checks.
Learnt from: goldmedal
PR: Canner/wren-engine#1161
File: wren-core/core/src/logical_plan/analyze/access_control.rs:0-0
Timestamp: 2025-04-30T01:18:21.776Z
Learning: In the `collect_condition` function of the row-level access control implementation, compound identifiers are intentionally ignored rather than causing failures when processing expressions. This is by design as confirmed by the team.
🧬 Code Graph Analysis (2)
ibis-server/app/util.py (1)
ibis-server/app/dependencies.py (1)
  • exist_wren_variables_header (42-47)
ibis-server/app/dependencies.py (1)
ibis-server/app/util.py (1)
  • exist_wren_variables_header (152-157)
🪛 Ruff (0.8.2)
ibis-server/app/dependencies.py

39-39: Undefined name request

(F821)

ibis-server/app/routers/v3/connector.py

35-35: Redefinition of unused exist_wren_variables_header from line 12

(F811)

⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: cargo check
  • GitHub Check: ci
  • GitHub Check: test
🔇 Additional comments (23)
wren-core/core/src/logical_plan/analyze/access_control.rs (8)

42-62: Well-implemented column and session property extraction.

The code effectively extracts and validates model columns while collecting session properties from the condition expression. Good use of a HashSet to avoid duplicate entries for session properties.


161-167: Good security practice with expression validation.

The parsing of expressions with subsequent validation via prevent_invalid_expr is an excellent security practice that prevents injection of arbitrary code through session properties.


171-195: Well-structured expression validation with recursion for complex data types.

The implementation properly handles validation for nested data structures (arrays, maps, dictionaries) by recursively checking each element. This thorough validation ensures that only literal values are accepted for session properties.


209-220: Comprehensive rule validation with clear handling of optional properties.

The logic for checking required vs. optional properties with default expressions is well-structured. It's particularly good to see the explicit check for empty default expressions.


266-295: Thorough test case for session property collection.

The test properly verifies extraction of session properties from various condition formats, including with comments and newlines. Good verification of model column extraction as well.


308-462: Comprehensive test suite for rule validation.

These tests cover all important scenarios:

  • Required properties (present, missing, empty)
  • Optional properties with defaults (present, missing)
  • Optional properties without defaults
  • Multiple property combinations

This level of test coverage gives high confidence in the implementation.


474-554: Good test cases for filter expression building.

The tests cover the key scenarios of building filter expressions with both required and optional properties, including error cases. The snapshot testing provides a clear expected output for regression testing.


589-654: Thorough validation of property value restrictions.

The tests comprehensively verify that only literal values are allowed in session properties, preventing any form of code injection. Good to see both valid and invalid cases tested.

wren-core/core/src/mdl/mod.rs (7)

330-345: Good addition of session properties parameter.

The transform_sql function has been properly updated to accept session properties, maintaining backward compatibility by not making it a required parameter with a default value.


347-366: Efficient handling of session properties.

The properties are properly wrapped in an Arc for efficient sharing across the async context. The approach ensures session properties are propagated appropriately through the transformation pipeline.


1688-1802: Comprehensive test coverage for required RLAC properties.

The tests thoroughly verify all aspects of required property handling:

  • Property presence validation
  • Filter application
  • Multiple rules
  • Join queries
  • Error handling for missing properties

This gives good confidence in the implementation's correctness.


1883-2043: Good testing of optional properties with different default scenarios.

The tests cover important cases for optional properties:

  • With values provided
  • With default values when not provided
  • Without default values
  • Empty default values
  • Multiple optional properties

This testing ensures the feature works reliably across different configurations.


2169-2198: Case insensitivity test ensures property name flexibility.

The test verifies that session property names are matched case-insensitively, making the API more user-friendly. This is a good usability feature.


2046-2166: Test for RLAC with calculated fields and join relationship.

Testing RLAC on a calculated field that relates to another table through a relationship is important for complex query scenarios. The commented TODO section indicates awareness of additional scenarios to be handled in the future.


2271-2279: Useful helper function for creating test headers.

The build_headers function simplifies test setup by providing a clean way to construct the session property map for different test scenarios.

ibis-server/app/dependencies.py (1)

8-8: New constant for RLAC header prefix added.

This constant X_WREN_VARIABLE_PREFIX defines the prefix for headers that will carry session properties for row-level access control (RLAC).

ibis-server/app/routers/v3/connector.py (7)

155-158: RLAC header check added to prevent fallback to v2 API.

This addition prevents fallback to the v2 API when RLAC headers are present, as the v2 API doesn't support row-level access control. This is an important safety check.


188-190: Passing header properties to Rewriter for dry-plan.

Consistent with the other endpoints, properties from headers are now passed to the Rewriter. This ensures RLAC is applied consistently across all API functions.


196-199: Consistent RLAC header check pattern.

The same pattern to prevent v2 API fallback when RLAC headers exist is applied here, ensuring consistent behavior across endpoints.


228-233: Properties from headers passed to Rewriter for dry-plan with data source.

Consistent with the other endpoints, properties from headers are passed to the Rewriter for data source-specific dry plans.


239-242: RLAC header check for data source-specific validation.

The consistent pattern to prevent v2 API fallback when RLAC headers exist is applied here as well.


275-280: Properties from headers passed to Rewriter for validation.

Session properties are now passed to the Rewriter for validation functions, ensuring RLAC is consistently applied.


289-292: RLAC header check for validation endpoint.

The consistent pattern to prevent v2 API fallback when RLAC headers exist is maintained here.

@goldmedal goldmedal force-pushed the feat/rls-refactor branch from dd689c7 to 15e307b Compare May 6, 2025 01:59
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
wren-core/core/src/logical_plan/analyze/access_control.rs (3)

42-43: Consider implementing CompoundIdentifier handling as mentioned in TODO

The TODO comment correctly identifies a limitation in the current implementation. Not handling CompoundIdentifier means qualified column references like table.column will be ignored in the condition parsing.

Based on retrieved learnings, I understand this is intentional behavior as confirmed by the team. This approach of ignoring compound identifiers rather than failing is a deliberate design choice.


126-133: Inconsistency between validation and filter construction whitespace handling

The function uses trim().is_empty() to check for whitespace-only values, while is_property_present at line 239 uses !value.is_empty() without trimming.

This creates a potential issue where a whitespace-only header value (" ") would pass validation in validate_rule but fail here during filter construction.


234-241: Case-insensitive lookup relies on header keys already being lowercased

The function converts property_name to lowercase but performs a direct HashMap lookup, which assumes the header keys are already normalized to lowercase elsewhere.

To ensure truly case-insensitive matching, either normalize headers at creation time or use an approach that isn't dependent on exact case matching in the HashMap keys.

🧹 Nitpick comments (1)
wren-core/core/src/mdl/mod.rs (1)

2112-2165: Consider implementing relationship inheritance for RLAC rules

The commented-out test case suggests a future enhancement where RLAC rules would automatically apply to related models when accessed through calculated fields. This would be a valuable addition for consistent security enforcement across related entities.

When implemented, this would allow a rule on a parent model (e.g., customer) to automatically apply when that model is accessed through a relationship (e.g., orders.customer), which would create more consistent access control.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd689c7 and 15e307b.

⛔ Files ignored due to path filters (3)
  • wren-core/wren-example/data/company/documents.csv is excluded by !**/*.csv
  • wren-core/wren-example/data/company/tenants.csv is excluded by !**/*.csv
  • wren-core/wren-example/data/company/users.csv is excluded by !**/*.csv
📒 Files selected for processing (30)
  • ibis-server/app/dependencies.py (2 hunks)
  • ibis-server/app/mdl/rewriter.py (5 hunks)
  • ibis-server/app/routers/v3/connector.py (7 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py (2 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (3 hunks)
  • wren-core-base/manifest-macro/src/lib.rs (6 hunks)
  • wren-core-base/src/mdl/builder.rs (8 hunks)
  • wren-core-base/src/mdl/manifest.rs (5 hunks)
  • wren-core-base/tests/data/mdl.json (1 hunks)
  • wren-core-py/src/context.rs (3 hunks)
  • wren-core-py/src/manifest.rs (2 hunks)
  • wren-core-py/tests/test_modeling_core.py (2 hunks)
  • wren-core/Cargo.toml (1 hunks)
  • wren-core/benchmarks/src/tpch/run.rs (2 hunks)
  • wren-core/core/Cargo.toml (1 hunks)
  • wren-core/core/src/logical_plan/analyze/access_control.rs (1 hunks)
  • wren-core/core/src/logical_plan/analyze/mod.rs (1 hunks)
  • wren-core/core/src/logical_plan/analyze/model_anlayze.rs (5 hunks)
  • wren-core/core/src/logical_plan/analyze/model_generation.rs (7 hunks)
  • wren-core/core/src/logical_plan/analyze/plan.rs (15 hunks)
  • wren-core/core/src/logical_plan/analyze/relation_chain.rs (6 hunks)
  • wren-core/core/src/mdl/context.rs (7 hunks)
  • wren-core/core/src/mdl/mod.rs (42 hunks)
  • wren-core/sqllogictest/src/test_context.rs (2 hunks)
  • wren-core/wren-example/examples/calculation-invoke-calculation.rs (2 hunks)
  • wren-core/wren-example/examples/datafusion-apply.rs (1 hunks)
  • wren-core/wren-example/examples/plan-sql.rs (2 hunks)
  • wren-core/wren-example/examples/row-level-access-control.rs (1 hunks)
  • wren-core/wren-example/examples/to-many-calculation.rs (1 hunks)
  • wren-core/wren-example/examples/view.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (28)
  • wren-core-py/tests/test_modeling_core.py
  • wren-core/core/Cargo.toml
  • wren-core-py/src/manifest.rs
  • wren-core/core/src/logical_plan/analyze/mod.rs
  • wren-core/Cargo.toml
  • wren-core/benchmarks/src/tpch/run.rs
  • wren-core/wren-example/examples/to-many-calculation.rs
  • ibis-server/app/dependencies.py
  • wren-core/wren-example/examples/plan-sql.rs
  • wren-core/wren-example/examples/datafusion-apply.rs
  • wren-core/sqllogictest/src/test_context.rs
  • wren-core/wren-example/examples/calculation-invoke-calculation.rs
  • ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py
  • wren-core-base/tests/data/mdl.json
  • wren-core/wren-example/examples/view.rs
  • wren-core-base/src/mdl/manifest.rs
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py
  • wren-core/core/src/logical_plan/analyze/model_anlayze.rs
  • wren-core/core/src/logical_plan/analyze/relation_chain.rs
  • ibis-server/app/routers/v3/connector.py
  • wren-core-base/src/mdl/builder.rs
  • wren-core/wren-example/examples/row-level-access-control.rs
  • wren-core/core/src/logical_plan/analyze/model_generation.rs
  • wren-core-py/src/context.rs
  • wren-core/core/src/mdl/context.rs
  • ibis-server/app/mdl/rewriter.py
  • wren-core-base/manifest-macro/src/lib.rs
  • wren-core/core/src/logical_plan/analyze/plan.rs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: goldmedal
PR: Canner/wren-engine#1161
File: ibis-server/app/routers/v3/connector.py:78-83
Timestamp: 2025-05-05T02:27:29.829Z
Learning: The row-level access control implementation in Wren Engine filters headers with the prefix `X_WREN_VARIABLE_PREFIX` in `EmbeddedEngineRewriter.get_session_properties` and validates session property expressions in `access_control.rs` to ensure they only contain literal values, preventing SQL injection.
wren-core/core/src/logical_plan/analyze/access_control.rs (2)
Learnt from: goldmedal
PR: Canner/wren-engine#1161
File: wren-core/core/src/logical_plan/analyze/access_control.rs:0-0
Timestamp: 2025-04-30T01:15:15.009Z
Learning: In the row-level access control implementation, separate error checks are maintained for different failure modes (missing property vs null vs empty) to provide more precise and actionable error messages, even if it means slightly more verbose code with multiple Option checks.
Learnt from: goldmedal
PR: Canner/wren-engine#1161
File: wren-core/core/src/logical_plan/analyze/access_control.rs:0-0
Timestamp: 2025-04-30T01:18:21.776Z
Learning: In the `collect_condition` function of the row-level access control implementation, compound identifiers are intentionally ignored rather than causing failures when processing expressions. This is by design as confirmed by the team.
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: cargo test (amd64)
  • GitHub Check: check Cargo.toml formatting
  • GitHub Check: clippy
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: ci
  • GitHub Check: test
🔇 Additional comments (13)
wren-core/core/src/logical_plan/analyze/access_control.rs (6)

33-34: Good use of HashSet for de-duplication

Using HashSet for both conditions and session_properties ensures unique entries are collected, avoiding redundant query processing and making the code more efficient.


102-110: Property fallback logic handles case-insensitive matching correctly

The property name handling converts to lowercase and uses eq_ignore_ascii_case when matching against required properties, ensuring case-insensitive matching works properly.


171-195: Strong security protection against SQL injection

The prevent_invalid_expr function is well-designed to allow only literal values in session properties, properly preventing SQL injection attacks by rejecting any dynamic or computed expressions.


209-225: Well-structured rule validation with clear semantics

The validation logic clearly differentiates between required and optional properties, with comprehensive checks that align with the documented behavior in the function comment. The error messages are specific and actionable.


585-587: Comprehensive testing of case insensitivity

The test verifies that session property matching works correctly with mixed case in properties, both in conditions (@SESSION_NAME) and required properties (SESSION_ID), ensuring the feature is robust against case variations.


599-654: Thorough validation of property value security constraints

The tests comprehensively verify that only literal expressions are accepted as session property values, with proper rejection of any potentially dangerous expressions like functions or subqueries that could enable SQL injection.

wren-core/core/src/mdl/mod.rs (7)

332-345: Modified transform_sql to include session properties parameter

The transform_sql function now accepts session properties as a parameter, propagating them to the underlying asynchronous implementation. This maintains the existing synchronous API while adding RLAC support.


351-366: Property integration with context creation

The function now wraps properties in an Arc and passes them to context creation, ensuring session properties are available throughout the query analysis and execution pipeline.


1689-1822: Comprehensive testing of required property validation

The tests thoroughly verify that RLAC rules with required properties:

  1. Correctly filter when properties are present
  2. Fail appropriately when required properties are missing
  3. Work with multiple rules and properties
  4. Apply correctly in JOIN scenarios

This ensures the feature is robust for real-world usage.


1883-2043: Thorough testing of optional properties with fallback behavior

The tests verify that optional property handling works correctly for:

  1. Properties with default values
  2. Properties without default values
  3. Mixed required and optional properties
  4. Empty string default values

This ensures the feature supports flexible access control policies.


2046-2108: Testing RLAC on calculated fields with relationships

The tests verify that RLAC works correctly when applied to calculated fields involving JOINs, ensuring that the feature integrates properly with Wren's relationship model.


2169-2198: Case-insensitive property name matching verified

The test confirms that property names are matched case-insensitively, allowing HTTP headers like "SESSION_NATION" to match properties defined as "session_nation", which improves usability across different client environments.


2272-2279: Utility function for test header creation

The build_headers utility function creates a consistent format for session properties used in tests, simplifying test setup and making the code more maintainable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci core dependencies Pull requests that update a dependency file ibis python Pull requests that update Python code rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants