fix(security): enforce datasource access control in get_samples() by rusackas · Pull Request #36550 · apache/superset

rusackas · 2025-12-11T23:14:37Z

SUMMARY

This PR fixes a security vulnerability (issue #31944) where users with "can samples on Datasource" permission could read data samples from datasets they don't have proper access to.

Root Cause:
The get_samples() function in superset/views/datasource/utils.py was creating QueryContext instances and calling get_payload() directly without first enforcing access control through raise_for_access().

This allowed users who only had the "can samples on Datasource" permission to bypass datasource-level security checks and read samples from datasets they shouldn't have access to.

Fix:
Added raise_for_access() calls on both the samples and count_star query contexts before fetching any data. This ensures users must have proper datasource access (schema access, datasource access, or ownership) before samples can be retrieved.

Code Change:

try:
    # Enforce access control before fetching data.
    # This prevents users with "can samples on Datasource" permission from
    # reading samples from datasets they don't have access to.
    samples_instance.raise_for_access()
    count_star_instance.raise_for_access()

    count_star_data = count_star_instance.get_payload()["queries"][0]
    # ... rest of the function
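
The ordering is the point of the change: both checks run before any payload is built. A minimal, self-contained sketch of that pattern follows. `FakeQueryContext`, `AccessDenied`, and `get_samples_sketch` are hypothetical stand-ins and not Superset classes; only the method names `raise_for_access()` and `get_payload()` are taken from the PR.

```python
class AccessDenied(Exception):
    """Hypothetical stand-in for the security exception Superset raises."""


class FakeQueryContext:
    """Hypothetical stand-in for a query context (not the real QueryContext)."""

    def __init__(self, allowed, rows):
        self.allowed = allowed
        self.rows = rows
        self.payload_fetched = False

    def raise_for_access(self):
        if not self.allowed:
            raise AccessDenied("no datasource access")

    def get_payload(self):
        self.payload_fetched = True
        return {"queries": [{"data": self.rows, "status": "success"}]}


def get_samples_sketch(samples_ctx, count_ctx):
    # Check both contexts first, so a denied user never triggers a query.
    samples_ctx.raise_for_access()
    count_ctx.raise_for_access()

    count_data = count_ctx.get_payload()["queries"][0]
    sample_data = samples_ctx.get_payload()["queries"][0]
    return {
        "data": sample_data["data"],
        "total_count": count_data["data"][0]["COUNT(*)"],
    }
```

If either check raises, neither `get_payload()` call is reached, which is the property the fix relies on.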

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A - This is a security fix with no UI changes.

TESTING INSTRUCTIONS

1. Create a user with only the "can samples on Datasource" permission.
2. Create a dataset that the user does NOT have access to (no datasource_access, schema_access, or ownership).
3. Before this fix: the user could access /datasource/samples?datasource_id=<id>&datasource_type=table and retrieve samples.
4. After this fix: the user receives a 403/401 error when attempting to access samples from datasets they don't have permission to access.
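
For scripting the manual check, the URL and the expected outcome can be captured in two small helpers. This is only a sketch: the helper names and the base URL are hypothetical, the HTTP method and authentication are not specified in the PR and are left out, and only the path, the query parameters, and the 401/403 expectation come from the instructions above.

```python
from urllib.parse import urlencode


def samples_url(base_url, datasource_id, datasource_type="table"):
    """Build the samples URL quoted in the testing instructions."""
    query = urlencode(
        {"datasource_id": datasource_id, "datasource_type": datasource_type}
    )
    return f"{base_url.rstrip('/')}/datasource/samples?{query}"


def access_was_denied(status_code):
    """True for the 401/403 responses expected after the fix."""
    return status_code in (401, 403)
```

A request made as the restricted user against `samples_url(...)` should produce a status code for which `access_was_denied` returns True once the fix is deployed.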

Unit Tests:
The PR includes comprehensive unit tests that verify:

- raise_for_access() is called before fetching data
- Security exceptions are properly raised when access is denied
- Both samples and count_star query contexts are checked for access
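
The first two properties can be asserted with `unittest.mock` alone. The sketch below is not the PR's test file: `fetch_with_check` is a hypothetical function under test, and `PermissionError` stands in for whatever security exception Superset raises.

```python
from unittest.mock import MagicMock, call


def fetch_with_check(ctx):
    """Hypothetical function under test: check access, then fetch."""
    ctx.raise_for_access()
    return ctx.get_payload()


def test_access_checked_before_fetch():
    ctx = MagicMock()
    ctx.get_payload.return_value = {"queries": [{"data": []}]}

    fetch_with_check(ctx)

    # A mock records calls to its child mocks, in order, in mock_calls.
    assert ctx.mock_calls == [call.raise_for_access(), call.get_payload()]


def test_no_fetch_when_access_denied():
    ctx = MagicMock()
    ctx.raise_for_access.side_effect = PermissionError("denied")

    try:
        fetch_with_check(ctx)
    except PermissionError:
        pass
    else:
        raise AssertionError("expected PermissionError")

    # The exception must stop execution before any data is fetched.
    ctx.get_payload.assert_not_called()
```

Comparing `mock_calls` against an ordered list checks the sequence, which `assert_called_once()` on each method would not.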

ADDITIONAL INFORMATION

Has associated issue: Fixes get_samples() doesn't raise for access #31944
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

🤖 Generated with Claude Code

This commit fixes a security vulnerability (issue #31944) where users with "can samples on Datasource" permission could read data samples from datasets they don't have access to.

The get_samples() function was creating QueryContext instances and calling get_payload() directly without first enforcing access control. This allowed users to bypass datasource-level security checks.

The fix adds raise_for_access() calls on both the samples and count_star query contexts before fetching any data. This ensures users must have proper datasource access before samples can be retrieved.

Fixes #31944

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

The original tests tried to patch Flask's current_app proxy, but this caused issues with MagicMock returning unexpected values (coroutines instead of integers) when comparing in get_limit_clause.

By mocking get_limit_clause directly, we avoid Flask app context complexities and make the tests more focused on testing the actual access control logic.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

This PR fixes a critical security vulnerability where users with only "can samples on Datasource" permission could bypass datasource access controls to read data samples from datasets they shouldn't have access to. The fix adds raise_for_access() calls to enforce proper access control before any data is retrieved.

Key changes:

- Added access control enforcement via raise_for_access() on both samples and count_star query contexts before fetching data
- Introduced comprehensive unit tests to verify the security fix works correctly

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
superset/views/datasource/utils.py	Added `raise_for_access()` calls to enforce datasource access control before executing queries
tests/unit_tests/views/datasource/utils_test.py	Added unit tests verifying that access control is properly enforced and security exceptions are raised when access is denied
tests/unit_tests/views/datasource/__init__.py	Added package initialization file with Apache license header

Copilot · 2026-01-06T18:58:38Z

+def test_get_samples_calls_raise_for_access_on_both_contexts(
+    mock_get_limit_clause: MagicMock,
+):
+    """
+    Test that get_samples() calls raise_for_access() on both the samples
+    and count_star query contexts before fetching data.
+    """
+    mock_get_limit_clause.return_value = {"row_offset": 0, "row_limit": 100}
+
+    mock_datasource = MagicMock()
+    mock_datasource.type = "table"
+    mock_datasource.id = 1
+    mock_datasource.columns = []
+
+    mock_samples_context = MagicMock()
+    mock_count_context = MagicMock()
+
+    # Set up successful access check
+    mock_samples_context.raise_for_access.return_value = None
+    mock_count_context.raise_for_access.return_value = None
+
+    # Set up successful payload responses
+    mock_count_context.get_payload.return_value = {
+        "queries": [{"data": [{"COUNT(*)": 100}], "status": "success"}]
+    }
+    mock_samples_context.get_payload.return_value = {
+        "queries": [
+            {
+                "data": [{"col1": "val1"}],
+                "status": "success",
+                "cache_key": "test_key",
+            }
+        ]
+    }
+
+    with (
+        patch(
+            "superset.views.datasource.utils.DatasourceDAO.get_datasource",
+            return_value=mock_datasource,
+        ),
+        patch(
+            "superset.views.datasource.utils.QueryContextFactory"
+        ) as mock_factory_class,
+    ):
+        mock_factory = MagicMock()
+        mock_factory_class.return_value = mock_factory
+
+        # Return different mock contexts for samples vs count queries
+        mock_factory.create.side_effect = [mock_samples_context, mock_count_context]
+
+        from superset.views.datasource.utils import get_samples
+
+        result = get_samples(
+            datasource_type="table",
+            datasource_id=1,
+            force=False,
+            page=1,
+            per_page=100,
+        )
+
+        # Verify both contexts had raise_for_access called
+        mock_samples_context.raise_for_access.assert_called_once()
+        mock_count_context.raise_for_access.assert_called_once()
+
+        # Verify the result contains expected data
+        assert result["data"] == [{"col1": "val1"}]
+        assert result["total_count"] == 100
+


The test suite should include a test case that verifies access control enforcement when the payload parameter is provided. When payload is not None, get_samples() follows a different code path (DRILL_DETAIL result type instead of SAMPLES), and this security-critical path should also be tested to ensure raise_for_access() is called correctly in both scenarios.

bito-code-review · 2026-01-06T19:00:31Z

The suggestion is valid. The current tests verify access control when payload is None, but to ensure raise_for_access() is called correctly in the DRILL_DETAIL code path when payload is not None, a test case should be added.
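
A test along the lines the reviewers suggest could loop over both values of `payload`. The sketch below does not call the real `get_samples()`: `get_samples_sketch` is a hypothetical stand-in that mirrors only the branch on `payload` described in the review comment, and the result types are plain strings here as an assumption (the real code may use an enum).

```python
from unittest.mock import MagicMock


def get_samples_sketch(factory, payload=None):
    """Hypothetical stand-in: pick the result type from payload, then check access."""
    result_type = "SAMPLES" if payload is None else "DRILL_DETAIL"
    samples_ctx = factory.create(result_type=result_type)
    count_ctx = factory.create(result_type=result_type)

    # Access checks come before either payload is fetched.
    samples_ctx.raise_for_access()
    count_ctx.raise_for_access()
    return samples_ctx.get_payload(), count_ctx.get_payload()


def check_both_paths():
    cases = ((None, "SAMPLES"), ({"filters": []}, "DRILL_DETAIL"))
    for payload, expected_type in cases:
        samples_ctx, count_ctx = MagicMock(), MagicMock()
        factory = MagicMock()
        # First create() call returns the samples context, second the count one.
        factory.create.side_effect = [samples_ctx, count_ctx]

        get_samples_sketch(factory, payload=payload)

        first_call = factory.create.call_args_list[0]
        assert first_call.kwargs["result_type"] == expected_type
        samples_ctx.raise_for_access.assert_called_once()
        count_ctx.raise_for_access.assert_called_once()
```

In the real suite the same loop would likely be written with `pytest.mark.parametrize` and would patch `QueryContextFactory` as the existing test does.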

dpgaspar

Nice!

- development-setup.md: update Node.js prerequisite from v20 to v22 LTS (PR #37223, project minimum Node version upgraded)
- importing-exporting-datasources.mdx: document that masked_encrypted_extra (sensitive connection parameters like service account JSON) is now included in database import/export (PR #38078)
- security.mdx: add note that get_samples() enforces datasource-level access control, closing prior gap where unprivileged users could fetch sample rows (PR #36550)
- exploring-data.mdx: document natural language time range expressions including new "first of" expressions (first day/week of month/quarter/year) supported in the Custom time range picker (PR #37098)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

pull-request-size Bot added the size/L label Dec 11, 2025

dosubot Bot added the authentication:access-control Related to access control label Dec 11, 2025

rusackas requested review from betodealmeida, dpgaspar, michael-s-molina and mistercrunch December 15, 2025 18:32

rusackas requested a review from Copilot January 6, 2026 18:55

Copilot started reviewing on behalf of rusackas January 6, 2026 18:56

Copilot AI reviewed Jan 6, 2026

dpgaspar approved these changes Jan 7, 2026

rusackas merged commit 861e5cd into master Jan 7, 2026
75 of 76 checks passed

rusackas deleted the fix-31944-samples-access-control branch January 7, 2026 16:54

sadpandajoe pushed a commit that referenced this pull request Jan 7, 2026

fix(security): enforce datasource access control in get_samples() (#3…

c4fd575

…6550) Co-authored-by: Claude <noreply@anthropic.com> (cherry picked from commit 861e5cd)

sadpandajoe added the v6.0 Label added by the release manager to track PRs to be included in the 6.0 branch label Jan 7, 2026

aminghadersohi pushed a commit to aminghadersohi/superset that referenced this pull request Jan 7, 2026

fix(security): enforce datasource access control in get_samples() (ap…

7ea756b

…ache#36550) Co-authored-by: Claude <noreply@anthropic.com>

jesperct pushed a commit to jesperct/superset that referenced this pull request Jan 8, 2026

fix(security): enforce datasource access control in get_samples() (ap…

61d79e8

…ache#36550) Co-authored-by: Claude <noreply@anthropic.com>

Vitor-Avila mentioned this pull request Apr 9, 2026

fix: Drill to Detail for Embedded #39214

Merged

9 tasks

rusackas mentioned this pull request Apr 17, 2026

docs: Superset 6.1 documentation catch-up — batch 3 #39445

Merged

6 tasks

qfcwell pushed a commit to qfcwell/superset that referenced this pull request May 12, 2026

fix(security): enforce datasource access control in get_samples() (ap…

4f9976e

…ache#36550) Co-authored-by: Claude <noreply@anthropic.com>

rusackas merged 2 commits into master from fix-31944-samples-access-control
