Skip to content

fix(security): enforce datasource access control in get_samples()#36550

Merged
rusackas merged 2 commits into
masterfrom
fix-31944-samples-access-control
Jan 7, 2026
Merged

fix(security): enforce datasource access control in get_samples()#36550
rusackas merged 2 commits into
masterfrom
fix-31944-samples-access-control

Conversation

@rusackas
Copy link
Copy Markdown
Member

SUMMARY

This PR fixes a security vulnerability (issue #31944) where users with "can samples on Datasource" permission could read data samples from datasets they don't have proper access to.

Root Cause:
The get_samples() function in superset/views/datasource/utils.py was creating QueryContext instances and calling get_payload() directly without first enforcing access control through raise_for_access().

This allowed users who only had the "can samples on Datasource" permission to bypass datasource-level security checks and read samples from datasets they shouldn't have access to.

Fix:
Added raise_for_access() calls on both the samples and count_star query contexts before fetching any data. This ensures users must have proper datasource access (schema access, datasource access, or ownership) before samples can be retrieved.

Code Change:

try:
    # Enforce access control before fetching data.
    # This prevents users with "can samples on Datasource" permission from
    # reading samples from datasets they don't have access to.
    samples_instance.raise_for_access()
    count_star_instance.raise_for_access()

    count_star_data = count_star_instance.get_payload()["queries"][0]
    # ... rest of the function

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A - This is a security fix with no UI changes.

TESTING INSTRUCTIONS

  1. Create a user with only "can samples on Datasource" permission
  2. Create a dataset that the user does NOT have access to (no datasource_access, schema_access, or ownership)
  3. Before this fix: User could access /datasource/samples?datasource_id=<id>&datasource_type=table and retrieve samples
  4. After this fix: User receives a 403/401 error when attempting to access samples from datasets they don't have permission to access

Unit Tests:
The PR includes comprehensive unit tests that verify:

  • raise_for_access() is called before fetching data
  • Security exceptions are properly raised when access is denied
  • Both samples and count_star query contexts are checked for access

ADDITIONAL INFORMATION

  • Has associated issue: Fixes get_samples() doesn't raise for access #31944
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

🤖 Generated with Claude Code

This commit fixes a security vulnerability (issue #31944) where users
with "can samples on Datasource" permission could read data samples
from datasets they don't have access to.

The get_samples() function was creating QueryContext instances and
calling get_payload() directly without first enforcing access control.
This allowed users to bypass datasource-level security checks.

The fix adds raise_for_access() calls on both the samples and count_star
query contexts before fetching any data. This ensures users must have
proper datasource access before samples can be retrieved.

Fixes #31944

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@dosubot dosubot Bot added the authentication:access-control Rlated to access control label Dec 11, 2025
The original tests tried to patch Flask's current_app proxy, but this
caused issues with MagicMock returning unexpected values (coroutines
instead of integers) when comparing in get_limit_clause.

By mocking get_limit_clause directly, we avoid Flask app context
complexities and make the tests more focused on testing the actual
access control logic.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical security vulnerability where users with only "can samples on Datasource" permission could bypass datasource access controls to read data samples from datasets they shouldn't have access to. The fix adds raise_for_access() calls to enforce proper access control before any data is retrieved.

Key changes:

  • Added access control enforcement via raise_for_access() on both samples and count_star query contexts before fetching data
  • Introduced comprehensive unit tests to verify the security fix works correctly

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
superset/views/datasource/utils.py Added raise_for_access() calls to enforce datasource access control before executing queries
tests/unit_tests/views/datasource/utils_test.py Added unit tests verifying that access control is properly enforced and security exceptions are raised when access is denied
tests/unit_tests/views/datasource/init.py Added package initialization file with Apache license header

Comment on lines +89 to +156
def test_get_samples_calls_raise_for_access_on_both_contexts(
mock_get_limit_clause: MagicMock,
):
"""
Test that get_samples() calls raise_for_access() on both the samples
and count_star query contexts before fetching data.
"""
mock_get_limit_clause.return_value = {"row_offset": 0, "row_limit": 100}

mock_datasource = MagicMock()
mock_datasource.type = "table"
mock_datasource.id = 1
mock_datasource.columns = []

mock_samples_context = MagicMock()
mock_count_context = MagicMock()

# Set up successful access check
mock_samples_context.raise_for_access.return_value = None
mock_count_context.raise_for_access.return_value = None

# Set up successful payload responses
mock_count_context.get_payload.return_value = {
"queries": [{"data": [{"COUNT(*)": 100}], "status": "success"}]
}
mock_samples_context.get_payload.return_value = {
"queries": [
{
"data": [{"col1": "val1"}],
"status": "success",
"cache_key": "test_key",
}
]
}

with (
patch(
"superset.views.datasource.utils.DatasourceDAO.get_datasource",
return_value=mock_datasource,
),
patch(
"superset.views.datasource.utils.QueryContextFactory"
) as mock_factory_class,
):
mock_factory = MagicMock()
mock_factory_class.return_value = mock_factory

# Return different mock contexts for samples vs count queries
mock_factory.create.side_effect = [mock_samples_context, mock_count_context]

from superset.views.datasource.utils import get_samples

result = get_samples(
datasource_type="table",
datasource_id=1,
force=False,
page=1,
per_page=100,
)

# Verify both contexts had raise_for_access called
mock_samples_context.raise_for_access.assert_called_once()
mock_count_context.raise_for_access.assert_called_once()

# Verify the result contains expected data
assert result["data"] == [{"col1": "val1"}]
assert result["total_count"] == 100

Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test suite should include a test case that verifies access control enforcement when the payload parameter is provided. When payload is not None, get_samples() follows a different code path (DRILL_DETAIL result type instead of SAMPLES), and this security-critical path should also be tested to ensure raise_for_access() is called correctly in both scenarios.

Copilot uses AI. Check for mistakes.
@bito-code-review
Copy link
Copy Markdown
Contributor

The suggestion is valid. The current tests verify access control when payload is None, but to ensure raise_for_access() is called correctly in the DRILL_DETAIL code path when payload is not None, a test case should be added.

Copy link
Copy Markdown
Member

@dpgaspar dpgaspar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@rusackas rusackas merged commit 861e5cd into master Jan 7, 2026
75 of 76 checks passed
@rusackas rusackas deleted the fix-31944-samples-access-control branch January 7, 2026 16:54
sadpandajoe pushed a commit that referenced this pull request Jan 7, 2026
…6550)

Co-authored-by: Claude <noreply@anthropic.com>
(cherry picked from commit 861e5cd)
@sadpandajoe sadpandajoe added the v6.0 Label added by the release manager to track PRs to be included in the 6.0 branch label Jan 7, 2026
aminghadersohi pushed a commit to aminghadersohi/superset that referenced this pull request Jan 7, 2026
jesperct pushed a commit to jesperct/superset that referenced this pull request Jan 8, 2026
isaac-jaynes-imperva pushed a commit to isaac-jaynes-imperva/superset that referenced this pull request Mar 3, 2026
…ache#36550)

Co-authored-by: Claude <noreply@anthropic.com>
(cherry picked from commit 861e5cd)
rusackas pushed a commit that referenced this pull request Apr 17, 2026
- development-setup.md: update Node.js prerequisite from v20 to v22 LTS
  (PR #37223 — project minimum Node version upgraded)
- importing-exporting-datasources.mdx: document that masked_encrypted_extra
  (sensitive connection parameters like service account JSON) is now included
  in database import/export (PR #38078)
- security.mdx: add note that get_samples() enforces datasource-level access
  control, closing prior gap where unprivileged users could fetch sample rows
  (PR #36550)
- exploring-data.mdx: document natural language time range expressions
  including new "first of" expressions (first day/week of month/quarter/year)
  supported in the Custom time range picker (PR #37098)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
qfcwell pushed a commit to qfcwell/superset that referenced this pull request May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

authentication:access-control Rlated to access control size/L v6.0 Label added by the release manager to track PRs to be included in the 6.0 branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

get_samples() doesn't raise for access

4 participants