fix(security): enforce datasource access control in get_samples()#36550
Conversation
This commit fixes a security vulnerability (issue #31944) where users with "can samples on Datasource" permission could read data samples from datasets they don't have access to. The get_samples() function was creating QueryContext instances and calling get_payload() directly without first enforcing access control. This allowed users to bypass datasource-level security checks. The fix adds raise_for_access() calls on both the samples and count_star query contexts before fetching any data. This ensures users must have proper datasource access before samples can be retrieved. Fixes #31944 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
The original tests tried to patch Flask's current_app proxy, but this caused issues with MagicMock returning unexpected values (coroutines instead of integers) when comparing in get_limit_clause. By mocking get_limit_clause directly, we avoid Flask app context complexities and make the tests more focused on testing the actual access control logic. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes a critical security vulnerability where users with only "can samples on Datasource" permission could bypass datasource access controls to read data samples from datasets they shouldn't have access to. The fix adds raise_for_access() calls to enforce proper access control before any data is retrieved.
Key changes:
- Added access control enforcement via
raise_for_access()on both samples and count_star query contexts before fetching data - Introduced comprehensive unit tests to verify the security fix works correctly
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| superset/views/datasource/utils.py | Added raise_for_access() calls to enforce datasource access control before executing queries |
| tests/unit_tests/views/datasource/utils_test.py | Added unit tests verifying that access control is properly enforced and security exceptions are raised when access is denied |
| tests/unit_tests/views/datasource/init.py | Added package initialization file with Apache license header |
| def test_get_samples_calls_raise_for_access_on_both_contexts( | ||
| mock_get_limit_clause: MagicMock, | ||
| ): | ||
| """ | ||
| Test that get_samples() calls raise_for_access() on both the samples | ||
| and count_star query contexts before fetching data. | ||
| """ | ||
| mock_get_limit_clause.return_value = {"row_offset": 0, "row_limit": 100} | ||
|
|
||
| mock_datasource = MagicMock() | ||
| mock_datasource.type = "table" | ||
| mock_datasource.id = 1 | ||
| mock_datasource.columns = [] | ||
|
|
||
| mock_samples_context = MagicMock() | ||
| mock_count_context = MagicMock() | ||
|
|
||
| # Set up successful access check | ||
| mock_samples_context.raise_for_access.return_value = None | ||
| mock_count_context.raise_for_access.return_value = None | ||
|
|
||
| # Set up successful payload responses | ||
| mock_count_context.get_payload.return_value = { | ||
| "queries": [{"data": [{"COUNT(*)": 100}], "status": "success"}] | ||
| } | ||
| mock_samples_context.get_payload.return_value = { | ||
| "queries": [ | ||
| { | ||
| "data": [{"col1": "val1"}], | ||
| "status": "success", | ||
| "cache_key": "test_key", | ||
| } | ||
| ] | ||
| } | ||
|
|
||
| with ( | ||
| patch( | ||
| "superset.views.datasource.utils.DatasourceDAO.get_datasource", | ||
| return_value=mock_datasource, | ||
| ), | ||
| patch( | ||
| "superset.views.datasource.utils.QueryContextFactory" | ||
| ) as mock_factory_class, | ||
| ): | ||
| mock_factory = MagicMock() | ||
| mock_factory_class.return_value = mock_factory | ||
|
|
||
| # Return different mock contexts for samples vs count queries | ||
| mock_factory.create.side_effect = [mock_samples_context, mock_count_context] | ||
|
|
||
| from superset.views.datasource.utils import get_samples | ||
|
|
||
| result = get_samples( | ||
| datasource_type="table", | ||
| datasource_id=1, | ||
| force=False, | ||
| page=1, | ||
| per_page=100, | ||
| ) | ||
|
|
||
| # Verify both contexts had raise_for_access called | ||
| mock_samples_context.raise_for_access.assert_called_once() | ||
| mock_count_context.raise_for_access.assert_called_once() | ||
|
|
||
| # Verify the result contains expected data | ||
| assert result["data"] == [{"col1": "val1"}] | ||
| assert result["total_count"] == 100 | ||
|
|
There was a problem hiding this comment.
The test suite should include a test case that verifies access control enforcement when the payload parameter is provided. When payload is not None, get_samples() follows a different code path (DRILL_DETAIL result type instead of SAMPLES), and this security-critical path should also be tested to ensure raise_for_access() is called correctly in both scenarios.
|
The suggestion is valid. The current tests verify access control when payload is None, but to ensure raise_for_access() is called correctly in the DRILL_DETAIL code path when payload is not None, a test case should be added. |
…ache#36550) Co-authored-by: Claude <noreply@anthropic.com>
…ache#36550) Co-authored-by: Claude <noreply@anthropic.com>
…ache#36550) Co-authored-by: Claude <noreply@anthropic.com> (cherry picked from commit 861e5cd)
- development-setup.md: update Node.js prerequisite from v20 to v22 LTS (PR #37223 — project minimum Node version upgraded) - importing-exporting-datasources.mdx: document that masked_encrypted_extra (sensitive connection parameters like service account JSON) is now included in database import/export (PR #38078) - security.mdx: add note that get_samples() enforces datasource-level access control, closing prior gap where unprivileged users could fetch sample rows (PR #36550) - exploring-data.mdx: document natural language time range expressions including new "first of" expressions (first day/week of month/quarter/year) supported in the Custom time range picker (PR #37098) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ache#36550) Co-authored-by: Claude <noreply@anthropic.com>
SUMMARY
This PR fixes a security vulnerability (issue #31944) where users with "can samples on Datasource" permission could read data samples from datasets they don't have proper access to.
Root Cause:
The
get_samples()function insuperset/views/datasource/utils.pywas creatingQueryContextinstances and callingget_payload()directly without first enforcing access control throughraise_for_access().This allowed users who only had the "can samples on Datasource" permission to bypass datasource-level security checks and read samples from datasets they shouldn't have access to.
Fix:
Added
raise_for_access()calls on both the samples and count_star query contexts before fetching any data. This ensures users must have proper datasource access (schema access, datasource access, or ownership) before samples can be retrieved.Code Change:
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A - This is a security fix with no UI changes.
TESTING INSTRUCTIONS
/datasource/samples?datasource_id=<id>&datasource_type=tableand retrieve samplesUnit Tests:
The PR includes comprehensive unit tests that verify:
raise_for_access()is called before fetching dataADDITIONAL INFORMATION
🤖 Generated with Claude Code