feat(uploadDataModel): add DataPreviewModal for file upload preview and integrate driver.js for guided tours #37791
Conversation
…and integrate driver.js for guided tours
```ts
if (lines.length === 0) {
  return { columns: [], data: [] };
}
const headerLine = lines[0];
```
Suggestion: CSV files saved with a UTF-8 BOM (e.g., from Excel) will include the BOM character at the start of the first header cell, so the first column name in the preview will contain an invisible character and not match the actual column name used elsewhere; stripping a leading BOM from the header line before splitting avoids this mismatch. Update the header line extraction to remove a potential BOM character. [logic error]
Severity Level: Major ⚠️
- ⚠️ First CSV header in preview contains hidden BOM character.
- ⚠️ Affects CSV preview in upload-to-database modal.
- ⚠️ Can break string-based header comparisons or copies.

Suggested change:

```diff
-const headerLine = lines[0];
+const headerLine = lines[0].replace(/^\uFEFF/, '');
```
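The mismatch is easy to reproduce outside the component; a minimal standalone sketch (the CSV sample is hypothetical, not from the PR) showing why the BOM-prefixed header fails a plain string comparison and how the one-character strip fixes it:

```typescript
// A CSV exported from Excel with a UTF-8 BOM: the BOM survives file.text()
// and ends up glued to the first header cell.
const csvText = '\uFEFFname,age\nalice,30';

// Naive extraction: first cell of the first line keeps the BOM.
const rawHeader = csvText.split(/\r?\n/)[0].split(',')[0];

// Strip a leading BOM before any splitting.
const cleanHeader = csvText
  .replace(/^\uFEFF/, '')
  .split(/\r?\n/)[0]
  .split(',')[0];

console.log(rawHeader === 'name'); // false: hidden \uFEFF prefix
console.log(cleanHeader === 'name'); // true
```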
Steps of Reproduction ✅
1. Create or export a CSV file from a common tool (for example, Excel) that saves CSVs
with a UTF-8 BOM, and ensure the file has at least one header row; this file will be used
as input to the CSV preview path in `parseCSV`
(`superset-frontend/src/features/databases/UploadDataModel/DataPreviewModal.tsx:36–59`).
2. In the UI, follow the PR steps: click "+" → "Upload CSV or Excel to database", upload
the BOM-encoded CSV file, and click "Preview data" to open `DataPreviewModal` with
`type='csv'` and `file` set (modal component defined at line 97).
3. The `useEffect` at lines 110–138 detects `type === 'csv'` and calls `parseCSV(file,
delimiter)`, which reads the file via `file.text()` and splits it into `lines` (lines
40–41); the first element, `lines[0]`, still contains the leading BOM character.
4. At line 45, `const headerLine = lines[0];` assigns the BOM-prefixed string directly,
and the subsequent `headerLine.split(delimiter)` at lines 46–48 produces a first column
name whose underlying value is `"\uFEFF<actualHeader>"`; this value is used as the column
title/dataIndex in `tableColumns` (lines 140–148), so the first header in the preview
table carries an invisible BOM character, which can cause subtle issues (e.g., copying the
header text or matching by string) compared with a BOM-free header.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset-frontend/src/features/databases/UploadDataModel/DataPreviewModal.tsx
**Line:** 45:45
**Comment:**
*Logic Error: CSV files saved with a UTF-8 BOM (e.g., from Excel) will include the BOM character at the start of the first header cell, so the first column name in the preview will contain an invisible character and not match the actual column name used elsewhere; stripping a leading BOM from the header line before splitting avoids this mismatch. Update the header line extraction to remove a potential BOM character.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

```ts
if (!show || !file) {
  setColumns([]);
  setData([]);
  setError(null);
  return;
}
if (type === 'columnar') {
  setError(t('Data preview is not available for Parquet files.'));
  setColumns([]);
  setData([]);
  return;
}
setLoading(true);
setError(null);
const parse =
  type === 'csv' ? parseCSV(file, delimiter) : parseExcel(file, sheetName);
parse
  .then(({ columns: cols, data: parsedData }) => {
    setColumns(cols);
    setData(parsedData);
  })
  .catch(err => {
    setError(err?.message || t('Failed to parse file'));
    setColumns([]);
    setData([]);
  })
  .finally(() => setLoading(false));
```
Suggestion: The async parsing in the effect has no cancellation/guard, so if the user rapidly changes the file, closes the modal, or switches type while a previous parse is still in flight, the older promise can resolve later and overwrite state for the newer file or update state after unmount, causing incorrect previews and React warnings about state updates on unmounted components. Add a cancellation flag in the effect and check it in the then/catch/finally handlers, returning a cleanup function that flips this flag so outdated parses don't update state. [race condition]
Severity Level: Major ⚠️
- ⚠️ Data preview may briefly show stale file contents.
- ⚠️ Potential React warnings on rapid open/close of preview.
- ⚠️ Affects CSV/Excel preview in upload-to-database flow.

Suggested change:

```diff
+let cancelled = false;
 if (!show || !file) {
   setColumns([]);
   setData([]);
   setError(null);
   return;
 }
 if (type === 'columnar') {
   setError(t('Data preview is not available for Parquet files.'));
   setColumns([]);
   setData([]);
   return;
 }
 setLoading(true);
 setError(null);
 const parse =
   type === 'csv' ? parseCSV(file, delimiter) : parseExcel(file, sheetName);
 parse
   .then(({ columns: cols, data: parsedData }) => {
+    if (cancelled) {
+      return;
+    }
     setColumns(cols);
     setData(parsedData);
   })
   .catch(err => {
+    if (cancelled) {
+      return;
+    }
     setError(err?.message || t('Failed to parse file'));
     setColumns([]);
     setData([]);
   })
-  .finally(() => setLoading(false));
+  .finally(() => {
+    if (!cancelled) {
+      setLoading(false);
+    }
+  });
+return () => {
+  cancelled = true;
+};
```
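The cancelled-flag pattern is not React-specific; a minimal sketch with a hypothetical parse stand-in (not the real `parseCSV`/`parseExcel`) showing how the cleanup flag keeps a slow, stale promise from overwriting newer state:

```typescript
type Preview = { columns: string[] };

// Hypothetical stand-in for parseCSV/parseExcel: resolves after `delay` ms.
const fakeParse = (name: string, delay: number): Promise<Preview> =>
  new Promise(resolve => setTimeout(() => resolve({ columns: [name] }), delay));

let current: Preview | null = null;

// Mirrors the effect body: each call returns a cleanup that flips `cancelled`.
function startParse(name: string, delay: number): () => void {
  let cancelled = false;
  fakeParse(name, delay).then(result => {
    if (cancelled) return; // stale result: ignore it
    current = result;
  });
  return () => {
    cancelled = true;
  };
}

async function demo(): Promise<Preview | null> {
  const cancelSlow = startParse('old-file.csv', 50); // slow, soon stale
  cancelSlow(); // user picked a new file, cleanup ran
  startParse('new-file.csv', 10);
  await new Promise(r => setTimeout(r, 100)); // let both promises settle
  return current;
}

demo().then(result => console.log(result)); // logs { columns: ['new-file.csv'] }
```

The slow promise still resolves, but its handler bails out on the flipped flag, so only the newer file's result reaches state.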
Steps of Reproduction ✅
1. In the UI follow the PR test steps: click "+" → "Upload CSV or Excel to database"
(feature entry point described in PR), then choose a CSV file and click the "Preview data"
button so that `DataPreviewModal` from
`superset-frontend/src/features/databases/UploadDataModel/DataPreviewModal.tsx:97` is
shown with `show=true` and `file` set.
2. While the preview is loading (the `Loading...` state rendered at lines 160–163 is
visible), quickly close the modal (parent sets `show=false` and unmounts
`DataPreviewModal`) or select a different file and re-open preview, which remounts the
component with a new `file` prop.
3. The original effect at lines 110–138 has already started an async parse via
`parseCSV`/`parseExcel` and stored the resulting `Promise` in `parse`, but it does not
register any cleanup or cancellation; when that promise resolves after unmount/remount,
the `.then`/`.catch` handlers still invoke `setColumns`, `setData`, and `setError`.
4. Because there is no `cancelled` guard or effect cleanup, the stale async handler can
(a) update state on an unmounted component (leading to React dev warnings about state
updates on unmounted components) or (b) overwrite the state for the newly mounted modal
instance with results from the previous file/type, producing an incorrect data preview for
the current upload session.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset-frontend/src/features/databases/UploadDataModel/DataPreviewModal.tsx
**Line:** 111:137
**Comment:**
*Race Condition: The async parsing in the effect has no cancellation/guard, so if the user rapidly changes the file, closes the modal, or switches type while a previous parse is still in flight, the older promise can resolve later and overwrite state for the newer file or update state after unmount, causing incorrect previews and React warnings about state updates on unmounted components. Add a cancellation flag in the effect and check it in the then/catch/finally handlers, returning a cleanup function that flips this flag so outdated parses don't update state.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
Pull request overview
Improves the Upload Data Model modal UX by adding an onboarding tour (driver.js), a client-side “data preview” modal for CSV/Excel uploads, and additional UX safeguards (5MB preview limit + validation fix).
Changes:
- Add a driver.js guided tour for CSV/Excel/Columnar upload flows and a “?” trigger in the modal header.
- Add a DataPreviewModal with client-side parsing for CSV/Excel (row-limited), and disable preview/metadata fetch for files > 5MB.
- Update the uploader UI to use a drag-and-drop dropzone and fix file-required validation using a fileList ref.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| superset-frontend/src/features/databases/UploadDataModel/uploadDataModalTour.ts | Adds driver.js tour step definitions + tour start helper. |
| superset-frontend/src/features/databases/UploadDataModel/uploadDataModalTour.test.ts | Unit tests for tour configuration and selectors. |
| superset-frontend/src/features/databases/UploadDataModel/styles.ts | Adds styled wrapper for the new Upload.Dragger dropzone styling. |
| superset-frontend/src/features/databases/UploadDataModel/index.tsx | Integrates tour trigger/auto-start, preview modal, 5MB gating, drag-and-drop upload UI, and validation fix. |
| superset-frontend/src/features/databases/UploadDataModel/DataPreviewModal.tsx | Implements CSV/Excel client-side parsing and renders preview table in a modal. |
| superset-frontend/src/features/databases/UploadDataModel/DataPreviewModal.test.tsx | Adds unit tests for the preview modal (primarily CSV). |
| superset-frontend/src/features/databases/UploadDataModel/ColumnsPreview.tsx | Adds “file too large” messaging support for column preview section. |
| superset-frontend/package.json | Adds driver.js dependency. |
| superset-frontend/package-lock.json | Locks driver.js dependency. |
Files not reviewed (1)
- superset-frontend/package-lock.json: Language not supported
```ts
const text = await file.text();
const lines = text.split(/\r?\n/).filter(line => line.trim().length > 0);
if (lines.length === 0) {
  return { columns: [], data: [] };
}
const headerLine = lines[0];
const columns = headerLine
  .split(delimiter)
  .map((c, i) => c.trim() || `col_${i}`);
const dataRows = lines.slice(1, PREVIEW_ROW_LIMIT + 1);
const data = dataRows.map(row => {
  const values = row.split(delimiter);
  const obj: Record<string, string> = {};
  columns.forEach((col, i) => {
    obj[col] = values[i]?.trim() ?? '';
  });
  return obj;
});
```
parseCSV uses a simple split(delimiter) approach, which will mis-parse common CSV cases (quoted fields containing delimiters/newlines, escaped quotes). This can produce incorrect column counts and misleading previews; consider using a standards-compliant CSV parser (or an existing utility in the codebase) for the preview parsing.
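The failure mode is easy to demonstrate; a standalone sketch (the sample row is hypothetical) of how a quoted field containing the delimiter breaks naive splitting, alongside a quote-aware split of the kind a standards-compliant parser performs internally. This is an illustration of the problem, not a replacement for a real parser such as the one suggested below:

```typescript
// One data row with a quoted field that contains the delimiter.
const row = '"Doe, Jane","12 Main St",30';

// Naive approach used in parseCSV: split on every comma.
const naive = row.split(',');
console.log(naive.length); // 4 fields for a 3-column row

// A minimal quote-aware split, honoring "" as an escaped quote.
function splitCsvRow(line: string): string[] {
  const out: string[] = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i += 1) {
    const ch = line[i];
    if (ch === '"') {
      if (inQuotes && line[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i += 1;
      } else {
        inQuotes = !inQuotes; // toggle quoted state
      }
    } else if (ch === ',' && !inQuotes) {
      out.push(field); // delimiter outside quotes ends the field
      field = '';
    } else {
      field += ch;
    }
  }
  out.push(field);
  return out;
}

console.log(splitCsvRow(row)); // ['Doe, Jane', '12 Main St', '30']
```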
Suggested change:

```diff
-const text = await file.text();
-const lines = text.split(/\r?\n/).filter(line => line.trim().length > 0);
-if (lines.length === 0) {
-  return { columns: [], data: [] };
-}
-const headerLine = lines[0];
-const columns = headerLine
-  .split(delimiter)
-  .map((c, i) => c.trim() || `col_${i}`);
-const dataRows = lines.slice(1, PREVIEW_ROW_LIMIT + 1);
-const data = dataRows.map(row => {
-  const values = row.split(delimiter);
-  const obj: Record<string, string> = {};
-  columns.forEach((col, i) => {
-    obj[col] = values[i]?.trim() ?? '';
-  });
-  return obj;
-});
+const Papa = (await import(
+  /* webpackChunkName: "papaparse" */ 'papaparse'
+)).default;
+const text = await file.text();
+const result = Papa.parse<Record<string, string>>(text, {
+  delimiter,
+  header: true,
+  skipEmptyLines: 'greedy',
+  preview: PREVIEW_ROW_LIMIT + 1,
+  quoteChar: '"',
+  escapeChar: '"',
+});
+if (!result.data || result.data.length === 0) {
+  return { columns: [], data: [] };
+}
+const rawFields = result.meta.fields && result.meta.fields.length > 0
+  ? result.meta.fields
+  : Object.keys(result.data[0] ?? {});
+const columns = rawFields.map((c, i) => (c ?? '').trim() || `col_${i}`);
+const data = result.data.slice(0, PREVIEW_ROW_LIMIT).map(row => {
+  const obj: Record<string, string> = {};
+  columns.forEach(col => {
+    const value = row[col] ?? '';
+    obj[col] = String(value);
+  });
+  return obj;
+});
```
```ts
if (!show || !file) {
  setColumns([]);
  setData([]);
  setError(null);
  return;
}
if (type === 'columnar') {
  setError(t('Data preview is not available for Parquet files.'));
  setColumns([]);
  setData([]);
  return;
}
setLoading(true);
setError(null);
const parse =
  type === 'csv' ? parseCSV(file, delimiter) : parseExcel(file, sheetName);
parse
  .then(({ columns: cols, data: parsedData }) => {
    setColumns(cols);
    setData(parsedData);
  })
  .catch(err => {
    setError(err?.message || t('Failed to parse file'));
    setColumns([]);
    setData([]);
  })
  .finally(() => setLoading(false));
```
The useEffect parsing flow doesn’t guard against in-flight parse promises when show, file, delimiter, or sheetName change. If the modal is closed or a different file is selected while parsing, a late promise resolution can overwrite state with stale columns/data; add an effect cleanup/cancellation guard before calling setColumns/setData/setError/setLoading.
Suggested change:

```diff
+let cancelled = false;
+const resetState = () => {
+  setColumns([]);
+  setData([]);
+  setError(null);
+  setLoading(false);
+};
 if (!show || !file) {
-  setColumns([]);
-  setData([]);
-  setError(null);
-  return;
+  resetState();
+  return () => {
+    cancelled = true;
+  };
 }
 if (type === 'columnar') {
   setError(t('Data preview is not available for Parquet files.'));
   setColumns([]);
   setData([]);
-  return;
+  setLoading(false);
+  return () => {
+    cancelled = true;
+  };
 }
 setLoading(true);
 setError(null);
 const parse =
   type === 'csv' ? parseCSV(file, delimiter) : parseExcel(file, sheetName);
 parse
   .then(({ columns: cols, data: parsedData }) => {
+    if (cancelled) {
+      return;
+    }
     setColumns(cols);
     setData(parsedData);
   })
   .catch(err => {
+    if (cancelled) {
+      return;
+    }
     setError(err?.message || t('Failed to parse file'));
     setColumns([]);
     setData([]);
   })
-  .finally(() => setLoading(false));
+  .finally(() => {
+    if (cancelled) {
+      return;
+    }
+    setLoading(false);
+  });
+return () => {
+  cancelled = true;
+};
```
```ts
import { Modal, Table, TableSize } from '@superset-ui/core/components';
import type { ColumnsType } from 'antd/es/table';
import type { UploadType } from './uploadDataModalTour';
```
This file imports ColumnsType from antd/es/table, which goes against the repo’s frontend convention to avoid direct Ant Design imports (and @superset-ui/core/components already re-exports ColumnsType via its Table barrel). Prefer importing the type from @superset-ui/core/components to keep imports consistent and reduce coupling to antd internals.
```ts
async function parseExcel(
  file: File,
  sheetName?: string,
): Promise<{ columns: string[]; data: Record<string, string>[] }> {
  const XLSX = (await import(/* webpackChunkName: "xlsx" */ 'xlsx')).default;
  const arrayBuffer = await file.arrayBuffer();
  const workbook = XLSX.read(arrayBuffer, { type: 'array' });
  const sheet = sheetName
    ? workbook.Sheets[sheetName]
    : workbook.Sheets[workbook.SheetNames[0]];
  if (!sheet) {
    return { columns: [], data: [] };
  }
  const jsonData = XLSX.utils.sheet_to_json<unknown[]>(sheet, {
    header: 1,
    defval: '',
  }) as unknown[][];
  if (jsonData.length === 0) {
    return { columns: [], data: [] };
  }
  const headerRow = jsonData[0] as (string | number)[];
  const columns = headerRow.map((c, i) => String(c ?? '').trim() || `col_${i}`);
  const dataRows = jsonData.slice(1, PREVIEW_ROW_LIMIT + 1) as (
    | string
    | number
  )[][];
  const data = dataRows.map(row => {
    const obj: Record<string, string> = {};
    columns.forEach((col, i) => {
      obj[col] = String(row[i] ?? '');
    });
    return obj;
  });
  return { columns, data };
}
```
parseExcel (and the dynamic xlsx import path) isn’t exercised by tests here, even though Excel preview is a key part of the feature. Add a unit test that mocks the xlsx module and verifies sheet selection + row/column extraction so regressions are caught.
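One low-cost way to cover this logic is to exercise the column/row mapping against the same array-of-arrays shape that `sheet_to_json` returns with `{ header: 1 }`; a sketch (standalone mirror of the mapping in `parseExcel`, under the assumption it is extracted into a testable helper; `mapSheet` is a hypothetical name) rather than the actual test file:

```typescript
// Same shape sheet_to_json returns with { header: 1 }: first row is headers.
type SheetRows = (string | number)[][];

// Mirrors the column/row mapping inside parseExcel.
function mapSheet(jsonData: SheetRows, rowLimit: number) {
  if (jsonData.length === 0) return { columns: [], data: [] };
  const headerRow = jsonData[0];
  // Blank or missing headers fall back to positional col_N names.
  const columns = headerRow.map((c, i) => String(c ?? '').trim() || `col_${i}`);
  const data = jsonData.slice(1, rowLimit + 1).map(row => {
    const obj: Record<string, string> = {};
    columns.forEach((col, i) => {
      obj[col] = String(row[i] ?? ''); // missing cells become ''
    });
    return obj;
  });
  return { columns, data };
}

const { columns, data } = mapSheet([['id', ''], [1, 'a'], [2]], 10);
console.log(columns); // ['id', 'col_1']
console.log(data); // [{ id: '1', col_1: 'a' }, { id: '2', col_1: '' }]
```

With the mapping isolated like this, the remaining test surface is just sheet selection, which can be covered by mocking the `xlsx` module as the comment suggests.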
```diff
 const onChangeFile = async (info: UploadChangeParam<any>) => {
-  setFileList([
+  const newFileList = [
     {
       ...info.file,
-      status: 'done',
+      status: 'done' as const,
     },
-  ]);
+  ];
+  setFileList(newFileList);
+  fileListRef.current = newFileList;
+  if (info.file.originFileObj) {
+    form.validateFields(['file']).catch(() => {});
+  }
   if (!previewUploadedFile) {
     return;
   }
   if (isFileTooLargeForPreview(info.file.originFileObj)) {
     setColumns([]);
     return;
   }
   await loadFileMetadata(info.file.originFileObj);
 };
```
onChangeFile awaits loadFileMetadata but there’s no protection against out-of-order responses if the user selects/removes another file quickly. A slow metadata request for a previous file can still call setColumns/setSheetNames later, potentially showing columns for the wrong file (and this is more noticeable now when large files skip metadata calls). Consider tracking the active file UID/request id (or aborting prior requests) and ignoring stale responses.
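A request-id guard is the lightest of the suggested fixes; a sketch with a hypothetical stand-in for `loadFileMetadata` (the real one calls the backend) showing how responses for superseded files are dropped:

```typescript
type Metadata = { columns: string[] };

// Hypothetical stand-in for the metadata fetch; resolves after `delay` ms.
const fetchMetadata = (file: string, delay: number): Promise<Metadata> =>
  new Promise(resolve =>
    setTimeout(() => resolve({ columns: [`${file}:col`] }), delay),
  );

let activeRequestId = 0;
let shownColumns: string[] = [];

async function onFileSelected(file: string, delay: number): Promise<void> {
  activeRequestId += 1;
  const requestId = activeRequestId; // snapshot the id for this request
  const meta = await fetchMetadata(file, delay);
  if (requestId !== activeRequestId) return; // a newer file was selected
  shownColumns = meta.columns;
}

async function demo(): Promise<string[]> {
  const slow = onFileSelected('first.csv', 50); // slow response, soon stale
  const fast = onFileSelected('second.csv', 10);
  await Promise.all([slow, fast]);
  return shownColumns; // the stale first.csv response was ignored
}

demo().then(cols => console.log(cols)); // logs ['second.csv:col']
```

An `AbortController` passed to the fetch would achieve the same outcome while also cancelling the network request itself.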
🎪 Showtime deployed environment on GHA for c752309 • Environment: http://52.12.205.9:8080 (admin/admin)
- theming.mdx: document brandAppName theme token (PR #37370) — controls app name in browser title/nav/emails, takes precedence over APP_NAME config
- cache.mdx: document SUPERSET_CACHE_WARMUP_USER config key (PR #38449) — controls the user account Selenium WebDriver authenticates as for thumbnail rendering and cache warmup; update selenium → Selenium capitalization
- security.mdx: document missing SQL Lab RBAC permissions (PR #36263) — can_estimate_query_cost and can_format_sql must be explicitly granted
- sql-templating.mdx: document Jinja support in calculated columns (PR #37791) with examples; add tip that "Format SQL" is Jinja-aware and dialect-specific (PRs #36277, #39393)
- creating-your-first-dashboard.mdx: document dashboard tab URLs (#38660), auto-refresh (#37459), "Last queried at" timestamp (#36934), and tab selection when saving charts to dashboards (#36332)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SUMMARY
Improve file uploader UI and UX
Added the following:
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
AFTER
preset-hackathon-after-1.mov
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION