Merged
@@ -11,7 +11,7 @@ version = "0.1.0"
description = "Local Data Designer unstructured seed reader plugin"
requires-python = ">=3.11"
dependencies = [
-"data-designer-engine>=0.5.1,<0.6",
+"data-designer-engine>=0.5.4,<0.6",
"pandas>=2,<3",
"pymupdf>=1.24.0",
"pymupdf4llm>=0.0.17",
11 changes: 5 additions & 6 deletions studio/backend/requirements/single-env/data-designer-deps.txt
@@ -1,21 +1,20 @@
# Data Designer runtime deps installed explicitly (single-env mode).
-# DuckDB 1.5 removed Relation.record_batch(); keep <1.5 until upstream ships the fix.
+# Synced with data-designer-engine==0.5.4 requirements.

medium

While the new comment is accurate, the old comment contained valuable context about the duckdb version constraint that is now lost. To improve maintainability and help future developers understand why this change was safe to make, please consider restoring some of that context. For example:

# Synced with data-designer-engine==0.5.4 requirements.
# duckdb constraint is now >=1.5.0 as upstream fixed the Relation.record_batch() removal.
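
To make the effect of the constraint change concrete, here is a minimal sketch that checks sample version strings against the old and new duckdb specifiers from this file. The helpers `parse_version` and `satisfies` are hypothetical, support only the `>=` and `<` operators used here, and are not part of pip or Data Designer.

```python
def parse_version(text: str, width: int = 3) -> tuple[int, ...]:
    """Turn "1.5" into (1, 5, 0) so versions compare as padded integer tuples."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a comma-separated spec using only >= and <."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if not candidate >= parse_version(clause[2:]):
                return False
        elif clause.startswith("<"):
            if not candidate < parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


OLD_SPEC = "<1.5,>=1.1.3"  # duckdb constraint before this change
NEW_SPEC = "<2,>=1.5.0"    # duckdb constraint after this change

# Sample version strings chosen for illustration only.
for version in ("1.4.4", "1.5.0"):
    print(version, satisfies(version, OLD_SPEC), satisfies(version, NEW_SPEC))
```

The two ranges do not overlap, so an environment pinned under the old constraint has to upgrade duckdb to satisfy the new one. Real tooling would more likely rely on `packaging.specifiers.SpecifierSet`, which handles the full version grammar.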

anyascii<1,>=0.3.3
-duckdb<1.5,>=1.1.3
+chardet<6,>=3.0.2
+duckdb<2,>=1.5.0
faker<21,>=20.1.0
fsspec<2026,>=2025.3.0
httpx<1,>=0.27.2
httpx-retries<1,>=0.4.2
json-repair<1,>=0.48.0
jsonpath-rust-bindings<2,>=1.0
jsonschema<5,>=4.0.0
lxml<7,>=6.0.2
marko<3,>=2.1.2
mcp<2,>=1.26.0
networkx<4,>=3.0
python-json-logger<4,>=3
ruff<1,>=0.14.10
scipy<2,>=1.11.0
sqlfluff<4,>=3.2.0
tiktoken<1,>=0.8.0
Comment on lines 17 to 20

P1: Keep unstructured parser deps in single-env requirements

Removing pymupdf, pymupdf4llm, and mammoth from this requirements file breaks PDF/DOCX extraction in single-env installs: the local unstructured plugin is installed with --no-deps (studio/install_python_stack.py step 11), so its pyproject.toml dependencies are not installed transitively. As a result, _extract_text_from_file (studio/backend/routes/data_recipe/seed.py) will hit ModuleNotFoundError for .pdf/.docx uploads and return extraction errors for supported file types.
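
One way to surface this failure mode early is a preflight check that reports which parser modules are absent from the environment. The sketch below is illustrative only: `missing_modules` is a hypothetical helper, and the import names in `PARSER_MODULES` are assumed from the package names mentioned above rather than taken from the plugin's code.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Assumed import names for the parser packages; adjust if the plugin imports differ.
PARSER_MODULES = ["pymupdf", "pymupdf4llm", "mammoth"]

if __name__ == "__main__":
    missing = missing_modules(PARSER_MODULES)
    if missing:
        print("Missing parser modules:", ", ".join(missing))
    else:
        print("All parser modules are importable.")
```

Running a check like this after a `--no-deps` install would flag the gap before a `.pdf` or `.docx` upload reaches the extraction code.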

pymupdf>=1.24.0
pymupdf4llm>=0.0.17
mammoth>=1.8.0
6 changes: 3 additions & 3 deletions studio/backend/requirements/single-env/data-designer.txt
@@ -1,5 +1,5 @@
# Install Data Designer in same env as Unsloth.
-data-designer==0.5.2
-data-designer-config==0.5.2
-data-designer-engine==0.5.2
+data-designer==0.5.4
+data-designer-config==0.5.4
+data-designer-engine==0.5.4
prompt-toolkit>=3,<4