@desmondcheongzx (Collaborator) commented Jul 30, 2025

Changes Made

Our current docs target Daft developers instead of users. Let's flip the script.

Here's the build: https://docs.daft.ai/en/desmond-restructure-docs/

@desmondcheongzx desmondcheongzx requested review from kevinzwang and ccmao1130 and removed request for ccmao1130 July 30, 2025 20:33
@github-actions github-actions bot added the docs label Jul 30, 2025
@desmondcheongzx desmondcheongzx marked this pull request as draft July 30, 2025 20:33
@greptile-apps greptile-apps bot (Contributor) left a comment

Greptile Summary

This PR implements a comprehensive documentation restructuring to shift focus from targeting Daft developers to targeting end users. The changes reorganize the entire documentation hierarchy with several key transformations:

Structural Reorganization:

  • Consolidates 'I/O' and 'Catalogs' sections into a unified 'Connectors' section, making it easier for users to find information about data sources
  • Introduces a new 'Modalities' section for working with different data types (images, text, JSON, URLs)
  • Creates a 'Running Custom Python Code' section for UDFs, GPU usage, and external APIs
  • Renames 'Advanced' topics to 'Optimization and Debugging', making the content more approachable
  • Moves API reference documentation to a dedicated 'api' directory

User-Focused Improvements:

  • Updates titles to be action-oriented (e.g., 'Apache Iceberg' → 'Reading from and Writing to Apache Iceberg')
  • Adds navigation.expand feature in MkDocs to automatically expand navigation sections
  • Implements comprehensive redirect mappings to maintain backward compatibility
  • Creates consolidated overview pages that combine related functionality

Content Integration:

  • Merges catalog functionality into the connectors documentation under a 'Daft Catalogs' section
  • Creates a new docs/connectors/index.md that provides function tables for all major data sources alongside catalog examples
  • Establishes placeholder files for new sections like docs/modalities/index.md and docs/optimization/index.md

The restructuring follows a logical user journey: quickstart → connectors → data modalities → custom code → scaling → optimization. This aligns with how users typically interact with Daft, starting from basic data connections and progressing to advanced optimization techniques.

Confidence score: 2/5

  • This PR contains several concerning issues that could break the user experience and leave important functionality undocumented
  • Multiple critical documentation files have been completely emptied (GPU, UDFs, Images, Text, JSON, URLs modalities) without replacement content, which will result in broken navigation links and missing guidance for users
  • The S3 Tables connector documentation has code examples with missing imports and inconsistent variable names that could confuse users
  • Files requiring immediate attention: docs/custom-code/gpu.md, docs/custom-code/udfs.md, docs/modalities/images.md, docs/modalities/text.md, docs/modalities/json.md, docs/modalities/urls.md, docs/connectors/s3tables.md

32 files reviewed, 4 comments


codecov bot commented Jul 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.20%. Comparing base (57c86b1) to head (1a0ed43).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4875      +/-   ##
==========================================
+ Coverage   78.95%   79.20%   +0.25%     
==========================================
  Files         893      893              
  Lines      124879   124421     -458     
==========================================
- Hits        98599    98553      -46     
+ Misses      26280    25868     -412     
| Files with missing lines | Coverage Δ |
|---|---|
| daft/dataframe/dataframe.py | 86.64% <ø> (ø) |

... and 11 files with indirect coverage changes
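
The percentages in the coverage diff above follow from the hit and line counts. Here is a minimal sketch, assuming coverage is hits divided by lines and that the displayed figure is truncated (not rounded) to two decimals. That assumption is inferred from these numbers, since truncation reproduces 78.95% and 79.20%; it is not documented Codecov behavior. The helper name `coverage_pct` is hypothetical.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (assumed) to two decimal places."""
    return math.floor(hits / lines * 10_000) / 100


# Counts copied from the coverage diff: (hits, misses, lines)
base_hits, base_misses, base_lines = 98_599, 26_280, 124_879  # main
head_hits, head_misses, head_lines = 98_553, 25_868, 124_421  # PR #4875

# Sanity check: hits and misses account for every line in each column.
assert base_hits + base_misses == base_lines
assert head_hits + head_misses == head_lines

base = coverage_pct(base_hits, base_lines)
head = coverage_pct(head_hits, head_lines)
delta = round(head - base, 2)

print(f"base={base:.2f}% head={head:.2f}% delta={delta:+.2f}%")
```

Coverage rises even though hits drop by 46, because the total line count drops by 458 and most of the removed lines were previously uncovered.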


@desmondcheongzx desmondcheongzx marked this pull request as ready for review July 31, 2025 08:44
@greptile-apps greptile-apps bot (Contributor) left a comment

Greptile Summary

This PR implements a comprehensive documentation restructuring to transition from developer-focused to user-focused content. The changes reorganize the entire documentation hierarchy, moving content from nested directories (like resources/, migration/) to more accessible top-level locations. Key structural changes include:

  • New navigation structure: The main navigation now prioritizes user workflows with sections like "Data Connectors", "Running Custom Python Code", and "Modalities" (for handling different data types like text, images, JSON)
  • Content consolidation: Benchmark visualizations moved from docs/resources/benchmarks/ to docs/benchmarks/, telemetry docs moved to root level, and Spark Connect API documentation relocated to docs/api/
  • User-focused content: New comprehensive guides for UDFs, image processing, JSON handling, and URL/file operations with practical examples and real-world use cases
  • Marketing integration: Added performance claims and links to blog posts highlighting Daft's competitive advantages
  • Placeholder structure: Created stub files with "User guide coming soon!" messages for sections still in development (custom connectors, GPU documentation, text processing)

The restructuring maintains backward compatibility through redirect mappings in mkdocs.yml and includes quality improvements like fixing broken code examples, correcting grammar issues, and adding missing import statements. The new structure emphasizes Daft's multimodal data processing capabilities and provides clear pathways for users to understand and implement various data workflows.

Confidence score: 4/5

  • This PR is generally safe to merge with mostly structural reorganization and content improvements, though some documentation inconsistencies need attention
  • The score reflects minor issues with placeholder content linking to incomplete guides, some inconsistent code examples, and undefined functions in documentation samples
  • Files needing attention: docs/custom-code/udfs.md (undefined functions and incorrect column references), docs/modalities/json.md (incomplete sections and typos), docs/modalities/images.md (inconsistent syntax examples)

25 files reviewed, 10 comments


@desmondcheongzx desmondcheongzx enabled auto-merge (squash) July 31, 2025 19:49
@desmondcheongzx (Collaborator, Author) commented

The content is not completely done, but the structure is there and already much better. So just going to blast ahead and merge it.

@desmondcheongzx desmondcheongzx merged commit 7987455 into main Jul 31, 2025
47 of 48 checks passed
@desmondcheongzx desmondcheongzx deleted the desmond/restructure-docs branch July 31, 2025 21:54
rohitkulshreshtha pushed a commit that referenced this pull request Aug 22, 2025
## Summary
- Fixed broken UDF documentation link in `docs/api/udf.md`
- Changed from `../core_concepts.md#user-defined-functions-udf` to
`../custom-code/udfs.md`

## Context
The `core_concepts.md` → `index.md` redirect added in #4875 causes
anchor links to be lost. Found 17 other anchor links that need similar
fixes:
- `#expressions`
- `#datatypes`
- `#dataframe`
- `#aggregations-and-grouping`
- `#schemas-and-types`
- `#reading-data`
- `#writing-data`
- `#sql`
- `#window-functions`
- `#multimodal-data`
- And others

This PR addresses only the verified UDF link fix. Other fixes to follow.
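
The remaining anchor links mentioned above could be located mechanically rather than by hand. Here is a minimal sketch, assuming the Markdown sources live under `docs/` and use inline `[text](path#anchor)` links. The helper names `find_anchor_links` and `scan_docs` are hypothetical and are not part of the Daft repository or MkDocs.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [text](../core_concepts.md#expressions)
# and captures the link destination.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_anchor_links(markdown: str, target: str = "core_concepts.md") -> list[str]:
    """Return link destinations that point at `target` and carry a #fragment."""
    hits = []
    for dest in LINK_RE.findall(markdown):
        path, sep, fragment = dest.partition("#")
        # Only links with a non-empty fragment lose information when the
        # page is redirected; plain page links still resolve.
        if sep and fragment and path.rsplit("/", 1)[-1] == target:
            hits.append(dest)
    return hits


def scan_docs(root: str = "docs") -> dict[str, list[str]]:
    """Map each Markdown file under `root` to its anchor links into the target page."""
    results = {}
    for md_file in sorted(Path(root).rglob("*.md")):
        hits = find_anchor_links(md_file.read_text(encoding="utf-8"))
        if hits:
            results[str(md_file)] = hits
    return results


if __name__ == "__main__":
    for file_name, links in scan_docs().items():
        for link in links:
            print(f"{file_name}: {link}")
```

Each reported link still needs a manual decision about where the section now lives, as with the `../custom-code/udfs.md` replacement in this commit.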
desmondcheongzx pushed a commit that referenced this pull request Aug 28, 2025
## Summary
- Remove obsolete `docs/core_concepts.md` file
- Remove obsolete `docs/migration/dask_migration.md` file  
- Remove broken anchor link references to core_concepts.md throughout
API documentation
- Update window functions reference to point to working tutorial link

## Context
Fixes broken links caused by #4875. The docs restructuring in #4875 made
core_concepts.md and dask_migration.md obsolete, but left many anchor
link references that no longer work. This PR removes the source files
and cleans up all the broken references.

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Jay-ju pushed a commit to Jay-ju/Daft that referenced this pull request Sep 1, 2025
…tual-Inc#5062)

venkateshdb pushed a commit to venkateshdb/Daft that referenced this pull request Sep 6, 2025
…tual-Inc#5062)
