TheOphige
Contributor

Changes Made

Added docstrings to all public DataFrame methods to improve the API documentation. Each docstring includes:

  • Descriptions of method behavior
  • Parameter explanations and types
  • Return value details

These updates aim to make it easier for contributors and users to understand and use the DataFrame API effectively.

Related Issues

Closes #4124

Checklist

  • Documented in API Docs (docstrings added to DataFrame methods)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly (tag @ccmao1130 for docs review) @ccmao1130

@github-actions github-actions bot added the docs label Jul 25, 2025

codecov bot commented Jul 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.38%. Comparing base (10d1e34) to head (a382716).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4854      +/-   ##
==========================================
- Coverage   79.39%   79.38%   -0.01%     
==========================================
  Files         896      896              
  Lines      125771   125771              
==========================================
- Hits        99855    99845      -10     
- Misses      25916    25926      +10     
Files with missing lines Coverage Δ
daft/convert.py 100.00% <ø> (ø)
daft/dataframe/dataframe.py 86.79% <ø> (ø)

... and 7 files with indirect coverage changes

@TheOphige TheOphige force-pushed the docs/fix-missing-io-dataframe-docs-4124 branch from bb87ab9 to 5c69aa1 Compare July 25, 2025 20:15
@TheOphige TheOphige marked this pull request as ready for review July 31, 2025 06:42
Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR adds comprehensive docstrings to DataFrame methods and I/O conversion functions to address issue #4124, which identified missing API documentation components. The changes systematically enhance documentation across two key files:

daft/convert.py: Adds missing docstring examples to from_ray_dataset() and from_dask_dataframe() functions. The from_dask_dataframe() function also receives a missing Returns section. These additions provide concrete usage patterns for users migrating from other data processing frameworks like Ray and Dask.

daft/dataframe/dataframe.py: Adds extensive docstrings to DataFrame methods including explain(), num_partitions(), schema(), __iter__(), iter_rows(), to_arrow_iter(), iter_partitions(), various write methods (write_parquet(), write_csv(), write_json(), write_iceberg(), write_deltalake(), write_lance()), __getitem__(), describe(), summarize(), distinct(), filter(), count_rows(), concat(), melt(), aggregation methods (sum(), mean(), min(), max(), any_value(), count(), agg_list(), agg_set(), agg_concat()), union methods, collect(), __len__(), and conversion methods (to_pandas(), to_arrow(), to_pydict(), to_pylist(), to_torch_map_dataset(), to_torch_iter_dataset(), to_ray_dataset(), to_dask_dataframe()).

All docstrings follow the standard format with Args, Returns, and Examples sections. The examples demonstrate realistic usage scenarios and show expected output formats, which is essential for a data processing library where users need to understand both method signatures and data transformations. This significantly improves the API documentation completeness and makes the DataFrame API more accessible to contributors and users.
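
As a rough illustration of the Args / Returns / Examples layout described above, here is a minimal sketch. `TinyFrame` and its methods are hypothetical stand-ins invented for this example; they are not Daft's `DataFrame`, and the wording is not taken from the PR's docstrings.

```python
class TinyFrame:
    """Hypothetical column-oriented frame, used only to show docstring layout."""

    def __init__(self, data: dict[str, list]) -> None:
        # Assumption: every column list has the same length.
        self._data = data

    def count_rows(self) -> int:
        """Return the number of rows in the frame.

        Returns:
            int: The length of the first column, or 0 if there are no columns.

        Examples:
            >>> TinyFrame({"x": [1, 2, 3]}).count_rows()
            3
        """
        columns = list(self._data.values())
        return len(columns[0]) if columns else 0

    def filter(self, column: str, minimum: int) -> "TinyFrame":
        """Keep only the rows whose value in ``column`` is at least ``minimum``.

        Args:
            column: Name of the column to compare.
            minimum: Smallest value that a row may have and still be kept.

        Returns:
            TinyFrame: A new frame containing the matching rows.

        Examples:
            >>> TinyFrame({"x": [1, 2, 3]}).filter("x", 2).count_rows()
            2
        """
        # Collect the positions of the rows that pass the comparison.
        keep = [i for i, value in enumerate(self._data[column]) if value >= minimum]
        return TinyFrame(
            {name: [values[i] for i in keep] for name, values in self._data.items()}
        )
```

Each example is written in the interactive-prompt style so that a reader sees both the call and the output it is expected to produce.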

Confidence score: 4/5

  • This PR is generally safe to merge as it only adds documentation without changing functionality
  • Score reduced due to several formatting issues in examples, incomplete credentials in code samples, and some malformed table formatting that could confuse users
  • Files needing attention: daft/dataframe/dataframe.py for formatting corrections in examples
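
Formatting slips in docstring examples, like those mentioned above, can often be caught mechanically. The sketch below uses Python's standard `doctest` module to compare an example's shown output with what the code actually prints. The two docstring snippets and the `count_failures` helper are made up for illustration and say nothing about how this repository's checks are configured.

```python
import doctest

# Expected output matches Python's dict repr exactly.
WELL_FORMED = """
>>> {"x": [1, 2]}
{'x': [1, 2]}
"""

# Expected output is missing a space, so it no longer matches the real repr.
MALFORMED = """
>>> {"x": [1, 2]}
{'x': [1,2]}
"""


def count_failures(docstring: str) -> int:
    """Run the examples in ``docstring`` and return how many of them fail."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the textual failure report; only the count is of interest here.
    results = runner.run(test, out=lambda text: None)
    return results.failed
```

With these inputs, `count_failures(WELL_FORMED)` reports no failures and `count_failures(MALFORMED)` reports one, because `doctest` compares the expected text against the actual output character for character by default.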

2 files reviewed, 5 comments

Contributor

@ccmao1130 ccmao1130 left a comment

Build looks good, just have a few small formatting edits!

@desmondcheongzx can you quickly review the docstring details?

@TheOphige TheOphige force-pushed the docs/fix-missing-io-dataframe-docs-4124 branch from 3c0669a to 5c69aa1 Compare August 2, 2025 06:54
@TheOphige TheOphige requested a review from ccmao1130 August 2, 2025 07:02
Collaborator

@desmondcheongzx desmondcheongzx left a comment

This is awesome, thanks for filling in the documentation gaps @TheOphige!

@desmondcheongzx desmondcheongzx enabled auto-merge (squash) August 5, 2025 19:54
@TheOphige
Contributor Author

@desmondcheongzx
Great work!

@desmondcheongzx desmondcheongzx merged commit 4ebe15e into Eventual-Inc:main Aug 5, 2025
43 of 44 checks passed
@rchowell rchowell mentioned this pull request Aug 5, 2025
4 tasks
rchowell added a commit that referenced this pull request Aug 6, 2025
## Changes Made

Fixes doctest formatter errors introduced in #4854

## Related Issues

#4854 

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)

---------

Co-authored-by: Colin Ho <[email protected]>
Development

Successfully merging this pull request may close these issues.

Missing docstring items for I/O & DataFrame pages of API Docs
3 participants