docs: add docstrings to I/O & DataFrame methods (issue Eventual-Inc#4124) #4854
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #4854 +/- ##
==========================================
- Coverage 79.39% 79.38% -0.01%
==========================================
Files 896 896
Lines 125771 125771
==========================================
- Hits 99855 99845 -10
- Misses 25916 25926 +10
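The coverage percentages above are Hits divided by Lines. The following minimal Python sketch reproduces them. It assumes Codecov's default display of two decimal places with rounding down (its `round: down` setting), and the helper name `codecov_percent` is hypothetical:

```python
def codecov_percent(hits: int, lines: int, precision: int = 2) -> float:
    """Return hits/lines as a percentage, rounded *down* to `precision` decimals.

    Assumption: this mirrors Codecov's default `precision: 2`, `round: down`.
    """
    scale = 10 ** precision
    # Integer floor division keeps the truncation exact (no float error).
    truncated = (hits * 100 * scale) // lines
    return truncated / scale


LINES = 125771
base = codecov_percent(99855, LINES)  # main
head = codecov_percent(99845, LINES)  # #4854

# Misses are the uncovered remainder of the same line count.
assert LINES - 99855 == 25916 and LINES - 99845 == 25926

print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
# prints: 79.39% -> 79.38% (-0.01%)
```

With ordinary round-half-up, 99845 / 125771 (about 79.386%) would also display as 79.39%. Under the rounding-down assumption, the visible -0.01% is therefore a truncated drop of roughly 0.008 points, which corresponds to the 10 lines that moved from hits to misses.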
Branch updated from bb87ab9 to 5c69aa1.
Greptile Summary
This PR adds comprehensive docstrings to DataFrame methods and I/O conversion functions to address issue #4124, which identified missing API documentation components. The changes systematically enhance documentation across two key files:
daft/convert.py: Adds missing docstring examples to the from_ray_dataset() and from_dask_dataframe() functions, and gives from_dask_dataframe() its missing Returns section. These additions provide concrete usage patterns for users migrating from other data processing frameworks like Ray and Dask.
daft/dataframe/dataframe.py: Adds extensive docstrings to DataFrame methods, including:
- explain(), num_partitions(), schema(), __iter__(), iter_rows(), to_arrow_iter(), and iter_partitions()
- the write methods: write_parquet(), write_csv(), write_json(), write_iceberg(), write_deltalake(), and write_lance()
- __getitem__(), describe(), summarize(), distinct(), filter(), count_rows(), concat(), and melt()
- the aggregation methods: sum(), mean(), min(), max(), any_value(), count(), agg_list(), agg_set(), and agg_concat()
- the union methods, collect(), and __len__()
- the conversion methods: to_pandas(), to_arrow(), to_pydict(), to_pylist(), to_torch_map_dataset(), to_torch_iter_dataset(), to_ray_dataset(), and to_dask_dataframe()
All docstrings follow a consistent format with Args, Returns, and Examples sections. The examples demonstrate realistic usage and show the expected output. This matters for a data processing library, where users need to understand both the method signatures and the resulting data transformations. Together, these changes substantially improve the completeness of the API documentation and make the DataFrame API more accessible to contributors and users.
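To illustrate that layout, here is a minimal sketch of a docstring with Args, Returns, and Examples sections, plus a small helper that runs the Examples as tests using Python's standard `doctest` module. Both `count_even` and `check_examples` are hypothetical and are not part of Daft. Daft's own doctest setup may differ.

```python
import doctest


def count_even(values: list[int]) -> int:
    """Count the even integers in a list.

    Args:
        values: The integers to inspect.

    Returns:
        int: How many elements of ``values`` are divisible by 2.

    Examples:
        >>> count_even([1, 2, 3, 4])
        2
        >>> count_even([])
        0
    """
    return sum(1 for v in values if v % 2 == 0)


def check_examples(func) -> tuple[int, int]:
    """Run the doctest examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    # globs makes the function's own name resolvable inside its examples.
    for test in finder.find(func, globs={func.__name__: func}):
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted


print(check_examples(count_even))  # prints (0, 2)
```

If the expected `2` in the docstring were edited to `3`, `check_examples` would print the mismatch and return `(1, 2)`. Keeping the Examples output exact is what lets tooling verify it.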
Confidence score: 4/5
- This PR is generally safe to merge as it only adds documentation without changing functionality
- Score reduced due to several formatting issues in examples, incomplete credentials in code samples, and some malformed table formatting that could confuse users
- Files needing attention: daft/dataframe/dataframe.py (formatting corrections in examples)
2 files reviewed, 5 comments
Build looks good; just a few small formatting edits!
@desmondcheongzx can you quickly review the docstring details?
Branch updated from 3c0669a to 5c69aa1.
This is awesome, thanks for filling in the documentation gaps @TheOphige!
@desmondcheongzx
Changes Made
Fixes doctest formatter errors introduced in #4854.
Related Issues
#4854
Checklist
- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to `docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)
---------
Co-authored-by: Colin Ho <[email protected]>
Changes Made
Added docstrings to all public methods in the DataFrame class to improve the API documentation. Each docstring includes:
These updates aim to make it easier for contributors and users to understand and use the DataFrame API effectively.
Related Issues
Closes #4124
Checklist
DataFrame methods)
docs/mkdocs.yml navigation