-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods #12698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
850daec to
e5feb59
Compare
|
Hmm, I need to fix more docstrings before this is ready. My local testing was unfortunately incomplete. |
e5feb59 to
05919a8
Compare
|
@github-actions crossbow submit -g python |
|
Revision: 05919a88a00341e8b7c02cb6b0b936b470d1db54 Submitted crossbow builds: ursacomputing/crossbow @ actions-1790 |
.github/workflows/docs_light.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
dev/archery/archery/cli.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
|
@pitrou I don't quite understand this change: |
|
@kszucs As answered in the comments: the problem is not the signature, it's that the Cython-generated docstring doesn't have parameter description, so this would fail the numpydoc rule PR01. |
kszucs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The numpydoc validation changes LGTM, but haven't looked at the new docstrings.
|
@amol- @jorisvandenbossche Could you give this a look? |
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this nice docstring updates! Looks all good. I added some comments, but nothing that should block this PR to get merged quickly.
python/pyarrow/_compute.pyx
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that it needs to be changed for this PR, but just a general future note: I think putting only "Expression" (i.e. the type) on this line is valid as well for numpydoc, and personally I find the repeating of the name "is_valid: " not giving any added value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or did numpydoc complain about this? (seeing you changes this also in some existing docstrings)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it seems to pass actually. Given that other docstrings used this convention, I just thought it was preferred.
python/pyarrow/_dataset.pyx
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a separate JIRA: it might be nice to actually share this part of the docstring in both places (so users don't have to check another object's docstring to see the help), but for now it's good to deduplicate this and have a single up-to-date version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I opened https://issues.apache.org/jira/browse/ARROW-16063
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also asked about this here: #12560 (comment) (I'll put this on Jira)
python/pyarrow/io.pxi
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you remember why this was needed? (which error did it give?)
Aside, we should probably try to instruct sphinx/autodoc/numpydoc to not include __init__ functions in the reference docs, as I don't think we have any case where this adds value (compared to the class docstring). See eg https://arrow.apache.org/docs/dev/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow.parquet.ParquetDataset.__init__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that numpydoc tries to match the parameters documented in the BufferReader docstring with its __init__ signature. To be honest, I'm not sure why some classes prefer to define __cinit__ rather than __init__, there doesn't seem to be a consistent convention across the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened https://issues.apache.org/jira/browse/ARROW-16123 for this
…thods The numpydoc-validation routine in Archery would skip over many Cython-generated methods and properties.
30b1159 to
dd5c342
Compare
|
Benchmark runs are scheduled for baseline = 3eb5673 and contender = 6ab947b. 6ab947b is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
The numpydoc-validation routine in Archery would skip over many Cython-generated methods and properties.
This PR also fixes and enhances the docstrings that would newly raise validation errors.