@pitrou
Member

@pitrou pitrou commented Mar 23, 2022

The numpydoc-validation routine in Archery would skip over many Cython-generated methods and properties.

This PR also fixes and enhances the docstrings that would newly raise validation errors.
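For context, the gap can be illustrated with the standard library alone. C-level methods (assumed here to be roughly what Cython-generated methods look like to Python) are not plain functions, so a filter based on `inspect.isfunction()` drops them. Extension-type properties are getset descriptors rather than `property` objects. The sketch below is not Archery's actual code: `iter_documentable_members` is a hypothetical helper, and the builtin `int` stands in for a Cython extension type.

```python
import inspect


def iter_documentable_members(cls):
    """Yield (name, obj) for the public methods and properties of ``cls``.

    Hypothetical helper for illustration; not Archery's implementation.
    """
    for name, obj in vars(cls).items():
        if name.startswith("_"):
            continue  # skip private and dunder attributes
        # inspect.isroutine() also accepts C-level method descriptors,
        # which inspect.isfunction() rejects.
        if inspect.isroutine(obj):
            yield name, obj
        # Python properties, plus the getset/member descriptors that
        # extension types typically use to expose properties.
        elif (isinstance(obj, property)
              or inspect.isgetsetdescriptor(obj)
              or inspect.ismemberdescriptor(obj)):
            yield name, obj


# int.bit_length is a C method descriptor, so isfunction() misses it.
print(inspect.isfunction(int.bit_length))  # False
print(sorted(dict(iter_documentable_members(int))))
```

A validator built on `inspect.isfunction()` alone would never see `bit_length` or `real` here, which mirrors the kind of silent skipping described above.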

@pitrou pitrou marked this pull request as ready for review March 23, 2022 18:40
@github-actions

@pitrou pitrou marked this pull request as draft March 23, 2022 19:36
@pitrou pitrou force-pushed the ARROW-15321-numpydoc-cython branch from 850daec to e5feb59 March 23, 2022 19:48
@pitrou
Member Author

pitrou commented Mar 23, 2022

Hmm, I need to fix more docstrings before this is ready. My local testing was unfortunately incomplete.

@pitrou pitrou force-pushed the ARROW-15321-numpydoc-cython branch from e5feb59 to 05919a8 March 24, 2022 10:13
@pitrou
Member Author

pitrou commented Mar 24, 2022

@github-actions crossbow submit -g python

@github-actions

Revision: 05919a88a00341e8b7c02cb6b0b936b470d1db54

Submitted crossbow builds: ursacomputing/crossbow @ actions-1790

Task Status
test-conda-python-3.10 Github Actions
test-conda-python-3.7 Github Actions
test-conda-python-3.7-hdfs-2.9.2 Github Actions
test-conda-python-3.7-hdfs-3.2.1 Github Actions
test-conda-python-3.7-kartothek-latest Github Actions
test-conda-python-3.7-kartothek-master Github Actions
test-conda-python-3.7-pandas-0.24 Github Actions
test-conda-python-3.7-pandas-latest Github Actions
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.7-turbodbc-latest Github Actions
test-conda-python-3.7-turbodbc-master Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-hypothesis Github Actions
test-conda-python-3.8-pandas-latest Github Actions
test-conda-python-3.8-pandas-nightly Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-dask-latest Github Actions
test-conda-python-3.9-dask-master Github Actions
test-conda-python-3.9-pandas-master Github Actions
test-conda-python-3.9-spark-master Github Actions
test-debian-11-python-3 Azure
test-fedora-35-python-3 Azure
test-ubuntu-20.04-python-3 Azure

@pitrou pitrou marked this pull request as ready for review March 24, 2022 11:52
Member

Nice catch!

Member

Nice!

@kszucs
Member

kszucs commented Mar 24, 2022

@pitrou I don't quite understand this change: inspect_signature already supports Cython-generated embedded signatures, but this PR seems to disable that. On the other hand, it now properly retrieves the module of the method, so I assume _get_module does the actual fix?
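Resolving the defining module presumably matters because a validator only checks objects that belong to the package being documented. As a hedged sketch (the helper name `get_defining_module` is hypothetical, and this is not the PR's `_get_module`), a best-effort lookup can consult a property's getter, then the `__objclass__` attribute that C-level descriptors carry, and finally `__module__`:

```python
import inspect


def get_defining_module(obj):
    """Best-effort lookup of the module that defines ``obj``.

    Hypothetical helper for illustration only.
    """
    # Python properties have no useful __module__; use the getter instead.
    if isinstance(obj, property) and obj.fget is not None:
        return obj.fget.__module__
    # C-level descriptors (method_descriptor, getset_descriptor, ...)
    # expose the class that owns them through __objclass__.
    owner = getattr(obj, "__objclass__", None)
    if inspect.isclass(owner):
        return owner.__module__
    # Plain functions and builtin functions carry __module__ directly.
    return getattr(obj, "__module__", None)


print(get_defining_module(str.upper))  # builtins
```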

@pitrou
Member Author

pitrou commented Mar 24, 2022

@kszucs As answered in the comments: the problem is not the signature; it's that the Cython-generated docstring doesn't have parameter descriptions, so it would fail the numpydoc rule PR01.
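To make the PR01 failure concrete, here is a simplified stand-in for that rule (not numpydoc's implementation; `documented_params`, `undocumented_params` and `StreamSketch` are hypothetical). It assumes the Cython-generated docstring consists of an embedded signature line plus a summary with no numpydoc `Parameters` section, so every parameter in the signature is reported as undocumented:

```python
import inspect


def documented_params(doc):
    """Return the names listed in a numpydoc 'Parameters' section."""
    lines = doc.splitlines()
    names = set()
    if "Parameters" not in lines:
        return names
    start = lines.index("Parameters")
    for i in range(start + 2, len(lines)):  # skip the title and its underline
        line = lines[i]
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if line.strip() and next_line and set(next_line) == {"-"}:
            break  # reached the next section title (e.g. "Returns")
        if line and not line[0].isspace():
            # "name : type" or bare "name"; descriptions are indented
            names.add(line.split(":")[0].strip())
    return names


def undocumented_params(func):
    """Simplified stand-in for numpydoc's PR01 check."""
    documented = documented_params(inspect.getdoc(func) or "")
    return [name for name in inspect.signature(func).parameters
            if name not in ("self", "cls") and name not in documented]


class StreamSketch:
    def read(self, nbytes=None):
        """read(self, nbytes=None)

        Read from the stream.
        """

    def read_documented(self, nbytes=None):
        """Read from the stream.

        Parameters
        ----------
        nbytes : int, optional
            Number of bytes to read.

        Returns
        -------
        bytes
        """


print(undocumented_params(StreamSketch.read))             # ['nbytes']
print(undocumented_params(StreamSketch.read_documented))  # []
```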

Member

@kszucs kszucs left a comment

The numpydoc validation changes LGTM, but I haven't looked at the new docstrings.

@pitrou
Member Author

pitrou commented Mar 28, 2022

@amol- @jorisvandenbossche Could you give this a look?

Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks for these nice docstring updates! It all looks good. I added some comments, but nothing that should block this PR from getting merged quickly.

Member

Not that it needs to be changed for this PR, but just a general note for the future: I think putting only "Expression" (i.e. the type) on this line is also valid for numpydoc, and personally I don't think repeating the name ("is_valid: ") adds any value.

Member

Or did numpydoc complain about this? (I see you also changed this in some existing docstrings.)

Member Author

No, it seems to pass actually. Given that other docstrings used this convention, I just thought it was preferred.
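For reference, the numpydoc format makes the name in a `Returns` entry optional (only the type is required), so both variants discussed above should be accepted. The hypothetical `returns_type` helper below, a deliberately naive parser, shows that both forms carry the same type information:

```python
import inspect


def returns_type(doc):
    """Return the type of the first entry in a numpydoc 'Returns' section.

    Accepts both the "name : type" and the bare "type" forms.
    Hypothetical helper for illustration only.
    """
    lines = inspect.cleandoc(doc).splitlines()
    entry = lines[lines.index("Returns") + 2]  # skip the title's underline
    return entry.split(" : ", 1)[-1].strip()


NAMED = """Whether each value is valid.

Returns
-------
is_valid : Expression
"""

BARE = """Whether each value is valid.

Returns
-------
Expression
"""

print(returns_type(NAMED), returns_type(BARE))  # Expression Expression
```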

Member

For a separate JIRA: it might be nice to actually share this part of the docstring in both places (so users don't have to check another object's docstring to see the help), but for now it's good to deduplicate this and have a single up-to-date version.
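One possible way to share a docstring fragment between two callables, sketched here for plain Python functions only, is a small decorator that appends a common block. The helper `append_doc`, the function names and the `use_threads` parameter are all hypothetical:

```python
import inspect

# Hypothetical shared fragment; the parameter is illustrative only.
_SHARED_PARAMS = """\
Parameters
----------
use_threads : bool, default True
    Whether to use multiple threads."""


def append_doc(fragment):
    """Decorator appending a shared docstring fragment (hypothetical)."""
    def decorator(func):
        # Normalize the function's own docstring, then append the fragment.
        own = inspect.cleandoc(func.__doc__ or "")
        func.__doc__ = own + "\n\n" + fragment
        return func
    return decorator


@append_doc(_SHARED_PARAMS)
def read_something(use_threads=True):
    """Read something."""


@append_doc(_SHARED_PARAMS)
def scan_something(use_threads=True):
    """Scan something."""


print(read_something.__doc__)
```

With this pattern the fragment exists in one place, while each callable's help text still shows the full description.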

Member

I also asked about this here: #12560 (comment) (I'll put this on Jira)

Comment on lines +1286 to +1287
Member

Do you remember why this was needed? (which error did it give?)

Aside: we should probably try to instruct Sphinx/autodoc/numpydoc not to include __init__ functions in the reference docs, as I don't think we have any case where this adds value compared to the class docstring. See e.g. https://arrow.apache.org/docs/dev/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow.parquet.ParquetDataset.__init__

Member Author

The problem is that numpydoc tries to match the parameters documented in the BufferReader docstring against its __init__ signature. To be honest, I'm not sure why some classes define __cinit__ rather than __init__; there doesn't seem to be a consistent convention across the codebase.
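A rough pure-Python analogy of this mismatch follows. It is an assumption that compiled Cython types introspect similarly: a class that only overrides `__new__` stands in for one that only defines `__cinit__`. Its introspected signature is `(*args, **kwargs)`, which cannot match the documented `obj` parameter, whereas an explicit `__init__` exposes it. Both class names are hypothetical:

```python
import inspect


class CinitOnlySketch:
    """Stand-in for a class that only defines __cinit__ (hypothetical).

    Parameters
    ----------
    obj : bytes-like
        Data to read from.
    """

    # Rough Python analogue of a Cython __cinit__: __new__ with *args/**kwargs.
    def __new__(cls, *args, **kwargs):
        return super().__new__(cls)


class WithInitSketch:
    """Same docstring, but with an explicit __init__ (hypothetical).

    Parameters
    ----------
    obj : bytes-like
        Data to read from.
    """

    def __init__(self, obj):
        self.obj = obj


# The parameter names a signature-based validator would see for each class.
print(list(inspect.signature(CinitOnlySketch).parameters))  # ['args', 'kwargs']
print(list(inspect.signature(WithInitSketch).parameters))   # ['obj']
```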

@pitrou pitrou force-pushed the ARROW-15321-numpydoc-cython branch from 30b1159 to dd5c342 March 28, 2022 15:11
@pitrou pitrou closed this in 6ab947b Mar 28, 2022
@pitrou pitrou deleted the ARROW-15321-numpydoc-cython branch March 28, 2022 16:56
@ursabot

ursabot commented Mar 28, 2022

Benchmark runs are scheduled for baseline = 3eb5673 and contender = 6ab947b. 6ab947b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.17% ⬆️0.0%] test-mac-arm
[Failed ⬇️1.07% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️3.11% ⬆️13.36%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

5 participants