Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it easier to create python stub files #3268

Open
wants to merge 3 commits into
base: dev
Choose a base branch
from

Conversation

chadrik
Copy link
Contributor

@chadrik chadrik commented Sep 3, 2024

This PR makes it easier to create python stub files, which are used for IDE completion and static type analysis. The direct benefactors of this change are a very niche group, but we're a group that support the community by providing tools that make developing with OpenUSD a more productive and pleasurable experience.

I'm using these changes to create stubs which are published to pypi here. You can peruse the stub files here. For the past year or two I've had to maintain my own fork of OpenUSD to make this possible.

In order to sell you on this change, I'm going to demonstrate that the function signatures in the stub files that I'm creating are superior to those included with docstrings that ship with OpenUSD, and I'd like to plant the idea that this PR is the first step towards an eventual integration of stub generation into the native OpenUSD build process.

Background

As someone who wants very accurate python stubs, I've faced a handful of challenges:

  • Boost does not have native support for generating stubs. In 2019 I created a ticket to add stub generation support to Boost, and while it's gotten some +1's there is still zero movement. There seems to be little hope of side-stepping this issue unless OpenUSD plans to move to pybind11, which apparently does have stubgen support.
  • Using the C++ Doxygen documentation alone does not provide enough information to generate highly accurate stubs. The actual number of python overloads often differs from the number of C++ overloads. A cause of this and numerous other hard-to-solve idiosyncracies is that there are many manual adjustments made in the boost wrappers within this project.
  • Boost does have the ability to generate docstrings with python function signatures, but they are incomplete: many types are simply refered to as object. However, it does give us the actual number of python overloads, and is reliable about certain result types like list and tuple, and this extra information can help fill the gap between live inspection of python objects and the rich information in the C++ docs.

Comparison between python signatures: yours (native) vs mine (cg-stubs)

Overview

As part of its build process, OpenUSD generates docstrings in side-car python files named __DOC.py, which are loaded at import time. In order to make use of these docstrings within your IDE, it must be configured to be able to import the OpenUSD python modules to inspect them.

By comparison, pyi stub files are a lightweight approach to load not only function and class documentation into your IDE but also type information, which is informative, and is used to drive code completion. Additionally, stub files can be used by static type analysis tools like mypy to validate your code. Providing access to stub files within your IDE is typically as simple as running pip install types-usd within your project's virtualenv.

Quality

This is the docstring that OpenUSD generates for the method Usd.ClipsAPI.GetClipAssetPaths:

GetClipAssetPaths(assetPaths, clipSet) -> bool

assetPaths : VtArray[AssetPath]
clipSet : str


List of asset paths to the clips in the clip set named C{clipSet}.


This list is unordered, but elements in this list are referred to by
index in other clip-related fields.


----------------------------------------------------------------------
GetClipAssetPaths(assetPaths) -> bool

assetPaths : VtArray[AssetPath]


This is an overloaded member function, provided for convenience. It
differs from the above function only in what argument(s) it accepts.
This function operates on the default clip set.



UsdClipsAPISetNames

These are my stubs generated for the same methods:

class ClipsAPI(APISchemaBase):
    ...
    @overload
    def GetClipAssetPaths(self, clipSet: str | pxr.Ar.ResolvedPath) -> list[pxr.Sdf.AssetPath]:
        """
        List of asset paths to the clips in the clip set named C{clipSet}.


        This list is unordered, but elements in this list are referred to by
        index in other clip-related fields.
        """
    @overload
    def GetClipAssetPaths(self) -> list[pxr.Sdf.AssetPath]:
        """
        This is an overloaded member function, provided for convenience. It
        differs from the above function only in what argument(s) it accepts.
        This function operates on the default clip set.



        UsdClipsAPISetNames
        """

You'll notice that the docstrings present in yours and mine are equivalent (and share the same whitespace idiosyncracies) because I'm using the same doxygenlib modules to process docstrings.

The primary difference is in the overload signatures. Here's a more direct side-by-side comparsion:

First overload

# yours
def GetClipAssetPaths(assetPaths: VtArray[AssetPath], clipSet: str) -> bool: ...
# mine
def GetClipAssetPaths(self, clipSet: str | pxr.Ar.ResolvedPath) -> list[pxr.Sdf.AssetPath]: ...

Second overload

# yours
def GetClipAssetPaths(assetPaths: VtArray[AssetPath]) -> bool: ...
# mine
def GetClipAssetPaths(self) -> list[pxr.Sdf.AssetPath]: ...

tl;dr The native signatures are wrong in many ways. Below is a table explaining the sources of these differences:

OpenUSD cg-stubs
number of overloads may not be accurate number of overloads always matches those in python
doesn't handle pointers that become results inspects C++ signatures for pointer arguments and converts them to python result when the the boost signatures indicate a match
only converts std::vector and std::sequence also converts std::set, std::unordered_set, std::function, std::map, std::unordered_map, std::optional
doesn't handle implicitly convertible types like Ar.ResolvedPath scans headers for typedef and using statements to substitute these aliases for their actual types
intended for documentation reference intended for documentation reference, IDE code completion, and static type analysis
does not include self or cls args adds self and cls args for methods
not validated validated against actual python code
only visits functions which have matches in Doxygen processes every function in every mdoule

Ok, I'm convinced. What next?

Here's my proposed 3-step plan:

Step 1

Merge this PR.

Step 2

I will contribute type annotations to several open source project which I can use to further validate my stubs, and continue to broadcast the existence of these stubs far and wide to drive adoption. I've already created a PR for the great USD Cookbook repo. Ideally, if Pixar is open to it, I would like to contribute type annotations to OpenUSD itself, for example in Usdviewq. To ease into this, I have a followup to this PR to add annotations to doxygenlib.

Step 3

If we are all in agreement, then the final step would be to move the pyi stub generation process into OpenUSD itself. I'm a pretty busy guy and moving the project into OpenUSD would ensure that it undergoes more regular releases than I can manage. I will be around to help out with maintenance of the generator.

Q & A

What has changed under the hood?

There are two main changes in this PR, both limited to the build process:

  1. OpenUSD python bindings are built with the boost option to include boost-generated signatures.
  2. doxygenlib is updated to hide these signatures from users.

Tf.PreparePythonModule overrides function __doc__ attributes with values from __DOC.py. This behavior has remained unchanged. What has changed is the functions that are included in __DOC.py files during the build process. Prior to this change, cdWriterDocstring omitted entries in __DOC.py for functions which had a docstring provided by Boost, because an existing docstring indicated that it had been manually authored in the C++ wrapper.

After this PR, all functions have docstrings generated by Boost (because of the new Boost signatures), so we have to do a little extra work to determine if the Boost docstrings are just the auto-generated python signatures, or custom docstrings from the wrappers. Now when cdWriterDocstring processes an existing docstring, it removes the Boost signatures and if there's anything left it writes it to __DOC.py.

The upshot is that the additional docstrings make the python lib directory slightly larger, 25MB, or about 0.6% to a full OpenUSD build:

before: 4141 MB
after:  4166 MB

What are the differences for the user?

As mentioned, I've taken great pains to preserve the original behavior as much as possible, however, there are some edge cases.

Functions for which cdWriterDocstring found a C++ match in doxygen will remain the same, because they will recieve an entry in __DOC.py. However, there are functions which cdWriterDocstring never visits and which therefore do not receive an override in __DOC.py. These functions will now have docstrings generated by Boost.

For example, before this PR, the following function had no docstring:

$ python3.9 -c "import pxr.Sdf as Sdf;print(Sdf.ListEditorProxy_SdfNameKeyPolicy.ContainsItemEdit.__doc__)"
None

After this PR, it has a docstring generated by Boost:

$ python3.9 -c "import pxr.Sdf as Sdf;print(Sdf.ListEditorProxy_SdfNameKeyPolicy.ContainsItemEdit.__doc__)"
ContainsItemEdit( (ListEditorProxy_SdfNameKeyPolicy)arg1, (object)item [, (bool)onlyAddOrExplicit=False]) -> bool

Why enable the boost signatures by default?

If we follow through with the 3-step plan above, building with signatures will need to be the default, because we will not want to build one version of OpenUSD with signatures enabled to create the stubs, and then rebuild with signatures disabled.

Why not provide an option to enable the Boost signatures?

I'm open to doing this, but adding an option to enable the Boost signatures adds complexity to the doxygenlib code because it needs to produce parity in output with both scenarios.


I know this is a lot to read, but I thought I'd front load all of the context necessary to make a decision on this PR. Take your time and ask as many questions as you like.

Thanks!

@jesschimein
Copy link

Filed as internal issue #USD-10058

@jesschimein
Copy link

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants