RFC: tracking array API compliance #402
This RFC proposes a means for tracking array API compliance.
Overview
Currently, consumers of array libraries lack a centralized mechanism for determining whether a given API in an array library complies with the array API specification.
Array libraries have implemented various means for tracking implementation progress:
MXNet
PyTorch
NumPy
CuPy
Dask
(aggregated)
However, determining how broadly an API is supported, and in which version a given library implemented it, requires knowing where to look, a significant investment of time and energy, and dogged investigation.
A significant barrier to specification adoption among downstream libraries is not knowing (a) which libraries currently implement any given API and (b) which array library versions are needed in order to access specification-compliant APIs.
This RFC seeks to address this barrier by providing a process for tracking array API specification compliance and making this information publicly available.
Proposal
This RFC proposes an approach similar to that used for tracking browser compatibility of Web APIs, whereby compatibility information is stored in JSON files and made publicly available on the web.
An example JSON file for the `asarray` API.
At a high level, for each API in the array API specification, there would be a corresponding JSON file containing compatibility data for each array library of interest.
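A minimal sketch of this one-file-per-API layout in Python, assuming all files live in a single directory and are named `<api>.json`. The nesting (a `__compat__` record holding `status` and `support`, with an optional argument such as `copy` nesting its own `__compat__` record), the `standard_track` key, the lowercase library keys, and the helpers `write_compat_file` and `load_compat` are all hypothetical; the CuPy values mirror the example discussed later in this RFC.

```python
import json
from pathlib import Path

# Hypothetical contents of `asarray.json`. Key names and nesting are
# assumptions: a `__compat__` record holds `status` and `support`, and each
# optional argument (here `copy`) nests its own `__compat__` record.
ASARRAY_COMPAT = {
    "__compat__": {
        # Standards-level status (key names assumed).
        "status": {"standard_track": True, "experimental": False, "deprecated": False},
        # Library -> list of support entries, or None (serialized as null).
        "support": {
            "cupy": [
                {
                    "version_added": "10.0.0",
                    "status": {"experimental": False, "partial_implementation": True},
                    "notes": "The `copy` kwarg is only partially supported.",
                }
            ],
            "dask": None,
        },
    },
    "copy": {
        "__compat__": {
            "status": {"standard_track": True, "experimental": False, "deprecated": False},
            "support": {
                "cupy": [
                    {
                        "version_added": "10.0.0",
                        "status": {"partial_implementation": True},
                    }
                ],
                "dask": None,
            },
        }
    },
}


def write_compat_file(directory: Path, api_name: str, data: dict) -> Path:
    """Write the record for one API to `<directory>/<api_name>.json`."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{api_name}.json"
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return path


def load_compat(directory: Path) -> dict:
    """Load every `*.json` file in `directory`, keyed by API name (file stem)."""
    return {
        p.stem: json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(directory.glob("*.json"))
    }
```

Because `json.dumps` writes Python `None` as `null`, a library without even partial support round-trips as `null` in the file, matching the convention for the `support` field.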
The `status` field indicates whether an API is on a standards track, is experimental and thus subject to change, or is deprecated. The `status` field is an object because its fields are not mutually exclusive (e.g., an experimental API could be deprecated after failing to gain sufficient traction during the specification process, or a standards-track API could be deprecated due to obsolescence and replacement by a new API).
The `support` field maps array libraries to an implementation status. If an array library lacks even partial support, its corresponding field value is `null`. For array libraries with partial or full support, the corresponding field value is an array of objects having the following fields:
- `version_added`: the version in which a specification-compliant API was added for the respective array library.
- `version_removed`: the version in which a specification-compliant API was removed for the respective array library.
- `status`: a status object having the following fields:
  - `experimental`: a `boolean` indicating whether an API is exposed under an experimental/preview status and, thus, subject to possible change.
  - `deprecated`: a `boolean` indicating whether an array library has deprecated an API.
  - `partial_implementation`: a `boolean` indicating whether an API has only been partially implemented by an array library (e.g., partial kwarg support).
- `notes`: a string (possibly containing Markdown) for providing additional information concerning implementation status.
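The field list above is regular enough to check mechanically. A minimal sketch in Python of a validator for one library's `support` value follows; the function names are hypothetical, and treating unknown keys as errors is an assumption rather than a rule stated in this RFC.

```python
from typing import Any

# Fields of a support entry and of its `status` object, per the list above.
ENTRY_FIELDS = {"version_added", "version_removed", "status", "notes"}
STATUS_FLAGS = {"experimental", "deprecated", "partial_implementation"}


def validate_support_entry(entry: Any) -> list[str]:
    """Return the problems found in one support entry (empty list if none)."""
    if not isinstance(entry, dict):
        return ["entry must be an object"]
    # Unknown keys are treated as errors (an assumption, not an RFC rule).
    problems = [f"unknown field: {key}" for key in sorted(entry.keys() - ENTRY_FIELDS)]
    for key in ("version_added", "version_removed"):
        if key in entry and not isinstance(entry[key], str):
            problems.append(f"{key} must be a version string")
    status = entry.get("status", {})
    if not isinstance(status, dict):
        problems.append("status must be an object")
    else:
        for flag, value in status.items():
            if flag not in STATUS_FLAGS:
                problems.append(f"unknown status flag: {flag}")
            elif not isinstance(value, bool):
                problems.append(f"status.{flag} must be a boolean")
    if "notes" in entry and not isinstance(entry["notes"], str):
        problems.append("notes must be a string")
    return problems


def validate_support(support: dict[str, Any]) -> dict[str, list[str]]:
    """Check that each library maps to null (None) or an array of valid entries."""
    report = {}
    for library, entries in support.items():
        if entries is None:
            continue  # null: no support at all, nothing to check
        if not isinstance(entries, list):
            report[library] = ["value must be null or an array of entries"]
            continue
        problems = [p for entry in entries for p in validate_support_entry(entry)]
        if problems:
            report[library] = problems
    return report
```

For instance, `validate_support({"cupy": [{"version_added": 10}]})` reports that `version_added` must be a version string, while a library mapped to `None` is accepted as "no support".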
The `version_added` and `version_removed` fields are mutually exclusive.
As an example, consider the following compliance data for CuPy and `asarray`.
The above indicates that the `asarray` API was implemented in CuPy starting in version `10.0.0`, is not exposed on an experimental status, and is only partially implemented. The notes clarify that the partial implementation status is due to the `copy` kwarg having incomplete support.
Suppose CuPy later adds complete support for the `copy` kwarg. In that case, the compliance data would be updated as follows:
Notice that the `partial_implementation` flag and the clarifying `notes` have been removed. By storing the data in an array, we are able to track implementation progress over time.
In addition to total API compliance, this RFC proposes to break out support for each optional argument. Using `asarray` as an example:
Compliance for optional arguments follows a similar structure to total API compliance: namely, a special `__compat__` field containing compliance data and a `status` field indicating the status of the API at the standards level.
Updating Compliance Data
Array library maintainers are best positioned to know both (a) when an API is implemented and (b) to what extent it is compliant. Accordingly, array libraries should plan to dedicate a small amount of time to updating compliance data for each release.
In the future, we can investigate automating this process. For example, array libraries could include compliance data in their release notes in a machine-readable format, which we could then use to generate updates automatically.
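One way such automation could work is to turn each machine-readable release record into a new support entry. A minimal sketch in Python follows; the release-record shape, the newest-first ordering, and the name `apply_release` are assumptions, not part of this RFC.

```python
import copy
from typing import Any


def apply_release(compat: dict[str, Any], release: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `compat` with a new support entry for each API in `release`.

    `compat` maps API names to records of the form
    {"__compat__": {"support": {library: [entries] or None}}}.
    `release` is a hypothetical machine-readable release-notes record, e.g.
    {"library": "cupy", "version": "X.Y.Z", "apis": {"asarray": {"status": {...}}}}.
    """
    updated = copy.deepcopy(compat)  # leave the caller's data untouched
    library, version = release["library"], release["version"]
    for api_name, fields in release["apis"].items():
        record = updated.setdefault(api_name, {})
        support = record.setdefault("__compat__", {}).setdefault("support", {})
        history = support.get(library) or []  # None (no support yet) -> []
        # Newest entry first (an assumed ordering). Older entries are kept,
        # so the array records implementation progress over time.
        support[library] = [{"version_added": version, **fields}, *history]
    return updated
```

In the CuPy scenario above, the release record for the later version would carry a `status` without `partial_implementation` and no `notes`, so the new entry omits them while the `10.0.0` entry remains in the history.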
However, in the absence of such automation, this RFC proposes to rely on maintainers and crowdsourcing for ensuring that compliance data is up-to-date.
This RFC proposes that compliance data be stored in a standalone public Git repository against which contributors (including those outside of the Consortium) may open pull requests fixing or updating compliance entries.
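Because contributions would arrive as pull requests, a repository-wide check could run in CI before merging. A minimal sketch in Python, assuming each file's top-level object carries a `__compat__` record containing a `support` object (a hypothetical layout); `check_repository` is an illustrative name, not an existing tool.

```python
import json
from pathlib import Path


def check_repository(root: Path) -> list[str]:
    """Return one error message per problem found in `*.json` files under `root`."""
    errors = []
    for path in sorted(root.rglob("*.json")):
        name = path.relative_to(root)
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors.append(f"{name}: invalid JSON ({exc.msg}, line {exc.lineno})")
            continue
        # Every file is expected to hold a `__compat__` object with `support`.
        compat = data.get("__compat__") if isinstance(data, dict) else None
        if not isinstance(compat, dict) or not isinstance(compat.get("support"), dict):
            errors.append(f"{name}: missing `__compat__.support` object")
    return errors
```

A CI job could call `check_repository(Path("."))` and fail the build whenever the returned list is non-empty, so malformed entries are caught before review.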
Public Consumption
This RFC proposes to surface compliance data in a human-friendly manner by publishing this data directly in the publicly hosted specification.
The specification for each API should contain a table similar to the following:
In this example table, a reader can immediately infer how widely an API is implemented and to what extent implementations are specification-compliant.
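Such a table could be generated directly from the compliance data rather than maintained by hand. A minimal sketch in Python, assuming the newest support entry is listed first; the library keys, column labels, and cell wording are hypothetical choices.

```python
from typing import Any, Optional

# (key in the compliance data, column heading); keys are hypothetical.
LIBRARIES = [
    ("numpy", "NumPy"),
    ("cupy", "CuPy"),
    ("pytorch", "PyTorch"),
    ("mxnet", "MXNet"),
    ("dask", "Dask"),
]


def summarize(entries: Optional[list[dict[str, Any]]]) -> str:
    """Condense one library's support history into a table cell."""
    if not entries:
        return "No"  # null or empty: no specification-compliant release
    latest = entries[0]  # assumes the newest entry comes first
    if "version_removed" in latest:
        return f"Removed in {latest['version_removed']}"
    status = latest.get("status", {})
    flags = [
        label
        for key, label in (("experimental", "experimental"), ("partial_implementation", "partial"))
        if status.get(key)
    ]
    version = latest.get("version_added", "?")
    return f"{version} ({', '.join(flags)})" if flags else version


def render_table(api_name: str, support: dict[str, Any]) -> str:
    """Render a one-row Markdown table with a column per library."""
    header = "| API | " + " | ".join(label for _, label in LIBRARIES) + " |"
    rule = "|" + " --- |" * (len(LIBRARIES) + 1)
    cells = [summarize(support.get(key)) for key, _ in LIBRARIES]
    row = f"| `{api_name}` | " + " | ".join(cells) + " |"
    return "\n".join([header, rule, row])
```

With data matching the description in this section (NumPy `1.22.0`, experimental and partial; CuPy `10.0.0`, partial; PyTorch `1.11.0`, full; nothing for the others), the generated row lists `1.22.0 (experimental, partial)` for NumPy, `10.0.0 (partial)` for CuPy, `1.11.0` for PyTorch, and `No` for MXNet and Dask.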
For example, we can see that the `asarray` API is available in NumPy under an experimental status and has only partial support for the `copy` kwarg starting in version `1.22.0`. CuPy has similar compliance; however, the API is not exposed experimentally. PyTorch has full compliance starting in version `1.11.0`. All other libraries currently do not have stable releases exposing a specification-compliant `asarray`.
Questions