
Registry for ported WebAssembly libraries

Paul Cooper edited this page Sep 29, 2023 · 7 revisions

Overview

There are two main approaches to hosting a registry service:

  1. Manage the metadata only. This approach is lightweight and keeps the metadata separate from the code repository. Many existing services (e.g., GitHub) already handle source code management well, so the registry can focus on managing metadata and retrieve the source code using specific metadata fields. vcpkg is an example of this approach.
  2. Manage the metadata together with a code snapshot. This approach needs additional storage for code snapshots, but in return it strictly guarantees that the code retrieved from the registry is identical to the code that was published. Examples include npm and yarn.

This section focuses on the content and format of the metadata. How to host the registry service itself is still an open question.

In practice, we’re also building a proof of concept (PoC) of the registry service based on Verdaccio, an open-source, lightweight private npm proxy registry. npm is the world's largest software registry and is widely used by developers to share and manage packages; it offers a mature service and powerful tools for package distribution and dependency management. The proposed Wasm library registry metadata is built on top of npm package metadata so that much of that functionality can be reused.

Metadata

The metadata for the Wasm registry complements npm package metadata. We inherit npm fields for basic project information, such as name, version, description, repository and dependencies. In addition, the Wasm registry needs custom fields that describe what is required to build a library successfully, as well as the package configuration required by other libraries that depend on it. Below is a mockup of Wasm registry metadata containing both the fields inherited from npm and the fields added for the Wasm registry; each added field is explained after it.

[Figure: mockup of Wasm registry metadata]
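As a rough textual companion to the mockup, the TypeScript types below sketch the metadata shape implied by the field descriptions in this section. The npm-inherited fields are abbreviated, and the exact nesting, which fields are optional, the object form of `toolchain`, and all example values (package names, versions, flags, paths) are assumptions for illustration, not the registry's confirmed schema.

```typescript
// Sketch of Wasm registry metadata, inferred from the field descriptions.
// Nesting, optionality and all example values are assumptions.

interface NativeLibrary {
  name: string;    // native library name
  version: string; // native library version
}

interface Toolchain {
  name: string;    // hypothetical: e.g. "emscripten"
  version: string; // toolchain version used for the build
}

interface BuildStep {
  command: string; // build command
  args: string[];  // arguments for the build command
  cwd: string;     // working directory for the command
}

interface PkgConfig {
  prefix: string;  // library install prefix
  cflags: string;  // compiler flags, e.g. -I<prefix>/include
  ldflags: string; // linker flags, e.g. -L<prefix>/lib -lz
}

interface BuildTarget {
  envs: Record<string, string>; // e.g. CFLAGS, LDFLAGS
  buildSteps: BuildStep[];
  pkgConfig: PkgConfig;
}

interface WasmPackageMetadata {
  // Inherited from npm package metadata (abbreviated).
  name: string;
  version: string;
  description?: string;
  dependencies?: Record<string, string>;
  // Added for the Wasm registry.
  nativeLibrary: NativeLibrary;
  toolchain: Toolchain;
  buildTargets: Record<string, BuildTarget>; // keyed by target name, e.g. "static"
}

// Hypothetical metadata for a zlib port.
const zlibPort: WasmPackageMetadata = {
  name: "@webnizer/zlib",
  version: "1.3.0",
  nativeLibrary: { name: "zlib", version: "1.3" },
  toolchain: { name: "emscripten", version: "3.1.45" },
  buildTargets: {
    static: {
      envs: { CFLAGS: "-O3", LDFLAGS: "" },
      buildSteps: [
        { command: "emconfigure", args: ["./configure", "--static", "--prefix=/opt/zlib"], cwd: "." },
        { command: "emmake", args: ["make", "install"], cwd: "." },
      ],
      pkgConfig: {
        prefix: "/opt/zlib",
        cflags: "-I/opt/zlib/include",
        ldflags: "-L/opt/zlib/lib -lz",
      },
    },
  },
};

console.log(Object.keys(zlibPort.buildTargets));
```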

Field for native library information

nativeLibrary describes the native library that this project was ported from. This information is used when resolving dependencies for a project, to determine whether multiple ports of the same library exist.

  • name: defines native library name
  • version: defines native library version

Field for toolchain information

toolchain describes the toolchain, including its version, used to build the library. It is used to detect possible toolchain conflicts and warn users about them.

Fields for building and using the library

buildTargets describes the different build targets (e.g., static, shared) and, for each, the configuration needed to build and use the library. Each key is the name of a build target, and its value contains three general fields: envs, buildSteps and pkgConfig.

envs defines compiler and linker flag settings, passed to the build as environment variables.

buildSteps describes the build process semantically, as an array of build steps. Each build step has:

  • command: defines build command
  • args: defines arguments for build command
  • cwd: defines working directory for the command
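One way a build tool could consume envs and buildSteps is sketched below: it turns a build target into an ordered list of commands, each with its merged environment and resolved working directory. It is a dry run only; actually executing the steps (for example with Node's `child_process.spawnSync`) is left out, and the names `planBuild`, `resolveCwd` and `PlannedCommand` are hypothetical.

```typescript
interface BuildStep { command: string; args: string[]; cwd: string; }
interface BuildTarget { envs: Record<string, string>; buildSteps: BuildStep[]; }

interface PlannedCommand {
  argv: string[];              // command followed by its arguments
  cwd: string;                 // resolved working directory
  env: Record<string, string>; // environment passed to the command
}

// Resolve a step's cwd against the project root (simple POSIX-style join).
function resolveCwd(root: string, cwd: string): string {
  if (cwd.startsWith("/")) return cwd;
  if (cwd === "." || cwd === "") return root;
  return `${root.replace(/\/$/, "")}/${cwd.replace(/^\.\//, "")}`;
}

// Turn a build target into an ordered, dry-run list of commands.
function planBuild(
  target: BuildTarget,
  root: string,
  baseEnv: Record<string, string> = {},
): PlannedCommand[] {
  // Target-specific envs (e.g. CFLAGS, LDFLAGS) override the base environment.
  const env = { ...baseEnv, ...target.envs };
  return target.buildSteps.map((step) => ({
    argv: [step.command, ...step.args],
    cwd: resolveCwd(root, step.cwd),
    env,
  }));
}

const plan = planBuild(
  {
    envs: { CFLAGS: "-O3" },
    buildSteps: [
      { command: "emconfigure", args: ["./configure", "--static"], cwd: "." },
      { command: "emmake", args: ["make"], cwd: "." },
    ],
  },
  "/work/zlib",
);
for (const cmd of plan) console.log(cmd.cwd, cmd.argv.join(" "));
```

Keeping steps as structured `command`/`args`/`cwd` records, rather than one shell string, lets a tool inspect or rewrite individual steps before running them.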

pkgConfig describes the library configuration that other libraries depending on it can use, and serves a similar purpose to the pkg-config tool for native libraries. It includes the library prefix, compiler and linker flags (-I, -L, -l and other Emscripten options), exported APIs, module runtime methods, etc.

  • prefix: defines library install prefix
  • cflags: defines compiler flags to include library header files
  • ldflags: defines linker flags for searching and accessing library files
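The sketch below shows how a dependent library might assemble its own CFLAGS and LDFLAGS from the pkgConfig entries of its dependencies, much as one would concatenate the output of `pkg-config --cflags` and `pkg-config --libs`. The helper name `composeFlags`, the example paths, and the convention that dependencies arrive in build order are assumptions.

```typescript
interface PkgConfig { prefix: string; cflags: string; ldflags: string; }

// Combine dependency pkgConfig entries into flags for the dependent's build.
// `deps` is in build order: each library appears after the libraries it uses.
function composeFlags(deps: PkgConfig[]): { CFLAGS: string; LDFLAGS: string } {
  const cflags = deps.map((d) => d.cflags).filter((f) => f.length > 0);
  // Traditional single-pass linkers (such as GNU ld) resolve archive symbols
  // left to right, so the portable order puts a library before the libraries
  // it depends on: reverse the build order for the linker flags.
  const ldflags = [...deps].reverse().map((d) => d.ldflags).filter((f) => f.length > 0);
  return { CFLAGS: cflags.join(" "), LDFLAGS: ldflags.join(" ") };
}

const zlibCfg: PkgConfig = {
  prefix: "/opt/zlib",
  cflags: "-I/opt/zlib/include",
  ldflags: "-L/opt/zlib/lib -lz",
};
const pngCfg: PkgConfig = {
  prefix: "/opt/libpng",
  cflags: "-I/opt/libpng/include",
  ldflags: "-L/opt/libpng/lib -lpng",
};

// libpng depends on zlib, so zlib comes first in build order.
const flags = composeFlags([zlibCfg, pngCfg]);
console.log(flags.LDFLAGS); // -L/opt/libpng/lib -lpng -L/opt/zlib/lib -lz
```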

Dependency management

Dependency management for native projects, and for ported Wasm projects, differs from that of Node: every dependency must be compiled from source, and the results must be linked together in the proper order. A dedicated dependency resolver is therefore required to resolve dependencies, set up the project sources and carefully control the whole workflow; npm does not provide this.
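At its core, building "in the proper order" means following a topological order of the dependency graph: every library is built after the libraries it depends on. A minimal depth-first sketch with cycle detection is below; representing the graph as a map from package name to direct dependencies, and the example graph itself, are assumptions.

```typescript
// Map from package name to the names of its direct dependencies.
type DepGraph = Record<string, string[]>;

// Return a build order in which each package follows all of its dependencies.
// Throws on a dependency cycle or a reference to an unknown package.
function buildOrder(graph: DepGraph, roots: string[]): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (pkg: string, path: string[]): void => {
    const s = state.get(pkg);
    if (s === "done") return;
    if (s === "visiting") {
      throw new Error(`dependency cycle: ${[...path, pkg].join(" -> ")}`);
    }
    const deps = graph[pkg];
    if (deps === undefined) throw new Error(`unknown package: ${pkg}`);
    state.set(pkg, "visiting");
    for (const dep of deps) visit(dep, [...path, pkg]);
    state.set(pkg, "done");
    order.push(pkg); // all of pkg's dependencies are already in `order`
  };

  for (const root of roots) visit(root, []);
  return order;
}

// Hypothetical graph for an application using libpng and FreeType.
const graph: DepGraph = {
  app: ["libpng", "freetype"],
  libpng: ["zlib"],
  freetype: ["zlib", "libpng"],
  zlib: [],
};
console.log(buildOrder(graph, ["app"])); // [ 'zlib', 'libpng', 'freetype', 'app' ]
```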

Versioning

Semantic versioning is a standardized scheme for numbering software releases that communicates the nature of the changes in each release, and it is widely adopted in software development. Since our solution is based on npm, the Wasm libraries in the registry also follow semantic versioning rules.

Dependency resolution

When multiple packages depend on a common package, the dependency resolver attempts to make them all use the same version of that package, chosen within a semver-compatible range.
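As an illustration, the sketch below picks one version of a shared package that satisfies every dependent's caret (`^`) range, and raises a conflict when no such version exists. It implements npm's caret semantics by hand (`^1.2.3` allows any `1.x.y` at or above `1.2.3`; for `0.x` versions the left-most non-zero component is fixed) and ignores pre-release tags. Choosing the highest match mirrors npm and cargo, which is only one of the policies weighed in question 5 under Discussions; the function names and version lists are hypothetical.

```typescript
type Version = [number, number, number];

// Parse a plain MAJOR.MINOR.PATCH version (no pre-release tags).
function parseVersion(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`unsupported version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) if (a[i] !== b[i]) return a[i] - b[i];
  return 0;
}

// npm caret semantics: ^1.2.3 -> >=1.2.3 <2.0.0, ^0.2.3 -> >=0.2.3 <0.3.0,
// ^0.0.3 -> >=0.0.3 <0.0.4.
function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith("^")) throw new Error(`only caret ranges supported: ${range}`);
  const v = parseVersion(version);
  const base = parseVersion(range.slice(1));
  if (compareVersions(v, base) < 0) return false;
  if (base[0] > 0) return v[0] === base[0];
  if (base[1] > 0) return v[0] === 0 && v[1] === base[1];
  return v[0] === 0 && v[1] === 0 && v[2] === base[2];
}

// requirements: dependent package name -> caret range on the shared package.
function resolveShared(
  shared: string,
  requirements: Record<string, string>,
  available: string[],
): string {
  const candidates = available.filter((v) =>
    Object.values(requirements).every((range) => satisfiesCaret(v, range)),
  );
  if (candidates.length === 0) {
    const detail = Object.entries(requirements)
      .map(([dep, range]) => `${dep} requires ${range}`)
      .join(", ");
    throw new Error(`dependency conflict on ${shared}: ${detail}`);
  }
  // Highest compatible version, as npm and cargo would pick on first install.
  return candidates.reduce((best, v) =>
    compareVersions(parseVersion(v), parseVersion(best)) > 0 ? v : best,
  );
}

const zlibVersions = ["1.2.11", "1.2.13", "1.3.0", "1.3.1", "2.0.0"];
console.log(resolveShared("zlib", { libpng: "^1.2.11", freetype: "^1.3.0" }, zlibVersions)); // 1.3.1
```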


If multiple packages depend on a common package with semver-incompatible version requirements, the dependency resolver raises a dependency conflict error.

If multiple ports of the same native library exist in the dependency graph, the dependency resolver also raises a dependency conflict error.
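The port check can build on the nativeLibrary field: group every package in the resolved graph by native library name and flag any name that more than one registry package claims. A minimal sketch, with the `ResolvedPackage` type, the function name and all package names being hypothetical:

```typescript
interface ResolvedPackage {
  name: string;    // registry package name
  version: string; // resolved version
  nativeLibrary?: { name: string; version: string }; // absent for non-ported packages
}

// Return native library name -> ports of it, for every native library
// that is ported more than once in the resolved dependency graph.
function findDuplicatePorts(packages: ResolvedPackage[]): Map<string, string[]> {
  const ports = new Map<string, string[]>();
  for (const pkg of packages) {
    if (!pkg.nativeLibrary) continue;
    const key = pkg.nativeLibrary.name;
    const list = ports.get(key) ?? [];
    list.push(`${pkg.name}@${pkg.version}`);
    ports.set(key, list);
  }
  return new Map([...ports].filter(([, list]) => list.length > 1));
}

const conflicts = findDuplicatePorts([
  { name: "@webnizer/zlib", version: "1.3.0", nativeLibrary: { name: "zlib", version: "1.3" } },
  { name: "zlib-wasm", version: "0.4.1", nativeLibrary: { name: "zlib", version: "1.2.13" } },
  { name: "@webnizer/libpng", version: "1.6.40", nativeLibrary: { name: "libpng", version: "1.6.40" } },
]);
for (const [lib, owners] of conflicts) {
  console.error(`dependency conflict: ${lib} is ported by ${owners.join(", ")}`);
}
```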


For now we are strict and allow only one definition of a symbol in the dependency graph, to prevent potential name clashes. In the future, we may explore whether multiple versions of the same library can coexist in a single project.

Discussions

1. What can we reuse from npm, and what must we build ourselves?

We can reuse the npm registry API and core libraries for querying and publishing packages, and add our own Wasm-library-specific logic on top.
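Querying can rely on the registry's existing document format: `GET <registry>/<package>` returns a packument whose `versions` map holds each published manifest, with `dist-tags.latest` naming the default version. The sketch below builds the request URL and extracts the Wasm-specific block from a packument. It assumes the custom fields live under a top-level `webnizer` key (Solution 1 under question 3), and the sample packument is abbreviated and hypothetical; the network call is left out so the sketch runs offline.

```typescript
interface Manifest {
  name: string;
  version: string;
  webnizer?: Record<string, unknown>; // assumed location of the Wasm fields
}

interface Packument {
  name: string;
  "dist-tags": Record<string, string | undefined>;
  versions: Record<string, Manifest | undefined>;
}

// The npm client escapes the slash of a scoped name: @scope/pkg -> @scope%2fpkg.
function packumentUrl(registry: string, pkgName: string): string {
  return `${registry.replace(/\/$/, "")}/${pkgName.replace("/", "%2f")}`;
}

// Return the Wasm-specific metadata of a version (default: dist-tags.latest).
function wasmMetadata(doc: Packument, version?: string): Record<string, unknown> {
  const v = version ?? doc["dist-tags"]["latest"];
  if (v === undefined) throw new Error(`${doc.name}: no version given and no "latest" tag`);
  const manifest = doc.versions[v];
  if (manifest === undefined) throw new Error(`${doc.name}@${v}: version not found`);
  if (manifest.webnizer === undefined) {
    throw new Error(`${doc.name}@${v} is not a ported Wasm library`);
  }
  return manifest.webnizer;
}

const sample: Packument = {
  name: "@webnizer/zlib",
  "dist-tags": { latest: "1.3.0" },
  versions: {
    "1.3.0": {
      name: "@webnizer/zlib",
      version: "1.3.0",
      webnizer: { nativeLibrary: { name: "zlib", version: "1.3" } },
    },
  },
};

console.log(packumentUrl("https://registry.npmjs.org/", sample.name)); // https://registry.npmjs.org/@webnizer%2fzlib
console.log(wasmMetadata(sample));
```

Against a live registry, the packument would come from something like `await (await fetch(url)).json()` in Node 18 or later.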

As explained in the Dependency management section, we need to implement our own equivalent of npm install to manage the dependencies of ported Wasm libraries and projects.

2. Which registry service should we use?

| Option | Description | Pros | Cons |
| --- | --- | --- | --- |
| 1 | Reuse npm, possibly with an organization scope (e.g., @webnizer) or a specific keyword (e.g., webnizer) so that ported Wasm libraries are easy to filter | (1) The service is widely used by developers, which brings a lot of traffic. (2) The service is already up and stable, so we do not need to host or maintain it ourselves. | Heavy dependence on the npm registry; ported Wasm libraries are mixed in with non-Wasm libraries. |
| 2 | Set up a new registry service, probably based on an open-source project such as Verdaccio | A dedicated entry point for the Wasm registry, with the flexibility to customize our own service. | (1) Effort to manage, host and maintain the service. (2) A more limited set of features than the npm registry. |

3. How should Wasm-library-specific metadata be handled?

Solution 1: Add the custom fields directly to package.json, nested under a single top-level key such as “webnizer” (a key that is not defined by npm, not reserved, and not prefixed with _ or $).

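Under Solution 1, a manifest might look like the JSON parsed below, with every Wasm-specific field nested under one `webnizer` key that npm publishes along with the rest of package.json without interpreting it. The sketch types the manifest and adds a type guard a tool could use to recognize ported libraries; the nested field set, the guard name `isPorted` and all values are illustrative assumptions.

```typescript
interface WebnizerFields {
  nativeLibrary: { name: string; version: string };
  toolchain: { name: string; version: string };
  buildTargets: Record<string, unknown>; // detailed shape omitted here
}

interface PackageJson {
  name: string;
  version: string;
  dependencies?: Record<string, string>;
  webnizer?: WebnizerFields; // single top-level key holding all custom fields
}

// Type guard: the manifest describes a ported Wasm library if the key exists.
function isPorted(pkg: PackageJson): pkg is PackageJson & { webnizer: WebnizerFields } {
  return pkg.webnizer !== undefined;
}

const manifest: PackageJson = JSON.parse(`{
  "name": "@webnizer/libpng",
  "version": "1.6.40",
  "dependencies": { "@webnizer/zlib": "^1.3.0" },
  "webnizer": {
    "nativeLibrary": { "name": "libpng", "version": "1.6.40" },
    "toolchain": { "name": "emscripten", "version": "3.1.45" },
    "buildTargets": { "static": {} }
  }
}`);

if (isPorted(manifest)) {
  console.log(manifest.webnizer.nativeLibrary.name); // libpng
}
```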

This solution keeps a single metadata source, but all the custom metadata is also stored in the public registry database.

Solution 2: Add a simple field to package.json that marks the package as a webnizer project, and store all custom fields separately in a .webnizerrc file.

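With Solution 2, a tool has to read two sources and join them. The sketch below assumes the marker is a boolean `"webnizer": true` in package.json and that .webnizerrc contains JSON; this proposal fixes neither the marker name nor the file format, and `loadMetadata` is a hypothetical helper. It takes both file contents as strings so that it runs without touching the file system; in practice they could be read with Node's `fs.readFileSync(path, "utf8")`.

```typescript
interface NpmFields { name: string; version: string; webnizer?: boolean; }
type RcFields = Record<string, unknown>; // parsed contents of .webnizerrc

interface CombinedMetadata extends NpmFields { wasm: RcFields; }

// Join package.json with .webnizerrc (both passed in as file contents).
// Returns null for ordinary npm packages that lack the marker.
function loadMetadata(packageJsonText: string, rcText: string | undefined): CombinedMetadata | null {
  const pkg = JSON.parse(packageJsonText) as NpmFields;
  if (pkg.webnizer !== true) return null;
  // The two sources must stay in sync: a marked package needs its rc file.
  if (rcText === undefined) {
    throw new Error(`${pkg.name}: marked as a webnizer project but .webnizerrc is missing`);
  }
  const rc = JSON.parse(rcText) as RcFields;
  return { ...pkg, wasm: rc };
}

const combined = loadMetadata(
  `{ "name": "@webnizer/zlib", "version": "1.3.0", "webnizer": true }`,
  `{ "nativeLibrary": { "name": "zlib", "version": "1.3" } }`,
);
console.log(combined?.wasm);
```

The consistency check between the marker and the rc file is a small instance of the cost of keeping metadata in two places.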

This solution exposes only minimal custom metadata directly to the public registry, but it requires maintaining the metadata in two separate sources.

4. What is the distribution format of ported libraries in the registry?

In the initial stage, we plan to distribute ported libraries as metadata (possibly with a source code snapshot) rather than as pre-built archive files or shared libraries. Users need to rebuild everything when importing a library from the registry. Although this lengthens build times, it helps the overall build go as smoothly as possible and leaves users the flexibility to customize the build if needed.

5. Which version should be used for a common package within a semver-compatible range?

npm and cargo both install the latest version within the compatible range (as of the first install) and save the resolved dependency tree, with the selected package versions, in a lock file so that exactly the same dependencies are installed next time.

vcpkg, a C/C++ package manager, installs the oldest version within the compatible range so that no upgrades happen automatically when a new version is released.

Both approaches aim to keep install results reproducible over time. Either would work for us, and we will choose one during development after a detailed evaluation.
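The two policies differ only in which end of the compatible set they take, and a lock file makes either one reproducible after the first install. The sketch below assumes the semver-compatible candidates have already been computed (plain MAJOR.MINOR.PATCH strings) and uses a plain object as a stand-in for a lock file; the `Policy` names, `chooseVersion` and the lock format are hypothetical.

```typescript
type Policy = "latest" | "oldest"; // npm/cargo style vs. the vcpkg style described above
type LockFile = Record<string, string | undefined>; // package name -> pinned version

// Compare plain MAJOR.MINOR.PATCH strings numerically.
function cmp(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) if (pa[i] !== pb[i]) return pa[i] - pb[i];
  return 0;
}

// Choose a version for `pkg` among semver-compatible candidates. A version
// already pinned in the lock file wins, so later installs repeat the first
// result even after newer releases appear.
function chooseVersion(pkg: string, compatible: string[], policy: Policy, lock: LockFile): string {
  const pinned = lock[pkg];
  if (pinned !== undefined && compatible.includes(pinned)) return pinned;
  if (compatible.length === 0) throw new Error(`no compatible version of ${pkg}`);
  const sorted = [...compatible].sort(cmp);
  const picked = policy === "latest" ? sorted[sorted.length - 1] : sorted[0];
  lock[pkg] = picked; // record the decision for the next install
  return picked;
}

const candidates = ["1.3.0", "1.2.13", "1.3.1"];
console.log(chooseVersion("zlib", candidates, "latest", {})); // 1.3.1
console.log(chooseVersion("zlib", candidates, "oldest", {})); // 1.2.13

const lock: LockFile = {};
chooseVersion("zlib", candidates, "latest", lock); // pins 1.3.1
candidates.push("1.3.2");                          // a newer release appears
console.log(chooseVersion("zlib", candidates, "latest", lock)); // 1.3.1
```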

6. Toolchain version management

Different libraries may have been built with different Emscripten toolchain versions when they were published to the registry. If a library’s toolchain version may be incompatible with the one installed in the current environment, we will build with the installed toolchain and warn users about the incompatibility and potential build failures.
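A sketch of that check: compare the Emscripten version recorded in each package's toolchain field with the installed one (for example, as reported by `emcc --version`) and collect warnings instead of failing. The `ToolchainInfo` shape, the function name, and the rule that a major or minor mismatch counts as a "possible incompatibility" are assumptions made for illustration, not a documented Emscripten compatibility policy.

```typescript
interface ToolchainInfo { name: string; version: string; } // e.g. { name: "emscripten", version: "3.1.45" }

// Collect warnings for packages built with a different Emscripten release.
// Heuristic (assumption): a major/minor mismatch is reported more strongly
// than a patch-level difference. The build still uses the installed toolchain.
function toolchainWarnings(installed: string, packages: Record<string, ToolchainInfo>): string[] {
  const [iMajor, iMinor] = installed.split(".").map(Number);
  const warnings: string[] = [];
  for (const [pkg, tc] of Object.entries(packages)) {
    if (tc.name !== "emscripten" || tc.version === installed) continue;
    const [pMajor, pMinor] = tc.version.split(".").map(Number);
    const level =
      pMajor !== iMajor || pMinor !== iMinor ? "possible incompatibility" : "patch-level difference";
    warnings.push(
      `${pkg}: built with emscripten ${tc.version}, installed ${installed} (${level}); ` +
        `building with the installed toolchain may fail`,
    );
  }
  return warnings;
}

const found = toolchainWarnings("3.1.45", {
  "@webnizer/zlib": { name: "emscripten", version: "3.1.45" },
  "@webnizer/libpng": { name: "emscripten", version: "3.1.40" },
  "@webnizer/freetype": { name: "emscripten", version: "2.0.34" },
});
found.forEach((w) => console.warn(w));
```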