Improve the telemetry of the containers tools #539

@baronfel

Description

Today the .NET CLI collects telemetry during a given build from a variety of data sources; one part of this telemetry is the PublishProtocol MSBuild property. We (MS) use this property to track usage rates of the containers feature, along with other metadata such as

  • whether the build is happening in a CI/CD service or as a user-entered command,
  • the OS the build is happening on.

This data point works great as long as people are using the containerization features via the dotnet publish -p PublishProfile=DefaultContainer mechanism (or another named profile). However, with the advent of Aspire and support for publishing multiple projects, more and more users are using containers via the more generic target-based mechanism: dotnet publish /t:PublishContainer.

Publishes that use the target-based form never set PublishProtocol to the DefaultContainer value, and so are not reflected in our telemetry. To accurately gauge usage of the feature, we'd like to add explicit containerization telemetry checkpoints that gather specific data about how the container publish occurred.

Data to gather on success

  • Information about image inference
    • Did inference happen, or did the user manually specify an image?
    • If inference happened, did the user specify ContainerFamily to augment inference?
  • Information about the base image chosen
    • Was the base image one of the Microsoft images? If so, which one, and what tag?
  • Information about the project being published:
    • Is it a console app, a web app, a worker app, or something else?
    • Is it being published self-contained or framework dependent?
    • Is it being published with invariant globalization or not?
    • What RID is it being published for?
  • Information about the ContainerRegistry pushed to, if any
    • Can it be categorized into a bucket like: Azure, AWS, GCP, Other?
    • Or was the image pushed to a local binary (Docker vs Podman)?
    • Or was the image pushed to a tarball?
  • Information about the way the project is being published
    • Publish Profile
    • Direct target execution
    • Some kind of external tool 'driving' the publish (i.e. azd)
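For illustration, the registry bucketing described above could be sketched roughly as follows. This is a sketch only: the host suffixes, bucket names, and function name are assumptions for this example, not the actual SDK implementation.

```python
from urllib.parse import urlparse

# Hypothetical mapping from registry host suffixes to coarse telemetry buckets.
REGISTRY_BUCKETS = {
    ".azurecr.io": "Azure",
    ".amazonaws.com": "AWS",
    ".pkg.dev": "GCP",
    "gcr.io": "GCP",
}

def bucket_registry(registry_uri: str) -> str:
    """Categorize a ContainerRegistry value into an anonymized bucket.

    Only the bucket name would be reported, never the raw registry URI.
    """
    host = urlparse(registry_uri).hostname or registry_uri
    for suffix, bucket in REGISTRY_BUCKETS.items():
        if host == suffix.lstrip(".") or host.endswith(suffix):
            return bucket
    return "Other"
```

Reporting only a coarse bucket (rather than the registry hostname) keeps the data point useful for usage tracking while avoiding collection of anything user-identifying.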

Data to gather on failure

There are a number of places where the operation can fail, and each of them can surface different data. At a minimum, we should track why the failure occurred. Some examples of failure reasons include:

  • incorrect registry Uri
  • no login credentials for a registry that requires authentication
  • invalid login credentials for a registry that requires authentication
  • base image not found on the registry
  • mismatches between the RID of the containerized application and the RIDs that the base image supports
  • unknown/unsupported media types for the images (e.g. we don't fully support Image Indexes yet - this is how we'd track the rate of users hitting that failure mode)
  • timeouts pulling the base image layers from the chosen base image

For remote registries

  • timeouts uploading new image layers
  • errors 'finalizing' a layer (indicates layer digest calculation bugs in our code)
  • errors sending the new image manifest to the registry (indicates mismatches in our layer tracking code)

For local storage (docker/podman)

  • the local binary can't be found
  • the local binary couldn't load the image we created

For tar.gz output

  • ???
  • we have no real reports of errors during creating the tarball so I'm not sure what error modes folks might even have here
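Taken together, the failure reasons above could be modeled as a small closed set of reason codes that each checkpoint reports. The sketch below is illustrative only; none of these names are a proposed final schema.

```python
from enum import Enum

class ContainerPublishFailure(Enum):
    """Coarse failure reasons (names are illustrative, not the shipped set)."""
    INVALID_REGISTRY_URI = "invalid_registry_uri"
    MISSING_CREDENTIALS = "missing_credentials"
    INVALID_CREDENTIALS = "invalid_credentials"
    BASE_IMAGE_NOT_FOUND = "base_image_not_found"
    RID_MISMATCH = "rid_mismatch"
    UNSUPPORTED_MEDIA_TYPE = "unsupported_media_type"
    LAYER_PULL_TIMEOUT = "layer_pull_timeout"
    LAYER_UPLOAD_TIMEOUT = "layer_upload_timeout"
    MANIFEST_PUSH_FAILED = "manifest_push_failed"
    LOCAL_RUNTIME_NOT_FOUND = "local_runtime_not_found"
    LOCAL_LOAD_FAILED = "local_load_failed"

def failure_event(reason: ContainerPublishFailure) -> dict:
    """Build an anonymized failure event: only the reason code, no user data."""
    return {"event": "container/publish-failed", "reason": reason.value}
```

A closed enum keeps the telemetry exhaustively documentable (every possible value is known up front), which lines up with the data protection goals below.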

Data protection concerns

The information we gather will, of course:

  • be anonymized,
  • be documented exhaustively, and
  • adhere to the telemetry opt-out that the entire .NET CLI participates in.
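Because these checkpoints participate in the same opt-out as the rest of the CLI, honoring it amounts to checking the standard DOTNET_CLI_TELEMETRY_OPTOUT environment variable before emitting anything. A minimal sketch (the function name and the exact set of accepted values here are assumptions of this sketch):

```python
import os

def telemetry_enabled() -> bool:
    """Honor the standard .NET CLI opt-out (DOTNET_CLI_TELEMETRY_OPTOUT).

    If the user has opted out, no container checkpoints are emitted at all.
    """
    optout = os.environ.get("DOTNET_CLI_TELEMETRY_OPTOUT", "").strip().lower()
    return optout not in ("1", "true")
```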
