Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: cmd/go: compile-time instrumentation and -toolexec #69887

Open
RomainMuller opened this issue Oct 15, 2024 · 3 comments
Open

proposal: cmd/go: compile-time instrumentation and -toolexec #69887

RomainMuller opened this issue Oct 15, 2024 · 3 comments
Milestone

Comments

@RomainMuller
Copy link

RomainMuller commented Oct 15, 2024

Proposal Details

Following up on #41145 (comment), this issue aims to explain what we're doing at https://github.com/DataDog/orchestrion, why we're doing it this way, the challenges we face, and some of the toolchain evolution directions we'd like to explore.

Context

Users are embracing observability more and more, as the growing success of OpenTelemetry demonstrates; but onboarding Go applications to observability requires manually instrumenting the whole codebase, making it a tedious/expensive endeavor especially when onboarding existing large applications.

In go, everything is explicit. If some behavior happens, there is code describing it. No hidden magic, no hidden cost. This is a great property to have, but this means instrumentation can be distracting as it may take up a lot of space in the codebase (often multiple lines to start a span with several tags, and then to finish the span, possibly capturing error information & occasionally additional tags), which can result in obscuring the business logic. It also makes it easy for a developer to forget using instrumentation; resulting in observability blind spots (or in the case of security instrumentation, in leaving vulnerable code available to exploit).

Customers want to be able to onboard observability with as little friction as possible, and many prefer observability to stay out of sight when building application logic. The interest in approaches such as eBPF is a testament to the desire of having "no-code-change" observability; but these approaches are inherently limited to certain platforms, incur a performance toll that is not acceptable in certain situations, and limited (for good reasons!) in capabilities.

In addition to this, many enterprises haven't fully embraced the DevOps (and even fewer the DevSecOps) paradigm, and have different organizations take charge of developing, operating, and securing software products. There is often tension between the goals/KPIs of these different organizations (schematizing a little bit):

  • Developers are measured on how fast they deliver business logic
  • Operators are measured on how efficient and reliable the products operate
  • Security teams are measured on how secure the application is

In effect this means developers don't want to be encumbered by observability (or security) if they can avoid it; operators and security teams want to be able to deliver their goals without getting in the way of developers. These give an advantage to instrumentation techniques that do not require code modification; as they minimize impact on developer productivity while allowing to maximize coverage.

Other use-cases for compile-time source code re-writing have recently gotten some visibility:

Orchestrion

We have built orchestrion as a tool targeted to Ops and SecOps personas, but that can also fit within a Dev/DevOps/DevSecOps workflow. It uses -toolexec to intercept all invocations to go tool compile, re-writing all the .go files to add instrumentation everywhere possible.

Orchestrion can in some ways be seen as a compile-time-woven Aspect-oriented Programming (AoP) framework; except it is currently fairly specialized to our instrumentation needs.

Challenges we had to address

Adding dependency edges

One of the consequences of re-writing source code to inject instrumentation calls is that we often need to introduce new dependencies to the package being built. A package that is built against github.com/gorilla/mux once instrumented as a new dependency edge to gopkg.in/DataDog/dd-trace-go/contrib/gorilla/mux to support the instrumentation. In most cases these new dependency edges support additional import clauses, which is ideal as it allows the compiler to type-check all parts of the re-written file. To support the compiler, Orchestrion modifies the importcfg file to register mappings between import paths and the corresponding export file, as resolved by go list (via packages.Load).

In some situations, Orchestrion needs to use //go:linkname (in the "handshake") form in order to avoid creating circular import dependencies. In those cases, the produced archive has an implied dependency on some other package; which must be made available at link-time. We embed a special text file within the produced archive to track those "link-time" dependencies, so that the requirement is persisted in GOCACHE. We then intercept invocations of go tool link and update the importcfg.link file, adding all link-time dependencies to it (again, resolved by go list via packages.Load).

Resolving additional dependencies

In order to correctly resolve the new dependency edges, we have to use packages.Load (go list), and forward the appropriate BuildFlags, such as -cover, -covermode and -coverpkg. Today, the toolchain does not provide any visibility on the full build arguments and Orchestrion has to crawl the process tree in search for a go build (or go test, go run, etc...) invocation, gather it's arguments list, and do its best at parsing it, so it can forward the right values to child compilations if needed.

This poses challenges because go's standard flags are fairly permissive in terms of style and syntax (e.g, it transparently allows adding an extra - ahead of a flag, which allows using the -flag or --flag styles interchangeably), and there is no way to know for sure what flags have a value or are simple boolean toggles.

We also need to re-implement some default argument behavior; specifically there is an implied -coverpkg argument value if one is not provided explicitly, that we need to make explicit when triggering child builds, as failure to do so typically leads to a fingerprint mismatch at link time.

In addition to this, our need to resolve additional packages during the build means we are going to trigger more compilation processes than desired (by the user); because we have no way to co-operate with the toolchain to honor the -p n flag provided by the user.

Since multiple packages may introduce the same synthetic dependency, we often end up needing to resolve the same package multiple times in parallel, which is a waste of resources. To address this, we had to implement a job server, which is a process that exists for the duration of the build, and which is responsible for resolving those dependencies, ensuring a given dependency is resolved exactly once. That process can also check that we are not introducing cyclic dependencies; and "cleanly" aborts the build if it detects a cycle, instead of endlessly recursing over itself.

Parsing tool arguments

Orchestrion also needs to parse arguments passed to compile and link commands, but there is no programmatic way to get these in a structured way. We have to resort to parsing the help output of these commands to infer what arguments have value (or don't) and hope for the best.

In the case of link there isn't much we care about except for accessing the -importcfg flag. For compile, we actually need to establish the correct list of positional arguments (they'll be the .go files) without risking a false positive (we had a bug where we simply took all arguments that ended in .go but ran into a fun issue with the github.com/nats-io/nats.go package).

Interactions with GOCACHE

Orchestrion tries to co-operate with the go toolchain as naturally as possible. One aspect of this is integrating with GOCACHE to allow our users to enjoy the performance benefits of incremental builds.

In order to do this, Orchestrion needs to influence the build ID of objects that are built with instrumentation. The only "hook point" available is intercepting the -V=full invocation of all tools, and modifying the string it returns to include an invalidation fragment.

Today, we intercept both the compile -V=full and link -V=full invocations, and append to it a hash that summarizes:

  • The version of orchestrion being used, as changing it may change how aspects are applied, which may produce different instrumentation
  • The complete list of aspects in use, as changing these will produce different instrumentation
  • The complete list of packages aspects might inject, together with their dependencies (another use of packages.Load), as a lot of these are typically not accounted for in the "natural" build ID of any object

The drawback of being able to do this only at the "complete build level" is that it'll result in excessively frequent cache invalidation... Of course, changing orchestrion itself should be treated equivalent to changing compilers/toolchain, and result in complete invalidation... But the aspects & injected dependencies will only affect packages where they are effectively used, which isn't all of them.

For example, we don't do anything in the majority of the standard library (some aspects today target os and runtime), so the "regular" export files would be fine to use for all the rest.

Line information preserving

We want the instrumented code to retain the original code's line information, so that stack traces visible by the user still match the code they wrote. To do this, we sprinkle //line directives around every "synthetic" code.

The go/ast package does not provide much in the way of a facility to manipulate comments or directives, and we use github.com/dave/dst to simplify editing commentary within the source files. This however comes at a significant cost (as it performs several other costly operations we don't necessarily need done).

The injected code can also easily interfere with the debugging experience (via dlv), as the "real" source file including that instrumented code is often no longer present on disk when the program is being run... This becomes more problematic in cases when the original source file is not in "canonical" go format, as the line numbering we infer from the original file may be skewed (because github.com/dave/dst always produces canonical go format output) -- a "source map" (like is done in Node.JS, for example) would probably be somewhat easier to confidently manage.

Toolchain (-toolexec) evolution directions

Some of these may be moonshots, or things that don't fit well with the go design principles; but I'm including them anyway as they might hint at problems in search of a solution, and maybe someone comes up with a more acceptable solution:

  • Dedicated support for source-rewriting tools
    • Also, a simplified onboarding experience (pluggable transformers automatically discovered from go.mod dependencies?)
  • Ability to influence the build id on a per-package basis
  • Ability to add new edges to the build graph
  • Improved go/ast API, in particular w/r/t comment alterations
  • Improved source mapping (that'd make it easier to "stack" transforms, e.g cover + instrumentation)
  • Allow -toolexec to (try to) respect the -p n flag of the overarching build
  • Share overarching build flags with -toolexec, or a way for a -toolexec tool to request extra build artifacts from the overarching build (see also, adding new edges to the build graph)

Other related issues

@gopherbot gopherbot added this to the Proposal milestone Oct 15, 2024
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Oct 15, 2024
@ianlancetaylor
Copy link
Member

CC @matloob @samthanawalla

@y1yang0
Copy link
Contributor

y1yang0 commented Nov 6, 2024

related to #68728

@matloob
Copy link
Contributor

matloob commented Dec 17, 2024

Adding dedicated support for source rewriting tools would result in a significant increase in complexity in the go command, and also in the interface surface area of the go command that we wouldn't be able to easily change. I don't believe the costs are worth it for a feature like this.

I think a use case like this is exactly the kind of thing that would warrant using a more sophisticated build system like Bazel that has features like aspects that would make these sorts of changes to the build graph much easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Incoming
Development

No branches or pull requests

5 participants