x/tools/go/loader: improve and stabilize API #14120
All packages need to be installed before we can run (some of) the checks and code generators reliably. More precisely, anything that uses x/tools/go/loader is fragile (this includes stringer, vet and others). The blocking issue is golang/go#14120; see golang/go#10249 for some more concrete discussion of `stringer` and golang/go#16086 for `vet`.
Alan, any chance of revisiting this early in 1.9? I'd be happy to be a sounding board. I just found that even golang.org/x/tools/cmd/gotype doesn't use go/loader, and it pains me to invoke slow, heavy 'go install' just to get some code typechecked. |
Hi Josh, thanks for the offer. I would like to finish off the loader but none of these changes are on my plate for Google project work in the near future so I can't promise any schedule. Do you have a sense of how much churn loader clients outside x/tools can tolerate as we finalize the API? Since Google vendors everything we're insulated from it, but I got a panel question at GopherCon that was effectively "when will you stop breaking things?". Perhaps we need a mailing list of interested parties. |
I think that churn during a dev cycle is expected. Once it gets frozen, then the Go 1 compatibility promise kicks in. This issue could be the mailing list, for a first pass. Off the top of my head, the non-Google people that come to mind who work a lot with the go/* packages are: @dominikh @fatih @valyala |
Coming back to this in light of the 1.10 build cache (and […]). The use case is code that does not (yet) fully compile. In particular I have code generators in mind, but the same could apply to tools like @mdempsky's […]. My thinking is that […]
Thoughts? |
Loading from export data and loading from source give drastically different results. Export data doesn't provide enough information about function bodies, unexported objects, associations with the AST, and so on. People who use go/loader usually depend on it being a source importer, and changing that would break a lot of existing tools such as errcheck, staticcheck and many more. That renders your first option untenable. |
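The difference Dominik describes is easy to demonstrate with the standard library's importer. A minimal sketch: importer.Default reads compiler export data, so the result carries type information but no syntax.

```go
package main

import (
	"fmt"
	"go/importer"
)

func main() {
	// importer.Default reads compiler-generated export data:
	// types only -- no ASTs, no function bodies, and no unexported
	// details beyond what the exported API forces in.
	pkg, err := importer.Default().Import("strings")
	if err != nil {
		panic(err)
	}
	// The scope lists declared objects, but there is no syntax
	// tree to associate with them.
	fmt.Println(pkg.Scope().Names())
}
```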
We are (once again) actively working on this and should have something to share in a week or two. In brief, we are thinking along the following lines:

The go/build.Package structure encodes too much of the 'go build' way of organizing projects, so we need a data type that describes a package of Go source code independent of the underlying build system. It should work equally well with go build and vgo, and also other build systems such as bazel & blaze, making it easy to construct analysis tools that work in all these environments. (Currently, for example, tools such as errcheck and staticcheck are essentially unavailable to the Go community at Google, and some of Google's internal tools for Go are unavailable externally.) We will provide a uniform way to obtain package metadata by querying each of these build systems, optionally supporting their preferred command-line notations for packages, so that tools integrate neatly with users' build environments. The metadata query function will shell out to an external query tool appropriate to the current workspace.

The go/loader package currently assumes "source all the way down": it recursively parses, loads and type-checks all dependencies of the initial packages. Much more efficient is to load source code only for the requested packages, and to use export data (compiler-generated .a files) for their dependencies. This may require a build to materialize the .a files, but now that builds are incremental this cost is amortized. Building .a files avoids the historical problem of using them: that they often don't exist, as few people use "go install", and when they do exist they are often stale.

The new x/tools/go/loader API, which we plan to call x/tools/go/packages and eventually contribute to the standard library, will provide a single Package data type that describes a package's metadata (analogous to go/build.Package), and optionally its syntax trees and type information, if type-checking and/or parsing were requested for that package.
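For readers coming to this thread later: the design sketched here shipped as golang.org/x/tools/go/packages. A minimal sketch in the shape the released package eventually took (the mode flags below are from the final API, which postdates this comment):

```go
package main

import (
	"fmt"
	"log"

	"golang.org/x/tools/go/packages"
)

func main() {
	cfg := &packages.Config{
		// Request metadata plus parsed and type-checked syntax
		// for the named packages; dependency types are read from
		// export data rather than re-parsed from source.
		Mode: packages.NeedName | packages.NeedFiles | packages.NeedSyntax |
			packages.NeedTypes | packages.NeedTypesInfo,
	}
	pkgs, err := packages.Load(cfg, "golang.org/x/tools/go/packages")
	if err != nil {
		log.Fatal(err)
	}
	for _, pkg := range pkgs {
		fmt.Println(pkg.ID, pkg.Name, len(pkg.Syntax))
	}
}
```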
|
Thanks @alandonovan, great to hear this is being worked on.
This was exactly the thinking behind my second option. Sounds good.
Is the plan still to have @rsc's […]?
This might not be the appropriate place, but I'm going to ask the question in any case and we can move discussion elsewhere as required. I use […]. I make this point because I assume external tooling is going to have to supply this context to […]. I might have totally missed something obvious here... In the (unlikely) event I haven't, none of this is, I suspect, news to you, but I'm guessing/hoping what you share in a couple of weeks will cover this point? Thanks |
@alandonovan I hope you can consider the following points for the design of the go/packages API:

For staticcheck I've been working on my own go/loader and my own export data format. Much like your plans, my loader uses export data if available and up to date, and source code otherwise. My export data, however, is different from that of any existing Go toolchain in that it stores a full representation of the AST and all type information, so that the end result is identical to parsing from source. I would hope that your new API would allow me to use my own export data instead of being forced to use that produced by the build system (go, vgo, bazel, …).

I've been working on an entirely separate loader because the go/loader API lacks certain associations that I need in my tools. Specifically, I need to map between all the different domains of go/token, go/ast, go/build, go/types and go/ssa. I need to be able to map from go/types.Package to loader.Package, from loader.Package to build.Package, from go/ast.File to loader.Package and even from token.File to ast.File. go/loader doesn't maintain some of these mappings (for example it discards go/build.Packages), and relies on expensive operations like PathEnclosingInterval for others. |
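To make the missing associations concrete, here is a sketch against today's go/loader: one direction is a map lookup, while the reverse has no index and needs a linear scan (the helper names are ours, not the API's):

```go
package sketch

import (
	"go/ast"
	"go/types"

	"golang.org/x/tools/go/loader"
)

// packageInfo: types.Package -> loader.PackageInfo is a direct lookup.
func packageInfo(prog *loader.Program, tpkg *types.Package) *loader.PackageInfo {
	return prog.AllPackages[tpkg]
}

// fileToPackage: ast.File -> loader.PackageInfo has no index, so
// clients must scan every package (or resort to position-based
// tricks like PathEnclosingInterval).
func fileToPackage(prog *loader.Program, f *ast.File) *loader.PackageInfo {
	for _, info := range prog.AllPackages {
		for _, af := range info.Files {
			if af == f {
				return info
			}
		}
	}
	return nil
}
```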
Excited to hear this is being worked on. I look forward to reviewing the API, for little things like being able to (optionally?) discover the size of a file prior to reading it. |
Paul:
Exactly. The -deps flag was added primarily so that we could obtain package metadata in a single invocation. (Currently it requires no fewer than three.)
Well observed. Future analysis tools will require the working directory as an input, as the name of the file or directory on which to operate, even if absolute, will no longer be sufficient.

Dominik:
Interesting. We already have serial formats for ASTs (Go syntax) and for types (export data). Is it significantly more efficient to reconstruct typed ASTs by deserializing a third, combined format, rather than simply re-parsing and re-type-checking? Or is your goal to avoid the dichotomy between packages loaded from export data (which don't have typed syntax trees) and those produced by the type checker, which do?
Yes, the plethora of different entities that are 1:1 with, say, a package can be confusing, but I'm surprised that the performance (not the complexity or mere inconvenience) of the operations for converting from one representation to another was a significant problem. Can you point me at any examples where these conversions were so burdensome that they motivated your current work?

Josh:
Thanks. That requirement was not on my radar. |
For my tools, I always need the full AST (including comments), as well as the full type information (not just the export data), to compute various information (such as "is this function pure?", "is this function deprecated?" and soon taint analysis). Existing export data doesn't record this information, and even if it did, I continuously need new information computed, which makes it simpler to just serialize full type information + AST than to add more and more aggregate information. To draw a comparison, imagine guru's […].
Convenience is certainly one factor. With go/loader, staticcheck computes a lot of the mappings based on the results of go/loader, which is certainly doable. Some of that, however, involves duplicating work, such as going back to disk to find and partially parse go/build packages. |
I'd like to add to the point about custom export formats. I believe that what Dominik wants is to be able to cache extra data on top of the existing export data that gc produces. That may be the detailed syntax tree, or it might be mappings between go/ast and go/types, or it might be other static analysis computations that a linter would like to cache to speed up consecutive runs (such as "is this func pure?").

Would it not be possible to add this to the interface, as a way to attach extra information to each package's cached data? I imagine said extra data could be invalidated with the existing data such as type information. The extra interface could be similar to […].

I'm not sure if that would remove the need to support arbitrary export formats, but at least it seems closer to what most Go tool authors would need. It also seems like it would be easier to do than designing an entirely separate export format. |
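A hypothetical shape for that hook, with every name invented for illustration: tool-specific data serialized next to a package's export data and invalidated together with it.

```go
package sketch

import "io"

// SideData is a hypothetical interface for tool-specific data cached
// alongside a package's export data and invalidated with it.
type SideData interface {
	// Key namespaces this tool's data, e.g. "staticcheck/purity",
	// so multiple tools can share the cache.
	Key() string
	// Encode and Decode serialize the data, for example with
	// encoding/gob.
	Encode(w io.Writer) error
	Decode(r io.Reader) error
}
```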
@mvdan Maintaining the go/ast <-> go/types mapping is one important aspect of my work, yes. The other, however, is retaining all information about unexported types. |
I pondered for a bit why you said "require the working directory" (emphasis my own) instead of […]. But I'm now struggling to see how the […]:

```go
type ImporterFrom interface {
	Importer
	ImportFrom(path, dir string, mode ImportMode) (*Package, error)
	ImportFrom2(path, dir, ctxtDir string, mode ImportMode) (*Package, error)
}
```

The suitably-poorly-named ImportFrom2 […]. All of this might well be pre-empting what you're going to share... in which case please just say and I'll hold my tongue for now! |
@alandonovan that sounds very much like what I've always wanted ❤️. However, a recurring pain point for me with the current system has been build tags. I doubt a new system will be able to (or even should) cover all of these concerns, but I'll lay them out and hope that at least some will be covered, here or elsewhere:

The current tags used are all in build.Context, but not necessarily all in BuildTags, so serializing them to pass to a -tags flag when invoking an external command has to be done by hand. It would be nice if there were an easy way to get all the build tags used to load a program as a []string.

Conversely, it would be nice to take a []string of tags and the analog of build.Context in the new system and return a version with the appropriate fields set while inheriting everything else appropriately. (Similar in purpose to buildutil.TagsFlag, but setting fields like GOARCH so that the context can be easily introspected without having to know what all the tags mean.)

It would also be nice to know, file by file, what build tags from the Context, if any, triggered that file's inclusion. (I want this primarily to list files in a godoc-type interface, annotated with why they were included.) |
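A sketch of the first request, as tools must implement it by hand today (the helper name is invented): GOOS, GOARCH and "cgo" are implicit tags that build.Context does not list in BuildTags.

```go
package sketch

import "go/build"

// allTags flattens the tags that influence file selection for ctxt
// into a single []string, suitable for introspection or a -tags flag.
func allTags(ctxt *build.Context) []string {
	tags := []string{ctxt.GOOS, ctxt.GOARCH}
	if ctxt.CgoEnabled {
		tags = append(tags, "cgo")
	}
	tags = append(tags, ctxt.BuildTags...)
	tags = append(tags, ctxt.ReleaseTags...)
	return tags
}
```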
I like the general idea, but I'm concerned that almost none of my projects that currently use go/build will be able to use this new package, because almost all of them require access to source code. For example, one extracts doc comments; another prints file names of dependencies.

I'm also concerned about tags: some tools require a coherent source tree that can build; others care more about seeing all source files regardless of tag combination. In the latter group, I see most tools that do package dependency analysis; I generally want to know about all possible dependencies, not just the ones for one particular combination of build tags.

FWIW, the heuristic I used in my godeps (dependency locking) tool for tags is to treat any build tag expression as true for known OS and architecture tags even when negated, and fall back to the default tag matching for other tags. (code here) |
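A simplified sketch of that heuristic (the real matcher also has to handle negation and release tags; the known-tag sets here are truncated for brevity):

```go
package sketch

import "go/build"

var knownOS = map[string]bool{"linux": true, "darwin": true, "windows": true}
var knownArch = map[string]bool{"386": true, "amd64": true, "arm": true, "arm64": true}

// matchTag treats any known OS or architecture tag as satisfied, so
// that dependency analysis sees the union of files across platforms;
// other tags are matched normally against the context.
func matchTag(ctxt *build.Context, tag string) bool {
	if knownOS[tag] || knownArch[tag] {
		return true
	}
	for _, t := range ctxt.BuildTags {
		if t == tag {
			return true
		}
	}
	return false
}
```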
Sorry for radio silence; this project is inching along. Please take a look at the API outlined in https://go-review.googlesource.com/c/tools/+/116359. Feedback welcome.

dominikh: you should find the new loader's WholeProgram mode gives you a complete source program, just as the current loader does today. But you hint at an interesting problem, that of building modular analyses that pass information derived from inspecting a lower-level package to the analysis of a higher-level one. I've been working on an "Analysis API" for package-at-a-time analyses that allows vet-like checkers to be expressed independent of the driver tool (such as vet, 'go test', 'go build', web-based CI or code review tools, etc.). I hope it will enable us to write, share, and re-use checkers across tools more easily. Currently I'm working on a way for each checker to get at facts deduced during checking of lower-level packages, such as "function p.F delegates to printf" or "function p.F never returns". Checkers may depend on other checkers both within a unit and across units. Facts must be serializable so they may be passed between units, allowing "separate analysis" analogous to separate compilation. I don't plan to specify the binary encoding; each fact can choose its own. I would love your comments on it (will share link soon).

myitcv: the ImporterFrom interface doesn't have a "working directory" parameter, but it's trivial to implement it using (in effect) a closure over the loader's working directory (os.Getwd by default).

jimmyfrasche, rogpeppe: re: build tags, I have not thought through the implications of build tags yet, but it's on the list. I don't expect we'll provide a mechanism to explain which tags were significant, or which files were excluded from compilation based on tags (or other aspects of configuration), as it appears infeasible to implement on all build systems. A recurring challenge has been the range of tools that use go/build, which defies easy generalization. A tool that runs the type-checker must necessarily follow the compiler's definition of a package---a concrete list of files that make up one compilation unit under a specific configuration---whereas a tool that, say, enumerates the source files of a project needs to generalize across all possible configurations. The new go/packages API biases towards the former, not the latter. (This is problematic even for type-based tools: a refactoring tool must modify all the files of a package, not just the ones for the specific configuration that was analyzed by the type checker.) |
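The Analysis API described here later shipped as golang.org/x/tools/go/analysis. A minimal sketch of a fact-exporting checker in that package's eventual shape (the fact type and the analysis itself are placeholders, not part of the design as it stood at the time):

```go
package noreturn

import (
	"go/types"

	"golang.org/x/tools/go/analysis"
)

// neverReturns is a fact attached to functions that never return.
// Facts must be serializable so they can cross package boundaries,
// enabling "separate analysis".
type neverReturns struct{}

func (*neverReturns) AFact() {}

var Analyzer = &analysis.Analyzer{
	Name:      "noreturn",
	Doc:       "records functions that never return",
	Run:       run,
	FactTypes: []analysis.Fact{new(neverReturns)},
}

func run(pass *analysis.Pass) (interface{}, error) {
	// The real analysis (inspecting pass.Files) is elided. Once a
	// function is known never to return, export a fact that analyses
	// of importing packages can query with pass.ImportObjectFact.
	var fn *types.Func // ...discovered during inspection
	if fn != nil {
		pass.ExportObjectFact(fn, &neverReturns{})
	}
	return nil, nil
}
```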
Thanks. Given the details you've shared in the CL, I think this becomes a moot question in any case. |
To add two related questions here: […] |
I was hoping for go/packages to be a generalized loader that didn't need to pay attention to specific GOOS and GOARCH or other tags, but I discovered that using go/packages that way is impossible in the current API. I also discovered that there is a field called "Flags" through which tags can be passed to the underlying tools. (The issue that I filed was #26697.) This seems like a mistake to me (on both counts) and will force tools that get ported over, like guru and gorename, to ignore files that are guarded by OS/ARCH/other tags, which is not what I want the tools to do at all, ever. Perhaps for some underlying build systems it is hard to discover a maximal graph (might be the case with bazel?); then they could return E_NOT_SUPPORTED, and tools could fall back to "build mode", setting a specific GOOS/GOARCH/tags. Anyway, I have a specific need that requires the more general mode and I could work on enabling this if that was desired. If not I'll just solve my case in my own repo. Thanks. |
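For reference, the fallback "build mode" described above is expressible with the shipped go/packages API by pinning the configuration through the environment; a sketch:

```go
package sketch

import (
	"os"

	"golang.org/x/tools/go/packages"
)

// loadFor pins one GOOS/GOARCH configuration; a maximal,
// tag-independent file graph is not expressible in this API.
func loadFor(goos, goarch string, patterns ...string) ([]*packages.Package, error) {
	cfg := &packages.Config{
		Mode: packages.NeedName | packages.NeedFiles,
		Env:  append(os.Environ(), "GOOS="+goos, "GOARCH="+goarch),
	}
	return packages.Load(cfg, patterns...)
}
```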
I think we should close this issue, as discussion on […]. |
We should simplify and stabilize the go/loader package's API. Before that happens, there are a number of features to add and bugs to fix:
Features:
support ... patterns just like "go build" and "go list" do. (Ideally the logic would be shared. Russ suggested fork/execing "go list", but I don't like this approach: it uses the actual file system instead of the abstraction defined by build.Context.)
Allow packages specified on the command line to use cgo. Currently only imported packages are subject to cgo preprocessing.
Support pkg-config.
API simplifications:
The public fields of loader.Program are sometimes annoying to use (see the usage sketch after this list). Study existing clients and come up with something more convenient.
Update the go/types tutorial at https://github.com/golang/example/tree/master/gotypes.
Once this is done, we should use go/loader in tools such as golint, govet and stringer.
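For context, a minimal sketch of how those loader.Program fields are used by clients today:

```go
package main

import (
	"fmt"
	"log"

	"golang.org/x/tools/go/loader"
)

func main() {
	var conf loader.Config
	// Packages can arrive via Import, CreateFromFilenames, or
	// FromArgs; each lands in a different Program field (Imported,
	// Created, AllPackages), which clients often merge by hand.
	conf.Import("fmt")
	prog, err := conf.Load()
	if err != nil {
		log.Fatal(err)
	}
	for _, info := range prog.InitialPackages() {
		fmt.Println(info.Pkg.Path())
	}
}
```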
cc: @griesemer