-
-
Notifications
You must be signed in to change notification settings - Fork 668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Saving the nogo fixes #4102
base: master
Are you sure you want to change the base?
feat: Saving the nogo fixes #4102
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
7b03e4c
to
29f650b
Compare
f506b28
to
c079a7f
Compare
3b49ecb
to
b57424d
Compare
thanks for the comments, i am in the process of addressing them. |
4f41cce
to
f6bab4d
Compare
this shows the log of the latest version: |
47d7bf7
to
ee5bf11
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only half way through, posting some comments so far. I will continue later this week
go/tools/builders/nogo_main.go
Outdated
// Otherwise, bazel will complain "not all outputs were created or valid" | ||
change, err := NewChangeFromDiagnostics(diagnostics, pkg.fset) | ||
if err != nil { | ||
errs = append(errs, fmt.Errorf("errors in dumping nogo fix, specifically in converting diagnostics to change %v", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are errors here, does it make sense to call ToPatches
and SavePatchesToFile
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that was by design, although we may need to discuss about design choices here.
Btw, we need to write to empty string to nogoFixPath when there are errors. See nogo.go (line 100) for similar logic. Otherwise, bazel complains that the declared file is not defined.
Coming back to the design choices, here is another design in comparison:
if nogoFixPath != "" {
// If nogo fixes are requested, save the fixes to the file even if they are empty.
// This prevents Bazel from complaining about missing or invalid outputs.
change, err := NewChangeFromDiagnostics(diagnostics, pkg.fset)
if err != nil {
// Ensure an empty patch file is saved when there's an error in generating the change.
errs = append(errs, fmt.Errorf("error converting diagnostics to change: %v", err))
if saveErr := SavePatchesToFile(nogoFixPath, nil); saveErr != nil {
errs = append(errs, fmt.Errorf("error saving empty patches file: %v", saveErr))
}
} else {
fileToPatch, err := ToPatches(Flatten(*change))
if err != nil {
errs = append(errs, fmt.Errorf("error generating patches: %v", err))
if saveErr := SavePatchesToFile(nogoFixPath, nil); saveErr != nil {
errs = append(errs, fmt.Errorf("error saving empty patches file: %v", saveErr))
}
} else {
if err := SavePatchesToFile(nogoFixPath, fileToPatch); err != nil {
errs = append(errs, fmt.Errorf("error saving patches to file: %v", err))
}
}
}
}
In my current design, when NewChangeFromDiagnostics returns error, the change has partial result of fixes which can be still applied. Let us consider this case:
There are high-quality analyzers that produce great fixes, and there are some poorly written analyzers that produce wrong fixes, e.g., they produced corrupted offsets. Current design still allows the fixes from the well-written analyzers to be applied.
There is one caveat case: the change may already include some edits from the bad analyzer, which have valid offsets but are of poor quality.
In my opinion, the nogo framework should faithfully apply the fixes that are applicable (i.e., those with valid offsets). It should not ban fixes from all analyzers in the case that one analyzer is bad. Also it is the responsibility of the monorepo owners to remove/fix bad analyzers.
Let me know your thoughts,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we adopt my current design, I also updated NewChangeFromDiagnostics to more permissive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the ability to let user choose analyzers to trust
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, this support of letting user choose has to happen on the patching side, where we show preview of diff and users answer "yes" or "no".
this is now supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline, users already select which linters to run in the bazel.
go/tools/builders/nogo_change.go
Outdated
panic("wrong size") | ||
} | ||
|
||
return string(out), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we return bytes instead? The out
is converted to string here and immediately converted back to []byte by its only caller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
go/tools/builders/nogo_change.go
Outdated
// The following is about the `Change`, a high-level abstraction of edits. | ||
// Change represents a set of edits to be applied to a set of files. | ||
type Change struct { | ||
AnalyzerToFileToEdits map[string]map[string][]Edit `json:"analyzer_file_to_edits"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need 3 levels of nesting, only to be flattened later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we need this.
the two levels file:edits is required, since we track and apply patch per file.
besides, see the Flatten() function, it considers the cases that different analyzers produce conflicting edits, i.e., edits that overlap with each other. In this case, it is impossible to apply both edits.
we will ignore the latter analyzer (sorted already for determinism) but still allow the former analyzer to proceed.
This is why we add the extra indirection of indexing by analyzer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid overusing maps and create more informative data structure: https://abhinav.github.io/future-proof-packages-2023/#/%EF%B8%8F-map-overuse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, especially given this is across package boundary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
ee5bf11
to
23277ea
Compare
go/tools/builders/nogo_change.go
Outdated
|
||
// Trim left | ||
for i := 0; i < len(lines); i++ { | ||
if hasNonWhitespaceCharacter(lines[i]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if hasNonWhitespaceCharacter(lines[i]) { | |
if strings.TrimSpace(lines[i]) == "" { |
this is the same, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no longer relevant, moved out of rules_go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, simplified.
} | ||
|
||
// LoadPatchesFromFile loads the map[string]string (file paths to patch content) from a JSON file. | ||
// Note LoadPatchesFromFile is used for testing only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test utilities should be in _test.go. Putting here and export it means it's part of public API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no longer relevant, moved out of rules_go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
go/tools/builders/nogo_change.go
Outdated
} | ||
|
||
diff := UnifiedDiff{ | ||
// difflib.SplitLines does not handle well the whitespace at the beginning or the end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if difflib.SplitLines
doesn't work well, can we not use it? It's also inefficient: you first read the whole file into the memory and then split it, which doubles the memory usage. You could just read the file line by line with bufio.Scanner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no longer relevant, moved out of rules_go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SplitLines is inevitable, see another patching lib also does it: https://github.com/sergi/go-diff/blob/master/diffmatchpatch/patch.go#L483.
the problem here is the whitespace prefix or suffix, which is handled by the Trim
go/tools/builders/nogo_main.go
Outdated
// Otherwise, bazel will complain "not all outputs were created or valid" | ||
change, err := NewChangeFromDiagnostics(diagnostics, pkg.fset) | ||
if err != nil { | ||
errs = append(errs, fmt.Errorf("errors in dumping nogo fix, specifically in converting diagnostics to change %v", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the ability to let user choose analyzers to trust
go/tools/builders/nogo_change.go
Outdated
} | ||
|
||
// Flatten takes a Change and returns a map of FileToEdits, merging edits from all analyzers. | ||
func Flatten(change Change) map[string][]Edit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we produce one patch file per analyzer? That way, we don't need to merge edits to the same file from different analyzers. Like you said, some analyzers may produce bad edits. When users apply patches one by one, they can ignore those patched produced by bad analyzers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there will be a single file per target, as created at the .bzl files (the go files only fill in contents).
but still, in that single file, we can index patches with analyzer.
asking users to pick which analyzer to patch may be too much for developers. Ideally we want them simply say "yes/no". Also the conflicts may not happen often, we may ask only in the case of conflicts.
will think further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, the patching is moved out of rules_go,
fyi, it has the support of letting users preview the diff and pick. At the end, it applies the selected edits. In case of overlapping, we skip the first overlapping one for max progress, the next build can fix that one too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asking users to pick which analyzer to patch may be too much for developers.
I don't think so. The output from rules_go can be simply:
nogo: errors found by nogo during build-time code analysis:
src/example.com/foo/compute.go:28:2: self-assignment of x to x (assign)
To apply the fixes run:
$ patch -p1 < bazel-out/k8-fastbuild/bin/src/example.com/foo/go_default_library.nogo.assign.patch
When there are multiple analyzers failing, we can list multiple patch command for users to copy:
To apply the fixes run:
$ patch -p1 < bazel-out/k8-fastbuild/bin/src/example.com/foo/go_default_library.nogo.assign.patch
$ patch -p1 < bazel-out/k8-fastbuild/bin/src/example.com/foo/go_default_library.nogo.composite.patch
I agree with what @fmeum said here: #4102 (comment)
I don't want rules_go to depend on a custom patch tool, while we can use a standard Unix utility.
Ideally we want them simply say "yes/no".
I don't think users' input matters there. When two fixes conflict, there is nothing the tool can do even the user reply "yes", right? The only thing we can do is to reject the patch and ask user to manually apply the fix, which is exactly what the standard patch
command does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline.
agreed that we should use standard tool like 'patch -p1'.
we cannot have multiple patch files, each for one analyzer, since the bazel creates a single file for each target. Also it is not easy to tell the analyzers at bazel resolving time.
what we currently do is:
- for each file, we apply the edits (offset-based) and drop the analyzers that introduce overlapping edits with added edits. Analyzers are sorted for order. After this, we can compute the new content after fixing. Also, we can compute the patch from orig and new content.
- we merge patches for all files into one, so that "patch -p1 < file" works.
users' inputs do not matter, for the analyzers they pick, they may still have overlapping edits that we need to resolve.
addressed comments from @linzhp. made the following design-level change:
This way, we make sure we do not impact rules_go's performance. Accordingly, we shall change the expectation of this PR as follows: |
go/private/actions/archive.bzl
Outdated
# --run_validations (default=True) ensures nogo validation is applied to not only the input targets but also their dependent targets, | ||
# thereby producing available fixes for all targets. | ||
# Otherwise, if we externalize out_nogo_fix_tmp (not going through the ValidateNogo action) by putting it into a field (e.g., `nogo_fix`) in the OutputGroupInfo section of the input targets, | ||
# we can see the fix for the input targets, but will miss the fixes for the dependent targets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just how Bazel prints out the output artifacts, but out_nogo_fix_tmp is generated for dependent targets as long as you pass --output_group nogo_fix
to bazel build
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question about output_group vs nogo validation action:
what is the benefit of this output_group option over validation?
also, I feel validation is more flexible, e.g., we can dump fixes immediately after error message there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vim swap file... removed.
go/tools/builders/nogo_change.go
Outdated
|
||
// DiagnosticEntry represents a diagnostic entry with the corresponding analyzer. | ||
type DiagnosticEntry struct { | ||
analysis.Diagnostic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, you just moved an existing struct here. That's fine then, but you should still make this struct private, otherwise people may start using it and we will run into issues mentioned in the link above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
go/tools/builders/nogo_change.go
Outdated
} | ||
|
||
// Flatten takes a Change and returns a map of FileToEdits, merging edits from all analyzers. | ||
func Flatten(change Change) map[string][]Edit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asking users to pick which analyzer to patch may be too much for developers.
I don't think so. The output from rules_go can be simply:
nogo: errors found by nogo during build-time code analysis:
src/example.com/foo/compute.go:28:2: self-assignment of x to x (assign)
To apply the fixes run:
$ patch -p1 < bazel-out/k8-fastbuild/bin/src/example.com/foo/go_default_library.nogo.assign.patch
When there are multiple analyzers failing, we can list multiple patch command for users to copy:
To apply the fixes run:
$ patch -p1 < bazel-out/k8-fastbuild/bin/src/example.com/foo/go_default_library.nogo.assign.patch
$ patch -p1 < bazel-out/k8-fastbuild/bin/src/example.com/foo/go_default_library.nogo.composite.patch
I agree with what @fmeum said here: #4102 (comment)
I don't want rules_go to depend on a custom patch tool, while we can use a standard Unix utility.
Ideally we want them simply say "yes/no".
I don't think users' input matters there. When two fixes conflict, there is nothing the tool can do even the user reply "yes", right? The only thing we can do is to reject the patch and ask user to manually apply the fix, which is exactly what the standard patch
command does.
@linzhp your proposal does not really work: patch is highly unreliable tool that is based on the surrounding context. After you apply the first patch, the 2nd patching may get the context messed up and is not able to proceed. I do not like such unreliable patching. The only reliable way is to perform the precise offset-based editing. This imposes a strict requirement though: the edits users want to apply should not overlap. This means we have to have some tool that is better than patch. Also I think this PR is only for exporting edits, right? this is already an improvement over existing status that no fixes can be exported. |
b4420cf
to
f4775d4
Compare
addressed the comments, tested with locally with
one outstanding question is about output_group vs nogo_validation, I am not sure about advantage of output_group. |
f4775d4
to
fdb823a
Compare
discussed offline and made these improvements:
|
894b2b4
to
fd673f5
Compare
fd673f5
to
8f805be
Compare
go/tools/builders/nogo_change.go
Outdated
return result, nil | ||
} | ||
|
||
func trimWhitespaceHeadAndTail(lines []string) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? The line numbers in the unified diff will be wrong if we remove the empty lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is because difflib.SplitLines blindly adds an extra "\n" at the end: https://github.com/pmezard/go-difflib/blob/master/difflib/difflib.go#L766C1-L772C2, I am not sure why it does so (does not look like a bug).
some examples:
"line1\nline2\nline3" -> ["line1\n", "line2\n", "line3\n"].
"line1\nline2\nline3\n" -> ["line1\n", "line2\n", "line3\n\n"].
This unintended additional newline (\n\n) can lead to incorrect diff outputs or unexpected behavior in downstream processes that consume these lines.
there may be a better workaround this.
go/tools/builders/nogo_change.go
Outdated
combinedPatch.WriteString("\n") // Ensure separation between file patches | ||
} | ||
|
||
// Remove trailing newline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? The new lines at the end of the patch don't seem harmful. If we don't need to delete the new line, we can pass a io.Writer
into this function and call difflib.WriteUnifiedDiff
instead, so we don't need to hold the patch in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point.
half way, will handle the rest tmr. |
b00b27a
to
d792536
Compare
What type of PR is this?
What does this PR do? Why is it needed?
nogo analyzers may produce fixes as analysis.Diagnostic. This PR allows rules_go to save the fixes as patches, one per target. The patch and the command to apply the patch is printed out to the terminal for users to manually apply. The patch is also available in the "nogo_fix" output group. This allows people to get patches for all targets without failing the build by passing
--norun_validations --output_groups nogo_fix
.Example output:
Other notes for review
An analyzer may suggest multiple alternative fixes to one issue. Only the first one is selected by default, unless it conflicts with other fixes, in which case it moves on to try the next alternative. If all alternatives are tried but still have conflicts, they will be skipped. In such case, the user will have to apply the patch first, and run nogo again to get the fix to the issue.