-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
agent: add debuginfo agent command to dgraph #4464
Conversation
Signed-off-by: fristonio <[email protected]>
Signed-off-by: fristonio <[email protected]>
61d44dd
to
f4868a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have some high level comments around the design of the command. And we should consider using github.com/pkg/profile/
and reduce the amount of code that we manage. Can review again once the current comments are addressed.
Reviewable status: 0 of 5 files reviewed, 25 unresolved discussions (waiting on @danielmai, @fristonio, and @manishrjain)
dgraph/cmd/agent/run.go, line 34 at r2 (raw file):
func init() { Agent.Cmd = &cobra.Command{ Use: "agent",
Agent
seems too generic. pprof
or something equivalent sounds better to me.
dgraph/cmd/agent/run.go, line 37 at r2 (raw file):
Short: "Run the Dgraph agent tool", Run: func(cmd *cobra.Command, args []string) { err := cmd.Help()
error can be scoped within the if
statement.
dgraph/cmd/agent/debuginfo/archive.go, line 1 at r2 (raw file):
/*
The archive creation code can be avoided. The user could use tar
gzip
or anything else of her choice directly. That way, we write and manage less code and user gets to have more choices.
dgraph/cmd/agent/debuginfo/pprof.go, line 44 at r2 (raw file):
} var profileTypes = []string{
We should take this as an argument with some suitable default. I wouldn't want to take all the profiles and wait for them every time. We could have default set to everything, but I should be able to choose what profiles are collected through a flag.
dgraph/cmd/agent/debuginfo/pprof.go, line 54 at r2 (raw file):
} func newPprofCollector(host, baseDir, filePrefix string, duration time.Duration) *pprofCollector {
This function can be removed all together and we can directly call Collect
.
dgraph/cmd/agent/debuginfo/pprof.go, line 57 at r2 (raw file):
timeout := duration + duration/2 var transport http.RoundTripper = &http.Transport{
This seems like a constant (variable that can be declared once at the top)
dgraph/cmd/agent/debuginfo/pprof.go, line 85 at r2 (raw file):
src, err := c.saveProfile(pType) if err != nil { glog.Errorf("error while saving pprof profile from %s: %s", src, err)
Not sure, whether it makes sense to just log the error and move on. We should return an error and fail at the caller right away.
dgraph/cmd/agent/debuginfo/run.go, line 39 at r2 (raw file):
var ( DebugInfo x.SubCommand
Do we really need a command and a subcommand? I think building for current scope and having just one command should be enough for now.
dgraph/cmd/agent/debuginfo/run.go, line 58 at r2 (raw file):
flags := DebugInfo.Cmd.Flags() flags.StringVarP(&debugInfoCmd.alphaAddr, "alpha", "a", "", "Address of running dgraph alpha.")
we should assume default address for zero and alpha. That's a more common use case. And taking profiles of Alpha is pretty common than doing the same for Zero. We should have defaults set accordingly.
dgraph/cmd/agent/debuginfo/run.go, line 61 at r2 (raw file):
flags.StringVarP(&debugInfoCmd.zeroAddr, "zero", "z", "", "Address of running dgraph zero.") flags.StringVarP(&debugInfoCmd.directory, "directory", "d", "", "Directory to generate the debuginfo in, if the directory is not present agent will "+
Directory to write the debug info into
is enough I think.
dgraph/cmd/agent/debuginfo/run.go, line 67 at r2 (raw file):
"the dump.") flags.Uint32VarP(&debugInfoCmd.infoDurationSecs, "duration", "s", 15, "Duration to collect the debuginfo for, this is used for info like pprof profiles etc.")
I like what go tool pprof --help
prints
-seconds Duration for time-based profile collection
Could use similar option and description
dgraph/cmd/agent/debuginfo/run.go, line 72 at r2 (raw file):
func collectDebugInfo() (err error) { if debugInfoCmd.directory == "" { debugInfoCmd.directory, err = ioutil.TempDir("/tmp", "dgraph-debuginfo")
we should print the directory in the logs, I don't see it printed right now.
And, I think we should use a local directory for keeping these files. Or eventually move the temp directory into a local directory in case no directory is specified. We should only expect the user to specify a prefix of the path.
dgraph/cmd/agent/debuginfo/run.go, line 104 at r2 (raw file):
} func archiveDebugInfo() error {
I think, there are very popular linux tools available to compress the data. We shouldn't need to write and manage this code. We can just provide a directory to the user, and then the user can choose to archive the directory if need be. If at all, we could print the command for compressing the directory.
Signed-off-by: fristonio <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 25 unresolved discussions (waiting on @danielmai, @fristonio, @golangcibot, @mangalaman93, and @manishrjain)
dgraph/cmd/agent/run.go, line 22 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
File is not
gofmt
-ed with-s
(fromgofmt
)
Done.
dgraph/cmd/agent/run.go, line 29 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
field
alphaAddr
is unused (fromunused
)
Done.
dgraph/cmd/agent/run.go, line 30 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
field
zeroAddr
is unused (fromunused
)
Done.
dgraph/cmd/agent/run.go, line 31 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
field
compress
is unused (fromunused
)
Done.
dgraph/cmd/agent/run.go, line 36 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
var
agentCmd
is unused (fromunused
)
Done.
dgraph/cmd/agent/run.go, line 44 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
Error return value of
cmd.Help
is not checked (fromerrcheck
)
Done.
dgraph/cmd/agent/run.go, line 37 at r2 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
error can be scoped within the
if
statement.
Done.
dgraph/cmd/agent/debuginfo/pprof.go, line 44 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 102 characters (from
lll
)
Done.
dgraph/cmd/agent/debuginfo/pprof.go, line 131 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 108 characters (from
lll
)
Done.
dgraph/cmd/agent/debuginfo/pprof.go, line 44 at r2 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
We should take this as an argument with some suitable default. I wouldn't want to take all the profiles and wait for them every time. We could have default set to everything, but I should be able to choose what profiles are collected through a flag.
Done.
dgraph/cmd/agent/debuginfo/pprof.go, line 54 at r2 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
This function can be removed all together and we can directly call
Collect
.
Are you suggesting to remove the struct entirely use a function instead?
dgraph/cmd/agent/debuginfo/pprof.go, line 57 at r2 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
This seems like a constant (variable that can be declared once at the top)
We configure the timeout based on duration which is provided to us by the user using a command line flag, I am not sure if this should be a constant.
dgraph/cmd/agent/debuginfo/pprof.go, line 85 at r2 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
Not sure, whether it makes sense to just log the error and move on. We should return an error and fail at the caller right away.
The command is introduced for debugging purposes, the debuginfo
command as I think should be the best effort. It should collect all the information it can so that user doesn't have to debug the issues with this command itself. What we can do here maybe is to write all the errors in a file along with debuginfo
so that we while debugging know what failed and why. How do you feel about this?
dgraph/cmd/agent/debuginfo/run.go, line 60 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 179 characters (from
lll
)
Done.
dgraph/cmd/agent/debuginfo/run.go, line 61 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 160 characters (from
lll
)
Done.
dgraph/cmd/agent/debuginfo/run.go, line 62 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 159 characters (from
lll
)
Done.
dgraph/cmd/agent/debuginfo/run.go, line 80 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
S1002: should omit comparison to bool constant, can be simplified to
debugInfoCmd.archive
(fromgosimple
)
Done.
dgraph/cmd/agent/debuginfo/run.go, line 58 at r2 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
we should assume default address for zero and alpha. That's a more common use case. And taking profiles of Alpha is pretty common than doing the same for Zero. We should have defaults set accordingly.
Ok, so we can assume that if no address is given it is the alpha node we are running debuginfo in and we can get debuginfo from that locally running alpha?
dgraph/cmd/agent/debuginfo/run.go, line 61 at r2 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
Directory to write the debug info into
is enough I think.
Done.
dgraph/cmd/agent/debuginfo/run.go, line 67 at r2 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
I like what
go tool pprof --help
prints
-seconds Duration for time-based profile collection
Could use similar option and description
Done.
Signed-off-by: fristonio <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 31 unresolved discussions (waiting on @danielmai, @fristonio, @golangcibot, and @manishrjain)
dgraph/cmd/debuginfo/pprof.go, line 32 at r3 (raw file):
) var pprofProfileTypes = []string{
This is used in a different file but defined here. Is that intentionally done?
dgraph/cmd/debuginfo/pprof.go, line 43 at r3 (raw file):
func savePprofProfiles(addr, pathPrefix string, duration time.Duration, profiles []string) { u, err := url.Parse(addr)
This seems unnecessary, we should be able to use the address directly. If the address is wrong, it will error out in the process.
dgraph/cmd/debuginfo/pprof.go, line 57 at r3 (raw file):
savePath := fmt.Sprintf("%s%s.gz", pathPrefix, profileType) err := saveProfile(source, savePath, duration)
scope err
dgraph/cmd/debuginfo/pprof.go, line 73 at r3 (raw file):
var resp io.ReadCloser if sourceURL != "" {
What happens when sourceURL
is empty. resp
will be nil and bad things will happen. We can remove this check altogether, let the pprof library handle it.
dgraph/cmd/debuginfo/pprof.go, line 105 at r3 (raw file):
} if resp.StatusCode != http.StatusOK { defer resp.Body.Close()
We are not closing resp.Body
in normal case.
dgraph/cmd/debuginfo/archive.go, line 41 at r3 (raw file):
} func newWalker(baseDir, debugDir string, output tarWriter) *walker {
This function is not required. You can directly use the struct and build an instance of type walker wherever needed.
dgraph/cmd/debuginfo/archive.go, line 52 at r3 (raw file):
if err != nil { glog.Errorf("Error while walking path %s: %s", path, err) return nil
we should log the info
here. Make a note in this function mentioning that we want to get what we can in the archive, and not throw an error.
dgraph/cmd/debuginfo/archive.go, line 76 at r3 (raw file):
// Just get the latest fileInfo to make sure that the size is correctly // when the file is write to tar file fpInfo, err := file.Stat()
info
doesn't have latest info?
dgraph/cmd/debuginfo/archive.go, line 124 at r3 (raw file):
} else if err == nil && info.IsDir() { baseDir = filepath.Base(debugDir) }
What about the else
case?
dgraph/cmd/debuginfo/run.go, line 36 at r3 (raw file):
archive bool directory string infoDurationSecs uint32
just duration
is fine
dgraph/cmd/debuginfo/run.go, line 49 at r3 (raw file):
DebugInfo.Cmd = &cobra.Command{ Use: "debuginfo", Short: "Generate debug info for dgraph on the current node.",
for dgraph
is redundant, could use alpha and zero if at all
dgraph/cmd/debuginfo/run.go, line 51 at r3 (raw file):
Short: "Generate debug info for dgraph on the current node.", Run: func(cmd *cobra.Command, args []string) { err := collectDebugInfo()
you can define err within the if
scope only.
dgraph/cmd/debuginfo/run.go, line 67 at r3 (raw file):
"Directory to write the debug info into.") flags.BoolVarP(&debugInfoCmd.archive, "archive", "x", true, "whether or not to archive the agent info, this could come handy when we need to export "+
whether to archive the generated report
is enough
dgraph/cmd/debuginfo/run.go, line 70 at r3 (raw file):
"the dump.") flags.Uint32VarP(&debugInfoCmd.infoDurationSecs, "seconds", "s", 15, "Duration for time-based profile collection.")
Are we keeping the first letter capital or small? Seems inconsistent.
dgraph/cmd/debuginfo/run.go, line 72 at r3 (raw file):
"Duration for time-based profile collection.") flags.StringSliceVarP(&debugInfoCmd.pprofProfiles, "profiles", "p", pprofProfileTypes, "list of pprof profiles to dump in debuginfo.")
use the word report
or debug info
instead of debuginfo
dgraph/cmd/debuginfo/run.go, line 79 at r3 (raw file):
debugInfoCmd.directory, err = ioutil.TempDir("/tmp", "dgraph-debuginfo") if err != nil { return fmt.Errorf("error while creating temporary directory for debuginfo: %s", err)
for debuginfo
is redundant
dgraph/cmd/debuginfo/run.go, line 98 at r3 (raw file):
func collectPProfProfiles() { var duration time.Duration = time.Duration(debugInfoCmd.infoDurationSecs) * time.Second
duration :=
dgraph/cmd/debuginfo/run.go, line 102 at r3 (raw file):
if debugInfoCmd.alphaAddr != "" { filePrefix := filepath.Join(debugInfoCmd.directory, "alpha_") savePprofProfiles(debugInfoCmd.alphaAddr, filePrefix, duration, debugInfoCmd.pprofProfiles)
saveProfiles
is fine too
dgraph/cmd/debuginfo/run.go, line 119 at r3 (raw file):
glog.Infof("Debuginfo archive successful: %s", archivePath) err = os.RemoveAll(debugInfoCmd.directory)
could scope err
within if
Signed-off-by: fristonio <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 31 unresolved discussions (waiting on @danielmai, @fristonio, @golangcibot, @mangalaman93, and @manishrjain)
dgraph/cmd/debuginfo/pprof.go, line 32 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
This is used in a different file but defined here. Is that intentionally done?
I wanted to have all of the logic related to pprof in a single file.
dgraph/cmd/debuginfo/pprof.go, line 43 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
This seems unnecessary, we should be able to use the address directly. If the address is wrong, it will error out in the process.
If the address provided is localhost:8080
this won't work if we don't parse the URL correctly. We do this to make sure we have a valid HTTP URL(if one can be build from the provided argument) once we try to fetch it.
dgraph/cmd/debuginfo/pprof.go, line 57 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
scope
err
Done.
dgraph/cmd/debuginfo/pprof.go, line 73 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
What happens when
sourceURL
is empty.resp
will be nil and bad things will happen. We can remove this check altogether, let the pprof library handle it.
Done.
dgraph/cmd/debuginfo/pprof.go, line 105 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
We are not closing
resp.Body
in normal case.
Yeah, that's because the response is returned to the caller where it is used to copy the content of the Body into the file and then it is closed.
dgraph/cmd/debuginfo/archive.go, line 41 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
This function is not required. You can directly use the struct and build an instance of type walker wherever needed.
Done.
dgraph/cmd/debuginfo/archive.go, line 52 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
we should log the
info
here. Make a note in this function mentioning that we want to get what we can in the archive, and not throw an error.
Done.
dgraph/cmd/debuginfo/archive.go, line 76 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
info
doesn't have latest info?
We fetch it just to make sure we have the most updated file info and the right size, therefore.
dgraph/cmd/debuginfo/archive.go, line 124 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
What about the
else
case?
If the path provided is of a file, then it compresses it to a tar archive with a single file. Won't be a case we will be having anyways.
dgraph/cmd/debuginfo/run.go, line 36 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
just
duration
is fine
Done.
dgraph/cmd/debuginfo/run.go, line 49 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
for dgraph
is redundant, could use alpha and zero if at all
Done.
dgraph/cmd/debuginfo/run.go, line 51 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
you can define err within the
if
scope only.
Done.
dgraph/cmd/debuginfo/run.go, line 67 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
whether to archive the generated report
is enough
Done.
dgraph/cmd/debuginfo/run.go, line 70 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
Are we keeping the first letter capital or small? Seems inconsistent.
Done.
dgraph/cmd/debuginfo/run.go, line 72 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
use the word
report
ordebug info
instead ofdebuginfo
Done.
dgraph/cmd/debuginfo/run.go, line 79 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
for debuginfo
is redundant
Done.
dgraph/cmd/debuginfo/run.go, line 102 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
saveProfiles
is fine too
Done.
dgraph/cmd/debuginfo/run.go, line 119 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
could scope
err
withinif
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 15 unresolved discussions (waiting on @danielmai, @fristonio, @golangcibot, and @manishrjain)
dgraph/cmd/debuginfo/archive.go, line 76 at r3 (raw file):
Previously, fristonio (Deepesh Pathak) wrote…
We fetch it just to make sure we have the most updated file info and the right size, therefore.
Do we expect this to change while we are archiving? Did you notice any issues without this code? We can remove it if we don't expect anything to change.
dgraph/cmd/debuginfo/archive.go, line 37 at r4 (raw file):
type walker struct { baseDir, debugDir string
minor: you could declare them one per line. Better indentation.
dgraph/cmd/debuginfo/archive.go, line 43 at r4 (raw file):
// walkPath function is called for each file present within the directory // that walker is processing. The function operates in a best effort manner // and tries to archive whatever it can without throwing an error.
minor: This is probably a bit of stretch but at least read this article https://dave.cheney.net/2019/01/27/eliminate-error-handling-by-eliminating-errors
Signed-off-by: fristonio <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 14 unresolved discussions (waiting on @danielmai, @fristonio, @golangcibot, @mangalaman93, and @manishrjain)
dgraph/cmd/debuginfo/archive.go, line 76 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
Do we expect this to change while we are archiving? Did you notice any issues without this code? We can remove it if we don't expect anything to change.
Done.
dgraph/cmd/debuginfo/archive.go, line 43 at r4 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
minor: This is probably a bit of stretch but at least read this article https://dave.cheney.net/2019/01/27/eliminate-error-handling-by-eliminating-errors
Ok
(cherry picked from commit a07b292)
Add
agent debuginfo
command to dgraph binary to dump the pprof profiles in a tar.gz file.Usage:
This change is