-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat(Backup): Add native google cloud storage backup support #7829
Conversation
2854268
to
e075b29
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 16 unresolved discussions (waiting on @manishrjain, @rohanprasad, and @vvbalaji-dgraph)
x/handlers.go, line 364 at r1 (raw file):
} func NewGCSHandler(uri *url.URL, creds *MinioCredentials) (gcs *GCS, err error) {
creds
is not being used anywhere in the function. If it is unused, we should remove it.
Rather, we should pass the creds instead of reading from the env
variables.
x/handlers.go, line 399 at r1 (raw file):
gcs.bucket = gcs.client.Bucket(uri.Host) if _, err := gcs.bucket.Attrs(ctx); err != nil {
Is this the idiomatic way to check for the existence of the bucket?
x/handlers.go, line 401 at r1 (raw file):
if _, err := gcs.bucket.Attrs(ctx); err != nil { gcs.client.Close() return nil, errors.Wrapf(err, "while accessing bucket: %s", err)
No need to have err twice,
errors.Wrap(err, "while accessing bucket")
x/handlers.go, line 413 at r1 (raw file):
// GCS used a flat storage and provides an illusion of directories. To create a directory, file // name must be followed by '/'. dir := filepath.Join(gcs.pathName, path, "") + "/"
Use filepath.Separator
instead of '/'.
x/handlers.go, line 414 at r1 (raw file):
// name must be followed by '/'. dir := filepath.Join(gcs.pathName, path, "") + "/" glog.V(2).Infof("Creating dir: %s", dir)
use %q
for paths.
x/handlers.go, line 418 at r1 (raw file):
writer := gcs.bucket.Object(dir).NewWriter(ctx) if err := writer.Close(); err != nil { return errors.Wrapf(err, "while creating directory: %s", err)
No need or err again in wrap message.
x/handlers.go, line 446 at r1 (raw file):
// GCS doesn't has the concept of directories, it emulated the folder behaviour if the path is // suffixed with '/'. absPath += "/"
use filepath.Separator
instead
x/handlers.go, line 455 at r1 (raw file):
return true } else if err != nil { glog.Error(errors.Wrapf(err, "while checking if directory exists: %s", err))
No need to wrap when you are just logging the error. Use glog.Errorf()
and format the error.
x/handlers.go, line 469 at r1 (raw file):
return false } else if err != nil { glog.Error(errors.Wrapf(err, "while checking if file exists: %s", err))
Same here.
x/handlers.go, line 488 at r1 (raw file):
absPath := gcs.JoinPath(path) if len(absPath) != 0 { absPath += "/"
filepath.Separator
x/handlers.go, line 503 at r1 (raw file):
} if err != nil { glog.Error(errors.Wrapf(err, "while listing paths: %s", err))
Same here.
x/handlers.go, line 521 at r1 (raw file):
reader, err := gcs.bucket.Object(gcs.JoinPath(path)).NewReader(ctx) if err != nil { return nil, errors.Wrapf(err, "while reading file: %s", err)
Same here.
x/handlers.go, line 527 at r1 (raw file):
data, err := ioutil.ReadAll(reader) if err != nil { return nil, errors.Wrapf(err, "while reading file: %s", err)
Same here.
x/handlers.go, line 534 at r1 (raw file):
// Rename renames the src file to the destination file. func (gcs *GCS) Rename(src, dst string) error {
Did we check that GCP doesn't provide atomic rename operation? If it does, then we won't need copy + delete.
x/handlers.go, line 541 at r1 (raw file):
if _, err := dstObj.CopierFrom(srcObj).Run(ctx); err != nil { return errors.Wrapf(err, "while renaming file: %s", err)
Same here.
x/handlers.go, line 557 at r1 (raw file):
reader, err := gcs.bucket.Object(gcs.JoinPath(path)).NewReader(ctx) if err != nil { return nil, errors.Wrapf(err, "while reading file: %s", err)
Same here
x/handlers.go, line 364 at r1 (raw file): Previously, ahsanbarkati (Ahsan Barkati) wrote…
For GCP, a credentials.json file is provided which is used for authentication. The file contains keys, project_ids, etc. Are we suggesting getting all the values from creds (via GQL vals) or just the file path? |
x/handlers.go, line 399 at r1 (raw file): Previously, ahsanbarkati (Ahsan Barkati) wrote…
No, this is one of the ways. The other is to iterate over all buckets and check if the name matches. |
x/handlers.go, line 413 at r1 (raw file): Previously, ahsanbarkati (Ahsan Barkati) wrote…
works fine as long as we are on a system with '/' as file separator. For GCS, we need to use '/', as we only officially only support linux we can use filepath.Separator. But anyone building and running this on windows will face issues. |
x/handlers.go, line 534 at r1 (raw file): Previously, ahsanbarkati (Ahsan Barkati) wrote…
Yup, this is straight from the docs. |
x/handlers.go, line 364 at r1 (raw file): Previously, rohanprasad (Rohan Prasad) wrote…
Maybe take both env and from query (if specified, will override the value)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 16 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, and @vvbalaji-dgraph)
x/handlers.go, line 401 at r1 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
No need to have err twice,
errors.Wrap(err, "while accessing bucket")
Done.
x/handlers.go, line 414 at r1 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
use
%q
for paths.
Done.
x/handlers.go, line 418 at r1 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
No need or err again in wrap message.
Done.
x/handlers.go, line 446 at r1 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
use
filepath.Separator
instead
Done.
x/handlers.go, line 455 at r1 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
No need to wrap when you are just logging the error. Use
glog.Errorf()
and format the error.
Done.
x/handlers.go, line 469 at r1 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
Same here.
Done.
x/handlers.go, line 488 at r1 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
filepath.Separator
Done.
x/handlers.go, line 503 at r1 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
Same here.
Done.
x/handlers.go, line 521 at r1 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
Same here.
Done.
x/handlers.go, line 527 at r1 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
Same here.
Done.
x/handlers.go, line 541 at r1 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
Same here.
Done.
x/handlers.go, line 557 at r1 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
Same here
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have some comments. Is it possible to add an integration test for this as we have for minio?
Reviewable status: 0 of 3 files reviewed, 19 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @rohanprasad, and @vvbalaji-dgraph)
go.sum, line 9 at r2 (raw file):
cloud.google.com/go v0.46.3/go.mod h1:a6bKKbmY7er1mI7TEI4lsAkts/mkhTSZK8w33B4RAg0= cloud.google.com/go v0.50.0/go.mod h1:r9sluTvynVuxRIOHXQEHMFffphuXHOMZMycpNR5e6To= cloud.google.com/go v0.52.0/go.mod h1:pXajvRH/6o3+F9jDHZWQ5PbGhn+o8w9qiu/CffaVdO4=
go mod tidy?
x/handlers.go, line 364 at r1 (raw file):
Previously, rohanprasad (Rohan Prasad) wrote…
Maybe take both env and from query (if specified, will override the value)?
We should support taking the values from ENV/file and override it with the passed credentials. Check MinioCredentialsProvider()
. Also, support not using environment variables if shared-cluster
is set to true.
x/handlers.go, line 413 at r1 (raw file):
Previously, rohanprasad (Rohan Prasad) wrote…
works fine as long as we are on a system with '/' as file separator. For GCS, we need to use '/', as we only officially only support linux we can use filepath.Separator. But anyone building and running this on windows will face issues.
/
is the requirement of GCP? Do we have this separator defined in gcp library? if yes, then lets use that, else I would suggest using /
and adding comment there.
x/handlers.go, line 90 at r2 (raw file):
// s3://dgraph.s3.amazonaws.com/dgraph/backups?secure=true // minio://localhost:9000/dgraph?secure=true // file:///tmp/dgraph/backups
Add an example uri
x/handlers.go, line 478 at r2 (raw file):
// JoinPath appends the given path to the root path of the handler. func (gcs *GCS) JoinPath(path string) string { return filepath.Join(gcs.pathName, path)
filepath.Join will use the separator of the system that we are running dgraph on?
go.sum, line 9 at r2 (raw file): Previously, NamanJain8 (Naman Jain) wrote…
This is after go mod tidy. |
x/handlers.go, line 364 at r1 (raw file): Previously, NamanJain8 (Naman Jain) wrote…
+1 for overriding values. Why would we want to disable it in shared-cluster mode? We are overriding it anyway. |
x/handlers.go, line 413 at r1 (raw file): Previously, NamanJain8 (Naman Jain) wrote…
Didn't find any, using a constant instead. |
x/handlers.go, line 478 at r2 (raw file): Previously, NamanJain8 (Naman Jain) wrote…
Yes, (depends on the system we build dgraph) it will use "" for Windows system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 19 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @NamanJain8, and @vvbalaji-dgraph)
x/handlers.go, line 90 at r2 (raw file):
Previously, NamanJain8 (Naman Jain) wrote…
Add an example uri
Done.
There's no provision to add an integration test as of now. Mocking the GCP connection won't give us much benefit because all we are doing is calling those APIs here. Leaving this as a TODO for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please test these APIs manually. Few comments, otherwise looks good to me.
Reviewable status: 0 of 3 files reviewed, 23 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @NamanJain8, @rohanprasad, and @vvbalaji-dgraph)
x/handlers.go, line 364 at r3 (raw file):
client *storage.Client bucket *storage.BucketHandle pathName string
Isn't pathName
more like a pathPrefix
. Should we call it pathPrefix
?
x/handlers.go, line 371 at r3 (raw file):
var c *storage.Client if creds.SecretKey != "" || os.Getenv("GOOGLE_APPLICATION_CREDENTIALS") != "" {
I see only the secretKey is being used. Any reason to pass *MinioCredentials
struct? We can just pass the key.
x/handlers.go, line 417 at r3 (raw file):
ctx := context.Background() // GCS used a flat storage and provides an illusion of directories. To create a directory, file
Minor typo: GCS used -> GCS uses
x/handlers.go, line 458 at r3 (raw file):
}) if _, err := it.Next(); err != iterator.Done {
err == iterator.Done
? If there is some other error then also it will return true.
x/handlers.go, line 458 at r3 (raw file): Previously, ahsanbarkati (Ahsan Barkati) wrote…
if err != iterator.Done, then we can surely say that there are items in the buffer and hence directory exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 22 unresolved discussions (waiting on @ahsanbarkati, @manishrjain, @NamanJain8, and @vvbalaji-dgraph)
x/handlers.go, line 364 at r3 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
Isn't
pathName
more like apathPrefix
. Should we call itpathPrefix
?
Done.
x/handlers.go, line 371 at r3 (raw file):
Previously, ahsanbarkati (Ahsan Barkati) wrote…
I see only the secretKey is being used. Any reason to pass
*MinioCredentials
struct? We can just pass the key.
Done.
f686d9f
to
06d4658
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to intrude on your PR - but I believe the handling of GOOGLE_APPLICATION_CREDENTIALS
is already done by by the google bindings automatically with different semantics than you have written here.
There is a case as written where credentials are used if the environment variable is set, and the other case explicitly disables authenticated access. However, this does not allow the case where the default service account of the GCE/GKE node you are running on can be automatically loaded and used:
If the environment variable GOOGLE_APPLICATION_CREDENTIALS isn't set, ADC uses the service account that is attached to the resource that is running your code.
This would be useful for automatically deploying dgraph alphas capable of secure GCS upload, instead of pushing out explicit service account json files, mounting them via helm chart and setting the variable accordingly.
I'm trying to follow how to properly use this as per the comments in this PR but am somewhat struggling.
Any help is greatly appreciated and thank you for adding this functionality! |
You cannot pass your creds in via the GraphQL mutation - you can only pass in the path on the alpha where the service account file is located. This is a bit unfortunate in my view as the helm chart does not allow attaching arbitrary secrets from k8s into the pods, which means the only way to use the chart is to use the machine service account or workload identity. The destination is the only argument you need but forceFull will work as normal to control type. |
Thanks for the response. What I ended up doing is ssh'ing into my After fixing some permissions issues, I got it working. Thanks again to all the contributors 👍 |
Fixes DGRAPH-3340
Currently, we use MinIO to write backup files to CGP storage. To do this we need to start a minio server (with GCP credentials) and trigger a backup with the destination as the minio server. This requires the user to run the minio server every time (or always) when a backup needs to be triggered. Using native SDK will help get rid of the extra service.
Google cloud storage (GCS) requires the user to create Service account key and credentials file. When starting an alpha we can set env variable:
GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json" dgraph alpha
to allow access to GCS. To trigger a backup the user can use the admin endpoint to initiate the backup. The destination should be set as:
"gs://<bucket_name>/path/to/backup"
This change is