Eng 588 [Server] allow users to upload csvs to demo #82
Conversation
A small request on PR naming @eunice-chan -- when we're stacking multiple PRs like this, it would be great if we included the system component in the title. Otherwise, it's kinda difficult to tell which one is which. e.g., #83 could be titled "ENG 588: [UI] Allow users to upload CSVs to demo DB" and #82 could be titled "ENG 588: [Server] Add endpoint for accepting and uploading new table". Does that make sense?
Most parts make sense! I think my main concern is that we convert the CSV content directly into a jobSpec field and read that field in Python. This may not be a reliable solution, for two reasons:
- Job specs are passed as part of the command line, and command execution usually has an upper bound on the total argument length: https://serverfault.com/questions/163371/linux-command-line-character-limit#:~:text=The%20shell%2FOS%20imposed%20limit,131072%20which%20is%20128*1024%20. This bound is generally very large if our job spec only contains metadata, but it is comparable to the data sizes involved here (e.g. 128 KB in one of the answers above) and depends on how we invoke the command. So it's better not to take that risk if we want to support larger CSV files, where it's reasonable to expect users to have ~MB to ~GB of data. Alternatively, we can store the CSV content at some path using store.Put() and use the existing load operator to create the table, passing the file path as the upstream artifact.
- Another thing to mention: the byte -> string [transmit] string -> byte round trip is not that reliable across systems (e.g. the Go server and the Python executor), since the same string can end up with different encodings. Using base64 encoding/decoding for the byte -> string conversion is much more reliable, because base64 produces ASCII-only characters that are identical on every system (see the sketch below). And if we put the content in storage, as in the alternative solution, we don't need to worry about this at all.
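A minimal sketch of the base64 round trip on the Go side; the CSV bytes and variable names here are made up for illustration, and in practice the Python executor would decode with base64.b64decode:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	// Raw CSV bytes as read from the upload request (example data).
	csvBytes := []byte("name,age\nalice,30\nbob,25\n")

	// Encode to base64 before embedding the content in a string field
	// (e.g. a job spec). The result is ASCII-only, so it is unaffected by
	// locale or encoding differences between systems.
	encoded := base64.StdEncoding.EncodeToString(csvBytes)

	// The consumer recovers the exact original bytes.
	decoded, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(decoded) == string(csvBytes)) // prints: true
}
```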
contentPath := fmt.Sprintf("create-table-content-%s", args.RequestId)
if err := storage.NewStorage(h.StorageConfig).Put(ctx, contentPath, args.csv); err != nil {
	return nil, http.StatusInternalServerError, errors.Wrap(err, "Cannot save CSV.")
}
Let's add a defer storage.Delete(path) to always delete the file once the upload is completed.
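A minimal, self-contained sketch of the deferred-cleanup pattern being suggested; the store type and its Put/Delete signatures are invented here for illustration and are not the repository's actual storage API:

```go
package main

import (
	"context"
	"fmt"
	"log"
)

// store is a stand-in for the real storage layer; Put/Delete are assumed
// method names, not the project's actual interface.
type store struct{ objects map[string][]byte }

func (s *store) Put(ctx context.Context, path string, data []byte) error {
	s.objects[path] = data
	return nil
}

func (s *store) Delete(ctx context.Context, path string) error {
	delete(s.objects, path)
	return nil
}

func createTable(ctx context.Context, s *store, requestID string, csv []byte) error {
	contentPath := fmt.Sprintf("create-table-content-%s", requestID)
	if err := s.Put(ctx, contentPath, csv); err != nil {
		return err
	}
	// Deferring the delete guarantees the temporary object is removed on
	// every exit path, including later errors in the handler.
	defer func() {
		if err := s.Delete(ctx, contentPath); err != nil {
			log.Printf("failed to delete %s: %v", contentPath, err)
		}
	}()

	// ... launch the load job that consumes contentPath ...
	return nil
}

func main() {
	s := &store{objects: map[string][]byte{}}
	_ = createTable(context.Background(), s, "req-123", []byte("a,b\n1,2\n"))
}
```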
Great work, appreciate the fast update! I assume you've tested your updates? Just left one minor suggestion, but I think it's generally good to go!
#82 -- BE API endpoint to create table in demo DB from CSV
#83 -- FE UI to allow upload of CSV to demo DB
#55 -- FE Integrations Details page where this functionality lives