Streams and BLOB escapes for large databases #13
Conversation
```diff
-	"github.com/JamesStewy/go-mysqldump"
 	_ "github.com/go-sql-driver/mysql"
+	"github.com/jamf/go-mysqldump"
```
There are a few instances of this import path around that need to be updated.
I might have to close this PR and open a new one from my fork. But I can do that after the rest of this PR and the changes are done. I'd like to see table concurrency as an example, but that might be beyond the scope of this PR.
I am okay with you leaving this change until the end, just to keep it as one PR.
Oh no, you misunderstand me: I'd have to close this one and open a new one. This is a change I can't make on the jamf:master upstream; I'd have to do it on my fork.
Okay. If you could make that fork and create a new PR referencing this one, that would be great.
Closing this; see #14.
Hi @BrandonRoehl, thank you very much for your extensive contribution. With regard to the addition of concurrency, I feel that the per-table SQL queries (in
Finally, just to confirm: this PR would be a fix for #7 and would replace #9? Thanks, James.
@JamesStewy This would fix #8 and #7, and would replace #9. I thought there was a reason I did the concurrency like this, for stability, because
How so?
If you were to make that change on line 130, then there would be no need for the second
The writing is already serial in a sense because of the mutex. Also, with regard to concurrency, the current implementation leaves potential errors on line 157 unchecked. Adding SQL calls into the goroutines will then also add more potential errors that need to be checked and communicated back to the main routine.
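One common way to check goroutine errors and communicate them back to the main routine, as suggested above, is a buffered error channel that is drained once all workers finish. A minimal sketch, where `dumpTable` is a hypothetical stand-in for this library's per-table write work:

```go
package main

import (
	"fmt"
	"sync"
)

// dumpTable is a hypothetical stand-in for the per-table dump step.
// It returns its error instead of dropping it inside the goroutine.
func dumpTable(name string) error {
	if name == "broken" {
		return fmt.Errorf("dump %s: simulated failure", name)
	}
	return nil
}

// dumpAll runs one goroutine per table and collects every error on a
// buffered channel so the main routine can report or aggregate them.
func dumpAll(tables []string) []error {
	var wg sync.WaitGroup
	errc := make(chan error, len(tables)) // buffered: senders never block

	for _, t := range tables {
		wg.Add(1)
		go func(t string) {
			defer wg.Done()
			if err := dumpTable(t); err != nil {
				errc <- err
			}
		}(t)
	}
	wg.Wait()
	close(errc)

	var errs []error
	for err := range errc {
		errs = append(errs, err)
	}
	return errs
}

func main() {
	errs := dumpAll([]string{"users", "broken", "orders"})
	fmt.Printf("%d table(s) failed\n", len(errs))
}
```

Because the channel is buffered to the number of tables, no goroutine can block on send, and the main routine sees every failure rather than only the last one.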
#8 is for batch inserts. This version writes out to the stream as soon as it can, so you can use https://github.com/machinebox/progress to monitor progress, just as they were intending. The error on line 157 can only happen if the stream is closed or the template is incorrect; a commit is coming to check stream errors. Here is the error after changing lines 151 and 157 so that the select statements run in parallel (only the write to the stream can be done in a goroutine). Error with parallel streams: it's because there are parallel reads from
EDIT: I would like to note that the only way you can run multiple queries at the same time is by opening parallel connections. This, however, introduces tons of instability and, depending on the environment and how many open connections the MySQL instance allows, total failure with scaling databases. Parallel connections for a dump in clustered environments can hit separate nodes and retrieve bad data.
#8 states
To me this reads as when the dump is being restored to the database, not when the dump is being read from the database. As you stated, this PR, with the addition of a separate library, provides progress indication when reading from the database (which I like too, btw).
I see what you mean. But the docs for
and reading further on suggest that it, by default, can create and manage multiple underlying connections. So, for one, I don't understand why you are having issues with concurrent SQL calls (I am pretty sure I have done that before); and second, wouldn't the default behaviour of
Oh, I thought that meant "executing the dump", as in executing the dump command.
Because GoLang's
Now, I might still be doing something wrong here. So if you can get it to work consistently, I'd have no problem with it; I'd be thrilled. But right now it doesn't, and it can throw tons of errors in something that can be very mission critical, when people will need to rely on these backups to restore from. And the only time you realize a scheduled backup fails is when you need to restore it. That's the same reason we decided to let it keep trying to write other tables if one fails: if it can get even one more, I consider that a win.
Okay, it looks like the extra concurrency is going to be a lot of work to get right, so I am happy to not do it. We can just go back to having the writes be concurrent, as you had it when making this PR.
Closing to reference #13
- io.Writer for streams
- Data structure
- os.File and stream chains