There seems to be a problem replicating datasets that are zfs-promoted branches within a bigger tree (rpool/ROOT, zones...) #503
Examples from a live inspection session where a replication run skipped cleanup (of src and dst) because of this:
Looking at snapshots on the current destination: they range from 2019-12-08 to 2019-12-10, including a few manually named snapshots along the way:
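For context, the kind of listing used for such an inspection would look roughly like this (the pool and dataset names here are hypothetical, not the ones from the session):

```sh
# List the destination's snapshots for the rootfs replica, oldest first,
# to see what range of history it still holds.
zfs list -H -t snapshot -o name,creation -s creation \
    -r backup/rpool/ROOT/hipster_2019.10
```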
On the source the history starts from 2019-12-19, so indeed nothing in common to sync from, right?..
What about the origins?
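Something like the following shows the clone origins on both sides (hypothetical names again):

```sh
# Show which snapshot, if any, each dataset was cloned from.
zfs get -r -H -o name,value origin rpool/ROOT
zfs get -r -H -o name,value origin backup/rpool/ROOT
```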
So the source rootfs dataset was historically cloned off a filesystem to upgrade into a newer release at "2019-12-19-09:10:26", going from "hipster_2019.10*" to "hipster_2020.04*", and the ZFS tree was re-balanced to promote the currently activated rootfs as the owner of all the history, inverting the relation of who is a clone of whom (data-wise this is equivalent). The destination dataset is a poor orphan without an origin; in fact most of them are (I may have initialized the backup pool by replicating my system without znapzend, so the oldest rootfs datasets on the backup have proper origins). I suppose that whenever znapzend found a new rootfs and had autoCreation enabled, it just made an automatic snapshot and sent it from scratch as the starting point, and rotated from there, independently of the other rootfs'es on the source pool. Looking at the manually named snapshots on the source datasets seems to confirm this guess: the ones expected to be common between the "hipster_2019.10*" rootfs on source and backup are now part of the "hipster_2020.04*" history, in a relation that znapzend currently does not handle:
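For reference, a minimal sketch of how promotion inverts the clone relation (the names here are made up, not the datasets above):

```sh
# Before: "new" is a clone of "old"; "old" owns the snapshot history.
zfs snapshot rpool/ROOT/old@upgrade
zfs clone rpool/ROOT/old@upgrade rpool/ROOT/new

zfs get -H -o value origin rpool/ROOT/new   # -> rpool/ROOT/old@upgrade

# Promote the clone: the snapshots up to @upgrade move over to "new",
# and "old" becomes a clone of new's snapshot instead.
zfs promote rpool/ROOT/new

zfs get -H -o value origin rpool/ROOT/old   # -> rpool/ROOT/new@upgrade
zfs get -H -o value origin rpool/ROOT/new   # -> -
```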
Sending an incremental replication from "the owner of history" to its differently-named clone seems to be a valid operation:
it showed some data read into the buffer, but for the past several minutes it has been blinking the destination disk and showing no traffic, so I'm a bit lost as to whether ZFS is actually doing anything... maybe the kernel is thinking about how to handle that... UPDATE: Alas, after 7 minutes it found there is no good way to send from the original origin:
indeed:
Makes sense in hindsight: the new name of rootfs started life as a poor orphan... so it has no history either:
A general non-disruptive solution is probably not possible: if we do have a history written, and suddenly a clone is made from an old manually named snapshot that is not present on the destination, we might not be able to replicate without rolling back stuff from the trunk. If an even older common point exists, the destination could be cloned from that, replicated up to the "origin" of the newly found source clone, and upwards from that point as the history of the new source clone; but this is likely to be fragile and to work in some cases at best (which is better than nothing), and admins should have taken care, in the past of their backup increments, that those old common snapshots do exist. At the very least, when such a new situation arises and there are no snapshots on the destination newer than the divergence point (at least none named manually rather than via the znapzend-configured pattern), e.g. after a ...
A big data point for designing such a solution: here's how the BE layout on that system looks, on destination and source:
More data points: I created and activated a new BE. The new one became the owner of the rootfs history from the beginning of time, and the origin of snapshots for the other rootfs clones, including the recently active one:
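A rough sketch of that sequence, assuming an illumos-style beadm and made-up BE names; the observation above is that activating the new BE is what makes it the owner of the history:

```sh
# Create and activate a new boot environment; per the observation above,
# the newly activated BE's datasets end up promoted, taking over the
# snapshot history of the rootfs tree.
beadm create hipster_2020.10
beadm activate hipster_2020.10

# Afterwards the origins are inverted: older BEs hang off snapshots
# of the newly activated one.
zfs get -r -o name,value origin rpool/ROOT
```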
Similarly for zone roots, with the complication that they only cloned the currently active ZBE ...
The latest (currently activated) sequentially numbered ZBE is the origin for the other clones of this zone root, except for the numbers that were removed earlier (on the source):
For the setup of datasets and snapshot names elaborated above, a direct attempt to send the new dataset name as an increment from the old one seems to work; ZFS discovers where it needs to clone and append automatically, at least for a replication stream (which is what I want, but not necessarily what znapzend users might want if they wish to e.g. exclude some child datasets from replication... not sure if that is currently possible, but it is a constraint against relying on full replication streams):
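The shape of such a send, with hypothetical names; after the promotion the common snapshot lives under the new rootfs name on the source, and the -R replication stream carries the clone/origin relations so the receiver can recreate them:

```sh
# Send the new rootfs name as an increment on top of the newest snapshot
# the destination already knows about (it was made under the old name,
# but after promotion it belongs to the new name on the source).
zfs send -R -I rpool/ROOT/hipster_2020.04@last-common-snap \
    rpool/ROOT/hipster_2020.04@newest-snap \
  | zfs recv -u -F backup/rpool/ROOT/hipster_2020.04
```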
I guess the laptop now has a long interesting night ahead... UPDATE: Cool, it can even recognize the increments already present in the destination pool, though they are probably part of some other clone's history currently:
And that is a bit I hate about delays in zfs recv, seen even in single-command mode (it may be worse when everything is done in separate commands, so that e.g. the mbuffers are not as well utilized to fill up on one side while the other side "thinks", when there are many single-use mbuffer processes):
I think every receive spends a minute procrastinating and then a few seconds doing I/O. I haven't seen any snapshot increment today that would clock in under 60s. Grrr... Looking at the consoles, I see it doing zero active I/O on the source and destination pools. UPDATE: It "sped up"; now the base time for an increment to be received is 42 seconds or so. I remain puzzled.
So in the end, the boiled-down routine I've done in shell for the rootfs and zone roots was:
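Roughly, that routine had this shape (a sketch with made-up dataset and snapshot names, not the literal commands):

```sh
# 1) On the backup pool, branch the new rootfs name off the newest
#    snapshot it has in common with the source, and promote it so it
#    owns that history on the destination too.
COMMON_SNAP=znapzend-auto-2019-12-10T210000Z      # hypothetical
NEWEST_SRC_SNAP=znapzend-auto-2020-08-11T210000Z  # hypothetical

zfs clone backup/rpool/ROOT/hipster_2019.10@"${COMMON_SNAP}" \
          backup/rpool/ROOT/hipster_2020.04
zfs promote backup/rpool/ROOT/hipster_2020.04

# 2) Catch the new name up with an incremental stream from the source.
zfs send -R -I "@${COMMON_SNAP}" \
    rpool/ROOT/hipster_2020.04@"${NEWEST_SRC_SNAP}" \
  | zfs recv -u -F backup/rpool/ROOT/hipster_2020.04
```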
Here the interesting lines are the last ones in "history of clones":
where "${SRCROOTCONTAINER}/${CURRENTBE}@${NEWEST_SRC_SNAP}" is literally the origin string copy-pasted from the listing above (
but requesting a re-send starting with that snapshot name seems to work, e.g.:
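One plausible reading of that, reusing the variable names from above and adding hypothetical ones for the demoted BE and the destination (${OLDBE}, ${LATER_SNAP} and ${DSTROOTCONTAINER} are mine, not from the session):

```sh
# Incremental send of an older, now-demoted BE, using the promoted BE's
# snapshot (the demoted BE's origin) as the base.  The origin must be
# fully qualified (pool/fs@snap), not just @snap.
zfs send -i "${SRCROOTCONTAINER}/${CURRENTBE}@${NEWEST_SRC_SNAP}" \
    "${SRCROOTCONTAINER}/${OLDBE}@${LATER_SNAP}" \
  | zfs recv -u -F "${DSTROOTCONTAINER}/${OLDBE}"
```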
This probably impacts snapshot histories where same-named (but different) snapshots have been made and replicated over time. Note that the clone for "zbe-86" still was not made in the example above. Possibly I'd have to branch it off, under such a name, from an earlier snapshot on the destination (one common with the source); will see soon... so far it has been quiet for a long time... TODO: When it is all done, check that the histories of the existing destination datasets remain in place and that the new zoneroot did get branched off of an older snapshot. Otherwise, maybe making the clone from an ancient point and receiving increments into it more explicitly is the proper way (recursively with children, somehow, then)?..
At least, running a dozen operations like this in parallel keeps the destination pool fairly busy. Although each ...
Clone and send did not go too well...
Maybe it should go on with what it did, without a named destination, until the stream from ... UPDATE:
Update: the long-lags issue was raised on illumos IRC; a viable theory is that, as my source and destination pools have quite a few datasets (me being me) and hordes of snapshots (largely thanks to znapzend, and to my desire to be able to roll back into the considerable past if needed), overall on the order of 1k datasets and 100k snapshots, it seems that ...
Regarding ... : the benefit for situations like the one my backup pool found itself in is that ...
One more bit of wisdom from the trenches: it does not always happen that a top-level rootfs snapshot common between the source and backup pools exists same-named on both sides for the child datasets (e.g. in my systems I have split-off ... datasets).

So care has to be taken to choose a consistent set of snapshots for the subtree that is to pose as the new rootfs name, and then to clone and promote each one individually (see the sketch below).

Oh, did I mention that it can take days with sufficiently many snapshots involved (you might want to script only hopping between the milestones relevant for the other rootfs'es to branch off, to remake them as inheriting parts of the same history, and to ignore the older zfs-auto-snaps), and that to avoid confusion nothing should appear in these datasets, so for the time being ...
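A sketch of what "clone and promote each one individually" could look like for a subtree, assuming the per-child common snapshots are already known (all names, and the pick_common_snapshot helper, are hypothetical):

```sh
# For each dataset in the old rootfs replica on the backup pool, branch
# the corresponding dataset of the new rootfs name off the chosen common
# snapshot and promote it so it owns that child's history.
OLD=backup/rpool/ROOT/hipster_2019.10
NEW=backup/rpool/ROOT/hipster_2020.04

zfs list -H -o name -r "${OLD}" | while read -r ds; do
    child="${ds#"${OLD}"}"                 # "" for the top dataset itself
    snap=$(pick_common_snapshot "${ds}")   # hypothetical helper: a snapshot
                                           # that also exists on the source
    zfs clone "${ds}@${snap}" "${NEW}${child}"
    zfs promote "${NEW}${child}"
done
```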
The IRC discussion of the slow zfs recv behavior:
At least, after cloning and promoting the target datasets first, and sending a replication stream afterwards, I confirmed that this cut the lag between received snapshots from about a minute to about 5 seconds - much more bearable!
something to add to the documentation maybe?
At least, I suppose...
I'm locally PoC'ing a (shell) script that would help me automate this chore; it is currently chewing through my zone roots' older branches (I cloned and promoted the new "history owners" manually, as in the later posts above).

I plan to post that as a contrib type of thing that solves a narrow purpose under the right conditions, which can still be a pretty common use case (rolling backups of rootfs and zone roots), and as we learn more edge cases (e.g. that we should not pre-clone a destination branch that is to be cloned from an existing snapshot by zfs recv), the logic can be rolled into znapzend. So far most of the script is comments about logic and data samples, for maybe two dozen actual code lines ;)
Jim
Noted in the documentation with #512, and also posted my shell script that addresses part of the problem there as a contrib PoC:
Contrib PoC script for issue #503 and README updates
A bit of good news: there seems to be no special handling required for catching up a cloned rootfs or zbe with child datasets that is NOT a tip to be ...
Yet another data point: on Solaris 10u10, while ...
As I was checking how my backups went, I found that replicas of my rootfs-related datasets are not branches of each other, but seem to be unique data histories (with no "origin" set in any of them on the backup side; I suppose --autoCreation was involved, my configs have it enabled by default). This causes confusion at least when a backup pool is not regularly accessible, older automatic snapshots get deleted, and there is no common point to resync from (not even among snapshots made by beadm, etc.). This may also waste space on the backup pool, writing many copies of rootfs instead of building on top of a shared history, but I am not fully sure about that bit (my backup pool claims a high dedup ratio while dedup is off in dataset settings).
In practice this impacts primarily rpool/ROOT/something(/subsets) and pool/zones/zonerootnames/... that are automatically shuffled as a particular rootfs gets activated (so whatever is related to the current rootfs is promoted as the main tree of ZFS "live" data, and older roots are origined at its snapshots), but I suppose other zfs promotions are similarly susceptible.
My guess is that autocreation should check if the newly found dataset has an origin snapshot attribute, and issue a replication going from that point, not from scratch (see the sketch after the list below). The "origin" keyword is not seen in the znapzend codebase ;)
** Is it going to appear, e.g. as part of the same autocreation run? => re-schedule it to appear first, at least up to this snapshot? (maybe recurse through further origins as needed)
** Does it have some other znapzend policy on the source, so it can appear from another run? => use that knowledge somehow? Or is that a breach of whatever sanity is in place? Just do a runonce --inherited on that other policy, and wait for it to complete?
** Can it make sense to send the current (non-"original") dataset in a way similar to what was explored for --sinceForced=... in "Runonce --since=X made safer (only do dangerous stuff if --sinceForced=X is asked)" #497, to make sure that even if it appears as an independent history of incremental snapshots, that history includes the origin snapshot (probably not named by the znapzend timestamp patterns)? Then if the user later adds that origin to a backed-up policy, it could be re-branched from that point. Common history is common, whoever owns it at the moment.
** Oh, and there are also subsequent zfs promotions shuffling around the current owners of historic snapshots and their consumers who branched off that point...
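As a rough illustration of the origin check suggested above (this is not existing znapzend code; the dataset names and the send/recv plumbing are hypothetical):

```sh
# When autocreation finds a new source dataset, look up its origin first.
SRC_DS=rpool/ROOT/hipster_2020.04                  # hypothetical new dataset
ORIGIN=$(zfs get -H -o value origin "${SRC_DS}")

if [ "${ORIGIN}" != "-" ]; then
    # It is a clone: start the replication from its origin snapshot
    # instead of creating an unrelated full copy on the destination
    # (assumes the destination already holds that origin snapshot).
    zfs send -I "${ORIGIN}" "${SRC_DS}@newest-snap" \
      | zfs recv -u -F backup/rpool/ROOT/hipster_2020.04
else
    # Not a clone: fall back to sending from scratch, as happens today.
    zfs send "${SRC_DS}@newest-snap" \
      | zfs recv -u backup/rpool/ROOT/hipster_2020.04
fi
```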