Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage in R4.0 fixes/improvements/missing parts #2256

Closed
14 tasks done
marmarek opened this issue Aug 17, 2016 · 7 comments
Closed
14 tasks done

Storage in R4.0 fixes/improvements/missing parts #2256

marmarek opened this issue Aug 17, 2016 · 7 comments
Labels
C: core P: major Priority: major. Between "default" and "critical" in severity. T: task Type: task. An action item that is neither a bug nor an enhancement.

Comments

@marmarek
Copy link
Member

marmarek commented Aug 17, 2016

Most of those are just minor things, so lets collect them in a single ticket to not trash github-issues repo.

  • qubes-lvm is already running as root, no need to use sudo internally; and BTW it isn't possible to allow non-root user managing LVM: besides access to various files, /dev/mapper/control driver have hardcoded check for CAP_SYS_ADMIN
  • qubes-lvm should be as minimal/fast as possible - it is called multiple times during some time-critical actions (like DispVM startup); for example it should not load qubes.xml, nor even import qubes
  • vm.storage.stop() should not be called in vm.shutdown() - VM is still running at that time (system inside is shutting down), even vm.kill() isn't good idea - VM can always shutdown itself (like when someone run poweroff inside); it needs to be done from some real shutdown event handler. Libvirt 2.2 is going to support something like this (/etc/libvirt/hooks/libxl), but for now all we have is block devices hotplug scripts. Libvirt 2.2 is going to be released somewhere in September, so we can wait for it.
  • ThinPool do not implement resize method
  • qvm-block (or any other tool) do not support resizing volumes; not sure if that should be qvm-block work, but surely it should be available somewhere
  • qvm-block ls list internal VM volumes (even without --internal flag), if VM is created with non-default pool
  • qubes-lvm (or any other storage utility) should not parse volumes content in any way; I see there unused function lvm_image_changed, which calls tune2fs on given volume
  • VMs are attached to "origin" LVM volume, instead of snapshot created for performing changes
  • volatile volumes may be created just before VM startup - no need to create them at VM creation - will be removed anyway
  • ~~~handling extra volumes (not exposed by other VM, but created in some existing pool) is kind of broken: 1) no single API to create such volume (needs manual filling vm.storage.pool, vm.volumes etc), 2) loading such volumes from qubes.xml (mentioned in qubes.xml, but not in initial vm.volume_config) fails~~~ not supposed to work, devices API should be used instead
  • LVM snapshots are mishandled in many ways:
    • "-back" volume is created as snapshot of "new" data, not the old one,
    • "-snap" volumes are inconsistently created/deleted (for example created on domain creation only to be removed just before VM start)
    • "-snap" volume used even when no snapshot on start was requested (volume.snap_on_start=False)
  • qvm-block ls when called without particular VM, list empty metadata for internal volumes: first of all those volumes should be hidden by default (without -i flag), but even when listed, should have domain and volume name filled
  • volumes are not updated when VM template is changed - vm.volumes['root'].source still point at old template's root.
  • VM rename is broken in many ways:
    • volume.vid is updated, but volume.path is not (both file and lvm pools)
    • volume.source is not updated if source.vid is changed (like template rename)

I'll update this list with further issues when found.
/cc @kalkin

@marmarek marmarek added C: core P: major Priority: major. Between "default" and "critical" in severity. T: task Type: task. An action item that is neither a bug nor an enhancement. labels Aug 17, 2016
@marmarek marmarek added this to the Release 4.0 milestone Aug 17, 2016
@marmarek
Copy link
Member Author

As for qubes-lvm performance - half of its running time is used for just python startup + imports. For me it means it shouldn't be a separate script, but a function which calls appropriate tools (through sudo). This means it can't use python lvm API, but it looks it is used only for volume/pool existence checks, so not a big deal to rewrite it to parse lvdisplay -c (or similar) output.

marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Aug 18, 2016
It tries to parse (untrusted) volume content, so remove it to not use it
accidentally.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Aug 18, 2016
This script should run as fast as possible, so avoid importing large
module. In fact the only used thing was argparse wrapper, so switch to
the standard one and drop aliases.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Aug 18, 2016
@kalkin
Copy link
Member

kalkin commented Aug 28, 2016

@kalkin
Copy link
Member

kalkin commented Aug 29, 2016

marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Sep 3, 2016
Add 'backenddomain' element when source (not target) domain is not dom0.
Fix XML elemenet name. Actually set volume.domain when listing
VM-exposed devices.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Sep 4, 2016
marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Sep 4, 2016
marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Sep 4, 2016
marmarek added a commit to marmarek/old-qubes-core-admin that referenced this issue Sep 4, 2016
Add 'backenddomain' element when source (not target) domain is not dom0.
Fix XML elemenet name. Actually set volume.domain when listing
VM-exposed devices.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Nov 4, 2016
...instead of manual copy in python. DD is much faster and when used
with `conv=sparse` it will correctly preserve sparse image.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Nov 4, 2016
The wrapper doesn't do anything else than translating command
parameters, but it's load time is significant (because of python imports
mostly). Since we can't use python lvm API from non-root user anyway,
lets drop the wrapper and call `lvm` directly (or through sudo when
necessary).

This makes VM startup much faster - storage preparation is down from
over 10s to about 3s.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Nov 4, 2016
- add missing lvm remove call when commiting changes
- delay creating volatile image until domain startup (it will be created
  then anyway)
- reset cache only when really changed anything
- attach VM to the volume (snapshot) created for its runtime - to not
  expose changes (for example in root volume) to child VMs until
  shutdown

QubesOS/qubes-issues#2412
QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Nov 4, 2016
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Nov 4, 2016
There were two: _reset and _reset_volume. Neither of them was working,
but the later was closer. Remove the other one.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Nov 4, 2016
Just calling pool.init_volume isn't enough - a lot of code depends on
additional data loaded into vm.storage object. Provide a convenient
wrapper for this.

At the same time, fix loading extra volumes from qubes.xml - don't fail
on volume not mentioned in initial vm.volume_config.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Nov 4, 2016
There are mutiple cases when snapshots are inconsistently created, for
example:
 - "-back" snapshot created from the "new" data, instead of old one
 - "-snap" created even when volume.snap_on_start=False
 - probably more

Fix this by following volume.snap_on_start and volume.save_on_stop
directly, instead of using abstraction of old volume types.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Nov 4, 2016
In case of LVM (at least), "internal" flag is initialized only when
listing volume attached to given VM, but not when listing them from the
pool. This looks like a limitation (bug?) of pool driver, it looks like
much nicer fix is to handle the flag in qvm-block tool (which list VMs
volumes anyway), than in LVM storage pool driver (which would need to
keep second copy of volumes list - just like file driver).

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Nov 4, 2016
Things like if read-only volume is really read-only, volatile is
volatile etc.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jan 18, 2017
Content should be reset back to base volume at each VM startup.
Disposable VMs depend on this behaviour.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jan 18, 2017
Have dm-snapshot of dm-snapshot. The first layer is to "cache" changes
done by base volume holder (TemplateVM in case of root.img), the second
layer is to hold changes do by snapshot volume holder (AppVM in case of
root.img). In case of Linux VMs the second layer is normally done inside
of VM (original volume is exposed read-only). But this does not work for
non-Linux VMs, orr even Linux but without qubes-specific startup
scripts.

This is first part of the change - actual construction of two layers of
dm-snapshot, not plugged in to core scripts yet.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jan 18, 2017
Use the right cow image and apply the second layer to provide read-write
access. The correct setup is:
 - base image + base cow -> read-only snapshot (base changes "cached"
   until committed)
 - read-only snapshot + VM cow -> read-write snapshot (changes discarded
   after VM shutdown)

This way, even VM without Qubes-specific startup scripts will can
benefit from Template VMs, while VMs with Qubes-specific startup scripts
may still see original root.img content (for possible signature
verification, when storage domain got implemented).

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Feb 14, 2017
Content should be reset back to base volume at each VM startup.
Disposable VMs depend on this behaviour.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Feb 14, 2017
Have dm-snapshot of dm-snapshot. The first layer is to "cache" changes
done by base volume holder (TemplateVM in case of root.img), the second
layer is to hold changes do by snapshot volume holder (AppVM in case of
root.img). In case of Linux VMs the second layer is normally done inside
of VM (original volume is exposed read-only). But this does not work for
non-Linux VMs, orr even Linux but without qubes-specific startup
scripts.

This is first part of the change - actual construction of two layers of
dm-snapshot, not plugged in to core scripts yet.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jul 25, 2017
Do not create it at volume creation time. It it needed only when VM is
running, so create it just before startup only.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jul 25, 2017
 - improve TestPool mock - init_volume now return appropriate mock type,
 instead of TestPool
 - improve patching base directory (/var/lib/qubes) - it is stored in
 more than one place...
 - fix inheritance in TC_01_ThinPool class
 - fix expected LVM volume names ('vm-' prefix)
 - fix cleanup after FilePool tests - remove temporary qubes.xml

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jul 26, 2017
Don't set 'source' volume in various places (each VM class constructor
etc), do it as part of volume initialization. And when it needs to be
re-calculated, call storage.init_volume again.

This code was duplicated, and as usual in such a case, those copies
were different - one have set 'size', the other one not.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jul 26, 2017
Do not create it at volume creation time. It it needed only when VM is
running, so create it just before startup only.

QubesOS/qubes-issues#2256
@jpouellet
Copy link
Contributor

  • pools' import_data() must not naively return the existing backend unchanged and just dd over it because:
    1. A management VM could import a tiny filesystem to the beginning of the disk containing just a .bashrc which reads the rest of the raw blocks and exfiltrates them. This violates the "mgmt should be able to enforce policy but not read data" principle we are aiming for.
    2. Even if the writing VM is not malicious and writes the whole disk, conv=sparse means old blocks could remain in now-zero holes and cause problems. See [Contribution] qubes-incremental-backup-poc OR Wyng backup #858 (comment) for more info
    3. Importing a smaller-utilization filesystem over a larger-utilization one does not reclaim the blocks that are now unused but still allocated in the backend, causing higher long-term disk utilization.

marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Mar 17, 2018
Abandon keeping the most recent volume revision on bare VID name - this
leads to various problems with atomic commit (needs to rename/remove old
one before committing new one). Instead, always have volume revision
suffix, even for the newest one. To make absolutely sure that the right
revision is chosen, number revisions sequentially, in addition to
timestamps. This avoid wrong ordering in case of time change (including
time sync problems).
This approach also makes it easier to recover from interrupted operation
(including commit, clone, resize etc), because old revision is kept
there, as normal revision (instead of -tmp or such).

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Mar 17, 2018
Do not write directly to main volume, instead create temporary volume
and only commit it to the main one when operation is finished. This
solve multiple problems:
 - import operation can be aborted, without data loss
 - importing new data over existing volume will not leave traces of
previous content - especially when importing smaller volume to bigger
one
 - import operation can be reverted - it create separate revision,
similar to start/stop
 - easier to prevent qube from starting during import operation
 - template still can be used when importing new version

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Mar 17, 2018
Abandon keeping the most recent volume revision on bare VID name - this
leads to various problems with atomic commit (needs to rename/remove old
one before committing new one). Instead, always have volume revision
suffix, even for the newest one. To make absolutely sure that the right
revision is chosen, number revisions sequentially, in addition to
timestamps. This avoid wrong ordering in case of time change (including
time sync problems).
This approach also makes it easier to recover from interrupted operation
(including commit, clone, resize etc), because old revision is kept
there, as normal revision (instead of -tmp or such).

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Mar 17, 2018
Do not write directly to main volume, instead create temporary volume
and only commit it to the main one when operation is finished. This
solve multiple problems:
 - import operation can be aborted, without data loss
 - importing new data over existing volume will not leave traces of
previous content - especially when importing smaller volume to bigger
one
 - import operation can be reverted - it create separate revision,
similar to start/stop
 - easier to prevent qube from starting during import operation
 - template still can be used when importing new version

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Mar 17, 2018
Abandon keeping the most recent volume revision on bare VID name - this
leads to various problems with atomic commit (needs to rename/remove old
one before committing new one). Instead, always have volume revision
suffix, even for the newest one. To make absolutely sure that the right
revision is chosen, number revisions sequentially, in addition to
timestamps. This avoid wrong ordering in case of time change (including
time sync problems).
This approach also makes it easier to recover from interrupted operation
(including commit, clone, resize etc), because old revision is kept
there, as normal revision (instead of -tmp or such).

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Mar 17, 2018
Do not write directly to main volume, instead create temporary volume
and only commit it to the main one when operation is finished. This
solve multiple problems:
 - import operation can be aborted, without data loss
 - importing new data over existing volume will not leave traces of
previous content - especially when importing smaller volume to bigger
one
 - import operation can be reverted - it create separate revision,
similar to start/stop
 - easier to prevent qube from starting during import operation
 - template still can be used when importing new version

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Mar 22, 2018
Abandon keeping the most recent volume revision on bare VID name - this
leads to various problems with atomic commit (needs to rename/remove old
one before committing new one). Instead, always have volume revision
suffix, even for the newest one. To make absolutely sure that the right
revision is chosen, number revisions sequentially, in addition to
timestamps. This avoid wrong ordering in case of time change (including
time sync problems).
This approach also makes it easier to recover from interrupted operation
(including commit, clone, resize etc), because old revision is kept
there, as normal revision (instead of -tmp or such).

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Mar 22, 2018
Do not write directly to main volume, instead create temporary volume
and only commit it to the main one when operation is finished. This
solve multiple problems:
 - import operation can be aborted, without data loss
 - importing new data over existing volume will not leave traces of
previous content - especially when importing smaller volume to bigger
one
 - import operation can be reverted - it create separate revision,
similar to start/stop
 - easier to prevent qube from starting during import operation
 - template still can be used when importing new version

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jul 12, 2018
First rename volume to backup revision, regardless of revisions_to_keep,
then rename -snap to current volume. And only then remove backup
revision (if exceed revisions_to_keep). This way even if commit
operation is interrupted, there is still a volume with the data.
This requires also adjusting few functions to actually fallback to most
recent backup revision if the current volume isn't found - create
_vid_current property for this purpose.
Also, use -snap volume for clone operation and commit it normally later.
This makes it safer to interrupt or even revert.

QubesOS/qubes-issues#2256
marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jul 12, 2018
Do not write directly to main volume, instead create temporary volume
and only commit it to the main one when operation is finished. This
solve multiple problems:
 - import operation can be aborted, without data loss
 - importing new data over existing volume will not leave traces of
previous content - especially when importing smaller volume to bigger
one
 - import operation can be reverted - it create separate revision,
similar to start/stop
 - easier to prevent qube from starting during import operation
 - template still can be used when importing new version

QubesOS/qubes-issues#2256
@marmarek
Copy link
Member Author

marmarek commented Sep 4, 2018

All points handled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: core P: major Priority: major. Between "default" and "critical" in severity. T: task Type: task. An action item that is neither a bug nor an enhancement.
Projects
None yet
Development

No branches or pull requests

4 participants