Skip to content

Add race mitigation lock to vboxwrapper#5571

Merged
AenBleidd merged 6 commits intoBOINC:masterfrom
computezrmle:add_race_mitigation_lock
Apr 24, 2024
Merged

Add race mitigation lock to vboxwrapper#5571
AenBleidd merged 6 commits intoBOINC:masterfrom
computezrmle:add_race_mitigation_lock

Conversation

@computezrmle
Copy link
Contributor

Introduction

Vboxwrapper allows to use VirtualBox multiattach/differencing disk images since version 26206.
Attaching a disk in this mode can't be done in 1 step.
Hence, a workaround suggested by VirtualBox had to be implemented in vboxwrapper and improved.

The required steps (in short):

  1. Create a VM
  2. Attach a virtual disk in normal mode to register it
  3. Handle errors like the "4.0 or later" error ("closemedium"; see below)
  4. Disconnect the disk from the VM but leave it in the registry
  5. Reconnect the disk in multiattach mode ("storageattach")

For more details see:
#4602
#4603

Long term experience with LHC@home shows that errors like these happen more often than expected:

Command: VBoxManage -q storageattach "boinc_086a07090bec38dd" --storagectl "Hard Disk Controller" --port 0 --device 0 --type hdd --mtype multiattach --medium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07_prod.vdi"
Exit Code: -2135228409
Output:
VBoxManage: error: Cannot attach medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07_prod.vdi': the media type 'MultiAttach' can only be attached to machines that were created with VirtualBox 4.0 or later
VBoxManage: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component SessionMachine, interface IMachine, callee nsISupports
VBoxManage: error: Context: "AttachDevice(Bstr(pszCtl).raw(), port, device, DeviceType_HardDisk, pMedium2Mount)" at line 781 of file VBoxManageStorageController.cpp
Command:
VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi" 
Output:
VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi' because it has 2 child media
VBoxManage: error: Details: code VBOX_E_OBJECT_IN_USE (0x80bb000c), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "Close()" at line 1875 of file VBoxManageDisk.cpp

They usually happen when

  • the parent disk is currently not attached to a VM but still registered and
  • multiple fresh VMs start concurrently and try to go through the workaround (step 3 from above)

VirtualBox processes all incoming commands FIFO, hence "closemedium" and "storageattach" (normal/multiattach) from different vboxwrappers can easily be mixed.

Suggested Solution

This PR introduces a lock to protect the critical code sections.
All vboxwrapper instances trying to modify the same parent disk entry must own the lock or wait.
This is important when a VM gets registered as well as when a VM gets deregistered.

Also included in this PR

  • some smaller modifications to improve error number reporting
  • srand() has been moved at the beginning of main()

Tests done

Several series of concurrently starting VMs (CMS subproject from LHC@home)

  • host: Linux, 16c/32t Threadripper, 128 GB RAM
  • up to 30 VMs concurrently
  • all were starting up within 7-8 s
  • Lock request loops were called up to 5-7 times (max)

@computezrmle computezrmle marked this pull request as draft April 4, 2024 17:38
@computezrmle computezrmle changed the title Add files via upload Add race mitigation lock to vboxwrapper Apr 4, 2024
@computezrmle
Copy link
Contributor Author

While it builds fine on my local Linux system I get these errors on the github build system:

vbox_vboxmanage.o: In function `VBOX_VM::remove_race_mitigation_lock(int&, std::string&)':
/__w/boinc/boinc/samples/vboxwrapper/vbox_vboxmanage.cpp:2315: undefined reference to `shm_unlink'
vbox_vboxmanage.o: In function `VBOX_VM::set_race_mitigation_lock(int&, std::string&, std::string const&)':
/__w/boinc/boinc/samples/vboxwrapper/vbox_vboxmanage.cpp:2415: undefined reference to `shm_open'
vbox_vboxmanage.o: In function `VBOX_VM::remove_race_mitigation_lock(int&, std::string&)':
/__w/boinc/boinc/samples/vboxwrapper/vbox_vboxmanage.cpp:2315: undefined reference to `shm_unlink'
collect2: error: ld returned 1 exit status

Any idea what's wrong?

@AenBleidd
Copy link
Member

@computezrmle, these two builds are building on Ubuntu 13.04, because we still need to provide the binaries for the old libc version:
image
I believe, it works on your machine because you're using more fresh version of one of the libraries

@computezrmle
Copy link
Contributor Author

@AenBleidd
I'm using openSUSE Tumbleweed [6.7.4-1-default|libc 2.39]
Can I do anything to solve this?

@AenBleidd
Copy link
Member

@computezrmle, try to add -lrt here at the very end between -lboinc and $(STDCPPTC)

$(CXX) $(CXXFLAGS) $(CPPFLAGS) $(LDFLAGS) -o vboxwrapper$(VBOXWRAPPER_RELEASE_SUFFIX) vboxwrapper.o vbox_common.o vbox_vboxmanage.o vboxcheckpoint.o vboxjob.o vboxlogging.o floppyio.o $(MAKEFILE_LDFLAGS) -lboinc_api -lboinc $(STDCPPTC)

@computezrmle computezrmle marked this pull request as ready for review April 5, 2024 06:01
Makes log output look like other records in 'vbox_trace.txt'
@computezrmle
Copy link
Contributor Author

A case is not handled correctly in is_disk_image_registered() when a VM restarts.
Set the PR to draft to work out a solution.

@computezrmle computezrmle marked this pull request as draft April 10, 2024 16:43
@codecov
Copy link

codecov bot commented Apr 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 10.54%. Comparing base (6cd441f) to head (85bc766).
Report is 38 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #5571      +/-   ##
============================================
- Coverage     10.80%   10.54%   -0.26%     
  Complexity     1068     1068              
============================================
  Files           279      279              
  Lines         36294    35869     -425     
  Branches       8409     8409              
============================================
- Hits           3920     3781     -139     
+ Misses        31980    31694     -286     
  Partials        394      394              

see 90 files with indirect coverage changes

@computezrmle
Copy link
Contributor Author

A reliable volunteer from LHC@home and I tested the artifact from yesterday with real project tasks on Windows and Linux.
They work as expected and solve the issues addressed in the first comment of this PR.
Thus, it's ready for review now.

@computezrmle computezrmle marked this pull request as ready for review April 23, 2024 13:59
@computezrmle computezrmle requested a review from AenBleidd April 23, 2024 20:25
@AenBleidd AenBleidd merged commit 674164a into BOINC:master Apr 24, 2024
@computezrmle computezrmle deleted the add_race_mitigation_lock branch April 24, 2024 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

2 participants