Non-GPU mandelboxes #6987

yanchenm · 2022-07-29T04:59:59Z

Ticket(s) Closed

Closes Make the host-service/backend & mandelboxes work for non-GPU instances #5725

Description
Modify the host service and mandelboxes so that they can run on non-GPU AWS instances.

Implementation

Add a new NOGPU environment variable to both host and containers (automatically set through host-setup and Dockerfile respectively)
Bring back dummy display driver for mandelboxes
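
As a rough illustration of the first point, the sketch below shows one way a setup script could turn a --nogpu option into a NOGPU variable and then pick a display configuration from it. The function names parse_host_setup_args and select_display_driver, and the "dummy"/"nvidia" labels, are assumptions made for this example; they are not taken from setup_host.sh or the Dockerfile.

```shell
# Hypothetical sketch, not the code from this PR.
# Parse setup options and derive NOGPU ("true" or "false").
parse_host_setup_args() {
  NOGPU="false"   # explicit default: assume a GPU instance
  for arg in "$@"; do
    case "$arg" in
      --nogpu) NOGPU="true" ;;
      --localdevelopment) ;;  # accepted here; a real script would act on it
      *)
        echo "Unrecognized option: $arg" >&2
        return 1
        ;;
    esac
  done
  export NOGPU
}

# Choose a display configuration label from NOGPU.
# Anything other than "true" falls back to the NVIDIA path.
select_display_driver() {
  case "${NOGPU:-false}" in
    true) echo "dummy" ;;
    *) echo "nvidia" ;;
  esac
}
```

Calling parse_host_setup_args --localdevelopment --nogpu and then select_display_driver would select the dummy display path under these assumptions.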

Documentation & Tests Added

Testing Instructions

Test that host setup, host service, and mandelboxes work on non-GPU instance:

Spin up r5 class instance
./setup_host.sh --localdevelopment --nogpu
Everything else regarding setup/running mandelboxes should be exactly the same as before, no additional options or manual intervention required
Reboot
Build desired mandelbox image
Start host service
Run mandelbox image
Echo $NOGPU and make sure the value is true
Check /usr/share/whist/display.log and ensure that the X server has launched successfully and not crashed (the last lines of the log should not contain "Server terminated with error (1).")
Check /var/log/whist/display-out.log and look for the success message "Waiting for AwesomeWM to exit, to keep the X server open indefinitely..."
The protocol will not run successfully, but make sure it was started (logs should be present in /var/log/whist)
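
The two log checks in the steps above could be scripted roughly as follows. This is only a sketch: the helper names x_server_ok and display_ready are made up for illustration, and the 20-line window used for "last lines" is an assumption. The log paths and messages are the ones quoted in the steps.

```shell
# Hypothetical helpers for the manual log checks; not part of this PR.

# x_server_ok LOGFILE [TAIL_LINES]
# Returns 0 if the last TAIL_LINES lines (default 20, an assumed window)
# do not contain the X server crash message, 1 if they do,
# and 2 if the log cannot be read.
x_server_ok() {
  log_file="$1"
  tail_lines="${2:-20}"
  [ -r "$log_file" ] || return 2
  if tail -n "$tail_lines" "$log_file" | grep -qF "Server terminated with error (1)"; then
    return 1
  fi
  return 0
}

# display_ready LOGFILE
# Returns 0 if the success message appears anywhere in the log,
# 1 if it does not, and 2 if the log cannot be read.
display_ready() {
  [ -r "$1" ] || return 2
  grep -qF "Waiting for AwesomeWM to exit, to keep the X server open indefinitely..." "$1"
}

# Example usage inside a running mandelbox:
#   x_server_ok /usr/share/whist/display.log &&
#     display_ready /var/log/whist/display-out.log &&
#     echo "display checks passed"
```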

Test that host setup, host service, and mandelboxes still work as expected on GPU instances.

PR Checklist

Did the PR author fully test this PR end-to-end?
Did one PR reviewer fully test this PR end-to-end?
Did one PR reviewer conduct a thorough code design review?

github-actions · 2022-07-29T05:01:51Z

Schema is unchanged, no database migration needed.

Carry on!

codecov · 2022-07-29T05:08:30Z

Codecov Report

Merging #6987 (3559c1a) into dev (de05a3a) will decrease coverage by 0.02%.
The diff coverage is 23.52%.

❗ Current head 3559c1a differs from pull request most recent head 9e1cf16. Consider uploading reports for the commit 9e1cf16 to get more accurate results

@@            Coverage Diff             @@
##              dev    #6987      +/-   ##
==========================================
- Coverage   54.43%   54.41%   -0.03%     
==========================================
  Files         156      156              
  Lines       32089    32109      +20     
==========================================
+ Hits        17467    17471       +4     
- Misses      14358    14374      +16     
  Partials      264      264

| Flag | Coverage Δ | | *Carryforward flag |
|---|---|---|---|
| backend-services | `44.49% <23.52%> (-0.04%)` | ⬇️ | Carriedforward from f5fe0f2 |
| protocol | `56.45% <ø> (-0.02%)` | ⬇️ | |

*This pull request uses carry forward flags. Click here to find out more.

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...ckend/services/host-service/mandelbox/mandelbox.go | `54.21% <0.00%> (ø)` | |
| backend/services/metadata/environment.go | `54.79% <0.00%> (-5.82%)` | ⬇️ |
| backend/services/host-service/spinup.go | `70.93% <44.44%> (-0.57%)` | ⬇️ |
| protocol/client/frontend/virtual/interface.cpp | `15.70% <0.00%> (-0.40%)` | ⬇️ |
| protocol/client/frontend/virtual/impl.c | `20.88% <0.00%> (-0.38%)` | ⬇️ |
| protocol/whist/logging/logging.c | `89.71% <0.00%> (-0.32%)` | ⬇️ |
| protocol/whist/fec/wirehair/WirehairTools.cpp | `77.54% <0.00%> (ø)` | |
| protocol/whist/fec/wirehair/WirehairCodec.cpp | `88.04% <0.00%> (+0.05%)` | ⬆️ |
| backend/services/host-service/host-service.go | `13.62% <0.00%> (+0.35%)` | ⬆️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de05a3a...9e1cf16. Read the comment docs.

mandelboxes/base/Dockerfile.20

host-setup/setup_host.sh

philippemnoel

this looks super clean!! Definitely want a review from @MauAraujo and @rpadaki

mandelboxes/base/Dockerfile.20

host-setup/setup_host.sh

backend/services/Makefile

host-setup/setup_host.sh

rpadaki

Just developer QoL suggestions. I'll spin up a new instance and get testing later today!

@philippemnoel @gabrieleoliaro what do you think of switching the client side of the end-to-end tester over to a non-GPU instance? I think this would (a) save us money and (b) better simulate conditions on a client's machine.

backend/services/metadata/environment.go

host-setup/docker-daemon-config/daemon.json

host-setup/setup_host.sh

mandelboxes/base/Dockerfile.20

mandelboxes/base/display/whist-dummy-display.conf

mandelboxes/scripts/build_mandelbox_image.py

MauAraujo

This is great @yanchenm ! Sorry for the delay in reviewing, but we have been a bit busy these last few days. Left a couple of small comments, I think once we fix them this PR will be good to go. I was able to test on an r5.large instance and start the host service + a mandelbox. I confirmed that the Xserver started correctly and even was able to connect the protocol (it complained about nvidia drivers and Chrome didn't start, but it connected and appeared to stream audio). Thanks for taking care of this and great work :)

backend/services/host-service/metrics/metrics.go

backend/services/metadata/environment.go

mandelboxes/base/Dockerfile.20

rpadaki · 2022-08-08T22:43:17Z

Chrome didn't start because we explicitly make it use the GPU @MauAraujo -- great catch. That's an easy fix; we just need to experiment with Chrome flags.
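
For illustration only, one shape such an experiment could take is to choose Chrome's GPU-related switches from the NOGPU variable. The helper name chrome_gpu_flags is hypothetical, and the specific switches below are assumptions: they are real Chromium command-line switches, but this thread does not say which flags the mandelbox actually passes or which ones ended up working.

```shell
# Hypothetical sketch; the flags actually used by the mandelbox are not
# shown in this thread.
# Print GPU-related Chrome switches depending on NOGPU.
chrome_gpu_flags() {
  case "${NOGPU:-false}" in
    true)
      # No GPU available: ask Chrome not to use hardware acceleration.
      echo "--disable-gpu"
      ;;
    *)
      # GPU instance: assumed example of switches that push Chrome toward the GPU.
      echo "--ignore-gpu-blocklist --enable-gpu-rasterization"
      ;;
  esac
}
```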

philippemnoel · 2022-08-10T10:50:36Z

> This is great @yanchenm ! Sorry for the delay in reviewing, but we have been a bit busy these last few days. Left a couple of small comments, I think once we fix them this PR will be good to go. I was able to test on an r5.large instance and start the host service + a mandelbox. I confirmed that the Xserver started correctly and even was able to connect the protocol (it complained about nvidia drivers and Chrome didn't start, but it connected and appeared to stream audio). Thanks for taking care of this and great work :)

Could we make Chrome start as part of this PR? The protocol doesn't need to start; we can fix that next, and it should be easy.

philippemnoel · 2022-08-10T10:51:21Z

@yanchenm @MauAraujo can we get this PR wrapped up please?

MauAraujo · 2022-08-10T15:28:05Z

> @yanchenm @MauAraujo can we get this PR wrapped up please?

Yes, we just have to solve the remaining E2E issues and it's good to go. Will wrap up today

yanchenm · 2022-08-10T17:47:18Z

Testing Chrome flags to get the launch working, will update

github-actions · 2022-08-11T16:59:56Z

Protocol End-to-End Streaming Test Results

Experiments Summary

✅ Experiment 1 - Bandwidth: Unbounded, Delay: None, Packet Drops: None, Queue limit: default. Download logs:

aws s3 cp s3://whist-e2e-protocol-test-logs/yanchen/non-gpu/2022_08_11@16-30-35/ 2022_08_11@16-30-35/ --recursive

✅ Experiment 2 - Bandwidth: variable between 15Mbit and 30Mbit, Delay: 10 ms, Packet Drops: None, Queue Limit: None, Conditions change over time? Yes, frequency is variable between 1000 ms and 2000 ms. Download logs:

aws s3 cp s3://whist-e2e-protocol-test-logs/yanchen/non-gpu/2022_08_11@16-39-45/ 2022_08_11@16-39-45/ --recursive

✅ Experiment 3 - Bandwidth: variable between 10Mbit and 20Mbit, Delay: 10 ms, Packet Drops: None, Queue Limit: None, Conditions change over time? Yes, frequency is variable between 500 ms and 2000 ms. Download logs:

aws s3 cp s3://whist-e2e-protocol-test-logs/yanchen/non-gpu/2022_08_11@16-44-05/ 2022_08_11@16-44-05/ --recursive

✅ Experiment 4 - Bandwidth: 10Mbit, Delay: 10 ms, Packet Drops: None, Queue Limit: 100 packets, Conditions change over time? No. Download logs:

aws s3 cp s3://whist-e2e-protocol-test-logs/yanchen/non-gpu/2022_08_11@16-48-25/ 2022_08_11@16-48-25/ --recursive

Full Results: link here

philippemnoel

lgtm!

Signed-off-by: Philippe Noël <[email protected]>

This reverts commit d608bf8.

yanchenm requested review from rpadaki, MauAraujo, philippemnoel and goliaro as code owners July 29, 2022 05:00

github-actions bot added 📁 Repo: services This PR/Issue modifies /services code 📁 Repo: host-setup This PR/Issue modifies /host-setup code 📁 Repo: mandelboxes This PR/Issue modifies /mandelboxes code 📁 Repo: backend This PR/Issue modifies /backend code labels Jul 29, 2022

yanchenm commented Jul 29, 2022

mandelboxes/base/Dockerfile.20 (resolved)

philippemnoel reviewed Jul 29, 2022

host-setup/setup_host.sh (outdated, resolved)

philippemnoel reviewed Jul 29, 2022

mandelboxes/base/Dockerfile.20 (outdated, resolved)

philippemnoel reviewed Jul 29, 2022

host-setup/setup_host.sh (outdated, resolved)

MauAraujo reviewed Jul 29, 2022

backend/services/Makefile (outdated, resolved)

host-setup/setup_host.sh (outdated, resolved)

rpadaki suggested changes Jul 29, 2022

yanchenm requested review from rpadaki and MauAraujo August 4, 2022 19:47

MauAraujo reviewed Aug 8, 2022

backend/services/host-service/metrics/metrics.go (outdated, resolved)

backend/services/metadata/environment.go (outdated, resolved)

backend/services/metadata/environment.go (resolved)

mandelboxes/base/Dockerfile.20 (resolved)

yanchenm requested review from jkarthic, npip99 and sardination as code owners August 8, 2022 22:16

github-actions bot added the 📁 Repo: protocol This PR/Issue modifies /protocol code label Aug 8, 2022

MauAraujo force-pushed the yanchen/non-gpu branch from f27fb23 to 49b1135 Compare August 10, 2022 16:23

yanchenm requested a review from a team August 10, 2022 17:26

yanchenm and others added 15 commits August 11, 2022 09:17

fix typos

8fc6379

fix variable check

effb248

add nogpu flag

4c49b44

add default case for nvidia driver

6844e97

add unrecognized option

0071173

explicit NOGPU false flag

0530102

remove nogpu check from xorg conf update script

93f4fa2

documentation

528c82c

pr comments

bb51798

flip nogpu flag

70378ff

remove unused import

bcf2797

pr comments

a74dd17

add gpu flag to integration test

80d9ccf

add gpu flag to helper

0300d09

Supress hadolint warnings

f5fe0f2

yanchenm dismissed stale reviews from goliaro and MauAraujo via f5fe0f2 August 11, 2022 16:18

yanchenm force-pushed the yanchen/non-gpu branch from 7fbde6d to f5fe0f2 Compare August 11, 2022 16:18

add unit tests for IsGPU

279cc3a

yanchenm requested a review from MauAraujo August 11, 2022 17:05

philippemnoel reviewed Aug 12, 2022

Merge branch 'dev' into yanchen/non-gpu

9e1cf16

Signed-off-by: Philippe Noël <[email protected]>

philippemnoel merged commit d608bf8 into dev Aug 12, 2022

philippemnoel deleted the yanchen/non-gpu branch August 12, 2022 14:16

rpadaki added a commit that referenced this pull request Aug 12, 2022

Revert "Non-GPU mandelboxes (#6987)"

a8ce09a

This reverts commit d608bf8.

rpadaki mentioned this pull request Aug 12, 2022

Revert "Non-GPU mandelboxes" #7043

Merged

rpadaki added a commit that referenced this pull request Aug 12, 2022

Revert "Non-GPU mandelboxes (#6987)"

5350f78

This reverts commit d608bf8.

rpadaki restored the yanchen/non-gpu branch August 12, 2022 18:36

rpadaki mentioned this pull request Oct 20, 2022

Fix LTR crashes on FFmpeg SW encoder #7464

Open
