Convert to accelerated image within a docker container? #136

Open
maxwolffe opened this issue Sep 1, 2022 · 25 comments

Comments

@maxwolffe

Is it possible to build overlaybd accelerated images from within a container?

I see that buildkit is experimentally supported (https://github.com/data-accelerator/buildkit) and can be run in a container (https://github.com/data-accelerator/buildkit#containerizing-buildkit), but I also see that the accelerator layer is not supported (https://github.com/data-accelerator/buildkit#containerizing-buildkit).

Is there another path for building or converting overlaybd images within a container?

@maxwolffe changed the title from "Convert to accelerated image within a docker container" to "Convert to accelerated image within a docker container?" on Sep 1, 2022
@liulanzheng
Member

liulanzheng commented Sep 2, 2022

@maxwolffe Building an overlaybd image can be done in a container under specific conditions.
First, the container must be privileged and have /dev mounted into it, because data has to be written to tcmu devices.
Second, the overlaybd service must run either on the host or in a privileged container. Only one instance per host is allowed; multiple overlaybd backstores are not supported.
Could you describe your environment and requirements?
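
The first condition might translate to a `docker run` invocation along these lines. This is only a sketch: the image name `my-obd-builder` is a hypothetical placeholder, and the helper prints the command instead of running it, so it can be reviewed first. The overlaybd service itself still has to be running on the host or in a privileged container, as described above.

```shell
#!/bin/sh
# Print (not run) a docker command for a privileged container that has the
# host's /dev mounted inside it.
# $1: image name (hypothetical placeholder; substitute your own build image).
obd_container_cmd() {
    image="$1"
    printf 'docker run --rm --privileged -v /dev:/dev %s\n' "$image"
}

obd_container_cmd my-obd-builder
```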

@shuaichang

shuaichang commented Sep 6, 2022

Another option is for overlaybd-ctr to be independent of the overlaybd daemons (tcmu and snapshotter) and to require no containerd config change: https://github.com/containerd/accelerated-container-image/blob/main/docs/IMAGE_CONVERTOR.md

# bin/ctr supports image conversion without requiring overlaybd-tcmu and overlaybd-snapshotter, or starts overlaybd-tcmu ondemand during conversion.
sudo bin/ctr obdconv registry.hub.docker.com/library/redis:6.2.1 registry.hub.docker.com/overlaybd/redis:6.2.1_obd_new

The use case is that we want to build overlaybd images while introducing minimal changes to our container image release pipeline.

@lihuiba

lihuiba commented Sep 6, 2022

@liulanzheng In principle, a simple, pure command-line tool with no dependency on a container engine could do the conversion in a container. We'd better have such a tool, as @shuaichang has suggested.

@liulanzheng
Member

Putting everything into one command-line tool requires a considerable amount of work. Actually, we can make overlaybd run in containers, even if that doesn't seem as formal as other tcmu backstores. It can work very well in a controlled environment.
Based on containerized overlaybd, we can make a simple Golang conversion tool that covers pulling, conversion, and pushing, and most of the code can be reused from containerd.

@lihuiba

lihuiba commented Sep 15, 2022

@shuaichang @maxwolffe We have found a solution to overcome the problems that prevent an all-in-one tool for image conversion. We'll try it later. And participation is welcome!

@shuaichang

@lihuiba that's great to know, it will be very helpful. Any guess as to when we can try the conversion tool?

@liulanzheng
Member

@shuaichang we are focusing on this new approach, but we have spent some time exploring. At present, I expect it to be completed by the end of this month or early next month.

@maxwolffe
Author

This is awesome news @liulanzheng ! Any update we can follow or help we can offer? We'd love to help pilot this.

@liulanzheng
Member

@maxwolffe A preliminary version will be released in the next day or two, but it is not perfect: some formats may not be supported yet. We will continue to improve it, and we can improve it together.

@maxwolffe
Author

Amazing! Looking forward to it!

@liulanzheng
Member

@maxwolffe sorry, there's some bad news: we encountered some problems in functional testing. We know how to solve these problems, but the new implementation involves thousands of lines of new code, so it will take extra time.

@liulanzheng
Member

liulanzheng commented Oct 19, 2022

@maxwolffe please refer to USERSPACE_CONVERTOR

It is not yet fully complete; feedback and questions are welcome.

@yuchen0cc and @WaberZhuang are the authors; any problems can be reported here.

@maxwolffe
Author

Great! Thanks @liulanzheng and team! Excited to try it out.

@maxwolffe
Author

@liulanzheng - thanks again for your help getting this out.

I've started playing with this but encountered an issue that I'm hoping you can help me debug (happy to open a separate issue if helpful).

When I run the convertor on an image which I can successfully download (from a private repository), I get a "failed to extract" error.

> sudo bin/convertor -r harbor-xxx/main/universe/kata-installer -u username:password -i latest -o latest_obd

INFO[0002] downloaded layer 0
ERRO[0003] run with error: failed to overlaybd apply for layer 0: failed to apply tar to overlaybd: 2022/11/01 06:31:24|INFO |th=000055DA42F37C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2022/11/01 06:31:24|INFO |th=000055DA42F37C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2022/11/01 06:31:24|INFO |th=000055DA42F37C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:135|read_global_config_and_set:using config /etc/overlaybd/overlaybd.json
2022/11/01 06:31:24|INFO |th=000055DA42F37C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:152|read_global_config_and_set:set audit_path:/var/log/overlaybd-audit.log
2022/11/01 06:31:24|INFO |th=000055DA42F37C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:160|read_global_config_and_set:set log_level:1
2022/11/01 06:31:24|INFO |th=000055DA42F37C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:163|read_global_config_and_set:set log_path:/var/log/overlaybd.log
failed to extract
: exit status 255

@liulanzheng
Member

@yuchen0cc

@yuchen0cc
Contributor

@maxwolffe please set overlaybd-apply log level to 'debug' to get more info.
https://github.com/data-accelerator/overlaybd-apply/blob/main/src/tools/overlaybd-apply.cpp#L64

set_log_output_level(0);

@maxwolffe
Author

@yuchen0cc - thanks friend. I made that change (and included my own little change to confirm that the logging change was picked up). Here is the new output (it looks very similar to the previous output):

    set_log_output_level(0);
    LOG_INFO("Logging changes included");
    photon::init(photon::INIT_EVENT_DEFAULT, photon::INIT_IO_DEFAULT);
INFO[0002] downloaded layer 0
ERRO[0004] run with error: failed to overlaybd apply for layer 0: failed to apply tar to overlaybd: 2022/11/02 03:46:41|INFO |th=0000000000000000|/home/max.wolffe/overlaybd-apply/src/tools/overlaybd-apply.cpp:65|main:Logging changes included
2022/11/02 03:46:41|INFO |th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2022/11/02 03:46:41|INFO |th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2022/11/02 03:46:41|DEBUG|th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/photon-src/net/curl.cpp:227|libcurl_init:libcurl version libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
2022/11/02 03:46:41|INFO |th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:169|read_global_config_and_set:using config /etc/overlaybd/overlaybd.json
2022/11/02 03:46:41|INFO |th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:186|read_global_config_and_set:set audit_path:/var/log/overlaybd-audit.log
2022/11/02 03:46:41|INFO |th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:194|read_global_config_and_set:set log_level:1
2022/11/02 03:46:41|INFO |th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:197|read_global_config_and_set:set log_path:/var/log/overlaybd.log
failed to extract
: exit status 255

@yuchen0cc
Contributor

yuchen0cc commented Nov 2, 2022

@maxwolffe sorry, the setting is overwritten by the following config in 'imgservice'. Please set the log level in the config file instead. The config is stored in '/etc/overlaybd/overlaybd.json' by default.

{
    "logLevel": 0,
    ......
}

Besides the log printed to the terminal, more logs are written to '/var/log/overlaybd.log'.
P.S. Pull the newest overlaybd commits to avoid unused debug info.
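
One possible way to flip that setting from the command line is sketched below. It assumes the `logLevel` key is already present with a numeric value, and it prints the edited config to stdout instead of modifying the file in place. The helper name `set_obd_log_level` is made up for this example, and a JSON-aware tool would be more robust than `sed` for anything less regular.

```shell
#!/bin/sh
# Print a copy of an overlaybd config with "logLevel" set to the given value.
# Assumption: the key already exists and its current value is a plain number.
# $1: path to the config, e.g. /etc/overlaybd/overlaybd.json
# $2: new level (0 is the debug level used in this thread)
set_obd_log_level() {
    config="$1"
    level="$2"
    sed -E "s/\"logLevel\"[[:space:]]*:[[:space:]]*[0-9]+/\"logLevel\": ${level}/" "$config"
}
```

For example, `set_obd_log_level /etc/overlaybd/overlaybd.json 0 > /tmp/overlaybd.json` writes an edited copy that can be reviewed before it replaces the original.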

@maxwolffe
Author

@yuchen0cc - thanks for the pointer there. I updated that and encountered the same error, with the same stderr/stdout output. When I look in overlaybd.log, I see the following:

max.wolffe@ip-10-110-26-78:~/accelerated-container-image$ grep "06:41" /var/log/overlaybd.log | wc -l
31939
max.wolffe@ip-10-110-26-78:~/accelerated-container-image$ tail -n 10 /var/log/overlaybd.log
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:906|pwrite:insert segment: Segment[125829120,8]--> Mapping[125829128,0,1]
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:881|pwrite:{offset:66571993088,length:4096}
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:906|pwrite:insert segment: Segment[130023424,8]--> Mapping[130023432,0,1]
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:881|pwrite:{offset:66572054528,length:4096}
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:906|pwrite:insert segment: Segment[130023544,8]--> Mapping[130023552,0,1]
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:881|pwrite:{offset:4096,length:32768}
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:906|pwrite:insert segment: Segment[8,64]--> Mapping[16,0,1]
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:881|pwrite:{offset:1024,length:1024}
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:906|pwrite:insert segment: Segment[2,2]--> Mapping[10,0,1]
2022/11/09 06:41:09|ERROR|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/untar/libtar.cpp:275|extract_all:extract failed, filename usr/lib64/python2.7/unittest/, No space left on device

Here's my current storage device usage:

max.wolffe@ip-10-110-26-78:~/accelerated-container-image$ df -H
Filesystem       Size  Used Avail Use% Mounted on
udev              33G     0   33G   0% /dev
tmpfs            6.6G  840k  6.6G   1% /run
/dev/nvme0n1p1   521G   65G  456G  13% /
tmpfs             33G     0   33G   0% /dev/shm
tmpfs            5.3M     0  5.3M   0% /run/lock
tmpfs             33G     0   33G   0% /sys/fs/cgroup
/dev/loop0        51M   51M     0 100% /snap/snapd/17336
/dev/loop1        59M   59M     0 100% /snap/core18/2538
/dev/loop2        50M   50M     0 100% /snap/snapd/16292
/dev/loop3        59M   59M     0 100% /snap/core18/2620
/dev/loop4        26M   26M     0 100% /snap/amazon-ssm-agent/6312
/dev/loop5        27M   27M     0 100% /snap/amazon-ssm-agent/5656
/dev/nvme0n1p15  110M  4.6M  105M   5% /boot/efi
tmpfs            6.6G     0  6.6G   0% /run/user/1000

When I turn off the debug logs I get the following output (complete output for this run):

max.wolffe@ip-10-110-26-78:~/accelerated-container-image$ grep "06:44" /var/log/overlaybd.log
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:171|read_global_config_and_set:global config: cache_dir: /opt/overlaybd/registry_cache, cache_size_GB: 4, cache_type: file
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:227|init:create registryfs with cafile:/etc/ssl/certs/ca-certificates.crt
2022/11/09 06:44:48|INFO |th=00007F1557DCFB00|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/zfile/zfile.cpp:668|load_jump_table:trailer_offset: 4737183, idx_offset: 4207947, idx_bytes: 529236, dict_size: 0, use_dict: 0
2022/11/09 06:44:48|INFO |th=00007F1557DCFB00|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/zfile/compressor.cpp:98|init:create batch buffer, size: 1
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_file.cpp:261|open_lowers:LSMT::open_files_ro(files, 1) success
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_file.cpp:286|open_upper:upper layer : tmp_conv/sha256:a02a4930cb5d36f3290eb84f4bfa30668ef2e9fe3a1fb73ec015fc58b9958b17/writable_index , tmp_conv/sha256:a02a4930cb5d36f3290eb84f4bfa30668ef2e9fe3a1fb73ec015fc58b9958b17/writable_data
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:955|create_mappings:segment size: 0
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:1113|open_file_rw:create LSMTSparseFile object
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:1126|open_file_rw:Layer Info: { UUID:463ED3A8-B15F-456C-A313-C74008B46039 , Parent_UUID: 00000000-0000-0000-0000-000000000000, SparseRW: 1, Virtual size: 68719476736, Version: 1.1 }
2022/11/09 06:44:48|WARN |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:1384|stack_files:STACK FILES WITHOUT CHECK ORDER!!!
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:188|~LSMTReadOnlyFile:pread times: 0, size: 0M
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:188|~LSMTReadOnlyFile:pread times: 0, size: 0M
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_file.cpp:147|start_bk_dl_thread:no need to download
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_file.h:49|ImageFile:new imageFile, bs: 512, size: 68719476736
2022/11/09 06:44:48|WARN |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:190|set_result_file:no resultFile config set, ignore writing result
2022/11/09 06:44:50|ERROR|th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/src/userspace/ext2_utils.h:453|__translate_error:ext2fs unclassified error: at /home/runner/work/overlaybd-apply/overlaybd-apply/src/userspace/user.cpp:236, ecode 2133571366:to be decode
2022/11/09 06:44:50|ERROR|th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/untar/libtar.cpp:275|extract_all:extract failed, filename usr/lib64/python2.7/unittest/, No space left on device

Maybe the cache is running out of space?

The image I'm attempting to convert is 552MB.
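
With debug logging on, the log above grew by roughly 32,000 lines in a single minute, so filtering on the level field can be a quicker way to find the failing step than `tail`. A minimal sketch, assuming the `date time|LEVEL|th=...|file:line|func:message` layout seen in the output above (the helper name `obd_problems` is made up for this example):

```shell
#!/bin/sh
# Print only WARN and ERROR lines from an overlaybd-style log.
# The level is the second "|"-separated field; INFO and WARN are padded
# with a trailing space, so the match is anchored at the start only.
# $1: path to the log, e.g. /var/log/overlaybd.log
obd_problems() {
    awk -F'|' '$2 ~ /^(WARN|ERROR)/' "$1"
}
```

Running `obd_problems /var/log/overlaybd.log | tail -n 20` would then show just the most recent warnings and errors.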

@yuchen0cc
Contributor

yuchen0cc commented Nov 9, 2022

@maxwolffe "no space left" refers to our lsmt block device, whose default size is 64 GB per layer. However, it may also be a bug in extfs.
About the test image: 1. What's the uncompressed size of the image? 2. How many layers does it have? 3. Is it a public image (or can you make it public) for us to debug?
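
The earlier log reports `Virtual size: 68719476736` for the layer, which lines up with the 64 GB default mentioned here. A quick check of that arithmetic (the helper name `bytes_to_gib` is only for illustration):

```shell
#!/bin/sh
# Convert a byte count to whole GiB using integer shell arithmetic.
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

bytes_to_gib 68719476736   # prints 64
```

Since the `df -H` output above shows 456G available on `/`, the error is reported against this virtual device rather than the host disk.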

@northtyphoon
Contributor

I ran into the same issue. You can repro it using the Docker Hub image jupyter/all-spark-notebook:latest. It threw "No space left on device" when extracting layer 10.

@yuchen0cc
Contributor

@northtyphoon many thanks! We'll try to figure it out.

yuchen0cc added a commit to yuchen0cc/accelerated-container-image that referenced this issue Nov 15, 2022
…ir for each layer, remove unused temp file.

Signed-off-by: yuchen.cc <[email protected]>

containerd#136
@yuchen0cc
Contributor

@maxwolffe @northtyphoon sorry to keep you waiting so long.
We found a bug in mkdir in extfs and fixed it.
There were also problems when using sparse files in lsmt, so we use append files instead.
Using append files may be slower than sparse files, and will take more space while converting.
Please give it a try.

We will continue working on the problems with sparse files...

@northtyphoon
Contributor

Thank you @yuchen0cc

@yuchen0cc
Contributor

yuchen0cc commented Dec 26, 2022

Recently, we fixed the bugs with sparse files and made some efforts to speed up conversion.
@maxwolffe @northtyphoon please give it a try~
