Fix isolate options #913

gollux · 2018-03-24T14:53:08Z

I reviewed how current CMS calls isolate and there are several problems with it:

(1) When using cgroups (which is currently the CMS default), --cg-timing is not specified, so the time limit does not work properly.

(2) When not using cgroups, memory limit is still passed as --cg-mem, which does not work.

(3) We do not set --quota, so unless root set up FS quotas for all UIDs used by isolate, the disk space consumed by the solution is not limited at all.

This pull request solves (1) and (2). I would like to know your opinion on how to specify the quota: should I add a setting in the config file for that? Or a more general config setting for adding arbitrary isolate options (--extra-time could be handy, too)?
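
For readers following along, points (1) and (2) come down to choosing the timing and memory flags consistently with the cgroup setting. The following is only a sketch of that selection: the function name `isolate_resource_args` and its parameters are hypothetical and are not taken from cms/grading/Sandbox.py, and the memory unit is assumed to be whatever isolate expects for `--mem` / `--cg-mem`.

```python
def isolate_resource_args(use_cgroups, time_limit_s=None, memory_limit=None):
    """Return isolate arguments for the time and memory limits.

    Hypothetical helper for illustration only; not the CMS implementation.
    """
    args = []
    if use_cgroups:
        # Problem (1): with --cg, also ask for --cg-timing so that the
        # time limit covers every process in the control group.
        args += ["--cg", "--cg-timing"]
    if time_limit_s is not None:
        args.append("--time=%g" % time_limit_s)
    if memory_limit is not None:
        # Problem (2): --cg-mem only works together with --cg;
        # without cgroups the limit has to be passed as --mem.
        flag = "--cg-mem" if use_cgroups else "--mem"
        args.append("%s=%d" % (flag, memory_limit))
    return args
```

With cgroups enabled this yields `--cg --cg-timing --time=... --cg-mem=...`; with cgroups disabled it yields `--time=... --mem=...` and no cgroup flags at all.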


codecov · 2018-03-24T15:00:47Z

Codecov Report

Merging #913 into master will increase coverage by <.01%.
The diff coverage is 75%.

@@            Coverage Diff             @@
##           master     #913      +/-   ##
==========================================
+ Coverage   61.08%   61.08%   +<.01%     
==========================================
  Files         203      203              
  Lines       16938    16940       +2     
==========================================
+ Hits        10346    10348       +2     
  Misses       6592     6592

Flag	Coverage Δ
#functionaltests	`47.14% <75%> (+0.05%)`	⬆️
#unittests	`37.43% <0%> (-0.02%)`	⬇️

Impacted Files	Coverage Δ
cms/grading/Sandbox.py	`67.01% <75%> (+0.29%)`	⬆️
cms/io/PsycoGevent.py	`80.48% <0%> (-4.88%)`	⬇️
cms/db/base.py	`84.25% <0%> (-3.71%)`	⬇️
cms/service/Worker.py	`80% <0%> (-3.16%)`	⬇️
cms/grading/Job.py	`90.38% <0%> (-0.97%)`	⬇️
cms/service/EvaluationService.py	`68.97% <0%> (-0.77%)`	⬇️
cms/service/ProxyService.py	`70.2% <0%> (-0.51%)`	⬇️
cms/server/admin/handlers/base.py	`68.3% <0%> (+0.32%)`	⬆️
cms/service/workerpool.py	`61.57% <0%> (+0.49%)`	⬆️
... and 5 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0591fe7...fff8912.

stefano-maggiolo · 2018-03-24T17:07:24Z

Thanks for looking at the options! As you said, use_cgroup is always enabled so the second commit doesn't change the behaviour (still appreciated!). I'm curious about the first, since we have tests that should pick up if timing is not checked appropriately. Is it only wrong when using multiple processes/threads? In which direction?

gollux · 2018-03-24T17:10:21Z

> As you said, use_cgroup is always enabled so the second commit doesn't change the behaviour (still appreciated!).

Actually, it is enabled by default and it is possible to switch it off in CMS configuration.

> I'm curious about the first, since we have tests that should pick up if timing is not checked appropriately. Is it only wrong when using multiple processes/threads? In which direction?

Yes, it is off only with multiple processes (maybe also threads). It measures only the run time of the main process, not of all processes in the cgroup.

stefano-maggiolo · 2018-03-24T22:13:36Z

I cannot reproduce this; can you help me understand? (I am trying to figure out whether this justifies backporting and releasing a new version for 1.3.)

With python & threading:

import sys
import threading
import time

def w():
    t = time.time()
    i = 0
    while time.time() - t < 1:
        i += 1

n = int(sys.stdin.readline().strip())
t = threading.Thread(target=w)
t.start()
t.join()
sys.stdout.write("correct %d\n" % n)

$ echo 1 | time python /tmp/batchstdio.py 
correct 1
1.04user 0.01system 0:01.06elapsed 99%CPU (0avgtext+0avgdata 9884maxresident)k

CMS: Execution timed out | (0.588 s) (0.633 s) (3 MiB)

With C & fork:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

int main() {
    int n;

    pid_t ret = fork();

    if (ret == 0) {
      int i = 0;
      struct timespec start, end;
      clock_gettime(CLOCK_MONOTONIC_RAW, &start);
      while(1) {
        i++;
        clock_gettime(CLOCK_MONOTONIC_RAW, &end);
        long long delta_us = (end.tv_sec - start.tv_sec) * 1000000 + (end.tv_nsec - start.tv_nsec) / 1000;
        if (delta_us > 1000000) {
          break;
        }
      }
      if (i < 10000) {
        printf("slow\n");
      }
    } else {
      if (ret != -1) {
        waitpid(ret, NULL, 0);
        scanf("%d", &n);
        printf("correct %d\n", n);
      }
    }

    return 0;
}

$ echo 1 | time /tmp/fork 
correct 1
0.11user 0.88system 0:01.00elapsed 99%CPU (0avgtext+0avgdata 2672maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

CMS: Execution timed out | 0 | (1.004 s) (1.055 s) (0 MiB)

(Note that these do not hit the wall-time limit, which is 2 s.)

gollux · 2018-03-25T11:03:32Z

Your Python test with threads times out properly (Linux kernel adds execution times of all threads), but the C test does not:

***@***.***:~/a/tree/isolate$ isolate -v -p10 --cg --run -t 0.5 -M- -- ./fork
Using control group box-0 under parent .
Entering control group box-0
Started proxy_pid=13261 box_pid=13262 box_pid_inside_ns=2
Binding ./box on box (flags 1006)
Binding /bin on bin (flags 1007)
Binding /dev on dev (flags 1003)
Binding /lib on lib (flags 1007)
Binding /lib64 on lib64 (flags 1007)
Mounting proc on proc (flags 5)
Binding /usr on usr (flags 1007)
Binding //etc on etc (flags 1007)
[timer][timer][timer][timer][timer][timer][timer][timer][timer][timer]correct 1
time:1.004
time-wall:1.052
max-rss:1376
csw-voluntary:7
csw-forced:2
cg-mem:5028
status:TO
message:Time limit exceeded
Time limit exceeded

Please observe that the time limit was reported only after the program finished.

stefano-maggiolo · 2018-03-25T14:03:07Z

So it's more a matter of detecting the TO early, rather than detecting it at all?

gollux · 2018-03-25T16:45:23Z

> So it's more a matter of detecting the TO early, rather than detecting it at all?

It is a little bit unexpected, but yes :-) It turns out that the wait4() syscall does not report the resource usage of the finished process only; it also adds the times of all its finished children. This does not seem to be documented anywhere, but the kernel source is clear.

However, the limit can still be circumvented. Suppose you have a task with a time limit of 1 s. You fork two processes: process A computes for 900 ms; process B computes for 900 ms and then loops forever. The main process waits for A and then exits, leaving B running. Process B is killed by the kernel when it tears down the PID namespace, but by then it is too late: only the time spent in A was included in the total time of the parent process. Hence you spent 1800 ms computing, but Isolate sees only 900 ms.

I would really like to have this fix backported to the 1.3 branch.
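
The accounting described above can be observed from user space. The sketch below is an illustration written for this thread, not code from isolate or CMS; it assumes Linux, and the helper names `burn_cpu` and `cpu_reported_for_child` are made up. A child forks a grandchild that burns CPU; the time wait4() reports for the child includes the grandchild only if the child waited for it.

```python
import os
import time


def burn_cpu(seconds):
    """Spin until this process has used about `seconds` of CPU time."""
    start = time.process_time()
    while time.process_time() - start < seconds:
        pass


def cpu_reported_for_child(burn_seconds, wait_for_grandchild):
    """Fork child -> grandchild; return the CPU seconds wait4() reports
    for the child (user + system time)."""
    child = os.fork()
    if child == 0:
        try:
            grandchild = os.fork()
            if grandchild == 0:
                try:
                    burn_cpu(burn_seconds)
                finally:
                    os._exit(0)
            if wait_for_grandchild:
                # Reaping the grandchild adds its CPU time to the
                # child's own totals (assumed Linux behaviour).
                os.waitpid(grandchild, 0)
        finally:
            # Never return into the caller's code from a forked process.
            os._exit(0)
    _, _, usage = os.wait4(child, 0)
    return usage.ru_utime + usage.ru_stime


if __name__ == "__main__":
    print("waited:   %.3f s" % cpu_reported_for_child(0.3, True))
    print("unwaited: %.3f s" % cpu_reported_for_child(0.3, False))
```

The second call mirrors process B in the scenario: the grandchild keeps computing after the child has exited, so its CPU time never shows up in what the parent sees. Accounting per control group does not depend on who waits for whom.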

stefano-maggiolo · 2018-03-25T19:55:19Z

Thank you, merged in both and we'll release a 1.3.1 in O(days), possibly with the quota fix too.

Speaking of quota. We use -f to limit the size of each file written (default to 1GB), and we have strict control on what files are writable (at most 1 for Batch, 0 for Communication and TwoSteps). Isn't that enough?

gollux · 2018-03-26T08:19:23Z

> Thank you, merged in both and we'll release a 1.3.1 in O(days), possibly with the quota fix too.

Thanks! Since this has shown that isolate's parsing of options is not robust enough, I plan to change it a bit:

* --cg-mem and similar CG-related options will be refused in non-CG mode.
* --cg-timing will be the default in --cg mode. It is doubtful whether non-CG timing is ever useful with CGs, but if somebody still wants it, --no-cg-timing will be available.
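
To make the two planned rules concrete, here is a small model of them. It is only a sketch: isolate itself is written in C, and the function `normalize_isolate_options` and its dictionary result are hypothetical names invented for this illustration.

```python
def normalize_isolate_options(cg, cg_mem=None, cg_timing=None):
    """Model of the planned option rules (hypothetical, not isolate's code).

    cg_timing is None when the user gave no explicit choice, True when
    cgroup timing was requested, False when it was explicitly disabled.
    """
    if not cg:
        # Rule 1: cgroup-related options are refused outside --cg mode.
        if cg_mem is not None or cg_timing:
            raise ValueError("cgroup-related options require --cg")
        return {"cg": False, "cg_mem": None, "cg_timing": False}
    # Rule 2: in --cg mode, cgroup timing is on unless explicitly disabled.
    if cg_timing is None:
        cg_timing = True
    return {"cg": True, "cg_mem": cg_mem, "cg_timing": cg_timing}
```

Under this model, forgetting to ask for cgroup timing is harmless, and passing a cgroup memory limit without --cg fails loudly instead of being silently ignored.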

> Speaking of quota. We use -f to limit the size of each file written (default to 1GB), and we have strict control on what files are writable (at most 1 for Batch, 0 for Communication and TwoSteps). Isn't that enough?

I looked at the IsolateSandbox class once again and I see no way to circumvent the file size limit, so no urgent fix is needed.

In the long term, I am not very comfortable with the current mechanism. At first sight it seems to fit the needs, but it has many sharp corners: for example, the tested program cannot write something to the output file, then delete it, and start writing again. Of course, this is not a common access pattern, but it is obviously not forbidden by the usual contest rules, so it should work. Also, it is bold to assume that no library function will ever need to create a temporary file.

For these reasons, I would prefer to use the directory created by Isolate itself when initializing the sandbox. You can create the input files and the executable there and have "isolate --run" change the owner of all files to the UID of the sandboxed program. Hence, the program will have a fully writable directory, limited only by the filesystem quota. Finally, Isolate will change the owner back to the caller's UID, so CMS will have full access to the files. Does this make sense?

stefano-maggiolo · 2018-03-26T08:53:22Z

Good idea on avoiding incompatible combinations of options.

For quota: I think we want a subdirectory of isolate's dir to be fully writable by the program, since we write other stuff in the main dir, such as run.log and commands.log. I agree with the idea; the quota limit seems fairer and a tiny bit safer than the current approach (but again, I'm far from being an expert).

In that case, I think we can have a reasonable default stored in an undocumented option similar to cgroup, so that it's easy to change in case of peculiar needs. I'm going to introduce some more of those "reasonable default" options soon to limit trusted executions, like the checker. (We should probably document those options in a "you shouldn't change those, but in case you need to, here they are" section.)

As you said, it seems less important for 1.3 though.

stefano-maggiolo · 2018-03-26T08:53:44Z

I'm going to close this and create an issue to track quota.

Using cgroups without --cg-timing makes little sense and it is too easy to forget to ask for --cg-timing explicitly. Anyone who wishes to use the single-process time limit mechanism with CG can make that wish explicit with --no-cg-timing. For more background, see cms-dev/cms#913.

gollux added 2 commits March 24, 2018 15:47

Sandbox: Use isolate with --cg-timing if cgroup are enabled

c1224c3

Sandbox: Isolate requires --mem if cgroups are disabled, --cg-mem if enabled

fff8912

stefano-maggiolo closed this Mar 26, 2018

stefano-maggiolo mentioned this pull request Mar 26, 2018

Use isolate's quota option instead of fsize #916

Open
