Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First iteration of etc passthrough #850

Closed
wants to merge 8 commits into from

Conversation

oshogbo
Copy link
Contributor

@oshogbo oshogbo commented Aug 17, 2022

Description of the changes

The first etc passthrough file is the /etc/hostname. Gramine reads the option to global PAL state, and LibOS uses it to create a pseudo file with its content. Gramine sanitizes hostname. It requires that it is a valid domain. This is a difference from Linux, as Linux accepts any hostname value. That said, Linux doesn't assume that the root user tries to exploit it through hostname, and Gramine should.

Gramine uses pseudofs for etc passthrough.

From what I can see, most applications use the uname(2) syscall to obtain a current hostname, for example: gethostame.
The /etc/hostname is read during startup, and the sethostname(2) is called. For example, this is done on Ubuntu 20.04 with /etc/init/hostname.conf script.

This change doesn't address the support of the sethostname(2). The value in /etc/hostname and hostname may differ (on Linux sethostname(2) syscall changes only kernel value but not the value in /etc/hostname). We probably should separately propagate the current value of the hostname. We are also missing a mechanism for propagating the hostname across multiple processes. We might also decide just to remove sethostname(2) completely. My understanding is that we have it only to enable LTP tests.

Issue

#689

Next step

  • Add support for resolv.conf

How to test this PR?

New tests added.


This change is Reviewable

The first `etc` passthrough file is the `/etc/hostname`. Gramine reads
the option to global PAL state, and LibOS uses it to create a pseudo file
with its content. Gramine sanitizes hostname. It requires that it is a valid
domain. This is a difference from Linux, as Linux accepts any hostname value.
That said, Linux doesn't assume that the root user tries to exploit it through
hostname, and Gramine should.

Gramine uses pseudofs for `etc` passthrough.

Signed-off-by: Mariusz Zaborski <[email protected]>
Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 30 of 30 files at r1, all commit messages.
Reviewable status: all files reviewed, 48 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @oshogbo)


-- commits line 2 at r1:
I'd prefer the name of the file in the one-liner. So maybe append (currently only 'hostname')


-- commits line 5 at r1:
the option -> the host file


Documentation/manifest-syntax.rst line 152 at r1 (raw file):

- ``/etc/hostname``

We should also add what happens if the user specifies /etc/hostname through this manifest option and through e.g. sgx.trusted_files (iiuc, the new manifest option wins and overrides the host file).


libos/include/libos_internal.h line 156 at r1 (raw file):

long pal_to_unix_errno(long err);

long libos_set_hostname(char* name, size_t len);

I would prefer set_hostname(). We don't add libos_ prefix to any functions other than syscall-entrypoint functions, so the current name is confusing to me.


libos/src/libos_init.c line 427 at r1 (raw file):

    RUN_INIT(init_mount);
    RUN_INIT(init_std_handles);
    RUN_INIT(init_mount_etc);

It's important that init_mount_etc goes after init_mount, right? This forces the override of /etc/hostname even if the user specified it in fs.mounts manifest option. I would add a quick comment about this.


libos/src/fs/libos_fs.c line 78 at r1 (raw file):

        goto err;

    if ((ret = init_etc()) < 0)

Why not init_etcfs for uniformity with the above functions?


libos/src/fs/etc/fs.c line 7 at r1 (raw file):

/*
 * This file contains the implementation of `etc` passthrough.

passthrough -> FS?

Here, at this LibOS FS layer, it seems strange to talk about "passthrough" since here the files are not backed by anything from the host, but rather backed by the PAL-state objects.

I would also add that LibOS assumes that contents of all etc files were already sanitized.


libos/src/fs/etc/fs.c line 28 at r1 (raw file):

int init_etc(void) {
    pseudo_add_str(NULL, "hostname", &provide_etc_hostname);

What kind of magic is this? How can the parent_node == NULL (first argument)? What does this actually do?

And how is this connected to init_mount_etc()?

UPDATE: It seems that this creates a "parentless" hostname pseudo-file that can be used as a backing storage for /etc/hostname in the below function. I'm still struggling to find the exact place in the FS code where this connection happens.


libos/src/fs/etc/fs.c line 28 at r1 (raw file):

int init_etc(void) {
    pseudo_add_str(NULL, "hostname", &provide_etc_hostname);

Why do we do this unconditionally? shouldn't we check for g_pal_public_state->passthrough_etc_files? Or we don't care about creating an unused pseudo-file?


libos/src/fs/etc/fs.c line 41 at r1 (raw file):

        .type = "pseudo",
        .path = "/etc/hostname",
        .uri = "hostname",

What is this URI? Is it useless? I would just call it "dummy" then, or set it to NULL (if possible).

UPDATE: Apparently it's not useless, but even after looking at the code, I still don't get the connection between this and "parentless" hostname pseudo-file above.


libos/src/fs/etc/fs.c line 56 at r1 (raw file):

    /* Propagate hostname */
    size_t off = ADD_CP_OFFSET(sizeof(g_pal_public_state->hostname) +
                               sizeof(g_pal_public_state->passthrough_etc_files));

Why do you need to propagate the boolean g_pal_public_state->passthrough_etc_files? It can never change in the child, as it is read from the same manifest file (which cannot be changed).


libos/src/sys/libos_uname.c line 21 at r1 (raw file):

 * effects of set{host,domain}name in process A will not be visible in process B.
 * These syscalls are rarely used and are implemented in Gramine mainly to enable LTP to test
 * our `uname` implementation. */

This comment is not exactly correct now. nodename is propagated to the child on fork (though not synchronized afterwards). Maybe put a comment about this in a separate paragraph.


libos/src/sys/libos_uname.c line 44 at r1 (raw file):

long libos_set_hostname(char* name, size_t len) {
    if (len >= sizeof(g_current_uname.nodename))

Why =? From what I read in the man pages, nodename doesn't need to be null-terminated.


libos/src/sys/libos_uname.c line 54 at r1 (raw file):

long libos_syscall_sethostname(char* name, int len) {
    if (len < 0 || (size_t)len >= sizeof(g_current_uname.nodename))

ditto


libos/test/regression/hostname.c line 3 at r1 (raw file):

#define _DEFAULT_SOURCE BSD /* This is required for gethostname */

#include <sys/wait.h>

This one is not sorted


libos/test/regression/hostname.c line 11 at r1 (raw file):

#include <unistd.h>

static void test_fork(const char* tag, const char* name, void (*f)(const char*, const char*)) {

Could you maybe rename name -> expected_name? I got confused. Here and everywhere in this test.


libos/test/regression/hostname.c line 17 at r1 (raw file):

    if (pid == -1) {
        printf("Unable to fork %s\n", tag);
        exit(1);

Instead of these two lines, we typically use err(). Here and everywhere.


libos/test/regression/hostname.c line 40 at r1 (raw file):

    if (gethostname(buf, sizeof(buf)) != 0) {
        printf("%sgethostname: failed %d\n", tag, errno);

Hm, there is a a corner case of ENAMETOOLONG (see man page's NOTES). It means that the hostname is itself correct but too long. But I guess we can ignore it.


libos/test/regression/hostname.c line 40 at r1 (raw file):

    if (gethostname(buf, sizeof(buf)) != 0) {
        printf("%sgethostname: failed %d\n", tag, errno);

Please add a space between %s (for tag) and gethostname


libos/test/regression/hostname.c line 45 at r1 (raw file):

    if (strcmp(buf, name) != 0) {
        printf("%sgethostname dosen't match hostname (expected: %s, got: %s)\n",

Please add a space between %s (for tag) and gethostname


libos/test/regression/hostname.c line 45 at r1 (raw file):

    if (strcmp(buf, name) != 0) {
        printf("%sgethostname dosen't match hostname (expected: %s, got: %s)\n",

typo: doesn't


libos/test/regression/hostname.c line 47 at r1 (raw file):

        printf("%sgethostname dosen't match hostname (expected: %s, got: %s)\n",
               tag, name, buf);
        exit(1);

Here you can use errx()


libos/test/regression/hostname.c line 58 at r1 (raw file):

    /*
     * If the etc hostname was not provided, assume that etc shouldn't exists.

exist


libos/test/regression/hostname.c line 58 at r1 (raw file):

    /*
     * If the etc hostname was not provided, assume that etc shouldn't exists.

If the expected etc hostname ...


libos/test/regression/hostname.c line 62 at r1 (raw file):

    if (strcmp(name, "") == 0) {
        if (fd != -1 || errno != ENOENT) {
            printf("The etc file shouldn't exists, but exists\n");

shouldn't exist


libos/test/regression/hostname.c line 80 at r1 (raw file):

    /*
     * Sometimes etc hostname might have a trailing '\n', gramine is romving it,

removing


libos/test/regression/hostname.c line 89 at r1 (raw file):

    if (strcmp(buf, name) != 0) {
        printf("%s etc don't have a expected value (expected: %s, got: %s)\n",

%s /etc/hostname doesn't have ...


libos/test/regression/hostname.c line 101 at r1 (raw file):

    }

    test_gethostname("", argv[1]);

Sometimes you have messages that use the tag like this: in %s. If you specify the empty string as the tag, then the message looks weird (...in \n). I suggest to add some tag here like "parent" or "main".


libos/test/regression/hostname.c line 104 at r1 (raw file):

    test_etc_hostname("", argv[2]);
    test_fork("fork gethostname", argv[1], test_gethostname);
    test_fork("fork etc gethostname", argv[2], test_etc_hostname);

fork etc_hostname


libos/test/regression/hostname.c line 106 at r1 (raw file):

    test_fork("fork etc gethostname", argv[2], test_etc_hostname);

    printf("hostname test passed\n");

We use TEST OK typically.


libos/test/regression/hostname_pass_etc.manifest.template line 9 at r1 (raw file):

fs.mounts = [
  { path = "/lib", uri = "file:{{ gramine.runtimedir(libc) }}" },
  { path = "/hostname", uri = "file:{{ binary_dir }}/hostname" },

Maybe as a test also add smth like this:

  { path = "/etc/hostname", uri = "file:{{ binary_dir }}/hostname" }, # dummy file to check that `libos.passthrough_etc_files` overrides

libos/test/regression/hostname_pass_etc.manifest.template line 14 at r1 (raw file):

sgx.debug = true
sgx.nonpie_binary = true
sgx.thread_num = 16

Why? Just remove this.


libos/test/regression/test_libos.py line 8 at r1 (raw file):

import unittest

from socket import gethostname

I would prefer

import socket

... socket.gethostname() ...

It's more clear to me when I read Python code.


libos/test/regression/test_libos.py line 1125 at r1 (raw file):

    def test_070_hostname_localhost(self):
        stdout, _ = self.run_binary(['hostname', 'localhost', ''])

Hm, the third argument (the empty string) relies on the fact that the generic manifest (manifest.template) doesn't have smth like fs.mounts = [ { path = "/etc", uri = "file:/etc" } ]. I would add a comment here, this is not obvious (and may change in the future).


libos/test/regression/test_libos.py line 1125 at r1 (raw file):

    def test_070_hostname_localhost(self):
        stdout, _ = self.run_binary(['hostname', 'localhost', ''])

Why is it always the case that the second argument is localhost? Can't gethostname() return something real (because we unconditionally call libos_set_hostname in LibOS)?

I'll need to look at the PAL implementation... UPDATE: Ok, you set g_pal_public_state->hostname = "localhost" during PAL init time.


pal/include/host/linux-common/host_info.h line 11 at r1 (raw file):

/* Function to fetch the hostname */
int get_hostname(char* hostname, size_t size);

Do we really a new header file for this? Or there will be more functions in follow-up PRs?

Otherwise I would just stash this func declaration in the already-existing file linux_utils.h.

UPDATE: Ok, you should move stuff from topo_info.h to this new file, since you consider the name "host_info" more generic (based on your PAL C code).


pal/src/host/linux/pal_main.c line 134 at r1 (raw file):

                       false, &g_pal_public_state.passthrough_etc_files);
    if (ret < 0)
        INIT_FAIL("Unable to parse manifest");

Please make it more specific, that the failure happened while parsing libos.passthrough_etc_files

UPDATE: in the linux-sgx PAL, you write this: log_error("Cannot parse 'libos.passthrough_etc_files'");


pal/src/host/linux/pal_main.c line 424 at r1 (raw file):

    if (first_process) {
        get_host_info();
    }

Any reason why you moved it here?


pal/src/host/linux-common/host_info.c line 7 at r1 (raw file):

/*
 * This file contains the APIs to expose host information.

Hm, do you want to put stuff from topo_info.c here? Because that file also "exposes host information".

Alternatively, let's keep them separate, but please rename the files to something more specific, like etc_host_info.


pal/src/host/linux-common/host_info.c line 13 at r1 (raw file):

#include <linux/utsname.h>

#include "linux_utils.h"

Why do you need this include?


pal/src/host/linux-common/host_info.c line 23 at r1 (raw file):

    if (DO_SYSCALL(uname, &c_uname) != 0)
        return -1;

Meh, can we return some UNIX error code? Actually, why not the error code from uname() itself?


pal/src/host/linux-sgx/host_main.c line 18 at r1 (raw file):

#include "host_ecalls.h"
#include "host_internal.h"
#include "host_info.h"

Seems not sorted


pal/src/host/linux-sgx/host_main.c line 920 at r1 (raw file):

        return ret;

    /* If we do not expose etc, we don't need any addiotianl information about host. */

additional


pal/src/host/linux-sgx/pal_main.c line 250 at r1 (raw file):

}

static bool is_hostname_valid(const char* hostname) {

Please explain the sanitization rules in a top-level comment. For example, why there is a limit on 63 chars?


pal/src/host/linux-sgx/pal_main.c line 257 at r1 (raw file):

        return false;

    while (*ptr != '\0') {

Aren't you supposed to also check that ptr - hostname > PAL_HOSTNAME_MAX and fail if it is?

Otherwise the malicious host can force this enclave code into an infinite loop.


pal/src/host/linux-sgx/pal_main.c line 637 at r1 (raw file):

        ret = import_and_sanitize_topo_info(&host_info.topo_info);
        if (ret < 0) {
            log_error("Failed to copy and sanitize topology information");

Since we are here, could you also print the ret value?


pal/src/host/linux-sgx/pal_main.c line 693 at r1 (raw file):

     * checkpointed and restored during forking of the child process(es). */
    if (parent_stream_fd < 0) {
        if ((ret = init_passthrough_etc_files(&host_info)) < 0) {

I would move topo_info initialization to here as well, otherwise you have very similar ...only for the first process code snippets.


pal/src/host/linux-sgx/pal_misc.c line 25 at r1 (raw file):

#include "toml_utils.h"
#include "topo_info.h"
#include "linux_utils.h"

Why?

!TODO use this commit message:

[LibOS,PAL,Docs] Introduce etc passthrough (currently only 'hostname')

The first `etc` passthrough file is the `/etc/hostname`. Gramine reads
the host file to global PAL state, and LibOS uses it to create a pseudo file
with its content. Gramine sanitizes hostname. It requires that it is a valid
domain. This is a difference from Linux, as Linux accepts any hostname value.
That said, Linux doesn't assume that the root user tries to exploit it through
hostname, and Gramine should.

Gramine uses pseudofs for `etc` passthrough.

Signed-off-by: Mariusz Zaborski <[email protected]>
Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 48 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @dimakuv, @mkow, and @oshogbo)


-- commits line 2 at r1:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I'd prefer the name of the file in the one-liner. So maybe append (currently only 'hostname')

Done.


-- commits line 5 at r1:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

the option -> the host file

Done.


Documentation/manifest-syntax.rst line 152 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We should also add what happens if the user specifies /etc/hostname through this manifest option and through e.g. sgx.trusted_files (iiuc, the new manifest option wins and overrides the host file).

Done.


libos/include/libos_internal.h line 156 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I would prefer set_hostname(). We don't add libos_ prefix to any functions other than syscall-entrypoint functions, so the current name is confusing to me.

Done.


libos/src/libos_init.c line 427 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

It's important that init_mount_etc goes after init_mount, right? This forces the override of /etc/hostname even if the user specified it in fs.mounts manifest option. I would add a quick comment about this.

Done.


libos/src/fs/libos_fs.c line 78 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why not init_etcfs for uniformity with the above functions?

Done.


libos/src/fs/etc/fs.c line 7 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

passthrough -> FS?

Here, at this LibOS FS layer, it seems strange to talk about "passthrough" since here the files are not backed by anything from the host, but rather backed by the PAL-state objects.

I would also add that LibOS assumes that contents of all etc files were already sanitized.

Done.


libos/src/fs/etc/fs.c line 28 at r1 (raw file):

UPDATE: It seems that this creates a "parentless" hostname pseudo-file that can be used as a backing storage for /etc/hostname in the below function. I'm still struggling to find the exact place in the FS code where this connection happens.

Yes, that's the case.

When I have implemented that, it takes me a moment to figure this out.

So in pseufods (how I understand this), we can create a structured directory: /sys/something. Then we mount the whole /sys into chosen mount point (using mount_fs).

Creating a node with a parent (for example etc) will take precedence over whole /etc. So, for example, if the user would do fs.mount={/etc}, we wouldn't show him the /etc/ because we would over mount the directory with our pseudo directory. We also can't provide a mount point for a single file in this case (/etc/hostname) - this is not supported.

I find that we create a parentless node and mount it wherever we want.

https://github.com/gramineproject/gramine/blob/master/libos/src/fs/libos_fs_pseudo.c#L533
Here we simply add a parentless node.

And during lookup for a given mountpoint we are doing:
lookup function->pseudo_lookup->pseudo_find->pseudo_find_root, we find "hostname" pseudofile.

We mount this pseudofile hostname to /etc/hostname instead of /hostname


libos/src/fs/etc/fs.c line 28 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why do we do this unconditionally? shouldn't we check for g_pal_public_state->passthrough_etc_files? Or we don't care about creating an unused pseudo-file?

I don't think we care about creating it while they are not mounted.
As well if we do it only for passthrough_etc_files, we won't be able to receive the checkpoint. But this might depend on my misunderstand of manifest and that we maybe can trust it? (Manifest issue discussed below).


libos/src/fs/etc/fs.c line 41 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What is this URI? Is it useless? I would just call it "dummy" then, or set it to NULL (if possible).

UPDATE: Apparently it's not useless, but even after looking at the code, I still don't get the connection between this and "parentless" hostname pseudo-file above.

The path is where we will mount the file, and the uri is the file we want to mount.
In this case we are mounting a file hostname from the pseudofs which we just created in init_etc function.


libos/src/fs/etc/fs.c line 56 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why do you need to propagate the boolean g_pal_public_state->passthrough_etc_files? It can never change in the child, as it is read from the same manifest file (which cannot be changed).

I assumed that in theory we don't know if it the same manifest, and we don't trust the host.
Or we are reading somehow exactly the same manifest file?


libos/src/sys/libos_uname.c line 21 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This comment is not exactly correct now. nodename is propagated to the child on fork (though not synchronized afterwards). Maybe put a comment about this in a separate paragraph.

Not really. Please notice that if you do sethostname, and fork the hostname will be different in child and parent after fork (this is described detailed in PR description). This is one of the issues the value of /etc/hostname is not corresponding to the current name of hostname.


libos/src/sys/libos_uname.c line 44 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why =? From what I read in the man pages, nodename doesn't need to be null-terminated.

I don't know to be honest.
This was introduced by @mkow in bcca391.


libos/test/regression/hostname.c line 3 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This one is not sorted

Done.


libos/test/regression/hostname.c line 11 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Could you maybe rename name -> expected_name? I got confused. Here and everywhere in this test.

Done.


libos/test/regression/hostname.c line 17 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Instead of these two lines, we typically use err(). Here and everywhere.

Done.


libos/test/regression/hostname.c line 40 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Hm, there is a a corner case of ENAMETOOLONG (see man page's NOTES). It means that the hostname is itself correct but too long. But I guess we can ignore it.

ACK.


libos/test/regression/hostname.c line 40 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please add a space between %s (for tag) and gethostname

Done.


libos/test/regression/hostname.c line 45 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please add a space between %s (for tag) and gethostname

Done.


libos/test/regression/hostname.c line 45 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

typo: doesn't

Done.


libos/test/regression/hostname.c line 47 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Here you can use errx()

Done.


libos/test/regression/hostname.c line 58 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

exist

Done.


libos/test/regression/hostname.c line 58 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

If the expected etc hostname ...

Done.


libos/test/regression/hostname.c line 62 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

shouldn't exist

Done.


libos/test/regression/hostname.c line 80 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

removing

Done.


libos/test/regression/hostname.c line 89 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

%s /etc/hostname doesn't have ...

Done.


libos/test/regression/hostname.c line 101 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Sometimes you have messages that use the tag like this: in %s. If you specify the empty string as the tag, then the message looks weird (...in \n). I suggest to add some tag here like "parent" or "main".

Done.


libos/test/regression/hostname.c line 104 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

fork etc_hostname

Done.


libos/test/regression/hostname.c line 106 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We use TEST OK typically.

Done.


libos/test/regression/hostname_pass_etc.manifest.template line 9 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Maybe as a test also add smth like this:

  { path = "/etc/hostname", uri = "file:{{ binary_dir }}/hostname" }, # dummy file to check that `libos.passthrough_etc_files` overrides

Done.


libos/test/regression/hostname_pass_etc.manifest.template line 14 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why? Just remove this.

Done.


libos/test/regression/test_libos.py line 8 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I would prefer

import socket

... socket.gethostname() ...

It's more clear to me when I read Python code.

Done.


libos/test/regression/test_libos.py line 1125 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Hm, the third argument (the empty string) relies on the fact that the generic manifest (manifest.template) doesn't have smth like fs.mounts = [ { path = "/etc", uri = "file:/etc" } ]. I would add a comment here, this is not obvious (and may change in the future).

Done.


libos/test/regression/test_libos.py line 1125 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why is it always the case that the second argument is localhost? Can't gethostname() return something real (because we unconditionally call libos_set_hostname in LibOS)?

I'll need to look at the PAL implementation... UPDATE: Ok, you set g_pal_public_state->hostname = "localhost" during PAL init time.

ACK.


pal/include/host/linux-common/host_info.h line 11 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Do we really a new header file for this? Or there will be more functions in follow-up PRs?

Otherwise I would just stash this func declaration in the already-existing file linux_utils.h.

UPDATE: Ok, you should move stuff from topo_info.h to this new file, since you consider the name "host_info" more generic (based on your PAL C code).

Yes, I will add here rest of functions corresponding to host_info (like resolv.conf ect.).


pal/src/host/linux/pal_main.c line 134 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please make it more specific, that the failure happened while parsing libos.passthrough_etc_files

UPDATE: in the linux-sgx PAL, you write this: log_error("Cannot parse 'libos.passthrough_etc_files'");

Done.


pal/src/host/linux/pal_main.c line 424 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Any reason why you moved it here?

No not really.


pal/src/host/linux-common/host_info.c line 7 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Hm, do you want to put stuff from topo_info.c here? Because that file also "exposes host information".

Alternatively, let's keep them separate, but please rename the files to something more specific, like etc_host_info.

Done.


pal/src/host/linux-common/host_info.c line 13 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why do you need this include?

We don't it was when I have actually read /etc/hostname.


pal/src/host/linux-common/host_info.c line 23 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Meh, can we return some UNIX error code? Actually, why not the error code from uname() itself?

You mean return value or errno?


pal/src/host/linux-sgx/host_main.c line 18 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Seems not sorted

Done.


pal/src/host/linux-sgx/host_main.c line 920 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

additional

Done.


pal/src/host/linux-sgx/pal_main.c line 250 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please explain the sanitization rules in a top-level comment. For example, why there is a limit on 63 chars?

I assumed that hostname has to be a valid domain.
And this is a little messy:

https://www.rfc-editor.org/rfc/rfc2181

The length of any one label is limited to between 1 and 63 octets.  A full domain name is limited to 255 octets (including the separators). 

https://www.ietf.org/rfc/rfc0952.txt

A "name" (Net, Host, Gateway, or Domain name) is a text string up o 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.).  Note that periods are only allowed when they serve to delimit components of "domain style names". [...] The first character must be an alpha character. 

https://www.rfc-editor.org/rfc/rfc1123

The DNS defines domain name syntax very generally -- a string of labels each containing up to 63 8-bit octets, separated by dots, and with a maximum total of 255 octets.  Particular applications of the DNS are permitted to further constrain the syntax of the domain names they use, although the DNS deployment has led to some applications allowing more general names.  In particular, Section 2.1 of this document liberalizes slightly the syntax of a legal Internet host name that was defined in RFC-952 [DNS:4].

Microsoft even states the size is 253:
https://web.archive.org/web/20190518124533/https://devblogs.microsoft.com/oldnewthing/?p=7873

So rules I have applied:

  • Hostname up to 255 characters (including '\0'), so 254 + '\0'
  • Single label up to 63
  • Not starting or ending with . or -
  • Character subset: A-Z, a-z, 0-9, -, .

pal/src/host/linux-sgx/pal_main.c line 257 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Aren't you supposed to also check that ptr - hostname > PAL_HOSTNAME_MAX and fail if it is?

Otherwise the malicious host can force this enclave code into an infinite loop.

Done.


pal/src/host/linux-sgx/pal_main.c line 637 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Since we are here, could you also print the ret value?

Done.


pal/src/host/linux-sgx/pal_main.c line 693 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I would move topo_info initialization to here as well, otherwise you have very similar ...only for the first process code snippets.

Done.

Code quote:

init_passthrough_etc_files

pal/src/host/linux-sgx/pal_misc.c line 25 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why?

Done.

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 20 of 18 files at r2, all commit messages.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @oshogbo)


Documentation/manifest-syntax.rst line 153 at r2 (raw file):

- ``/etc/hostname``

This option takes precedence over ``fs.mount``.

fs.mounts (forgot s)


Documentation/manifest-syntax.rst line 154 at r2 (raw file):

This option takes precedence over ``fs.mount``.
This means that files provided in it will be overridden with the ones

files provided in it -> etc files provided via fs.mounts


libos/src/libos_init.c line 427 at r2 (raw file):

    RUN_INIT(init_mount);
    RUN_INIT(init_std_handles);
    /* The init_mount_etcfs takes precedence over users fs.mounts, and because of that,

users -> user's


libos/src/fs/etc/fs.c line 28 at r1 (raw file):

And during lookup for a given mountpoint we are doing:
lookup function->pseudo_lookup->pseudo_find->pseudo_find_root, we find "hostname" pseudofile.

Ah, that's the sequence I was missing! So this "rootless" file is actually a "root-by-itself", and thus will be found during the lookup over pseudo-files.

Just to be sure, @pwmarcz, was that the intention with such "rootless" pseudo-files? Or are we abusing the implementation?


libos/src/fs/etc/fs.c line 41 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

The path is where we will mount the file, and the uri is the file we want to mount.
In this case we are mounting a file hostname from the pseudofs which we just created in init_etc function.

Yes, now I get it. Though the uri field name is misleading here -- URIs are supposed to be always PAL (host) backed objects, not objects emulated inside the LibOS. But whatever.


libos/src/fs/etc/fs.c line 56 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

I assumed that in theory we don't know if it the same manifest, and we don't trust the host.
Or we are reading somehow exactly the same manifest file?

Yes, the child reads the same manifest file.

That's because the child has the same MRENCLAVE as the parent. And MRENCLAVE reflects the complete manifest file. And the parent and the child perform mutual SGX-based attestation during fork, so that they make sure that their peer's MRENCLAVE is the same as their own.

In the end, this chain of reasoning leads to the claim that "parent and child enclaves are guaranteed to use the exact same manifest file".

So you don't need to bother about this boolean.


libos/src/sys/libos_uname.c line 21 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Not really. Please notice that if you do sethostname, and fork the hostname will be different in child and parent after fork (this is described detailed in PR description). This is one of the issues the value of /etc/hostname is not corresponding to the current name of hostname.

Ah, yes, you're right. I confused the two objects:

  • On LibOS initialization, we set hostname to the value of g_pal_public_state->hostname (which is inited by PAL in main process or inherited in checkpoint in child processes)
  • On sethostname(), we update hostname via g_current_uname.nodename (this object is not tied to the above one, and not checkpointed).

So the parent process may have updated g_current_uname.nodename (which is the source for gethostname) but this change is not reflected in g_pal_public_state->hostname. And the child received the latter string in the checkpoint, so the child returns this unchanged string in gethostname. So here's a discrepancy.

Meh, this is a mess with sethostname(). Whatever, no sane app should use it anyway, and we already have a comment about this wrong behavior.


libos/src/sys/libos_uname.c line 44 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

I don't know to be honest.
This was introduced by @mkow in bcca391.

Let's wait for @mkow's review then.


libos/test/regression/hostname_pass_etc.manifest.template line 10 at r2 (raw file):

  { path = "/lib", uri = "file:{{ gramine.runtimedir(libc) }}" },
  { path = "/hostname", uri = "file:{{ binary_dir }}/hostname" },
  # dummy file to check that `libos.passthrough_etc_files` overrides

...overrides /etc/hostname unconditionally


libos/test/regression/test_libos.py line 1124 at r2 (raw file):

    def test_070_hostname_localhost(self):
        # The manifest doesn't contain etc passthrough.

The manifest -> The generic manifest (manifest.template)


pal/src/host/linux/pal_main.c line 424 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

No not really.

Can we move it back? In the future, when we'll look at this commit, we'll scratch our heads -- why did we move this?


pal/src/host/linux-sgx/pal_main.c line 250 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

I assumed that hostname has to be a valid domain.
And this is a little messy:

https://www.rfc-editor.org/rfc/rfc2181

The length of any one label is limited to between 1 and 63 octets.  A full domain name is limited to 255 octets (including the separators). 

https://www.ietf.org/rfc/rfc0952.txt

A "name" (Net, Host, Gateway, or Domain name) is a text string up o 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.).  Note that periods are only allowed when they serve to delimit components of "domain style names". [...] The first character must be an alpha character. 

https://www.rfc-editor.org/rfc/rfc1123

The DNS defines domain name syntax very generally -- a string of labels each containing up to 63 8-bit octets, separated by dots, and with a maximum total of 255 octets.  Particular applications of the DNS are permitted to further constrain the syntax of the domain names they use, although the DNS deployment has led to some applications allowing more general names.  In particular, Section 2.1 of this document liberalizes slightly the syntax of a legal Internet host name that was defined in RFC-952 [DNS:4].

Microsoft even states the size is 253:
https://web.archive.org/web/20190518124533/https://devblogs.microsoft.com/oldnewthing/?p=7873

So rules I have applied:

  • Hostname up to 255 characters (including '\0'), so 254 + '\0'
  • Single label up to 63
  • Not starting or ending with . or -
  • Character subset: A-Z, a-z, 0-9, -, .

Please add the "rules I have applied" list as the top-level comment. And please add the links you provided in the same comment.

Copy link
Contributor

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @oshogbo)


libos/src/fs/etc/fs.c line 28 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

And during lookup for a given mountpoint we are doing:
lookup function->pseudo_lookup->pseudo_find->pseudo_find_root, we find "hostname" pseudofile.

Ah, that's the sequence I was missing! So this "rootless" file is actually a "root-by-itself", and thus will be found during the lookup over pseudo-files.

Just to be sure, @pwmarcz, was that the intention with such "rootless" pseudo-files? Or are we abusing the implementation?

Yes, that was the intended use. But I can see how that might be hard to find out.

The problem, I guess, is that the pseudo-fs abuses node name as an identifier by which you mount a node.

One way to modify this would be to add an explicit "register" step:

struct pseudo_node* hostname = pseudo_add_str(NULL, "hostname", &provide_etc_hostname);
pseudo_add_root("etc-hostname", hostname);

...

ret = mount_fs(&(struct libos_mount_params){
    .type = "pseudo",
    .path = "/etc/hostname",
    .uri = "etc-hostname",
});

Then, the pseudo-filesystem would maintain an explicit identifier-to-node mapping. More verbose, but perhaps less confusing. I'm not sure.

Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 23 of 30 files reviewed, 14 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @oshogbo)


Documentation/manifest-syntax.rst line 153 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

fs.mounts (forgot s)

Done.


Documentation/manifest-syntax.rst line 154 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

files provided in it -> etc files provided via fs.mounts

Done.


libos/src/libos_init.c line 427 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

users -> user's

Done.


libos/src/fs/etc/fs.c line 56 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Yes, the child reads the same manifest file.

That's because the child has the same MRENCLAVE as the parent. And MRENCLAVE reflects the complete manifest file. And the parent and the child perform mutual SGX-based attestation during fork, so that they make sure that their peer's MRENCLAVE is the same as their own.

In the end, this chain of reasoning leads to the claim that "parent and child enclaves are guaranteed to use the exact same manifest file".

So you don't need to bother about this boolean.

Done.


libos/src/sys/libos_uname.c line 21 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ah, yes, you're right. I confused the two objects:

  • On LibOS initialization, we set hostname to the value of g_pal_public_state->hostname (which is inited by PAL in main process or inherited in checkpoint in child processes)
  • On sethostname(), we update hostname via g_current_uname.nodename (this object is not tied to the above one, and not checkpointed).

So the parent process may have updated g_current_uname.nodename (which is the source for gethostname) but this change is not reflected in g_pal_public_state->hostname. And the child received the latter string in the checkpoint, so the child returns this unchanged string in gethostname. So here's a discrepancy.

Meh, this is a mess with sethostname(). Whatever, no sane app should use it anyway, and we already have a comment about this wrong behavior.

Yes, Borys was suggesting to remove sethostname(2), but I guess we have to discuss this further.


libos/test/regression/hostname_pass_etc.manifest.template line 10 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

...overrides /etc/hostname unconditionally

Done.


libos/test/regression/test_libos.py line 1124 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

The manifest -> The generic manifest (manifest.template)

Done.


pal/src/host/linux/pal_main.c line 424 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can we move it back? In the future, when we'll look at this commit, we'll scratch our heads -- why did we move this?

Oh, duh- Now I remember why I move it :)
I move get_host_info to be after we initialise manifest.root.
Topology don't need any information from manifest so it could be before it, but the etc passthrough requires it.


pal/src/host/linux-sgx/pal_main.c line 250 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please add the "rules I have applied" list as the top-level comment. And please add the links you provided in the same comment.

Done.

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @mkow and @oshogbo)


libos/src/fs/etc/fs.c line 28 at r1 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

Yes, that was the intended use. But I can see how that might be hard to find out.

The problem, I guess, is that the pseudo-fs abuses node name as an identifier by which you mount a node.

One way to modify this would be to add an explicit "register" step:

struct pseudo_node* hostname = pseudo_add_str(NULL, "hostname", &provide_etc_hostname);
pseudo_add_root("etc-hostname", hostname);

...

ret = mount_fs(&(struct libos_mount_params){
    .type = "pseudo",
    .path = "/etc/hostname",
    .uri = "etc-hostname",
});

Then, the pseudo-filesystem would maintain an explicit identifier-to-node mapping. More verbose, but perhaps less confusing. I'm not sure.

Thanks for explanations. I don't think the proposed code snippet helps that much. Anyway, now at least two people know how it works :)


pal/include/host/linux-common/host_info.h line 11 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Yes, I will add here rest of functions corresponding to host_info (like resolv.conf ect.).

Ok, then you shouldn't move stuff from topo_info.h into here. Resolving.


pal/src/host/linux-common/host_info.c line 23 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

You mean return value or errno?

No, errno is not available here. The DO_SYSCALL() macro calls the raw syscall, not the Glibc wrapper around it (recall that the Glibc wrapper moves the error code from the raw syscall result to errno global variable).

So you should simply do like this:

    int ret = DO_SYSCALL(uname, &c_uname);
    if (ret < 0)
        return ret;

Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 29 of 30 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @oshogbo)


pal/src/host/linux-common/host_info.c line 23 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

No, errno is not available here. The DO_SYSCALL() macro calls the raw syscall, not the Glibc wrapper around it (recall that the Glibc wrapper moves the error code from the raw syscall result to errno global variable).

So you should simply do like this:

    int ret = DO_SYSCALL(uname, &c_uname);
    if (ret < 0)
        return ret;

Done.

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @mkow and @oshogbo)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 30 files at r1, 6 of 18 files at r2, 3 of 7 files at r3, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @oshogbo, and @pwmarcz)


-- commits line 27 at r4:
I'd rephrase this whole section when rebasing, all these super short sentences read weirdly to me.


Documentation/manifest-syntax.rst line 139 at r4 (raw file):

manifest, the application will get ``argv = [ <libos.entrypoint value> ]``.

etc passthrough

Suggestion:

``/etc``

Documentation/manifest-syntax.rst line 147 at r4 (raw file):

    (Default: false)

This specifies whether to passthrough extra runtime files from ``/etc``.

Suggestion:

from host's ``/etc``

Documentation/manifest-syntax.rst line 147 at r4 (raw file):

    (Default: false)

This specifies whether to passthrough extra runtime files from ``/etc``.

Just a note: this will need to be rephrased after we start supporting non-Linux hosts (it'll be generated there based on host configs).

Code quote:

This specifies whether to passthrough extra runtime files from ``/etc``.

libos/include/libos_internal.h line 156 at r4 (raw file):

long pal_to_unix_errno(long err);

long set_hostname(char* name, size_t len);

We return error codes via int in our codebase

Suggestion:

int set_hostname

libos/src/fs/etc/fs.c line 28 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Thanks for explanations. I don't think the proposed code snippet helps that much. Anyway, now at least two people know how it works :)

So, do I understand correctly, that right now "hostname" string passed to pseudo_add_str() is both the filename and a global identifier by which you can mount this node?
If that's the case, then I really like the idea with explicit registration, as the filenames can collide quite easily after we add support for more files, plus the current way is quite confusing.


libos/src/fs/etc/fs.c line 41 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Yes, now I get it. Though the uri field name is misleading here -- URIs are supposed to be always PAL (host) backed objects, not objects emulated inside the LibOS. But whatever.

+1, this was also confusing to me, but see the other discussion.


libos/src/sys/libos_uname.c line 44 at r1 (raw file):

nodename doesn't need to be null-terminated

But that's exactly why we need >=? We need to have space in the buffer to append that NULL byte ourselves. If len == sizeof(g_current_uname.nodename) then the memset below will overflow the buffer.

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @mkow and @oshogbo)


libos/src/fs/etc/fs.c line 28 at r1 (raw file):

both the filename and a global identifier

No, it's only a global identifier. I don't know what you mean exactly by "the filename". Actually, it could be anything, we can rename "hostname" -> "hostname_backing_pseudo_file". And then use this renamed string in init_mount_etcfs() below. Well, maybe we should do it?


libos/src/sys/libos_uname.c line 44 at r1 (raw file):

We need to have space in the buffer to append that NULL byte ourselves

Why do we need to append the NULL byte? We're emulating sethostname() syscall, and it may return a non-NULL-terminated string.

Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 26 of 30 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @oshogbo)


Documentation/manifest-syntax.rst line 139 at r4 (raw file):

manifest, the application will get ``argv = [ <libos.entrypoint value> ]``.

etc passthrough

Done.


Documentation/manifest-syntax.rst line 147 at r4 (raw file):

    (Default: false)

This specifies whether to passthrough extra runtime files from ``/etc``.

Done.


libos/include/libos_internal.h line 156 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

We return error codes via int in our codebase

Done.


libos/src/fs/etc/fs.c line 28 at r1 (raw file):

hostname_backing_pseudo_file
Maybe something like that?

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r5, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @mkow and @oshogbo)


libos/src/fs/etc/fs.c line 28 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

hostname_backing_pseudo_file
Maybe something like that?

I'm fine with this rename.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 30 files at r1, 3 of 18 files at r2, 3 of 7 files at r3, 1 of 1 files at r4, 4 of 4 files at r5, all commit messages.
Reviewable status: all files reviewed, 21 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @oshogbo)


Documentation/manifest-syntax.rst line 139 at r4 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Done.

What about the slash?


libos/src/fs/etc/fs.c line 28 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I'm fine with this rename.

Ah, I misread the call below, the mount path specifies the file name. Then I'm fine with the current version.


libos/src/sys/libos_uname.c line 44 at r1 (raw file):

Why do we need to append the NULL byte?

Because that's what Linux does, although the implementation is a bit obfuscated: https://elixir.bootlin.com/linux/v5.19.3/source/kernel/sys.c#L1355

it may return a non-NULL-terminated string.

It can't.


libos/test/regression/hostname.c line 43 at r5 (raw file):

    if (strcmp(buf, expected_name) != 0) {
        errx(1, "%s gethostname doesn't match hostname (expected: %s, got: %s)",

Suggestion:

gethostname result doesn't

libos/test/regression/hostname.c line 44 at r5 (raw file):

    if (strcmp(buf, expected_name) != 0) {
        errx(1, "%s gethostname doesn't match hostname (expected: %s, got: %s)",
               tag, expected_name, buf);

please fix alignment


libos/test/regression/hostname.c line 55 at r5 (raw file):

    /*
     * If the etc expected name was not provided, assume that etc shouldn't exist.

Suggestion:

assume that /etc/hostname shouldn't exist

libos/test/regression/hostname.c line 68 at r5 (raw file):

    }

    int ret = read(fd, buf, sizeof(buf));

Missing proper retval checks


libos/test/regression/hostname.c line 74 at r5 (raw file):

    /*
     * Sometimes etc hostname might have a trailing '\n', gramine is removing it,

Suggestion:

/etc/hostname

libos/test/regression/hostname.c line 74 at r5 (raw file):

    /*
     * Sometimes etc hostname might have a trailing '\n', gramine is removing it,

Suggestion:

Gramine

libos/test/regression/hostname.c line 84 at r5 (raw file):

    if (strcmp(buf, expected_name) != 0) {
        err(1, "%s /etc/hostname don't have a expected value (expected: %s, got: %s)",
               tag, expected_name, buf);

please fix alignment


libos/test/regression/hostname_pass_etc.manifest.template line 11 at r5 (raw file):

  { path = "/hostname", uri = "file:{{ binary_dir }}/hostname" },
  # dummy file to check that `libos.passthrough_etc_files` overrides
  # /etc/hosntame unconditionally

Suggestion:

hostname

pal/src/pal_main.c line 24 at r5 (raw file):

    /* Enable log to catch early initialization errors; it will be overwritten in pal_main(). */
    .log_level = PAL_LOG_DEFAULT_LEVEL,
    .hostname = "localhost"

Suggestion:

.hostname = "localhost",

pal/src/host/linux/pal_main.c line 424 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Oh, duh- Now I remember why I move it :)
I move get_host_info to be after we initialise manifest.root.
Topology don't need any information from manifest so it could be before it, but the etc passthrough requires it.

Uh, this is quite a bold move :)
@oshogbo, @dimakuv: Please verify that this is correct, i.e. none of the initializations in those 100 lines between the old and new positions relies on topology info being already initialized. I didn't find any place, but it looks dangerous, so we need to be very careful with it.


pal/src/host/linux/pal_main.c line 135 at r5 (raw file):

    /* Get host information only for the first process. This information will be
     * checkpointed and restored during forking of the child process(es). */
    if (!first_process) {

Why not move this to the top of this function?


pal/src/host/linux/pal_main.c line 147 at r5 (raw file):

    if (get_hostname(g_pal_public_state.hostname,
                          sizeof(g_pal_public_state.hostname)) < 0) {

please fix alignment


pal/include/host/linux-common/etc_host_info.h line 10 at r5 (raw file):

#include "pal.h"

/* Function to fetch the hostname */

I'd skip this comment, it's literally what the function name already says ;)


pal/src/host/linux-common/etc_host_info.c line 13 at r5 (raw file):

#include <linux/utsname.h>

#include "etc_host_info.h"

this should be in the first, separate include group


pal/src/host/linux-common/etc_host_info.c line 19 at r5 (raw file):

    int ret;

    assert(hostname != NULL);

I'd drop this, there's no point in this check


pal/src/host/linux-common/etc_host_info.c line 20 at r5 (raw file):

    assert(hostname != NULL);
    assert(size > 0);

Please move this to right before hostname[size - 1] so it's obvious why it's needed

Code quote:

hostname[size - 1]

pal/src/host/linux-common/etc_host_info.c line 26 at r5 (raw file):

        return ret;

    size_t len = strlen(c_uname.nodename) + 1;

this is the size of the string, not length

Code quote:

size_t len = strlen(c_uname.nodename) + 1;

Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 21 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @oshogbo)


Documentation/manifest-syntax.rst line 139 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

What about the slash?

Done.


libos/test/regression/hostname.c line 43 at r5 (raw file):

    if (strcmp(buf, expected_name) != 0) {
        errx(1, "%s gethostname doesn't match hostname (expected: %s, got: %s)",

Done.


libos/test/regression/hostname.c line 44 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

please fix alignment

Done.


libos/test/regression/hostname.c line 55 at r5 (raw file):

    /*
     * If the etc expected name was not provided, assume that etc shouldn't exist.

Done.


libos/test/regression/hostname.c line 68 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Missing proper retval checks

Done.


libos/test/regression/hostname.c line 74 at r5 (raw file):

    /*
     * Sometimes etc hostname might have a trailing '\n', gramine is removing it,

Done.


libos/test/regression/hostname.c line 74 at r5 (raw file):

    /*
     * Sometimes etc hostname might have a trailing '\n', gramine is removing it,

Done.


libos/test/regression/hostname.c line 84 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

please fix alignment

Done.


libos/test/regression/hostname_pass_etc.manifest.template line 11 at r5 (raw file):

  { path = "/hostname", uri = "file:{{ binary_dir }}/hostname" },
  # dummy file to check that `libos.passthrough_etc_files` overrides
  # /etc/hosntame unconditionally

Done.


pal/src/pal_main.c line 24 at r5 (raw file):

    /* Enable log to catch early initialization errors; it will be overwritten in pal_main(). */
    .log_level = PAL_LOG_DEFAULT_LEVEL,
    .hostname = "localhost"

Done.


pal/src/host/linux/pal_main.c line 135 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why not move this to the top of this function?

Because @dimakuv asked me to not propagate the libos.passthrough_etc_files to the child through checkpoint mechanisms, and get it from manifest file. I think that this value should be synced between the process.


pal/src/host/linux/pal_main.c line 147 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

please fix alignment

Done.


pal/include/host/linux-common/etc_host_info.h line 10 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'd skip this comment, it's literally what the function name already says ;)

Done.


pal/src/host/linux-common/etc_host_info.c line 13 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

this should be in the first, separate include group

Done.


pal/src/host/linux-common/etc_host_info.c line 19 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'd drop this, there's no point in this check

Done.


pal/src/host/linux-common/etc_host_info.c line 20 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Please move this to right before hostname[size - 1] so it's obvious why it's needed

Done.


pal/src/host/linux-common/etc_host_info.c line 26 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

this is the size of the string, not length

Done.

Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failed tests dsoen't seem to be related to recent changes:

test_ltp.py:196: in test_ltp
    returncode, stdout, _stderr = run_command(full_cmd, timeout=timeout, can_fail=True)
../../../usr/lib/python3.6/site-packages/graminelibos/regression.py:148: in run_command
    raise AssertionError('Command {} timed out after {} s'.format(cmd, timeout))
E   AssertionError: Command ['gramine-direct', 'fdatasync01'] timed out after 30 s

Reviewable status: 23 of 30 files reviewed, 21 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @oshogbo)

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 30 files at r1, 5 of 18 files at r2, 1 of 7 files at r3, 3 of 4 files at r5, 4 of 7 files at r6, all commit messages.
Reviewable status: 27 of 30 files reviewed, 41 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @oshogbo)

a discussion (no related file):
Why this whole PR talks about passthrough, where in fact we just expose some well defined structs on PAL level and emulate /etc completely in LibOS?


a discussion (no related file):
Actually why do we even emulate /etc/hostname? What uses it? From what I understand this is just a configuration file for early init scripts on a system, not runtime network configuration (like /etc/resolv.conf)



-- commits line 27 at r4:

Previously, mkow (Michał Kowalczyk) wrote…

I'd rephrase this whole section when rebasing, all these super short sentences read weirdly to me.

+1


-- commits line 20 at r6:

Suggestion:

"/etc"

Documentation/manifest-syntax.rst line 147 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Just a note: this will need to be rephrased after we start supporting non-Linux hosts (it'll be generated there based on host configs).

Why don't we name it accordingly from the beginning? Like this is not a passthrough, it's just an emulated "pseudo file" (which is a bit different from Linux /etc/hostname which is just a simple file).


libos/src/libos_init.c line 428 at r6 (raw file):

    RUN_INIT(init_std_handles);
    /* The init_mount_etcfs takes precedence over user's fs.mounts, and because of that,
     * it has to be called after init_mount. */

Why it's not called inside init_mount()? Now you mount over /etc/hostname in child even though it's already mounted (mounts are checkpointed), this stacks with each child...
Please rename this to mount_etcfs() and call in init_mount() (at the end probably)


libos/src/libos_init.c line 493 at r6 (raw file):

    RUN_INIT(init_sync_client);

    RUN_INIT(set_hostname, g_pal_public_state->hostname,

I don't like this - we should be keeping the list of global init functions as small as possible, it's already PITA to work with this code.
Unfortunately I don't know a good place for this initialization...


libos/src/libos_init.c line 493 at r6 (raw file):

    RUN_INIT(init_sync_client);

    RUN_INIT(set_hostname, g_pal_public_state->hostname,

Shouldn't this be done only in the first process?


libos/src/libos_init.c line 494 at r6 (raw file):

    RUN_INIT(set_hostname, g_pal_public_state->hostname,
             strlen(g_pal_public_state->hostname) + 1);

Suggestion:

 )

libos/src/fs/etc/fs.c line 8 at r6 (raw file):

/*
 * This file contains the implementation of `etc` FS.
 * LibOS assumes that contents of all etc files were already sanitized.

Why do we even have a notion of etc files here. We get info from PAL in the for of our choice, it has nothing to do with Linux /etc files.


libos/src/fs/etc/fs.c line 18 at r6 (raw file):

    __UNUSED(dent);
    /* Use the string (without null terminator) as file data */
    size_t size = strlen(g_pal_public_state->hostname);

Why we read this not g_current_uname->nodeinfo?


libos/src/fs/etc/fs.c line 28 at r6 (raw file):

}

int init_etcfs(void) {

Why this is a separate step from init_mount_etcfs()?


libos/src/fs/etc/fs.c line 29 at r6 (raw file):

int init_etcfs(void) {
    pseudo_add_str(NULL, "etc-passthrough-hostname", &provide_etc_hostname);

Ok, I've read the other discussion, so now I know how this works, but is it really the intended way of mounting single pseudo files? Looks like an abuse to me (especially since the list is called g_pseudo_roots)... But it all seems to be working correctly.
(Not blocking, this is not really related to this PR).


libos/src/fs/etc/fs.c line 39 at r6 (raw file):

        return 0;

    ret = mount_fs(&(struct libos_mount_params){

Suggestion:

return mount_fs

libos/src/fs/etc/fs.c line 56 at r6 (raw file):

    /* Propagate hostname */
    size_t off = ADD_CP_OFFSET(sizeof(g_pal_public_state->hostname));

Why do you use sizeof() here and strlen in all other places?

Update: ok, I think I know - not to send the actual size in checkpoint.


libos/src/fs/etc/fs.c line 69 at r6 (raw file):

    const char* hostname = (const char*)(base + GET_CP_FUNC_ENTRY());
    memcpy(&g_pal_public_state->hostname, hostname, sizeof(g_pal_public_state->hostname));

Ughh, lets not overwrite g_pal_public_state. The fact that topology code does that is IMO a bad precedent, we shouldn't have done it that way...
Also this is not read by uname, so it's wrong.


libos/src/sys/libos_uname.c line 43 at r6 (raw file):

}

int set_hostname(char* name, size_t len) {

Suggestion:

const char* name

libos/src/sys/libos_uname.c line 54 at r6 (raw file):

long libos_syscall_sethostname(char* name, int len) {
    if (len < 0 || (size_t)len >= sizeof(g_current_uname.nodename))

This check is now redundant

Code quote:

(size_t)len >= sizeof(g_current_uname.nodename)

pal/src/host/linux/pal_main.c line 424 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Uh, this is quite a bold move :)
@oshogbo, @dimakuv: Please verify that this is correct, i.e. none of the initializations in those 100 lines between the old and new positions relies on topology info being already initialized. I didn't find any place, but it looks dangerous, so we need to be very careful with it.

+1, please restore the old version. I don't see a reason for having a function for initializing both together (topo and hostname are quite unrelated). IMO just restore old topo init code and initialize hostname separately.


pal/src/host/linux-sgx/pal_ecall_types.h line 22 at r6 (raw file):

typedef struct {
    struct pal_topo_info  topo_info;
    bool                  passthrough_etc;

Why this is part of an ecall? We can get that info from a trusted source inside enclave (measured manifest).


pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r6 (raw file):

    bool                  passthrough_etc;
    char                  hostname[PAL_HOSTNAME_MAX];
} pal_host_info_t;

I don't like this stashing topo and etc together. This causes some unnecessary changes to topo code, weird splits of sanitization and generally seems unnecessary. Why not handle those two things together? You only save one sgx_copy_to_enclave with the current changes, but introduce other, unnecessary complexity.

cc @dimakuv @mkow


pal/src/host/linux-common/etc_host_info.c line 13 at r5 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Done.

Wait, no, we include host/standard headers first. Current order is wrong.


pal/src/host/linux-common/etc_host_info.c line 24 at r6 (raw file):

    size_t node_size = strlen(c_uname.nodename) + 1;
    memcpy(hostname, &c_uname.nodename,

Why do you sometimes use &c_uname.nodename and sometimes c_uname.nodename? Please unify


pal/src/host/linux-common/etc_host_info.c line 25 at r6 (raw file):

    size_t node_size = strlen(c_uname.nodename) + 1;
    memcpy(hostname, &c_uname.nodename,
           node_size > size ? size : node_size);

You'll need api.h for this

Suggestion:

MIN(node_size, size)

pal/src/host/linux-common/etc_host_info.c line 28 at r6 (raw file):

    assert(size > 0);
    hostname[size - 1] = '\0';

Suggestion:

0

dimakuv
dimakuv previously approved these changes Aug 24, 2022
Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LTP failure looks unrelated indeed. Ignore it. We've seen it before.

Reviewed 7 of 7 files at r6, all commit messages.
Reviewable status: all files reviewed, 39 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @mkow, and @oshogbo)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why this whole PR talks about passthrough, where in fact we just expose some well defined structs on PAL level and emulate /etc completely in LibOS?

passthrough in the sense of "this file is directly passed from the host (and sanitized) to the app", no matter what is written in the manifest file.

The usage of this keyword conflicts a bit with our typical definition of "this file is directly passed without any sanitization from host to app, if this file is specified in the manifest file".

What would be the better term for this PR? First iteration of etc sanitization?



libos/src/fs/etc/fs.c line 29 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Ok, I've read the other discussion, so now I know how this works, but is it really the intended way of mounting single pseudo files? Looks like an abuse to me (especially since the list is called g_pseudo_roots)... But it all seems to be working correctly.
(Not blocking, this is not really related to this PR).

Indeed, everyone was confused by this :) Not related to this PR for sure.


libos/src/sys/libos_uname.c line 44 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why do we need to append the NULL byte?

Because that's what Linux does, although the implementation is a bit obfuscated: https://elixir.bootlin.com/linux/v5.19.3/source/kernel/sys.c#L1355

it may return a non-NULL-terminated string.

It can't.

OK, got it


pal/src/host/linux/pal_main.c line 424 at r1 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

+1, please restore the old version. I don't see a reason for having a function for initializing both together (topo and hostname are quite unrelated). IMO just restore old topo init code and initialize hostname separately.

I'm fine with @boryspoplawski suggestion to split the code into two, leaving topo-initialization as is and adding hostname-initialization separately.


pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I don't like this stashing topo and etc together. This causes some unnecessary changes to topo code, weird splits of sanitization and generally seems unnecessary. Why not handle those two things together? You only save one sgx_copy_to_enclave with the current changes, but introduce other, unnecessary complexity.

cc @dimakuv @mkow

I don't mind either way.

If we decide to split, then we should split into struct pal_topo_info* ms_topo_info and struct pal_etc_info* ms_etc_info, where struct pal_etc_info* will currently contain only one hostname field. Later this new struct will be populated with more etc-related fields.

Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 39 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @oshogbo)


Documentation/manifest-syntax.rst line 147 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why don't we name it accordingly from the beginning? Like this is not a passthrough, it's just an emulated "pseudo file" (which is a bit different from Linux /etc/hostname which is just a simple file).

@mkow, @dimakuv What do you think? Do you want to change the name?


libos/src/libos_init.c line 428 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why it's not called inside init_mount()? Now you mount over /etc/hostname in child even though it's already mounted (mounts are checkpointed), this stacks with each child...
Please rename this to mount_etcfs() and call in init_mount() (at the end probably)

Done.


libos/src/libos_init.c line 493 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Shouldn't this be done only in the first process?

No, this also is called after the checkpoint is received, because of that we don't have to call this in checkpoint/restore function.


libos/src/fs/etc/fs.c line 8 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why do we even have a notion of etc files here. We get info from PAL in the for of our choice, it has nothing to do with Linux /etc files.

Done.


libos/src/fs/etc/fs.c line 18 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why we read this not g_current_uname->nodeinfo?

Because the status of /etc/hostname and status of uname are two diffrent things.
If you will call sethostname it won't change the value of /etc/hostname, but it will change the value of uname.


libos/src/fs/etc/fs.c line 28 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why this is a separate step from init_mount_etcfs()?

Done.


libos/src/fs/etc/fs.c line 29 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Indeed, everyone was confused by this :) Not related to this PR for sure.

Yes it was confirmed with Pawel.
Done.


libos/src/fs/etc/fs.c line 39 at r6 (raw file):

        return 0;

    ret = mount_fs(&(struct libos_mount_params){

There will be more files in these mount with next PRs, so I can change that if you really want but next PR will override this.


libos/src/fs/etc/fs.c line 69 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Ughh, lets not overwrite g_pal_public_state. The fact that topology code does that is IMO a bad precedent, we shouldn't have done it that way...
Also this is not read by uname, so it's wrong.

The state of /etc/hostname file is not the same as uname, so we have to save them separately.
Also the uname reads its later on in the init phase (via set_hostname).


libos/src/sys/libos_uname.c line 43 at r6 (raw file):

}

int set_hostname(char* name, size_t len) {

Done.


libos/src/sys/libos_uname.c line 54 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

This check is now redundant

It's not, because of the order of checks.
If we will check this later we might return the -EFAULT error first which is not what LTP test expects us.


pal/src/host/linux/pal_main.c line 424 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I'm fine with @boryspoplawski suggestion to split the code into two, leaving topo-initialization as is and adding hostname-initialization separately.

Done.


pal/src/host/linux-sgx/pal_ecall_types.h line 22 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why this is part of an ecall? We can get that info from a trusted source inside enclave (measured manifest).

I thought, we will just parse this information once, instead of doing this multiple time.


pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I don't mind either way.

If we decide to split, then we should split into struct pal_topo_info* ms_topo_info and struct pal_etc_info* ms_etc_info, where struct pal_etc_info* will currently contain only one hostname field. Later this new struct will be populated with more etc-related fields.

I just assumed that this is a logical group, as topo is also a "host_information" and I didn't want to grow the number of parameters in ecall.


pal/src/host/linux-common/etc_host_info.c line 13 at r5 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Wait, no, we include host/standard headers first. Current order is wrong.

According to the docs:

Includes should be grouped and then sorted lexicographically. Groups should be separated using a single empty line.

Groups:

Matching .h header for .c files.
Standard library headers.
Non-standard headers not included in Gramine’s repository (e.g. from external dependencies, like curl.h).
Gramine’s headers.

So either this is the second group or this is UB.
https://gramine.readthedocs.io/en/stable/devel/coding-style.html

Code quote (from pal/src/host/linux/pal_main.c):

    /* Get host topology information only for the first process. This information will be
     * checkpointed and restored during forking of the child process(es). */
    if (first_process) {
        ret = get_topology_info(&g_pal_public_state.topo_info);
        if (ret < 0)
            INIT_FAIL("get_topology_info() failed: %d", ret);
    }

pal/src/host/linux-common/etc_host_info.c line 24 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why do you sometimes use &c_uname.nodename and sometimes c_uname.nodename? Please unify

Done.


pal/src/host/linux-common/etc_host_info.c line 25 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

You'll need api.h for this

Done.


pal/src/host/linux-common/etc_host_info.c line 28 at r6 (raw file):

    assert(size > 0);
    hostname[size - 1] = '\0';

Done.

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 8 files at r7, all commit messages.
Reviewable status: all files reviewed, 33 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @oshogbo)

a discussion (no related file):

passthrough in the sense of "this file is directly passed from the host (and sanitized) to the app", no matter what is written in the manifest file.

But it's not directly passed, it's very indirect. We make a copy of that, possibly changing the format and doing sanitization.

What would be the better term for this PR? First iteration of etc sanitization?

I don't care that much about the commit/PR description. My problem is that passthrough is used in the code and comments.



libos/src/libos_init.c line 493 at r6 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

No, this also is called after the checkpoint is received, because of that we don't have to call this in checkpoint/restore function.

But if we ever add checkpointing of uname info, this will break it.


libos/src/fs/libos_fs.c line 660 at r7 (raw file):

    mount_etcfs();

    return 0;

Suggestion:

    return mount_etcfs();

libos/src/fs/etc/fs.c line 28 at r6 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Done.

?


libos/src/fs/etc/fs.c line 39 at r6 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

There will be more files in these mount with next PRs, so I can change that if you really want but next PR will override this.

Yes, let other PRs change that if they need it. It doesn't make sense as is here.


libos/src/fs/etc/fs.c line 69 at r6 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

The state of /etc/hostname file is not the same as uname, so we have to save them separately.
Also the uname reads its later on in the init phase (via set_hostname).

Ok, still, I don't like adding another place writing over g_pal_public_state, which was supposed to be constant and read-only.


libos/src/sys/libos_uname.c line 54 at r6 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

It's not, because of the order of checks.
If we will check this later we might return the -EFAULT error first which is not what LTP test expects us.

We do not care about such useless LTP tests, but whatever.


pal/src/host/linux/pal_main.c line 147 at r5 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Done.

This fits into one line


pal/src/host/linux/pal_main.c line 124 at r7 (raw file):

}

static void get_host_info(bool first_process) {

This is a very ambiguous name...

Code quote:

get_host_info

pal/src/host/linux/pal_main.c line 128 at r7 (raw file):

    ret = toml_bool_in(g_pal_public_state.manifest_root, "libos.passthrough_etc_files",
                       false, &g_pal_public_state.passthrough_etc_files);

Suggestion:

    int ret = toml_bool_in(g_pal_public_state.manifest_root, "libos.passthrough_etc_files", false,
                           &g_pal_public_state.passthrough_etc_files);

pal/src/host/linux/pal_main.c line 431 at r7 (raw file):

    }

    get_host_info(first_process);

Why not

Suggestion:

if (first_process)
    get_host_info();

pal/src/host/linux-sgx/pal_ecall_types.h line 22 at r6 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

I thought, we will just parse this information once, instead of doing this multiple time.

But then you need to sanitize it and you need to make sure it's the same in child as it is in parent. Pointless.
(also the whole manifest is parsed anyway, so we save on one look up time, rather than parsing)


pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r6 (raw file):

I just assumed that this is a logical group, as topo is also a "host_information"

But they are/might be quite different (topo vs etc) w.r.t. layout, copying and sanitization, especially when etc gets extended.

and I didn't want to grow the number of parameters in ecall.

In ecall it's wrapped in a struct anyway. You just added a nesting level instead of another argument, overall number of parameters increased anyway.


pal/src/host/linux-common/etc_host_info.c line 13 at r5 (raw file):

Matching .h header for .c files.

Huh, I wasn't aware of this rule.

TBH it makes no sense to me, at least in most cases, in some it might, but minority. I would remove it. This is unrelated to this PR, but I think it will still take some time to merge, so we can discuss it while we are here.
@dimakuv @mkow

Why I'm saying it doesn't make sense to me is cases like: https://github.com/gramineproject/gramine/blob/master/libos/src/bookkeep/libos_vma.c#L25 which I believe is the majority of cases. I don't think moving this header on top of that file is better than what we have now.

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 8 files at r7, all commit messages.
Reviewable status: all files reviewed, 33 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @mkow, and @oshogbo)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

passthrough in the sense of "this file is directly passed from the host (and sanitized) to the app", no matter what is written in the manifest file.

But it's not directly passed, it's very indirect. We make a copy of that, possibly changing the format and doing sanitization.

What would be the better term for this PR? First iteration of etc sanitization?

I don't care that much about the commit/PR description. My problem is that passthrough is used in the code and comments.

So about passthrough -> sanitization?



Documentation/manifest-syntax.rst line 147 at r4 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

@mkow, @dimakuv What do you think? Do you want to change the name?

Yeah, I'm fine with changing the name. But I still don't know what is suggested instead. I would be ok with libos.sanitize_etc_files or libos.expose_etc_files.


libos/src/libos_init.c line 493 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

But if we ever add checkpointing of uname info, this will break it.

Maybe just put a comment explaining this, and add checkpointing of uname in a follow up PR?


libos/src/fs/etc/fs.c line 69 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Ok, still, I don't like adding another place writing over g_pal_public_state, which was supposed to be constant and read-only.

@boryspoplawski What do you suggest? To have a copy of g_pal_public_state->hostname in the LibOS, and initialize this copy in the first process and then checkpoint-restore this copy (instead of overwriting g_pal_public_state->hostname)? And use this copy everywhere where we currently use g_pal_public_state->hostname as the source of hostname?

I'd be fine with this approach.


pal/src/host/linux-common/etc_host_info.c line 13 at r5 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Matching .h header for .c files.

Huh, I wasn't aware of this rule.

TBH it makes no sense to me, at least in most cases, in some it might, but minority. I would remove it. This is unrelated to this PR, but I think it will still take some time to merge, so we can discuss it while we are here.
@dimakuv @mkow

Why I'm saying it doesn't make sense to me is cases like: https://github.com/gramineproject/gramine/blob/master/libos/src/bookkeep/libos_vma.c#L25 which I believe is the majority of cases. I don't think moving this header on top of that file is better than what we have now.

I also wasn't aware of this rule :)

I don't think we follow this rule closely. I would remove it as well.

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 33 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @oshogbo)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

So about passthrough -> sanitization?

Rather emulation. We do not pass&sanitize a file, we gather info in the PAL (it doesn't have to read /etc and definitely could on e.g. Windows) and use that info to emulate /etc/* files in LibOS.



Documentation/manifest-syntax.rst line 147 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Yeah, I'm fine with changing the name. But I still don't know what is suggested instead. I would be ok with libos.sanitize_etc_files or libos.expose_etc_files.

Both are not what we actually do. emulate_etc_files maybe? Though something better is preferred :)


libos/src/libos_init.c line 493 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Maybe just put a comment explaining this, and add checkpointing of uname in a follow up PR?

ok


libos/src/fs/etc/fs.c line 69 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@boryspoplawski What do you suggest? To have a copy of g_pal_public_state->hostname in the LibOS, and initialize this copy in the first process and then checkpoint-restore this copy (instead of overwriting g_pal_public_state->hostname)? And use this copy everywhere where we currently use g_pal_public_state->hostname as the source of hostname?

I'd be fine with this approach.

Ok, I guess we can leave it as it is, because it will be trivial to fix if we ever want to.

But we should discuss what we do with this problem in general - information that comes from the host (possibly untrusted) and is used by LibOS + checkpointed by LibOS (so it is the same in the child process). Currently we stash it ingo g_pal_public_state and LibOS overwrites it.

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 8 files at r7.
Reviewable status: all files reviewed, 33 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @mkow, and @oshogbo)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

Rather emulation. We do not pass&sanitize a file, we gather info in the PAL (it doesn't have to read /etc and definitely could on e.g. Windows) and use that info to emulate /etc/* files in LibOS.

+1 to emulation



Documentation/manifest-syntax.rst line 147 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Both are not what we actually do. emulate_etc_files maybe? Though something better is preferred :)

I'm fine with emulate


libos/src/fs/etc/fs.c line 69 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Ok, I guess we can leave it as it is, because it will be trivial to fix if we ever want to.

But we should discuss what we do with this problem in general - information that comes from the host (possibly untrusted) and is used by LibOS + checkpointed by LibOS (so it is the same in the child process). Currently we stash it ingo g_pal_public_state and LibOS overwrites it.

Probably split into two objects? g_pal_public_state (guaranteed to be constant) and g_pal_public_initial_state (only used by the parent and copied into LibOS object, which is later checkpointed).

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 30 files at r1, 1 of 18 files at r2, 1 of 7 files at r3, 7 of 7 files at r6, 8 of 8 files at r7, all commit messages.
Reviewable status: all files reviewed, 28 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @oshogbo)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

+1 to emulation

I didn't complain about this naming because IMO both passthrough and emulation are not really correct here, it's something in between. The contents are passed through, but the file/data format is emulated. So, I'm fine with both names.


a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

Actually why do we even emulate /etc/hostname? What uses it? From what I understand this is just a configuration file for early init scripts on a system, not runtime network configuration (like /etc/resolv.conf)

I'm not sure either, but AFAIR the idea of @oshogbo approach was to first implement a simpler file and get the review feedback, and then resolv.conf.
@dimakuv is the reporter of #689, so let's wait for his opinion on this.



Documentation/manifest-syntax.rst line 147 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I'm fine with emulate

I'm fine with emulate_etc_files (although see my comment in a similar discussion above).


libos/src/fs/etc/fs.c line 69 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Probably split into two objects? g_pal_public_state (guaranteed to be constant) and g_pal_public_initial_state (only used by the parent and copied into LibOS object, which is later checkpointed).

Dima's proposal sounds good, although this should be separate from this PR (and probably low-prio?).


libos/test/regression/hostname.c line 68 at r5 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Done.

Hmm, I see that other tests have their own copies of such code wrapped into read_all(). Could you try moving it to a common.c or something and deduplicate?


pal/src/host/linux/pal_main.c line 135 at r5 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Because @dimakuv asked me to not propagate the libos.passthrough_etc_files to the child through checkpoint mechanisms, and get it from manifest file. I think that this value should be synced between the process.

Ah, ok, makes sense.


pal/src/host/linux/pal_main.c line 124 at r7 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

This is a very ambiguous name...

I think if we keep the corresponding pal_host_info struct then it makes sense. Any suggestion for something better?


pal/src/host/linux/pal_main.c line 431 at r7 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why not

+1


pal/src/host/linux-sgx/host_main.c line 908 at r7 (raw file):

}

static int get_host_info(pal_host_info_t* host_info, int parent_stream_fd) {

ditto - I like Borys' suggestion to move that if outside

Code quote:

int parent_stream_fd

pal/src/host/linux-sgx/pal_ecall_types.h line 22 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

But then you need to sanitize it and you need to make sure it's the same in child as it is in parent. Pointless.
(also the whole manifest is parsed anyway, so we save on one look up time, rather than parsing)

Good point, this should be taken from trusted source, because it's possible to do so.


pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r6 (raw file):

If we decide to split, then we should split into struct pal_topo_info* ms_topo_info

What does ms_ mean and why would we use it there?

I just assumed that this is a logical group

+1, although we'll probably change it later anyways, when moving this to g_pal_public_initial_state?


pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r7 (raw file):

    bool                  passthrough_etc;
    char                  hostname[PAL_HOSTNAME_MAX];
} pal_host_info_t;

I think we don't use tagged structures in new code?

CC: @boryspoplawski, @dimakuv

Code quote:

pal_host_info_t

pal/src/host/linux-sgx/pal_main.c line 269 at r7 (raw file):

        return false;

    while (*ptr != '\0' && ptr - hostname < PAL_HOSTNAME_MAX) {

Possible out-of-bounds read

Code quote:

*ptr != '\0'

pal/src/host/linux-sgx/pal_main.c line 270 at r7 (raw file):

    while (*ptr != '\0' && ptr - hostname < PAL_HOSTNAME_MAX) {
        if ((*ptr >= 'a' && *ptr <= 'z') ||

I find the following version more readable for range checking: (kindof emulates A <= x <= B from Python)

Suggestion:

'a' <= *ptr && *ptr <= 'z'

pal/src/host/linux-sgx/pal_main.c line 289 at r7 (raw file):

    }

    if (*ptr != '\0')

Also OOB read


pal/src/host/linux-sgx/pal_main.c line 310 at r7 (raw file):

    } else {
        memcpy(g_pal_public_state.hostname, shallow_host_info->hostname,
               sizeof(g_pal_public_state.hostname) - 1);

Please zero the last byte here explicitly to make reasoning about security of this code easier (and make it more refactoring-safe).


pal/src/host/linux-sgx/pal_main.c line 316 at r7 (raw file):

}

static int get_host_info(bool first_process, pal_host_info_t *shallow_host_info) {

Suggestion:

pal_host_info_t* shallow_host_info

pal/src/host/linux-sgx/pal_main.c line 316 at r7 (raw file):

}

static int get_host_info(bool first_process, pal_host_info_t *shallow_host_info) {

ditto - why not move it outside?

Code quote:

bool first_process

pal/src/host/linux-sgx/pal_main.c line 647 at r7 (raw file):

    /* get host information */
    pal_host_info_t host_info = {0};
    assert(uptr_host_info != NULL);

What's the point of this check?


pal/src/host/linux-sgx/pal_main.c line 649 at r7 (raw file):

    assert(uptr_host_info != NULL);
    if (!sgx_copy_to_enclave(&host_info, sizeof(host_info), uptr_host_info,
                             sizeof(host_info))) {

I think that the whole point of this argument was to specify the size of the second object? (to assert that they are the same)

Code quote:

sizeof(host_info)

pal/src/host/linux-common/etc_host_info.c line 13 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I also wasn't aware of this rule :)

I don't think we follow this rule closely. I would remove it as well.

I actually like it, easier to notice if the matching header was included, but it's not a strong preference.
Some historical context: We inherited it from Google C++ styleguide.

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 28 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @oshogbo)

a discussion (no related file):

The contents are passed through

Content of what? It's just on Linux PAL that this is content of some /etc/ files, other PALs can get this info from somewhere else (different files, syscalls/api calls, whatever). IMO emulation fits - we emulate these files out of some data we get from PAL.


a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

I'm not sure either, but AFAIR the idea of @oshogbo approach was to first implement a simpler file and get the review feedback, and then resolv.conf.
@dimakuv is the reporter of #689, so let's wait for his opinion on this.

Sure, but why this file? I would say we should never have this file emulated at all.



libos/src/fs/etc/fs.c line 69 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Dima's proposal sounds good, although this should be separate from this PR (and probably low-prio?).

Yes, definitely not in this PR.


pal/src/host/linux/pal_main.c line 124 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I think if we keep the corresponding pal_host_info struct then it makes sense. Any suggestion for something better?

My point is that host info is a very broad term, whereas here we will probably just get some network related configuration. I don't have a good name in mind, but I feel like we should somehow point this is network related config.


pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I think we don't use tagged structures in new code?

CC: @boryspoplawski, @dimakuv

True, we don't, please just use struct X.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 28 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @oshogbo)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

The contents are passed through

Content of what? It's just on Linux PAL that this is content of some /etc/ files, other PALs can get this info from somewhere else (different files, syscalls/api calls, whatever). IMO emulation fits - we emulate these files out of some data we get from PAL.

By "contents" I meant the values we get from the host, e.g. in next PRs we'll add the list of predefined DNS translations (/etc/hosts on Linux, system32/something_I_forgot/hosts on Windows, etc., doesn't matter) - we don't really emulate these contents, we don't even really process them - only ensure that they look ok-ish and encode them into Linux etc files format.


a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

Sure, but why this file? I would say we should never have this file emulated at all.

Not saying we need to merge this one if it's not actually needed, I just explained the context.
Other PRs should be very similar, so it's worth reviewing this one and arriving at a good version even if we won't merge it in the end.



pal/src/host/linux/pal_main.c line 124 at r7 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

My point is that host info is a very broad term, whereas here we will probably just get some network related configuration. I don't have a good name in mind, but I feel like we should somehow point this is network related config.

Ah, wait, topology is actually not processed here. Then maybe just get_host_etc_configs?

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 28 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @oshogbo)

a discussion (no related file):

By "contents" I meant the values we get from the host.

Exactly, that's the point - we get them, they don't have to come from /etc, it could very well be some parsed output of some command.

we don't really emulate these contents,

We provide /etc files which expose certain information. We do not provide real files there, just some pseudo, so IMO we DO emulate their contents - we take a data structure of our choice and turn that into a pseudo file in /etc.

we don't even really process them - only ensure that they look ok-ish and encode them into Linux etc files format.

We do process them, especially on SGX PAL, where we have to sanitize them. That will be especially true with proposed format of parsing Linux's /etc/resolv.conf where we parse the content into an arbitrary struct of our choice.


a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

Not saying we need to merge this one if it's not actually needed, I just explained the context.
Other PRs should be very similar, so it's worth reviewing this one and arriving at a good version even if we won't merge it in the end.

We most likely want to merge most of this PR anyway, because it establishes some ground for all others. Let's hear from @oshogbo why this file was picked as starter



pal/src/host/linux/pal_main.c line 124 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Ah, wait, topology is actually not processed here. Then maybe just get_host_etc_configs?

I guess, should work

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 28 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @oshogbo)

a discussion (no related file):

Exactly, that's the point - we get them, they don't have to come from /etc, it could very well be some parsed output of some command.

Yes, I know, see "doesn't matter" in the reply above.

We provide /etc files which expose certain information. We do not provide real files there, just some pseudo, so IMO we DO emulate their contents - we take a data structure of our choice and turn that into a pseudo file in /etc.

What I mean is the contents / information from these structures is passed through, not emulated. We emulate only the specific representation of these data (in Linux etc files format).

We do process them, especially on SGX PAL, where we have to sanitize them. That will be especially true with proposed format of parsing Linux's /etc/resolv.conf where we parse the content into an arbitrary struct of our choice.

Yeah, but still, the meaning / contents are passed through, not emulated. Same with topology - I wouldn't say that we emulate topology info, rather that we're passing it through the host with sanitization.


Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 28 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @oshogbo)

a discussion (no related file):
In that sense whole Gramine is just one big passthrough, yet I wouldn't got so far to call it this way.

What I mean is the contents / information from these structures is passed through, not emulated. We emulate only the specific representation of these data (in Linux etc files format).

Agreed. That's why proposed option was emulate_etc_files, not emulate_network_config


!TODO use this commit message:

[LibOS,PAL,Docs] Introduce `etc` emulation (currently only 'hostname')

Gramine obtains information from a host, sanitizes it, and stores it in
the global PAL state. Later, LibOS uses it to create a pseudo file
(using pseudofs filesystem).

The difference between Linux and Gramine is that Gramine expected
the hostname to be a valid domain - as we assume the host is untrusted.
In the case of Linux, it doesn't presume that the root user
tries to exploit it through a malicious hostname.

Signed-off-by: Mariusz Zaborski <[email protected]>
Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 27 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @oshogbo)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

We most likely want to merge most of this PR anyway, because it establishes some ground for all others. Let's hear from @oshogbo why this file was picked as starter

Like @mkow mentioned I'm a new in Gramine community, so I wanted to start from something that will be most Gramine specific, and less functional specific. I wanted to understand fs layer and so on and establish ground for this project. And not discuss the parser semantic and so on.

Like I mentioned in PR I think this is how the /etc/hostname is used, however it was still requested in the #689 .


a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

In that sense whole Gramine is just one big passthrough, yet I wouldn't got so far to call it this way.

What I mean is the contents / information from these structures is passed through, not emulated. We emulate only the specific representation of these data (in Linux etc files format).

Agreed. That's why proposed option was emulate_etc_files, not emulate_network_config

Done.



-- commits line 20 at r6:
Done.


Documentation/manifest-syntax.rst line 147 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'm fine with emulate_etc_files (although see my comment in a similar discussion above).

Done.


libos/src/libos_init.c line 493 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

ok

Done.


libos/src/fs/libos_fs.c line 660 at r7 (raw file):

    mount_etcfs();

    return 0;

Done.


libos/src/fs/etc/fs.c line 28 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

?

I just marked this as done, because you left an comment that you figure this out, sorry it was my misunderstanding.

All file systems has this as a separate code.
First you have to register a pseudo file/fs, because otherwise the checkpoint mechanisms wont work.
And the mount code happens afterwords or don't happen at all (if I understand correctly).


libos/src/fs/etc/fs.c line 39 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Yes, let other PRs change that if they need it. It doesn't make sense as is here.

Done.


libos/test/regression/hostname.c line 68 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Hmm, I see that other tests have their own copies of such code wrapped into read_all(). Could you try moving it to a common.c or something and deduplicate?

Done.


pal/src/host/linux/pal_main.c line 147 at r5 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

This fits into one line

Done.


pal/src/host/linux/pal_main.c line 124 at r7 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I guess, should work

Done.


pal/src/host/linux/pal_main.c line 128 at r7 (raw file):

    ret = toml_bool_in(g_pal_public_state.manifest_root, "libos.passthrough_etc_files",
                       false, &g_pal_public_state.passthrough_etc_files);

Done.


pal/src/host/linux/pal_main.c line 431 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

+1

Because we always read first the emulate_etc_files.
@dimakuv asked not to propagate this with checkpoints, and I would like to have the same global state in every process.
I also like the idea to parse the toml inside. But as this isn't obvious and we should move the toml parsing outside this function.


pal/src/host/linux-sgx/host_main.c line 908 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto - I like Borys' suggestion to move that if outside

Done.


pal/src/host/linux-sgx/pal_ecall_types.h line 22 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Good point, this should be taken from trusted source, because it's possible to do so.

Done.


pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r6 (raw file):

we'll probably change it later anyways, when moving this to g_pal_public_initial_state?

I think so.


pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r7 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

True, we don't, please just use struct X.

Done.


pal/src/host/linux-sgx/pal_main.c line 269 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Possible out-of-bounds read

Done.


pal/src/host/linux-sgx/pal_main.c line 270 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I find the following version more readable for range checking: (kindof emulates A <= x <= B from Python)

Done.


pal/src/host/linux-sgx/pal_main.c line 289 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Also OOB read

Done.


pal/src/host/linux-sgx/pal_main.c line 310 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Please zero the last byte here explicitly to make reasoning about security of this code easier (and make it more refactoring-safe).

Done.

Code quote:

init_passthrough_etc_files

pal/src/host/linux-sgx/pal_main.c line 316 at r7 (raw file):

}

static int get_host_info(bool first_process, pal_host_info_t *shallow_host_info) {

Done.


pal/src/host/linux-sgx/pal_main.c line 316 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto - why not move it outside?

We are reading the data before verifying is it first process or not, as some parts aren't checkpointed.


pal/src/host/linux-sgx/pal_main.c line 647 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

What's the point of this check?

Checking validity of pal_linux_main call.


pal/src/host/linux-sgx/pal_main.c line 649 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I think that the whole point of this argument was to specify the size of the second object? (to assert that they are the same)

I don't see such assert.
However this makes more sense.
Done.

@oshogbo oshogbo mentioned this pull request Aug 31, 2022
Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 19 files at r8, all commit messages.
Reviewable status: 19 of 31 files reviewed, 21 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @oshogbo)

a discussion (no related file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Like @mkow mentioned I'm a new in Gramine community, so I wanted to start from something that will be most Gramine specific, and less functional specific. I wanted to understand fs layer and so on and establish ground for this project. And not discuss the parser semantic and so on.

Like I mentioned in PR I think this is how the /etc/hostname is used, however it was still requested in the #689 .

The fact that something is requested doesn't mean we actually need it. Or that it's even useful at all. In that RFC issue it was present in a list of files that Docker shadows/emulates - this is because Docker is different from Graimne, it can set up networking stuff if it wants to (it has a separate network namespace), in Gramine this doesn't make sense.



Documentation/manifest-syntax.rst line 148 at r8 (raw file):

This specifies whether to emulate runtime files under ``/etc``.
In the case of SGX on Linux, this is achieved by taking the host's ``/etc``

Suggestion:

Linux SGX PAL

libos/test/regression/rename_unlink.c line 39 at r8 (raw file):

            warn("write");
            return -1;
        }

What's with this change, how's that related to the PR?
disclaimer: I've not yet reviewed the test part of this PR


pal/src/host/linux/pal_main.c line 431 at r7 (raw file):

I also like the idea to parse the toml inside. But as this isn't obvious and we should move the toml parsing outside this function.

Parse error.
Anyway, I think we should move toml parsing outside - it doesn't even match this function name (get_host_etc...)


pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r6 (raw file):

+1, although we'll probably change it later anyways, when moving this to g_pal_public_initial_state?

What does it have to do with this? g_pal_public_initial_state is a trusted struct inside PAL, this is ecall interface


pal/src/host/linux-sgx/pal_ecall_types.h line 34 at r8 (raw file):

    int                     ms_parent_stream_fd;
    sgx_target_info_t*      ms_qe_targetinfo;
    struct pal_host_info*   ms_host_info;

Why did you add more spaces?

Suggestion:

struct pal_host_info* ms_host_info;

pal/src/host/linux-common/etc_host_info.c line 13 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I actually like it, easier to notice if the matching header was included, but it's not a strong preference.
Some historical context: We inherited it from Google C++ styleguide.

@oshogbo please fix, as we merged the PR removing this rule

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 19 of 19 files at r8, all commit messages.
Reviewable status: all files reviewed, 21 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @boryspoplawski, @mkow, and @oshogbo)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

The fact that something is requested doesn't mean we actually need it. Or that it's even useful at all. In that RFC issue it was present in a list of files that Docker shadows/emulates - this is because Docker is different from Graimne, it can set up networking stuff if it wants to (it has a separate network namespace), in Gramine this doesn't make sense.

@boryspoplawski But why do you say that emulating /etc/hostname is useless?

If we choose to not emulate /etc/hostname, then we have two arguments:

  1. No applications use this file.
  2. When applications use this file, the file contents can be hard-coded on any system (via sgx.trusted_files = "file:/etc/hostname")

Which of these arguments is true in your opinion? How can we be sure in argument 1? How can we be sure in argument 2?

Or maybe there is argument 3?

Otherwise, if we cannot say for sure that /etc/hostname is never used (or never changes from one system to the other), then we must have this PR, right?



Documentation/manifest-syntax.rst line 148 at r8 (raw file):

This specifies whether to emulate runtime files under ``/etc``.
In the case of SGX on Linux, this is achieved by taking the host's ``/etc``

Linux-SGX PAL, that's the most correct name.


Documentation/manifest-syntax.rst line 149 at r8 (raw file):

This specifies whether to emulate runtime files under ``/etc``.
In the case of SGX on Linux, this is achieved by taking the host's ``/etc``
files, filter supported options, and sanitize them.

filtering, sanitizing


libos/src/fs/etc/fs.c line 69 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Yes, definitely not in this PR.

I created a GitHub issue for this: #872


libos/test/regression/rename_unlink.c line 39 at r8 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

What's with this change, how's that related to the PR?
disclaimer: I've not yet reviewed the test part of this PR

Same question from me -- can we split this into a separate PR? And why write_all() is not replaced by a func from rw_file.h?


pal/src/host/linux/pal_main.c line 431 at r7 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I also like the idea to parse the toml inside. But as this isn't obvious and we should move the toml parsing outside this function.

Parse error.
Anyway, I think we should move toml parsing outside - it doesn't even match this function name (get_host_etc...)

I'm fine with Borys's proposal to move the parsing of libos.emulate_etc_files boolean outside of the get_host_etc_configs() file.


pal/src/host/linux-sgx/pal_main.c line 316 at r7 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

We are reading the data before verifying is it first process or not, as some parts aren't checkpointed.

I think the suggestion is to move libos.emulate_etc_files parsing outside anyway, and then you can also move first_process check outside as well. Similar as requested in Linux PAL.

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 21 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @oshogbo)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@boryspoplawski But why do you say that emulating /etc/hostname is useless?

If we choose to not emulate /etc/hostname, then we have two arguments:

  1. No applications use this file.
  2. When applications use this file, the file contents can be hard-coded on any system (via sgx.trusted_files = "file:/etc/hostname")

Which of these arguments is true in your opinion? How can we be sure in argument 1? How can we be sure in argument 2?

Or maybe there is argument 3?

Otherwise, if we cannot say for sure that /etc/hostname is never used (or never changes from one system to the other), then we must have this PR, right?

/etc/hostname is an initial hostname to set - think of it as a way of permanently setting hostname that survives reboots. Temporary changes are done by sethostname syscall, which ofc does not change /etc/hostname. Now what could possibly be changing /etc/hostname? Either

  1. manual user change - irrelevant to us,
  2. some platform bootstrapping / setup scripts - nothing like that should be run in Gramine.
    Now what could be reading it?
  3. Early init script to call sethostname on that name. Generally I don't think such scripts are runnable in Gramine (e.g. systemd 😄)
  4. Everything else should use gethostname as it can differ from /etc/hostname.

The argument slightly changes when we mix namespaces, but we do not support them.

All of the above is my thinking, it could be inaccurate or plain wrong.


Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 21 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @boryspoplawski, @mkow, and @oshogbo)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

/etc/hostname is an initial hostname to set - think of it as a way of permanently setting hostname that survives reboots. Temporary changes are done by sethostname syscall, which ofc does not change /etc/hostname. Now what could possibly be changing /etc/hostname? Either

  1. manual user change - irrelevant to us,
  2. some platform bootstrapping / setup scripts - nothing like that should be run in Gramine.
    Now what could be reading it?
  3. Early init script to call sethostname on that name. Generally I don't think such scripts are runnable in Gramine (e.g. systemd 😄)
  4. Everything else should use gethostname as it can differ from /etc/hostname.

The argument slightly changes when we mix namespaces, but we do not support them.

All of the above is my thinking, it could be inaccurate or plain wrong.

@boryspoplawski I'm not talking about changing /etc/hostname.

Let me rephrase -- you write a manifest file, and you expect this manifest file to run on 100s of different machines (you prepared the Gramine app distribution once, and then you run it on different cloud machines and client machines). Assuming that your application reads /etc/hostname (at least once, during init) -- how do we enable this file in Gramine?

From what I understand, there is no good way of solving the above scenario currently. This PR allows this scenario.


Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 21 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @boryspoplawski, @mkow, and @oshogbo)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@boryspoplawski I'm not talking about changing /etc/hostname.

Let me rephrase -- you write a manifest file, and you expect this manifest file to run on 100s of different machines (you prepared the Gramine app distribution once, and then you run it on different cloud machines and client machines). Assuming that your application reads /etc/hostname (at least once, during init) -- how do we enable this file in Gramine?

From what I understand, there is no good way of solving the above scenario currently. This PR allows this scenario.

Just to reiterate: we want to run some Gramine app on multiple machines, and these machines may have different hostnames (which is reflected in /etc/hostname).


Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 21 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @oshogbo)

a discussion (no related file):
No, it's not reflected in /etc/hostname it's reflected in gethostname output.

Assuming that your application reads /etc/hostname (at least once, during init) -- how do we enable this file in Gramine?

Ideally your application does the correct thing and uses gethostname.


Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 21 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @boryspoplawski, @mkow, and @oshogbo)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

No, it's not reflected in /etc/hostname it's reflected in gethostname output.

Assuming that your application reads /etc/hostname (at least once, during init) -- how do we enable this file in Gramine?

Ideally your application does the correct thing and uses gethostname.

Ok, so you say that "No applications use this file".

How can we verify this theory? And do we want to be on the safe side and in this case use this PR (which allows safe use of /etc/hostname)?


Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 19 of 19 files at r8, all commit messages.
Reviewable status: all files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @oshogbo)

a discussion (no related file):

In that sense whole Gramine is just one big passthrough, yet I wouldn't got so far to call it this way.

I think that's quite a big jump...


a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ok, so you say that "No applications use this file".

How can we verify this theory? And do we want to be on the safe side and in this case use this PR (which allows safe use of /etc/hostname)?

I see at least two argument in favor of "No applications use this file":

  • Debian doesn't have this file.
  • gethostname() doesn't use it (it uses uname syscall instead), so an app would need to manually read /etc/hostname.

But of course software may be more broken than we think :)



-- commits line 59 at r8:
I don't like commit messages in this tense/grammar form, because it's not clear whether you're describing the state before this commit (to make an introduction to the changes described later) or actually the changes in this commit.


-- commits line 61 at r8:
This is even more misleading than the above :) You used past tense here, so it seems that you're describing what Gramine did before this commit.

Code quote:

Gramine expected

libos/src/libos_init.c line 490 at r8 (raw file):

    RUN_INIT(init_sync_client);

    /* XXX: this will break uname checkpointing. */

Suggestion:

this will break uname checkpointing (if we implement it)

libos/test/regression/rename_unlink.c line 39 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Same question from me -- can we split this into a separate PR? And why write_all() is not replaced by a func from rw_file.h?

I asked about this, because @oshogbo added another copy of this code to hostname.c test. It's just a test code, so IMO it can stay in this PR to not complicate things (but if it was in the main code then it should rather get split).


pal/src/host/linux-sgx/host_main.c line 619 at r8 (raw file):

/* Parses only the information needed by the untrusted PAL to correctly initialize the enclave. */
static int parse_loader_config(char* manifest, struct pal_enclave* enclave_info,
                               bool *emulate_etc) {

Suggestion:

bool* emulate_etc

pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

+1, although we'll probably change it later anyways, when moving this to g_pal_public_initial_state?

What does it have to do with this? g_pal_public_initial_state is a trusted struct inside PAL, this is ecall interface

Oh, this is ecall. Sorry, I misread it, I thought it's some global struct. Maybe we should prefix ecall arg structs with ecall_?


pal/src/host/linux-sgx/pal_main.c line 316 at r7 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I think the suggestion is to move libos.emulate_etc_files parsing outside anyway, and then you can also move first_process check outside as well. Similar as requested in Linux PAL.

Yeah.


pal/src/host/linux-sgx/pal_main.c line 647 at r7 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Checking validity of pal_linux_main call.

It's not really checking anything, it's just an assert which doesn't land in release builds. And even if it was a real check, then it still makes little sense:

  • Why check for a NULL specifically? -1 is also for sure invalid, but you don't check for it. And misaligned pointers. And many more.
  • In terms of security, you have sgx_copy_to_enclave() call right underneath which can't do any insecure copying.

pal/src/host/linux-sgx/pal_main.c line 649 at r7 (raw file):

I don't see such assert.

There's usize > maxsize comparison inside it.


pal/src/host/linux-sgx/pal_main.c line 289 at r8 (raw file):

    }

    if (ptr - hostname >= PAL_HOSTNAME_MAX)

Please add a comment inside the body of this if: /* Missing NULL termination */

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @oshogbo)


libos/test/regression/rename_unlink.c line 39 at r8 (raw file):

I asked about this, because @oshogbo added another copy of this code to hostname.c test.

And how's this related? We already had abstractions for such functions.


pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Oh, this is ecall. Sorry, I misread it, I thought it's some global struct. Maybe we should prefix ecall arg structs with ecall_?

But it's not passed into ecall directly it's embedded in another struct that already has ecall_ prefix.
This is another argument for not holding these together - currently this struct IMO makes no sense and should be inlined. We can have a struct once we actually have something to put in there.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @oshogbo)


libos/test/regression/rename_unlink.c line 39 at r8 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I asked about this, because @oshogbo added another copy of this code to hostname.c test.

And how's this related? We already had abstractions for such functions.

It's not really related, but also not a big change. But if you want we can split it out into a separate PR/commit.


pal/src/host/linux-sgx/pal_ecall_types.h line 24 at r6 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

But it's not passed into ecall directly it's embedded in another struct that already has ecall_ prefix.
This is another argument for not holding these together - currently this struct IMO makes no sense and should be inlined. We can have a struct once we actually have something to put in there.

Ok, let's inline it then.

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @oshogbo)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

I see at least two argument in favor of "No applications use this file":

  • Debian doesn't have this file.
  • gethostname() doesn't use it (it uses uname syscall instead), so an app would need to manually read /etc/hostname.

But of course software may be more broken than we think :)

Ok, so then we need to agree on the next steps now. Looks like two maintainers who know this area best (Borys and Michal) agree that we do not need to emulate /etc/hostname.

So looks like we need to do the following things:

  1. Close this PR (the parts of it will be used in the two new PRs described below).
  2. Create a first PR that will initialize uname.domainname in g_current_uname: https://github.com/gramineproject/gramine/blob/f1e1d22d29a08c05a6cf19188aa0241787608371/libos/src/sys/libos_uname.c#L32?plain=1
    • This should use the uname host syscall in PAL and propagate the value of uname.domainname to LibOS
    • LibOS should sanitize this value, similar to how it is done in the current PR
  3. Create a second PR that will emulate the file /etc/resolv.conf. A lot of the pseudo-files scaffolding should be taken from the current PR.

@mkow @boryspoplawski @oshogbo Does this sound correct?


Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @oshogbo)

a discussion (no related file):

Debian doesn't have this file.

@mkow What? Even debian wiki says you need to update this file when you change hostname. I don't have any debian at hand to check, but this doesn't feel right.

How about /etc/hosts before /etc/resolv.conf? It sounds much easier to add and would be mostly the same as this PR (so there 's even a change we could reuse it completely)


Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 15 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "squash! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @oshogbo)

a discussion (no related file):

Previously, boryspoplawski (Borys Popławski) wrote…

Debian doesn't have this file.

@mkow What? Even debian wiki says you need to update this file when you change hostname. I don't have any debian at hand to check, but this doesn't feel right.

How about /etc/hosts before /etc/resolv.conf? It sounds much easier to add and would be mostly the same as this PR (so there 's even a change we could reuse it completely)

@dimakuv: Sounds good.
@boryspoplawski: Maybe it's because it comes from Qubes OS Debian template and they've changed something? Dunno.


@oshogbo oshogbo closed this Sep 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants