-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support time namespace #3876
Support time namespace #3876
Conversation
CC: @KentaTada |
|
CC: @AkihiroSuda |
CI is failing |
493a08f
to
c6f73ac
Compare
c87c035
to
d6c0d52
Compare
The CI is now passing after removing logrus print statement. |
Additional merge commit added while rebasing changes from main. I'll update my branch by removing the merge commit. |
79e1968
to
b16765e
Compare
b16765e
to
9a5d7b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not looking at this earlier, only one somewhat significant thing needs to be changed. And this unfortunately needs a rebase due to the just-merged id-mapped-mounts work.
This could also really use some integration tests. If you're not comfortable setting those up, I can do them in a follow-up PR.
libcontainer/configs/config.go
Outdated
@@ -212,6 +212,9 @@ type Config struct { | |||
// RootlessCgroups is set when unlikely to have the full access to cgroups. | |||
// When RootlessCgroups is set, cgroups errors are ignored. | |||
RootlessCgroups bool `json:"rootless_cgroups,omitempty"` | |||
|
|||
// TimeOffsets specifies the offset for supporting time namespaces. | |||
TimeOffsets map[string]specs.LinuxTimeOffset `json:"timeOffsets,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To match the rest of the state json:
TimeOffsets map[string]specs.LinuxTimeOffset `json:"timeOffsets,omitempty"` | |
TimeOffsets map[string]specs.LinuxTimeOffset `json:"time_offsets,omitempty"` |
libcontainer/container_linux.go
Outdated
// write boottime and monotonic offsets | ||
// 0 is default value in /proc/PID/timens_offsets | ||
// if 0, do not bootstrap data | ||
if c.config.TimeOffsets != nil { | ||
|
||
if c.config.TimeOffsets["boottime"].Nanosecs != 0 || c.config.TimeOffsets["boottime"].Secs != 0 { | ||
b, err := encodeTimeNs(c.config.TimeOffsets["boottime"]) | ||
if err != nil { | ||
return nil, err | ||
} | ||
r.AddData(&Bytemsg{ | ||
Type: BootTimeNsAttr, | ||
Value: append([]byte("boottime "), b...), | ||
}) | ||
} | ||
|
||
if c.config.TimeOffsets["monotonic"].Nanosecs != 0 || c.config.TimeOffsets["monotonic"].Secs != 0 { | ||
b, err := encodeTimeNs(c.config.TimeOffsets["monotonic"]) | ||
if err != nil { | ||
return nil, err | ||
} | ||
r.AddData(&Bytemsg{ | ||
Type: MonotonicNsAttr, | ||
Value: append([]byte("monotonic "), b...), | ||
}) | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can be written with one string by separating them with newlines, so there's no need to add multiple netlink entries for this. Just generate one string into a TimeOffsetsAttr
and write it in one shot in nsexec.c
.
libcontainer/nsenter/nsexec.c
Outdated
* Update timens offsets | ||
* set boottime and monotonic offsets | ||
*/ | ||
write_log(DEBUG, "set timens offsets %s", config.boottime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're outputting this debug information twice (both here and in update_timens). You can remove this one.
libcontainer/container_linux.go
Outdated
if c.config.TimeOffsets["boottime"].Nanosecs != 0 || c.config.TimeOffsets["boottime"].Secs != 0 { | ||
b, err := encodeTimeNs(c.config.TimeOffsets["boottime"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You use 3 map lookups here, while 1 should be enough. Make encodeTimeNs check if the fields are all 0 and return empty string in such case, add a check for empty string below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also looks like this would be a good place to write something like
for k, v := c.Config.TimeOffsets {
...
}
making the code more generic and avoiding to use specific strings.
libcontainer/nsenter/nsexec.c
Outdated
@@ -95,6 +95,13 @@ struct nlconfig_t { | |||
/* Mount sources opened outside the container userns. */ | |||
char *mountsources; | |||
size_t mountsources_len; | |||
|
|||
/* Time NS settings for boottime and monotonic */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing period.
libcontainer/nsenter/nsexec.c
Outdated
* Update timens offsets | ||
* set boottime and monotonic offsets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing periods. Please write complete sentences in comments, otherwise it's harder to read (and maintain).
libcontainer/specconv/spec_linux.go
Outdated
if spec.Linux.TimeOffsets != nil { | ||
config.TimeOffsets = spec.Linux.TimeOffsets | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here if
is not needed -- worst case scenario is you assign nil
to config.TimeOffsets
.
(In other places if
is needed to avoid nil dereferences, but not here).
Thank you for detailed review. I'll update as per comments. |
9a5d7b4
to
f644e9a
Compare
libcontainer/container_linux.go
Outdated
// write boottime and monotonic time ns offsets. | ||
if c.config.TimeOffsets != nil { | ||
timeOffset := append([]byte("monotonic"), []byte(fmt.Sprintf(" %d %d\n", c.config.TimeOffsets["monotonic"].Secs, c.config.TimeOffsets["monotonic"].Nanosecs))...) | ||
timeOffset = append(timeOffset, []byte("boottime")...) | ||
r.AddData(&Bytemsg{ | ||
Type: TimeOffsetsAttr, | ||
Value: append(timeOffset, []byte(fmt.Sprintf(" %d %d", c.config.TimeOffsets["boottime"].Secs, c.config.TimeOffsets["boottime"].Nanosecs))...), | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little difficult to read. Maybe something more like this (untested):
// write boottime and monotonic time ns offsets. | |
if c.config.TimeOffsets != nil { | |
timeOffset := append([]byte("monotonic"), []byte(fmt.Sprintf(" %d %d\n", c.config.TimeOffsets["monotonic"].Secs, c.config.TimeOffsets["monotonic"].Nanosecs))...) | |
timeOffset = append(timeOffset, []byte("boottime")...) | |
r.AddData(&Bytemsg{ | |
Type: TimeOffsetsAttr, | |
Value: append(timeOffset, []byte(fmt.Sprintf(" %d %d", c.config.TimeOffsets["boottime"].Secs, c.config.TimeOffsets["boottime"].Nanosecs))...), | |
}) | |
} | |
// write boottime and monotonic time ns offsets. | |
if c.config.TimeOffsets != nil { | |
var offsetSpec bytes.Buffer | |
for clock, offset := range c.config.TimeOffsets { | |
fmt.Fprintf(&offsetSpec, "%s %d %d\n", clock, offset.Secs, offset.Nanosecs) | |
} | |
r.AddData(&Bytemsg{ | |
Type: TimeOffsetsAttr, | |
Value: offsetSpec.Bytes(), | |
}) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Go, printf-like functions are notoriously slow for some reason. I'd prefer string concatenation, but it's not a big deal since it's only done once during the container start here.
f644e9a
to
3791a45
Compare
3791a45
to
d29ead9
Compare
"time" namespace was introduced in Linux v5.6 support new time namespace to set boottime and monotonic time offset Example runtime spec "timeOffsets": { "monotonic": { "secs": 172800, "nanosecs": 0 }, "boottime": { "secs": 604800, "nanosecs": 0 } } Signed-off-by: Chethan Suresh <[email protected]>
d29ead9
to
ebc2e7c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
return; | ||
write_log(DEBUG, "update /proc/self/timens_offsets to '%s'", map); | ||
if (write_file(map, map_len, "/proc/self/timens_offsets") < 0) { | ||
if (errno != EPERM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EPERM
means The caller does not have the CAP_SYS_TIME capability.
Why ignore this error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I think this was mistakenly copied from the /proc/self/uid_map
and gid_map
helpers (which call the setuid newuidmap
/newgidmap
helper programs in case of -EPERM
).
return; | ||
write_log(DEBUG, "update /proc/self/timens_offsets to '%s'", map); | ||
if (write_file(map, map_len, "/proc/self/timens_offsets") < 0) { | ||
if (errno != EPERM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I think this was mistakenly copied from the /proc/self/uid_map
and gid_map
helpers (which call the setuid newuidmap
/newgidmap
helper programs in case of -EPERM
).
if (write_file(map, map_len, "/proc/self/timens_offsets") < 0) { | ||
if (errno != EPERM) | ||
bail("failed to update /proc/self/timens_offsets"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (write_file(map, map_len, "/proc/self/timens_offsets") < 0) { | |
if (errno != EPERM) | |
bail("failed to update /proc/self/timens_offsets"); | |
} | |
if (write_file(map, map_len, "/proc/self/timens_offsets") < 0) | |
bail("failed to update /proc/self/timens_offsets"); |
@@ -2321,6 +2321,18 @@ func (c *Container) bootstrapData(cloneFlags uintptr, nsMaps map[configs.Namespa | |||
}) | |||
} | |||
|
|||
// write boottime and monotonic time ns offsets. | |||
if c.config.Namespaces.Contains(configs.NEWTIME) && c.config.TimeOffsets != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we want this check to be done in the validator (at is stands, the code will silently ignore timensOffsets
being configured without CLONE_NEWTIME
enabled). But I can do this in a separate PR when I add the necessary integration tests for this feature.
|
||
// update timens offsets | ||
config.TimeOffsets = spec.Linux.TimeOffsets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: perhaps you can move this to be right after these lines:
config.MaskPaths = spec.Linux.MaskedPaths
config.ReadonlyPaths = spec.Linux.ReadonlyPaths
config.MountLabel = spec.Linux.MountLabel
config.Sysctl = spec.Linux.Sysctl
and drop the comment since it's kind of obvious what we do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly OK to me, left a couple of nits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just merge this, I'll send a PR to fix up the remaining nits (including the incorrect EPERM ignore logic) so we don't need to do more rounds of review.
Thanks @chethanah!
Any reason why the timenamespace was not added into the config.json that is generated by cc: @cyphar @chethanah |
The default config is more of an example than a source of best practices (it doesn't have seccomp rules, for instance) so we don't always update it to include every new setting but I suspect that the PR author simply forgot to add it. That being said, adding it would've resulted in errors on older kernels so there is a slight argument for not including it by default at the time, but we added support for this quite late and pre-4.18 kernels are very old at this point. Maybe we should just include it now. |
I have an initial set of changes in my fork https://github.com/akhilerm/runc/tree/add-time-ns-example, but still need to fix a few more tests. Will create the PR once the CI is passing. |
Background
Support time ns in runc
/proc/self/timens_offsets
BootTimeNsAttr
andMonotonicNsAttr
netlink message type to send to bootstrap programSTAGE_CHILD
after unsharing all the clone flags (stage-1)/proc/self/timens_offsets
to validate if time ns enablednsenter_test.go
only checks for invalid ns path and type for netlink requests. No specific_test.go
for time ns is required.timens.bats
integration test? - TBDUnit test
Discussion points
/proc/self/timens_offsets
stage1_pid
/stage2_pid
like done inupdate_uidmap
?STAGE_CHILD
?but after the first process has been created in or has entered the namespace, write(2)s on this file fail with the error EACCES.
STAGE_CHILD
after unshared ns