@dcantah dcantah commented Sep 17, 2025

Today we have a somewhat odd model. We talk to the APIServer for the initial container creation, where most of the work is just registering the runtime helper with launchd. After this registration, almost all communication from a client goes directly to that runtime helper. This has some rather annoying consequences, in that we now need an event/notification channel that the helper establishes with the APIServer for error/start/exit cases.

If, instead of talking to the runtime helper directly, we take a detour through the APIServer, the APIServer knows exactly what order of operations is occurring. This makes "did starting the container fail? Okay, we should clean up" scenarios much simpler, and it also simplifies the clients quite a bit: they no longer need this split-brained client model, and everyone just talks to the APIServer. This change is in pursuit of that. I have reworked our clients, the ContainerService, and some of our XPC types to accomplish it.
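
To make the cleanup point concrete, here is a minimal sketch in plain Swift. `RuntimeHelper`, `APIServerSketch`, `FailingHelper`, and `StartFailed` are hypothetical names for illustration only, not types from this PR. The idea is that whoever issues `start()` sees the failure directly and can clean up in the same place, with no event channel involved.

```swift
// Hypothetical stand-in for the RPC surface of a runtime helper.
protocol RuntimeHelper {
    func start() throws
    func cleanup()
}

struct StartFailed: Error {}

// Hypothetical server-side type. Because it issues start() itself,
// it observes the failure directly and can clean up right away.
final class APIServerSketch {
    private(set) var running: Set<String> = []

    func startContainer(id: String, helper: any RuntimeHelper) throws {
        do {
            try helper.start()
            running.insert(id)
        } catch {
            // "Did starting the container fail? Okay, we should clean up."
            helper.cleanup()
            throw error
        }
    }
}

// Test double whose start() always fails.
final class FailingHelper: RuntimeHelper {
    private(set) var cleanedUp = false
    func start() throws { throw StartFailed() }
    func cleanup() { cleanedUp = true }
}
```

With a `FailingHelper`, `startContainer` rethrows the error, the helper's `cleanedUp` flag is set, and `running` stays empty.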

The biggest "contract" change is in the SandboxService. Today, when we register any runtime helper, we turn on the flag that makes any XPC message wake up the registered process. This isn't great in scenarios where the process may have crashed, or it exited normally and we're just trying to invoke an RPC on it: today the helper would spawn again and try to answer our request. It would be much nicer to have a connection object that becomes invalid if the process that vended it to us is gone. To accomplish this, the runtime helpers now listen on an anonymous XPC connection and vend endpoints from it via the single handler the SandboxService exposes (createEndpoint). From that point onwards, all communication goes through the endpoint the service vended to the client.
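
As a rough model of that contract (plain Swift, deliberately not the real XPC API), the sketch below ties an endpoint to one listener instance. Only `createEndpoint` comes from the description above; `AnonymousListener`, `Endpoint`, `SandboxServiceSketch`, and `EndpointInvalid` are hypothetical names. Once the listener is gone, calls on the endpoint fail instead of waking the helper back up.

```swift
struct EndpointInvalid: Error {}

// Hypothetical model of the helper's anonymous listener.
final class AnonymousListener {
    private(set) var isAlive = true
    func handle(_ route: String) -> String { "handled \(route)" }
    // Models the helper process crashing or exiting.
    func invalidate() { isAlive = false }
}

// Hypothetical endpoint: bound to one listener instance and
// never able to relaunch it.
struct Endpoint {
    let listener: AnonymousListener

    func send(_ route: String) throws -> String {
        guard listener.isAlive else { throw EndpointInvalid() }
        return listener.handle(route)
    }
}

// Models the service surface: createEndpoint is the only handler
// reachable without already holding an endpoint.
final class SandboxServiceSketch {
    private let listener = AnonymousListener()
    func createEndpoint() -> Endpoint { Endpoint(listener: listener) }
    // Stand-in for the process going away.
    func exit() { listener.invalidate() }
}
```

In this model a `send` succeeds while the service is alive and throws `EndpointInvalid` after `exit()`, which is the behavior the description asks of a vended connection.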

@dcantah dcantah force-pushed the move-to-api-server-for-everythang branch 3 times, most recently from 4838bcf to e7e4dcb Compare September 18, 2025 03:29
@dcantah dcantah marked this pull request as ready for review September 18, 2025 03:34
@dcantah dcantah force-pushed the move-to-api-server-for-everythang branch 3 times, most recently from 1a56c99 to 6a497e6 Compare September 18, 2025 07:25
@dcantah dcantah force-pushed the move-to-api-server-for-everythang branch from 6a497e6 to 5d5e622 Compare September 18, 2025 18:43
jglogan previously approved these changes Sep 19, 2025
@jglogan jglogan dismissed their stale review September 19, 2025 00:19

Pushed the wrong button

return container
}

private func gracefulStopContainer(_ lc: LinuxContainer, stopOpts: ContainerStopOptions) async throws {

Note about code not modified in this PR:

private nonisolated func configureProcessConfig()

This should be able to become:

private static func configureProcessConfig()

Same for closeHandle().
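
A small sketch of the suggested shape, assuming the enclosing type is an actor (which the `nonisolated` modifier implies). `SandboxSketch`, the `arguments` parameter, and the returned dictionary are made up for illustration; the real `configureProcessConfig()` signature is not shown in this thread. A helper that touches no instance state needs no isolation opt-out once it is `static`.

```swift
actor SandboxSketch {
    private var startCount = 0

    // Uses no instance state, so it can be `static` rather than
    // `nonisolated`. The parameter and return type are hypothetical.
    private static func configureProcessConfig(arguments: [String]) -> [String: String] {
        ["executable": arguments.first ?? "", "argc": String(arguments.count)]
    }

    func start(arguments: [String]) -> [String: String] {
        startCount += 1
        // Static helpers are called through `Self`, with no `await`.
        return Self.configureProcessConfig(arguments: arguments)
    }
}
```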

Another lowkey obsessive nit in the non-modified code above: move getDefaultNameserver out from between the configure...() funcs. All the private methods could probably stand to be reordered sensibly. In a Swift file that's pushing 1200 lines, it will help new developers.

Mcrich23 commented Sep 19, 2025

Hey! So sorry, but in an effort to make plugin development more feasible, #603 and #635 have led to the CLI folder being renamed to ContainerCommands, and that will impact the merging of your pull request. Just an FYI, so you understand the issue when you are resolving conflicts.

@dcantah dcantah force-pushed the move-to-api-server-for-everythang branch 2 times, most recently from 560d05a to 07707d5 Compare September 19, 2025 10:23
@dcantah dcantah marked this pull request as draft September 19, 2025 10:23
dcantah commented Sep 19, 2025

Converting to draft because there's a small cz change that I want in that will allow us to "fix" some behavior in stop()

Today we have a somewhat odd model. We talk to the APIServer for the
initial container creation, where most of the work is just registering
the runtime helper with launchd. After this registration, almost all
communication from a client goes directly to that runtime helper. This has
some rather annoying consequences, in that we now need an event/notification
channel that the helper establishes with the APIServer for error/start/exit
cases.

If, instead of talking to the runtime helper directly, we take a detour
through the APIServer, the APIServer knows exactly what order of operations
is occurring. This makes "did starting the container fail? Okay, we should clean
up" scenarios much simpler, and it also simplifies the clients quite a bit:
they no longer need this split-brained client model, and everyone just talks to
the APIServer. This change is in pursuit of that. I have reworked our clients, the
ContainerService, and some of our XPC types to accomplish it.

The biggest "contract" changes are in the SandboxService. First, today, when we register
any runtime helper, we turn on the flag that makes any XPC message wake up the registered
process. This isn't great in scenarios where the process may have crashed, or it exited
normally and we're just trying to invoke an RPC on it: today the helper would spawn again
and try to answer our request. It would be much nicer to have a connection object that
becomes invalid if the process that vended it to us is gone. To accomplish this, the runtime
helpers now listen on an anonymous XPC connection and vend endpoints from it via the single
handler the SandboxService exposes (createEndpoint). From that point onwards, all
communication goes through the endpoint the service vended to the client.

The second change is that the runtime helper no longer exits on its own when the container
exits, and the event mechanism has been removed. The APIServer now simply calls wait() in the
background to listen for container exit, and once we get an exit we explicitly tell the helper
to shut down. The rationale is that if shutdown is driven by the APIServer, we can be certain we
have received everything we need from the helpers before they power down.
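
The wait-then-shutdown ordering can be sketched as follows. Only the names `wait()` and shutdown come from the description; the protocol, the signatures, the `Int32` exit status, and the `FakeHelper` test double are assumptions made for illustration.

```swift
// Hypothetical async surface of a runtime helper.
protocol RuntimeHelperClient: Sendable {
    func wait() async throws -> Int32
    func shutdown() async throws
}

// Sketch of the APIServer side: block on wait(), and only once the
// exit status is in hand tell the helper to shut down.
func monitorExit<H: RuntimeHelperClient>(of helper: H) async throws -> Int32 {
    let status = try await helper.wait()
    try await helper.shutdown()
    return status
}

// Runs the monitor in the background; the caller can await the
// task's value later if it needs the exit status.
func monitorInBackground<H: RuntimeHelperClient>(_ helper: H) -> Task<Int32, Error> {
    Task { try await monitorExit(of: helper) }
}

// Test double that records the order of calls.
actor FakeHelper: RuntimeHelperClient {
    private(set) var calls: [String] = []
    let exitCode: Int32

    init(exitCode: Int32) { self.exitCode = exitCode }

    func wait() async throws -> Int32 {
        calls.append("wait")
        return exitCode
    }

    func shutdown() async throws {
        calls.append("shutdown")
    }
}
```

Because shutdown is only issued after `wait()` returns, the helper cannot power down before the server has the exit status.
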
@dcantah dcantah force-pushed the move-to-api-server-for-everythang branch from 07707d5 to 66fef47 Compare September 19, 2025 19:04
@dcantah dcantah requested a review from jglogan September 19, 2025 19:26
@dcantah dcantah marked this pull request as ready for review September 19, 2025 19:26
@dcantah dcantah requested a review from wlan0 September 19, 2025 19:26
@dcantah dcantah merged commit 444064d into apple:main Sep 20, 2025
37 of 38 checks passed
@dkovba dkovba left a comment

👍

if let h {
request.set(key: key, value: h)
}
}

This code repeats 4 times. Consider extracting it into helper methods in XPCMessage:

static func stdioKey(for index: Int) throws -> XPCKeys {
    switch index {
    case 0: return .stdin
    case 1: return .stdout
    case 2: return .stderr
    default:
        throw ContainerizationError(.invalidArgument, message: "invalid fd \(index)")
    }
}

static func setStdioHandles(on request: XPCMessage, stdio: [FileHandle?]) throws {
    for (index, handle) in stdio.enumerated() {
        if let handle {
            // stdioKey(for:) throws on an out-of-range index, so the call needs `try`.
            request.set(key: try stdioKey(for: index), value: handle)
        }
    }
}

John commented the same; I'm going to do that (and some other cleanups) in a follow-up.

@adityaramani adityaramani left a comment

Changes look good!

jglogan added a commit that referenced this pull request Sep 23, 2025
## Motivation and Context
#654 forgot to include the shutdown XPC that was added in #628.