[request for help] I can't ssh into the admin container because of networking issues #3486

Closed
dorin-ionita opened this issue Sep 25, 2023 · 3 comments
Labels
status/needs-info Further information is requested type/bug Something isn't working

Comments

@dorin-ionita

Hello,

I have managed to build Bottlerocket OS and boot it on a bare-metal machine. Next, to get a remote shell, I created the following two TOML files and placed them at the root of the 12th partition:

$ cat user-data.toml
version = 3

[settings.kubernetes]
standalone-mode = true

[settings.host-containers.admin]
enabled = true
$ cat net.toml
version = 3

["ec:0d:9a:ae:12:60".static4]
addresses = ["10.55.253.30/24"]

This is the result:
[screenshot: no-name]

My guess is that it needs a DNS server.

So I went ahead and tried to add one:

$ cat net.toml
version = 3

["ec:0d:9a:ae:12:60".static4]
addresses = ["10.55.253.30/24"]

[settings.dns]
name-servers = ["8.8.8.8"]

Now it seems the file can't be parsed by the tool that is supposed to take these configuration files and apply them to the networking stack.
[screenshot: error on boot]

Have you seen something similar before? Any idea why it happens? Is this what the configuration files are supposed to look like?

In both cases I can't ssh into the system using ssh ec2-user@<ip>. I suppose that's expected, since I think the ssh daemon exists only within the container I'm trying to start.

Image I'm using:
Both x86_64-metal-k8s-1.23 and x86_64-metal-k8s-1.28.

What I expected to happen:
Be able to ssh into the admin container in order to access a remote shell.

What actually happened:
Either a name resolution error or an error applying the configuration files.

How to reproduce the problem:
Build the images named above and boot them with the configuration files specified here.

@dorin-ionita dorin-ionita added status/needs-triage Pending triage or re-evaluation type/bug Something isn't working labels Sep 25, 2023
@jpculp
Member

jpculp commented Sep 25, 2023

Thanks for reaching out! Instead of net.toml, can you try passing the DNS settings via user-data.toml?

@yeazelm
Contributor

yeazelm commented Sep 25, 2023

The first configuration you had was correct, but it might be missing routing information (see the section for version 2, where we discuss routes).

route (map): Static route; multiple routes can be added. (cannot be used in conjunction with DHCP)

  • to ("default" or IP address with prefix, required): Destination address.
  • from (IP address): Source IP address.
  • via (IP address): Gateway IP address. If no gateway is provided, a scope of link is assumed.
  • route-metric (integer): Relative route priority.

It's hard to tell if that configuration would be correct for your network, but if you aren't on the same /24 you may need some routing information as well.

As for the error:

[settings.dns]
name-servers = ["8.8.8.8"]

should be in user-data.toml, as @jpculp called out. Unfortunately, the errors on boot are pretty cryptic when net.toml contains settings it doesn't support, and we have an issue noting that the error messaging could be improved.
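
As a rough sketch of that suggestion, the DNS table from the failing net.toml could be appended to the user-data.toml from the original post. Only the settings tables are shown, and 8.8.8.8 is simply the resolver used in the report, not a recommendation.

```toml
# user-data.toml (sketch): settings tables only
[settings.kubernetes]
standalone-mode = true

[settings.host-containers.admin]
enabled = true

# DNS is a setting, so it goes here rather than in net.toml
[settings.dns]
name-servers = ["8.8.8.8"]
```

net.toml then keeps only the interface definition (addresses and, if needed, routes).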

@arnaldo2792 arnaldo2792 added status/needs-info Further information is requested and removed status/needs-triage Pending triage or re-evaluation labels Sep 26, 2023
@dorin-ionita
Author

Hi,

I managed to get the remote shell for the bare-metal machine. Your advice helped:

  • On the DNS side, I moved the configuration to user-data.toml as you suggested, and the sundog error in the screenshot attached above vanished.
  • I still couldn't get the remote shell (ping/traceroute couldn't reach the machine), so I followed your suggestion and checked the route. Indeed, the two machines (the bare-metal one and the one trying to access it) were not on the same subnet, as I had wrongly assumed. I added the default gateway as a route in the net.toml config file and finally got access to the remote shell.
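
For anyone landing here later, a net.toml along these lines would match the fix described above. It is a sketch, not the exact file used: the gateway 10.55.253.1 is an assumed placeholder (the thread never states the real one), the `to` and `via` keys come from the route field list quoted earlier in this thread, and the array-of-tables form is an assumption based on "multiple routes can be added", so check the net.toml documentation for your Bottlerocket version.

```toml
version = 3

["ec:0d:9a:ae:12:60".static4]
addresses = ["10.55.253.30/24"]

# Default route via the subnet's gateway.
# 10.55.253.1 is an ASSUMED example address; substitute your real gateway.
[["ec:0d:9a:ae:12:60".route]]
to = "default"
via = "10.55.253.1"
```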

Thank you!
