Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Headscale does not start with large networks in ACLs #1372

Closed
pkrivanec opened this issue Apr 22, 2023 · 6 comments
Closed

Headscale does not start with large networks in ACLs #1372

pkrivanec opened this issue Apr 22, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@pkrivanec
Copy link
Contributor

Hi,

headscale v0.22.1 does not work with large networks (10.0.0.0/8) in ACLs.
The v0.21.0 version had no problem with this configuration.

Bug description

Headscale does not start correctly and uses 100% CPU and memory.

To Reproduce
Use acl.hujson:

{
    "groups": {
        "group:test": [
            "testuser"
        ]
    },
    "acls": [
        {
            "action": "accept",
            "src": [
                "group:test"
            ],
            "dst": [
                "tag:exitnode:*",
                "10.0.0.0/8:*"
            ]
        }
    ]
}

Context info

  • Version of headscale used 0.22.1
  • OS Rocky Linux release 9.1
  • Kernel version 5.14.0-162.23.1.el9_1.x86_64
@pkrivanec pkrivanec added the bug Something isn't working label Apr 22, 2023
@Chluz
Copy link

Chluz commented Apr 22, 2023

Hi, Same issue here. I'm actually using large networks in order to allow access through exit nodes to the internet only, so quite problematic.

{
      "action": "accept",
      "src": ["group:external"],
      "dst": ["tag:internet-external:0",
               "0.0.0.0/5:*",
               "8.0.0.0/7:*",
               "11.0.0.0/8:*",
               "12.0.0.0/6:*",
               "16.0.0.0/4:*",
               "32.0.0.0/3:*",
               "64.0.0.0/2:*",
               "128.0.0.0/3:*",
               "160.0.0.0/5:*",
               "168.0.0.0/6:*",
               "172.0.0.0/12:*",
               "172.32.0.0/11:*",
               "172.64.0.0/10:*",
               "172.128.0.0/9:*",
               "173.0.0.0/8:*",
               "174.0.0.0/7:*",
               "176.0.0.0/4:*",
               "192.0.0.0/9:*",
               "192.128.0.0/11:*",
               "192.160.0.0/13:*",
               "192.169.0.0/16:*",
               "192.170.0.0/15:*",
               "192.172.0.0/14:*",
               "192.176.0.0/12:*",
               "192.192.0.0/10:*",
               "193.0.0.0/8:*",
               "194.0.0.0/7:*",
               "196.0.0.0/6:*",
               "200.0.0.0/5:*",
               "208.0.0.0/4:*"
              ]
    },

@pkrivanec
Copy link
Contributor Author

pkrivanec commented Apr 22, 2023

Hi,

I started debugging and found a possible problem with the generation of the ACLPeerCacheMap.

The generateACLPeerCacheMap function in acls.go creates a struct for each destination IP.
This will not scale for large network ranges.

If I understand the code correctly, the code should only look if the peer is included in the destination prefix.
It might be better to store a list of destination prefixes for each source and use the IPNet.Contains function to check if the IP is included.

@kradalby
Copy link
Collaborator

Yes, this is kind of known, the logic was not correct, and the current working version is not attempted optimised, the main goal was for it to be correct. The goal would be to rewrite it later to provide a version that does not do something silly like listing all IPs.

As @pkrivanec the way to do this properly is by doing ip prefixes instead of single IPs, but that required more planning and rewrites, and we needed to get the tests right first, before we can start changing it.

@codyro
Copy link

codyro commented Apr 26, 2023

This unfortunately is a blocker for our organization too.

Edit: I noticed #1377 has a potential fix. I'll give that patch a try and report back in that issue.

kradalby added a commit to kradalby/headscale that referenced this issue Apr 27, 2023
kradalby added a commit to kradalby/headscale that referenced this issue Apr 27, 2023
@kradalby
Copy link
Collaborator

I have rewritten this whole thing in #1381, this is an unoptimised, but "way smarter and cleaner" solution, I have added @Chluz case as a test case so we don't regress.

The initial version was a lot of cladding on a suboptimal temp solution, but this should be closer to something we can live with. I think it should not need optimalisations before it's useful, but I'll look into that as part of ACL refactoring.

The previous few PRs was mostly to get tests that we could be confident in so we now can refactor it without too much worry. I appreciate that it lead to one really bad decision, but now we can write a faster solution without breaking things.

Would be great if someone tests #1381 while I am working on it, and apologies for the inconvenience.

@kradalby kradalby mentioned this issue Apr 27, 2023
kradalby added a commit to kradalby/headscale that referenced this issue Apr 27, 2023
kradalby added a commit to kradalby/headscale that referenced this issue May 1, 2023
kradalby added a commit to kradalby/headscale that referenced this issue May 2, 2023
kradalby added a commit to kradalby/headscale that referenced this issue May 2, 2023
juanfont pushed a commit that referenced this issue May 3, 2023
@kradalby
Copy link
Collaborator

resolved in next release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants