Skip to content

Add HTTP proxy support for WebSocket leaf node connections#7242

Merged
neilalexander merged 1 commit intonats-io:mainfrom
danbailey1000:proxy-feature-clean
Oct 3, 2025
Merged

Add HTTP proxy support for WebSocket leaf node connections#7242
neilalexander merged 1 commit intonats-io:mainfrom
danbailey1000:proxy-feature-clean

Conversation

@danbailey1000
Copy link
Copy Markdown

Changes

  • Add proxy configuration to RemoteLeafOpts in server/opts.go
  • Implement HTTP CONNECT tunnel establishment in server/leafnode.go
  • Support both authenticated and non-authenticated proxy connections

Use Case

Enables WebSocket leaf node connections in corporate environments where direct connections are blocked by firewalls, but HTTP proxy access is available.

Configuration Example

leafnodes {
  remotes [{
    urls: ["ws://remote-server.example.com:443/ws"]
    
    proxy {
      url: "http://corporate-proxy.company.com:8080"
      username: "proxy_user"
      password: "proxy_pass"
      timeout: "30s"
    }
  }]
}

Signed-off-by: Dan Bailey danbailey1000@msn.com

Copy link
Copy Markdown
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution. However, this PR cannot be accepted as-is. We would need configuration file parsing, options validation and some sort of test. But more importantly, I have a concern about how this would work. Suppose that you configure the server to have a remote with a non TLS connection, but the proxy uses TLS to communicate with the HUB that requires TLS, then this would not work because the INFO protocol received from the HUB would indicate that TLS is required and the leafnode side of things would not be able to handle that. The same if situation is reversed (leaf wants TLS but proxy connects to HUB without).

@danbailey1000
Copy link
Copy Markdown
Author

Thank you for the contribution. However, this PR cannot be accepted as-is. We would need configuration file parsing, options validation and some sort of test. But more importantly, I have a concern about how this would work. Suppose that you configure the server to have a remote with a non TLS connection, but the proxy uses TLS to communicate with the HUB that requires TLS, then this would not work because the INFO protocol received from the HUB would indicate that TLS is required and the leafnode side of things would not be able to handle that. The same if situation is reversed (leaf wants TLS but proxy connects to HUB without).

I have made changes based on your feedback

  • There is now a check that leafnode and Hub tls configuration need to be aligned (both enabled or both disabled) for the proxy to work. A mismatch will raise an error
  • Add a tests for this new proxy support feature

@danbailey1000
Copy link
Copy Markdown
Author

@kozlovic I have addressed your feedback ready for another review, Thanks Dan

@danbailey1000
Copy link
Copy Markdown
Author

@kozlovic thanks for your review. I have made another round of changes ready for another review. Thanks Dan

@danbailey1000
Copy link
Copy Markdown
Author

@kozlovic thanks for review. I have made changes suggested ready for review. Thanks Dan

Copy link
Copy Markdown
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have enabled the workflow run and the linter/checks spotted few things. I would recommend that you rebase and squash all your commits into 1 and you need to have a signoff (see https://github.com/nats-io/nats-server/blob/main/CONTRIBUTING.md)

@danbailey1000
Copy link
Copy Markdown
Author

I have enabled the workflow run and the linter/checks spotted few things. I would recommend that you rebase and squash all your commits into 1 and you need to have a signoff (see https://github.com/nats-io/nats-server/blob/main/CONTRIBUTING.md)

@kozlovic I have made changes and squashed into one signed commit. Thanks Dan

@danbailey1000
Copy link
Copy Markdown
Author

@kozlovic I noticed an issue with my squash commit. Now corrected. Thanks Dan

@neilalexander neilalexander added the post-freeze We'll come back to this after the freeze period label Sep 9, 2025
Copy link
Copy Markdown
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the requested changes again. The main issue is that we need to ensure that the check for the proxy response is not consuming data from the remote server, which we do right now which would cause some of the tests to fail.

We need to make those tests robust in that you should delay the closing of the connection as to not cause the read to fail, but also set some deadlines so that if there is an issue we don't block there forever.

Naming your tests with the same prefix would make it easier to run them all (with -count=10 -failfast or something like that to detect flappers). I would recommend that you run them locally with -count more than 1 to see if you catch flappers like I did.

Thank you for your patience!

@kozlovic
Copy link
Copy Markdown
Member

kozlovic commented Sep 9, 2025

@danbailey1000 Also, have you tried it with a real http proxy and a NATS Server hub that requires TLS? I wonder if it will even work since most TLS actors will fail if the remote is sending plain text first, which by default NATS Server does (sends the INFO protocol in plain text when getting a TCP connection). We did add the ability to do "TLS first" (see TLSHandshakeFirst option in both leafnode and leafnode remote options), so there should be a way to make it work, but I would appreciate if you had a chance to actually run the feature in real life setup and report that it works as expected. Thanks!

@wallyqs
Copy link
Copy Markdown
Member

wallyqs commented Sep 9, 2025

usually when putting a proxy in front of nats it should work by using the TLS first / tls_handshake_first from the remote client side making the connection then on the server port in the tls config putting an empty block like this along with the allow_non_tls flag so that INFO announces tls_available to the clients:

tls {}
allow_non_tls: true

This setup works for client connections 4222 at least.

@danbailey1000
Copy link
Copy Markdown
Author

@danbailey1000 Also, have you tried it with a real http proxy and a NATS Server hub that requires TLS? I wonder if it will even work since most TLS actors will fail if the remote is sending plain text first, which by default NATS Server does (sends the INFO protocol in plain text when getting a TCP connection). We did add the ability to do "TLS first" (see TLSHandshakeFirst option in both leafnode and leafnode remote options), so there should be a way to make it work, but I would appreciate if you had a chance to actually run the feature in real life setup and report that it works as expected. Thanks!

@kozlovic I have added some tests that uses real nats hub and leaf servers configured via proxy and using TLS and using the tlshandshakefirst and confirmed it is working as expected.

@danbailey1000
Copy link
Copy Markdown
Author

danbailey1000 commented Sep 10, 2025

@kozlovic @wallyqs I have pushed changes that address the points in your last review. I can squash again in single commit if you want. Thx Dan

Copy link
Copy Markdown
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That went too far ;-) By real tests, I meant that you have an actual HTTP proxy and you verified that it was working as expected. Please undo some of the changes that you've made in these last commits (remove all added tests files and the mockup server) and I will review again. Note that the use of proxyTunnelConn is similar to our tlsMixConn. Maybe we could consolidate or you could use tlsMixConn as-is.

Running your tests I got:

=== RUN   TestLeafNodeHttpProxyWithoutTLSHandshakeFirstFailure
==================
WARNING: DATA RACE
Read at 0x00c0007e3383 by goroutine 1762:
  testing.(*common).logDepth()
      /Users/ik/dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.darwin-arm64/src/testing/testing.go:1053 +0x8c
  testing.(*common).log()
      /Users/ik/dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.darwin-arm64/src/testing/testing.go:1046 +0x80
  testing.(*common).Logf()
      /Users/ik/dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.darwin-arm64/src/testing/testing.go:1097 +0x58
  github.com/nats-io/nats-server/v2/server.(*mockTLSFirstNATSServer).handleTLSConnection()
      /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/leafnode_proxy_test.go:1129 +0x3a8
  github.com/nats-io/nats-server/v2/server.(*mockTLSFirstNATSServer).handleConnection()
      /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/leafnode_proxy_test.go:1099 +0x208
  github.com/nats-io/nats-server/v2/server.(*mockTLSFirstNATSServer).acceptConnections.gowrap1()
      /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/leafnode_proxy_test.go:1074 +0x58

Previous write at 0x00c0007e3383 by goroutine 1755:
  testing.tRunner.func1()
      /Users/ik/dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.darwin-arm64/src/testing/testing.go:1779 +0x5dc
  runtime.deferreturn()
      /Users/ik/dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.darwin-arm64/src/runtime/panic.go:610 +0x5c
  testing.(*T).Run.gowrap1()
      /Users/ik/dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.darwin-arm64/src/testing/testing.go:1851 +0x40

Goroutine 1762 (running) created at:
  github.com/nats-io/nats-server/v2/server.(*mockTLSFirstNATSServer).acceptConnections()
      /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/leafnode_proxy_test.go:1074 +0x34
  github.com/nats-io/nats-server/v2/server.createMockTLSFirstNATSServer.gowrap1()
      /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/leafnode_proxy_test.go:1059 +0x40

Goroutine 1755 (finished) created at:
  testing.(*T).Run()
      /Users/ik/dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.darwin-arm64/src/testing/testing.go:1851 +0x684
  testing.runTests.func1()
      /Users/ik/dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.darwin-arm64/src/testing/testing.go:2279 +0x7c
  testing.tRunner()
      /Users/ik/dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.darwin-arm64/src/testing/testing.go:1792 +0x180
  testing.runTests()
      /Users/ik/dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.darwin-arm64/src/testing/testing.go:2277 +0x77c
  testing.(*M).Run()
      /Users/ik/dev/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.darwin-arm64/src/testing/testing.go:2142 +0xb68
  github.com/nats-io/nats-server/v2/server.TestMain()
      /Users/ik/dev/go/src/github.com/nats-io/nats-server/server/sublist_test.go:1607 +0x374
  main.main()
      _testmain.go:5533 +0x114
==================
=== RUN   TestLeafNodeHttpProxyWithoutTLSHandshakeFirstFailure/connection_fails_without_TLS_handshake_first
=== NAME  TestLeafNodeHttpProxyWithoutTLSHandshakeFirstFailure
    testing.go:1490: race detected during execution of test
=== NAME  TestLeafNodeHttpProxyWithoutTLSHandshakeFirstFailure/connection_fails_without_TLS_handshake_first
    leafnode_proxy_test.go:1031: ✅ Connection correctly failed without TLSHandshakeFirst option
--- FAIL: TestLeafNodeHttpProxyWithoutTLSHandshakeFirstFailure (0.00s)
    --- PASS: TestLeafNodeHttpProxyWithoutTLSHandshakeFirstFailure/connection_fails_without_TLS_handshake_first (0.00s)

Not asking to fix, instead remove those tests...

@danbailey1000
Copy link
Copy Markdown
Author

@kozlovic tests now removed. Thx

  • Not sure why the tests did not run for you. I am running on windows machine, and I noticed other many tests do not pass on windows as I guess they are designed to execute on linux machines.
  • I am not the admin of the real proxy and NATS servers where we ran into the proxy issues and NATS was not using tls at the moment hence I made a test starting an actual NATS server and leaf and simulated proxy. I think this gives more confidence that these changes work with tls also.

Copy link
Copy Markdown
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review and apply the proposed fix for the tests. Also, you would need to squash all your commits (and possibly rebase to take advantage of the latest changes in main - if any) because each commit needs to be signed off. If I were to trigger the CI tests, it would fail because of that. Thanks again and appreciate your patience.

@danbailey1000
Copy link
Copy Markdown
Author

Please review and apply the proposed fix for the tests. Also, you would need to squash all your commits (and possibly rebase to take advantage of the latest changes in main - if any) because each commit needs to be signed off. If I were to trigger the CI tests, it would fail because of that. Thanks again and appreciate your patience.

@kozlovic Patch now applied (thanks for that :)). Squashed changes into one commit.

Copy link
Copy Markdown
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for your contribution!

@neilalexander
Copy link
Copy Markdown
Member

Will need to hold merging for now as we're in a feature freeze ahead of 2.12. Will come back to this afterwards. Thanks!

@acormier-maia
Copy link
Copy Markdown

acormier-maia commented Sep 11, 2025

@danbailey1000 I have made some small changes to your PR and tested using a proxy with a regular tcp nats connection which was successful. AFAIK an HTTP proxy which implements the CONNECT method should not care whether the protocol on top is HTTP or raw TCP (nats).

PR is up here: https://github.com/danbailey1000/nats-server/pull/1/files

@danbailey1000
Copy link
Copy Markdown
Author

@danbailey1000 I have made some small changes to your PR and tested using a proxy with a regular tcp nats connection which was successful. AFAIK an HTTP proxy which implements the CONNECT method should not care whether the protocol on top is HTTP or raw TCP (nats).

PR is up here: https://github.com/danbailey1000/nats-server/pull/1/files

@acormier-maia thanks a lot. Ideally we can also include the tcp support. This PR has been approved already so I am not sure the best approach now. Maybe wait until this PR is merged, and then we can open another PR to add the tcp support. @kozlovic what would you recommend? thx

@kozlovic
Copy link
Copy Markdown
Member

@danbailey1000 Yes, maybe wait for it to be merged and then another PR can be submitted.

@neilalexander
Copy link
Copy Markdown
Member

Would you mind rebasing on top of latest main please?

@danbailey1000 danbailey1000 requested a review from a team as a code owner October 1, 2025 18:13
@danbailey1000
Copy link
Copy Markdown
Author

Would you mind rebasing on top of latest main please?

@neilalexander now done. Thanks

const sharedSysAccDelay = 250 * time.Millisecond

// establishHTTPProxyTunnel establishes an HTTP CONNECT tunnel through a proxy server
func establishHTTPProxyTunnel(proxyURL, targetHost string, timeout time.Duration, username, password string) (net.Conn, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would really rather us not build and parse HTTP requests/responses by hand here when the Go standard library has the ability to do this for us, i.e. something like this:

	conn, err := natsDialTimeout("tcp", proxyAddr.Host, timeout)
	if err != nil {
		return nil, fmt.Errorf("failed to connect to proxy: %w", err)
	}

	req := &http.Request{
		Method: http.MethodConnect,
		URL:    &url.URL{Opaque: targetHost}, // Opaque is required here
		Host:   targetHost,
		Header: make(http.Header),
	}
 	// do we need auth here?
	req.Header.Set("Proxy-Authorization", "Basic " + ...)

	if err := req.Write(conn); err != nil {
		conn.Close()
		return nil, fmt.Errorf("req.Write: %w", err)
	}

	resp, err := http.ReadResponse(bufio.NewReader(conn), req)
	if err != nil {
		conn.Close()
		return nil, fmt.Errorf("http.ReadResponse: %w", err)
	}
	if resp.StatusCode != http.StatusOK {
		conn.Close()
		return nil, fmt.Errorf("status code: %s", resp.Status)
	}

	// now use "conn"

We already have net/http imported so this is fine to use.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neilalexander I have changed this to now use the standard http library. Thx

Enables NATS leaf node servers to connect through corporate HTTP proxies to NATS Hub servers.

Configuration options:
- proxy_url: HTTP proxy server URL (http/https schemes only)
- username/password: Optional proxy authentication credentials
- Validates proxy compatibility with WebSocket URLs and TLS requirements

Signed-off-by: Dan Bailey <danbailey1000@msn.com>
Copy link
Copy Markdown
Member

@neilalexander neilalexander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@neilalexander neilalexander merged commit 78e2dac into nats-io:main Oct 3, 2025
44 of 45 checks passed
neilalexander added a commit that referenced this pull request Oct 3, 2025
Includes the following:

- #7374
- #7373
- #7377
- #7380
- #7382
- #7381
- #7364
- #7384
- #7385
- #7388
- #7386
- #7391
- #7242

Signed-off-by: Neil Twigg <neil@nats.io>
kozlovic added a commit that referenced this pull request Oct 18, 2025
Saw the test TestConfigCheck flap. The reason is that the proxy
validation was done when parsing the "proxy" field, but there was
no guarantee that this field would be parsed after "urls", which
may lead to validation being successful (because there was no
URLs added yet). Move the check after all fields have been parsed
for a given remote.

Relates to #7242

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
neilalexander pushed a commit that referenced this pull request Oct 30, 2025
Saw the test TestConfigCheck flap. The reason is that the proxy
validation was done when parsing the "proxy" field, but there was
no guarantee that this field would be parsed after "urls", which
may lead to validation being successful (because there was no
URLs added yet). Move the check after all fields have been parsed
for a given remote.

Relates to #7242

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
neilalexander pushed a commit that referenced this pull request Oct 30, 2025
Saw the test TestConfigCheck flap. The reason is that the proxy
validation was done when parsing the "proxy" field, but there was
no guarantee that this field would be parsed after "urls", which
may lead to validation being successful (because there was no
URLs added yet). Move the check after all fields have been parsed
for a given remote.

Relates to #7242

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

post-freeze We'll come back to this after the freeze period

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants