Remote concurrency limited by gRPC connections #11801
Thanks for filing the bug @EricBurnett! Here is a sample repo through which you can reproduce the issue from the client side: https://github.com/gkousik/bazel-parallelism-test

The above Bazel repo contains 500 genrules, with a random value in each rule's action-env so as NOT to get cache hits. Each action sleeps for 10 seconds and then generates an output file. When you run a build of that repo against a remote-execution pool with 500 bots, I would expect the build to complete in roughly 10 seconds: all rules are parallelizable, so with 500 bots I would expect all of them to be sent to RBE at once. What actually happens is that the build proceeds in batches of 200 actions and takes 30+ seconds to complete, no matter how many times I run the build.
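For illustration only, here is a minimal Python sketch that generates a BUILD file of cache-busting genrules similar in spirit to the linked repro repo. The exact rule names and the cache-busting mechanism (a nonce in `cmd` rather than `action-env`) are assumptions, not taken from that repo:

```python
# Sketch: emit a BUILD file with N genrules that each sleep and then
# write an output. A per-rule random nonce keeps every action key
# distinct so the remote cache never returns a hit. (Hypothetical
# names; the real repro repo may differ.)
import random

def make_build_file(n, sleep_secs=10):
    rules = []
    for i in range(n):
        nonce = random.getrandbits(64)
        rules.append(f'''genrule(
    name = "sleep_{i}",
    outs = ["sleep_{i}.out"],
    cmd = "sleep {sleep_secs} && echo {nonce} > $@",
)''')
    return "\n\n".join(rules) + "\n"

if __name__ == "__main__":
    # 500 rules, as in the repro described above.
    print(make_build_file(500))
```

With 500 such rules and 500 remote bots, every action could in principle run at once, so wall time should be dominated by the 10-second sleep.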
Hitting the same issue. We have a back-end that scales to O(1000s) of concurrent actions, but this limits the parallelism to O(100).
This depends on the server-side configuration. For a server written using grpc-java, the documentation of the NettyServerBuilder says:
I am not sure Bazel should be circumventing the server-side configuration by creating multiple connections in this case.
@ulfjack I don't think REAPI servers are (or should be) using a per-connection limit to indicate to clients the max parallelism they want to support. If the goal is to tell clients about an overall limit they want honoured (per IP or endpoint-wide), that should go into Capabilities so that the client can apply it. If it's per-connection, it should not also be interpreted as a limit on the number of connections. Per @werkt, the default is unlimited, and bazel happily works with servers allowing many more RPCs. The most common case I'm aware of for MAX_CONCURRENT_STREAMS being set is load balancers, where multiple connections are a legitimate and appropriate way to spread load across multiple LBs and backends. Are you aware of anyone using it to mean something else?
Good point. I think one of the goals of HTTP/2 was to have only one connection from the client to the server rather than multiple, but that may not apply to gRPC. If you have an LB with a single IP, should the client create multiple connections? The intention of the load balancer is to balance the load to the actual servers, so it isn't immediately obvious to me that you'd need that: the load balancer can terminate the HTTP/2 connection and distribute the individual requests to different servers (i.e., an HTTP/2 proxy). Do we need to load balance to the load balancers? If so, why would the service not announce multiple IPs? I appreciate that it may not be possible in some cases to control the server-side configuration. Maybe we should ask the gRPC folks what they think?
@ulfjack grpc/grpc#21386 (comment): the recommendation from the gRPC folks was to create additional connections, leaning towards "Create a pool of connections and round-robin across it" (as opposed to reactively creating additional connections when a certain threshold of concurrent RPCs is reached).
Ah, ok. Sorry, I forgot about that in my previous post. What's the right behavior, then? Inspect MAX_CONCURRENT_STREAMS?
Ulf:
I think so - e.g. consider a 64-core, well-networked builder doing a byte-heavy build. My understanding is this will either be throughput-constrained (possibly because most LBs use at most one core per connection, possibly from HTTP/2 bottlenecks themselves), or will hotspot the LB (remembering that the bandwidth required of the LB is 2x the bandwidth of the client, since it has to proxy traffic to the backend as well). I haven't actually sat down and measured it yet, though - I know it's hard to exceed ~2Gbps on a single stream, but I haven't yet evaluated our effective per-connection bandwidth limit.
Not an expert in this area, but my understanding: Anycast and DNS. With DNS caching layers, IP-per-proxy can lead to roving hotspots and generally scales poorly, and with Anycast it's straightforward to put multiple servers behind one IP anyway. So you'll often see multiple IPs, but a small number like "2", functioning more as independent failure domains than having any logical connection to the servers backing them. See e.g. https://aws.amazon.com/blogs/networking-and-content-delivery/using-static-ip-addresses-for-application-load-balancers/ for an example of this pattern. To be clear, this is not universal: in some setups, load balancing with one connection per IP is probably just fine. But even in those scenarios, using a few extra connections (e.g. 2 or 3 per IP) should not cause any harm: bazel connections will be comparatively few and heavy vs. what you might see from "average" clients on the network (web browsers, apps, etc.), and should in no way strain anything on "number of connections" grounds.
From what we've seen from Java gRPC, and heard from the team when we file bugs there, inspecting MAX_CONCURRENT_STREAMS will be hard, as it's learned fairly late in the process (after connections have started to be opened). But I've looked around, and I've also not seen any evidence of anyone defaulting to particularly low values there, with 100+ being normal.

So my suggestion is to pick a target number of connections as --jobs/50*, learn the number of IPs available, and open ceil(target_connections/ips) connections per IP. (If the number of IPs is also not knowable at that time, I guess we could just assume 2?) I think this should work fairly well for all REAPI environments: at the small end, someone trying to do a --jobs=2000 build against a single server would see 40 connections opened, which shouldn't be a problem at all for that server (servers can generally handle tens of thousands of open connections); at the large end, that same build against an LB setup will spread the load over 40 different proxy servers and should see higher effective throughput as a result.

*Why 50 and not 100? gRPC behaviour again: RPCs are bound to connections early and then queue against them. Uniformly distributing random-length RPCs to connections sees them fill non-uniformly, so some connections will have queueing while others have spare capacity. Leaving 2x headroom should make that very unlikely.
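The sizing heuristic proposed in that comment can be sketched as follows. This is a hedged illustration of the proposal, not Bazel's actual code; the function name and parameter defaults are made up:

```python
import math

def plan_connections(jobs, num_ips, per_job_divisor=50, min_connections=2):
    """Pick a total connection target of max(2, jobs/50), then spread it
    evenly across the resolved IPs, rounding up per IP.
    (Hypothetical helper illustrating the comment's heuristic.)"""
    target = max(min_connections, math.ceil(jobs / per_job_divisor))
    per_ip = math.ceil(target / num_ips)
    return target, per_ip
```

For example, a `--jobs=2000` build against a single-IP server yields a target of 40 connections, all on that one IP; the same build against an endpoint resolving to 40 IPs opens one connection per IP.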
From #11801:

> due to bazel displaying full parallelism even if actions are actually queued waiting for gRPC connections

This PR makes Bazel display an action as running only if it is actually executing remotely, as a first step towards fixing the parallelism limitation. With `--jobs=500`, the progress bar was displayed as:
```
[1 / 501] 500 actions running
```
However, the actual number of actions running on remote workers is not 500, since some actions are pending because of the limitation. With this change, the progress bar now becomes:
```
[1 / 501] 500 actions, 200 running
```
Closes #11922. PiperOrigin-RevId: 326604891
FYI: the fixed-size channel pool has been replaced with a dynamic channel pool, which opens new connections on demand. de8f69d
Description of the problem / feature request:
For remote RPCs, bazel only opens one connection per server (resolved IP). This constrains the maximum parallelism of a build - gRPC limits the number of in-flight requests per connection to MAX_CONCURRENT_STREAMS, which is a per-server HTTP2 setting, often set to 100 or 128.
In short, this means that when talking to a server resolved to a single IP, bazel can only have ~100 remote executions in flight, regardless of the setting of --jobs. If the server resolves to more IPs, this may be linearly increased, but rarely above 500. Most remote builds can exceed 100x parallelism, and many (esp. builds with long-duration actions) could exceed 500.
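As a rough illustration of the ceiling described above (the stream limit of 100 is an assumed typical value, not a constant from any specific server):

```python
# In-flight RPCs are capped at one connection per resolved IP times the
# server's HTTP/2 MAX_CONCURRENT_STREAMS setting, independent of --jobs.
# (Hypothetical helper for illustration.)
def max_in_flight(num_ips, max_concurrent_streams=100):
    return num_ips * max_concurrent_streams
```

So a single-IP endpoint caps out around 100 in-flight actions, and an endpoint resolving to 5 IPs with a stream limit of 128 caps out at 640, regardless of how high `--jobs` is set.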
Note that due to bazel displaying full parallelism even if actions are actually queued waiting for gRPC connections, it's hard to know just how many users this affects - more than realize it, I'm quite sure. But I've had at least 3 independent projects root cause this reason for limited parallelism at this point, so it seems to be fairly frequent in reality (at least amongst RBE users).
The necessary fix in bazel is to replace the netty 'round_robin' load balancer with one that can open more than 1 connection (subchannel) per resolved address. Maybe by dynamically adding more when bottlenecked, but feeding in a target number of connections as e.g.
max(2, jobs/50)
would probably do just fine.
Feature requests: what underlying problem are you trying to solve with this feature?
Enable higher build parallelism for remote builds
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Run a remote build with --jobs=1000 and observe bazel showing '1000 remote'. Then check from the server's perspective and realize bazel never had more than X00 RPCs in flight.
What operating system are you running Bazel on?
Linux (but applies to all)
What's the output of bazel info release?
3.4.1
Have you found anything relevant by searching the web?
cc @buchgr @bergsieker @gkousik @ulfjack