org.eclipse.jetty.client.HttpRequest.send() never returns #8584

Closed
cowwoc opened this issue Sep 14, 2022 · 29 comments · Fixed by #8635
Labels
Bug For general bugs on Jetty side

Comments


cowwoc commented Sep 14, 2022

Jetty version(s): 11.0.11

Java version/vendor (use: java -version)

openjdk 18.0.2.1 2022-08-18
OpenJDK Runtime Environment (build 18.0.2.1+1-1)
OpenJDK 64-Bit Server VM (build 18.0.2.1+1-1, mixed mode, sharing)

OS type/version Microsoft Windows [Version 10.0.19044.2006]

Description

How to reproduce?

I'm seeing an intermittent situation where invoking org.eclipse.jetty.client.HttpRequest.send() blocks forever even though I'm using the default idle timeout of 30 seconds.

Here is the stack dump: stacktrace.txt

Here is a summary from IntelliJ:

248 waiting threads:

"worker-6@6744" prio=5 tid=0x29 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1047)
	  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
	  at org.eclipse.jetty.client.util.FutureResponseListener.get(FutureResponseListener.java:95)
	  at org.eclipse.jetty.client.HttpRequest.send(HttpRequest.java:771)
	  at io.bazari.client.java.util.HttpRequestContext.send(HttpRequestContext.java:129)
	  at io.bazari.collector.splinterlands.processor.DownloadBlock.process(DownloadBlock.java:89)
	  at io.bazari.collector.splinterlands.client.SplinterlandsClient.processNextBlock(SplinterlandsClient.java:105)
	  at io.bazari.collector.splinterlands.client.SplinterlandsClient$$Lambda$7755/0x0000000801682268.run(Unknown Source:-1)
	  at io.bazari.common.exception.WrappingException.wrap(WrappingException.java:73)
	  at io.bazari.collector.splinterlands.client.SplinterlandsClient.lambda$run$0(SplinterlandsClient.java:152)
	  at io.bazari.collector.splinterlands.client.SplinterlandsClient$$Lambda$7753/0x0000000801681e48.run(Unknown Source:-1)
	  at io.bazari.client.java.processor.WorkScheduler.lambda$execute$0(WorkScheduler.java:76)
	  at io.bazari.client.java.processor.WorkScheduler$$Lambda$7754/0x0000000801682058.run(Unknown Source:-1)
	  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	  at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
	  at java.util.concurrent.FutureTask.run(FutureTask.java:-1)
	  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	  at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
	  at java.util.concurrent.FutureTask.run(FutureTask.java:-1)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

8 runnable threads:

"HttpClient-1254@10319" prio=5 tid=0x4e6 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at sun.nio.ch.WEPoll.wait(WEPoll.java:-1)
	  at sun.nio.ch.WEPollSelectorImpl.doSelect(WEPollSelectorImpl.java:111)
	  at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
	  - locked <merged>(a sun.nio.ch.WEPollSelectorImpl)
	  - locked <merged>(a sun.nio.ch.Util$2)
	  at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
	  at org.eclipse.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:180)
	  at org.eclipse.jetty.io.ManagedSelector.select(ManagedSelector.java:187)
	  at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:604)
	  at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:541)
	  at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produceTask(AdaptiveExecutionStrategy.java:446)
	  at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:239)
	  at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138)
	  at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy$$Lambda$1992/0x000000080115e000.run(Unknown Source:-1)
	  at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:407)
	  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894)
	  at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038)
	  at java.lang.Thread.run(Thread.java:833)

1 waiting thread:

"HttpClient@61b65d54-scheduler-1@9457" prio=5 tid=0x427 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:341)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:506)
	  at java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3464)
	  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3435)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1623)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1170)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1062)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

34 waiting threads:

"sshd-SshClient[5b88b8e]-nio2-thread-1@7421" daemon prio=5 tid=0x26a nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:341)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:506)
	  at java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3464)
	  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3435)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1623)
	  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1062)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

main thread waiting on:

"main@1" prio=5 tid=0x1 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1047)
	  at java.util.concurrent.Semaphore.acquire(Semaphore.java:318)
	  at io.bazari.client.java.processor.WorkScheduler$FlowControl.beforeTask(WorkScheduler.java:160)
	  at io.bazari.client.java.processor.WorkScheduler.execute(WorkScheduler.java:69)
	  at io.bazari.collector.splinterlands.client.SplinterlandsClient.run(SplinterlandsClient.java:152)
	  at io.bazari.collector.Collector.run(Collector.java:47)
	  at io.bazari.collector.Collector.main(Collector.java:28)

1 runnable thread:

"logback-1@1586" daemon prio=5 tid=0x1a nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1182)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1062)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

2 waiting threads:

"Thread-1002@8083" daemon prio=5 tid=0x41c nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1182)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1062)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

1 waiting thread:

"scheduler-0@18576" prio=5 tid=0x83a nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1182)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1062)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

996 waiting threads:

"HttpClient-1053@9359" prio=5 tid=0x41d nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
	  at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:219)
	  at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:975)
	  at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1025)
	  at java.lang.Thread.run(Thread.java:833)

1 waiting thread:

"Common-Cleaner@23678" daemon prio=8 tid=0x15 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
	  at jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:140)
	  at java.lang.Thread.run(Thread.java:833)
	  at jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:162)

1 runnable thread:

"Thread-1@6701" daemon prio=5 tid=0x22 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at sun.nio.ch.Iocp.getQueuedCompletionStatus(Iocp.java:-1)
	  at sun.nio.ch.Iocp$EventHandlerTask.run(Iocp.java:323)
	  at java.lang.Thread.run(Thread.java:833)

1 sleeping thread:

"idle-connection-reaper@3799" daemon prio=5 tid=0x1d nid=NA sleeping
  java.lang.Thread.State: TIMED_WAITING
	  at java.lang.Thread.sleep(Thread.java:-1)
	  at software.amazon.awssdk.http.apache.internal.conn.IdleConnectionReaper$ReaperTask.run(IdleConnectionReaper.java:150)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	  at java.lang.Thread.run(Thread.java:833)

1 runnable thread:

"Reference Handler@23674" daemon prio=10 tid=0x2 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at java.lang.ref.Reference.waitForReferencePendingList(Reference.java:-1)
	  at java.lang.ref.Reference.processPendingReferences(Reference.java:253)
	  at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:215)

1 waiting thread:

"Finalizer@23675" daemon prio=8 tid=0x3 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
	  at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:176)
	  at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:183)

3 runnable threads:

"Signal Dispatcher@23676" daemon prio=9 tid=0x4 nid=NA runnable
  java.lang.Thread.State: RUNNABLE

As far as I can tell, my worker threads are waiting on org.eclipse.jetty.client.HttpRequest.send() to return and it's not clear what Jetty is waiting on. Any ideas?

cowwoc added the Bug label on Sep 14, 2022

cowwoc commented Sep 14, 2022

A few more points:

  • I cannot share the application because it's commercial.
  • I am using a single HttpClient instance with 1000 different proxies.
  • Each proxy is mapped to a separate SOCKS tunnel that runs over SSH to a different remote computer.

Here is my HttpClient initialization code:

QueuedThreadPool clientExecutor = new QueuedThreadPool(64000000);
clientExecutor.setName(HttpClient.class.getSimpleName());

List<Proxy> proxies = scope.getNetworkProxies();

HttpClient client = new HttpClient();
client.setExecutor(scope.getClientExecutor());
client.setUserAgentField(new HttpField(HttpHeader.USER_AGENT, USER_AGENT));
client.setMaxConnectionsPerDestination(64000000);

List<ProxyConfiguration.Proxy> clientProxies = client.getProxyConfiguration().getProxies();
int proxyNumber = 0;
for (Proxy proxy : proxies)
{
	InetSocketAddress address = (InetSocketAddress) proxy.address();
	Socks4Proxy socksProxy = new Socks4Proxy(new Origin.Address(address.getHostString(), address.getPort()), false);
	TaggedProxy taggedProxy = new TaggedProxy(socksProxy, "proxy" + proxyNumber++);
	clientProxies.add(taggedProxy);
}
try
{
	client.start();
}
catch (Exception e)
{
	throw WrappingException.wrap(e);
}

I doubt I'll be able to reproduce this with Jetty verbose logging enabled because the issue might be timing-related.


joakime commented Sep 14, 2022

Is this HttpClient only using HTTP/1 or is it setup for HTTP/2 as well?


cowwoc commented Sep 14, 2022

@joakime I'm fairly sure I'm only using HTTP/1, but how can I check to be sure?

PS: I tried reproducing the issue with DEBUG logs enabled and was unable to, but it occurs (eventually) if Jetty's logging is set to WARN.
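
For the record, I construct the client with the no-argument constructor, which as far as I understand defaults to the HTTP/1.1 transport in Jetty 11. A minimal sketch of making that explicit, in case it matters:

// Assumed equivalent of what new HttpClient() does by default in Jetty 11;
// passing the transport explicitly pins the client to HTTP/1.1 only.
HttpClient client = new HttpClient(new HttpClientTransportOverHTTP());
client.start();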


cowwoc commented Sep 14, 2022

Another thing that comes to mind is that I'm using IntelliJ's HotSwap feature to modify the application's code without restarting the JVM. Is it possible that hotswap is somehow screwing up Jetty's state?

I'll let you know if I run across this bug without using hotswap.

joakime self-assigned and then unassigned this issue on Sep 15, 2022

cowwoc commented Sep 19, 2022

This looks like a real bug. I've reproduced it without the use of hot swap, even running entirely from the command line without any debugging taking place.

How do we go about debugging this further?


cowwoc commented Sep 20, 2022

In the minutes preceding this hang, I see request response times increase from a few seconds to multiple minutes.

I'm not sure if this is thread starvation or some sort of network throttling introduced by the servers I am hitting.

Is it theoretically possible for a server to keep a connection alive forever yet send no actual data over the wire (as a form of network throttling)? If so, are there any timeouts I can set on the client side that would force the connection to time out?

UPDATE:
Sorry, I see now that this is what HttpClient.setIdleTimeout() is meant to handle, and I am using the default value of 30 seconds. Maybe the connection is not totally idle? Maybe the server is responding extremely slowly (one character every 29 seconds)? Is there a way to detect/handle that possibility?


cowwoc commented Sep 20, 2022

I think I found a solution. I registered event listeners with Request.onRequestBegin() and Request.onResponseBegin() that invoke Request.abort() or Response.abort() if the overall request time surpasses some limit. I'm pretty sure that this will solve the aforementioned hangs.

I'll let you know in a couple of days.


cowwoc commented Sep 22, 2022

Unfortunately, the hang still occurs. I would appreciate your advice on how to best track this back to the source. It's difficult to figure out because the issue is intermittent. Are there any sort of event listeners I should make use of? Any sort of logging I should enable?

Remember, if I enable too many logs the problem seems to go away... so it's probably some sort of race condition.


cowwoc commented Sep 23, 2022

@joakime @sbordet
I managed to reproduce the problem with DEBUG logs enabled. Can one of the committers please work with me to isolate which portions of the log file you want me to share? I've got 16GB worth, so I assume I can't attach the full thing.


cowwoc commented Sep 28, 2022

Any idea what would cause an HTTP request to stay in the queued state forever? I've got maxConnectionsPerDestination set to Integer.MAX_VALUE and the executor is Executors.newThreadPerTaskExecutor() which again should have no limit. In this case, do you ever expect requests to queue for a long time?

Also, does invoking Request.abort() also abort a response if it's in progress? Or do I need to invoke Response.abort() separately?

Thank you.


sbordet commented Sep 28, 2022

@cowwoc I'll look into this. Are your logs 16 GiB compressed?
Can you split the compressed file and attach the individual split files?


sbordet commented Sep 29, 2022

@cowwoc the DEBUG logs will be great if you can provide them.

My impression is that this has something to do with the SOCKS configuration.

The DNS resolution should be subject to HttpClient.addressResolutionTimeout, but only if it's SocketAddressResolver.Async. Make sure you use that.

After the address is resolved, the connect attempt is subject to the HttpClient.connectTimeout.
So again it should not wait indefinitely.

After the connection is created, we have an EndPoint which is subject to HttpClient.idleTimeout.
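
For example, a minimal sketch of setting all three explicitly (the values are illustrative; the async resolver is the default, so only the timeouts need to be set):

// Illustrative values; HttpClient installs SocketAddressResolver.Async at start()
// unless a custom resolver was set, so addressResolutionTimeout applies by default.
HttpClient client = new HttpClient();
client.setAddressResolutionTimeout(15_000); // DNS resolution timeout (ms)
client.setConnectTimeout(15_000);           // TCP/SOCKS connect timeout (ms)
client.setIdleTimeout(30_000);              // EndPoint idle timeout (ms)
client.start();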

It could be that we don't handle well the case where the SOCKS proxy simply never replies to the client.
I'll try to write a test case for this and report back.

What class is TaggedProxy?


cowwoc commented Sep 29, 2022

@sbordet Sorry for the delayed response. The original logs I collected did not contain enough information to track the transition from application threads to Jetty's threads and back. As usually happens in such cases, adding logs made the problem harder to reproduce. I'll let you know as soon as I've got updated logs showing this problem.

What's weird is that I've added a scheduled task that invokes Request.abort() after ~30 seconds. You'd think that this would make the problem go away, but that doesn't seem to be the case.

Would synchronous SOCKS address resolution abort correctly when Request.abort() is invoked? Or could it block?

TaggedProxy is a mechanism you suggested in another issue. It enables users to select which SOCKS proxy a particular Request will use. Here is the implementation:

/**
 * Selects a proxy based on the Request's tag.
 */
private static final class TaggedProxy extends ProxyConfiguration.Proxy
{
	private final ProxyConfiguration.Proxy delegate;

	/**
	 * @param delegate the proxy to delegate to
	 * @param tag      the tag associated with the proxy
	 * @throws NullPointerException if {@code delegate} is null
	 */
	public TaggedProxy(ProxyConfiguration.Proxy delegate, Object tag)
	{
		this(delegate, delegate.getOrigin(), tag);
	}

	/**
	 * @param delegate       the proxy to delegate to
	 * @param delegateOrigin the origin of the delegate
	 * @param tag            the tag associated with the proxy
	 * @throws NullPointerException if {@code delegate} or {@code delegateOrigin} are null
	 */
	private TaggedProxy(ProxyConfiguration.Proxy delegate, Origin delegateOrigin, Object tag)
	{
		super(new Origin(delegateOrigin.getScheme(), delegateOrigin.getAddress(), tag,
			delegateOrigin.getProtocol()), delegate.getSslContextFactory());
		this.delegate = delegate;
	}

	@Override
	public boolean matches(Origin origin)
	{
		return super.matches(origin) && Objects.equals(origin.getTag(), getOrigin().getTag());
	}

	@Override
	public ClientConnectionFactory newClientConnectionFactory(ClientConnectionFactory connectionFactory)
	{
		return delegate.newClientConnectionFactory(connectionFactory);
	}

	@Override
	public Origin.Address getAddress()
	{
		return delegate.getAddress();
	}

	@Override
	public boolean isSecure()
	{
		return delegate.isSecure();
	}

	@Override
	public SslContextFactory.Client getSslContextFactory()
	{
		return delegate.getSslContextFactory();
	}

	@Override
	public Origin.Protocol getProtocol()
	{
		return delegate.getProtocol();
	}

	@Override
	public Set<String> getIncludedAddresses()
	{
		return delegate.getIncludedAddresses();
	}

	@Override
	public Set<String> getExcludedAddresses()
	{
		return delegate.getExcludedAddresses();
	}

	@Override
	public URI getURI()
	{
		return delegate.getURI();
	}
}
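
A request then selects its proxy by carrying the matching tag; a brief usage sketch (the URI and tag value are placeholders):

// The tag becomes part of the request's Origin, so TaggedProxy.matches()
// only succeeds for the proxy that was registered with the same tag.
ContentResponse response = client.newRequest("https://example.com/resource")
	.tag("proxy42")
	.send();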


cowwoc commented Sep 29, 2022

@sbordet Got it!

  1. I ran a debug session in IntelliJ with Jetty DEBUG-level logs enabled.
  2. The application ran the following code in some grandiose loop:
AtomicReference<RequestState> requestState = new AtomicReference<>();
try
{
request.onRequestQueued(theRequest ->
{
	requestState.set(RequestState.REQUEST_QUEUED);
	log.info("[{}] requestQueued, tag: {}", System.identityHashCode(theRequest), theRequest.getTag());
});
request.onRequestBegin(theRequest ->
{
	requestState.set(RequestState.REQUEST_BEGIN);
	log.info("[{}] requestBegin, tag: {}", System.identityHashCode(theRequest), theRequest.getTag());
});
request.onRequestContent((theRequest, content) ->
{
	requestState.set(RequestState.REQUEST_CONTENT);
	log.info("[{}] requestContent, tag: {}", System.identityHashCode(theRequest), theRequest.getTag());
});
request.onRequestHeaders(theRequest ->
{
	requestState.set(RequestState.REQUEST_HEADERS);
	log.info("[{}] requestHeaders, tag: {}", System.identityHashCode(theRequest), theRequest.getTag());
});
request.onRequestSuccess(theRequest ->
{
	requestState.set(RequestState.REQUEST_SUCCESS);
	log.info("[{}] requestSuccess, tag: {}",
		System.identityHashCode(theRequest), theRequest.getTag());
});
request.onRequestFailure((theRequest, cause) ->
{
	requestState.set(RequestState.REQUEST_FAILURE);
	log.info("[{}] requestFailure, tag: {}",
		System.identityHashCode(theRequest), theRequest.getTag(), cause);
});
request.onRequestCommit(theRequest ->
{
	requestState.set(RequestState.REQUEST_COMMIT);
	log.info("[{}] requestCommit, tag: {}", System.identityHashCode(theRequest), theRequest.getTag());
});
request.onResponseBegin(theResponse ->
{
	requestState.set(RequestState.RESPONSE_BEGIN);
	log.info("[{}] ResponseBegin, tag: {}", System.identityHashCode(theResponse.getRequest()),
		theResponse.getRequest().getTag());
});
request.onResponseContent((theResponse, content) ->
{
	requestState.set(RequestState.RESPONSE_CONTENT);
	log.info("[{}] responseContent, tag: {}", System.identityHashCode(theResponse.getRequest()),
		theResponse.getRequest().getTag());
});
request.onResponseHeaders(theResponse ->
{
	requestState.set(RequestState.RESPONSE_HEADERS);
	log.info("[{}] responseHeaders, tag: {}", System.identityHashCode(theResponse.getRequest()),
		theResponse.getRequest().getTag());
});
request.onResponseSuccess(theResponse ->
{
	requestState.set(RequestState.RESPONSE_SUCCESS);
	log.info("[{}] responseSuccess, tag: {}", System.identityHashCode(theResponse.getRequest()),
		theResponse.getRequest().getTag());
});
request.onResponseFailure((theResponse, failure) ->
{
	requestState.set(RequestState.RESPONSE_FAILURE);
	log.info("[{}] responseFailure, tag: {}", System.identityHashCode(theResponse.getRequest()),
		theResponse.getRequest().getTag(), failure);
});
request.onComplete(result ->
{
	requestState.set(RequestState.COMPLETE);
	log.info("[{}] complete, tag: {}", System.identityHashCode(result.getRequest()),
		result.getRequest().getTag(), result.getFailure());
});
response = request.send();

[...]

private enum RequestState
{
	REQUEST_QUEUED,
	REQUEST_BEGIN,
	REQUEST_CONTENT,
	REQUEST_HEADERS,
	REQUEST_SUCCESS,
	REQUEST_FAILURE,
	REQUEST_COMMIT,
	RESPONSE_BEGIN,
	RESPONSE_CONTENT,
	RESPONSE_HEADERS,
	RESPONSE_SUCCESS,
	RESPONSE_FAILURE,
	COMPLETE
}
  3. Eventually I got the error I was looking for: 14:38:08.226 [main] INFO i.b.c.j.p.WorkScheduler$FlowControl.beforeTask() - Waiting to acquire a permit: FlowControl.WorkScheduler
  4. I paused the debugging session and took a thread dump.
  5. There were 6 threads that were blocked on HttpRequest.send().
  6. I jumped into the stack of each one and checked the value of requestState. They were all in the REQUEST_SUCCESS state.
  7. Assuming that the next event after onRequestSuccess() is onResponseBegin(), it means that something is preventing the transition.

I am attaching the thread dump for your review. The log output is 47GB uncompressed. Are you able to extract RAR or 7ZIP formats if I use them instead of classic zip? The RAR compressed size should be around 2GB.

thread-dump.txt


cowwoc commented Sep 29, 2022

The DNS resolution should be subject to HttpClient.addressResolutionTimeout, but only if it's SocketAddressResolver.Async.

It looks like this is the default so I should be good: https://github.com/eclipse/jetty.project/blob/e3b0e80857cd3f2218f925702b22466a0c4e8bf7/jetty-client/src/main/java/org/eclipse/jetty/client/HttpClient.java#L211

It could be that we don't handle well the case where the SOCKS proxy just does not reply ever to the client.
I'll try to write a test case for this and report back.

That, or the server itself could be hanging the response forever as a form of throttling.

Were you successful in reproducing this in a test case?


cowwoc commented Sep 30, 2022

Looking at https://github.com/apache/mina-sshd/blob/65b80b57c9a9b474e8b42a6db6908425e92680a6/sshd-core/src/main/java/org/apache/sshd/core/CoreModuleProperties.java#L280 and https://github.com/apache/mina-sshd/blob/65b80b57c9a9b474e8b42a6db6908425e92680a6/sshd-core/src/main/java/org/apache/sshd/core/CoreModuleProperties.java#L338, it looks like the default sshd read timeouts are infinite.

If the server does not send a reply, sshd will never time out, and it is left to Jetty to enforce some sort of timeout. I assume you already handle this case if there is no proxy...? If so, I would expect the same timeout configuration option to be used when a proxy exists.
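
If tightening this on the sshd side is an option, something like the following might work; the property names and the Property.set(...) accessor are assumptions to verify against the linked source, and the durations are illustrative:

// Assumed mina-sshd API: CoreModuleProperties exposes Property<Duration> constants
// whose defaults are effectively unlimited; verify the exact names before relying on this.
SshClient sshClient = SshClient.setUpDefaultClient();
CoreModuleProperties.IDLE_TIMEOUT.set(sshClient, Duration.ofMinutes(2));
CoreModuleProperties.NIO2_READ_TIMEOUT.set(sshClient, Duration.ofMinutes(2));
sshClient.start();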


sbordet commented Sep 30, 2022

I might have missed it, but how is SSH involved here? You open an SSH tunnel and you send SOCKS through that SSH tunnel?

Also, can you clarify, when you send the request, if you set a total timeout, an idle timeout and whether you use the blocking send() API?

sbordet added a commit that referenced this issue Sep 30, 2022
Fixed handling of the idle timeout in case the SOCKS proxy does not reply to the SOCKS bytes.

Signed-off-by: Simone Bordet <[email protected]>
sbordet linked a pull request on Sep 30, 2022 that will close this issue

sbordet commented Sep 30, 2022

@cowwoc can you build and try the changes in branch jetty-10.0.x-8584-socks-idle-timeout?
They should apply cleanly to Jetty 11.


cowwoc commented Sep 30, 2022

I might have missed it, but how is SSH involved here? You open an SSH tunnel and you send SOCKS through that SSH tunnel?

Correct.

Also, can you clarify, when you send the request, if you set a total timeout, an idle timeout and whether you use the blocking send() API?

I couldn't find a mechanism to enforce a total timeout in Jetty (let me know if I missed it), so I implemented my own. I schedule Request.abort() to run sometime in the future. If send() returns first, it cancels the scheduled task. Otherwise, I am assuming the pending request/response should abort correctly.
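
Roughly, the workaround looks like this (a sketch; 'scheduler' is a shared ScheduledExecutorService, and the names and the 30-second budget are illustrative):

// Schedule an abort as a safety net, then cancel it once send() finishes.
ScheduledFuture<?> abortTask = scheduler.schedule(
	() -> request.abort(new TimeoutException("total request timeout exceeded")),
	30, TimeUnit.SECONDS);
try
{
	response = request.send(); // blocking send(); abort() makes it throw
}
finally
{
	abortTask.cancel(false); // no-op if the abort already fired
}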

I did not set an idle timeout before running into this bug. After running into it, I decided to set a total-request timeout instead, because a malicious server could game idleTimeout by sending bytes ~30 seconds apart. If the request or response body is very long, the server could throttle the client for a very long period of time.
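As an illustration, a 10 KB response dripped at one byte every 29 seconds would occupy the connection for roughly 10,240 × 29 s, about 82 hours, without ever tripping a 30-second idle timeout.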

Yes, I am using the blocking send() API.

I'll let you know if jetty-10.0.x-8584-socks-idle-timeout works. Last time it took 2 hours to reproduce the hang so I'll only be able to get back to you in a couple of hours.


sbordet commented Sep 30, 2022

FTR, request.timeout(...) enforces a total timeout on the request/response cycle; there is no need to write your own scheduling of Request.abort().
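
For example (illustrative URI and budget):

// timeout(...) bounds the whole request/response conversation, unlike the idle
// timeout, which only bounds the gaps between individual reads and writes.
ContentResponse response = client.newRequest("https://example.com/resource")
	.timeout(30, TimeUnit.SECONDS)
	.send(); // throws TimeoutException if the total budget is exceeded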


cowwoc commented Sep 30, 2022

@sbordet I'm an hour and 15 minutes into the test, and so far so good. Maybe it's a coincidence, but the transfer rates I'm getting now are way better than before... If more and more connections were being converted into zombies in the past, it's conceivable that this was reducing the overall transfer speed.

Assuming that this fix works, is there a workaround I can use in production until you publish an official release? Or am I dependent on this private build?


cowwoc commented Sep 30, 2022

Never mind. I just got a hang :(
There is a chance the test was using the wrong Jetty build. I am going to clean/build and try again.


cowwoc commented Sep 30, 2022

@sbordet I think your fix works. I left the application running for 5 hours and it didn't hang once.

Going back to my earlier question: is there a workaround that I can use in production until the fix makes it into an official release? I assume the next release is 3 months away, which is an eternity for me :)


sbordet commented Oct 3, 2022

@cowwoc I think that if you set a total timeout on your requests you can work around the immediate problem of requests never returning.
However, the proper fix would work better, so you'll have to wait for the next release.


sbordet commented Oct 4, 2022

@cowwoc do you confirm that the changes fix this issue?


cowwoc commented Oct 4, 2022

@sbordet Yes. It looks fixed to me. Thank you.

sbordet added a commit that referenced this issue Oct 5, 2022
Fixed handling of the idle timeout in case the SOCKS proxy does not reply to the SOCKS bytes.

Signed-off-by: Simone Bordet <[email protected]>

cowwoc commented Nov 7, 2022

@sbordet Apologies, but I think this bug is not fixed. Should I open a new bug report, or would you prefer to reopen this ticket?

To clarify, I checked out the v11 branch (commit 77ceb41), built a local SNAPSHOT version and used that to run my application.

I ended up with a hang which again seems centered around send() never returning, in spite of a request timeout being set. I am attaching an updated thread dump for your review. Please note that I am using the new thread dump format exported by jcmd <pid> Thread.dump_to_file, because it is the only way to retrieve the stack traces of virtual threads.

thread-dump.txt

For example, take a look at Thread #71683 "BlockingTask-71427" virtual.

Also, you should note that I discovered a JDK bug a few days back that might be relevant: https://youtrack.jetbrains.com/issue/IDEA-305072

The above hang occurred while running my application under IDEA in debug mode. I have never seen this bug affect virtual threads before, but anything is possible...

Let me know if you need me to collect any other logs, and what you think of the thread dump.


sbordet commented Nov 8, 2022

@cowwoc, first of all, virtual threads are still experimental/preview, so this could be an OpenJDK bug.
Unfortunately, the stack trace alone is not enough to diagnose the problem.

Does it happen when you run without virtual threads?

Do you have DEBUG logs for this failure?


cowwoc commented Nov 23, 2022

@sbordet I think you're right. Upon digging further into the issue, I noticed that the problem went away if I switched from thread-per-request to a fixed-size thread pool. Eventually this led me to the discovery of a memory leak in the JDK as well as in the IDE I was using. Here are the relevant links in case anyone else is interested:

https://youtrack.jetbrains.com/issue/IDEA-305072
https://bugs.openjdk.org/browse/JDK-8296463

The JDK issues have been fixed. Once IDEA fixes their memory leak I will retest and hopefully the hang will disappear.
