fix(periodic): config for maxage/maxsize to prevent recording upload timeouts due to large filesize #96

Merged
andrewazores merged 7 commits into main from periodic-maxage-maxsize on Apr 18, 2023

Conversation

andrewazores
Member

@andrewazores andrewazores commented Apr 11, 2023

  • add configs for periodic maxage and maxsize
  • null out field on state transition
  • split between dedicated thread and worker pool

Fixes #95
Depends on #99

Adds two new config parameters for the maxage/maxsize JFR properties. Both properties could already be set for recordings that are uploaded when the host JVM exits, but periodic pushes always sent the entire available JFR repository contents. This likely results in overlapping recording chunks being pushed to the server frequently, wasting network bandwidth and storage capacity. The new config parameters can be tuned to minimize this waste. The maxsize is not applied by default, but the maxage is, and the default maxage is 1.5x the harvester period. This still produces some overlap between recording chunks on each push, but probably much less than before for common harvester periods and JFR repository sizes.
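For illustration, the default described above works out as in the following minimal sketch. The class and method names here are hypothetical and not taken from the agent source; the only thing assumed from this PR is the 1.5x factor. The overlap estimate additionally assumes the JFR repository holds at least maxage worth of data.

```java
import java.time.Duration;

final class PeriodicUploadDefaults {
    // Hypothetical helper: default maxage for periodic pushes is 1.5x the harvester period.
    static Duration defaultMaxAge(Duration harvesterPeriod) {
        return harvesterPeriod.multipliedBy(3).dividedBy(2);
    }

    // Rough estimate of how much of each push repeats data from the previous push.
    // Assumes the repository always contains at least maxAge of data.
    static Duration overlapPerPush(Duration harvesterPeriod, Duration maxAge) {
        Duration overlap = maxAge.minus(harvesterPeriod);
        return overlap.isNegative() ? Duration.ZERO : overlap;
    }

    public static void main(String[] args) {
        // Same period as used in the smoketest change in this PR description.
        Duration period = Duration.ofMillis(15_000);
        Duration maxAge = defaultMaxAge(period);
        System.out.println("period=" + period
                + " maxAge=" + maxAge
                + " overlap=" + overlapPerPush(period, maxAge));
    }
}
```

With a 15 second period this gives a 22.5 second maxage, so roughly a third of each push would repeat data from the previous one, rather than the whole repository being resent every time.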

Also included are some fixes to ensure that a single thread is responsible for managing the harvester and the state it controls, and that the harvester does not get into a bad spinning state that I've seen when uploads fail. I have also seen uploads fail when the server sends the registration refresh POST signal, which would trigger the spinning behaviour. With this PR, individual uploads that seem to overlap with handling of the registration signal may still fail, but the failure is handled gracefully and the periodic push resumes as normal on the next scheduled attempt.
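The general shape of that split can be sketched as below. This is not the agent's actual code: the class, its methods, and the timeout parameter are hypothetical, and only standard java.util.concurrent calls are used. One scheduler thread owns the schedule, uploads run on a separate worker pool, and a failed upload is caught so that it cannot cancel the schedule (scheduleAtFixedRate suppresses all later runs if the task throws).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class HarvesterLoopSketch implements AutoCloseable {
    // One dedicated thread owns the schedule and any harvester state.
    private final ScheduledExecutorService harvester = Executors.newSingleThreadScheduledExecutor();
    // Blocking network uploads run on a separate pool.
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final long uploadTimeoutMs;

    HarvesterLoopSketch(long uploadTimeoutMs) {
        this.uploadTimeoutMs = uploadTimeoutMs;
    }

    // Runs one push attempt. Never throws, so a failure only skips this attempt.
    boolean pushOnce(Callable<Void> upload) {
        Future<Void> pending = workers.submit(upload);
        try {
            pending.get(uploadTimeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException | TimeoutException e) {
            pending.cancel(true);
            System.err.println("Upload failed, will try again next period: " + e);
            return false;
        } catch (InterruptedException e) {
            pending.cancel(true);
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Schedules periodic pushes on the dedicated harvester thread.
    void start(long periodMs, Callable<Void> upload) {
        harvester.scheduleAtFixedRate(() -> pushOnce(upload), periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        harvester.shutdownNow();
        workers.shutdownNow();
    }

    public static void main(String[] args) {
        try (HarvesterLoopSketch sketch = new HarvesterLoopSketch(1_000)) {
            // A failing attempt followed by a successful one.
            System.out.println(sketch.pushOnce(() -> {
                throw new java.net.SocketTimeoutException("Read timed out");
            }));
            System.out.println(sketch.pushOnce(() -> null));
        }
    }
}
```

The point of the design is that a failed attempt reports the failure and returns, instead of retrying immediately in a loop, so the next attempt happens at the next scheduled period.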


"Spinning" fix testing

Apply the following change to Cryostat's smoketest.sh:

diff --git a/smoketest.sh b/smoketest.sh
index 4aebf722..e5622ad0 100755
--- a/smoketest.sh
+++ b/smoketest.sh
@@ -159,7 +159,7 @@ runDemoApps() {
         --env CRYOSTAT_AGENT_TRUST_ALL="true" \
         --env CRYOSTAT_AGENT_AUTHORIZATION="Basic $(echo user:pass | base64)" \
         --env CRYOSTAT_AGENT_REGISTRATION_PREFER_JMX="true" \
-        --env CRYOSTAT_AGENT_HARVESTER_PERIOD_MS=60000 \
+        --env CRYOSTAT_AGENT_HARVESTER_PERIOD_MS=15000 \
         --env CRYOSTAT_AGENT_HARVESTER_MAX_FILES=10 \
         --rm -d quay.io/andrewazores/quarkus-test:latest

Then run CRYOSTAT_DISCOVERY_PING_PERIOD=30000 sh smoketest.sh. This sets the discovery callback POST ping signal to occur every 30 seconds, while the agent tries to push a harvested JFR file to the server every 15 seconds. Before this PR, the agent would get into a bad "spinning" state quite easily whenever it needed to re-register itself after the POST signal. After this PR the same root failure can still be observed, but the agent simply tries again later and succeeds.

maxage/maxsize testing

TODO: determine a quicker way to test this. I have observed the agent trying to push very large files and timing out when the standard smoketest.sh setup is left running for long periods of time, but I'm not sure exactly how long this takes or whether other circumstances also contribute to the problems I have seen.

@andrewazores andrewazores added the feat New feature or request label Apr 11, 2023
@andrewazores andrewazores force-pushed the periodic-maxage-maxsize branch from c690b39 to d7f394d Compare April 11, 2023 23:46
@andrewazores andrewazores marked this pull request as ready for review April 11, 2023 23:47
@andrewazores andrewazores requested review from tthvo and maxcao13 April 11, 2023 23:48
@andrewazores andrewazores force-pushed the periodic-maxage-maxsize branch from 1c49e60 to 4aab4d7 Compare April 13, 2023 15:17
@andrewazores
Member Author

@tthvo or @maxcao13 , any more comments?

@maxcao13
Member

I'll take a look.

@maxcao13
Member

I assume the maxAge cannot be smaller than the period? I tried setting CRYOSTAT_AGENT_HARVESTER_MAX_AGE_MS to 600ms, and the data in Grafana still showed 15 sec of JFR recording time. I also tried max size = 100, but that didn't seem to have an effect either, perhaps because the minimum recording chunk size is 1MB? I will try again with larger values, just to confirm.

@maxcao13
Member

2023-04-14 23:18:37,406 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) JFR Harvester starting
2023-04-14 23:18:37,573 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) JFR Harvester started using template "default" with period PT15S
2023-04-14 23:18:37,574 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) Periodic uploads will contain approximately the most recent 600ms (PT0.6S) of data
2023-04-14 23:18:37,579 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) On-stop uploads will contain approximately the most recent 1000 bytes (1000 bytes) of data
2023-04-14 23:18:37,655 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) cryostat-agent(1) RUNNING
2023-04-14 23:18:52,671 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-1) Snapshot(2) CLOSED
2023-04-14 23:18:52,743 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-2) POST 200 (quarkus-test-agent_default_20230414T231852Z.jfr -> http://localhost:8181/api/beta/recordings/iMsHuf1ZXI4QJNtVbcjEkJcnCv-o8FMMeY2YNqJAOfE=): 321 KB/PT0.077247823S
2023-04-14 23:19:07,664 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-1) Snapshot(3) CLOSED
2023-04-14 23:19:07,694 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-2) POST 200 (quarkus-test-agent_default_20230414T231907Z.jfr -> http://localhost:8181/api/beta/recordings/iMsHuf1ZXI4QJNtVbcjEkJcnCv-o8FMMeY2YNqJAOfE=): 330 KB/PT0.031873328S
2023-04-14 23:19:22,664 INFO  [io.cry.age.Harvester] (cryostat-agent-worker-1) Snapshot(4) CLOSED
2023-04-14 23:19:22,684 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-2) POST 200 (quarkus-test-agent_default_20230414T231922Z.jfr -> http://localhost:8181/api/beta/recordings/iMsHuf1ZXI4QJNtVbcjEkJcnCv-o8FMMeY2YNqJAOfE=): 319 KB/PT0.021000678S

image

@andrewazores
Member Author

Yeah, there are certain minimums that the JFR system within the JVM itself enforces, like the minimum size of a chunk and therefore the minimum number of events that will be included. If that minimum threshold isn't met then the maxage/maxsize policies won't be applied. To get such a short maxage to actually apply, you'd need to be recording a target application that generates a lot more events per second.

@andrewazores andrewazores force-pushed the periodic-maxage-maxsize branch from 4aab4d7 to 2ed06ef Compare April 14, 2023 23:41
@maxcao13
Member

maxcao13 commented Apr 15, 2023

One more question: I've set the harvesting period to 10min and left the PING_PERIOD at the default (5min). In this case the harvester never uploads its recording (which makes sense, but I think a warning about this somewhere in the docs would be nice). What's unexpected is that the onexit dump recording also never gets uploaded. Is that correct?

2023-04-14 23:51:13,187 INFO  [io.cry.age.WebServer] (cryostat-agent-worker-2) POST /
2023-04-14 23:51:13,187 INFO  [io.cry.age.WebServer] (cryostat-agent-worker-2) Using 'deflate' encoding
2023-04-14 23:51:13,189 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-0) DELETE http://localhost:8181/api/v2.2/credentials/1 HTTP/1.1
2023-04-14 23:51:13,189 INFO  [io.cry.age.WebServer] (cryostat-agent-worker-2) POST / : 204 2ms
2023-04-14 23:51:13,208 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-1) DELETE http://localhost:8181/api/v2.2/credentials/1 : 200
2023-04-14 23:51:13,209 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-1) DELETE http://localhost:8181/api/v2.2/discovery/b20734cc-9828-4ad0-a36f-924e74574abc?token=eyJjdHkiOiJKV1QiLCJlbmMiOiJBMjU2R0NNIiwiYWxnIjoiZGlyIn0..u8o93wf4EVtpIIUF.RRxlYsF-F1VGGB0hIAm6SllTbcmmQ0mKJNN_nP79MrUeL4oKYZIcPXtEvG0qasH4QcNzk38IZnrsv-TH7mWU-yagGTkRJzJe9Vi9UG5gZoKSmmEy6uruVGvw4oct9xb7ZEwED72gTpaLFSzK6nuVPo2-45pXBJ9z5FMpxAY91vHyw6qDn79xYHp2iyxn6ddlPB3tk51XN4arbJ2Jg3c4Ul9glQyvhlYoV3BQYEMOUJcG7wXFpdDZIcG5UkoQytSWycrWq5grOZX_qak9VJR2Ajnuux_Fx0ClFpL97s2wNxASSJxsSdE-UdU2LHSJ5zBtEVMEJDAaZvNGaIg3ele4-RrlSaL_csd9HBxPlJyV_sqM9j6ast_j6oHTCjpy_kXg7aZooNLWU82wLBE5BJJtrtRzmtdQ9Eez8n-QMcp5kLqN7IbNGIsoCqcaxKOAIROlI2jDeGedLCyc0m90DPNzCsAjEeCmwyHSPDRMfiirA9Ig-oIzd3IsMp2khfaEl_IpafaVe2OSMHdZagrTU4I-7BVViC561YnyJKvQsOuhHIONsQMzmEzeVfrV_DxRl8rQ3Wma.B5UFhwMNEeDbW78T74HFiw HTTP/1.1
2023-04-14 23:51:13,226 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-2) DELETE http://localhost:8181/api/v2.2/discovery/b20734cc-9828-4ad0-a36f-924e74574abc?token=eyJjdHkiOiJKV1QiLCJlbmMiOiJBMjU2R0NNIiwiYWxnIjoiZGlyIn0..u8o93wf4EVtpIIUF.RRxlYsF-F1VGGB0hIAm6SllTbcmmQ0mKJNN_nP79MrUeL4oKYZIcPXtEvG0qasH4QcNzk38IZnrsv-TH7mWU-yagGTkRJzJe9Vi9UG5gZoKSmmEy6uruVGvw4oct9xb7ZEwED72gTpaLFSzK6nuVPo2-45pXBJ9z5FMpxAY91vHyw6qDn79xYHp2iyxn6ddlPB3tk51XN4arbJ2Jg3c4Ul9glQyvhlYoV3BQYEMOUJcG7wXFpdDZIcG5UkoQytSWycrWq5grOZX_qak9VJR2Ajnuux_Fx0ClFpL97s2wNxASSJxsSdE-UdU2LHSJ5zBtEVMEJDAaZvNGaIg3ele4-RrlSaL_csd9HBxPlJyV_sqM9j6ast_j6oHTCjpy_kXg7aZooNLWU82wLBE5BJJtrtRzmtdQ9Eez8n-QMcp5kLqN7IbNGIsoCqcaxKOAIROlI2jDeGedLCyc0m90DPNzCsAjEeCmwyHSPDRMfiirA9Ig-oIzd3IsMp2khfaEl_IpafaVe2OSMHdZagrTU4I-7BVViC561YnyJKvQsOuhHIONsQMzmEzeVfrV_DxRl8rQ3Wma.B5UFhwMNEeDbW78T74HFiw : 200
2023-04-14 23:51:13,227 INFO  [io.cry.age.Registration] (cryostat-agent-worker-2) Deregistered from Cryostat discovery plugin [b20734cc-9828-4ad0-a36f-924e74574abc]
2023-04-14 23:51:13,228 INFO  [io.cry.age.Agent] (cryostat-agent-worker-2) Registration state: UNREGISTERED
2023-04-14 23:51:13,228 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) Harvester stopping
2023-04-14 23:51:13,228 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) Harvester stopped
2023-04-14 23:51:13,231 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-0) GET http://localhost:8181/api/v2.2/credentials/1 HTTP/1.1
2023-04-14 23:51:13,238 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-2) GET http://localhost:8181/api/v2.2/credentials/1 : 404
2023-04-14 23:51:13,239 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-2) POST http://localhost:8181/api/v2.2/credentials HTTP/1.1
2023-04-14 23:51:13,276 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-1) POST http://localhost:8181/api/v2.2/credentials : 201
2023-04-14 23:51:13,277 INFO  [io.cry.age.WebServer] (cryostat-agent-worker-1) Defined credentials with id 3
2023-04-14 23:51:13,277 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-1) POST http://localhost:8181/api/v2.2/discovery HTTP/1.1
2023-04-14 23:51:13,287 INFO  [io.cry.age.WebServer] (cryostat-agent-worker-2) GET /
2023-04-14 23:51:13,287 INFO  [io.cry.age.WebServer] (cryostat-agent-worker-2) Using 'deflate' encoding
2023-04-14 23:51:13,289 INFO  [io.cry.age.WebServer] (cryostat-agent-worker-2) GET / : 204 2ms
2023-04-14 23:51:13,301 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-0) POST http://localhost:8181/api/v2.2/discovery : 201
2023-04-14 23:51:13,302 INFO  [io.cry.age.Registration] (cryostat-agent-worker-0) Registered as c605a636-0399-4a75-98e4-d171fccbbf0a
2023-04-14 23:51:13,302 INFO  [io.cry.age.Agent] (cryostat-agent-worker-0) Registration state: REGISTERED
2023-04-14 23:51:13,303 INFO  [io.cry.age.Registration] (cryostat-agent-worker-0) publishing self as service:jmx:rmi:///jndi/rmi://cryostat:9097/jmxrmi
2023-04-14 23:51:13,303 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-0) POST http://localhost:8181/api/v2.2/discovery/c605a636-0399-4a75-98e4-d171fccbbf0a?token=eyJjdHkiOiJKV1QiLCJlbmMiOiJBMjU2R0NNIiwiYWxnIjoiZGlyIn0..LJRQmJH2Qwax23Wc.grp8DRtT1whlm9uicIKk48-7yxvIGQGLsWECBBVYXMxbifHNQuQsfDS5JNbpl3SnFn7M_7tnnM8TSWGhfmV6hhSdbG1p1M-sJPW3VVjGzwvzbIsozwrjZiQIM6OQE_xqyR_MQD6mNdeDKyx-7uT163GiQkgNy3Bns2OiUcvuMYQDhotxq_Nu1er5g7mgnHCAsxrde7OZGpgq0Dqs2g2K1a--O6yJnrjRgYlrK5WhuEWJgXEAVuRLNLnw5A0SJZBGMdMB5bVrjm6j3s-j_iTT7H0Cqygid9Hk8sS2l8LaDy_V_i8pLe5xWTnHs016WBsSQUgh1pjrms1oHW6BcYk1ZjJJX-B2SLWuujijLALmlFgbvoFuJd0-F__jBR1DRcMY0ej4CbflehhhVRkAdNXf3OtYcrh96LDC8NHMpL43xcdUqlfBF5zI9m4m7FOvB5nlhuvpvpBOKNrMDUi_kDLom-0uTxspidm6UW6uR6jLrKIBfYif1kIIUksqzJNGLoQ9nn7-HWE1QSoGlc0ogVenXWQDvJQdOyT7c6z8iClGI6gQ40wB3drmN1WKILv95iov1fGq.lkRlmaWzZfGs6pQNt45_OA HTTP/1.1
2023-04-14 23:51:13,346 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-2) POST http://localhost:8181/api/v2.2/discovery/c605a636-0399-4a75-98e4-d171fccbbf0a?token=eyJjdHkiOiJKV1QiLCJlbmMiOiJBMjU2R0NNIiwiYWxnIjoiZGlyIn0..LJRQmJH2Qwax23Wc.grp8DRtT1whlm9uicIKk48-7yxvIGQGLsWECBBVYXMxbifHNQuQsfDS5JNbpl3SnFn7M_7tnnM8TSWGhfmV6hhSdbG1p1M-sJPW3VVjGzwvzbIsozwrjZiQIM6OQE_xqyR_MQD6mNdeDKyx-7uT163GiQkgNy3Bns2OiUcvuMYQDhotxq_Nu1er5g7mgnHCAsxrde7OZGpgq0Dqs2g2K1a--O6yJnrjRgYlrK5WhuEWJgXEAVuRLNLnw5A0SJZBGMdMB5bVrjm6j3s-j_iTT7H0Cqygid9Hk8sS2l8LaDy_V_i8pLe5xWTnHs016WBsSQUgh1pjrms1oHW6BcYk1ZjJJX-B2SLWuujijLALmlFgbvoFuJd0-F__jBR1DRcMY0ej4CbflehhhVRkAdNXf3OtYcrh96LDC8NHMpL43xcdUqlfBF5zI9m4m7FOvB5nlhuvpvpBOKNrMDUi_kDLom-0uTxspidm6UW6uR6jLrKIBfYif1kIIUksqzJNGLoQ9nn7-HWE1QSoGlc0ogVenXWQDvJQdOyT7c6z8iClGI6gQ40wB3drmN1WKILv95iov1fGq.lkRlmaWzZfGs6pQNt45_OA : 200
2023-04-14 23:51:13,347 INFO  [io.cry.age.Registration] (cryostat-agent-worker-2) Publish success
2023-04-14 23:51:13,348 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) JFR Harvester starting
2023-04-14 23:51:13,348 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) JFR Harvester started using template "default" with period PT10M
2023-04-14 23:51:13,348 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) Periodic uploads will contain approximately the most recent 60000ms (PT1M) of data
2023-04-14 23:51:13,348 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) On-stop uploads will contain approximately the most recent 1200000 bytes (1 MB) of data
2023-04-14 23:51:13,359 INFO  [io.cry.age.Agent] (cryostat-agent-worker-2) Registration state: PUBLISHED
2023-04-14 23:51:13,368 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) cryostat-agent(1) STOPPED
2023-04-14 23:51:13,370 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) cryostat-agent(1) CLOSED
2023-04-14 23:51:13,371 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) cryostat-agent(1) CLOSED
2023-04-14 23:51:13,396 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) cryostat-agent(2) RUNNING
2023-04-14 23:51:14,417 WARNING [io.cry.age.Harvester] (cryostat-agent-harvester) Could not upload exit dump file: java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Read timed out
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
        at io.cryostat.agent.Harvester.lambda$5(Harvester.java:223)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:283)
        at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:309)
        at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
        at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
        at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)
        at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
        at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
        at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
        at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
        at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
        at io.cryostat.agent.CryostatClient.executeQuiet(CryostatClient.java:346)
        at io.cryostat.agent.CryostatClient.lambda$24(CryostatClient.java:340)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
        ... 6 more

2023-04-14 23:51:15,420 WARNING [io.cry.age.Harvester] (cryostat-agent-harvester) Could not upload exit dump file: java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Read timed out
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
        at io.cryostat.agent.Harvester.lambda$5(Harvester.java:223)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:283)
        at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:309)
        at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
        at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
        at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)
        at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
        at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
        at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
        at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
        at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
        at io.cryostat.agent.CryostatClient.executeQuiet(CryostatClient.java:346)
        at io.cryostat.agent.CryostatClient.lambda$24(CryostatClient.java:340)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
        ... 6 more

2023-04-14 23:52:13,303 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-1) GET http://localhost:8181/api/v2.2/discovery/c605a636-0399-4a75-98e4-d171fccbbf0a?token=eyJjdHkiOiJKV1QiLCJlbmMiOiJBMjU2R0NNIiwiYWxnIjoiZGlyIn0..LJRQmJH2Qwax23Wc.grp8DRtT1whlm9uicIKk48-7yxvIGQGLsWECBBVYXMxbifHNQuQsfDS5JNbpl3SnFn7M_7tnnM8TSWGhfmV6hhSdbG1p1M-sJPW3VVjGzwvzbIsozwrjZiQIM6OQE_xqyR_MQD6mNdeDKyx-7uT163GiQkgNy3Bns2OiUcvuMYQDhotxq_Nu1er5g7mgnHCAsxrde7OZGpgq0Dqs2g2K1a--O6yJnrjRgYlrK5WhuEWJgXEAVuRLNLnw5A0SJZBGMdMB5bVrjm6j3s-j_iTT7H0Cqygid9Hk8sS2l8LaDy_V_i8pLe5xWTnHs016WBsSQUgh1pjrms1oHW6BcYk1ZjJJX-B2SLWuujijLALmlFgbvoFuJd0-F__jBR1DRcMY0ej4CbflehhhVRkAdNXf3OtYcrh96LDC8NHMpL43xcdUqlfBF5zI9m4m7FOvB5nlhuvpvpBOKNrMDUi_kDLom-0uTxspidm6UW6uR6jLrKIBfYif1kIIUksqzJNGLoQ9nn7-HWE1QSoGlc0ogVenXWQDvJQdOyT7c6z8iClGI6gQ40wB3drmN1WKILv95iov1fGq.lkRlmaWzZfGs6pQNt45_OA HTTP/1.1

@andrewazores
Member Author

I'll have to give this more thought. I think this reveals a deeper bug in the Agent lifecycle and in how the discovery ping is handled with re-registration. The Agent currently handles the ping by going through deregistration and re-registration internally, but the deregistration is what interrupts the periodic upload schedule, and I think it is also what breaks the onexit upload.
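To make the interaction concrete, here is a simplified model of it, not the agent's real scheduling code. It assumes, as described in this thread, that every discovery ping restarts the harvester and therefore resets its upload timer. The class and method names are hypothetical, and a tie between a ping and an upload is resolved in favour of the ping.

```java
final class PingResetSketch {
    // Counts periodic uploads that fire within windowMs, assuming each
    // discovery ping restarts the harvester and resets its timer.
    static int uploadsWithin(long windowMs, long harvestPeriodMs, long pingPeriodMs) {
        if (harvestPeriodMs <= 0 || pingPeriodMs <= 0) {
            throw new IllegalArgumentException("periods must be positive");
        }
        int uploads = 0;
        long nextUpload = harvestPeriodMs;
        long nextPing = pingPeriodMs;
        while (true) {
            if (nextUpload < nextPing) {
                // The upload fires before the next ping arrives.
                if (nextUpload > windowMs) {
                    break;
                }
                uploads++;
                nextUpload += harvestPeriodMs;
            } else {
                // The ping arrives first and restarts the harvester schedule.
                if (nextPing > windowMs) {
                    break;
                }
                nextUpload = nextPing + harvestPeriodMs;
                nextPing += pingPeriodMs;
            }
        }
        return uploads;
    }

    public static void main(String[] args) {
        // 10 minute harvest period with the default 5 minute ping, over one hour.
        System.out.println(uploadsWithin(3_600_000, 600_000, 300_000));
        // 15 second harvest period with a 30 second ping, over one minute.
        System.out.println(uploadsWithin(60_000, 15_000, 30_000));
    }
}
```

Under this model, any harvest period that is not shorter than the ping period never produces a periodic upload, which matches the 10min/5min observation above.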

@andrewazores andrewazores changed the title feat(periodic): add config for maxage/maxsize fix(periodic): config for maxage/maxsize to prevent recording upload timeouts due to large filesize Apr 15, 2023
@andrewazores andrewazores force-pushed the periodic-maxage-maxsize branch from 2ed06ef to 6c389fa Compare April 17, 2023 17:27
maxcao13
maxcao13 previously approved these changes Apr 18, 2023
Member

@maxcao13 maxcao13 left a comment

Everything looks good now!

@andrewazores
Member Author

Rebased with no changes for commit signing.

@andrewazores andrewazores merged commit d6daade into cryostatio:main Apr 18, 2023
@andrewazores andrewazores deleted the periodic-maxage-maxsize branch April 18, 2023 18:48
Development

Successfully merging this pull request may close these issues.

[Task] Add configuration for scheduled recording maxage/maxsize