[Bug Report] smarts freezes after running for around 12 hours #2089

Closed
Edward11235 opened this issue Sep 28, 2023 · 1 comment · Fixed by #2088
Labels
bug Something isn't working

Edward11235 commented Sep 28, 2023

High Level Description

I am using SMARTS 1.2.0 to train a model. After training for ~12 hours, SMARTS always freezes. I talked to the SMARTS team and they believe it is related to SumoTrafficSimulation._cumulative_sim_seconds. This variable is probably not reset and grows very large over time. Below is the error message when I terminate the training:

^C^CTraceback (most recent call last):
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 260, in step
    return self._step(agent_actions, time_delta_since_last_step)
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 317, in _step
    provider_state = self._step_providers(all_agent_actions)
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 1334, in _step_providers
    provider_state = provider.step(
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/sumo_traffic_simulation.py", line 471, in step
    self._last_provider_state = self._step(dt)
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/sumo_traffic_simulation.py", line 482, in _step
    self._traci_conn.simulationStep(self._cumulative_sim_seconds)
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/utils/sumo.py", line 236, in _wrap_traci_method
    return method(*args, **kwargs)
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/sumo/tools/traci/connection.py", line 366, in simulationStep
    result = self._sendCmd(tc.CMD_SIMSTEP, None, None, "D", step)
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/sumo/tools/traci/connection.py", line 228, in _sendCmd
    return self._sendExact()
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/sumo/tools/traci/connection.py", line 131, in _sendExact
    result = self._recvExact()
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/sumo/tools/traci/connection.py", line 109, in _recvExact
    t = self._socket.recv(4 - len(result))
KeyboardInterrupt
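
The suspected failure mode (a simulation clock that is never reset between episodes) can be illustrated with a toy model. This is only a sketch under assumptions: `ToyProvider`, `run`, and the `reset_clock` flag are hypothetical names made up for illustration and are not SMARTS code; only the attribute name `_cumulative_sim_seconds` comes from the traceback above. It shows how the value passed to each step keeps growing across episodes when `reset()` leaves the counter alone.

```python
class ToyProvider:
    """Hypothetical stand-in for a traffic provider; NOT the SMARTS class."""

    def __init__(self, reset_clock: bool) -> None:
        # reset_clock is an illustrative switch, not a real SMARTS option.
        self._reset_clock = reset_clock
        self._cumulative_sim_seconds = 0.0

    def step(self, dt: float) -> float:
        # Advance the clock and return the target time that would be
        # handed to the simulator for this step.
        self._cumulative_sim_seconds += dt
        return self._cumulative_sim_seconds

    def reset(self) -> None:
        # Called at the start of every episode.
        if self._reset_clock:
            self._cumulative_sim_seconds = 0.0


def run(provider: ToyProvider, episodes: int, steps_per_episode: int, dt: float) -> float:
    """Run several episodes and return the last target time requested."""
    target = 0.0
    for _ in range(episodes):
        provider.reset()
        for _ in range(steps_per_episode):
            target = provider.step(dt)
    return target


# Without a reset the clock spans all episodes; with a reset it spans one.
leaky = run(ToyProvider(reset_clock=False), episodes=100, steps_per_episode=10, dt=0.5)
fixed = run(ToyProvider(reset_clock=True), episodes=100, steps_per_episode=10, dt=0.5)
print(leaky, fixed)
```

Whether this growth is what actually makes SUMO stop responding is the hypothesis reported here, not something the toy model demonstrates.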

Version

I used v1.2.0

Steps to reproduce the bug

If the current assessment is right and the bug is caused by self._cumulative_sim_seconds, running SMARTS for many episodes over a long time will reproduce the bug.

Running an experiment with a high average number of steps and a single map with more than one traffic variation is guaranteed to cause this issue.
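
Since the freeze shows up as a call that never returns, a long reproduction run can be hard to tell apart from a slow one. One possible way to make it fail fast is a watchdog around each step. The following is a minimal sketch, assuming a hypothetical `step_fn` callable that wraps whatever step call the training loop makes; `StepTimeout` and `step_with_timeout` are illustrative names, not part of SMARTS or TraCI.

```python
import concurrent.futures
import time


class StepTimeout(RuntimeError):
    """Raised when a step does not finish within the allowed time."""


def step_with_timeout(step_fn, timeout_s, *args):
    """Run step_fn(*args) in a worker thread and give up after timeout_s seconds.

    step_fn is a hypothetical placeholder for the real environment step.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(step_fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError as exc:
        raise StepTimeout(f"step did not return within {timeout_s} s") from exc
    finally:
        # Do not block on the worker: a stuck thread cannot be killed,
        # so the caller should log the failure and exit the process.
        executor.shutdown(wait=False)


def fast_step(x):
    return x + 1


def slow_step(x):
    time.sleep(0.3)  # stands in for a call that is stuck
    return x + 1


result = step_with_timeout(fast_step, 1.0, 41)

timed_out = False
try:
    step_with_timeout(slow_step, 0.05, 41)
except StepTimeout:
    timed_out = True
```

The worker thread itself keeps running after a timeout, so this only detects the stall; recovering from it would still require restarting the process.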

System info

System info:
Ubuntu 20.04
Python 3.8

Date:
2023-09-26

Error logs and screenshots

The traceback is identical to the one shown under High Level Description.

Impact (If known)

This bug will hinder training large models with SMARTS.

Edward11235 added the bug label on Sep 28, 2023

Gamenot (Collaborator) commented Sep 28, 2023

Thanks for the report. This should be fixed by the 1.2.1 release and #2088.

Gamenot closed this as completed on Sep 28, 2023