Setting temp directory does not really work #4641

Open
mitar opened this issue Apr 16, 2019 · 4 comments
Labels
P3 Issue moderate in impact or severity

Comments

@mitar
Member

mitar commented Apr 16, 2019

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • Ray installed from (source or binary): binary
  • Ray version: 0.6.5
  • Python version: 3.6.7

Describe the problem

It looks like setting temp_dir does not fully work. I test by calling ray.init with temp_dir set. When I then run our test suite, there are a few errors that do not happen otherwise.

Source code / logs

With temp_dir set to /tmp/tmpz842giec/tmp, I first see:

[ERROR 2019-04-16 08:19:23,269 ray.worker] The log monitor on node 91b315995a8e failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/ray/log_monitor.py", line 268, in <module>
    log_monitor.run()
  File "/usr/local/lib/python3.6/dist-packages/ray/log_monitor.py", line 217, in run
    self.update_log_filenames()
  File "/usr/local/lib/python3.6/dist-packages/ray/log_monitor.py", line 92, in update_log_filenames
    log_filenames = os.listdir(self.logs_dir)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpz842giec/tmp/logs'

After a few workers are initialized and some tasks succeed, at one point I see:

WARNING: Logging before InitGoogleLogging() is written to STDERR
E0416 08:20:08.636682 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 50 more times
E0416 08:20:08.737330 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 49 more times
E0416 08:20:08.837546 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 48 more times
E0416 08:20:08.937810 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 47 more times
E0416 08:20:09.038133 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 46 more times
E0416 08:20:09.138423 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 45 more times
E0416 08:20:09.238644 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 44 more times
E0416 08:20:09.338886 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 43 more times
E0416 08:20:09.439121 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 42 more times
E0416 08:20:09.539324 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 41 more times
E0416 08:20:09.639588 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 40 more times
E0416 08:20:09.739853 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 39 more times
E0416 08:20:09.840093 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 38 more times
E0416 08:20:09.940340 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 37 more times
E0416 08:20:10.040524 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 36 more times
E0416 08:20:10.140738 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 35 more times
E0416 08:20:10.240947 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 34 more times
E0416 08:20:10.341194 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 33 more times
E0416 08:20:10.441453 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 32 more times
E0416 08:20:10.541712 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 31 more times
E0416 08:20:10.641887 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 30 more times
E0416 08:20:10.742102 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 29 more times
E0416 08:20:10.842368 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 28 more times
E0416 08:20:10.942625 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 27 more times
E0416 08:20:11.042860 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 26 more times
E0416 08:20:11.143100 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 25 more times
E0416 08:20:11.243371 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 24 more times
E0416 08:20:11.343677 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 23 more times
E0416 08:20:11.443930 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 22 more times
E0416 08:20:11.544179 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 21 more times
E0416 08:20:11.644368 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 20 more times
E0416 08:20:11.744597 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 19 more times
E0416 08:20:11.844858 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 18 more times
E0416 08:20:11.945101 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 17 more times
E0416 08:20:12.045375 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 16 more times
E0416 08:20:12.145651 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 15 more times
E0416 08:20:12.245941 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 14 more times
E0416 08:20:12.346261 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 13 more times
E0416 08:20:12.446516 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 12 more times
E0416 08:20:12.546814 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 11 more times
E0416 08:20:12.647054 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 10 more times
E0416 08:20:12.747303 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 9 more times
E0416 08:20:12.847488 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 8 more times
E0416 08:20:12.947697 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 7 more times
E0416 08:20:13.047881 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 6 more times
E0416 08:20:13.148118 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 5 more times
E0416 08:20:13.248319 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 4 more times
E0416 08:20:13.348466 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 3 more times
E0416 08:20:13.448645 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 2 more times
E0416 08:20:13.548853 10315 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpz842giec/tmp/sockets/plasma_store, retrying 1 more times

The tests seem to continue, though.
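
To check by hand whether anything is listening at that socket path, the retry pattern in the log above can be approximated with a small probe. This is only a minimal sketch, not Plasma's actual client code: the name `probe_unix_socket` and its parameters are hypothetical, and the defaults (50 attempts, 0.1 s apart) are just read off the timestamps in the log.

```python
import socket
import time


def probe_unix_socket(path, retries=50, delay=0.1):
    """Try to connect to a Unix domain socket, retrying on failure.

    Returns the number of the attempt that succeeded, or None if
    every attempt failed. (Hypothetical helper, not part of Ray.)
    """
    for attempt in range(1, retries + 1):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return attempt
        except OSError:
            # Covers FileNotFoundError (no socket file at all) and
            # ConnectionRefusedError (file exists, nothing listening).
            if attempt < retries:
                time.sleep(delay)
        finally:
            # Always release the file descriptor, on success or failure.
            sock.close()
    return None
```

Calling `probe_unix_socket("/tmp/tmpz842giec/tmp/sockets/plasma_store")` while the tests run would show whether the socket ever becomes reachable at the path Ray reports.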

In the tests I also try to connect to the Plasma store, using ray.worker.global_worker.plasma_client.store_socket_name as the location of the socket. But when I set temp_dir, it seems this no longer provides the correct location, because the test fails with:

    plasma_client = plasma.connect(ray.worker.global_worker.plasma_client.store_socket_name, release_delay=0)
  File "pyarrow/_plasma.pyx", line 790, in pyarrow._plasma.connect
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Could not connect to socket /tmp/tmpz842giec/tmp/sockets/plasma_store

It is interesting that all tests succeed except for the one trying to connect to the Plasma store. I am still worried about what causes these additional errors and warnings.

@mitar mitar mentioned this issue May 5, 2019
@robertnishihara
Collaborator

Should be fixed by #4605.

@mitar
Member Author

mitar commented Jul 18, 2019

I tried with Ray 0.7.2 and it seems it is still not working correctly. If I set temp_dir, I now get two warnings:

The log monitor on node f9a90040e222 failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/ray/log_monitor.py", line 268, in <module>
    log_monitor.run()
  File "/usr/local/lib/python3.6/dist-packages/ray/log_monitor.py", line 217, in run
    self.update_log_filenames()
  File "/usr/local/lib/python3.6/dist-packages/ray/log_monitor.py", line 92, in update_log_filenames
    log_filenames = os.listdir(self.logs_dir)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/logs'

And:

E0718 05:06:58.769970  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 20 more times
E0718 05:06:59.170716  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 19 more times
E0718 05:06:59.570971  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 18 more times
E0718 05:06:59.971236  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 17 more times
E0718 05:07:00.371511  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 16 more times
E0718 05:07:00.771813  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 15 more times
E0718 05:07:01.172076  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 14 more times
E0718 05:07:01.572371  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 13 more times
E0718 05:07:01.972647  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 12 more times
E0718 05:07:02.372856  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 11 more times
E0718 05:07:02.773128  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 10 more times
E0718 05:07:03.173375  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 9 more times
E0718 05:07:03.573657  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 8 more times
E0718 05:07:03.973937  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 7 more times
E0718 05:07:04.374251  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 6 more times
E0718 05:07:04.774463  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 5 more times
E0718 05:07:05.174671  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 4 more times
E0718 05:07:05.574985  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 3 more times
E0718 05:07:05.975301  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 2 more times
E0718 05:07:06.375502  2148 io.cc:168] Connection to IPC socket failed for pathname /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store, retrying 1 more times

And:

pyarrow.lib.ArrowIOError: Could not connect to socket /tmp/tmpyez0tctr/tmp/session_2019-07-18_05-05-33_721880_2148/sockets/plasma_store
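
One way to narrow this down is to look at what actually exists under temp_dir while the tests run. The sketch below is a hypothetical helper, not part of Ray: `check_temp_dir` is an invented name, and the `logs`, `sockets/plasma_store`, and `session_*` names are only taken from the paths in the errors above. It checks both the 0.6.5 layout (directories directly under temp_dir) and the 0.7.2 layout (under a session subdirectory).

```python
import glob
import os
import stat


def _is_socket(path):
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def check_temp_dir(temp_dir):
    """Report whether `logs` and `sockets/plasma_store` exist.

    Checks temp_dir itself and every `session_*` directory directly
    below it. Returns {directory: {"logs": bool, "plasma_store": bool}}.
    """
    pattern = os.path.join(glob.escape(temp_dir), "session_*")
    candidates = [temp_dir] + sorted(glob.glob(pattern))
    report = {}
    for directory in candidates:
        report[directory] = {
            "logs": os.path.isdir(os.path.join(directory, "logs")),
            "plasma_store": _is_socket(
                os.path.join(directory, "sockets", "plasma_store")
            ),
        }
    return report
```

For example, `check_temp_dir("/tmp/tmpyez0tctr/tmp")` would show whether the session directory named in the errors was ever created, or whether the files ended up somewhere else.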

@mitar mitar reopened this Jul 18, 2019
@mitar
Member Author

mitar commented Jul 18, 2019

cc @suquark

@suquark
Member

suquark commented Jul 18, 2019

Interesting. Let me look into it.

@simon-mo simon-mo added the P1 Issue that should be fixed within a few weeks label Mar 19, 2020
@ericl ericl added P3 Issue moderate in impact or severity and removed P1 Issue that should be fixed within a few weeks labels May 18, 2020