This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

pytest: bad address #460

Closed
eric-haibin-lin opened this issue Dec 9, 2018 · 11 comments

Comments

@eric-haibin-lin
Member

Sometimes CI is not very stable:

http://ci.mxnet.io/blue/organizations/jenkins/gluon-nlp/detail/PR-387/15/

+ pytest -v -n 4 -m 'not serial' --durations=50 scripts --cov scripts
Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1867308943 to reproduce.
============================= test session starts ==============================
platform linux2 -- Python 2.7.15, pytest-3.10.1, py-1.7.0, pluggy-0.8.0 -- /var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/bin/python
cachedir: .pytest_cache
rootdir: /var/lib/jenkins/workspace/gluon-nlp-py2@2, inifile: pytest.ini
plugins: xdist-1.24.1, forked-0.2, cov-2.6.0, flaky-3.4.0
gw0 I / gw1 I / gw2 I / gw3 I
[gw0] linux2 Python 2.7.15 cwd: /var/lib/jenkins/workspace/gluon-nlp-py2@2
[gw1] linux2 Python 2.7.15 cwd: /var/lib/jenkins/workspace/gluon-nlp-py2@2
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/_pytest/main.py", line 182, in wrap_session
INTERNALERROR>     config.hook.pytest_sessionstart(session=session)
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/pluggy/hooks.py", line 284, in __call__
INTERNALERROR>     return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/pluggy/manager.py", line 67, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/pluggy/manager.py", line 61, in <lambda>
INTERNALERROR>     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR>     return outcome.get_result()
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/pluggy/callers.py", line 81, in get_result
INTERNALERROR>     _reraise(*ex)  # noqa
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/xdist/dsession.py", line 81, in pytest_sessionstart
INTERNALERROR>     nodes = self.nodemanager.setup_nodes(putevent=self.queue.put)
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/xdist/workermanage.py", line 67, in setup_nodes
INTERNALERROR>     nodes.append(self.setup_node(spec, putevent))
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/xdist/workermanage.py", line 71, in setup_node
INTERNALERROR>     gw = self.group.makegateway(spec)
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/execnet/multi.py", line 127, in makegateway
INTERNALERROR>     io = gateway_io.create_io(spec, execmodel=self.execmodel)
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/execnet/gateway_io.py", line 126, in create_io
INTERNALERROR>     return Popen2IOMaster(args, execmodel)
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/execnet/gateway_io.py", line 20, in __init__
INTERNALERROR>     self.popen = p = execmodel.PopenPiped(args)
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/execnet/gateway_base.py", line 178, in PopenPiped
INTERNALERROR>     return self.subprocess.Popen(args, stdout=PIPE, stdin=PIPE)
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/subprocess32.py", line 614, in __init__
INTERNALERROR>     restore_signals, start_new_session)
INTERNALERROR>   File "/var/lib/jenkins/workspace/gluon-nlp-py2@2/conda/py2/lib/python2.7/site-packages/subprocess32.py", line 1393, in _execute_child
INTERNALERROR>     raise child_exception_type(errno_num, err_msg)
INTERNALERROR> OSError: [Errno 14] Bad address
script returned exit code 3


@leezu
Contributor

leezu commented Dec 12, 2018

While I do not yet understand why Errno 14 is thrown, this error usually indicates a problem with the notebooks. It has happened a few times before, too.

@eric-haibin-lin
Member Author

Happened again in http://ci.mxnet.io/blue/organizations/jenkins/gluon-nlp/detail/PR-500/3/pipeline/
I'm not sure why it is flaky.

@szha szha self-assigned this Jan 4, 2019
@szha szha added the bug Something isn't working label Jan 4, 2019
@vanewu
Contributor

vanewu commented Jan 4, 2019

@szha
Member

szha commented Jan 10, 2019

I searched for this problem and not much came up.

There are claims that this was a bug in py2.6, but we have occurrences in both py27 and py36.

Searching for execnet Popen "errno 14" came up empty. Since it's related to the OS, this might mean this problem is unique to our CI environment. I'll try updating the machine when there's no test running.
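
For reference, the errno value in the traceback can be decoded with the Python standard library. This is only a minimal sketch; the helper name `describe_errno` is made up for illustration and is not part of pytest, execnet, or the CI scripts. On Linux, errno 14 corresponds to `EFAULT`, which is the code behind the "Bad address" message.

```python
import errno
import os


def describe_errno(num):
    """Return (symbolic name, OS message) for an errno value.

    Hypothetical helper for illustration only.
    """
    # errno.errorcode maps numeric codes to their symbolic names.
    name = errno.errorcode.get(num, "UNKNOWN")
    # os.strerror returns the platform's message text for the code.
    return name, os.strerror(num)


if __name__ == "__main__":
    name, message = describe_errno(14)
    print(name, message)
```

The symbolic name is more useful as a search term than the bare number, since the message text can vary between platforms.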

@szha
Member

szha commented Jan 10, 2019

Done. Let's see if this addresses the problem.

@szha
Member

szha commented Jan 11, 2019

So far it hasn't occurred again. I will leave the issue open for another couple of days.

@szha
Member

szha commented Jan 14, 2019

Closing as I haven't seen this occurring again.

@szha szha closed this as completed Jan 14, 2019
@szha
Member

szha commented Feb 28, 2019

@szha szha reopened this Feb 28, 2019
@szha
Member

szha commented May 6, 2019

pytest-xdist speeds up tests by running them in multiple worker processes, so "bad address" here indicates that a worker process has died, which usually points to an actual problem in the code. I found this out because a recent regression caused some scripts to raise an exception, which killed a testing worker process.
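
One possible way to make this kind of failure easier to read is to run each script in its own child process and assert on its exit code, so that a crash shows up as an ordinary test failure with the script's output attached. The sketch below is an assumption about how that could be done, not how the gluon-nlp tests are written: `run_script` and the test are hypothetical names, the inline `-c` program stands in for a real script under `scripts`, and it assumes Python 3.5+ for `subprocess.run`.

```python
import subprocess
import sys


def run_script(args, timeout=600):
    """Run the Python interpreter with `args` in a child process.

    Returns (returncode, combined stdout/stderr text). A negative
    return code means the child was terminated by a signal.
    Hypothetical helper for illustration only.
    """
    proc = subprocess.run(
        [sys.executable] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.decode("utf-8", errors="replace")


def test_script_exits_cleanly():
    # Stand-in for a real script path; replace with the script under test.
    code, output = run_script(["-c", "print('ok')"])
    # On failure, the assertion message carries the script's own output.
    assert code == 0, output
```

With this shape, an exception inside the script only ends the child process, and the failing assertion reports the script's traceback.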

@szha szha closed this as completed May 6, 2019
@eric-haibin-lin
Member Author

@leezu
Contributor

leezu commented Sep 24, 2019

Tracked at apache/mxnet#13875
