39 changes: 20 additions & 19 deletions docs/source/testing.rst
@@ -700,11 +700,11 @@ Temporary files and directories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using unique temporary files and directories is essential for parallel test running, so that the tests won't overwrite
each other's data. Also we want to get the temp files and directories removed at the end of each test that created
each other's data. Also we want to get the temporary files and directories removed at the end of each test that created
them. Therefore, using packages like ``tempfile``, which address these needs, is essential.

However, when debugging tests, you need to be able to see what goes into the temp file or directory and you want to
know its exact path and not have it randomized on every test re-run.
However, when debugging tests, you need to be able to see what goes into the temporary file or directory and you want
to know its exact path and not have it randomized on every test re-run.

A helper class :obj:`transformers.test_utils.TestCasePlus` is best used for such purposes. It's a sub-class of
:obj:`unittest.TestCase`, so we can easily inherit from it in the test modules.
@@ -720,32 +720,33 @@ Here is an example of its usage:

This code creates a unique temporary directory, and sets :obj:`tmp_dir` to its location.

In this and all the following scenarios the temporary directory will be auto-removed at the end of test, unless
``after=False`` is passed to the helper function.

* Create a temporary directory of my choice and delete it at the end - useful for debugging when you want to monitor a
specific directory:
* Create a unique temporary dir:

.. code-block:: python

def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test")
tmp_dir = self.get_auto_remove_tmp_dir()

``tmp_dir`` will contain the path to the created temporary dir. It will be automatically removed at the end of the
test.

* Create a temporary directory of my choice and do not delete it at the end---useful for when you want to look at the
temp results:
* Create a temporary dir of my choice, ensure it's empty before the test starts and don't empty it after the test.

.. code-block:: python

def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False)
tmp_dir = self.get_auto_remove_tmp_dir("./xxx")

* Create a temporary directory of my choice and ensure to delete it right away---useful for when you disabled deletion
in the previous test run and want to make sure that the temporary directory is empty before the new test is run:
This is useful for debugging when you want to monitor a specific directory and want to make sure the previous tests
didn't leave any data in there.

.. code-block:: python
* You can override the default behavior by directly overriding the ``before`` and ``after`` args, leading to one of the
following behaviors:

def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True)
- ``before=True``: the temporary dir will always be cleared at the beginning of the test.
- ``before=False``: if the temporary dir already existed, any existing files will remain there.
- ``after=True``: the temporary dir will always be deleted at the end of the test.
- ``after=False``: the temporary dir will always be left intact at the end of the test.
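
The way the defaults combine with explicit ``before`` and ``after`` values can be sketched as a small standalone function. This is a minimal sketch based only on the rules stated in this PR; ``resolve_tmp_dir_flags`` is a hypothetical name and is not part of ``TestCasePlus``.

```python
from typing import Optional, Tuple


def resolve_tmp_dir_flags(
    tmp_dir: Optional[str] = None,
    before: Optional[bool] = None,
    after: Optional[bool] = None,
) -> Tuple[bool, bool]:
    """Return the effective (before, after) pair for the given arguments.

    Hypothetical helper for illustration only; not part of TestCasePlus.
    """
    if tmp_dir is not None:
        # Custom path: most likely a debugging session, so start with an
        # empty dir and keep the results around for inspection.
        default_before, default_after = True, False
    else:
        # Auto-generated unique path: start empty and remove it afterwards.
        default_before, default_after = True, True

    # Explicitly passed values always win over the defaults.
    effective_before = default_before if before is None else before
    effective_after = default_after if after is None else after
    return effective_before, effective_after


print(resolve_tmp_dir_flags())                      # (True, True)
print(resolve_tmp_dir_flags("./xxx"))               # (True, False)
print(resolve_tmp_dir_flags("./xxx", after=True))   # (True, True)
```

Using ``None`` as the sentinel is what lets the function tell "not specified" apart from an explicit ``False``.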

.. note::
In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are allowed if
@@ -799,7 +800,7 @@ or the ``xfail`` way:
@pytest.mark.xfail
def test_feature_x():

Here is how to skip a test based on some internal check inside the test:
- Here is how to skip a test based on some internal check inside the test:

.. code-block:: python

@@ -822,7 +823,7 @@ or the ``xfail`` way:
def test_feature_x():
pytest.xfail("expected to fail until bug XYZ is fixed")

Here is how to skip all tests in a module if some import is missing:
- Here is how to skip all tests in a module if some import is missing:

.. code-block:: python

92 changes: 63 additions & 29 deletions src/transformers/testing_utils.py
@@ -516,45 +516,47 @@ class solves this problem by sorting out all the basic paths and provides easy a
- ``repo_root_dir_str``
- ``src_dir_str``

Feature 2: Flexible auto-removable temp dirs which are guaranteed to get removed at the end of test.
Feature 2: Flexible auto-removable temporary dirs which are guaranteed to get removed at the end of test.

In all the following scenarios the temp dir will be auto-removed at the end of test, unless `after=False`.

# 1. create a unique temp dir, `tmp_dir` will contain the path to the created temp dir
1. Create a unique temporary dir:

::

def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir()

# 2. create a temp dir of my choice and delete it at the end - useful for debug when you want to # monitor a
specific directory
``tmp_dir`` will contain the path to the created temporary dir. It will be automatically removed at the end of the
test.


2. Create a temporary dir of my choice, ensure it's empty before the test starts and don't
empty it after the test.

::

def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test")
tmp_dir = self.get_auto_remove_tmp_dir("./xxx")

# 3. create a temp dir of my choice and do not delete it at the end - useful for when you want # to look at the
temp results
This is useful for debugging when you want to monitor a specific directory and want to make sure the previous tests
didn't leave any data in there.

::
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False)
3. You can override the first two options by directly overriding the ``before`` and ``after`` args, leading to the
following behavior:

# 4. create a temp dir of my choice and ensure to delete it right away - useful for when you # disabled deletion in
the previous test run and want to make sure that the tmp dir is empty # before the new test is run
``before=True``: the temporary dir will always be cleared at the beginning of the test.

::
``before=False``: if the temporary dir already existed, any existing files will remain there.

def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True)
``after=True``: the temporary dir will always be deleted at the end of the test.

``after=False``: the temporary dir will always be left intact at the end of the test.

Note 1: In order to run the equivalent of `rm -r` safely, only subdirs of the project repository checkout are
allowed if an explicit `tmp_dir` is used, so that by mistake no `/tmp` or similar important part of the filesystem
will get nuked. i.e. please always pass paths that start with `./`
Note 1: In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are
allowed if an explicit ``tmp_dir`` is used, so that by mistake no ``/tmp`` or similar important part of the
filesystem will get nuked. i.e. please always pass paths that start with ``./``

Note 2: Each test can register multiple temp dirs and they all will get auto-removed, unless requested otherwise.
Note 2: Each test can register multiple temporary dirs and they all will get auto-removed, unless requested
otherwise.
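
The safety rule from Note 1 could be checked along the following lines. This is only a sketch under stated assumptions: the actual check in ``testing_utils.py`` is not visible in this diff, ``is_safe_tmp_dir`` is a hypothetical name, and relative paths are interpreted relative to ``repo_root`` here.

```python
from pathlib import Path


def is_safe_tmp_dir(tmp_dir: str, repo_root: str) -> bool:
    """Return True only if tmp_dir resolves to a strict subdirectory of repo_root.

    Hypothetical helper for illustration; the real check may differ.
    """
    root = Path(repo_root).resolve()
    path = Path(tmp_dir)
    if not path.is_absolute():
        # Assumption of this sketch: relative paths are relative to the repo root.
        path = root / path
    path = path.resolve()
    # Path.parents holds every ancestor of the path, so this rejects the root
    # itself as well as anything outside of it (e.g. /tmp or ../elsewhere).
    return root in path.parents
```

Resolving the path before comparing it means that ``..`` components cannot be used to escape the repository checkout.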

Feature 3: Get a copy of the ``os.environ`` object that sets up ``PYTHONPATH`` specific to the current test suite.
This is useful for invoking external programs from the test suite - e.g. distributed training.
@@ -567,6 +569,7 @@ def test_whatever(self):
"""

def setUp(self):
# get_auto_remove_tmp_dir feature:
self.teardown_tmp_dirs = []

# figure out the resolved paths for repo_root, tests, examples, etc.
@@ -654,21 +657,42 @@ def get_env(self):
env["PYTHONPATH"] = ":".join(paths)
return env

def get_auto_remove_tmp_dir(self, tmp_dir=None, after=True, before=False):
def get_auto_remove_tmp_dir(self, tmp_dir=None, before=None, after=None):
"""
Args:
tmp_dir (:obj:`string`, `optional`):
use this path, if None a unique path will be assigned
before (:obj:`bool`, `optional`, defaults to :obj:`False`):
if `True` and tmp dir already exists make sure to empty it right away
after (:obj:`bool`, `optional`, defaults to :obj:`True`):
delete the tmp dir at the end of the test
if :obj:`None`:
Contributor Author

@sgugger, is there a reason why utils/style_doc.py changes the docstring in this strange way? I intended no new lines there, so why do we need them?

Contributor Author (@stas00, Nov 8, 2020)


Also should it perhaps say which file(s) should be restyled when it fails?

Traceback (most recent call last):
  File "utils/style_doc.py", line 465, in <module>
    main(*args.files, max_len=args.max_len, check_only=args.check_only)
  File "utils/style_doc.py", line 453, in main
    raise ValueError(f"{len(changed)} files should be restyled!")
ValueError: 1 files should be restyled!

Perhaps it doesn't matter, since it's automatic anyway... just a thought.

Collaborator


Lists should always begin with a new line before them, otherwise Sphinx (sometimes) throws warnings. That's why the script wants to add them.

As for the warnings, I copied what black does (and it does not say which file should be restyled) :-)
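
For reference, the suggestion earlier in this thread (naming the files that need restyling) might look roughly like this. It is a hedged sketch only: ``raise_if_restyle_needed`` is a hypothetical function and does not reproduce the actual code in ``utils/style_doc.py``; only the wording of the error message is taken from the traceback quoted above.

```python
from typing import Iterable, List


def raise_if_restyle_needed(changed: Iterable[str]) -> None:
    """Raise a ValueError that names every file needing a restyle.

    Hypothetical variant for illustration; not the real style_doc.py code.
    """
    files: List[str] = sorted(changed)
    if files:
        # One file per line, so the message stays readable for long lists.
        listing = "\n".join(f"- {name}" for name in files)
        raise ValueError(f"{len(files)} files should be restyled:\n{listing}")
```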


- a unique temporary path will be created
- sets ``before=True`` if ``before`` is :obj:`None`
- sets ``after=True`` if ``after`` is :obj:`None`
else:

- :obj:`tmp_dir` will be created
- sets ``before=True`` if ``before`` is :obj:`None`
- sets ``after=False`` if ``after`` is :obj:`None`
before (:obj:`bool`, `optional`):
If :obj:`True` and the :obj:`tmp_dir` already exists, make sure to empty it right away; if :obj:`False`
and the :obj:`tmp_dir` already exists, any existing files will remain there.
after (:obj:`bool`, `optional`):
If :obj:`True`, delete the :obj:`tmp_dir` at the end of the test; if :obj:`False`, leave the
:obj:`tmp_dir` and its contents intact at the end of the test.

Returns:
tmp_dir(:obj:`string`): either the same value as passed via `tmp_dir` or the path to the auto-created tmp
tmp_dir(:obj:`string`): either the same value as passed via `tmp_dir` or the path to the auto-selected tmp
dir
"""
if tmp_dir is not None:

# defining the most likely desired behavior for when a custom path is provided.
# this most likely indicates the debug mode where we want an easily locatable dir that:
# 1. gets cleared out before the test (if it already exists)
# 2. is left intact after the test
if before is None:
before = True
if after is None:
after = False

# using provided path
path = Path(tmp_dir).resolve()

@@ -685,6 +709,15 @@ def get_auto_remove_tmp_dir(self, tmp_dir=None, after=True, before=False):
path.mkdir(parents=True, exist_ok=True)

else:
# defining the most likely desired behavior for when a unique tmp path is auto generated
# (not a debug mode), here we require a unique tmp dir that:
# 1. is empty before the test (it will be empty in this situation anyway)
# 2. gets fully removed after the test
if before is None:
before = True
if after is None:
after = True

# using unique tmp dir (always empty, regardless of `before`)
tmp_dir = tempfile.mkdtemp()

@@ -695,7 +728,8 @@ def get_auto_remove_tmp_dir(self, tmp_dir=None, after=True, before=False):
return tmp_dir

def tearDown(self):
# remove registered temp dirs

# get_auto_remove_tmp_dir feature: remove registered temp dirs
for path in self.teardown_tmp_dirs:
shutil.rmtree(path, ignore_errors=True)
self.teardown_tmp_dirs = []