Use real filesystem in ops.testing backend #960

weiiwang01 · 2023-07-03T11:11:15Z

This pull request refactors the testing module of the Ops library, switching its backend from an in-memory virtual filesystem to a real filesystem for simulating filesystem operations within containers. Furthermore, it exposes a new API, Harness.get_filesystem_root, enabling developers to interact with the simulated filesystem during test cases for both populating and inspecting the testing container filesystem.

Supersedes #933

Changelog

Refactored the testing module to use a real filesystem for simulating container filesystem operations.
Introduced a new API, ops.testing.Harness.get_filesystem_root, for accessing and manipulating the simulated filesystem during testing.

benhoyt

Thanks for your continued effort here. This is really good work. I've left a bunch of comments. Mostly minor, but one discussion point about whether we should do the user/group stuff at all. I'll also try to schedule a meeting tomorrow to finish discussion on this.

benhoyt · 2023-07-05T04:23:36Z

ops/testing.py

+
+        """
+        if isinstance(container, str):
+            container_name = container


It was quite surprising to me that the str case worked without accessing the container first; I initially thought that backend._pebble_clients wouldn't yet be populated. But it does work, because instantiating the Model calls get_unit which calls Unit which calls ContainerMapping which calls Container which calls get_pebble, which finally populates _pebble_clients... phew!

I'm not sure you're doing anything wrong, it was just strange to me that this worked! :-)

Added a comment here as suggested.

I don't think we have a contract guaranteeing that this always works, but it does fit how Harness is meant to work. (It is OK for Harness to rely on details of the implementation, because it is shipped with those details and can be updated if those details change. It would not be good for arbitrary charms to depend on those details directly.)
Just make sure we have a good test case for it, so that if we do change that behavior elsewhere, we know that we need to fix this code path for our users.

benhoyt · 2023-07-05T05:28:34Z

test/test_testing.py

+        self.assertTrue(foo_root.is_dir())
+        harness.begin()
+        container = harness.charm.unit.get_container("foo")
+        self.assertEqual(foo_root, harness.get_filesystem_root(container))


This is a good test as far as it goes, but I think we need a few more test cases to test that 1) files that the test writes via get_filesystem_root can then be pulled and listed, and 2) files pushed and make_dir'd can then be accessed via get_filesystem_root. Or am I missing something?

Yes, we already have a suite of tests for the filesystem, using the Pebble APIs (blackbox tests). I think I will add some more tests related to the get_filesystem_root method, since that's an exposed API now. Thanks!

I have added more unit tests for the get_filesystem_root method, thanks!
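The two directions asked for in this thread (files written through the root can be pulled and listed; files pushed through the container API show up under the root) can be sketched without ops installed. This is only an illustration under stated assumptions: `MiniBackend`, its methods, and `check_round_trip` are hypothetical stand-ins that mirror the idea of a real directory per container. They are not the ops.testing implementation, and the real Pebble-style calls take different arguments.

```python
import pathlib
import tempfile


class MiniBackend:
    """Hypothetical stand-in for a testing backend; NOT the ops implementation."""

    def __init__(self):
        # A real temporary directory, with one subdirectory per simulated container.
        self._tmp = tempfile.TemporaryDirectory()
        self._containers = pathlib.Path(self._tmp.name) / "containers"
        self._containers.mkdir()

    def get_filesystem_root(self, container_name: str) -> pathlib.Path:
        root = self._containers / container_name
        root.mkdir(exist_ok=True)
        return root

    def _real(self, container_name: str, path: str) -> pathlib.Path:
        # Map an absolute container path such as "/etc/app.conf" into the root.
        return self.get_filesystem_root(container_name) / path.lstrip("/")

    def push(self, container_name: str, path: str, text: str) -> None:
        target = self._real(container_name, path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)

    def pull(self, container_name: str, path: str) -> str:
        return self._real(container_name, path).read_text()

    def list_files(self, container_name: str, path: str) -> list:
        return sorted(p.name for p in self._real(container_name, path).iterdir())

    def cleanup(self) -> None:
        self._tmp.cleanup()


def check_round_trip():
    backend = MiniBackend()
    try:
        root = backend.get_filesystem_root("foo")
        # 1) Written through the root, read back through the container-style API.
        (root / "etc").mkdir()
        (root / "etc" / "app.conf").write_text("key=value")
        pulled = backend.pull("foo", "/etc/app.conf")
        listed = backend.list_files("foo", "/etc")
        # 2) Written through the container-style API, read back through the root.
        backend.push("foo", "/var/log/app.log", "started")
        on_disk = (root / "var" / "log" / "app.log").read_text()
        return pulled, listed, on_disk
    finally:
        backend.cleanup()
```

In a real test suite the same two checks would go through Harness.get_filesystem_root and the container's own push, pull, and list_files calls rather than this stand-in.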

benhoyt · 2023-07-05T05:34:48Z

.github/workflows/framework-tests.yaml

@@ -38,6 +38,7 @@ jobs:
      matrix:
        os: [ubuntu-latest, macos-latest]
        python-version: ["3.8", "3.10"]
+        root: [true, false]


I don't love how this shows up in the test summary in GitHub, e.g., "ubuntu-latest, 3.10, false" -- it's unclear what "false" means. Also, I'd rather not run all the tests twice, and I don't think we need to or should be running all the tests as root anyway.

Can we use a suitable test name prefix, and then do something like we do with the RealPebble tests below? Perhaps a TestFilesystemAsRoot class and then use -k FilesystemAsRoot filter. I think having a separate job is cleaner (even though it's a bit more duplication).

I think the tests depend on the outcome of the discussion, so I will defer the changes for now.

weiiwang01 · 2023-07-05T17:28:08Z

Thanks for your continued effort here. This is really good work. I've left a bunch of comments. Mostly minor, but one discussion point about whether we should do the user/group stuff at all. I'll also try to schedule a meeting tomorrow to finish discussion on this.

Thanks for the fantastic review! Yes, let's talk about the permission issue.

Another topic I want to discuss is simulating storage mounts. Currently, we are using symbolic links to simulate storage mounts. However, actual storage mounts have some special properties that symbolic links can't simulate. For example, when a filesystem is attached to a directory, the original content of that directory becomes "shadowed". Additionally, it's an error to delete the mount directory. Also, list_files should display the mount directory as a directory, but right now it shows up as a symbolic link.

Of course, we can simulate these behaviors in the harness, but I feel like this introduces dishonesty into the design. I would prefer to let the user know that we are using symbolic links to simulate storage mounts, and that it's the user's responsibility to avoid tests that depend on the unique properties of mount directories.
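Two of the differences described above can be seen with the standard library alone. The sketch below assumes a POSIX system where creating symbolic links is permitted; the directory names and the `symlink_mount_properties` helper are made up for illustration and are not taken from ops.testing.

```python
import os
import pathlib
import tempfile


def symlink_mount_properties() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        base = pathlib.Path(tmp)
        storage = base / "storage"  # backing directory standing in for the storage
        storage.mkdir()
        (storage / "data.txt").write_text("hello")

        container_root = base / "container"
        container_root.mkdir()
        mount_point = container_root / "mnt"
        # Simulate the mount with a symbolic link to the backing directory.
        mount_point.symlink_to(storage, target_is_directory=True)

        props = {
            # Content is reachable through the link, as it would be through a mount.
            "content_visible": (mount_point / "data.txt").read_text() == "hello",
            # is_dir() follows the link, so this looks like a directory...
            "is_dir_following_link": mount_point.is_dir(),
            # ...but the entry itself is still reported as a symbolic link.
            "is_symlink": mount_point.is_symlink(),
        }
        # Unlike a real mount point, the link can simply be removed,
        # and the backing directory is left intact.
        mount_point.unlink()
        props["removable"] = not os.path.lexists(mount_point)
        props["storage_intact"] = (storage / "data.txt").exists()
        return props
```

Every value in the returned dictionary is True on such a system, which is exactly where a symlink-based simulation and a real mount diverge: the entry type and the ability to delete it.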

benhoyt · 2023-07-07T02:09:36Z

Per yesterday's discussion, @weiiwang01's going to simplify by removing the user/group handling, and (for now at least) document where the test harness isn't a perfect match. As a consequence, we can also remove the tests running as root (we don't want to run tests as root or encourage charmers to do that). He's also going to remove the secrets.token_urlsafe call and just leave that out -- people shouldn't be testing exact contents of error messages anyway.

benhoyt

I think this is great. Just a couple of nit comments. And also get a second review (probably John's as we've discussed this with him).

benhoyt · 2023-07-10T00:45:25Z

ops/testing.py

+            rel_path = os.path.relpath(file_info.path, start=self._root)
+            rel_path = '/' if rel_path == '.' else '/' + rel_path
+            file_info.path = rel_path
+            file_info.name = "/" if rel_path == "/" else os.path.basename(rel_path)


Why isn't file_info.name correct already?

I think that for the root directory, the generated filename is the name of the temporary directory.

if it is only for the root, why set it always, vs just:

    if rel_path == "/":
        # override the root name, since it should be considered the local root, without any filename
        file_info.name = "/"

It seems clearer than always setting it to the value it should already have. (and avoids computing basename() for every file in every directory if you are iterating over them)

John's suggestion looks clearer to me too.
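The rewriting discussed in this thread can be tried in isolation. This is a minimal sketch assuming POSIX-style paths; `to_container_path`, `container_name`, and the example root `/tmp/harness-abc` are hypothetical names used for illustration, not the code that was merged.

```python
import os


def to_container_path(real_path: str, root: str) -> str:
    """Translate a path under the on-disk root into an absolute container path."""
    rel_path = os.path.relpath(real_path, start=root)
    # relpath() returns "." for the root itself; every other result is relative.
    return "/" if rel_path == "." else "/" + rel_path


def container_name(real_path: str, root: str) -> str:
    """Return the file name as the container should see it."""
    name = os.path.basename(real_path)
    if to_container_path(real_path, root) == "/":
        # Only the root needs an override: its on-disk name is the name of the
        # temporary directory, which means nothing inside the container.
        name = "/"
    return name
```

With root `/tmp/harness-abc`, the file `/tmp/harness-abc/etc/app.conf` maps to `/etc/app.conf` with the name `app.conf`, while the root itself maps to `/` with the name `/` instead of `harness-abc`.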

benhoyt · 2023-07-10T00:49:09Z

test/test_testing.py

+    def test_pull_path(self):
+        (self.root / "foo").mkdir()
+        (self.root / "foo/bar").write_text("bar")
+        # TODO: pull_path doesn't pull empty directories, intended?


Yeah, if that's the case, that seems like a bug. Can you please open a separate canonical/operator issue for that?

weiiwang01 · 2023-07-10T02:58:17Z

Per yesterday's discussion, @weiiwang01's going to simplify by removing the user/group handling, and (for now at least) document where the test harness isn't a perfect match. As a consequence, we can also remove the tests running as root (we don't want to run tests as root or encourage charmers to do that). He's also going to remove the secrets.token_urlsafe call and just leave that out -- people shouldn't be testing exact contents of error messages anyway.

I've eliminated the ownership handling and restructured the code according to the recommendation. However, while writing the tests, I came across a problem: the pull_path method within ops.model.Container doesn't retrieve empty directories. I'm curious to know if this is by design, similar to how git operates.

benhoyt · 2023-07-10T03:15:08Z

Hi @weiiwang01 Thanks -- yep, I replied to your comments above. I've opened an issue on the pull_path empty directory issue: #968 -- also see the other comments above.
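For readers unfamiliar with this class of behavior, here is one way a recursive copy can end up skipping empty directories, much as git does. It is a generic illustration only and makes no claim about how pull_path is implemented; `copy_files_only` and `demo` are hypothetical helpers.

```python
import pathlib
import shutil
import tempfile


def copy_files_only(src: pathlib.Path, dst: pathlib.Path) -> None:
    """Copy every regular file under src; directories are created only as needed."""
    for path in sorted(src.rglob("*")):
        if path.is_file():
            target = dst / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(path, target)


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        base = pathlib.Path(tmp)
        src, dst = base / "src", base / "dst"
        (src / "foo").mkdir(parents=True)
        (src / "foo" / "bar").write_text("bar")
        (src / "empty").mkdir()  # no files inside, so the loop never visits it
        dst.mkdir()
        copy_files_only(src, dst)
        return sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*"))
```

Because the loop is driven by files, the directory `foo` appears in the destination only as a side effect of copying `foo/bar`, and `empty` never appears at all.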

jameinel · 2023-07-10T19:47:33Z

ops/testing.py

+
+        """
+        if isinstance(container, str):
+            container_name = container


I don't think we have a contract guaranteeing that this always works, but it does fit how Harness is meant to work. (It is OK for Harness to rely on details of the implementation, because it is shipped with those details and can be updated if those details change. It would not be good for arbitrary charms to depend on those details directly.)
Just make sure we have a good test case for it, so that if we do change that behavior elsewhere, we know that we need to fix this code path for our users.

jameinel · 2023-07-10T19:50:07Z

ops/testing.py

+        self._harness_storage_path = pathlib.Path(self._harness_tmp_dir.name) / "storages"
+        self._harness_container_path = pathlib.Path(self._harness_tmp_dir.name) / "containers"
+        self._harness_storage_path.mkdir()
+        self._harness_container_path.mkdir()


It's a bit of a shame to me that we have a bunch of mkdir() and ultimately rmdir() calls for every unit test that uses a Harness, even if those tests never want to access the filesystem. I suppose we could rework that in the future, though.

Yeah, I thought the same too. But then I did a timing test, and on my laptop I can os.mkdir() 10,000 directories in 0.1s and 100,000 in 1.6s (and a bit less than that for os.rmdir()). So 3 dir creates + deletes adds about 50 microseconds to each test. Linux FS operations are quick!

We could prefer /dev/shm as the tempdir to speed up the filesystem calls. But as the benchmark shows, it doesn't have a significant influence on overall performance.
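A timing test along these lines is easy to reproduce. The sketch below is an assumption about how such a measurement could be made, not the script that was actually run; `time_mkdir_rmdir` and its `base_dir` parameter are hypothetical, and the numbers will vary by machine and filesystem.

```python
import os
import tempfile
import time


def time_mkdir_rmdir(count, base_dir=None):
    """Time creating and then removing `count` empty directories.

    Pass base_dir="/dev/shm" (where it exists) to compare against a
    memory-backed filesystem; None uses the default temporary location.
    """
    with tempfile.TemporaryDirectory(dir=base_dir) as tmp:
        paths = [os.path.join(tmp, f"d{i}") for i in range(count)]

        start = time.perf_counter()
        for path in paths:
            os.mkdir(path)
        mkdir_seconds = time.perf_counter() - start

        start = time.perf_counter()
        for path in paths:
            os.rmdir(path)
        rmdir_seconds = time.perf_counter() - start

        # Sanity check: every directory we created has been removed again.
        remaining = len(os.listdir(tmp))
    return mkdir_seconds, rmdir_seconds, remaining
```

Calling `time_mkdir_rmdir(10_000)` and dividing each time by the count gives a per-call cost, which can then be multiplied by the three directories a Harness creates per test.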

jameinel · 2023-07-10T20:10:57Z

ops/testing.py

+            rel_path = os.path.relpath(file_info.path, start=self._root)
+            rel_path = '/' if rel_path == '.' else '/' + rel_path
+            file_info.path = rel_path
+            file_info.name = "/" if rel_path == "/" else os.path.basename(rel_path)


if it is only for the root, why set it always, vs just:

    if rel_path == "/":
        # override the root name, since it should be considered the local root, without any filename
        file_info.name = "/"

It seems clearer than always setting it to the value it should already have. (and avoids computing basename() for every file in every directory if you are iterating over them)

benhoyt · 2023-07-10T23:24:16Z

Thanks. Let's just make that file_info.name = "/" tweak that John mentioned, and then I'll see about merging this!

benhoyt · 2023-07-12T01:41:35Z

Yeeha! Thanks for this @weiiwang01. This will be included with the next version of ops, 2.5.0, planned for release late July / early August.

weiiwang01 added 4 commits July 3, 2023 19:01

Use real file system in ops.testing backend

fcb6ae2

Fix the document problem

4db9d0b

Add more unit tests for ownership handling

fc70c5e

Fix a problem in ownership handling

dd61b57

benhoyt requested changes Jul 5, 2023

weiiwang01 and others added 10 commits July 6, 2023 00:35

Update documents

6c0c537

Co-authored-by: Ben Hoyt <[email protected]>

Update ops/testing.py documents

25ee2a2

Co-authored-by: Ben Hoyt <[email protected]>

Update ops/testing.py documents

664f5eb

Co-authored-by: Ben Hoyt <[email protected]>

Update ops/testing.py documents

299af6a

Co-authored-by: Ben Hoyt <[email protected]>

Update ops/testing.py documents

fdf178a

Co-authored-by: Ben Hoyt <[email protected]>

Update ops/testing.py

a6f0022

Co-authored-by: Ben Hoyt <[email protected]>

Remove unnecessary pyright ignores

9590a17

Minor refactoring on the testing module

a712fb0

Minor refactoring in the testing module

57a7fa2

Fix some formatting issues and bugs

8a59d5a

weiiwang01 added 2 commits July 9, 2023 16:22

Remove ownership handling in the testing module

e956a4a

Merge branch 'main' into main

b661acb

benhoyt approved these changes Jul 10, 2023

benhoyt requested a review from jameinel July 10, 2023 00:50

benhoyt mentioned this pull request Jul 10, 2023

pull_path doesn't retrieve empty directories #968

Closed

weiiwang01 and others added 2 commits July 10, 2023 23:18

Update ops/testing.py documents

9a954d1

Co-authored-by: Ben Hoyt <[email protected]>

Minor refactoring and document changes

21a0e8f

jameinel approved these changes Jul 10, 2023

weiiwang01 added 2 commits July 11, 2023 12:22

Refactor the root file info rewriting

41301c0

Merge branch 'main' into main

aec5409

Update the comment about pull_path

004ff26

benhoyt merged commit 770282b into canonical:main Jul 12, 2023
19 checks passed

benhoyt mentioned this pull request Jul 19, 2023

Enhancement request: access virtual filesystem in the test module #931

Closed

sed-i mentioned this pull request Aug 1, 2023

Do not use ops.testing._TestingFilesystem canonical/ops-scenario#48

Closed

Abuelodelanada mentioned this pull request Aug 15, 2023

TypeError: write() argument must be str, not bytes #990

Closed
