Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

teuthology/schedule: Add "descr" option #1973

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

aclamk
Copy link

@aclamk aclamk commented Jul 8, 2024

A convenience option.
When a large batch has many failed tests, "descr" makes it easy to schedule just selected ones. It is as simple as copy-paste description of failed test.

teuthology-suite -e [email protected] -vv -c aclamk-testing-ceres-2024-07-01-1011 -p 50
--descr "rados/singleton-bluestore/{all/cephtool mon_election/connectivity msgr-failures/none
msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{centos_latest}}"

Many tests may be rerun in this fashion; "descr" accepts multiple comma separated test descriptions.

@aclamk aclamk closed this Jul 8, 2024
@aclamk aclamk reopened this Jul 9, 2024
@@ -27,6 +27,32 @@

log = logging.getLogger(__name__)

def descr_to_yamls(descriptions, suites_path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this methods lacks descriptions, and unit tests, could you please add some

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kshtsk Please help. I need suggestion how unit test should look like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One test case you've already put to the description, now just turn it into a code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're trying to use python typing for new code:

def yamls_from_desc(descriptions: str, suite_path: str) -> list[str]:

@aclamk aclamk force-pushed the wip-aclamk-rerun-by-description branch 2 times, most recently from ee7248b to bdedd70 Compare July 19, 2024 12:13
scripts/suite.py Outdated
@@ -126,6 +127,9 @@
2/<outof> ... <outof>-1/<outof> will schedule all
jobs in the suite (many more than once). If specified,
this value can be found in results.log.
--descr <descr> Use to rerun tests. Reschedule test based on their
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zmc what was the outcome after talk to author? Was our suggestion correct the feature more about scheduling a new suite rather than rerun?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case it must look like:
--descr Schedule a suite based on comma separated list of descriptions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kshtsk This is why english-ish users should get their texts double checked.

"""
Schedule the suite-run. Returns the number of jobs scheduled.
Function converts description of the run into sequence of .yaml files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a correct explanation. A description is string representation of a single case corresponding a job from which an arbitrary run consist of.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Fixing...

"""
Schedule the suite-run. Returns the number of jobs scheduled.
Function converts description of the run into sequence of .yaml files.
Input "descriptions" is the value of "--descr" parameter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No sense to refer the parameter name, because it is external interface for the application.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded.

result = []
desc_tab = descriptions.split(',')
for d in desc_tab:
d = d.strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad style reassigning vars

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by giving new name. I could otherwise invoke strip() twice, but I didn't like it.

Schedule the suite-run. Returns the number of jobs scheduled.
Function converts description of the run into sequence of .yaml files.
Input "descriptions" is the value of "--descr" parameter;
"description" consists of comma-separated list of run description
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant plural form "descriptions" instead of singular "description". I suggest to move the splitting a description parameter out of the method, so you can use it like:

yamls = yamls_from_descr(self.args.descr.split(","), suites_path)

The descriptions for the "suites_path", however it is self description, and because of English, rename to "suite_path".

Comment on lines 578 to 583
descriptions="rerun/rados/objectstore/{backends/objectstore-bluestore-b supported-random-distro$/{centos_latest}}"
suites_path="/cephfs/github.com_ceph_build/qa/suites/"
Output:
('rados/objectstore/{backends/objectstore-bluestore-b supported-random-distro$/{ubuntu_latest}}',
['/cephfs/github.com_ceph_build/qa/suites/rados/objectstore/backends/objectstore-bluestore-b.yaml',
'/cephfs/github.com_ceph_build/qa/suites/rados/objectstore/supported-random-distro$/ubuntu_latest.yaml'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be more general, I would suggest to remove this examples from here and make a unittest case for it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I run teuthology unittests?
Can I find somewhere a doc for this?
I might get how to code one by pattern-matching.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tox -e py3
to run only part of it you might try: tox -e py3 -- teuthology/suite or even tox -e py3 -- teuthology/suite/test/test_util.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or even a separate test:

tox -e py3 -- teuthology/suite/test/test_run_.py::TestRun::test_sha1_exists

if you have pytest installed, being in virtualenv enabled similar:

pytest teuthology/suite/test/test_run_.py::TestRun::test_sha1_exists

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kshtsk Thanks, I was able to add unit tests in form of 3 separate, concrete examples. Please take a look.

Comment on lines 592 to 601
elif source[pos] == '}':
if base_pos != pos:
test_yamls.append(suites_path + "/" + prefix + source[base_pos:pos] + ".yaml")
return pos + 1
elif source[pos] == ' ':
if base_pos != pos:
test_yamls.append(suites_path + "/" + prefix + source[base_pos:pos] + ".yaml")
pos = pos + 1
base_pos = pos
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like code duplication.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. But I was faced with either repeat same code lines, or make more convoluted state processor.
You think "switch" will make it nicer?

arch = util.get_arch(self.base_config.machine_type)
suite_name = self.base_config.suite
suite_path = os.path.normpath(os.path.join(
def expand_descr(test_yamls, source, prefix, base_pos):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no descriptions for this method? I think since the function returns values and do some test_yaml modifications, it is probably better to describe briefly what is the idea behind and give a better name for it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added description.

Comment on lines 614 to 621
suites_path = os.path.normpath(os.path.join(
self.suite_repo_path,
self.args.suite_relpath,
'suites',
self.base_config.suite.replace(':', '/'),
'suites'
))
suite_path = os.path.normpath(os.path.join(
suites_path,
suite_name.replace(':', '/'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having suites_path and suite_path is confusing, there must be only one "suite_path"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And there are many suites in path that contains all suites.
How should I name it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have to distinguish base suite directory from particular suite directory, maybe use another name base_suite_path or root_suite_path or anyways suite and suites is too confusing, because we have some nested suites.

@aclamk aclamk force-pushed the wip-aclamk-rerun-by-description branch from bdedd70 to 36d0c59 Compare August 2, 2024 15:52
A convenience option.
When a large batch has many failed tests, "descr" makes it easy to schedule
just selected ones. It is as simple as copy-paste description of failed test.

teuthology-suite -e [email protected] -vv -c aclamk-testing-ceres-2024-07-01-1011 -p 50 \
  --descr "rados/singleton-bluestore/{all/cephtool mon_election/connectivity msgr-failures/none \
  msgr/async-v2only objectstore/bluestore-comp-lz4 rados supported-random-distro$/{centos_latest}}"

Many tests may be rerun in this fashion; "descr" accepts multiple comma separated test descriptions.

Signed-off-by: Adam Kupczyk <[email protected]>
@aclamk aclamk force-pushed the wip-aclamk-rerun-by-description branch from 36d0c59 to 48c2263 Compare August 2, 2024 16:00
Function converts description of a job into sequence of .yaml files.
Input "descriptions" is a string containing comma-separated list of job descriptions.
Input "suites_path" is a posix path to dir containing all suites.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's way better, thank you, but could you please take a look at nice example of docstring usage https://github.com/ceph/teuthology/blob/main/teuthology/orchestra/remote.py#L221-L237

Comment on lines +63 to +64
for d in desc_tab:
dd = d.strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of creating another variable you could use python style generator expression:

for d in (_.strip() for _ in descriptions.split(',')):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologise for the next example, but I was not able to refrain. As far as I understood this part of the code I would refactor it like that:

    def expand_path(paths: list[str], source: str, prefix: str, pos: int) -> int:
        """
        Expand description using production rules (explanation, not Chomsky context-free formalism):
        1) simplification: "A{BX}" => AB A{X}
        2) termination:    "A" => "A"
            
        :param paths:   pointer to the extendible list of yaml paths expanded from the source string
        :param source:  source string containing a single test description
        :param prefix:  suffix a string representation of a part of path
        :param pos:     base position in the source string to scan next portion of prefix
            
        :returns:       index of a character following closed brace, i.e. character "}".
        """
        p = i = pos
        while (i < len(source)):
            path = prefix + source[p:i]
            match (source[i]):
                case '{':
                    p = i = expand_path(paths, source, path, i + 1)
                    continue
                case '}':
                    if p != i:
                        paths.append(path)
                    return i + 1
                case ' ':
                    if p != i:
                        paths.append(path)
                    p = i = i + 1
                case _:
                    i = i + 1

    for d in (_.strip() for _ in descriptions.split(',')):
        yamls = []
        expand_path(yamls, d, "", 0)
        yield (d, [f"{base_path}/{_}.yaml" for _ in yamls])

@@ -360,3 +360,52 @@ def test_newest_success(
m_find_git_parents.assert_has_calls(
[call('ceph', 'ceph_sha1', 10)]
)

def test_dupa(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very funny, I guess you've used Polish native-speaking skills here ;-) may we have some more appropriate name, like: test_<func-name-we-test>?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean:

    @pytest.mark.parametrize('desc, base, result', [
        [
            "rados/objectstore/{backends/objectstore-bluestore-b supported-random-distro$/{ubuntu_latest}}",
            "/cephfs/github.com_ceph_build/qa/suites",
            [
               ('rados/objectstore/{backends/objectstore-bluestore-b supported-random-distro$/{ubuntu_latest}}',
                ['/cephfs/github.com_ceph_build/qa/suites/rados/objectstore/backends/objectstore-bluestore-b.yaml',
                 '/cephfs/github.com_ceph_build/qa/suites/rados/objectstore/supported-random-distro$/ubuntu_latest.yaml'])
            ]
        ],
        [
            "rados/singleton-nomsgr/{all/health-warnings mon_election/connectivity rados supported-random-distro$/{ubuntu_latest}}",
            "/cephfs/home/user/src/git.ceph.com_ceph-c_aclamk-testing/qa/suites",
            [
                ("rados/singleton-nomsgr/{all/health-warnings mon_election/connectivity rados supported-random-distro$/{ubuntu_latest}}",
                 ['/cephfs/home/user/src/git.ceph.com_ceph-c_aclamk-testing/qa/suites/rados/singleton-nomsgr/all/health-warnings.yaml',
                  '/cephfs/home/user/src/git.ceph.com_ceph-c_aclamk-testing/qa/suites/rados/singleton-nomsgr/mon_election/connectivity.yaml',
                  '/cephfs/home/user/src/git.ceph.com_ceph-c_aclamk-testing/qa/suites/rados/singleton-nomsgr/rados.yaml',
                  '/cephfs/home/user/src/git.ceph.com_ceph-c_aclamk-testing/qa/suites/rados/singleton-nomsgr/supported-random-distro$/ubuntu_latest.yaml'])
            ]
        ],
        [
            "rados/cephadm/osds/{0-distro/centos_9.stream_runc 0-nvme-loop 1-start 2-ops/rm-zap-flag},"
            "rados/standalone/{supported-random-distro$/{ubuntu_latest} workloads/scrub}",
            "/cephfs/home/user/src/git.ceph.com_ceph-c_squid/qa/suites",
            [
                ("rados/cephadm/osds/{0-distro/centos_9.stream_runc 0-nvme-loop 1-start 2-ops/rm-zap-flag}",
                 ['/cephfs/home/user/src/git.ceph.com_ceph-c_squid/qa/suites/rados/cephadm/osds/0-distro/centos_9.stream_runc.yaml',
                  '/cephfs/home/user/src/git.ceph.com_ceph-c_squid/qa/suites/rados/cephadm/osds/0-nvme-loop.yaml',
                  '/cephfs/home/user/src/git.ceph.com_ceph-c_squid/qa/suites/rados/cephadm/osds/1-start.yaml',
                  '/cephfs/home/user/src/git.ceph.com_ceph-c_squid/qa/suites/rados/cephadm/osds/2-ops/rm-zap-flag.yaml'])
                ,
                ("rados/standalone/{supported-random-distro$/{ubuntu_latest} workloads/scrub}",
                 ['/cephfs/home/user/src/git.ceph.com_ceph-c_squid/qa/suites/rados/standalone/supported-random-distro$/ubuntu_latest.yaml',
                  '/cephfs/home/user/src/git.ceph.com_ceph-c_squid/qa/suites/rados/standalone/workloads/scrub.yaml'])
            ]
        ],
    ])
    def test_yamls_from_desc(self, desc, base, result):
        res = list(run.descr_to_yamls(desc, base))
        print(f"Result: {res}")
        assert res == result 

I took the example you provided, but it would be more general if you check other test for matrix and build_matrix...

assert(X ==
[
("rados/singleton-nomsgr/{all/health-warnings mon_election/connectivity rados supported-random-distro$/{ubuntu_latest}}",
['/cephfs/home/akupczyk/src/git.ceph.com_ceph-c_aclamk-testing/qa/suites/rados/singleton-nomsgr/all/health-warnings.yaml',
Copy link
Contributor

@kshtsk kshtsk Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for test cases' data it is better to use impersonal path naming, but I am not hundred percent sure about this, don't thank.

if self.args.subset:
job_config.subset = '/'.join(str(i) for i in self.args.subset)
if self.args.descr:
job_config.suite = "rerun"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of "rerun" maybe it is better to define somehow a suite name from the descriptions, but this is another question, another good question is it supposed to have scheduled descriptions from different suites, this is very interesting.

@@ -564,6 +609,44 @@ def check_num_jobs(self, jobs_to_schedule):
if threshold and jobs_to_schedule > threshold:
util.schedule_fail(msg, dry_run=self.args.dry_run)

def prepare_configs(self):
suite_name = self.base_config.suite or "rerun"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the comment above, about detection of suite name for the description

@@ -574,29 +657,14 @@ def schedule_suite(self):
log.debug("Using '%s' as an arch" % arch)
else:
arch = util.get_arch(self.base_config.machine_type)
suite_name = self.base_config.suite
suite_path = os.path.normpath(os.path.join(
suite_name = self.base_config.suite or "rerun"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, suite_name should remain suite name, no "rerun" please.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants