This repository was archived by the owner on May 9, 2024. It is now read-only.

[CI] Reproducibility of test failure #351

Open
Devjiu opened this issue Apr 4, 2023 · 5 comments
@Devjiu
Contributor

Devjiu commented Apr 4, 2023

The developer build and workflow differ significantly from the CI ones.

For example, build.yml (after conda setup, in the CPU case) builds the project with omniscidb/scripts/conda/build.sh, which sets some environment variables internally in a non-transparent way. A developer in the same conditions simply runs `cmake .. && make -j 32 && make install`.
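To make the contrast concrete, a minimal sketch of the two paths (the script path and developer commands are the ones quoted above; the out-of-source build directory is my assumption):

```sh
# CI path (build.yml, CPU case): env vars are exported inside the script
bash omniscidb/scripts/conda/build.sh

# Developer path, same conditions (assuming an out-of-source build dir):
mkdir -p build && cd build
cmake ..
make -j 32
make install
```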

pytest.yml and modin.yml take yet another approach to the build; they use: `$CONDA/bin/conda run -n ${{ env.CONDA_ENV }} sh -c "cmake .. -DENABLE_CUDA=off -DENABLE_CONDA=on -DENABLE_PYTHON=on -DCMAKE_BUILD_TYPE=release && make -j2 && make install"`
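Unrolled into something you can paste into a shell (assumption: `CONDA_ENV` must be set by hand to whatever `env.CONDA_ENV` expands to in the workflow, and that env already exists with the build dependencies installed):

```sh
# Sketch of reproducing the pytest.yml/modin.yml build locally.
$CONDA/bin/conda run -n "$CONDA_ENV" sh -c \
  "cmake .. -DENABLE_CUDA=off -DENABLE_CONDA=on -DENABLE_PYTHON=on \
   -DCMAKE_BUILD_TYPE=release && make -j2 && make install"
```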

test.yml (also the test-docker and test-l0-docker jobs) runs the sanity tests through omniscidb/scripts/conda/test.sh, which uses get_cxx_include_path.sh and also sets some environment variables. A developer simply runs `make sanity_tests`.

test-l0-docker.yml runs the sanity tests through omniscidb/scripts/conda/intel-gpu-enabling-test.sh.
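Again, side by side (script paths as quoted above; invocation style is my assumption):

```sh
# CI sanity tests (test.yml / test-docker / test-l0-docker): wrapper script
# that pulls in get_cxx_include_path.sh and mutates the environment
bash omniscidb/scripts/conda/test.sh

# L0 variant (test-l0-docker.yml):
bash omniscidb/scripts/conda/intel-gpu-enabling-test.sh

# Developer:
make sanity_tests
```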

All this hiding makes failures reported by CI difficult to reproduce. In addition, the duplication (cache/build) means several places in the CI code must be updated to keep it consistent and to support new build and test features. A sketch of what consolidation could look like follows.
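As an illustration only (this file does not exist in the repo; names and flags are hypothetical): a single build entry point that both the CI workflows and developers call, so environment setup lives in exactly one place:

```sh
#!/bin/sh
# scripts/build.sh -- hypothetical single build entry point (illustrative,
# not an existing file in this repository).
set -e
: "${BUILD_TYPE:=release}"      # one visible place for configuration
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE="$BUILD_TYPE" "$@"
make -j"$(nproc)"
make install
```

The workflows and the README build instructions would then point at the same script, so a CI failure could be reproduced by running the identical command locally.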

Devjiu added the tests label Apr 4, 2023
@Garra1980
Contributor

Agreed; at the very least, the difference between the way we run the build in CI and the one described here - https://github.com/intel-ai/hdk#build - bothers me as well.

@alexbaden
Contributor

Is there a specific failure that has been hard to reproduce? Other than the conda-forge build problems or differences in packages across the CI environments we currently test in, I have not experienced a failure that can be linked to build differences between my environment and the CI.

@Devjiu
Contributor Author

Devjiu commented Apr 5, 2023

> Is there a specific failure that has been hard to reproduce? Other than the conda-forge build problems or differences in packages across the CI environments we currently test in, I have not experienced a failure that can be linked to build differences between my environment and the CI.

Currently the Docker build looks difficult to work with, in my view. But that issue can be resolved with docker.io, which is already on the todo list, so let's not count it.

My point is that there are a lot of hidden environment variable changes, so there are a lot of places where you have to look for the missing configuration in case of failure.
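For example, one quick way to surface those hidden changes is to diff the environment before and after one of the CI scripts runs (generic shell; this assumes the script can be sourced, which also triggers the build it wraps):

```sh
# Capture the environment, run the CI script in the current shell so its
# exports stick, then diff. Script path is the one from the workflows above.
env | sort > /tmp/env.before
. omniscidb/scripts/conda/build.sh
env | sort > /tmp/env.after
diff /tmp/env.before /tmp/env.after   # shows exactly what the script changed
```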

I will share build examples when I run into issues like this.

There are currently several places in the code that change the environment, and you point out that you have not faced issues. Does that mean that having multiple places to set up the environment is fine?
I can write code that works but is hard to maintain. If I write everything on one line in one file and you tell me it is difficult to work with, I could likewise say that I have no problems with it.

@Devjiu
Contributor Author

Devjiu commented Apr 11, 2023

One of the mysterious failures, which was fixed for an unknown reason:

  1. On Saturday (08.04.23) the nightly sanity tests failed in Docker with CUDA: https://github.com/intel-ai/hdk/actions/runs/4643017334/jobs/8217872122#step:8:4233
  2. On the same day the run was retriggered and passed: https://github.com/intel-ai/hdk/actions/runs/4643017334/jobs/8219242858

This means that our testing can produce false positives/negatives for unknown reasons.
[Upd] The issue was a race condition in some of the tests (something like JoinHashTable; it is not related to configuration and can be reproduced anywhere by increasing the number of threads).
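For anyone hitting it: a generic way to shake out a race like this in a gtest binary is to repeat and shuffle the tests (`--gtest_repeat` and `--gtest_shuffle` are standard gtest flags; the binary name and filter below are illustrative, not the actual target names):

```sh
# Rerun the suspected suite many times in random order to provoke the race.
./SomeTestsBinary --gtest_filter='*JoinHashTable*' \
                  --gtest_repeat=100 --gtest_shuffle
```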

@Devjiu
Contributor Author

Devjiu commented Apr 11, 2023

PR #369 improves reproducibility, since it is now possible to pull the Docker image (CUDA/L0) from https://hub.docker.com/r/dataved/build.cuda/tags.
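For example (the repository name comes from the link above; the tag is a placeholder, check the tags page for a current one):

```sh
# Pull the prebuilt CI image and get a shell in it; --gpus all assumes the
# NVIDIA container runtime is installed, for the CUDA case.
docker pull dataved/build.cuda:<tag>
docker run --rm -it --gpus all dataved/build.cuda:<tag> bash
```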
