chore(tests): Disable flaky tests on MacOS #4251

Closed
ktff wants to merge 5 commits from ktff/disable_tests_on_mac

Conversation

ktff
Contributor

@ktff ktff commented Oct 1, 2020

Ref. #2978, #4056

I'll keep disabling tests on Mac in this PR until we get a consistently passing check, and then triage them afterwards. The goal is to avoid broken tests accumulating unnoticed behind what is, hopefully, only a couple of failing ones at the moment.

Tests disabled on Mac:

  • tcp_stream_detects_disconnect
  • file_update
  • topology tests
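
For readers unfamiliar with how a Rust test can be switched off on a single platform, here is a minimal sketch. It is not taken from this PR's diff: `parse_port` and both test names are hypothetical stand-ins, and which of the two attributes the PR actually used is an assumption left open here.

```rust
// Hypothetical function under test; a stand-in, not Vector code.
fn parse_port(addr: &str) -> Option<u16> {
    // Take the text after the last ':' and try to read it as a port number.
    addr.rsplit(':').next()?.parse().ok()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Option 1: still compiled on macOS, but reported as "ignored" there.
    #[test]
    #[cfg_attr(target_os = "macos", ignore)]
    fn parses_port_ignored_on_macos() {
        assert_eq!(parse_port("127.0.0.1:9000"), Some(9000));
    }

    // Option 2: not compiled at all on macOS, so it never shows up in the run.
    #[cfg(not(target_os = "macos"))]
    #[test]
    fn rejects_missing_port_skipped_on_macos() {
        assert_eq!(parse_port("localhost"), None);
    }
}
```

With the first form, the test can still be run on a Mac on demand via `cargo test -- --ignored`, which helps when triaging the disabled tests later. The second form removes the test from macOS builds entirely.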

@ktff ktff added the `domain: tests` and `platform: macos` labels Oct 1, 2020
@ktff ktff added this to the 2020-09-28 - Derezzed milestone Oct 1, 2020
@ktff ktff self-assigned this Oct 1, 2020
@jamtur01
Contributor

jamtur01 commented Oct 1, 2020

cc @leebenson

ktff added 2 commits October 1, 2020 21:38
Signed-off-by: ktf <[email protected]>
@leebenson
Member

Even if all tests pass, there's #4196 to contend with. I added some additional commentary to that issue this morning.

I'm not sure how to probe it further without being able to SSH into an active runner and do more extensive stack tracing against the compiled binary. I can't recreate this locally at all. I've seen segfaults in other environments where memory pressure was clearly the issue (i.e. actual "out of memory" errors in the console), so possibly this is a memory pressure thing. Not sure. It could also be a genuine segfault due to unsafe code in a lib somewhere, but then I don't know why it would show up only on macOS, and not even in my own macOS environment.

Not sure how to probe this any further without choosing a 'proper' CI where we have more control over the underlying hardware and OS, and can SSH into the environment.

@leebenson
Member

leebenson commented Oct 2, 2020

It could also be due to something I've done in GraphQL tests if, like @jszwedko commented, this was indeed introduced in #4187 (although I'm pretty sure I've come across this a time or two before; feels older than a few days.)

@jszwedko
Member

jszwedko commented Oct 2, 2020

@leebenson It's a bit hacky, but I've used https://github.com/mxschmitt/action-tmate before to get SSH access to a running GitHub Actions job when debugging a similar issue that only happened on CI.

@leebenson
Member

Thanks @jszwedko, I was looking for something like that! Will give it a shot.

Signed-off-by: ktf <[email protected]>
@ktff
Contributor Author

ktff commented Oct 2, 2020

@leebenson

> Even if all tests pass, there's #4196 to contend with

That sounds like a tough issue. In that case I would at least like to either confirm or eliminate the possibility that some smallish subset of tests can cause #4196. It's quite possible that some specific tests are corrupting memory, which then causes SEGFAULTs at random times. That said, I'm now aware that the approach in this PR may not work.

@ktff
Contributor Author

ktff commented Oct 3, 2020

@leebenson as you reported in #4196, the SEGFAULT happens after the tests have run, so whole test groups could be the cause, and with the recent bors upgrade this isn't a viable approach, so I'll close the PR.

@ktff ktff closed this Oct 3, 2020
@binarylogic binarylogic deleted the ktff/disable_tests_on_mac branch January 19, 2021 17:06
Labels
`domain: tests`, `platform: macos`

4 participants