Skip to content

Fix reverse DNS lookup for current hostname in GitHub Actions CI Runner VMs#7588

Closed
lhotari wants to merge 2 commits intotrinodb:masterfrom
lhotari:lh-fix-reverse-hostname-lookup
Closed

Fix reverse DNS lookup for current hostname in GitHub Actions CI Runner VMs#7588
lhotari wants to merge 2 commits intotrinodb:masterfrom
lhotari:lh-fix-reverse-hostname-lookup

Conversation

@lhotari
Copy link
Copy Markdown

@lhotari lhotari commented Apr 14, 2021

Fixes #7534

  • Add current ip address, long hostname and short hostname to /etc/hosts
  • Fixes reverse DNS lookup for current hostname
    and related issues such as java.net.BindException: Cannot assign requested address

Fixes trinodb#7534
- Fixes reverse DNS lookup for current hostname
  and related issues such as "java.net.BindException: Cannot assign requested address"
@cla-bot
Copy link
Copy Markdown

cla-bot bot commented Apr 14, 2021

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@lhotari
Copy link
Copy Markdown
Author

lhotari commented Apr 14, 2021

I have signed the CLA and sent it to cla@trino.io .

Comment thread .github/actions/tune-runner-vm/action.yml
@kokosing
Copy link
Copy Markdown
Member

Can you please try to stress the CI for phoenix tests by adding one more commit that should be removed before merging where phoenix job is executed several times and other jobs are removed. Just to see that this flaky issue is really gone.

Copy link
Copy Markdown
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome in the community. Thanks fixing this!

Comment thread .github/actions/tune-runner-vm/action.yml Outdated
Comment thread .github/actions/tune-runner-vm/action.yml Outdated
Comment thread .github/workflows/ci.yml
sudo apt-get clean
df -h
- uses: actions/checkout@v2
- uses: ./.github/actions/tune-runner-vm
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please apply this only for phoenix job.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does a "the phoenix job" mean? I think it makes sense to apply the configuration to all jobs since there's no harm in doing that.

The intention of "tune-runner-vm" action is to be able to add tuning for GitHub Actions Runner VMs. After the reverse name lookup problem has been fixed in GitHub, it might be able to leave the tune-runner-vm action in place, but just make it a no-op in case some other optimizations are added later.

Some useful tuning to do in tune-runner-vm is to adjust other Linux kernel settings such as vm.swappiness or THP settings, for example:

          echo 1 | sudo tee /proc/sys/vm/swappiness
          echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

Depending on the type of builds, these might be useful.

Comment thread .github/actions/tune-runner-vm/action.yml Outdated
@kokosing
Copy link
Copy Markdown
Member

Can you please try to stress the CI for phoenix tests by adding one more commit that should be removed before merging where phoenix job is executed several times and other jobs are removed. Just to see that this flaky issue is really gone.

@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 14, 2021

Can you please try to stress the CI for phoenix tests by adding one more commit that should be removed before merging where phoenix job is executed several times and other jobs are removed. Just to see that this flaky issue is really gone.

See #7592 for an example

@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 14, 2021

Or #7593

@lhofhansl
Copy link
Copy Markdown
Member

Hi @lhotari thanks for fixing this. Curious... Could you briefly explain how this fixes the problem and why this used to cause the problem only sometimes?

@cla-bot
Copy link
Copy Markdown

cla-bot bot commented Apr 15, 2021

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@lhotari
Copy link
Copy Markdown
Author

lhotari commented Apr 15, 2021

Hi @lhotari thanks for fixing this. Curious... Could you briefly explain how this fixes the problem and why this used to cause the problem only sometimes?

@lhofhansl I have reported the issue as actions/runner-images#3185 .
Since the hostnames are reused and IP addresses change, my assumption is that the Runner VM must have the ip address and short and fqdn hostnames entry in /etc/hosts to resolve the issue. The reason for the flakiness is some kind of DNS originated behavior. It might be a Azure issue since GitHub Runner VMs are Azure VMs and seem to use Azure's tooling for VM management. The solution should ensure that the reverse lookup for the current hostname returns it's IP address. A lot of network libraries break if this isn't the case.

@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 15, 2021

Is it only problem when hostname is changed during test execution, or it's an issue with VM setup where own hostname is cached as resolving to something else than own IP?

@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 15, 2021

I agree with @kokosing it would be preferably applied to phoenix job only (actually: the test task, which is a common configuration for module tests).

i am stress-testing the fix @ #7593 (comment)

@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 16, 2021

The stress test looks great!
Compare https://github.com/trinodb/trino/runs/2347684843 and https://github.com/trinodb/trino/pull/7593/checks?check_run_id=2356428686

@lhotari let's limit this to test job only. I hope GHA fixes their infra/setup and we will just remove our workarond.

@martint we need CLA help here.

@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 16, 2021

Can you add a revert of #7613 as part of this PR please?

@findepi findepi mentioned this pull request Apr 21, 2021
@findepi
Copy link
Copy Markdown
Member

findepi commented Apr 22, 2021

Thank you @lhotari for your PR and more importantly -- seeing the problem to be fixed on GHA side.
This PR got obsoleted (see #7671 for current status), so let me close it.

Thanks!

@findepi findepi closed this Apr 22, 2021
@lhofhansl
Copy link
Copy Markdown
Member

BTW. Thank for figuring this out. We ran into the very same problem recently with tests in the Phoenix connectors and this helped us fix it there too :)

See: apache/phoenix-connectors#51

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Testing Phoenix service is flaky: Cannot assign requested address

5 participants