Ensure api_fqdn Is Resolvable Within Milliseconds or log #75

sean-horn · 2015-01-30T01:47:50Z

Zendesk 2718, Zendesk 2393

As a customer or employee of Chef, I would like Chef Server and Enterprise Chef Server to warn me when name resolution is degraded by delays or failures.

Basic infrastructure like DNS should, of course, be reliable and performant.
Even so, Chef Server should behave gracefully when DNS misbehaves, or at the very least warn about the problem.

Currently, EC11.2.6, and presumably later versions, cannot commit sandboxes when resolving node['api_fqdn'] takes several seconds. The following log output is the only indication of the problem.

      2015-01-29_21:28:28.69569 [error] Checking presence of checksum: <<"59b2fe13f9c6a776b9f69eb00ac2b49f">> for org <<"c8d751a6b6d1445aa5cdc6c7552e4dee">>  from bucket "bookshelf" has taken longer than 5000 ms

      2015-01-29_21:28:28.72001 [error] {<<"method=PUT; path=/organizations/pedant-testorg-12311/sandboxes/c6c7552e4deef662ca00efba97d6f846; status=500; ">>, {error,{throw,{checksum_check_error,1},[{chef_wm_named_sandbox,validate_checksums_uploaded,2,[{file,"src/chef_wm_named_sandbox.erl"},{line,144}]}, {chef_wm_named_sandbox,from_json,2,[{file,"src/chef_wm_named_sandbox.erl"},{line,99}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,186}]}, {webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,142}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]}, {webmachine_decision_core,accept_helper,1,[{file,"src/webmachine_decision_core.erl"},{line,612}]},{webmachine_decision_core,decision,1,[{file,"src/webmachine_decision_core.erl"}, {line,517}]},{webmachine_decision_core,handle_request,2,[{file,"src/webmachine_decision_core.erl"},{line,33}]}]}}}

We should monitor the time taken by gethostbyname, getaddress, or whichever call forces a name resolution. If it creeps above a default threshold of 1000 ms, erchef should begin logging periodic warnings to its log file.

The sandbox failure above would be much easier to diagnose and work around if the erchef log contained something like this:

      2015-01-29_21:28:28.72001 [error] {<<"Heyo, I am seeing an average of 1800 ms delays for resolving chef-server1.something.local">>, {error,{throw,{name_resolution_check_error,1}
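A minimal sketch of the proposed check follows, written in Ruby because Chef Server already ships an interpreter at /opt/opscode/embedded/bin/ruby. It only illustrates the logic; it is not erchef code (erchef is Erlang), and `ResolutionMonitor`, its parameters, and the warning text (adapted from the example log line above) are hypothetical. It times each lookup, keeps a rolling window of samples, and warns when the window's average exceeds the 1000 ms default. Failed lookups are logged as errors but still counted as samples, since slow failures are exactly the misbehavior this issue is about.

```ruby
require 'resolv'
require 'logger'

# Hypothetical monitor: time lookups of the API FQDN, keep a rolling window
# of recent durations, and warn when their average exceeds a threshold.
class ResolutionMonitor
  DEFAULT_THRESHOLD_MS = 1000.0

  attr_reader :samples

  # `resolver` is injectable so the timing logic can be exercised without
  # real DNS; by default it uses Ruby's Resolv.getaddress.
  def initialize(fqdn, threshold_ms: DEFAULT_THRESHOLD_MS, window: 10,
                 logger: Logger.new($stderr),
                 resolver: ->(name) { Resolv.getaddress(name) })
    @fqdn = fqdn
    @threshold_ms = threshold_ms
    @window = window
    @logger = logger
    @resolver = resolver
    @samples = []
  end

  # Resolve the FQDN once, record the elapsed milliseconds, and warn if the
  # rolling average is too high. Returns the elapsed time in ms.
  def check
    start = now_ms
    begin
      @resolver.call(@fqdn)
    rescue Resolv::ResolvError, Resolv::ResolvTimeout => e
      # Log the failure; its duration is still recorded below.
      @logger.error("Name resolution failed for #{@fqdn}: #{e.message}")
    end
    elapsed_ms = now_ms - start
    @samples << elapsed_ms
    @samples.shift while @samples.size > @window # keep only the last `window`
    warn_if_slow
    elapsed_ms
  end

  # Mean of the samples currently in the window (0.0 if there are none).
  def average_ms
    return 0.0 if @samples.empty?
    @samples.inject(:+) / @samples.size
  end

  private

  def now_ms
    Process.clock_gettime(Process::CLOCK_MONOTONIC) * 1000.0
  end

  def warn_if_slow
    avg = average_ms
    return if avg <= @threshold_ms
    @logger.warn(format('Seeing an average of %.0f ms delays for resolving %s', avg, @fqdn))
  end
end
```

To get the "warn periodically" behavior, something would call `check` on a timer, for example `Thread.new { loop { monitor.check; sleep 60 } }`; the interval is an arbitrary choice.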

sean-horn · 2015-01-30T08:01:49Z

A simple test in Ruby might be:

      time /opt/opscode/embedded/bin/ruby -e "require 'resolv'; p Resolv.getaddress('YOUR_API_FQDN')"
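The one-liner above times a single lookup, and `time` also counts Ruby interpreter startup. A slightly longer diagnostic, sketched here as an assumption rather than anything shipped with Chef Server, repeats the lookup and compares two resolution paths: Ruby's own `Resolv`, which consults /etc/hosts and the servers in /etc/resolv.conf itself, and `Socket.getaddrinfo`, which goes through the C library's getaddrinfo(3). The script name and output format are hypothetical.

```ruby
require 'resolv'
require 'socket'

# Two ways to resolve a name from Ruby. Resolv is Ruby's own resolver
# (it consults /etc/hosts and the servers in /etc/resolv.conf itself);
# Socket.getaddrinfo goes through the C library's getaddrinfo(3).
RESOLVERS = {
  'Resolv.getaddress'  => ->(name) { Resolv.getaddress(name) },
  'Socket.getaddrinfo' => ->(name) { Socket.getaddrinfo(name, nil).first[3] }
}.freeze

# Run `count` lookups of `fqdn` with each resolver and return
# { label => [elapsed_ms, ...] }. Lookup errors are not rescued here,
# so a failing resolver aborts the run with its exception.
def time_resolvers(fqdn, resolvers = RESOLVERS, count = 3)
  resolvers.each_with_object({}) do |(label, resolve), results|
    results[label] = Array.new(count) do
      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      resolve.call(fqdn)
      (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
    end
  end
end

# Usage: /opt/opscode/embedded/bin/ruby time_resolvers.rb YOUR_API_FQDN
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  time_resolvers(ARGV[0]).each do |label, times|
    printf("%-20s min %8.1f ms  max %8.1f ms\n", label, times.min, times.max)
  end
end
```

If only one path is slow, that narrows the search to either the resolver configuration Ruby reads directly or the system resolver stack.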

jeremiahsnapp · 2015-03-11T18:40:12Z

Zendesk 3319

markan · 2020-07-24T21:46:06Z

This might be a natural thing to put in the status endpoint or the Prometheus endpoint. A check that resolves the FQDN, pings every critical node we talk to, and logs the status would simplify a lot of debugging.
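A rough sketch of that idea, assuming the checks run inside some periodic or on-demand status handler: each named check is a lambda, and a runner times it and records ok/fail with any error message. `run_checks`, `resolve_check`, `tcp_check`, and the report fields are hypothetical names, not part of Chef Server's status endpoint or its Prometheus metrics. A TCP connect stands in for "ping" here, which is an assumption; it has the side benefit of confirming that the service port is accepting connections.

```ruby
require 'resolv'
require 'socket'

# Hypothetical status runner: time each named check and record ok/fail so
# the resulting hash could be logged or served from a status endpoint.
def run_checks(checks)
  checks.each_with_object({}) do |(name, check), report|
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    entry = { status: 'ok' }
    begin
      check.call
    rescue StandardError => e
      entry = { status: 'fail', error: e.message }
    end
    entry[:ms] = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0).round(1)
    report[name] = entry
  end
end

# Check that the FQDN resolves (via Ruby's Resolv).
def resolve_check(fqdn)
  -> { Resolv.getaddress(fqdn) }
end

# "Ping" a dependency by opening and immediately closing a TCP connection.
def tcp_check(host, port, timeout_s = 2)
  -> { Socket.tcp(host, port, connect_timeout: timeout_s).close }
end

# Example (hosts and ports are placeholders, not Chef Server defaults):
#   report = run_checks(
#     'resolve api_fqdn' => resolve_check('chef.example.com'),
#     'postgresql'       => tcp_check('127.0.0.1', 5432)
#   )
#   report.each { |name, r| puts "#{name}: #{r.inspect}" }
```

Because every check is timed, a slow-but-successful resolution shows up in the `ms` field even when its status is still `ok`.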

sean-horn added the bug label Jan 30, 2015

sean-horn changed the title ~~Ensure api_fqdn Is Resolvable Within Milliseconds~~ Ensure api_fqdn Is Resolvable Within Milliseconds or log Jan 30, 2015

stevendanna added this to the accepted-minor milestone Jun 17, 2015

tas50 added the Type: Bug label and removed the bug label Jan 4, 2019

PrajaktaPurohit added the Status: To be prioritized, Triage: Confirmed, and Type: Enhancement labels and removed the Type: Bug label Jul 24, 2020

stevendanna removed this from the accepted-minor milestone Sep 29, 2020
