[Bug]: test/test_snmp_queue_counters.py and telemetry/test_telemetry.py issue of counter mismatch because of no proper delay after config-reload #15683

Closed
harjotsinghpawra opened this issue Nov 21, 2024 · 1 comment · Fixed by #15688

Comments

@harjotsinghpawra
Contributor

Issue Description

When these scripts run, ports can take some time to come up, depending on the platform, the image, and other factors. Buffer queues are created only after that, and the SNMP OIDs and gNMI data are generated later still.

The script runs snmpwalk as soon as all Docker containers are up. At that point the interfaces may still be down, so no OIDs have been generated yet.
snmpwalk then returns "No Such Instance currently exists at this OID". The script counts that single line as one counter even though none was created, which causes the test case to fail.
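
The miscount described above can be illustrated with a minimal sketch. It is not the code from the test script: `count_queue_counters` and `NO_DATA_MARKERS` are hypothetical names, and the list of markers is an assumption based on the messages net-snmp commonly prints when an OID has no data.

```python
# Hypothetical helper, not taken from test_snmp_queue_counters.py.
# Assumption: these substrings mark snmpwalk lines that carry no counter value.
NO_DATA_MARKERS = (
    "No Such Instance",
    "No Such Object",
    "No more variables",
)


def count_queue_counters(stdout_lines):
    """Count only the snmpwalk output lines that report a real value."""
    return sum(
        1
        for line in stdout_lines
        if line.strip() and not any(marker in line for marker in NO_DATA_MARKERS)
    )


if __name__ == "__main__":
    not_ready = [
        "iso.3.6.1.4.1.9.9.580.1.5.5.1.4.1 = No Such Instance currently exists at this OID"
    ]
    # len(not_ready) is 1, which matches the misleading queue_counters_cnt_pre = 1.
    print(len(not_ready), count_queue_counters(not_ready))
```

Counting this way reports zero counters while the OIDs are still missing, so the pre-reload and post-reload counts are not both silently set to 1.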

Results you see

Ethernet0_queue_cntrs_oid = '1.3.6.1.4.1.9.9.580.1.5.5.1.4.1'
creds_all_duts = {'mth64-m5-2': {'ansible_altpasswords': [], 'ansible_become_pass': 'roZes@123', 'ansible_ssh_pass': 'roZes@123', 'ansible_ssh_user': 'admin', ...}}
data = {'ACL_TABLE': {'DATAACL': {'policy_desc': 'DATAACL', 'ports': ['PortChannel101', 'PortChannel102', 'PortChannel103', '...'enabled'}, 'eventd': {'available_mem_threshold': '10.0', 'rate_limit_interval': '600', 'state': 'enabled'}, ...}, ...}
duthost = <MultiAsicSonicHost mth64-m5-2>
duthosts = [<MultiAsicSonicHost mth64-m5-2>]
enum_rand_one_per_hwsku_frontend_hostname = 'mth64-m5-2'
get_bfr_queue_cntrs_cmd = 'docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1'
hostip = '1.74.23.17'
multicast_expected_diff = 16
queue_counters_cnt_post = 1
queue_counters_cnt_pre = 1
unicast_expected_diff = 8

["docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1"], kwargs={}
12:37:54 base._run L0108 DEBUG | /data/tests/common/devices/multi_asic.py::_run_on_asics#134: [mth64-m5-2] AnsibleModule::shell Result => {"changed": true, "stdout": "iso.3.6.1.4.1.9.9.580.1.5.5.1.4.1 = No Such Instance currently exists at this OID", "stderr": "", "rc": 0, "cmd": "docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1", "start": "2024-08-28 12:37:55.343677", "end": "2024-08-28 12:37:55.452104", "delta": "0:00:00.108427", "msg": "", "invocation": {"module_args": {"_raw_params": "docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1", "_uses_shell": true, "warn": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}, "stdout_lines": ["iso.3.6.1.4.1.9.9.580.1.5.5.1.4.1 = No Such Instance currently exists at this OID"], "stderr_lines": [], "_ansible_no_log": null, "failed": false}

Results you expected to see

The counter counts should match. After a config reload, the test should wait, within a reasonable timeout, for all interfaces to come up and for all OIDs to be generated.

Is it platform specific

generic

Relevant log output

No response

Output of show version

No response

Attach files (if any)

No response

@harjotsinghpawra
Contributor Author

Working on the fix.

@harjotsinghpawra harjotsinghpawra changed the title [Bug]: test/test_snmp_queue_counters.py and telemetry/test_telemetry.py issue of counter mismatch because of no proper delay after conf-g reload [Bug]: test/test_snmp_queue_counters.py and telemetry/test_telemetry.py issue of counter mismatch because of no proper delay after config-reload Nov 22, 2024
yejianquan pushed a commit that referenced this issue Nov 22, 2024
…walk output time delay fix, test_snmp_queue_counters.py multi-asic KeyError fix (#15688)

test_snmp_queue_counters.py/test_telemetry.py config_reload and snmpwalk output time delay fix, test_snmp_queue_counters.py multi-asic KeyError fix

Description of PR
Scripts:
test_snmp_queue_counters.py
test_telemetry.py

/////////////////////////////////////////////////
First Issue :
When these scripts run, ports can take some time to come up, depending on the platform, the image, and other factors. Buffer queues are created only after that, and the SNMP OIDs and gNMI data are generated later still.

The script runs snmpwalk as soon as all Docker containers are up. At that point the interfaces may still be down, so no OIDs have been generated yet.
snmpwalk then returns "No Such Instance currently exists at this OID". The script counts that single line as one counter even though none was created, which causes the test case to fail.

enum_rand_one_per_hwsku_frontend_hostname = 'mth64-m5-2'
get_bfr_queue_cntrs_cmd = 'docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1'
hostip = '1.74.23.17'
multicast_expected_diff = 16
queue_counters_cnt_post = 1
queue_counters_cnt_pre = 1
unicast_expected_diff = 8

["docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1"], kwargs={}
12:37:54 base._run L0108 DEBUG | /data/tests/common/devices/multi_asic.py::_run_on_asics#134: [mth64-m5-2] AnsibleModule::shell Result => {"changed": true, "stdout": "iso.3.6.1.4.1.9.9.580.1.5.5.1.4.1 = No Such Instance currently exists at this OID", "stderr": "", "rc": 0, "cmd": "docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1", "start": "2024-08-28 12:37:55.343677", "end": "2024-08-28 12:37:55.452104", "delta": "0:00:00.108427", "msg": "", "invocation": {"module_args": {"_raw_params": "docker exec snmp snmpwalk -v2c -c public 1.74.23.17 1.3.6.1.4.1.9.9.580.1.5.5.1.4.1", "_uses_shell": true, "warn": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}, "stdout_lines": ["iso.3.6.1.4.1.9.9.580.1.5.5.1.4.1 = No Such Instance currently exists at this OID"], "stderr_lines": [], "_ansible_no_log": null, "failed": false}

//////////////////////////////////////////////////
Second issue :
In the test_snmp_queue_counters script, in the multi-asic case, we pick the buffer queue of the first interface listed in the BUFFER_QUEUE config and then try to match it. We also search for asic.namespace in the queue name, which is an invalid check and leaves buffer_queue_to_del as None.

This in turn fails the test case with KeyError: None when we try to delete the buffer queue:
result = testfunction(**testargs)
File "/var/src/sonic-mgmt/tests/snmp/test_snmp_queue_counters.py", line 123, in test_snmp_queue_counters
del data['BUFFER_QUEUE'][buffer_queue_to_del]
KeyError: None
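
One way to avoid the `KeyError: None` is to fail with an explicit message when no queue matches. The sketch below is illustrative and is not the fix merged in #15688. The helper names are hypothetical, and it assumes BUFFER_QUEUE keys have the form `<port>|<queue range>` (for example `Ethernet0|0-2`), which may differ on some images.

```python
# Hypothetical helpers, not the logic merged in the referenced commit.
# Assumption: BUFFER_QUEUE keys look like "<port>|<queue range>".
def pick_buffer_queue_to_del(buffer_queue_table, port_name):
    """Return the first BUFFER_QUEUE key that belongs to port_name, or None."""
    prefix = port_name + "|"
    for key in buffer_queue_table:
        if key.startswith(prefix):
            return key
    return None


def delete_buffer_queue(data, port_name):
    """Delete one buffer queue of port_name and return the deleted key."""
    key = pick_buffer_queue_to_del(data.get("BUFFER_QUEUE", {}), port_name)
    if key is None:
        # Fail with a readable message instead of "KeyError: None".
        raise ValueError("no BUFFER_QUEUE entry found for port " + port_name)
    del data["BUFFER_QUEUE"][key]
    return key


if __name__ == "__main__":
    config = {"BUFFER_QUEUE": {"Ethernet0|0-2": {}, "Ethernet4|3-4": {}}}
    print(delete_buffer_queue(config, "Ethernet4"))
```

Matching on the port prefix with the `|` separator keeps `Ethernet4` from matching `Ethernet40`, and the explicit check makes a missing entry visible at the point of selection.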

Summary:
Fixes #15683 and #15686

Approach
What is the motivation for this PR?
How did you do it?
1.) Added the necessary checks so that command output is taken only after all interfaces are up and the OIDs are generated.
2.) Fixed the wrong multi-asic buffer queue selection logic and improved it to work on both single-asic and multi-asic systems.
3.) Added an extra check that matches the OIDs of the counters generated by SNMP against the queuestat output. They should match because queuestat gives the latest information.
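
The waiting described in step 1 can be sketched as a bounded polling loop. This is a stand-alone illustration, not the code from the PR: `wait_for` is a hypothetical helper, the timeout and interval defaults are arbitrary, and the readiness check passed to it (interfaces up, OIDs present) is left to the caller.

```python
import time


# Hypothetical stand-alone helper; the defaults are arbitrary assumptions.
def wait_for(condition, timeout=120, interval=5, sleep=time.sleep, clock=time.monotonic):
    """Poll condition() until it returns True or timeout seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    attempts = {"n": 0}

    def oids_ready():
        # Stand-in for a real check such as "snmpwalk returned at least one counter".
        attempts["n"] += 1
        return attempts["n"] >= 3

    print(wait_for(oids_ready, timeout=10, interval=0.1))
```

Polling with a deadline, rather than a fixed sleep, lets fast platforms proceed immediately while still giving slower platforms a bounded amount of time.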

How did you verify/test it?
Ran it on local Cisco platforms, and it passes.

co-authorized by: [email protected]