Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear fabric counters queue/port #2892

Merged

Conversation

kenneth-arista
Copy link
Contributor

@kenneth-arista kenneth-arista commented Jul 5, 2023

What I did

This PR is a clone of #2789.

Added two clear commands for fabric counters queue and fabric counters port.

sonic-clear fabriccountersport
sonic-clear fabriccountersqueue

Fabric counters are cleared and saved with these two commands. For example,

# show fabric counters port
  ASIC    PORT    STATE    IN_CELL    IN_OCTET    OUT_CELL    OUT_OCTET    CRC    FEC_CORRECTABLE    FEC_UNCORRECTABLE    SYMBOL_ERR
------  ------  -------  ---------  ----------  ----------  -----------  -----  -----------------  -------------------  ----------
...
     0      49       up          1         244           0            0       0                  2        2,372,752,496               0
     0      50       up          2         315           1          135       0                  4        2,522,457,120               4
...
# sonic-clear fabriccountersport
Clear and update saved counters port
# show fabric counters port
  ASIC    PORT    STATE    IN_CELL    IN_OCTET    OUT_CELL    OUT_OCTET    CRC    FEC_CORRECTABLE    FEC_UNCORRECTABLE    SYMBOL_ERR
------  ------  -------  ---------  ----------  ----------  -----------  -----  -----------------  -------------------  ------------
...
    0      49       up          0           0           0            0      0                  0                    0             0
    0      50       up          0           0           0            0      0                  0                    0             0

@kenneth-arista
Copy link
Contributor Author

@jfeng-arista

@kenneth-arista
Copy link
Contributor Author

@arlakshm

@kenneth-arista
Copy link
Contributor Author

@gechiang

This PR is a clone of
sonic-net#2789.

Added two clear commands for fabric counters queue and fabric counters
port.

sonic-clear fabriccountersport
sonic-clear fabriccountersqueue
@kenneth-arista kenneth-arista force-pushed the master-clear-fabric-counters branch from 60468bc to fa00dd6 Compare July 17, 2023 20:13
@kenneth-arista
Copy link
Contributor Author

/azpw run Azure.sonic-utilities

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-utilities

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kenneth-arista
Copy link
Contributor Author

The test_clear_tag failure doesn't appear to be related to this change. The error is as follows:

E       assert '          IFACE    RX_OK       RX_BPS    RX_PPS    RX_ERR    TX_OK      TX_BPS    TX_PPS    TX_ERR\n---------------  ...0.00/s         0\n       Vlan1000        0     0.00 B/s    0.00/s         0        0    0.00 B/s    0.00/s         0\n' in "\nFile '/tmp/cache/intfstat/1001-test/intfstat' does not exist\nDid you run 'intfstat -c -t test' to record the counters via tag test?\n\n"

@jfeng-arista
Copy link
Contributor

/azpw run Azure.sonic-utilities

@jfeng-arista
Copy link
Contributor

/AzurePipelines run Azure.sonic-utilities

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 2892 in repo sonic-net/sonic-utilities

@gechiang
Copy link
Contributor

Restarted failed job

@gechiang
Copy link
Contributor

@qiluo-msft , many PRs are all failing on this test:

2023-07-19T16:42:39.2354382Z tests/intfstat_test.py::TestIntfstat::test_clear_tag FAILED [ 40%]

Is this something broken due to some recent change that went in? How do we unblock this PR where this failure is not related to the PR changes?

Thanks!

@gechiang
Copy link
Contributor

/easyCLA

@gechiang
Copy link
Contributor

@kenneth-arista , can you update your branch to fix the outdated branch issue?
Thanks!

@kenneth-arista
Copy link
Contributor Author

Looks like @qiluo-msft already merged master into the branch. Thanks!

@kenneth-arista
Copy link
Contributor Author

@StormLiangMS this PR supersedes #2789 and should be backported to 202305

arlakshm
arlakshm previously approved these changes Aug 9, 2023
@arlakshm
Copy link
Contributor

arlakshm commented Aug 9, 2023

/Azp run Azure.sonic-utilities

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

gechiang
gechiang previously approved these changes Aug 9, 2023
Copy link
Contributor

@gechiang gechiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@gechiang gechiang requested a review from qiluo-msft August 9, 2023 00:42
@gechiang
Copy link
Contributor

gechiang commented Aug 9, 2023

@qiluo-msft , can you help review this PR?
Thanks!

cnstat_fqn_file_port_name = cnstat_fqn_file_port + asic_name
json.dump(cnstat_dict, open(cnstat_fqn_file_port_name, 'w'), default=json_serial)
except IOError as e:
print(e.errno, e)
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print

print to stderr. #WontFix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it, just that I see other commands print the same IOError like this, so I was trying to get the same style as those. Please let me know if we still want to print to stderr or like this. thank you

try:
cnstat_cached_dict = json.load(open(cnstat_fqn_file_port_name, 'r'))
except IOError as e:
print(e.errno, e)
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print

print to stderr. #WontFix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it, just that I see other commands print the same IOError like this, so I was trying to get the same style as those. Please let me know if we still want to print to stderr or like this. thank you

if os.path.isfile(cnstat_fqn_file_port_name):
try:
cnstat_cached_dict = json.load(open(cnstat_fqn_file_port_name, 'r'))
except IOError as e:
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except

When exception is caught, do you want to exit earlier? #WontFix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the file not exist, I think we can still print out the counters ( with no cleaned ) , instead of print out nothing .
I read the clean code for other counters ( not exactly the same code piece as what I wrote ), but when they don't have the cached file, it still print out the uncleared counters. Please suggest if this behavior is ok or you want to print out nothing in this case. thank you

def save_fresh_stats(self):
# Get stat for each port and save
counter_port_name_map = self.db.get_all(self.db.COUNTERS_DB, COUNTERS_FABRIC_PORT_NAME_MAP)
if counter_port_name_map is None:
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

counter_port_name_map

not used. is it needed? if yes, do you want to print some info before return? #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

printed out a message before the return

if os.path.isfile(cnstat_fqn_file_queue_name):
try:
cnstat_cached_dict = json.load(open(cnstat_fqn_file_queue_name, 'r'))
except IOError as e:
Copy link
Contributor

@qiluo-msft qiluo-msft Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except

do you want to return earlier if exception caught? #WontFix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the file not exist, I think we can still print out the counters ( with no cleaned ) , instead of print out nothing .
I read the clean code for other counters ( not exactly the same code piece as what I wrote ), but when they don't have the cached file, it still print out the uncleared counters. Please suggest if this behavior is ok or you want to print out nothing in this case. thank you

@jfeng-arista jfeng-arista dismissed stale reviews from gechiang and arlakshm via 95ad6e3 August 15, 2023 21:56
@rlhui rlhui added the p0 label Aug 18, 2023
Copy link
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please also check with other active reviewers.

@jfeng-arista
Copy link
Contributor

The change looks good to me

@arlakshm arlakshm merged commit 7f8779d into sonic-net:master Aug 19, 2023
@kenneth-arista kenneth-arista deleted the master-clear-fabric-counters branch August 22, 2023 21:33
@rlhui
Copy link
Contributor

rlhui commented Aug 23, 2023

@arlakshm please help create a msft ado if it helps to speed up cherry-pick, thx

@gechiang gechiang added the included in chassis for 202205 branch indicate that this PR got merged into the "chassis for 202205 branch" label Aug 23, 2023
@gechiang
Copy link
Contributor

gechiang commented Aug 27, 2023

@kenneth-arista , This PR when cherry-picked to the msft repo it is causing build issue as described by @mlok-nokia
this is causing build issues where PR is failing on
tests/fabricstat_test.py::TestFarbicStat::test_single_clear_fabric_counter
tests/fabricstat_test.py::TestMultiAsicFabricStat::test_multi_show_fabric_counters_queue_clear

I have reverted this PR from the msft repo and removing the "included in chassis for 202205 branch".
@kenneth-arista , please help investigate why this is an issue... you can try re-include this PR on the "https://github.com/Azure/sonic-buildimage-msft/tree/202205" repo and see the build hitting PR test failure issue.
Once this PR is reverted, then it works fine.

@gechiang gechiang removed the included in chassis for 202205 branch indicate that this PR got merged into the "chassis for 202205 branch" label Aug 27, 2023
@gechiang
Copy link
Contributor

@arlakshm , please help follow up with Arista team (@kenneth-arista) on the issue that this PR is causing on the azure/sonic-buildimage-msft 202205 branch. I have reverted this Change from the sonic-utilities.msft 202205 repo so the most current snapshot on "azure/sonic-buildimage-msft 202205 branch" no longer has this problem PR. BUt we need this be investigated and fix so we can bring back this fix to "azure/spmoc-utilities.msft 202205" and "azure/sonic-buildimage-msft 202205" repos.

@jfeng-arista
Copy link
Contributor

I can work on this and create a PR of clear fabric counters in 202205 branch. Please let me know which branches should I work on . thank you

@gechiang
Copy link
Contributor

I can work on this and create a PR of clear fabric counters in 202205 branch. Please let me know which branches should I work on . thank you

Can you raise a PR in this repo:
https://github.com/Azure/sonic-utilities.msft/tree/202205

And make sure you test it (build it) under this repo:
https://github.com/Azure/sonic-buildimage-msft/tree/202205

If it addresses the build PR test failure issues, then it is ready.
Appreciate your extra effort @jfeng-arista
Thanks!

@jfeng-arista
Copy link
Contributor

got it . thank you !

@jfeng-arista
Copy link
Contributor

@gechiang gechiang added the included in chassis for 202205 branch indicate that this PR got merged into the "chassis for 202205 branch" label Sep 30, 2023
@gechiang
Copy link
Contributor

Adding back the label to indicate it got merged in MSFT repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Chassis for 202205 Branch included in chassis for 202205 branch indicate that this PR got merged into the "chassis for 202205 branch" p0
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants