-
Notifications
You must be signed in to change notification settings - Fork 670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947) #2349
Changes from 1 commit
3134557
716fa37
8f91400
b1e41e8
88c9af4
1c9c0f4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1112,6 +1112,51 @@ def run_firmware(port_name, mode): | |
|
||
return status | ||
|
||
def is_fw_switch_done(port_name): | ||
""" | ||
Make sure the run_firmware cmd is done | ||
@port_name: | ||
Returns 1 on success, and exit_code = -1 on failure | ||
""" | ||
status = 0 | ||
physical_port = logical_port_to_physical_port_index(port_name) | ||
sfp = platform_chassis.get_sfp(physical_port) | ||
|
||
try: | ||
api = sfp.get_xcvr_api() | ||
except NotImplementedError: | ||
click.echo("This functionality is currently not implemented for this platform") | ||
sys.exit(ERROR_NOT_IMPLEMENTED) | ||
|
||
try: | ||
MAX_WAIT = 600 # 60s timeout. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we read from the module what is advertised instead of hardcoding here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @CliveNi why 60 secs? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For more compatible with different module, timeout time should be the worst case. 60s is calculated for different implementation.
Case 1 and 3 will spend the most time to switch image. It will need to copy one image to non-volatile buffer, then interchange two images with each other. That will need 3 cycles erasing and programming on non-volatile memory. We can roughly assume all of fw component in module around 1MBytes. A common M0~M4 MCU needs around 40ms per page(2KBytes) for non-volatile erasing and programming, so that the calculation as below. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just curious, if a firmware size is >> 1MB (e.g. close to 2 MB), does that mean 60 sec wait time would not be enough? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Exactly, 1MB is a roughly number, but I think the margin is enough for different DSP solutions now. Actually, it only uses around 500KB totally in our 400/800G implementation. |
||
fw_info = api.get_module_fw_info() | ||
cnt = 0 | ||
is_busy = 1 if (fw_info['status'] == False) and (fw_info['result'] is not None) else 0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you show examples of fw_info 1) when a module is busy and 2) when a module shows faulty status? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "fw_info" is returned by "get_module_fw_info()" which is in cmis.py of "sonic-platform-common". It will return 3 combination patterns of status and result as below.
The unit test function "test_is_fw_switch_done" in sfputil_test.py will also test all these cases. |
||
while is_busy and cnt < MAX_WAIT: | ||
time.sleep(0.1) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. May not need to check this frequently. It also brings a lot of workload to the module. Checking every 1 or 2 seconds should be OK without causing too much time delay. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree, already revised to check every 2 seconds. |
||
fw_info = api.get_module_fw_info() | ||
is_busy = 1 if (fw_info['status'] == False) and (fw_info['result'] is not None) else 0 | ||
cnt += 1 | ||
|
||
if fw_info['status'] == True: | ||
(ImageA, ImageARunning, ImageACommitted, ImageAValid, | ||
ImageB, ImageBRunning, ImageBCommitted, ImageBValid) = fw_info['result'] | ||
|
||
if (ImageARunning == 1) and (ImageACommitted == 0): # ImageA is running, but not committed. | ||
status = 1 # run_firmware is done. | ||
elif (ImageBRunning == 1) and (ImageBCommitted == 0): # ImageB is running, but not committed. | ||
status = 1 # run_firmware is done. | ||
else: # No image is running, or running and committed image is same. | ||
status = -1 # Failure for Switching images. | ||
else: | ||
status = -1 # Timeout or check code error or CDB not supported. | ||
|
||
except NotImplementedError: | ||
click.echo("This functionality is not applicable for this transceiver") | ||
|
||
return status | ||
|
||
def commit_firmware(port_name): | ||
status = 0 | ||
physical_port = logical_port_to_physical_port_index(port_name) | ||
|
@@ -1280,6 +1325,13 @@ def upgrade(port_name, filepath): | |
|
||
click.echo("Firmware run in mode 1 successful") | ||
|
||
status = is_fw_switch_done(port_name) | ||
if status != 1: | ||
click.echo('Failed to switch firmware images!') | ||
sys.exit(EXIT_FAIL) | ||
|
||
click.echo("Firmware images switch successful") | ||
|
||
status = commit_firmware(port_name) | ||
if status != 1: | ||
click.echo('Failed to commit firmware! CDB status: {}'.format(status)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why this API is required because there is already a return status from run_firmware() to indicate a failure. An imageA could be running while the committed Image could be Image B. For eg.
Now the running image is B but the committed image is A
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"run_firmware" indicates the status which read before reset to target image. This status only means module received and done the judgement for "run_firmware" cmd, but not means module already jumped into target image.
"run_firmware" will maybe involved non-volatile memory operation in module according to module implementation, so that it will maybe spend several or tens of seconds. That will cause commit failed if sends "commit_firmware" once "run_firmware" done without "is_fw_switch_done".