-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System Ready #10479
System Ready #10479
Conversation
This pull request introduces 15 alerts when merging 590ee50a9547bf9d9fd88b478aeefdca37e0f571 into b152f2a - view on LGTM.com new alerts:
|
Could you please add comments for each new function/class? |
Could you please fix the LGTM issues? |
Could please re-format files? I see many format issues, for example:
should be
|
Could you please remove unused comments? |
@@ -56,6 +56,9 @@ | |||
"has_global_scope": {% if feature + '.service' in installer_services.split(' ') %}true{% else %}false{% endif %}, | |||
"has_per_asic_scope": {% if feature + '@.service' in installer_services.split(' ') %}true{% else %}false{% endif %}, | |||
"auto_restart": "{{autorestart}}", | |||
{%- if feature in ["bgp", "swss", "pmon"] %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add comment to explain the purpose here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
import multiprocessing | ||
from collections import OrderedDict | ||
import tempfile | ||
from swsssdk import ConfigDBConnector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use ConfigDBConnector in swsscommon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
mpmgr = multiprocessing.Manager() | ||
|
||
#Logger class for syslog | ||
class Logger(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use sonic_py_common.logger.Logger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
#Initalise the syslog infrastructure | ||
logger = Logger(SYSLOG_IDENTIFIER) | ||
|
||
class Dict2Obj(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a special reason to use this class instead of directly using dict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
|
||
def unlock(self): | ||
fcntl.flock(self.f, fcntl.LOCK_UN) | ||
#self.f.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When will the file being closed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed lock as it is not need here.
if self.task_stopping_event.is_set(): | ||
return | ||
self.task_queue.put(msg) | ||
time.sleep(0.001) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this sleep?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
time.sleep(0.001) | ||
|
||
def task_stop(self): | ||
self.task_stopping_event.set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you need stop the event loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when the event loop runs, as ProcessTaskBase in src/sonic-py-common/sonic_py_common/task_base.py has task_stop which actually kills the subprocess if not joined(exited) on time.
def task_stop(self):
# Signal the process to stop
self.task_stopping_event.set()
# Wait for the process to exit
self._task_process.join(self._stop_timeout_secs)
# If the process didn't exit, attempt to kill it
if self._task_process.is_alive():
os.kill(self._task_process.pid, signal.SIGKILL)
if self._task_process.is_alive():
return False
|
||
update_proc_name('sysmonitor') | ||
event = 'SERVICE_EVENT' | ||
sysready_lock.lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the lock for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed lock as it is not needed here.
# Queue to receive the STATEDB and Systemd state change event | ||
while True: | ||
try: | ||
msg = myQ.get(timeout=15) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use constant value instead of magic number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/system-health/scripts/healthd
Outdated
@@ -72,6 +73,8 @@ class HealthDaemon(DaemonBase): | |||
import sonic_platform.platform | |||
chassis = sonic_platform.platform.Platform().get_chassis() | |||
manager = HealthCheckerManager() | |||
sysmon = Sysmonitor() | |||
sysmon.system_service() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sysmon.system_service is a infinite loop, the rest of the code in this function will never run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken care.
could you please add unit test? |
This pull request introduces 4 alerts when merging 9388306a2644f61c903aa5b918f7ba5564af4af0 into 5cd6bc4 - view on LGTM.com new alerts:
|
SYSTEM_READY_TABLE = 'SYSTEM_READY' | ||
SYSTEM_ALLSRV_STATE="DOWN" | ||
SYSREADY_LOCKFILE="/var/run/sysready.lock" | ||
dnsrvs_name_list=[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate the reason to keep it as a list? Query/remove an element from set is faster than list.
|
||
#add the services from the below targets | ||
path= ["/etc/systemd/system/multi-user.target.wants", "/etc/systemd/system/sonic.target.wants"] | ||
for p in path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand.
global dnsrvs_name_list | ||
global allsrvs_dict | ||
global allsrvs_status | ||
global spl_srv_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spl_srv_list not used.
def run_systemctl_show(service): | ||
command = ('systemctl show {} --property=Id,LoadState,UnitFileState,Type,ActiveState,SubState,Result'.format(service)) | ||
output = utils.run_command(command) | ||
srv_properties = output.split('\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use output.splitlines()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, used glob in place of listdir and modified dnsrvs_name from list to set as per suggestion.
|
||
def post_unit_status(st_db, srv_name, srv_status, app_status, fail_reason, update_time): | ||
key = 'ALL_SERVICE_STATUS|{}'.format(srv_name) | ||
st_db.set(st_db.STATE_DB, key, 'service_status', srv_status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you consider using hmset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
This pull request introduces 1 alert when merging ee0818c6f8ddc19ae926e7a128016717bd515f84 into 3fc3259 - view on LGTM.com new alerts:
|
e6f68dd
to
001b7ed
Compare
This pull request introduces 1 alert when merging 001b7ed92156e6ff9099f8a254a0c711ddee122c into e0f5333 - view on LGTM.com new alerts:
|
95e5686
to
9782dbe
Compare
Taken care of all the format issues |
Added comments for all functions and classes |
Have added unit test coverage for the system ready code in test_system_health.py |
/azpw run Azure.sonic-buildimage |
1 similar comment
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/azpw run Azure.sonic-buildimage |
1 similar comment
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/AzurePipelines run Azure.sonic-buildimage |
Commenter does not have sufficient privileges for PR 10479 in repo Azure/sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Commenter does not have sufficient privileges for PR 10479 in repo Azure/sonic-buildimage |
/azp run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
description "This configuration controls the system ready tool to check | ||
the app ready/up status"; | ||
type boolean; | ||
default false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add yang model unit test?
We recently fixed a bug #10986
…anch Related work items: #52, #71, #73, #75, #77, sonic-net#1306, sonic-net#1588, sonic-net#1991, sonic-net#2031, sonic-net#2040, sonic-net#2053, sonic-net#2066, sonic-net#2069, sonic-net#2087, sonic-net#2107, sonic-net#2110, sonic-net#2112, sonic-net#2113, sonic-net#2117, sonic-net#2124, sonic-net#2125, sonic-net#2126, sonic-net#2128, sonic-net#2130, sonic-net#2131, sonic-net#2132, sonic-net#2133, sonic-net#2134, sonic-net#2135, sonic-net#2136, sonic-net#2137, sonic-net#2138, sonic-net#2139, sonic-net#2140, sonic-net#2143, sonic-net#2158, sonic-net#2161, sonic-net#2233, sonic-net#2243, sonic-net#2250, sonic-net#2254, sonic-net#2260, sonic-net#2261, sonic-net#2267, sonic-net#2278, sonic-net#2282, sonic-net#2285, sonic-net#2288, sonic-net#2289, sonic-net#2292, sonic-net#2294, sonic-net#8887, sonic-net#9279, sonic-net#9390, sonic-net#9511, sonic-net#9700, sonic-net#10025, sonic-net#10322, sonic-net#10479, sonic-net#10484, sonic-net#10493, sonic-net#10500, sonic-net#10580, sonic-net#10595, sonic-net#10628, sonic-net#10634, sonic-net#10635, sonic-net#10644, sonic-net#10670, sonic-net#10691, sonic-net#10716, sonic-net#10731, sonic-net#10750, sonic-net#10751, sonic-net#10752, sonic-net#10761, sonic-net#10769, sonic-net#10775, sonic-net#10776, sonic-net#10779, sonic-net#10786, sonic-net#10792, sonic-net#10793, sonic-net#10800, sonic-net#10806, sonic-net#10826, sonic-net#10839, sonic-net#10840, sonic-net#10842, sonic-net#10844, sonic-net#10847, sonic-net#10849, sonic-net#10852, sonic-net#10865, sonic-net#10872, sonic-net#10877, sonic-net#10886, sonic-net#10889, sonic-net#10903, sonic-net#10904, sonic-net#10905, sonic-net#10913, sonic-net#10914, sonic-net#10916, sonic-net#10919, sonic-net#10925, sonic-net#10926, sonic-net#10929, sonic-net#10933, sonic-net#10934, sonic-net#10937, sonic-net#10941, sonic-net#10947, sonic-net#10952, sonic-net#10953, sonic-net#10957, sonic-net#10959, sonic-net#10971, sonic-net#10972, sonic-net#10980
Why I did it
At present, there is no mechanism in an event driven model to know that the system is up with all the essential sonic services and also, all the docker apps are ready along with port ready status to start the network traffic. With the asynchronous architecture of SONiC, we will not be able to verify if the config has been applied all the way down to the HW. But we can get the closest up status of each app and arrive at the system readiness.
How I did it
A new python based system monitor tool is introduced under system-health framework to monitor all the essential system host services including docker wrapper services on an event based model and declare the system is ready. This framework gives provision for docker apps to notify its closest up status. CLIs are provided to fetch the current system status and also service running status and its app ready status along with failure reason if any.
How to verify it
"show system-health sysready-status" click CLI
Syslogs for system ready
Which release branch to backport (provide reason below if selected)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)