telemetry process can not start if SONiC is installed from ONIE in 202012 #12499

Closed
stephenxs opened this issue Oct 26, 2022 · 5 comments
Labels: Issue for 202211, MSFT, P1 (priority of the issue, lower than P0), Triaged

Comments

@stephenxs
Collaborator

Description

Steps to reproduce the issue:

  1. Install a 202012 based image from ONIE
  2. Boot the image

Describe the results you received:

The telemetry process does not start and reports the following error:

could not load server key pair: open /etc/sonic/telemetry/streamingtelemetryserver.cer: no such file or directory

Describe the results you expected:

The telemetry process should start without error.

Output of show version:

Built by: sw-r2d2-bot@r-build-sonic-ci03-243

Platform: x86_64-mlnx_msn2700_simx-r0
HwSKU: ACS-MSN2700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1623X09522
Model Number: MSN2700-CS2FO
Hardware Revision: N/A
Uptime: 04:48:57 up 5 days, 11:28,  1 user,  load average: 0.30, 0.26, 0.27
Date: Wed 26 Oct 2022 04:48:57

Docker images:
REPOSITORY                                         TAG                            IMAGE ID       SIZE
docker-orchagent                                   202205.45-1c14e7185_Internal   0b8792c287ab   478MB
docker-orchagent                                   latest                         0b8792c287ab   478MB
docker-fpm-frr                                     202205.45-1c14e7185_Internal   f001b423d6b9   489MB
docker-fpm-frr                                     latest                         f001b423d6b9   489MB
docker-platform-monitor                            202205.45-1c14e7185_Internal   adb6b0a0ddb7   862MB
docker-platform-monitor                            latest                         adb6b0a0ddb7   862MB
docker-teamd                                       202205.45-1c14e7185_Internal   3a997a582743   459MB
docker-teamd                                       latest                         3a997a582743   459MB
docker-macsec                                      latest                         a8456f85afb9   461MB
docker-syncd-mlnx                                  202205.45-1c14e7185_Internal   0f05778fecc9   859MB
docker-syncd-mlnx                                  latest                         0f05778fecc9   859MB
docker-snmp                                        202205.45-1c14e7185_Internal   c528c18404fa   488MB
docker-snmp                                        latest                         c528c18404fa   488MB
docker-dhcp-relay                                  latest                         2050939b9fcb   453MB
docker-sonic-telemetry                             202205.45-1c14e7185_Internal   d1904a2ec9cf   524MB
docker-sonic-telemetry                             latest                         d1904a2ec9cf   524MB
docker-lldp                                        202205.45-1c14e7185_Internal   51dd67161dbf   485MB
docker-lldp                                        latest                         51dd67161dbf   485MB
docker-database                                    202205.45-1c14e7185_Internal   e430ac8ecce6   443MB
docker-database                                    latest                         e430ac8ecce6   443MB
docker-mux                                         202205.45-1c14e7185_Internal   f0eb633f46db   492MB
docker-mux                                         latest                         f0eb633f46db   492MB
docker-router-advertiser                           202205.45-1c14e7185_Internal   ac0e9ce0f056   443MB
docker-router-advertiser                           latest                         ac0e9ce0f056   443MB
docker-nat                                         202205.45-1c14e7185_Internal   f2b951758ed8   444MB
docker-nat                                         latest                         f2b951758ed8   444MB
docker-sflow                                       202205.45-1c14e7185_Internal   de7b746195ad   442MB
docker-sflow                                       latest                         de7b746195ad   442MB
docker-sonic-mgmt-framework                        202205.45-1c14e7185_Internal   05e46479966b   571MB
docker-sonic-mgmt-framework                        latest                         05e46479966b   571MB
urm.nvidia.com/sw-nbu-sws-sonic-docker/sonic-wjh   1.3.0-202205-22                8acaae084720   643MB

Output of show techsupport:

[sysdump_sonic_dump_r-panther-13_20220929_003540.tar.gz](https://github.com/sonic-net/sonic-buildimage/files/9865404/sysdump_sonic_dump_r-panther-13_20220929_003540.tar.gz)

Additional information you deem important (e.g. issue happens only occasionally):

This is related to PR sonic-net/sonic-utilities#2277, in which db_migrator forcibly inserts the entry TELEMETRY|certs into CONFIG_DB. db_migrator is also invoked during configuration initialization, so the entry is inserted on a fresh ONIE install as well. The certificates, however, are not generated automatically. As a result, the telemetry process cannot start.
I just want to double-check whether this is the expected behavior.
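
The mismatch described above (a TELEMETRY|certs entry in CONFIG_DB that points at files nobody generated) can be checked with a minimal sketch like the one below. It is an illustration only, not code from db_migrator or the telemetry server. The field names `server_crt`, `server_key` and `ca_crt`, and every path except the `.cer` file quoted in the error message, are assumptions; `missing_cert_files` is a hypothetical helper.

```python
import os
from typing import Callable, Dict, List

# Assumed shape of the TELEMETRY|certs entry. Only the server_crt path is
# taken from the error message in this issue; the field names and the other
# two paths are assumptions made for illustration.
EXAMPLE_CERTS_ENTRY: Dict[str, str] = {
    "server_crt": "/etc/sonic/telemetry/streamingtelemetryserver.cer",
    "server_key": "/etc/sonic/telemetry/streamingtelemetryserver.key",
    "ca_crt": "/etc/sonic/telemetry/dsmsroot.cer",
}


def missing_cert_files(
    certs_entry: Dict[str, str],
    exists: Callable[[str], bool] = os.path.isfile,
) -> List[str]:
    """Return the configured paths that do not exist, ordered by field name."""
    return [path for _, path in sorted(certs_entry.items()) if not exists(path)]


if __name__ == "__main__":
    missing = missing_cert_files(EXAMPLE_CERTS_ENTRY)
    if missing:
        print("TELEMETRY|certs is configured, but these files are missing:")
        for path in missing:
            print("  " + path)
    else:
        print("All configured certificate files are present.")
```

The `exists` parameter is there only so the check can be exercised without touching the filesystem.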

@stephenxs changed the title from "telemetry process can not start if SONiC is installed from ONIE" to "telemetry process can not start if SONiC is installed from ONIE in 202012" on Oct 26, 2022
@judyjoseph added the Triaged (this issue has been triaged), MSFT, and P1 (priority of the issue, lower than P0) labels on Nov 23, 2022
@ayurkiv-nvda
Contributor

ayurkiv-nvda commented May 15, 2023

If there is no telemetry/ folder under /etc/sonic/, it means that the image was not deployed after installation and the certificates were not generated (probably because configuration steps were missed).

If there are no certificates, the image behaves as follows:

  • the telemetry docker crashes a few times, and 30 seconds after each crash it tries to restart
  • after a few crashes, the telemetry docker stops the restart cycle and remains in the DOWN state

If the telemetry/ folder exists, there is no problem at all.
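
The restart cycle described in this comment can be modelled with a small simulation. This is a sketch of the described behavior, not the actual service supervision logic. The 30-second delay comes from the comment; `max_crashes=3` is a placeholder for "a few", and `simulate_restarts` is a hypothetical name.

```python
from typing import Callable, List, Tuple


def simulate_restarts(
    start_ok: Callable[[], bool],
    max_crashes: int = 3,        # placeholder for "a few crashes" (assumption)
    restart_delay_s: int = 30,   # delay between a crash and the next attempt
) -> Tuple[str, List[int]]:
    """Return the final state and the simulated time (s) of each start attempt."""
    attempts: List[int] = []
    now = 0
    crashes = 0
    while True:
        attempts.append(now)
        if start_ok():
            # The process found its certificates and stays up.
            return "UP", attempts
        crashes += 1
        if crashes >= max_crashes:
            # The restart cycle gives up and the container stays down.
            return "DOWN", attempts
        now += restart_delay_s


if __name__ == "__main__":
    # Certificates never appear: every start attempt fails.
    print(simulate_restarts(lambda: False))
```

If the certificates are put in place between two attempts, the next attempt succeeds and the simulated container ends in the UP state.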

@ayurkiv-nvda
Contributor

Hello @vaibhavhd
Did you have a chance to take a look?

@vaibhavhd
Contributor

There has been an improvement to migrating telemetry (and other tables): sonic-net/sonic-utilities#2887

As part of the improvement, tables and their data are not hardcoded in db_migrator. Instead, the migrator calls the minigraph parser, which uses minigraph.xml.
So:
  • If SONiC is installed from ONIE and minigraph.xml is missing, the TELEMETRY config will not be loaded.
  • If SONiC is installed from ONIE and minigraph.xml is present, the TELEMETRY config will be loaded, because the minigraph parser generates this config. In this scenario, if the config is migrated but the certs are not present, the telemetry process is expected to crash until the user provides the certs.
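
The two scenarios above can be summarized as a small decision function. This is only a restatement of the comment as code, not logic taken from db_migrator. The function name and the returned strings are made up for illustration, and the last branch (config loaded and certs present) is inferred from the thread rather than stated explicitly.

```python
def expected_telemetry_outcome(minigraph_present: bool, certs_present: bool) -> str:
    """Map the ONIE-install scenarios from this thread to the expected outcome."""
    if not minigraph_present:
        # Without minigraph.xml the migrator has nothing to generate the config from.
        return "TELEMETRY config not loaded"
    if not certs_present:
        # Config was generated by the minigraph parser, but the files it points to are absent.
        return "TELEMETRY config loaded; telemetry crashes until certs are provided"
    # Inferred case: config loaded and the certificate files exist.
    return "TELEMETRY config loaded; telemetry can load its certs"


if __name__ == "__main__":
    for minigraph in (False, True):
        for certs in (False, True):
            print(minigraph, certs, "->", expected_telemetry_outcome(minigraph, certs))
```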

Let me know if this answers your concerns.

@volodymyrsamotiy
Collaborator

@stephenxs, according to the comments above, this is expected behavior. Could you please check and help to close the ticket if you agree?

@stephenxs
Collaborator Author

> @stephenxs, according to the comments above, this is expected behavior. Could you please check and help to close the ticket if you agree?

Yes, I think the issue can be closed with the solution.
