Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return Grpc.Internal error once there's socker read timeout issue occurs in streaming subscribe under routing flap case #110

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

FengPan-Frank
Copy link
Collaborator

@FengPan-Frank FengPan-Frank commented May 10, 2023

Currently streaming telemetry subscribe will hit some error under routing flap case if it subscribes APPL_DB ROUTE_TABLE, this change will return explicit error code to client without any handling.

pubsub notification to avoid inconsistency introduced by other concurrent cmd

Why I did it

When there's DB table flapping happens, like APPL_DB ROUTE_Table, current gnmi server has some performance issue which will result in below error
"read unix @->/var/run/redis/redis.sock: i/o timeout"

We should need to fix this systemically. Now the fix is to return explicit error code as Internal error to client, and later we will do some optimization, and add some retry strategy into the redis function call.

How I did it

Return Grpc.Internal error to the client

How to verify it

verified via gnmi_cli, meanwhile some tests which introduces bgp flapping.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@FengPan-Frank FengPan-Frank force-pushed the fenpan_pubsubfix branch 2 times, most recently from fc40d5d to cdfda65 Compare May 26, 2023 09:34
@FengPan-Frank FengPan-Frank changed the title Use transaction in db_client pubsub notification to avoid inconsistency introduced by other concurrent cmd Update db_client pubsub query condition to avoid inconsistency introduced by other concurrent cmd May 31, 2023
@FengPan-Frank FengPan-Frank changed the title Update db_client pubsub query condition to avoid inconsistency introduced by other concurrent cmd Update db_client pubsub query condition to avoid inconsistency introduced by other concurrent command May 31, 2023
@tomek-US
Copy link
Contributor

tomek-US commented Jun 1, 2023

Please rebase and add test coverage to unblock the merge.

@FengPan-Frank FengPan-Frank force-pushed the fenpan_pubsubfix branch 8 times, most recently from 01f6473 to 3a7c780 Compare June 21, 2023 07:35
@FengPan-Frank FengPan-Frank changed the title Update db_client pubsub query condition to avoid inconsistency introduced by other concurrent command Return Grpc.Internal error once there's socker read timeout issue occurs in streaming subscribe under routing flap case Jun 27, 2023
@FengPan-Frank FengPan-Frank force-pushed the fenpan_pubsubfix branch 2 times, most recently from 359fe0b to c45e54a Compare June 27, 2023 10:52
**Why I did it**
[Semgrep](https://github.com/returntocorp/semgrep) is a static analysis tool to find security vulnerabilities.
When opening a PR or commtting to PR, Semgrep performs a diff-aware scanning, which scans changed files in PRs.
When merging PR, Semgrep performs a full scan on master branch and report all findings.

Ref: - [Supported Language](https://semgrep.dev/docs/supported-languages/#language-maturity) - [Semgrep Rules](https://registry.semgrep.dev/rule)

**How I did it**
Integrate Semgrep into this repository by committing a job configuration file
@FengPan-Frank FengPan-Frank force-pushed the fenpan_pubsubfix branch 2 times, most recently from bfa44cb to 1475110 Compare June 27, 2023 11:19
@FengPan-Frank
Copy link
Collaborator Author

Please rebase and add test coverage to unblock the merge.

Done.

MockFail++
fmt.Printf("mock sleep for redis timeout\n")
time.Sleep(30 * time.Second)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use gomonkey to replace MockFail?

@@ -207,6 +208,10 @@ func (c *Client) Run(stream gnmipb.GNMI_SubscribeServer) (err error) {
c.Close()
// Wait until all child go routines exited
c.w.Wait()
if strings.Contains(err.Error(), "i/o timeout") {
Copy link
Collaborator

@qiluo-msft qiluo-msft Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contains

Match error message string will be fragile? Which library assign this error message? Could you use error code? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but seems this will require NOT small changes, like now we're using priorityQueue in all db_client which only stores error messages, refer

func putFatalMsg(q *queue.PriorityQueue, msg string) {
, do you suggest we polish this part now or later together with the project change?

@zbud-msft
Copy link
Contributor

Can you add steps to reproduce issue in description?

@@ -744,6 +745,12 @@ func tableData2Msi(tblPath *tablePath, useKey bool, op *string, msi *map[string]
return nil
}

if MockFail == 1 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MockFail

Should it only be used in unit test?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants