Return Grpc.Internal error when a socket read timeout occurs in streaming subscribe under routing flap #110
base: master
fc40d5d to cdfda65
Please rebase and add test coverage to unblock the merge.
01f6473 to 3a7c780
3a7c780 to cd34766
359fe0b to c45e54a
**Why I did it**

[Semgrep](https://github.com/returntocorp/semgrep) is a static analysis tool that finds security vulnerabilities. When a PR is opened or a commit is pushed to a PR, Semgrep performs a diff-aware scan, which covers only the files changed in the PR. When a PR is merged, Semgrep performs a full scan of the master branch and reports all findings.

Ref:
- [Supported Language](https://semgrep.dev/docs/supported-languages/#language-maturity)
- [Semgrep Rules](https://registry.semgrep.dev/rule)

**How I did it**

Integrate Semgrep into this repository by committing a job configuration file
bfa44cb to 1475110
…into fenpan_pubsubfix
1475110 to 3286382
Done.
    MockFail++
    fmt.Printf("mock sleep for redis timeout\n")
    time.Sleep(30 * time.Second)
}
Can you use gomonkey to replace MockFail?
@@ -207,6 +208,10 @@ func (c *Client) Run(stream gnmipb.GNMI_SubscribeServer) (err error) {
    c.Close()
    // Wait until all child go routines exited
    c.w.Wait()
    if strings.Contains(err.Error(), "i/o timeout") {
Right, but it seems this will require more than a small change: we currently use a priorityQueue in every db_client, and it only stores error messages. See:
sonic-gnmi/sonic_data_client/db_client.go
Line 833 in 01fe667
func putFatalMsg(q *queue.PriorityQueue, msg string) {
Can you add steps to reproduce the issue to the description?
@@ -744,6 +745,12 @@ func tableData2Msi(tblPath *tablePath, useKey bool, op *string, msi *map[string]
    return nil
}

if MockFail == 1 {
Currently, a streaming telemetry subscription to APPL_DB ROUTE_TABLE hits an error under route flapping. This change returns an explicit error code to the client without any further handling.
pubsub notification to avoid inconsistency introduced by other concurrent cmd
Why I did it
When a DB table such as APPL_DB ROUTE_TABLE is flapping, the current gNMI server has a performance issue that results in the error below:
"read unix @->/var/run/redis/redis.sock: i/o timeout"
We need to fix this systematically. For now, the fix is to return an explicit Internal error code to the client; later we will do some optimization and add a retry strategy to the redis function calls.
How I did it
Return Grpc.Internal error to the client
How to verify it
Verified via gnmi_cli while running tests that introduce BGP flapping.