Extend ledger request timeout #1390

Merged
tsachiherman merged 11 commits into algorand:master from tsachiherman:tsachi/extend_ledger_request_write_timeout on Aug 13, 2020

Conversation

@tsachiherman
Contributor

tsachiherman commented on Aug 12, 2020

Summary

The recent increase in ledger size requires the server-side handler to allow longer timeouts.
This turned out to be a bigger issue than I was originally hoping to deal with: the relevant timeout configuration is the "WriteTimeout", which is defined once, when you start the HTTP server.

That, naturally, doesn't give us the resolution needed to associate a specific request handler with a given timeout. To achieve that, I extended the request tracker so that it keeps track of all incoming connections, even across HTTP requests (i.e., requests might use HTTP/1.1 and re-use the connection).

From that point it was quite straightforward: expose the entry point from the GossipNode interface and update the timeout in the handler.

Test Plan

Tested manually.

Comment thread: rpcs/ledgerService.go

```go
		return
	}
	if conn := ls.net.GetHTTPRequestConnection(request); conn != nil {
		conn.SetWriteDeadline(time.Now().Add(maxCatchpointFileWritingDuration))
```
Contributor Author


Ideally, we would figure out what the file size is and have a custom timeout per file. For now, I'm leaving it at a higher max value, as I don't want to go down the rabbit hole of retrieving the file size.

Contributor

@algonautshant left a comment


Review in progress....
Some comments.

Comment thread: network/wsNetwork.go (outdated)
Comment thread: rpcs/ledgerService.go (outdated)
Comment thread: network/requestTracker.go

```diff
 	remoteHost string
 	requests []*TrackerRequest // this is an ordered list, according to the requestsHistory.created
+	additionalHostRequests map[*TrackerRequest]struct{} // additional requests that aren't included in the "requests", and always assumed to be "alive".
```
Contributor


The name prefix "additional" is not informative. Can you please add more information, and preferably rename it?
Also, what will the map point to? What will the struct{} be?

Contributor Author


The purpose of the map is to track the count of entries for that tracker, while providing an easy (and efficient) way to delete them.
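A small sketch of the set-as-map idiom being discussed: a map whose value type is the empty `struct{}` stores no payload per entry, so it behaves as a set with O(1) insert, delete, and count. The `trackerRequest` type below is a simplified stand-in for the real one.

```go
package main

import "fmt"

// trackerRequest is an illustrative stand-in for network.TrackerRequest.
type trackerRequest struct{ remoteHost string }

func main() {
	// map[*trackerRequest]struct{} acts as a set of request pointers:
	// struct{} occupies no space, and delete() removes an entry in O(1).
	additional := make(map[*trackerRequest]struct{})

	a := &trackerRequest{remoteHost: "10.0.0.1"}
	b := &trackerRequest{remoteHost: "10.0.0.2"}

	additional[a] = struct{}{} // insert
	additional[b] = struct{}{}
	fmt.Println(len(additional)) // count of tracked entries: 2

	delete(additional, a) // O(1) removal by pointer
	fmt.Println(len(additional)) // 1
}
```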

@tsachizehub tsachizehub added this to the Sprint 7 milestone Aug 13, 2020
Comment thread: network/requestTracker.go

```diff
 func (ard *hostIncomingRequests) countConnections(rateLimitingWindowStartTime time.Time) (count uint) {
 	i := ard.findTimestampIndex(rateLimitingWindowStartTime)
-	return uint(len(ard.requests) - i)
+	return uint(len(ard.requests) - i + len(ard.additionalHostRequests))
```
Contributor


We are not checking if the connections in additionalHostRequests are after rateLimitingWindowStartTime.
What is the catch? Maybe the comment should change?

Contributor Author


All the connections in ard.additionalHostRequests are ones that have already reached the HTTP handler.
As such, they get deleted only when connection.Close() is called.
The last-30-seconds window tracking doesn't apply to them, since they might be long-lived connections.

The purpose of the 30-second window was to track incoming connections that never made it to the HTTP handler. Once they did, we don't really care about their "window".

@brianolson
Contributor

I'm rusty looking at network code and this is extra twisty because of the request tracker, but I didn't see anything wrong with it.

Comment thread: network/requestTracker.go

```diff
 	}

-	trackerRequest := makeTrackerRequest(conn.RemoteAddr().String(), "", "", time.Now())
+	trackerRequest := makeTrackerRequest(conn.RemoteAddr().String(), "", "", time.Now(), conn)
```
Contributor


conn is being passed here, but isn't this nil at this point?
It gets initialized further down:

```go
conn = &requestTrackedConnection{Conn: conn, tracker: rt}
```

Contributor


Never mind. I missed the initialization above.

Contributor

@algonautshant left a comment


Looks good.
Thanks for the explanations.

@tsachiherman tsachiherman merged commit d8e3983 into algorand:master Aug 13, 2020
@tsachiherman tsachiherman deleted the tsachi/extend_ledger_request_write_timeout branch August 13, 2020 21:29
tsachiherman added a commit to tsachiherman/go-algorand that referenced this pull request Jul 7, 2021
Increase write timeout on ledger requests.
cce added a commit to cce/go-algorand that referenced this pull request Jun 28, 2024