-
Notifications
You must be signed in to change notification settings - Fork 181
[dataquery] Optimize filtering of valid candidates for large datasets #9344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3450f73
to
e9e2bcf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, should the new db connection needs to be in lorisintance since it is only used in dataquery?
* | ||
* @return \Database | ||
*/ | ||
public function getNewDatabaseConnection() : \Database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function useful in all places that use lorisinstance object?
Shouldn't it be placed only in the dataquery module query.class.inc
because that is the only one using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's useful any place that calls setBuffering(false)
and then iterates over a query result because the connection can't be re-used until the iteration is finished.
It's the only one now, but there will be others in other PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(ie. #9347 is another example that will use it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yes, thanks. I will make a quick test today ideally.
This optimizes the filtering of inaccessible candidates, building on PR#9334 Previously, the entire query was loaded into memory, and then manipulated in a way that made it easier to see which candidates are valid, then returned as an array by putting the valid ones in a temporary table and selecting it. With this change, instead of an array, a generator is used so that only one candidate (and its sessions) need to be loaded into memory at a time. This is done on an unbuffered connection, which necessitates a different database connection since only one query can be run at a time on a connection and the caller may make other queries before the generator finished.
a3562a7
to
4d54282
Compare
@driusan tried several queries but I do not have something big enough to make a thorough testing. I think it is good noneless. |
I think the important thing is that you tested the queries. I've tested the performance on large datasets (albiet in combination with other optimizations..) |
…aces#9344) This optimizes the filtering of inaccessible candidates, building on PR#9334 Previously, the entire query was loaded into memory, and then manipulated in a way that made it easier to see which candidates are valid, then returned as an array by putting the valid ones in a temporary table and selecting from it. With this change, instead of an array, a generator is used so that only one candidate (and its sessions) need to be loaded into memory at a time. This is done on an unbuffered connection, which necessitates a different database connection since only one query can be run at a time on a connection and the caller may make other queries before the generator finished. This result should be significantly less memory required for filtering candidates.
This updates DBRowProvisioner to remove the strval (and fixes #9335). After doing that, it became clear that the inner class could be replaced by a generator while still fulfilling the contract of returning a `\Traversable`. Doing this made it more obvious that since #9334, pselect could be used directly while maintaining the lazy evaluation instead of bypassing the Database object to use the PDO directly. However this required injecting a `LorisInstance` object to the constructor to get the database connection. With access to the `LorisInstance` object, the `DBRowProvisioner` can also get a new database connection and disable query buffering to lazily retrieve rows since PR #9344. This may theoretically improve performance and memory usage on large provisioners but has not been tested. (Performance did not decrease in raisinbread but it is not a large dataset where it would be noticeable.)
This optimizes the filtering of inaccessible candidates, building on PR#9334
Previously, the entire query was loaded into memory, and then manipulated in a way that made it easier to see which candidates are valid, then returned as an array by putting the valid ones in a temporary table and selecting from it.
With this change, instead of an array, a generator is used so that only one candidate (and its sessions) need to be loaded into memory at a time. This is done on an unbuffered connection, which necessitates a different database connection since only one query can be run at a time on a connection and the caller may make other queries before the generator finished.
This result should be significantly less memory required for filtering candidates.