[JDBC] Add retries with delay #1517
@singhpk234 I think the code should be implemented using SmallRye. That being said, it would require some logic around the driver …
Benefits of doing so include not only less code to maintain but also metrics on the number of retries. Note that I am only …
Thank you @pingtimeout, I really appreciate your input. Are you suggesting using this Quarkus module (https://quarkus.io/guides/smallrye-fault-tolerance)?
I agree with this in principle. I also evaluated options like the Failsafe library, but we don't need very sophisticated retry logic: what we need is simple, with a few simple knobs. If we needed features like adaptive retries or callbacks, I would consider using a library, but for our use case it seemed like overkill, since bringing in a new library means LICENSE/NOTICE updates, CVE exposure, etc. That is why I refrained from it, for this use case only. That being said, if you feel strongly about this, I am happy to re-evaluate and find the best path forward!
@singhpk234 no problem, that was just a suggestion, not a requirement :-). That being said, could you add a javadoc or a TODO around the …
Could you add more details about what the description represents, and a summary of the actual changes? Two Gatling summaries without context do not tell us anything.
snazy
left a comment
Retry loops, especially fair retries, are very tricky. What I'm missing in this PR are concurrency tests and specific tests that vet the retry loop itself.
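One way to get tests that vet the retry loop itself is to inject the clock, the sleep and the jitter, so a test can assert on the exact backoff schedule and on the time budget without real waiting. This is a hypothetical sketch, not code from this PR: TestableRetryLoop, runWithRetries and sleepsForAlwaysFailing are illustrative names, and treating only SQLState 40001 as retryable is an assumption. It covers only the deterministic part; concurrency tests would additionally run several writers against a real database.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.LongUnaryOperator;

/** Hypothetical retry loop with an injectable clock, sleeper and jitter. */
final class TestableRetryLoop {

  interface Sleeper {
    void sleep(long ms) throws InterruptedException;
  }

  interface SqlOp<T> {
    T run() throws SQLException;
  }

  private TestableRetryLoop() {}

  static <T> T runWithRetries(SqlOp<T> op, int maxRetries, long maxDurationMs,
      long initialDelayMs, LongSupplier clockMs, Sleeper sleeper, LongUnaryOperator jitter)
      throws SQLException, InterruptedException {
    long start = clockMs.getAsLong();
    long delayMs = initialDelayMs;
    for (int retries = 0; ; retries++) {
      try {
        return op.run();
      } catch (SQLException e) {
        long elapsed = clockMs.getAsLong() - start;
        // Assumption: only SQLState 40001 (serialization failure) is retried in this sketch.
        if (!"40001".equals(e.getSQLState()) || retries >= maxRetries || elapsed >= maxDurationMs) {
          throw e;
        }
        // Never sleep past the remaining time budget.
        sleeper.sleep(Math.min(jitter.applyAsLong(delayMs), maxDurationMs - elapsed));
        delayMs = Math.min(delayMs * 2, maxDurationMs);
      }
    }
  }

  /** Drives an always-conflicting operation with a fake clock; returns the requested sleeps. */
  static List<Long> sleepsForAlwaysFailing(int maxRetries, long maxDurationMs, long initialDelayMs) {
    long[] nowMs = {0};
    List<Long> sleeps = new ArrayList<>();
    try {
      runWithRetries(
          () -> { throw new SQLException("conflict", "40001"); },
          maxRetries, maxDurationMs, initialDelayMs,
          () -> nowMs[0],
          ms -> { sleeps.add(ms); nowMs[0] += ms; }, // fake sleep only advances the fake clock
          d -> d); // identity jitter makes the schedule deterministic
    } catch (SQLException | InterruptedException expected) {
      // The operation never succeeds, so the loop must eventually rethrow.
    }
    return sleeps;
  }
}
```

With maxRetries=5, a 10 s budget and a 100 ms initial delay, the recorded schedule is 100, 200, 400, 800, 1600 ms; with a 500 ms budget the last sleep is truncated so the sleeps add up to exactly 500 ms.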
snazy
left a comment
The tests look appropriate now from a brief look.
quarkus/common/build.gradle.kts
There is id("polaris-quarkus") for this?
dimas-b
left a comment
LGTM 👍 Please allow one more day for other people to comment before merging.
I re-ran the Polaris benchmarks (with the settings @pingtimeout used: https://docs.google.com/document/d/1RLYaAtNUkgNW3Ef7-BWfF_8RkSK7B7oR/edit#bookmark=id.von5ayuoga6), and I'm happy to share that we now get a 100% success rate with this change.
- POLARIS_PERSISTENCE_RELATIONAL_JDBC_MAX_RETRIES=5
- POLARIS_PERSISTENCE_RELATIONAL_JDBC_INITIAL_DELAY_IN_MS=100
- POLARIS_PERSISTENCE_RELATIONAL_JDBC_MAX_DELAY_IN_MS=5000
nit: Don't defaults work for "getting started"?
They do! These are just for demo purposes.
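For context on why the defaults still apply when these lines are absent: Quarkus looks up configuration in environment variables using the MicroProfile Config naming rule, in which every character that is not alphanumeric or an underscore becomes "_" and the name is upper-cased. The sketch below assumes property names of the form polaris.persistence.relational.jdbc.max-retries (inferred from the variable names above, not confirmed here); EnvConfigSketch and readLong are hypothetical helpers showing that an unset variable simply falls back to the default.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper mirroring the MicroProfile Config env-var naming rule. */
final class EnvConfigSketch {

  private EnvConfigSketch() {}

  /** Replaces every non-alphanumeric, non-underscore character with '_' and upper-cases. */
  static String toEnvVarName(String property) {
    return property.replaceAll("[^A-Za-z0-9_]", "_").toUpperCase(Locale.ROOT);
  }

  /** Reads a long from the given environment, falling back to the default when unset. */
  static long readLong(Map<String, String> env, String property, long defaultValue) {
    String raw = env.get(toEnvVarName(property));
    return raw == null ? defaultValue : Long.parseLong(raw.trim());
  }
}
```

In real code the environment map would be System.getenv(); passing a plain Map keeps the example testable. The default values used in a call like readLong(env, "...max-retries", 1) are illustrative, not Polaris's actual defaults.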
About the change
This change adds retries with jitter to the JDBC persistence layer. The retries are tunable in the following ways:
a. max_retries: the total number of retries the persistence layer performs on connection-reset and serialization-failure errors before giving up.
b. max_duration_in_ms: the time budget, in ms, measured from the first attempt, within which retries are made. For example, with 500 ms configured, the total time spent retrying should not exceed 500 ms (on a best-effort basis).
c. initial_delay_in_ms: the initial delay before the first retry.
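The three knobs above can be illustrated with a minimal Java retry loop. This is a sketch under stated assumptions, not the code in this PR: RetrySketch, withRetries and isRetryable are hypothetical names, and the doubling backoff, the "full jitter" strategy, and the choice of SQLState 40001 (serialization failure) and SQLState class 08 (connection exception) as retryable are all assumptions made for illustration.

```java
import java.sql.SQLException;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical retry helper: exponential backoff with full jitter and a time budget. */
final class RetrySketch {

  @FunctionalInterface
  interface SqlOperation<T> {
    T execute() throws SQLException;
  }

  private RetrySketch() {}

  // Assumption: 40001 = serialization failure, class 08 = connection exception (SQL standard).
  static boolean isRetryable(SQLException e) {
    String state = e.getSQLState();
    return state != null && (state.equals("40001") || state.startsWith("08"));
  }

  static <T> T withRetries(
      SqlOperation<T> op, int maxRetries, long maxDurationMs, long initialDelayMs)
      throws SQLException, InterruptedException {
    long startNanos = System.nanoTime();
    long delayMs = initialDelayMs;
    int retries = 0;
    while (true) {
      try {
        return op.execute();
      } catch (SQLException e) {
        long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
        // Give up on non-transient errors, after maxRetries retries, or once the
        // time budget measured from the first attempt is used up.
        if (!isRetryable(e) || retries >= maxRetries || elapsedMs >= maxDurationMs) {
          throw e;
        }
        retries++;
        // Full jitter: sleep a uniformly random duration in [0, delayMs], capped at
        // the remaining budget. The next attempt itself may still overrun the budget,
        // so the bound is only best-effort.
        long jitteredMs = ThreadLocalRandom.current().nextLong(delayMs + 1);
        Thread.sleep(Math.min(jitteredMs, maxDurationMs - elapsedMs));
        delayMs = Math.min(delayMs * 2, maxDurationMs); // exponential growth, bounded
      }
    }
  }
}
```

With max_retries=5, initial_delay_in_ms=100 and a large enough budget, the un-jittered delays would be 100, 200, 400, 800 and 1600 ms (3100 ms in total); full jitter picks each actual sleep between 0 and that value, which spreads out writers that conflicted at the same moment instead of having them retry in lockstep.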
With the setup from the getting-started guide, this yields the following improvements:
100% of calls succeed
Throughput increases to 125 TPS