-
Notifications
You must be signed in to change notification settings - Fork 3k
fix: JdbcCatalog jdbc.user is overwrite by user property #3078
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ca582e9 to
ec28e44
Compare
|
good catch, LGTM |
|
I found the same problem. #3060 Thanks. |
| properties.forEach((key, value) -> dbProps.put(key.replace(JdbcCatalog.PROPERTY_PREFIX, ""), value)); | ||
| properties.forEach((key, value) -> { | ||
| if (key.startsWith(JdbcCatalog.PROPERTY_PREFIX)) { | ||
| String dbKey = key.replaceFirst("^" + JdbcCatalog.PROPERTY_PREFIX, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit / non-blocking: Can you possibly abstract this into a utility method in maybe a JdbcCatalogUtil class or something similar?
The need to iterate over properties, filter the ones that don't match the prefix, and then update the keys, is usually pretty common.
EDIT - Maybe a function in org.apache.iceberg.jdbc.JdbcUtil
public static Properties propertiesForCatalog(Properties properties, String catalogName) { ... }There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I feel like we might need a check that the properties passed to the JdbcClientPool all belong to the correct jdbc catalog (should a user have two or more configured).
A function definition like this might suffice inside of JdbcUtil.
// We might need to check that its got the name in it too (e.g. that it doesn't belong to another catalog).
public static String jdbcPropertyPrefixForCatalog(String name) {
return String.format("%s"."%s", JdbcCatalog.PROPERTY_PREFIX, name);
}We have similar functions elsewhere. But possibly that's already been handled by this point in the code? Added tests would definitely sort this out 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, could this be simplified to not use the regex? If not that's ok, just generally not a fan of regex if possible. I looked at some of the surrounding class and it does seem they use a regex, so maybe it's the string concatenation that's getting to me?
Also, do we need to strip more than just jdbc from the string at this point? And do we need to check that these properties are specifically for this JDBC catalog?
For example, if I were using spark-sql and I had two catalogs, spark.sql.catalog.jdbc_1 and spark.sql.catalog.jdbc_2, should we do a name check here or is that already handled? And is there any chance the regex will do something we don't intend for it to do if the catalog is named jdbc_foo? I'm guessing not because of replaceFirst.
Either way, this looks much better to me.
TLDR - Can you add test cases around configuring the right jdbc user as well as a test case with two jdbc catalogs with two different jdbc users- possibly in TestJdbcCatalog (or say, from Spark or from Flink if it's hard to test from there for some reason)?
@dungdm93 once this is merged in, would you mind updating that guide in a follow up PR? |
kbendick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good to me. Thanks for this @dungdm93!
I'd like it if we could add a test for catching the properties, specifically user, so it's not overwritten.
Additionally, if we could test with two different JDBC catalogs configured, like foo and bar, that would be great. Want to make sure that the regex on key.replaceFirst("^jdbc", "") is sufficient. And it would be great to make that into a utility function, as I see a few other places where we use such a regex or call to replace over JdbcCatalog.PROPERTY_PREFIX.
So I'm +0, and will be +1 once the tests are added (or if there's a good reason to not add them, I'm open to that too 🙂).
b93731e to
f81509e
Compare
|
Hello @kbendick. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in review. Was out sick the past week or so.
Can you add a test specifically using the JDBC catalog for the situation that this covers? E.g. that only catalog properties that start with jdbc.* are passed to the JDBC catalog or that the user property isn't overridden by some external source (I think maybe it's coming from the Java System properties in the two reported cases - in your screenshot I'm guessing that dungdm93 is a system property on your computer being passed to the JVM)?
You can do it with another test on user being set somewhere that would have previously overridden it - using System.setProperty("user", "incorrectUser") if that's where the wrong user is coming from... just be sure to unset that System property at the end of the test!
|
|
||
| public static double propertyAsDouble(Map<String, String> properties, | ||
| String property, double defaultValue) { | ||
| String property, double defaultValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit - Unnecessary change in this PR. Ideally, we keep every line change specific to this one PR so that groups that might maintain a fork have an easier time cherry-picking what they need etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because of my IDE (IntelliJ). I changed it back.
Could you take a look
|
Running CI. |
|
@dungdm93 we enabled CI, but it seems like there's a style error. Could you rebase off of the latest master (so as to pick up changes to the test suite that were causing timeouts and to pick up reduced logging verbosity)? You can run the following, assuming that the upstream repo (the Apache repo) has been set and named as $ git checkout master
$ git pull upstream master --rebase
$ git checkout jdbc-catalog-user
$ git rebase -i master # Then type `:wq` to keep all your commit messages, but rebase onto the updated master branch
$ git push --force |
| return defaultValue; | ||
| } | ||
|
|
||
| public static Properties filterAndRemovePrefix(Map<String, String> properties, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kbendick I know you suggested the refactoring, but it's unclear to me how this util method can be used by others because of the return type Properties. I personally think JdbcUtil is a better place to put this, or making it a private method inside JdbcCatalog is also good enough. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that would be fine.
In retrospect, I had envisioned it maybe being a bit different, e.g. returning Map<String, String> as we do filter some maps based on prefix key and then strip off the prefix (e.g. in initialize calls etc). For example, we do that with Spark quite often, both when initializeing the Catalog as well as merging in the spark.sql.catalog.$(catalogName).hadoop.* fields.
But you're right, it was totally my suggestion to have it return Properties and it's not that portable that way. I apologize for that extra work @dungdm93
So I can agree that JdbcUtil would probably be just as good of a place for this. I'd be fine with moving to JdbcUtil or as a private method as mentioned. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in PropertyUtil because I think this func can be used elsewhere, not just JDBC catalog.
If need to change the return type to Map<String, String>, just let me known.
So, @jackye1995, @kbendick what is your final decision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree with Jack that until we actually go to reuse this code, it might be best to place it in JdbcUtil for now @dungdm93.
Possibly as a follow up PR we could move it to be more generic and also update the various places that have similar functionality at the same time.
But for now, I think it's probably best to keep it in JdbcUtil so we can get this PR in to fix this bug. 🙂
|
If you can fix the style check and then move the utility function back into Thank you so much for the work you've put into this so far! |
64db6d6 to
bc1160a
Compare
kbendick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have the ability to restart the tests at present, but in the interest of getting this in quickly, it's probably best to just keep this PR focused on the Jdbc stuff.
In a follow up PR, if you can make this work with other places that follow a similar pattern, I'll be one of the first people to help review. Plus, once this is merged, the tests will just run if you submit another patch. 🙂
| return defaultValue; | ||
| } | ||
|
|
||
| public static Properties filterAndRemovePrefix(Map<String, String> properties, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree with Jack that until we actually go to reuse this code, it might be best to place it in JdbcUtil for now @dungdm93.
Possibly as a follow up PR we could move it to be more generic and also update the various places that have similar functionality at the same time.
But for now, I think it's probably best to keep it in JdbcUtil so we can get this PR in to fix this bug. 🙂
| /* | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, the checkStyle is upset as the license header doesn't match what's used elsewhere.
It should be
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
bc1160a to
86158c1
Compare
|
@kbendick agree with you. /cc @jackye1995 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good to me, will merge after Kyle and CI approves
|
Since it's been a few days and both Jack and I are +1, I'm going to go ahead and merge this. We can fix anything else that Kyle finds in a follow-up. |
|
Thanks for contributing this fix, @dungdm93! |
I just try JDBC Catalog in my local machine following this guide, but I got the following exception:
$ spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.12.0,org.postgresql:postgresql:42.2.23 \ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \ --conf spark.sql.catalog.iceberg.warehouse=/tmp/warehouse \ --conf spark.sql.catalog.iceberg.catalog-impl=org.apache.iceberg.jdbc.JdbcCatalog \ --conf spark.sql.catalog.iceberg.uri=jdbc:postgresql://postgres:5432/iceberg \ --conf spark.sql.catalog.iceberg.jdbc.user=postgres \ --conf spark.sql.catalog.iceberg.jdbc.password=SuperSecr3t spark-sql> CREATE TABLE iceberg.default.student (id bigint, name string) USING iceberg; 21/09/05 08:46:37 ERROR SparkSQLDriver: Failed in [CREATE TABLE iceberg.default.student (id bigint, name string) USING iceberg] org.apache.iceberg.jdbc.UncheckedSQLException: Failed to connect: jdbc:postgresql://postgres:5432/iceberg at org.apache.iceberg.jdbc.JdbcClientPool.newClient(JdbcClientPool.java:54) at org.apache.iceberg.jdbc.JdbcClientPool.newClient(JdbcClientPool.java:31) ... Caused by: org.postgresql.util.PSQLException: FATAL: password authentication failed for user "dungdm93" at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:613) at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:161) at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:213) at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:51) at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:223) at org.postgresql.Driver.makeConnection(Driver.java:465) at org.postgresql.Driver.connect(Driver.java:264) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:208) at org.apache.iceberg.jdbc.JdbcClientPool.newClient(JdbcClientPool.java:52)After some investigation, I found the reason is because

jdbc.userproperty is overwrite byuserproperty