[SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector #28215
| @DockerTest | ||
| @Ignore // AMPLab Jenkins needs to be updated before shared memory works on docker |
Unrelated change. Since the docker tests are not integrated into Jenkins, we can turn this on.
I see that the test suites for the other DBMSs don't have this, so removing it makes things consistent.
Not really following the discussion; are you guys saying this line should be removed? Because there's nothing changing here.
Yes, since the other tests don't have this. It's less a "should be" than a "can be".
And it's removed in code diff as of now.
I don't see it removed. It's still there.
Sorry for the confusion, I should have been clearer: the change itself was removed, in other words rolled back. No change.
| /** | ||
| * Parameter whether the container should run privileged. | ||
| */ | ||
| val privileged: Boolean = false |
DB2 docker requires privileged run.
| val conn = java.sql.DriverManager.getConnection(jdbcUrl) | ||
| conn.close() | ||
| var conn: Connection = null | ||
| eventually(timeout(2.minutes), interval(1.second)) { |
Single connection simplification + timeout increase.
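For readers unfamiliar with the pattern, here is a minimal sketch of the retry-until-connected idea behind this change, written without ScalaTest so that it stands alone. `RetryUntil.retryUntil` is a hypothetical helper introduced only for illustration; the suite itself uses ScalaTest's `eventually(timeout(...), interval(...))` as shown in the diff above.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object RetryUntil {
  /**
   * Hypothetical stand-in for ScalaTest's `eventually` (an assumption, not Spark code):
   * evaluates `body` until it succeeds or `timeout` elapses, sleeping `interval`
   * between attempts. The last failure is rethrown once the deadline has passed.
   */
  def retryUntil[T](timeout: FiniteDuration, interval: FiniteDuration)(body: => T): T = {
    val deadline = timeout.fromNow

    @tailrec
    def attempt(): T = Try(body) match {
      case Success(value) => value
      case Failure(e) if deadline.isOverdue() => throw e
      case Failure(_) =>
        Thread.sleep(interval.toMillis)
        attempt()
    }

    attempt()
  }
}
```

With such a helper, a single connection can be opened once the database is ready and then reused, e.g. `val conn = RetryUntil.retryUntil(2.minutes, 1.second) { java.sql.DriverManager.getConnection(jdbcUrl) }`, instead of opening a probe connection, closing it, and connecting again.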
|
|
||
| override def dataPreparation(conn: Connection): Unit = { | ||
| conn.prepareStatement("CREATE TABLE bar (c0 text)").executeUpdate() | ||
| conn.prepareStatement("CREATE TABLE bar (c0 VARCHAR(8))").executeUpdate() |
DB2 doesn't support text.
I'd be surprised if this change affects the others, but it may be worth testing them manually and mentioning the result.
Whenever I make modifications I re-execute all of them, and I did so here too.
And, so as not to dodge the question: all passed :)
| USERPROFILE=/database/config/db2inst1/sqllib/userprofile | ||
| echo "export DB2_KRB5_PRINCIPAL=db2/__IP_ADDRESS_REPLACE_ME__@EXAMPLE.COM" >> $USERPROFILE | ||
| echo "export KRB5_KTNAME=/var/custom/db2.keytab" >> $USERPROFILE | ||
| su - db2inst1 -c "db2set DB2ENVLIST=KRB5_KTNAME" |
This trick is needed because DB2 automatically forwards environment variables only if their names start with DB2 (KRB5_KTNAME doesn't qualify).
It would be nice to add this as a code comment, so readers don't have to hop to this review thread to understand the trick.
Good idea, added.
|
cc @HeartSaVioR |
|
Test build #121279 has finished for PR 28215 at commit
|
|
Looks unrelated. |
|
Retest this please. |
|
Test build #121287 has finished for PR 28215 at commit
|
HeartSaVioR
left a comment
The code change looks great overall. As I commented in earlier PRs, I'm still not able to run the tests (and it's tricky to modify my local env. only for running tests), so I've just assumed you've run the tests manually.
|
Re-executed the docker tests and passed. |
|
Test build #121369 has finished for PR 28215 at commit
|
|
retest this, please |
|
Test build #121374 has finished for PR 28215 at commit
|
HeartSaVioR
left a comment
LGTM, assuming the manual tests passed.
|
cc @dongjoon-hyun @vanzin since you know this area well |
|
Hi, @gaborgsomogyi . It would be great if you proceed |
| @Ignore // AMPLab Jenkins needs to be updated before shared memory works on docker | ||
| class DB2IntegrationSuite extends DockerJDBCIntegrationSuite { | ||
| override val db = new DatabaseOnDocker { | ||
| override val imageName = "lresende/db2express-c:10.5.0.5-3.10.0" |
Please update this image to the official one like ibmcom/db2:11.5.0.0a. It would be great if we use the same DB2 version in both DB2IntegrationSuite and DB2KrbIntegrationSuite.
The above comment is for the new PR.
It's worth a separate PR because it's a non-trivial change.
|
Test build #121526 has finished for PR 28215 at commit
|
|
The check failure is valid; it's failing to compile in that environment. |
|
Oh, nice catch. That compilation error seems to occur only with JDK 11. That said, the Spark PR builder may need to run in at least two different environments. We were lucky to find the issue through GitHub Actions, but it has been a source of false alarms and hence is not taken seriously in many cases. E.g. we might not have found this if the Linter build had failed on GitHub Actions, since building Spark is cancelled when the Linter build fails. |
|
Ah gosh! Fixed it. |
|
Test build #121616 has finished for PR 28215 at commit
|
|
Merging to master. |
|
Is anything additional required for this? All the other tests (including the new DB2IntegrationSuite in #28325) passed, but this one fails. ...
DB2KrbIntegrationSuite:
org.apache.spark.sql.jdbc.DB2KrbIntegrationSuite *** ABORTED ***
Exception encountered when invoking run on a nested suite - The code passed to eventually never returned normally. Attempted 128 times over 2.016876443166667 minutes. Last failure message: Login failure for db2/10.0.0.6@EXAMPLE.COM from keytab /Users/dongjoon/PRS/SPARK-PR-28325/external/docker-integration-tests/target/tmp/spark-35512019-33ef-4cad-a54a-2bec69a3d4c2/db2.keytab. (DockerJDBCIntegrationSuite.scala:158)
...
Run completed in 4 minutes, 27 seconds.
Total number of tests run: 42
Suites: completed 8, aborted 1
Tests: succeeded 42, failed 0, canceled 0, ignored 0, pending 0
*** 1 SUITE ABORTED *** |
… the DB2 docker inside
### What changes were proposed in this pull request?
This is a followup PR discussed [here](#28215 (comment)).
### Why are the changes needed?
It would be good to re-enable `DB2IntegrationSuite` and upgrade the docker image inside to use the latest.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing docker integration tests.
Closes #28325 from gaborgsomogyi/SPARK-31533.
Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
|
Hi, Guys. How can we handle this commit? |
|
Is it fine to check it on Monday as a start? |
|
Logs from the test side and from the docker instance side would help to see what has gone wrong. |
|
Sure, no rush on this: since it's an integration test, we will not revert it urgently. I just wondered whether this IT was tested when this PR was merged. |
|
I re-execute them after each change because it only takes a couple of minutes. I've just tried it and it works on my machine. On Monday I'll put it into a loop, but logs would speed up the process because then we don't have to spend time on reproduction. |
|
Got it. Could you share your work environment? Then I can try your way if possible. I tried For the record, on two machines, this test suite fails several times in a row (I also increased the resources up to 10 cores and 22GB memory) and never succeeds in this environment. |
|
Hi, @gatorsmile and @maropu . |
| override protected def setAuthentication(keytabFile: String, principal: String): Unit = { | ||
| val config = new SecureConnectionProvider.JDBCConfiguration( | ||
| Configuration.getConfiguration, "JaasClient", keytabFile, principal) | ||
| Configuration.setConfiguration(config) |
Is this safe when scanning tables in different secure databases?
Nice catch! I'll create a separate jira to handle config synchronisation globally...
Spark can scan different JDBC relations concurrently, though. Could we synchronize them easily?
I think the solution shouldn't be complicated, but it affects all other providers which change the configuration (not just DB2).
I've filed https://issues.apache.org/jira/browse/SPARK-31575 to handle the issue.
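To make the concern in this thread concrete: `Configuration.setConfiguration` replaces a JVM-wide JAAS configuration, so two concurrent scans against different secure databases can overwrite each other's entry between the read of the current configuration and the login. Below is a minimal sketch of one way to serialize that sequence with a single lock. `KeytabConfiguration` and `JaasConfigGuard` are hypothetical names made up for this illustration; this is not Spark's `SecureConnectionProvider.JDBCConfiguration`, and not necessarily the approach taken in SPARK-31575.

```scala
import java.util.{HashMap => JHashMap}
import javax.security.auth.login.{AppConfigurationEntry, Configuration}
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag

/**
 * Hypothetical wrapper (an assumption for illustration): answers for a single
 * JAAS entry name and delegates every other lookup to `parent`, if there is one.
 */
class KeytabConfiguration(
    parent: Configuration,
    appEntry: String,
    keytabFile: String,
    principal: String) extends Configuration {

  override def getAppConfigurationEntry(name: String): Array[AppConfigurationEntry] = {
    if (name == appEntry) {
      val options = new JHashMap[String, String]()
      options.put("useKeyTab", "true")
      options.put("keyTab", keytabFile)
      options.put("principal", principal)
      Array(new AppConfigurationEntry(
        "com.sun.security.auth.module.Krb5LoginModule",
        LoginModuleControlFlag.REQUIRED,
        options))
    } else if (parent != null) {
      parent.getAppConfigurationEntry(name)
    } else {
      null
    }
  }
}

object JaasConfigGuard {
  // One lock for the whole JVM, because the JAAS configuration is global too.
  private val lock = new Object

  /** Runs read-wrap-install and then `connect` as one critical section. */
  def withConfiguration[T](appEntry: String, keytabFile: String, principal: String)
      (connect: => T): T = lock.synchronized {
    val parent = Configuration.getConfiguration
    Configuration.setConfiguration(
      new KeytabConfiguration(parent, appEntry, keytabFile, principal))
    connect
  }
}
```

The trade-off in this sketch is that connection setup becomes serialized across all providers that touch the global configuration, which matches the remark above that the fix affects more than DB2.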
|
@dongjoon-hyun here is my environment (just checked the latest master and still works):
|
|
That said, maybe my environment is just a lucky one, and it would be good to take a look at the logs from the test side and from the container side as well. |
|
I ran it myself and confirmed that it passes on my macOS (Sierra, Docker Desktop community v2.3.0.0). |
|
Thank you, @maropu ! |
|
Thank you, @gaborgsomogyi . |
What changes were proposed in this pull request?
When loading DataFrames from a JDBC datasource with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to the lack of a Kerberos ticket or the ability to generate one.
This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in an enterprise environment where exposing simple authentication access is not an option due to IT policy issues.
In this PR I've added DB2 support (other supported databases will come in later PRs).
What this PR contains:
- DB2ConnectionProvider
- DB2ConnectionProviderSuite
- DB2KrbIntegrationSuite docker integration test
Why are the changes needed?
Missing JDBC kerberos support.
Does this PR introduce any user-facing change?
Yes, users are now able to connect to DB2 using Kerberos.
How was this patch tested?