[SPARK-31833][SQL][test-hive1.2] Set HiveThriftServer2 with actual port while configured 0 #28651
cc @cloud-fan @maropu @wangyum @dongjoon-hyun thanks very much.
Test build #123166 has finished for PR 28651 at commit
retest this please
retest this please
retest this please
Test build #123171 has finished for PR 28651 at commit
Test build #123176 has finished for PR 28651 at commit
Test build #123179 has finished for PR 28651 at commit
    keyStorePassword, sslVersionBlacklist);
    }

    // in case it is configured with 0 which represents any free port, we should set it to the
nit: it -> HIVE_SERVER2_THRIFT_PORT for clarity.
Test build #123258 has finished for PR 28651 at commit
thanks, merging to master/3.0!
…rt while configured 0

### What changes were proposed in this pull request?

When I was developing some stuff based on the `DeveloperAPI` `org.apache.spark.sql.hive.thriftserver.HiveThriftServer2#startWithContext`, I need to use thrift port randomly to avoid race on ports. But the `org.apache.hive.service.cli.thrift.ThriftCLIService#getPortNumber` do not respond to me with the actual bound port but always 0. And the server log is not right too, after starting the server, it's hard to form to the right JDBC connection string.

```
INFO ThriftCLIService: Starting ThriftBinaryCLIService on port 0 with 5...500 worker threads
```

Indeed, the `53742` is the right port

```shell
lsof -nP -p `cat ./pid/spark-kentyao-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1.pid` | grep LISTEN
java 18990 kentyao 288u IPv6 0x22858e3e60d6a0a7 0t0 TCP 10.242.189.214:53723 (LISTEN)
java 18990 kentyao 290u IPv6 0x22858e3e60d68827 0t0 TCP *:4040 (LISTEN)
java 18990 kentyao 366u IPv6 0x22858e3e60d66987 0t0 TCP 10.242.189.214:53724 (LISTEN)
java 18990 kentyao 438u IPv6 0x22858e3e60d65d47 0t0 TCP *:53742 (LISTEN)
```

In the PR, when the port is configured 0, the `portNum` will be set to the real used port during the start process. Also use 0 in thrift related tests to avoid potential flakiness.

### Why are the changes needed?

1 fix API bug
2 reduce test flakiness

### Does this PR introduce _any_ user-facing change?

yes, `org.apache.hive.service.cli.thrift.ThriftCLIService#getPortNumber` will always give you the actual port when it is configured 0.

### How was this patch tested?

modified unit tests

Closes #28651 from yaooqinn/SPARK-31833.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit fe1da29)
Signed-off-by: Wenchen Fan <[email protected]>
    httpServer.start();
    // In case HIVE_SERVER2_THRIFT_HTTP_PORT or hive.server2.thrift.http.port is configured with
    // 0 which represents any free port, we should set it to the actual one
    portNum = connector.getLocalPort();
@yaooqinn did you try to test that? For me setting hive.server2.thrift.http.port to 0 doesn't seem to work when trying to start it in http mode like that.
kentyao@hulk ~/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200601 sbin/start-thriftserver.sh --conf spark.hadoop.hive.server2.thrift.http.port=0 --conf spark.hadoop.hive.server2.transport.mode=http
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200601/logs/spark-kentyao-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hulk.local.out
kentyao@hulk ~/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200601 tail -f -2 /Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200601/logs/spark-kentyao-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hulk.local.out
20/06/05 21:26:20 INFO HiveThriftServer2: HiveThriftServer2 started
20/06/05 21:26:20 INFO ThriftCLIService: Started ThriftHttpCLIService in http mode on port 54379 path=/cliservice/* with 5...500 worker threads
Thanks... let me debug this more on my end why am I getting 0 back...
I verified this locally and it seems to be OK,
but in #28738 I just ran into the same issue.
    hiveServer2 = HiveThriftServer2.startWithContext(sqlContext)
    hiveServer2.getServices.asScala.foreach {
    case t: ThriftCLIService if t.getPortNumber != 0 =>
    serverPort = t.getPortNumber
HiveThriftServer2.startWithContext -> HiveThriftServer2.start() -> CompositeService.start() -> Service.start()...
ThriftCLIService.start() creates new Thread(this).start();
So ThriftBinaryCLIService / ThriftHttpCLIService run() is launched in a background thread. I think this can race before the actual port gets assigned?
I ran the tests in http mode locally with #28738 and
build/sbt "hive-thriftserver/test-only *HiveThriftHttpServerSuite" -Phive -Phive-thriftserver -Dsbt.override.build.repos=true -Phive-2.3
I checked the logs in target/unit-tests.log:
20/06/05 06:35:55.237 pool-1-thread-1 INFO AbstractService: Service:ThriftHttpCLIService is started.
20/06/05 06:35:55.237 pool-1-thread-1 INFO AbstractService: Service:HiveServer2 is started.
20/06/05 06:35:55.326 Thread-17 INFO Server: jetty-9.4.18.v20190429; built: 2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm 1.8.0_251-b08
20/06/05 06:35:55.358 Thread-17 INFO session: DefaultSessionIdManager workerName=node0
20/06/05 06:35:55.358 Thread-17 INFO session: No SessionScavenger set, using defaults
20/06/05 06:35:55.359 Thread-17 INFO session: node0 Scavenging every 660000ms
20/06/05 06:35:55.366 Thread-17 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@23f4b16a{/,null,AVAILABLE}
20/06/05 06:35:55.438 Thread-17 INFO AbstractConnector: Started ServerConnector@1b76f67f{HTTP/1.1,[http/1.1]}{0.0.0.0:55923}
20/06/05 06:35:55.438 Thread-17 INFO Server: Started @7043ms
20/06/05 06:35:55.438 Thread-17 INFO ThriftCLIService: Started ThriftHttpCLIService in http mode on port 55923 path=/cliservice/* with 5...500 worker threads
20/06/05 06:35:55.442 pool-1-thread-1 INFO Utils: Supplied authorities: localhost:0
20/06/05 06:35:55.442 pool-1-thread-1 WARN Utils: ***** JDBC param deprecation *****
20/06/05 06:35:55.442 pool-1-thread-1 WARN Utils: The use of hive.server2.transport.mode is deprecated.
20/06/05 06:35:55.442 pool-1-thread-1 WARN Utils: Please use transportMode like so: jdbc:hive2://<host>:<port>/dbName;transportMode=<transport_mode_value>
20/06/05 06:35:55.442 pool-1-thread-1 WARN Utils: ***** JDBC param deprecation *****
20/06/05 06:35:55.442 pool-1-thread-1 WARN Utils: The use of hive.server2.thrift.http.path is deprecated.
20/06/05 06:35:55.442 pool-1-thread-1 WARN Utils: Please use httpPath like so: jdbc:hive2://<host>:<port>/dbName;httpPath=<http_path_value>
20/06/05 06:35:55.442 pool-1-thread-1 INFO Utils: Resolved authority: localhost:0
20/06/05 06:35:55.663 pool-1-thread-1 ERROR HiveConnection: Error opening session
org.apache.thrift.transport.TTransportException: org.apache.http.conn.HttpHostConnectException: Connect to localhost:80 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused)
at org.apache.thrift.transport.THttpClient.flushUsingHttpClient(THttpClient.java:297)
at org.apache.thrift.transport.THttpClient.flush(THttpClient.java:316)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:73)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62)
at org.apache.hive.service.rpc.thrift.TCLIService$Client.send_OpenSession(TCLIService.java:162)
at org.apache.hive.service.rpc.thrift.TCLIService$Client.OpenSession(TCLIService.java:154)
at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:680)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:200)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$withMultipleConnectionJdbcStatement$1(SharedThriftServer.scala:106)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.withMultipleConnectionJdbcStatement(SharedThriftServer.scala:106)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.withMultipleConnectionJdbcStatement$(SharedThriftServer.scala:105)
at org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.withMultipleConnectionJdbcStatement(HiveThriftServer2Suites.scala:280)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.withJdbcStatement(SharedThriftServer.scala:141)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.withJdbcStatement$(SharedThriftServer.scala:140)
at org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.withJdbcStatement(HiveThriftServer2Suites.scala:280)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$startThriftServer$4(SharedThriftServer.scala:166)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.scalatest.concurrent.Eventually.makeAValiantAttempt$1(Eventually.scala:395)
at org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:409)
at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
at org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.eventually(HiveThriftServer2Suites.scala:280)
at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:308)
at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:307)
at org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.eventually(HiveThriftServer2Suites.scala:280)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.startThriftServer(SharedThriftServer.scala:165)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$beforeAll$1(SharedThriftServer.scala:71)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.beforeAll(SharedThriftServer.scala:71)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.beforeAll$(SharedThriftServer.scala:68)
at org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.beforeAll(HiveThriftServer2Suites.scala:280)
at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:59)
at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
at sbt.ForkMain$Run$2.call(ForkMain.java:296)
at sbt.ForkMain$Run$2.call(ForkMain.java:286)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to localhost:80 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:159)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:394)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:85)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:118)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.apache.thrift.transport.THttpClient.flushUsingHttpClient(THttpClient.java:251)
... 53 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:606)
at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:75)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
... 64 more
20/06/05 06:35:55.665 pool-1-thread-1 WARN HiveConnection: Failed to connect to localhost:0
The starting phase looks OK to me, and the log shows the assigned port: Started ThriftHttpCLIService in http mode on port 55923 path=/cliservice/* with 5...500
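The trace above shows the client resolving the authority `localhost:0`, which suggests the JDBC URL was built from a port that was still the `0` placeholder. As a rough illustration only, the sketch below builds an http-mode URL in the `transportMode`/`httpPath` form recommended by the deprecation warnings in the log, and refuses a port of 0. The class `HiveJdbcUrlSketch`, the method `httpUrl`, and the `default` database name are hypothetical and not part of this PR.

```java
// Hypothetical helper, not part of Spark or Hive.
class HiveJdbcUrlSketch {
    /** Builds an http-mode JDBC URL and rejects a port that is still the 0 placeholder. */
    static String httpUrl(String host, int port, String dbName, String httpPath) {
        if (port <= 0 || port > 65535) {
            throw new IllegalArgumentException("Port is not a bound port: " + port);
        }
        // URL shape taken from the deprecation warnings in the log above.
        return "jdbc:hive2://" + host + ":" + port + "/" + dbName
            + ";transportMode=http;httpPath=" + httpPath;
    }

    public static void main(String[] args) {
        // 55923 and cliservice come from the server log; "default" is an assumption.
        System.out.println(httpUrl("localhost", 55923, "default", "cliservice"));
    }
}
```

Failing fast here would turn the `Connection refused` seen in the trace into an error that names the unassigned port directly.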
The log line Started ThriftHttpCLIService in http mode on port 55923 path=/cliservice/* with 5...500 is logged in a background thread that is launched to start the server, so
serverPort = t.getPortNumber
may be executed before it's assigned and be still 0 at this point.
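The race described here can be sketched in isolation. The class below is a hypothetical stand-in, not the real `ThriftCLIService`: it binds in a background thread the way `new Thread(this).start()` does, so an immediate `getPortNumber()` may still see 0. The `CountDownLatch` wait is one possible remedy chosen for illustration; judging by the stack trace, the test suite itself retries with ScalaTest's `eventually` instead.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a service whose run() binds in a background thread.
class BackgroundBindSketch implements Runnable {
    private volatile int portNum = 0; // configured with 0, meaning "any free port"
    private volatile ServerSocket socket;
    private final CountDownLatch bindAttempted = new CountDownLatch(1);

    // Mirrors the start() pattern described above: returns before run() has bound.
    void start() {
        new Thread(this).start();
    }

    @Override
    public void run() {
        try {
            socket = new ServerSocket(portNum);
            portNum = socket.getLocalPort(); // the actual port exists only from here on
        } catch (IOException e) {
            // Leave portNum at 0; awaitPortNumber reports the failure.
        } finally {
            bindAttempted.countDown();
        }
    }

    // Racy: may still return 0 if called right after start().
    int getPortNumber() {
        return portNum;
    }

    // Waits for the background bind instead of reading the field immediately.
    int awaitPortNumber(long timeout, TimeUnit unit) throws InterruptedException {
        if (!bindAttempted.await(timeout, unit) || portNum == 0) {
            throw new IllegalStateException("Service did not bind a port in time");
        }
        return portNum;
    }

    void stop() throws IOException {
        if (socket != null) {
            socket.close();
        }
    }

    public static void main(String[] args) throws Exception {
        BackgroundBindSketch service = new BackgroundBindSketch();
        service.start();
        System.out.println("immediately after start(): " + service.getPortNumber());
        System.out.println("after waiting: " + service.awaitPortNumber(5, TimeUnit.SECONDS));
        service.stop();
    }
}
```

The first printed value depends on thread scheduling, which is exactly the nondeterminism the comment points at; the second is always a bound port or an exception.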
Ah I see
### What changes were proposed in this pull request?

When I was developing some stuff based on the `DeveloperAPI` `org.apache.spark.sql.hive.thriftserver.HiveThriftServer2#startWithContext`, I needed to pick the thrift port randomly to avoid races on ports. But `org.apache.hive.service.cli.thrift.ThriftCLIService#getPortNumber` does not return the actual bound port; it always returns 0.

The server log is not right either, so after starting the server it is hard to form the right JDBC connection string.

Indeed, `53742` is the right port.

In the PR, when the port is configured as 0, `portNum` is set to the port actually in use during the start process.

Also use 0 in thrift-related tests to avoid potential flakiness.

### Why are the changes needed?

1. fix API bug
2. reduce test flakiness

### Does this PR introduce any user-facing change?

Yes, `org.apache.hive.service.cli.thrift.ThriftCLIService#getPortNumber` will always give you the actual port when it is configured 0.

### How was this patch tested?

modified unit tests
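For readers unfamiliar with the "port 0" convention the description relies on, here is a minimal sketch using plain `java.net.ServerSocket` rather than the Thrift or Jetty classes touched by the PR. It shows that the configured value stays 0 and only the bound socket knows the real port, which is why the service has to copy it back into `portNum`. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical illustration; the PR itself reads the port from the Thrift/Jetty server objects.
class EphemeralPortSketch {
    /** Binds to port 0 and returns {configuredPort, actualPort}. */
    static int[] bindOnAnyFreePort() throws IOException {
        int configuredPort = 0; // 0 asks the OS for any free port
        try (ServerSocket socket = new ServerSocket(configuredPort)) {
            // The configured value is unchanged; the real port must be read from the socket.
            int actualPort = socket.getLocalPort();
            return new int[] {configuredPort, actualPort};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] ports = bindOnAnyFreePort();
        System.out.println("configured=" + ports[0] + " actual=" + ports[1]);
    }
}
```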