[SPARK-42690][CONNECT] Implement CSV/JSON parsing functions for Scala client #40332
@@ -0,0 +1 @@
```
LogicalRDD [c1#0, c2#0], false
```
Contributor
Oh, this makes me sad. Why are we using RDDs here?
Contributor
Author
spark/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala Lines 424 to 433 in 39a5512
spark/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala Lines 503 to 521 in 39a5512
spark/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala Lines 560 to 571 in 39a5512
Contributor
Author
On the server side, the input
@@ -0,0 +1,38 @@
```json
{
  "common": {
    "planId": "1"
  },
  "parse": {
    "input": {
      "common": {
        "planId": "0"
      },
      "localRelation": {
        "schema": "{\"type\":\"struct\",\"fields\":[{\"name\":\"value\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}"
      }
    },
    "format": "PARSE_FORMAT_CSV",
    "schema": {
      "struct": {
        "fields": [{
          "name": "c1",
          "dataType": {
            "string": {
            }
          },
          "nullable": true
        }, {
          "name": "c2",
          "dataType": {
            "integer": {
            }
          },
          "nullable": true
        }]
      }
    },
    "options": {
      "header": "true"
    }
  }
}
```
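The CSV plan above stacks a `parse` node (planId 1) on top of a local relation (planId 0) that holds a single nullable `value: string` column; the `parse` node carries the format, the user-specified schema and the reader options. A dependency-free sketch of that shape follows. The names `ParseFormat`, `Field`, `Parse` and `ParseSketch` are hypothetical and are not the Spark Connect protobuf classes, and the sequential plan-id numbering is an assumption read off this one plan.

```scala
// Hypothetical, dependency-free model of the `parse` relation in the plan
// above. These are NOT the Spark Connect protobuf classes; the field names
// just mirror the JSON keys ("input", "format", "schema", "options").
sealed abstract class ParseFormat(val protoName: String)
case object Csv extends ParseFormat("PARSE_FORMAT_CSV")
case object Json extends ParseFormat("PARSE_FORMAT_JSON")

// One column of the user-specified schema, e.g. c1: string, c2: integer.
final case class Field(name: String, dataType: String, nullable: Boolean = true)

final case class Parse(
    planId: Long,
    inputPlanId: Long, // plan id of the local relation holding `value: string`
    format: ParseFormat,
    schema: Option[Seq[Field]], // None: schema omitted, inference left to the server
    options: Map[String, String])

object ParseSketch {
  // Assumption: plan ids are assigned in creation order, so the input
  // relation gets `firstId` and the parse node on top of it gets firstId + 1
  // (0 and 1 in the plan above).
  def parse(
      firstId: Long,
      format: ParseFormat,
      schema: Option[Seq[Field]],
      options: Map[String, String]): Parse =
    Parse(firstId + 1, firstId, format, schema, options)
}
```

Calling `ParseSketch.parse(0L, Csv, Some(Seq(Field("c1", "string"), Field("c2", "integer"))), Map("header" -> "true"))` yields a value with the same format, schema and options as the CSV plan above.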
@@ -0,0 +1,38 @@
```json
{
  "common": {
    "planId": "1"
  },
  "parse": {
    "input": {
      "common": {
        "planId": "0"
      },
      "localRelation": {
        "schema": "{\"type\":\"struct\",\"fields\":[{\"name\":\"value\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}"
      }
    },
    "format": "PARSE_FORMAT_JSON",
    "schema": {
      "struct": {
        "fields": [{
          "name": "c1",
          "dataType": {
            "string": {
            }
          },
          "nullable": true
        }, {
          "name": "c2",
          "dataType": {
            "integer": {
            }
          },
          "nullable": true
        }]
      }
    },
    "options": {
      "allowsinglequotes": "true"
    }
  }
}
```
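In the JSON plan the option key appears as `allowsinglequotes`, all lower case. One plausible explanation, which is an assumption here, is that the client records reader options case-insensitively and serializes the normalized keys; Spark data source options are case-insensitive in general. The hypothetical `OptionsBuilder` below sketches that behavior in plain Scala. It is not the client's actual `DataFrameReader`.

```scala
import java.util.Locale

// Hypothetical builder, NOT the Spark Connect client API: it lower-cases
// option keys on insert, which would explain why `allowSingleQuotes`
// shows up as "allowsinglequotes" in the serialized "options" map.
final class OptionsBuilder private (private val opts: Map[String, String]) {
  // Later calls with the same key (in any casing) overwrite earlier ones.
  def option(key: String, value: String): OptionsBuilder =
    new OptionsBuilder(opts + (key.toLowerCase(Locale.ROOT) -> value))

  // Boolean convenience overload; stores "true" / "false".
  def option(key: String, value: Boolean): OptionsBuilder =
    option(key, value.toString)

  // Lookups are case-insensitive as well.
  def get(key: String): Option[String] = opts.get(key.toLowerCase(Locale.ROOT))

  // Keys exactly as they would be written into the message.
  def serialized: Map[String, String] = opts
}

object OptionsBuilder {
  val empty: OptionsBuilder = new OptionsBuilder(Map.empty)
}
```

With this design, `OptionsBuilder.empty.option("allowSingleQuotes", true).serialized` is `Map("allowsinglequotes" -> "true")`, matching the `options` entry in the plan above.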
spark/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
Lines 404 to 417 in 69dd20b
From the server-side code, `userSpecifiedSchema` is an `Option[StructType]` and defaults to `None`, so I think we can use this function without specifying `userSpecifiedSchema`? Or is my test case not the right scenario? @zhengruifeng
Makes sense, you are right.
Thanks ~
Should we add the user-provided schema to the message? Or always discard it?
Will `inferFromDataset` trigger a job? If so, I think we'd better skip it if possible.
Yes, I think you are right: we should add `schema` to the message if it exists. Thanks ~ I will update it later.
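The thread concludes that `schema` should be sent only when the user supplied one, partly because inference over the dataset (the `inferFromDataset` concern) may launch a job. A minimal sketch of that decision, assuming a hypothetical `Schema` type standing in for `StructType` and a hypothetical `infer` callback standing in for the inference pass, shows that inference runs only when no schema arrives.

```scala
// Hypothetical sketch: `Schema` stands in for StructType, and `infer`
// stands in for an expensive inference pass (possibly a Spark job).
final case class Schema(fieldNames: Seq[String])

object SchemaResolution {
  // Uses the user-specified schema when present; otherwise falls back to
  // inference. `infer` is a by-name parameter and Option.getOrElse takes
  // its default by name, so `infer` is evaluated only in the None case.
  def resolve(userSpecified: Option[Schema], infer: => Schema): Schema =
    userSpecified.getOrElse(infer)
}
```

Passing the fallback by name keeps the expensive path lazy: when a schema is present in the message, the inference callback is never evaluated at all.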