@gatorsmile
Member

Added Python test cases for the functions isnan, isnull, nanvl and json_tuple.

Fixed a bug in the function json_tuple.

@rxin, could you help me review my changes? Please let me know if anything is missing.

Thank you! Have a good Thanksgiving day!

Contributor

can you test both string names and columns? e.g.

df.select(isnan("a").alias("r1"), isnan(df.a).alias("r2")).collect()

Contributor

and do the same thing for the rest of the functions

@SparkQA

SparkQA commented Nov 25, 2015

Test build #46709 has finished for PR 9977 at commit d47f1e7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * public final class OneWayMessage implements RequestMessage

@SparkQA

SparkQA commented Nov 25, 2015

Test build #46713 has finished for PR 9977 at commit d6e29d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Nov 25, 2015

cc @davies for a final look.

The changes LGTM.

@gatorsmile
Member Author

In the next few days, I will look at the implementation of get_json_object. I suspect the original implementation has an issue regarding null values.

If you compare the results of get_json_object and json_tuple, one attribute is different. I will try to fix it in a separate PR. This should not be related to the Python interface, I think.

Thank you!

@gatorsmile
Member Author

Just did a quick check. I can confirm this is not caused by Python. I reproduced it using the Scala API.

@gatorsmile
Member Author

Narrowed down to the following code in jsonExpressions.scala:

          val output = new ByteArrayOutputStream()
          val matched = Utils.tryWithResource(
            jsonFactory.createGenerator(output, JsonEncoding.UTF8)) { generator =>
            parser.nextToken()
            evaluatePath(parser, generator, RawStyle, parsed.get)
          }

Currently, our parser produces the same output for the following two cases: a JSON null literal and the string "null". Both results are "null":

    val tuple: Seq[(String, String)] = ("5", """{"f1": null}""") :: Nil
    val df: DataFrame = tuple.toDF("key", "jstring")
    val res = df.select(functions.get_json_object($"jstring", "$.f1")).collect()
    val tuple2: Seq[(String, String)] = ("5", """{"f1": "null"}""") :: Nil
    val df2: DataFrame = tuple2.toDF("key", "jstring")
    val res3 = df2.select(functions.get_json_object($"jstring", "$.f1")).collect()
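
To make the collision easier to see outside Spark, here is a minimal sketch in plain Python. It is not the Spark implementation; it only assumes that a raw-style writer emits string values without their quotes and emits everything else as JSON text. The helper names (`render_raw`, `extract_raw`, `extract_null_aware`) are hypothetical and exist only for this illustration.

```python
import json


def render_raw(value):
    """Render a parsed JSON value in a 'raw' style (an assumption for this
    sketch): strings lose their quotes, everything else is emitted as JSON."""
    if isinstance(value, str):
        return value
    return json.dumps(value)


def extract_raw(doc, field):
    """Hypothetical extractor that always renders the matched value as text."""
    return render_raw(json.loads(doc)[field])


def extract_null_aware(doc, field):
    """Hypothetical variant that maps a JSON null literal to None first."""
    value = json.loads(doc)[field]
    return None if value is None else render_raw(value)


null_literal = '{"f1": null}'
null_string = '{"f1": "null"}'

# With the raw renderer, both inputs produce the same four characters.
print(extract_raw(null_literal, "f1") == extract_raw(null_string, "f1"))

# Checking for the null literal before rendering keeps the two cases apart.
print(extract_null_aware(null_literal, "f1"))
print(extract_null_aware(null_string, "f1"))
```

Checking for the null literal before writing is only one possible way to separate the two cases; whether that is the right behavior for get_json_object is exactly the open question here.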

@gatorsmile
Member Author

Found a discussion about this issue:

http://www.scriptscoop.net/t/1a9222820510/java-how-to-tell-jackson-to-deserialize-null-string-to-null-literal.html

Please let me know what I should do next. Thanks! @rxin @davies @marmbrus @cloud-fan

Contributor

nit: I think one simple case should be enough for the Python tests; other corner cases should be tested in Scala.

The Python doc tests will be part of the API doc, so they should be easy to read.

Member Author

Sure, will do. I will move the get_json_object test case to the Scala test file and simplify the existing test cases for get_json_object and json_tuple. Thanks!

@rxin
Contributor

rxin commented Nov 26, 2015

I'd just simplify the test case as Davies suggested, and then merge this in. In parallel you can work on a patch to fix whatever bugs you find.

@SparkQA

SparkQA commented Nov 26, 2015

Test build #46750 has finished for PR 9977 at commit b83525a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Nov 26, 2015

Thanks - I'm going to merge this.

asfgit pushed a commit that referenced this pull request Nov 26, 2015
Added Python test cases for the function `isnan`, `isnull`, `nanvl` and `json_tuple`.

Fixed a bug in the function `json_tuple`

rxin , could you help me review my changes? Please let me know anything is missing.

Thank you! Have a good Thanksgiving day!

Author: gatorsmile <[email protected]>

Closes #9977 from gatorsmile/json_tuple.

(cherry picked from commit 068b643)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in 068b643 Nov 26, 2015
@gatorsmile
Member Author

Thank you for your help! @rxin @davies

Just let me know if you need me to do any JIRA. Have a good holiday!
