@maropu maropu commented Oct 15, 2018

What changes were proposed in this pull request?

Literal.value should hold a value corresponding to its dataType. This PR adds code to verify this and fixes the existing tests accordingly.

How was this patch tested?

Modified the existing tests.

@maropu maropu changed the title [SPARK-25734][SQL} [SPARK-25734][SQL] Literal should have a value corresponding to dataType Oct 15, 2018
@cloud-fan
Contributor

LGTM

case StringType => v.isInstanceOf[UTF8String]
case _: StructType => v.isInstanceOf[InternalRow]
case _: ArrayType => v.isInstanceOf[ArrayData]
case _: MapType => v.isInstanceOf[MapData]
Member

Should we validate recursively for StructType, ArrayType, and MapType?

Contributor

ah good point!

Member Author

yea, good catch ....!
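
To illustrate the concern in this thread, here is a minimal sketch of the difference between a shallow type check and a recursive one. The types below (`IntType`, `ArrType`, `StructTyp`, `Row`) are hypothetical, simplified stand-ins and not Spark's actual classes; the sketch only assumes that nested values can be walked alongside their declared types.

```scala
object RecursiveValidationSketch {
  // Hypothetical, simplified stand-ins for a DataType hierarchy.
  sealed trait DataType
  case object IntType extends DataType
  case object StrType extends DataType
  final case class ArrType(elementType: DataType) extends DataType
  final case class StructTyp(fieldTypes: Seq[DataType]) extends DataType

  // Hypothetical stand-in for a row: just a sequence of field values.
  final case class Row(values: Seq[Any])

  // Shallow check: only looks at the outermost container class.
  def shallow(v: Any, dt: DataType): Boolean = dt match {
    case IntType      => v.isInstanceOf[Int]
    case StrType      => v.isInstanceOf[String]
    case _: ArrType   => v.isInstanceOf[Seq[_]]
    case _: StructTyp => v.isInstanceOf[Row]
  }

  // Recursive check: descends into arrays and structs.
  def deep(v: Any, dt: DataType): Boolean = dt match {
    case ArrType(et) => v match {
      case s: Seq[_] => s.forall(deep(_, et))
      case _         => false
    }
    case StructTyp(fts) => v match {
      case Row(vs) =>
        vs.length == fts.length && vs.zip(fts).forall { case (x, t) => deep(x, t) }
      case _ => false
    }
    case _ => shallow(v, dt)
  }
}
```

With a declared type of struct<int, array<int>> and a value whose inner array holds a string, `shallow` accepts the value while `deep` rejects it, which is the gap the comment points at.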

SparkQA commented Oct 15, 2018

Test build #97384 has finished for PR 22724 at commit 9115d26.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • s\"but $

}
require(v == null || doValidate(v, dataType),
  s"Literal must have a corresponding value to ${dataType.catalogString}, " +
  s"but ${if (v != null) s"class ${Utils.getSimpleName(v.getClass)}" else "null"} found.")
Member

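The "but null found" case cannot happen logically: the requirement already holds when v == null, so the message is only built for a non-null v.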

Member Author

oh, yes. I'll update.
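
A small sketch of why the null branch is dead code. Scala's `require` takes its message by name, so the message is only evaluated when the condition fails, and the condition cannot fail for a null value. `isValid` is a hypothetical stand-in for `doValidate`, and the sketch uses `getClass.getSimpleName` instead of Spark's `Utils.getSimpleName`.

```scala
object RequireMessageSketch {
  // Hypothetical stand-in for doValidate: only Int values are accepted.
  def isValid(v: Any): Boolean = v.isInstanceOf[Int]

  def check(v: Any): Unit =
    // The message is a by-name argument: it is built only if the
    // condition is false, which implies v != null here.
    require(v == null || isValid(v),
      "Literal must have a corresponding value to int, " +
        s"but class ${v.getClass.getSimpleName} found.")
}
```

Calling `check(null)` passes without ever touching `v.getClass`, so the message needs no null handling.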

SparkQA commented Oct 16, 2018

Test build #97419 has finished for PR 22724 at commit b00f4bb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • s\"but class $

case BinaryType => v.isInstanceOf[Array[Byte]]
case StringType => v.isInstanceOf[UTF8String]
case st: StructType =>
  v.isInstanceOf[GenericInternalRow] && {
Contributor

This is definitely not true: a struct value is not always a GenericInternalRow. Can we do the following instead?

v.isInstanceOf[InternalRow] && {
  val row = v.asInstanceOf[InternalRow]
  // forall (not foreach) so that each field's result feeds the Boolean outcome
  st.fields.map(_.dataType).zipWithIndex.forall {
    case (dt, i) => doValidate(row.get(i, dt), dt)
  }
}

Member Author

ah, yes. ok, I'll fix.
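
For the per-field checks to contribute to the overall result, they need to be combined into a Boolean, for example with `forall`; a `foreach` would run the checks and discard what they return. Below is a minimal sketch of that shape. `fieldOk` and the string type names are hypothetical stand-ins for `doValidate` and Spark's data types.

```scala
object FieldwiseValidationSketch {
  // Hypothetical per-field check standing in for doValidate(row.get(i, dt), dt).
  def fieldOk(value: Any, typeName: String): Boolean = typeName match {
    case "int"    => value.isInstanceOf[Int]
    case "string" => value.isInstanceOf[String]
    case _        => false
  }

  // forall stops at the first failing field and returns a Boolean.
  def validateRow(values: Seq[Any], typeNames: Seq[String]): Boolean =
    values.length == typeNames.length &&
      typeNames.zipWithIndex.forall { case (t, i) => fieldOk(values(i), t) }
}
```

The length check is an extra guard added for this sketch only, so that indexing into `values` is safe.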

doValidate(map.keyArray.array.head, mt.keyType) &&
doValidate(map.valueArray.array.head, mt.valueType)
}
}
Member

I'm wondering whether we need to check all the elements for ArrayType and MapType, not just the first one.

Member Author

Checking every element seems expensive, so the current head-only check is OK to me.
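
The trade-off discussed here can be sketched as follows: a head-only check is constant time but can miss a bad element later in the array, while a full check is linear in the number of elements. `isInt` is a hypothetical element check used only for illustration.

```scala
object HeadVsAllSketch {
  // Hypothetical element check.
  def isInt(v: Any): Boolean = v.isInstanceOf[Int]

  // Head-only: O(1); an empty sequence is trivially valid.
  def headOnly(xs: Seq[Any]): Boolean = xs.isEmpty || isInt(xs.head)

  // Full: O(n); inspects every element.
  def all(xs: Seq[Any]): Boolean = xs.forall(isInt)
}
```

For a sequence such as `Seq(1, "two", 3)`, `headOnly` accepts and `all` rejects, which is the price of the cheaper check.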

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM too if the tests pass

}
case ObjectType(cls) => cls.isInstance(v)
case udt: UserDefinedType[_] => doValidate(v, udt.sqlType)
case _ => false
Member

@ueshin ueshin Oct 16, 2018

Do we need to add a NullType case?
Never mind, it's usually not needed.

SparkQA commented Oct 16, 2018

Test build #97427 has finished for PR 22724 at commit a30e9ce.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case BinaryType => v.isInstanceOf[Array[Byte]]
case StringType => v.isInstanceOf[UTF8String]
case st: StructType =>
  v.isInstanceOf[InternalRow] && {
Contributor

can we do the same for array and map?

Member Author

done

case mt: MapType =>
  v.isInstanceOf[ArrayBasedMapData] && {
    val map = v.asInstanceOf[ArrayBasedMapData]
    map.numElements() == 0 || {
Contributor

We don't need this. The array validation already considers numElements.

Member Author

Is this what you're suggesting?

      case mt: MapType =>
        v.isInstanceOf[MapData] && {
          val map = v.asInstanceOf[MapData]
          doValidate(map.keyArray(), ArrayType(mt.keyType)) &&
            doValidate(map.valueArray(), ArrayType(mt.valueType))
        }

Contributor

yup

Member Author

ok

Member Author

updated.
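
The idea agreed on in this thread, validating a map by validating its key array and value array, can be sketched like this. `SimpleMap` and `validateArray` are hypothetical stand-ins, not Spark's `MapData` or `doValidate`; the sketch assumes the array check already treats an empty array as valid, so the map needs no separate emptiness check.

```scala
object MapViaArraysSketch {
  // Hypothetical stand-in for map data: parallel key and value arrays.
  final case class SimpleMap(keys: Seq[Any], values: Seq[Any])

  // Array validation already handles the empty case.
  def validateArray(xs: Seq[Any], elemOk: Any => Boolean): Boolean =
    xs.isEmpty || elemOk(xs.head)

  // Map validation delegates to the array validation for keys and values.
  def validateMap(m: SimpleMap, keyOk: Any => Boolean, valueOk: Any => Boolean): Boolean =
    validateArray(m.keys, keyOk) && validateArray(m.values, valueOk)
}
```

Because the emptiness rule lives in one place, an empty map is accepted without any map-specific branch.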

SparkQA commented Oct 16, 2018

Test build #97431 has finished for PR 22724 at commit 56b4778.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM pending jenkins

SparkQA commented Oct 16, 2018

Test build #97432 has finished for PR 22724 at commit 4df5abd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Oct 16, 2018

I'm looking into the cause of the test failures...

SparkQA commented Oct 16, 2018

Test build #97433 has finished for PR 22724 at commit 1c042f3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Oct 16, 2018

Test build #97439 has finished for PR 22724 at commit 60a7793.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Oct 16, 2018

Test build #97443 has finished for PR 22724 at commit 77f65f9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor

LGTM, but this is a breaking change, so I'd suggest doing it only for 3.0 and adding a note to the migration guide. WDYT?

@maropu
Member Author

maropu commented Oct 16, 2018

Yeah, it's OK to merge this into v3.0 only. But do we need to update the guide?

@cloud-fan
Contributor

Literal is an internal API, and AFAIK end-users can't construct an invalid Literal with public APIs. If they can, then it's a bug and we have a problem...

@mgaido91
Contributor

Yes, you're right. So we don't need any migration note. But what if a user uses Literal directly? They shouldn't, but we cannot rule out that this is happening. So I'd still target this for 3.0 only.

@cloud-fan
Contributor

yea +1 on 3.0 only, this is kind of a developer API, advanced users may use it.

SparkQA commented Oct 16, 2018

Test build #97451 has finished for PR 22724 at commit a2f0288.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Oct 16, 2018

Test build #97457 has finished for PR 22724 at commit 04a535d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in a9f685b Oct 17, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?
`Literal.value` should hold a value corresponding to its `dataType`. This PR adds code to verify this and fixes the existing tests accordingly.

## How was this patch tested?
Modified the existing tests.

Closes apache#22724 from maropu/SPARK-25734.

Authored-by: Takeshi Yamamuro
Signed-off-by: Wenchen Fan