add isin to TypedColumn with restriction to primitive types #254
Codecov Report
@@ Coverage Diff @@
## master #254 +/- ##
==========================================
+ Coverage 96.57% 96.57% +<.01%
==========================================
Files 51 51
Lines 876 877 +1
Branches 10 14 +4
==========================================
+ Hits 846 847 +1
Misses 30 30
Continue to review full report at Codecov.
9a1b907 to a6bf378
val ds = TypedDataset.create(data)
val res = ds.filter(ds('a).isin(values:_*)).collect().run().toVector
res ?= data.filter(d => values.contains(d.a))
})
Could you parametrize this test over the type, rather than only Int? It would be interesting to see how this works for other types (SQLDate, String, X1[String], ...).
Good point! Will do it this evening.
Surprisingly, it fails for SQLDate. We get Cause: java.lang.RuntimeException: Unsupported literal type class frameless.SQLDate SQLDate. I guess we should limit isin to primitive types? Or @OlivierBlanvillain, do you have an idea for handling case classes?
Sounds like it. What about other user-defined case classes, such as X1[String] or X2[String, Int]?
Same issue since SQLDate is also a case class:
Cause: java.lang.RuntimeException: Unsupported literal type class frameless.X1 X1(Ȗ沭䁎Ꞛ蟽啮琏銉綿鋪鎸㭝ꜰﮤ菝˔赘䑕⏐ꖃ댬ự뷴뒎ﺱ㟯천䪼汹题켭)
[info] at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77)
[info] at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
[info] at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
[info] at scala.util.Try.getOrElse(Try.scala:79)
[info] at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:162)
[info] at org.apache.spark.sql.functions$.typedLit(functions.scala:112)
[info] at org.apache.spark.sql.functions$.lit(functions.scala:95)
[info] at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:779)
[info] at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:779)
[info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info] at scala.collection.Iterator$class.foreach(Iterator.scala:891)
[info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
[info] at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
[info] at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
[info] at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
[info] at scala.collection.AbstractTraversable.map(Traversable.scala:104)
[info] at org.apache.spark.sql.Column.isin(Column.scala:779)
It boils down to Spark calling typedLit: since the value isn't a Column or a Symbol, Spark tries to create a literal for it, which fails here.
@OlivierBlanvillain Is it because frameless serializes literals differently than Spark?
Also, the same code fails in plain Spark, so I am leaning towards limiting isin to primitive types.
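To illustrate what "limiting isin to primitive types" could look like at the type level, here is a minimal sketch in plain Scala with no Spark or frameless dependency. The names IsinAllowed and Col are hypothetical stand-ins invented for this illustration, and the set of allowed types is an assumption, not the list used in the PR:

```scala
import scala.annotation.implicitNotFound

// Hypothetical evidence type class: an instance exists only for types
// that are allowed to be used with isin.
@implicitNotFound("Cannot do isin operation on columns of type ${A}.")
trait IsinAllowed[A]

object IsinAllowed {
  private def instance[A]: IsinAllowed[A] = new IsinAllowed[A] {}

  // Assumed set of supported types, for illustration only.
  implicit val intAllowed: IsinAllowed[Int] = instance
  implicit val longAllowed: IsinAllowed[Long] = instance
  implicit val doubleAllowed: IsinAllowed[Double] = instance
  implicit val stringAllowed: IsinAllowed[String] = instance
}

// Toy in-memory column, standing in for a typed column.
final case class Col[A](values: Vector[A]) {
  // Compiles only when evidence IsinAllowed[A] is in implicit scope.
  def isin(candidates: A*)(implicit ev: IsinAllowed[A]): Vector[Boolean] =
    values.map(v => candidates.contains(v))
}
```

With this shape, Col(Vector(1, 2, 3)).isin(1, 3) compiles, while calling isin on a Col of a case class without an instance is rejected at compile time with the implicitNotFound message, instead of failing at runtime with "Unsupported literal type".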
LGTM otherwise! Could you squash it in a single commit?
typed(self.untyped >= lit(u)(self.uencoder).untyped)

/**
 * Is in.
Returns true if the value of this column is contained in one of the arguments.
 * df.select( df('age).isin(15, 20, 30) )
 * }}}
 *
 * @param u are constants of the same type
@param values
cdf8a4f to de672f8
@OlivierBlanvillain Done
Thanks!
@implicitNotFound("Cannot do isin operation on columns of type ${A}.")
trait CatalystIsin[A]

object CatalystIsin {
Hello, aren’t these the same as anything that is comparable? We already have a typeclass for those.
@imarios Which typeclass are you referring to? Also, CatalystIsin should include Array types, but I didn't manage to add that.
Hi @ayoub-benali, that would be CatalystOrdered. I think anything that can be ordered (compared) can also be checked during an "isin".
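As a rough sketch of that suggestion, isin could require the existing ordering evidence instead of a dedicated type class. The example below is plain Scala; OrderedLike is a hypothetical stand-in for an "can be ordered" type class and is not the real CatalystOrdered definition, and its instances are assumptions made for illustration:

```scala
object IsinViaOrdering {
  // Hypothetical stand-in for an existing "can be ordered" type class.
  trait OrderedLike[A]

  object OrderedLike {
    implicit val intOrdered: OrderedLike[Int] = new OrderedLike[Int] {}
    implicit val stringOrdered: OrderedLike[String] = new OrderedLike[String] {}
  }

  // isin reuses the ordering evidence rather than introducing a new type class:
  // any type that can be compared can also be checked for membership.
  def isin[A: OrderedLike](value: A, candidates: A*): Boolean =
    candidates.contains(value)
}
```

One trade-off of this design: a type that should support isin but has no ordering instance (the Array case mentioned above) would be excluded unless a separate instance is added for it.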
helps with #164