Skip to content

Conversation

@ayoub-benali
Copy link
Contributor

helps with #164

@codecov-io
Copy link

codecov-io commented Feb 15, 2018

Codecov Report

Merging #254 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #254      +/-   ##
==========================================
+ Coverage   96.57%   96.57%   +<.01%     
==========================================
  Files          51       51              
  Lines         876      877       +1     
  Branches       10       14       +4     
==========================================
+ Hits          846      847       +1     
  Misses         30       30
Impacted Files Coverage Δ
dataset/src/main/scala/frameless/TypedColumn.scala 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 53fcf79...de672f8. Read the comment docs.

val ds = TypedDataset.create(data)
val res = ds.filter(ds('a).isin(values:_*)).collect().run().toVector
res ?= data.filter(d => values.contains(d.a))
})
Copy link
Contributor

@OlivierBlanvillain OlivierBlanvillain Feb 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you parametrize this test over the type Int? It would be interesting to see how this works for other types (SQLDate, String, X1[String], ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! will do it this evening

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprisingly if fails for SQLDate . we get Cause: java.lang.RuntimeException: Unsupported literal type class frameless.SQLDate SQLDate. I guess we should limit isin to primitive types ? Or @OlivierBlanvillain do you have an idea on handling case classes ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like it. What about other used defined case classes, such as X1[String], X2[String, Int],?

Copy link
Contributor Author

@ayoub-benali ayoub-benali Feb 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue since SQLDate is also a case class:

Cause: java.lang.RuntimeException: Unsupported literal type class frameless.X1 X1(Ȗ沭䁎Ꞛ蟽啮琏銉綿鋪鎸㭝ꜰﮤ菝˔赘䑕⏐៼ꖃ댬ự뷴뒎ﺱ㟯천䪼汹题켭)
[info]   at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77)
[info]   at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
[info]   at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
[info]   at scala.util.Try.getOrElse(Try.scala:79)
[info]   at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:162)
[info]   at org.apache.spark.sql.functions$.typedLit(functions.scala:112)
[info]   at org.apache.spark.sql.functions$.lit(functions.scala:95)
[info]   at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:779)
[info]   at org.apache.spark.sql.Column$$anonfun$isin$1.apply(Column.scala:779)
[info]   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info]   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info]   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
[info]   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
[info]   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
[info]   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
[info]   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
[info]   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
[info]   at org.apache.spark.sql.Column.isin(Column.scala:779)

it boils down to spark calling typedLit and since the value isn't a Column or a Symbol, spark tries to create a literal for it which fails here

@OlivierBlanvillain Is it because frameless serialize literals differently than spark ?
Also, same code fails in plain spark so I am leaning towards limiting isin to primitive types

@ayoub-benali ayoub-benali changed the title add isin to TypedColumn add isin to TypedColumn with restriction to primitive types Feb 21, 2018
Copy link
Contributor

@OlivierBlanvillain OlivierBlanvillain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise! Could you squash it in a single commit?

typed(self.untyped >= lit(u)(self.uencoder).untyped)

/**
* Is in.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returns true if the value of this column is contained in of the arguments.

* df.select( df('age).isin(15, 20, 30) )
* }}}
*
* @param u are constants of the same type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@param values

@ayoub-benali
Copy link
Contributor Author

@OlivierBlanvillain Done

@OlivierBlanvillain OlivierBlanvillain merged commit e893ebe into typelevel:master Feb 25, 2018
@OlivierBlanvillain
Copy link
Contributor

Thanks!

@implicitNotFound("Cannot do isin operation on columns of type ${A}.")
trait CatalystIsin[A]

object CatalystIsin {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, aren’t these the same as anything that is comparable? We already have a typeclass for those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imarios which typeclass are you referring to ? also CatalystIsin should contain Array type but didn't manage to add it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ayoub-benali, that would be CatalystOrdered. I think anything that can be ordered (compared) can also be checked during an "isin".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants