-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-18230][MLLib]Throw a better exception, if the user or product doesn't exist #21740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b0c31a7 to
b3ef34f
Compare
jianran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the IllegalArgumentException better than SparkException?
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also see the JIRA for a discussion. The main reason is that NoSuchElementException doesn't really make sense semantically and has no useful message attached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary to assert about the exact message string. Maybe assert it contains the user ID. But just checking for the exception also seems close enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is you now look up the user and product twice. Instead, the callers should probably check whether userFeatures.lookup(user) is empty before calling .head and otherwise throw the exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have renamed the method to 'validateAndGetUser', where it check, whether the user exist or not and it returns the corresponding user feature. Similarly for the product also.
Please let me know if anymore changes required.
d7315c4 to
b07e07e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm new to the code base, I understand Either and Option isn't used a lot in public APIs in Spark but shouldn't it be annotated that the functions throws a certain type of exception (being more explicit with the Exception) ?
like @throws(classOf[IllegalArgumentException]) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning Option could be reasonable, but, I think the caller would end up giving up with an exception anyway. There isn't an obviously better thing to do.
Exceptions all behave as if unchecked in Scala and don't get expressed in method signatures in the byte code. @throws is really for Java compatibility where it's important for the Java language to express that a checked exception is thrown (it makes a compile-time difference in Java).
That's not necessary here, but scaladoc'ing exceptions like this is indeed a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation Sean :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the pattern match? .lookup should already be known to return a Seq of the type of the RDD's values which is already known to be Array[Double].
The name of the method isn't really accurate either; it gets a user vector, not user.
val vec = userFeatures.lookup(user)
require(vec.nonEmpty, ...)
vec.head
How about one method instead of two, that takes the RDD as an arg? might cut down on the code expansion here. I'm on the fence about whether this warrants a new method since there are only two call sites, but it's reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment.
I have removed both the methods. Because each method has to give distinct exception, like "user not found" in the first method and "product not found" in the second method. So, it is better to remove both the methods, instead of making one.
I have added the validation code inside the methods such as predict, recommendProduct etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen Please let me know if anymore changes required.
b07e07e to
01d43c8
Compare
|
Test build #4211 has finished for PR 21740 at commit
|
|
Hi @srowen. The build has passed. |
|
Merged to master. Your JIRA handle is "shahid" right? |
|
Thanks @srowen. yes, my JIRA handle is "shahid". |
When invoking MatrixFactorizationModel.recommendProducts(Int, Int) with a non-existing user, a java.util.NoSuchElementException is thrown:
What changes were proposed in this pull request?
Throw a better exception, like "user-id/product-id doesn't found in the model", for a non-existent user/product
How was this patch tested?
Added UT