[SPARK-47088] Using BigDecimal to do the resource calculation #45157
Conversation
Hi @srowen, could you help review this PR? Thanks very much.
srowen left a comment:
To take advantage of BigDecimal, you need to avoid floats entirely. You want to go from the string the user supplies to BigDecimal; otherwise you're supplying a float to BigDecimal and still have the precision issues. Unless there's a reason that can't work, that's the right solution, and most of this change goes away. There should be no "ONE_ENTIRE_RESOURCE" etc.
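To illustrate the difference, here is a minimal Scala sketch using java.math.BigDecimal directly (the 0.1 value is just an example, not taken from the PR): constructing from a double keeps the binary round-off, while constructing from the user's string does not.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Constructed from a double: the binary round-off is already baked in.
val fromDouble = new JBigDecimal(0.1)
// 0.1000000000000000055511151231257827021181583404541015625

// Constructed from the user's string: the exact decimal value is kept.
val fromString = new JBigDecimal("0.1")
// 0.1
```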
Hi @srowen, the public API def resource(resourceName: String, amount: Double): this.type in TaskResourceRequests requires a Double, and TaskResourceRequest also requires a Double. So if we went from the string, that would probably break the APIs.
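For reference, a small sketch of that entry point as it is used today (a sketch based on the TaskResourceRequests signature quoted above):

```scala
import org.apache.spark.resource.TaskResourceRequests

// The amount parameter is a Double, so by the time Spark sees the request
// the value is already a binary floating-point approximation.
val treqs = new TaskResourceRequests().resource("gpu", 1.0 / 9.0)
```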
Darn. Yeah, I don't think there is a way to do better without changing the API, and that's too much change for what isn't typically an issue in practice. I can't think of a way that keeps the double API but doesn't have a round-off error in some case. Let's leave this.
Sorry, I've been OOO and busy with other things. I didn't fully follow the discussion in #44690. Can you two please summarize exactly why we are changing this? Is there actually some case the previous method did not handle? Yes, I understand there could be some precision loss when dealing with floats/doubles, but I think that is going to come from the API. Is this purely for readability at this point? I think this goes from storing Longs to storing BigDecimals, which is not going to be more memory efficient, and I thought BigDecimal was fairly slow at operations, but maybe Java has made that better. There obviously was some conversion going on before with toInternalResource, so there is some overhead there if this makes things more efficient. Doing some very poor man's testing in a Scala shell, creating a BigDecimal takes around 30,000 ns and doing the conversions takes around 6,000 ns.
The issue is roughly: let's say you want to schedule a task that needs 1/9th of a GPU, and so you request a resource amount of 1.0/9.0. I think trying to do integer math after floating-point still has the issue; the value is already 'wrong' when it comes in as a double in the API. It may in practice not come up a lot, as these values are often set as a Spark config, as a string, and a user is going to write "0.111" or some prefix of what is notionally a big repeating decimal, so they will ask for less than 1/9th (say) of a GPU and schedule 9 tasks as desired, leaving Spark thinking there's a tiny bit of GPU available when there isn't, but certainly nothing like 1/9th. It's still a bit ugly, and yeah, the API kind of needs to take a different input type to really address this, but maybe not worth the breakage.
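A rough sketch of that failure mode, assuming the 1e16 internal-unit scheme described in the PR description below (the constant and conversion here are illustrative, not the PR's exact code):

```scala
// One whole GPU tracked as 1e16 internal units (illustrative constant).
val ONE_ENTIRE_RESOURCE = 10000000000000000L

val requested = 1.0 / 9.0                                 // already a rounded double
val perTask = (requested * ONE_ENTIRE_RESOURCE).toLong    // 1111111111111111

val usedByNineTasks = perTask * 9                         // 9999999999999999
val leftover = ONE_ENTIRE_RESOURCE - usedByNineTasks      // 1 internal unit
// Spark would think a tiny sliver of the GPU is still free,
// even though nothing useful remains.
```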
How does a user specify "1.0/9.0" in the configs? I tend to agree with your second statement that I don't see this as a problem in practice, so my question is why we are changing it to be slower and use more memory if it's not solving a problem.
It would have to be done programmatically. Imagine, for example, that I generate a config with value 1.0/9.0, which works out to the string "0.1111111111111111", and when parsed as a double, that's slightly bigger than 1/9. I think there is a real case here, just possibly rare; that's not unimaginable though. (Performance is not the issue here; these calculations are trivial even in BigDecimal.)
Sorry, I'm not following what this PR changes from the previous implementation. The method toInternalResource handles 1.0/9.0 as well, correct? That is perhaps what I'm missing: what case does that not work for? Maybe 1/9.0 works out but you are saying others won't? Say 1/9.0 comes out a bit bigger, like 0.111111111111111112. In this PR, the key line seems to be (the round down):
But as we have discussed, the API takes the double, so if someone is programmatically passing something in, it's going to be computed on their side, not by Spark.
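For context, the kind of round-down conversion under discussion would look roughly like this (a sketch under assumptions; the names, scale, and rounding mode are illustrative, not the PR's exact line):

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Illustrative: convert a requested fraction into whole internal units,
// rounding down so a request never claims more than was asked for.
val ONE_ENTIRE_RESOURCE = JBigDecimal.valueOf(10000000000000000L)

def toInternalResource(amount: Double): Long =
  JBigDecimal.valueOf(amount)
    .multiply(ONE_ENTIRE_RESOURCE)
    .setScale(0, RoundingMode.DOWN)
    .longValueExact()
```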
I think the discussion on this got lost after a force-push and referred to an earlier implementation, but my assertion is: if you start from a double, then no matter how you represent it with integers or BigDecimal downstream, there will be some problem of this form, yes. In some cases your double representation is too small and it works; in some cases it isn't. You should be able to find counterexamples, indeed. The representation would have to be exact from start to finish to really solve this, and that seems to need an API change.
OK, so we should just close this then; if it becomes an issue we could revisit the API.
Hi @srowen, may I ask what your opinion is on @tgravescs's comment?
I think I agree with it. I don't think there's a 'real' fix without changing the API, and I'm not sure it's worth it for the corner case here.
What changes were proposed in this pull request?
This PR utilizes BigDecimal to calculate the GPU resources.
Why are the changes needed?
To prevent precision errors, the current method of calculating GPU resources multiplies amounts by 1E16 to convert doubles to Longs, and converts the Longs back to doubles when needed. This approach introduces redundant code, especially in the tests.
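A hedged sketch of the existing long-based scheme this refers to (the helper names follow Spark's ResourceAmountUtils, but treat the exact code as an assumption):

```scala
object ResourceAmountUtils {
  // One whole resource is tracked as 1e16 internal units.
  final val ONE_ENTIRE_RESOURCE: Long = 10000000000000000L

  // Double -> Long when a request comes in ...
  def toInternalResource(amount: Double): Long =
    (amount * ONE_ENTIRE_RESOURCE).toLong

  // ... and Long -> Double when an amount is reported back out.
  def toFractionalResource(amount: Long): Double =
    amount.toDouble / ONE_ENTIRE_RESOURCE
}
```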
Does this PR introduce any user-facing change?
No
How was this patch tested?
Pass the CIs
Was this patch authored or co-authored using generative AI tooling?
No