-
-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to reflect new changes to the t-digest algorithm #20
Comments
Agreed. I'd have to give the paper a read to see what changes were made since I implemented this. However, I suggest you start with adding the existing implementation to Dask, and update the algorithm here afterwards. The algorithm as written works, and seeing it in use will likely influence what changes (if any) need to be made.
Yes. The paper is tracked in the github repo as well: https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf.
Yes, and kind of. The tdigest license (apache) is included in our license file with a note in the relevant files that the algorithm is borrowed from the tdigest repo. However, the fastest way to do things in java is not the same as in C - you will likely have to make changes rather than doing a direct port. I personally have no bandwidth right now to help with this project, but would accept whatever changes are made, provided they come with speed/accuracy improvements and proper tests. |
It makes sense to do the Dask implementation first. The speed improvement is nice to have but probably not essential, the current version should work alright. Thanks for the answers. My C is actually better than my Java so hopefully it's not too complicated to convert it. I should have time to do this after I have the Dask implementation ready so I'll just put this on my pile. |
Lately there have been many changes to the t-digest that improve both accuracy and speed of the algorithm. Having the changes incorporated into this module will be very useful when this implementation of t-digest is used to implement the user facing
quantile
function in Dask.There are couple of questions open still before the work can begin:
The text was updated successfully, but these errors were encountered: