
Index out of bounds in MergingDigest #107

Open

death opened this issue May 6, 2018 · 27 comments
death commented May 6, 2018

Hi,

The following variant of the testSmallCountQuantile test throws a java.lang.ArrayIndexOutOfBoundsException. Dropping the last element makes the test pass.

@Test
public void testSmallCountQuantile2() {
    List<Double> data = Lists.newArrayList(3.0, 3.1, 3.0, 3.0,
                                           3.0, 3.0, 3.0, 3.0,
                                           3.0, 3.0, 3.0, 3.0,
                                           3.5, 3.3, 3.4, 3.5,
                                           3.5, 4.0, 2.95, 3.18,
                                           180.0, 3.0, 3.0, 3.0,
                                           3.0);
    TDigest td = new MergingDigest(1);
    for (Double datum : data) {
        td.add(datum);
    }
}
dariomx commented Sep 5, 2018

This may be out of date; I just ran it against the current master and it passes.

tdunning commented Sep 6, 2018 via email

rnorth commented Nov 26, 2018

I'm not sure I'd say this is out of date, per se. The above code snippet runs OK with the current master version. However, the latest release version (AFAICT 3.2, released August 2017) will crash with these inputs, and we've seen similar failures in production systems with different inputs.

Is there a plan to release a new version of the library to maven central? It's a really nice library, so we're keen to continue using it and would prefer not to have to run our own internal fork!

tdunning commented Nov 26, 2018 via email

@tdunning (Owner)

A compression factor of 1 is pretty extreme, isn't it? I wouldn't understand the use of any value less than, say, 10 or 20.
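
(For context, a minimal sketch of more typical usage, with a conventional compression of 100 rather than the extreme value of 1 in the test above; the MergingDigest constructor and the add/quantile calls are the library's public API, everything else here is illustrative.)

import com.tdunning.math.stats.MergingDigest;
import com.tdunning.math.stats.TDigest;

// Illustrative sketch only: a MergingDigest at a conventional compression of 100.
public class TypicalUsage {
    public static void main(String[] args) {
        TDigest td = new MergingDigest(100);   // compression around 100 is a common choice
        for (int i = 0; i < 100_000; i++) {
            td.add(Math.random());             // stream observations in one at a time
        }
        System.out.println("median ~ " + td.quantile(0.5));
        System.out.println("p99    ~ " + td.quantile(0.99));
    }
}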

rnorth commented Nov 26, 2018

Thanks for such a quick response. Yes, a patch release would be greatly appreciated.

We have this occurring at a compression factor of 100 (and obviously an entirely different volume of input data points).

@tdunning (Owner)

Where is the exception happening?

The problem that I just looked at had to do with the bufferSize getting set too small so that the array copy failed on merging. Where is your problem appearing? Same place?

rnorth commented Nov 26, 2018

Both cases (the repro above and our production error) failed at line 302 of MergingDigest.java, which is the following (in v3.2):

        System.arraycopy(mean, 0, incomingMean, incomingCount, lastUsedCell);

Relevant part of the stack trace:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at com.tdunning.math.stats.MergingDigest.merge(MergingDigest.java:302)
	at com.tdunning.math.stats.MergingDigest.mergeNewValues(MergingDigest.java:291)
	at com.tdunning.math.stats.MergingDigest.add(MergingDigest.java:202)
	at com.tdunning.math.stats.MergingDigest.add(MergingDigest.java:194)
	at com.tdunning.math.stats.AbstractTDigest.add(AbstractTDigest.java:131)

I'm afraid right now I can't give much more information about the state of the fields at the time this is occurring, aside from saying that we've seen this on one of our busiest services. I shall try and gather some more useful data tomorrow, though, if it helps.
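
(To make the failure mode concrete: System.arraycopy requires destPos + length <= dest.length, so if bufferSize is computed too small, incomingMean ends up shorter than incomingCount + lastUsedCell and the copy throws. The standalone snippet below only illustrates that bound, with hypothetical sizes standing in for the digest's internal fields.)

// Illustration only: the bound that the copy at MergingDigest.java:302 relies on.
public class ArrayCopyBoundsDemo {
    public static void main(String[] args) {
        double[] mean = new double[10];          // stands in for mean[0..lastUsedCell)
        double[] incomingMean = new double[12];  // stands in for an undersized buffer
        int incomingCount = 5;                   // values already in the buffer
        int lastUsedCell = 10;                   // centroids to append
        // 5 + 10 = 15 > 12, so this throws ArrayIndexOutOfBoundsException,
        // matching the stack trace above.
        System.arraycopy(mean, 0, incomingMean, incomingCount, lastUsedCell);
    }
}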

hangfei commented May 14, 2019

Is this fixed?

tdunning commented May 14, 2019 via email

@tdunning (Owner)

Verified.

hangfei commented Feb 8, 2020

Has this fix been released, and if so, in which version?

cnuernber commented Mar 24, 2021

Related exception in 3.2:

ERROR in (issue-201-incorrect-result-column-count) (System.java:-2)
Uncaught exception, not in assertion.
expected: nil
  actual: java.lang.ArrayIndexOutOfBoundsException: null
 at java.lang.System.arraycopy (System.java:-2)
    com.tdunning.math.stats.MergingDigest.merge (MergingDigest.java:302)
    com.tdunning.math.stats.MergingDigest.mergeNewValues (MergingDigest.java:291)
    com.tdunning.math.stats.MergingDigest.quantile (MergingDigest.java:661)

This happened after a parallelized aggregation in which TDigest.add was called to merge digests from separate thread contexts. It happens only once in a while, not every time, which is a bummer for a large aggregation.
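
(A minimal sketch of that kind of per-thread aggregation, for anyone wanting to reproduce the pattern. The worker counts and the fold-by-centroids step are illustrative assumptions, not the reporter's actual code; the centroids()/mean()/count()/add(mean, count) calls are the library's public API.)

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

import com.tdunning.math.stats.Centroid;
import com.tdunning.math.stats.MergingDigest;
import com.tdunning.math.stats.TDigest;

// Illustrative sketch only: each worker builds its own digest, then the
// per-thread digests are folded into one by re-adding their centroids.
public class ParallelDigestSketch {
    public static void main(String[] args) throws Exception {
        int threads = 4;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<TDigest>> parts = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            parts.add(pool.submit(() -> {
                TDigest local = new MergingDigest(100);
                for (int i = 0; i < 250_000; i++) {
                    local.add(ThreadLocalRandom.current().nextGaussian());
                }
                return local;
            }));
        }
        pool.shutdown();

        // Fold the per-thread digests into the first one, centroid by centroid.
        TDigest combined = parts.get(0).get();
        for (int t = 1; t < threads; t++) {
            for (Centroid c : parts.get(t).get().centroids()) {
                combined.add(c.mean(), c.count());
            }
        }
        System.out.println("combined p50 ~ " + combined.quantile(0.5));
    }
}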

tdunning commented Mar 24, 2021 via email

cnuernber commented Mar 24, 2021

I have a test case that does this about 3 times out of 100 if I run it in a loop. I am using version 3.2.

I will test with master:HEAD and see whether that eliminates or changes the problem.

tdunning commented Mar 24, 2021 via email

@cnuernber

100%: main:HEAD fixes the test. I ran it many thousands of times just now and saw zero failures.

The test case involves a Clojure dataframe library, which means you would be calling Clojure in your unit tests. Are you sure you want the test case :-)? I basically do an N-core reduction across Y datasets, followed by thread0_digest.add(rest-of-digests).

This library is fantastic, so I am happy to help in any way you like.

tdunning commented Mar 24, 2021 via email

@tdunning (Owner)

@cnuernber I would like to take you up on your offer.

Can you look at the current main branch plus live issues and identify what you would say are blockers for a point release?

If none and if this sounds like consensus, I will start the process with what we have.

@cnuernber

Yes, will do.

@cnuernber

I looked through the issues and have them categorized. Here are my opinions; in short, I think a point release is worth it now to get the merging-digest fixes out.

  • I think it is worth a point release because the fixes to the merge algorithm are substantial in both stability and correctness.
  • AVLTreeDigest needs time and effort to stabilize; it does have some issues.
  • For serialization, my recommendation is to do as little as necessary to come up with a reasonably forward-compatible, versioned POD scheme for the merging digest alone (a purely illustrative sketch of such a POD follows this comment). Java serialization is often a disaster and a major security risk, and once the state is in simple datatypes users can serialize however they like, including with Java serialization. This cuts the effort required, sidesteps the question of how to actually save the data, and avoids unnecessary dependencies. I don't agree that Avro is a good choice; Parquet is a good option for long-term storage because of its good default compression and wide language support, and Arrow is a good option that is now very well supported across languages. There are plenty of other options not considered, such as dendrite; virtually an endless number of them. So starting with a carefully built POD (or POJO, as Java people call them) seems reasonable, and if people want protobufs or flatbuffers they can generate them from the POD description. For reference, the Arrow project uses flatbuffers as its base serialization format and builds its schema and record pathways on top of a basic flatbuffer message type.
  • There appears to be a high-quality header-only C++ implementation that @derrickburns was working on; building on it might be a good idea. A minimal, clean C library based on that implementation would then get you Python, R, Julia, Go, etc. with no further bindings, which cuts the core tested language matrix to Java and C/C++.

I messaged you on LinkedIn; we can continue offline from there or keep the discussion here, whichever works for you.
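
(As referenced in the serialization bullet above, here is a purely illustrative sketch of what a versioned POD snapshot of a merging digest's state could look like. The class and field names are hypothetical, not part of the t-digest API; the point is only that the state reduces to a few primitives and arrays that any serializer can handle.)

// Hypothetical, illustrative POD for snapshotting a merging digest's state.
// None of these names come from the t-digest library itself.
public final class MergingDigestPod {
    public int formatVersion = 1;     // bump on any layout change
    public double compression;        // the compression (delta) parameter
    public double min;                // smallest value observed
    public double max;                // largest value observed
    public long totalCount;           // total weight across all centroids
    public double[] centroidMeans;    // centroid means, in ascending order
    public double[] centroidWeights;  // centroid weights, same length as centroidMeans
}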

tdunning commented Apr 2, 2021

I will close this after the upcoming release.

tdunning commented Apr 8, 2021

(Slight) progress note.

I have worked out the new behavior of gpg and its interaction with Maven code signing (short answer: much better than before). I should be able to cut a release shortly.

rnataf commented Sep 6, 2022

Hi @cnuernber, could you explain how you solved the bug? I'm hitting the same one, though only very rarely.

@cnuernber

Well, two ways. Honestly, I think this library has been superseded by Apache DataSketches. Before I started working with DataSketches, I patched this library and released it on Clojars under a different name.

If you are doing work in this area, of course I have to recommend our data processing system :-).
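
(For anyone following that suggestion, a minimal sketch of computing a quantile with one of the Apache DataSketches quantile sketches, here KLL. This assumes the datasketches-java API, whose exact method signatures vary a little between releases, so treat it as a pointer rather than a recipe.)

import org.apache.datasketches.kll.KllDoublesSketch;

// Illustrative sketch only: a KLL doubles sketch as one possible alternative
// for streaming quantile estimation.
public class DataSketchesQuantileSketch {
    public static void main(String[] args) {
        KllDoublesSketch sketch = KllDoublesSketch.newHeapInstance();
        for (int i = 0; i < 100_000; i++) {
            sketch.update(Math.random());   // stream observations in
        }
        System.out.println("median ~ " + sketch.getQuantile(0.5));
    }
}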

wxbty commented Jul 6, 2023

Hi @cnuernber, I saw that your fork only adds two configuration changes (main...techascent:t-digest:main); how did you solve the current problem?

tdunning commented Jul 6, 2023

@cnuernber Interesting to see the work on tech.ml.dataset. It frankly looks a bit like a reinvention of Arrow, but just in Clojure, at least for the columnar in-memory format work.
