Skip to content

Conversation

@sanastas
Copy link

@sanastas sanastas commented Sep 12, 2018

Yet another PR, after re-basing on top of the new state of the Druid master. The comments added in PR#6235 will be all fixed. Adding here the previous introduction...

Hi Everybody,

Please take a look on the following Oak Incremental Index Pool Request. Below please find a list of the help material. Although we will soon present also the performance results, we would like to hear your thoughts and comments from the correctness and software engineering point of views.

Thanks,
Anastasia

In continuation to our great talk yesterday (where we agreed about publishing a PR for Oak Incremental Index for Druid), hereby please find some reading material about Oak. The pool request itself will be ready for tomorrow. I would strongly suggest to read some of the documentation prior to looking on the code, as it may make the last task easier and clearer. As I said the Oak is now an open source library (https://github.com/yahoo/Oak) and its README might help you to understand the Oak's interface and new Druid's code.

Recall we had an issue#5698 about Oak Incremental Index (#5698) and there you can find another useful documents to start from. We will publish the latest single-thread ingestion benchmark results tomorrow, together with the pool request. Feel free to ask questions!

Files from the issue:

1.  Oak Introduction: https://github.com/druid-io/druid/files/1946175/OakIntroduction.pdf
2.  Oak Details Presentation: https://github.com/druid-io/druid/files/1946182/OAK.Off-Heap.Allocated.Keys.pptx
3.  Suggested Refactoring: https://github.com/druid-io/druid/files/1947353/Incremental.Index.Refactoring.Suggestion.pdf
4.  Some additional Oak results: https://github.com/druid-io/druid/files/1959106/OakMap.OnHeap.with.Integer.Values.pdf

Here one can see initial ingestion benchmark results: https://github.com/apache/incubator-druid/files/2321429/IndexIngestionBenchmarkWithOak.pdf

@leventov
Copy link
Member

@sanastas thank you. Could you please fix those inspections: https://teamcity.jetbrains.com/viewLog.html?buildId=1610792&tab=Inspection&buildTypeId=OpenSourceProjects_Druid_InspectionsPullRequests

Also, oak:oak:1.0-SNAPSHOT dependency is unresolved for the build

@sanastas
Copy link
Author

Thank you, Roman (@leventov )! Thank you for taking a look! We will fix the oak:oak:1.0-SNAPSHOT dependency asap.

Regarding the code inspection you have mentioned, is it some static code analyzing? It is a bit strange as part of the files are not in our scope. We will try to look deeper. Thank you anyway!

@leventov
Copy link
Member

Yes, this is static analysis. Switch to "Errors" radio to see the problems:

image
As far as I can see, they are all related to Oak.

@sanastas
Copy link
Author

Oh! I see it now, will fix it as well. Thanks!

@fjy fjy added this to the 0.13.0 milestone Sep 14, 2018
@jihoonson
Copy link
Contributor

Hi guys, I don't think this should be a blocking issue for 0.13.0 release. I'll remove the milestone.

@jihoonson jihoonson removed this from the 0.13.0 milestone Sep 17, 2018
@gianm
Copy link
Contributor

gianm commented Sep 28, 2018

Hi @sanastas,

Does this code run -- and if so how can it be activated? I am thinking we can start helping to review / test by deploying this patch to our test cluster. It will at least provide some more info on stability, perf, etc in our environment.

@sanastas
Copy link
Author

Hi @gianm !

It is great to hear you can start help with review! To make the Oak available for the Druid cluster now is our first priority now. I hope it will be finished soon.

Unfortunately, I am leaving for a while for maternity leave. I will be reading emails from time to time but will not be involved directly, till I am back in 2019. However, our team continue working on Oak so no worries.

Eshcar Hillel (@Eshcar) [email protected] , will be our point of contact for all Druid related issues. She’ll connect to one of the next calls, she had been on the calls already.

Thanks!

@Eshcar
Copy link

Eshcar commented Oct 4, 2018

Hi
I have started working on publishing Oak in maven central repository. Need to handle some access permission issues first.
Hope to get this done in the next few days.

@yonigottesman yonigottesman force-pushed the DRUID--OAK_PULL_REQUEST--V01 branch from d649e91 to 8010843 Compare October 11, 2018 10:11
@yonigottesman
Copy link

Hi guys, This branch now passes all checks and you can check it out.
There is a new IncrementalIndex that can be created with IncrementailIndex.Builder.buildOffheapOak().
@gianm, Do you have a standard way you benchmark your build? I would also want to setup a cluster and evaluate it. Do you have some workload you always use?

double throughput = rowsPerSegment / (double) duration;
log.info("Throughput: " + throughput + " ops/ms");
}
}
Copy link
Contributor

@b-slim b-slim Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please attach the benchmark results before and after this PR? Nvm I see the results

-Xmx512m
-XX:MaxDirectMemorySize=1500m
-XX:MaxDirectMemorySize=4g
-Duser.language=en
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed?

@Eshcar
Copy link

Eshcar commented Oct 15, 2018

The first version of Oak (0.1.3) is deployed in Maven Central, and the druid pom file is updated to retrieve it.
However 0.1.3 does not support yet off heap buffers greater than 2GB, so don't try benchmarking at scale.
We will soon deploy a new version that will allow tests with larger memory. Will update when it happens.

@stale
Copy link

stale bot commented Apr 14, 2019

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@stale stale bot added the stale label Apr 14, 2019
@stale
Copy link

stale bot commented Apr 21, 2019

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale bot closed this Apr 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants