SOMA Profiler RFCs #162
@beroy - one thing that would help me review this is some additional context and background information on the requirements and definition of success. Questions top of mind:
Lastly (minor point) - we should probably discuss if this belongs in the TileDB-SOMA repo, or as a peer repo (motivated by a goal of keeping each repo simpler/focused - i.e., the old monorepo debate :-) |
@bkmartinjr, thanks a lot for the great comments. We had a design review last week and discussed some of the related issues, but did not cover all the questions here. I'll try to address them here: Finally, this repo suggestion seems to come from an agreement between TileDB and CZI for sharing RFCs. @maniarathi can comment here. |
Hi @beroy,
Did this review include the TileDB team? Most of the code is in their DB stack (not in SOMA or Census code), and that is also where most of the complexity and performance-sensitive code is (e.g., github.com/TileDB/TileDB and github.com/TileDB/TileDB-Py). The core TileDB team has extensive experience with performance work and, I presume, already has a tool stack in place. I just want to make sure we are filling a gap that exists, and that we build on the existing knowledge and tools.
AFAIK, you can't capture them from outside the process, but enabling in-process collection has very little overhead, and they are designed to be captured in the normal course of using the DB. Given that you must run the system with your own driver process, I suggest just capturing them as part of that driver process. The total size of the stats summary is tiny and can be written to logs as part of running the core tests. There are some near-term caveats to this due to our double-instantiation of the core DB; @gspowley and others can talk you through their plans to centralize everything (tactically, just capture stats from both instances, or perhaps even ignore the second if you are focused on read-only perf). The stats API in SOMA is already exposed via |
@bkmartinjr the current POC is @nguyenv |
@bkmartinjr, the TileDB people were not in the review, but I discussed their profiling tools with @gspowley. While their tools capture some of what we need, their main focus is on distributed nodes. Also, my goal is to capture the breakdown across Python/R and C++. Your suggestion about collecting DB stats makes total sense (log them directly from the driver). BTW, by core DB do you mean TileDB here? Also, I'm not familiar with the double-instantiation plan. Will check with them. |
@beroy - I'm using "core DB" and "embedded TileDB" as synonyms. We end up with two separate instantiations, with their own private memory/context/etc due to the way we are bootstrapping the SOMA C++ layer. As noted by @johnkerl, @nguyenv is POC on this (apologies for earlier misdirect). Using Python as an example (similar issue in R):
I only point this out as the separate core DB instances have their own stats data. |
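To make the "capture stats in the driver" suggestion concrete, here is a minimal sketch of a driver that enables stats on each core DB instance, runs a workload in-process, and writes one log record with a separate stats summary per instance. Everything named here (`StatsSource`, `run_with_stats`, `make_fake_source`, and the `instance_a`/`instance_b` labels) is hypothetical and not part of SOMA or TileDB; the `enable` and `dump` callables are stand-ins that would presumably be wired to whatever stats functions each instance actually exposes.

```python
import json
import logging
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StatsSource:
    """One embedded core-DB instance (hypothetical wrapper, not a SOMA type)."""
    name: str
    enable: Callable[[], None]  # turn on in-process stats collection
    dump: Callable[[], str]     # return the stats summary as text


def run_with_stats(workload: Callable[[], None],
                   sources: List[StatsSource],
                   logger: logging.Logger) -> Dict[str, object]:
    """Run `workload` in this process and log each instance's stats summary."""
    for src in sources:
        src.enable()
    start = time.perf_counter()
    workload()
    elapsed = time.perf_counter() - start
    # Each instance keeps its own private stats, so dump them separately.
    record = {
        "elapsed_s": round(elapsed, 6),
        "stats": {src.name: src.dump() for src in sources},
    }
    logger.info(json.dumps(record))
    return record


def make_fake_source(name: str) -> StatsSource:
    """Stand-in for a real instance: only counts how often it was enabled."""
    state = {"enabled": 0}

    def enable() -> None:
        state["enabled"] += 1

    def dump() -> str:
        return json.dumps({"enabled_calls": state["enabled"]})

    return StatsSource(name=name, enable=enable, dump=dump)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    sources = [make_fake_source("instance_a"), make_fake_source("instance_b")]
    run_with_stats(lambda: sum(range(1000)), sources, logging.getLogger("profiler"))
```

Keeping the sources in a list means that ignoring the second instance for read-only perf work is just a matter of passing a shorter list.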
two non–content-related notes from me (these apply to both open RFCs right now so I am pasting this note in both places):
|
A few requests for clarifications and decisions about scope of the RFC.
rfcs/profiler.md
Outdated
### Benefits

- A major benefit of this design is that the profilers (both generic and custom ones) are not necessarily targeted toward single cell applications and can be used for any services across CZI.
Suggested change, from:
- A major benefit of this design is that the profilers (both generic and custom ones) are not necessarily targeted toward single cell applications and can be used for any services across CZI.

to:
- A major benefit of this design is that the profilers (both generic and custom ones) are not necessarily targeted toward SOMA and can be used for various services.
@beroy please make the requested change
Co-authored-by: John Kerl <[email protected]>
LGTM. As Andrew suggested, I recommend writing up a more detailed implementation/deployment plan as a separate RFC or tech spec.