Add peak memory distribution among tasks of each stage #13379
arhimondr merged 3 commits into prestodb:master
83e8cf1 to f85def4
aweisberg left a comment
One small nit about TaskDistribution constructor. Otherwise it makes sense to me.
Just curious, why reduce precision?
For the current purpose of this class, which holds either memory bytes or CPU time, float precision is enough.
Changed it back to double to avoid confusion.
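For readers following the float vs. double question, here is a minimal sketch of what the precision difference means for byte counts. The class and method names are hypothetical and not part of this PR; it only shows that a float cannot represent every integer above 2^24, while a double stays exact up to 2^53.

```java
// Hypothetical illustration, not code from this PR.
public class PrecisionSketch
{
    // Convert a byte count to float and back; inexact above 2^24
    static long roundTripFloat(long bytes)
    {
        return (long) (float) bytes;
    }

    // Convert a byte count to double and back; exact up to 2^53
    static long roundTripDouble(long bytes)
    {
        return (long) (double) bytes;
    }

    public static void main(String[] args)
    {
        long bytes = 16_777_217L; // 2^24 + 1, roughly 16 MiB
        System.out.println(roundTripFloat(bytes));
        System.out.println(roundTripDouble(bytes));
    }
}
```

The float round trip drops the last byte, which is harmless for a chart but can surprise someone comparing values exactly.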
Constructing a TaskDistribution is very verbose, and a lot of this comes from a DistributionSnapshot. Can you update the TaskDistribution constructor to reduce some of this duplication by working on the DistributionSnapshot directly?
This is intended to support serialization into a JSON string. There is a downstream dependency on the structure of this class, e.g. visualization of the distribution.
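One way to reconcile the two comments above, as a rough sketch only: keep the flat, explicit constructor so the serialized shape stays stable, and add a static factory that reads from a snapshot. Everything here is hypothetical. `Snapshot` is a stand-in rather than the real DistributionSnapshot, the field set is an assumption, and the hand-written `toJson` only illustrates the flat layout a downstream consumer would depend on.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names or fields are taken from the PR.
public class DistributionSketch
{
    /** Stand-in for a snapshot of per-task values; assumes a non-empty input. */
    static final class Snapshot
    {
        final long count;
        final double min;
        final double p50;
        final double max;

        Snapshot(double[] values)
        {
            double[] sorted = values.clone();
            Arrays.sort(sorted);
            this.count = sorted.length;
            this.min = sorted[0];
            // Lower median by index; good enough for a sketch
            this.p50 = sorted[(sorted.length - 1) / 2];
            this.max = sorted[sorted.length - 1];
        }
    }

    /** Flat value class; its field layout is what JSON consumers would see. */
    static final class Distribution
    {
        final int stageId;
        final long tasks;
        final double min;
        final double p50;
        final double max;

        // Explicit constructor kept so the serialized structure stays visible
        Distribution(int stageId, long tasks, double min, double p50, double max)
        {
            this.stageId = stageId;
            this.tasks = tasks;
            this.min = min;
            this.p50 = p50;
            this.max = max;
        }

        // Factory that removes the duplication at call sites
        static Distribution fromSnapshot(int stageId, Snapshot snapshot)
        {
            return new Distribution(stageId, snapshot.count, snapshot.min, snapshot.p50, snapshot.max);
        }

        // Hand-rolled JSON to show the flat shape; a real class would use a JSON library
        String toJson()
        {
            return "{\"stageId\":" + stageId + ",\"tasks\":" + tasks + ",\"min\":" + min
                    + ",\"p50\":" + p50 + ",\"max\":" + max + "}";
        }
    }

    public static void main(String[] args)
    {
        Snapshot snapshot = new Snapshot(new double[] {40.0, 10.0, 30.0, 20.0});
        System.out.println(Distribution.fromSnapshot(1, snapshot).toJson());
    }
}
```

With this split, call sites shrink to a single `fromSnapshot` call while the serialized fields remain exactly those listed in the constructor.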
mbasmanova left a comment
Add peak total memory of a task to TaskStats
- Some questions.
- I'd update commit message to
Add peak total memory to TaskStats
- cumulativeUserMemory is stored as double; peakTotalMemoryInBytes seems similar, but it is stored as long; why the discrepancy?
- peakTotalMemoryInBytes name is not consistent with cumulativeUserMemory; shouldn't it be peakTotalMemory?
- cumulativeUserMemory uses double because its unit is Bytes * Time; this also explains the difference in names.
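To make the unit difference concrete, here is a small sketch under stated assumptions: memory is sampled at a fixed interval, the cumulative figure is the sum of bytes times interval length, and the peak is the largest sample. The class and method names are hypothetical, and this is not how Presto necessarily computes either value.

```java
// Hypothetical illustration of the two units; not code from this PR.
public class MemoryStatsSketch
{
    // Peak memory is a plain byte count, so a long holds it exactly
    static long peakBytes(long[] samples)
    {
        long peak = 0;
        for (long sample : samples) {
            peak = Math.max(peak, sample);
        }
        return peak;
    }

    // Cumulative memory integrates bytes over time, giving byte-seconds;
    // the interval is fractional, so the result is naturally a double
    static double cumulativeByteSeconds(long[] samples, double intervalSeconds)
    {
        double total = 0;
        for (long sample : samples) {
            total += sample * intervalSeconds;
        }
        return total;
    }

    public static void main(String[] args)
    {
        long[] samples = {100, 300, 200};
        System.out.println(peakBytes(samples));
        System.out.println(cumulativeByteSeconds(samples, 0.5));
    }
}
```

An "InBytes" suffix fits the peak value, while the cumulative value has no such simple unit to put in its name.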
@viczhang861 Vic, thanks for explaining. It makes sense now.
Looking at cumulativeUserMemory, I'd expect
@JsonProperty("peakTotalMemory") double peakTotalMemory,
Or peakTotalMemoryReservation and use DataSize as type? No strong opinion here.
mbasmanova left a comment
Rename StageCpuDistribution to TaskDistribution
I'm wondering if it would be useful to keep Stage in the name. The data is the distribution of task resource usage within a stage, right? StageResourceDistribution maybe?
Updated to ResourceDistribution to avoid confusion
This is an unrelated change and therefore should go into a separate commit. My preference would be to keep double, though, primarily to avoid having other readers wonder why float.
mbasmanova left a comment
Add peak memory distribution among tasks of each stage
Looks good.
nit: perhaps, computeResourceDistributions or computeCpuAndMemoryDistributions
This is all fine by me. I tested it and the data looks right.
From: Maria Basmanova <notifications@github.com>
Sent: Friday, September 13, 2019 10:31 AM
To: prestodb/presto <presto@noreply.github.com>
Cc: oerling <erling@xs4all.nl>; Review requested <review_requested@noreply.github.com>
Subject: Re: [prestodb/presto] Add peak memory distribution among tasks of each stage (#13379)
@mbasmanova requested review from @prestodb/aria on: #13379 Add peak memory distribution among tasks of each stage.
f85def4 to 50f82b9
What about taskPeakMemoryDistribution? No strong opinion since peakMemoryDistribution is parallel to cpuTimeDistribution
50f82b9 to dab55fb
This class can be used for either memory or cpu distribution of tasks in a stage.
dab55fb to dba96e9
Resolves #13327