-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Time-Specific Metrics #819
Conversation
gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GhfsTimeStatistic.java
Outdated
Show resolved
Hide resolved
gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GhfsTimeStatistic.java
Outdated
Show resolved
Hide resolved
gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GhfsTimeStatistic.java
Outdated
Show resolved
Hide resolved
gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GhfsTimeStatistic.java
Outdated
Show resolved
Hide resolved
gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GoogleHadoopFSInputStream.java
Outdated
Show resolved
Hide resolved
|
||
/** Time Statistics which are collected in GCS. */ | ||
@InterfaceStability.Unstable | ||
public enum GhfsTimeStatistic { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure creating a mutable enum is the right thing to do. Why not create class instead? Also having this singleton make the testing harder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the plugin GhfsTimeStatisticPlugin, I had references to loop over various metrics. This is the reason why mutable enum was used. Also by class, do you mean an implementation like https://github.com/LucaCanali/hadoop/blob/s3aAndHDFSTimeInstrumentation/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ATimeInstrumentation.java ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think mutable enums are an anti-pattern since it is sort of like global variables, which are hard to test etc. I was curious why can't we achieve the same functionality we need by using Java class instead of using Java Enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having enum class makes the metrics look more organized in my opinion. I agree with mutable enum issue so I have mapped them with AtomicLong and used it to store the metric's value. Does that seem like a good alternative ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is practically a global mutable variable and testing this will be difficult. Hence I do not think this is the right thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please open a bug for this and assign to me. Also add a TODO comment in the code with the bug number.
@@ -496,14 +496,21 @@ public String getScheme() { | |||
|
|||
@Override | |||
public FSDataInputStream open(Path hadoopPath, int bufferSize) throws IOException { | |||
long startTime = System.nanoTime(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not create a lambda for timing the methods, instead of duplicating the time duration tracking logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unable to understand on how to use lambda function here. In best case scenario, there will be addition of 2 lines (for startTime and endTime) in each function along with try-finally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have shared a sample code with you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean, need to wrap this entire method call inside lambda? or export existing contents of method to another method and call that from lambda?
Clearly cannot wrap the public method call itself!
/gcbrun |
gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GhfsTimeStatistic.java
Outdated
Show resolved
Hide resolved
@@ -496,14 +496,21 @@ public String getScheme() { | |||
|
|||
@Override | |||
public FSDataInputStream open(Path hadoopPath, int bufferSize) throws IOException { | |||
long startTime = System.nanoTime(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have shared a sample code with you
|
||
/** Time Statistics capturing total time taken by a function in micro seconds */ | ||
@InterfaceStability.Unstable | ||
public enum GhfsTimeStatistic { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any plans to capture read/write time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that is captures in hflush, hsync and close statistics from GoogleHadoopOutputStream. If not, can you point to specific functions ?
|
||
/** Time Statistics which are collected in GCS. */ | ||
@InterfaceStability.Unstable | ||
public enum GhfsTimeStatistic { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is practically a global mutable variable and testing this will be difficult. Hence I do not think this is the right thing to do.
public void hsync_increment_timeStatistics() throws IOException { | ||
Path objectPath = new Path(ghfs.getUri().resolve("/dir/object2.txt")); | ||
FileSystem.Statistics statistics = new FileSystem.Statistics(ghfs.getScheme()); | ||
GoogleHadoopOutputStream fout = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider reducing code duplication in this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a single function to test all 3 statistics
gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GoogleHadoopFSInputStream.java
Outdated
Show resolved
Hide resolved
gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GoogleHadoopFileSystem.java
Outdated
Show resolved
Hide resolved
instrumentation.fileCreated(); | ||
return response; | ||
} finally { | ||
GhfsTimeStatistic.incrementTimeElapsedMicrosec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that these stats should be FS instance-specific, not global for JVM.
* | ||
* @return current map of minimums | ||
*/ | ||
private Map<String, Long> minimums() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Please move private methods below public methods.
|
||
/** Time Statistics which are collected in GCS. */ | ||
@InterfaceStability.Unstable | ||
public enum GhfsTimeStatistic { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please open a bug for this and assign to me. Also add a TODO comment in the code with the bug number.
/gcbrun |
Original PR: GoogleCloudDataproc#819
* Added Time-Specific Metrics Original PR: #819 Co-authored-by: guljain <[email protected]>
Added time-related metrics.
Functions chosen are -