HDDS-2660. Create insight point for datanode container protocol #1272
Conversation
adoroszlai left a comment:
Thanks @elek for implementing this (and creating the video, too).
In my experience:
```shell
$ ozone insight log datanode.dispatcher -f pipeline=...
$ ozone insight log -v datanode.dispatcher -f pipeline=...
```
The first command works, but the second by itself does not. I can see the TRACE-level messages in the datanode logs, but not in the terminal where insight is run. No idea why.
```java
addProtocolMessageMetrics(metrics, "hdds_dispatcher",
    Type.SCM, ScmBlockLocationProtocolProtos.Type.values());
```
Seems to be leftover from `ScmProtocolBlockLocationInsight`. I think it should be something like `Type.DATANODE, ContainerProtos.Type.values()` instead. (`BaseInsightPoint` needs to be tweaked to allow it.)
This is something which can't be easily fixed: `Type.DATANODE` is not supported for metrics. I can throw an `UnsupportedOperationException` instead of showing the metrics.
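To illustrate the suggestion above, here is a minimal, hypothetical sketch. The `Component` enum and `registerProtocolMetrics` method are invented for illustration and are not the actual `BaseInsightPoint` API; the point is simply failing loudly for the unsupported component instead of registering the wrong protocol's message types:

```java
// Hypothetical stand-in for the insight-point metrics registration
// discussed above. Only the failure pattern mirrors the conversation;
// the names are invented for this sketch.
public class MetricsGuardSketch {

  enum Component { SCM, OM, DATANODE }

  // Pretend registration: DATANODE metrics are not supported, so we
  // throw rather than silently show another protocol's metrics.
  static String registerProtocolMetrics(Component component) {
    if (component == Component.DATANODE) {
      throw new UnsupportedOperationException(
          "Protocol message metrics are not supported for " + component);
    }
    return "registered metrics for " + component;
  }

  public static void main(String[] args) {
    System.out.println(registerProtocolMetrics(Component.SCM));
    try {
      registerProtocolMetrics(Component.DATANODE);
    } catch (UnsupportedOperationException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```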
```java
@Override
public boolean filterLog(Map<String, String> filters, String logLine) {
  return true;
}
```
Can you please explain why this is needed? I think this may cause logs from other pipelines to be printed.
For example, I ran `ozone insight log` on a 3-node pipeline, then:
$ ozone sh key put -r ONE /vol1/bucket1/passwd /etc/passwd
and the logs from the single datanode to which this block was written (part of the 3-node pipeline, too) appeared in the console.
There are two parts to the filtering:
- where to connect
- what to display (based on [...] text from the log)

Unfortunately the pipeline id is not available in the HddsDispatcher, so we can't filter out the other pipelines (and we need this code segment to show all the lines).
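The trade-off described here can be sketched as follows. This is a hypothetical illustration (only the `filterLog` signature follows the snippet under review; the pipeline-matching logic is an assumption): a filter on the pipeline id can only work if that id appears in the log line, which is why returning `true` unconditionally is the safe choice for the dispatcher.

```java
import java.util.Map;

// Illustrative version of filterLog showing both behaviors: matching a
// pipeline filter against the log text when possible, and showing
// everything when no filter is requested. Since HddsDispatcher log
// lines don't carry the pipeline id, the real code returns true.
public class FilterLogSketch {

  // Returns true when the line should be displayed.
  static boolean filterLog(Map<String, String> filters, String logLine) {
    if (filters == null || !filters.containsKey("pipeline")) {
      return true; // no filter requested: show everything
    }
    String pipeline = filters.get("pipeline");
    // Only works if the dispatcher logged the pipeline id; otherwise
    // this would hide every line.
    return logLine.contains(pipeline);
  }

  public static void main(String[] args) {
    Map<String, String> f = Map.of("pipeline", "abc-123");
    System.out.println(filterLog(f, "write chunk on pipeline abc-123"));
    System.out.println(filterLog(f, "write chunk (no pipeline id)"));
    System.out.println(filterLog(Map.of(), "anything"));
  }
}
```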
...nsight/src/main/java/org/apache/hadoop/ozone/insight/datanode/DatanodeDispatcherInsight.java
Closing this PR temporarily. It's still on my list, but pending for now. I will first improve insight to support multiple nodes, or make the datanode insight a one-node insight point...
Both of the open questions are answered with an improvement: instead of using ...
I think the new approach addressed all the earlier problems. Can we merge it now? |
Thanks @elek for updating the patch. Interestingly ...
I double-checked and it worked for me. When you use the leader, the messages are displayed immediately; when you use a follower, the messages appear only after the commit...
It seems to depend on the content. Plain text files work fine, but it stops working on the first binary one. I think we should avoid logging chunk content.
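One way to avoid logging chunk content, sketched here as a hypothetical helper (`describeChunk` is invented for illustration and is not Ozone code): log only the chunk size and a short printable prefix, so binary payloads can't corrupt the terminal output.

```java
import java.nio.charset.StandardCharsets;

// Sketch: summarize a chunk for logging instead of dumping raw bytes.
// Non-printable bytes are replaced with '.' and the prefix is capped,
// so even binary chunks produce a short, terminal-safe line.
public class ChunkLogSketch {

  static String describeChunk(byte[] data) {
    StringBuilder prefix = new StringBuilder();
    for (int i = 0; i < data.length && prefix.length() < 16; i++) {
      char c = (char) (data[i] & 0xFF);
      // keep printable ASCII only; replace everything else
      prefix.append(c >= 32 && c < 127 ? c : '.');
    }
    return data.length + " bytes [" + prefix + "]";
  }

  public static void main(String[] args) {
    byte[] text = "hello world".getBytes(StandardCharsets.US_ASCII);
    byte[] binary = {0, 1, 2, (byte) 0xFF, 'o', 'k'};
    System.out.println(describeChunk(text));
    System.out.println(describeChunk(binary));
  }
}
```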
I checked, but it's not something which can be added easily. It would also introduce significant overhead in case tracing is turned on. I am not sure if it's worth it, but I can add it if you think so.
Thanks for checking. I'm fine with doing it as a follow-up (created HDDS-4271).
* master:
  - HDDS-4102. Normalize Keypath for lookupKey. (apache#1328)
  - HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438)
  - HDDS-4282. Improve the emptyDir syntax (apache#1450)
  - HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383)
  - HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443)
  - HDDS-2660. Create insight point for datanode container protocol (apache#1272)
  - HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442)
  - HDDS-4324. Add important comment to ListVolumes logic (apache#1417)
  - HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424)
  - HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431)
  - HDDS-2766. security/SecuringDataNodes.md (apache#1175)
  - HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389)
  - HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416)
  - HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385)
  - HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410)
  - HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214)
  - HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433)
  - HDDS-4247. Fixed log4j usage in some places (apache#1426)
  - HDDS-4241. Support HADOOP_TOKEN_FILE_LOCATION for Ozone token CLI. (apache#1422)
* HDDS-4122-remove-code-consolidation: (21 commits)
  - Restore files that had deduplicated code from master
  - Revert other delete request/response files back to their original states on master
  - (the same commits merged from master as listed above)
  ...
You were right, you did it easily ;-)
What changes were proposed in this pull request?
The goal of this task is to create a new insight point for the datanode container protocol (HddsDispatcher) to be able to debug client<->datanode communication.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-2660
How was this patch tested?
Manually. See the video:
https://www.youtube.com/watch?v=msQgfF95ivc&list=PLCaV-jpCBO8UK5Ged2A_iv3eHuozzMsYv&index=7&t=0s
;-)