-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance #1443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @elek for sharing these scripts. I tried a few, and they work fine. I also tried using both read and write scripts at the same time, which needs a few name changes.
dev-support/byteman/ratis-flush.btm
Outdated
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| # Turn of flush for Raits log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be leftover comment from ratis-no-flush.btm.
|
|
||
| ENDRULE | ||
|
|
||
| RULE BlockOutputStream.watchForCommit.Entry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume one would not use both watchforcommit scripts at the same time. If we did, rule name would need to be unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the chance is low, but let me add the same prefix to here, just to be on the safe side...
Co-authored-by: Doroszlai, Attila <[email protected]>
|
Thanks @elek for updating the patch. |
* master: HDDS-4102. Normalize Keypath for lookupKey. (apache#1328) HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438) HDDS-4282. Improve the emptyDir syntax (apache#1450) HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383) HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443) HDDS-2660. Create insight point for datanode container protocol (apache#1272) HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442) HDDS-4324. Add important comment to ListVolumes logic (apache#1417) HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424) HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431) HDDS-2766. security/SecuringDataNodes.md (apache#1175) HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389) HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416) HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385) HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410) HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214) HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433) HDDS-4247. Fixed log4j usage in some places (apache#1426) HDDS-4241. Support HADOOP_TOKEN_FILE_LOCATION for Ozone token CLI. (apache#1422)
* HDDS-4122-remove-code-consolidation: (21 commits) Restore files that had deduplicated code from master Revert other delete request/response files back to their original states on master HDDS-4102. Normalize Keypath for lookupKey. (apache#1328) HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438) HDDS-4282. Improve the emptyDir syntax (apache#1450) HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383) HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443) HDDS-2660. Create insight point for datanode container protocol (apache#1272) HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442) HDDS-4324. Add important comment to ListVolumes logic (apache#1417) HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424) HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431) HDDS-2766. security/SecuringDataNodes.md (apache#1175) HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389) HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416) HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385) HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410) HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214) HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433) HDDS-4247. Fixed log4j usage in some places (apache#1426) ...
I am using https://byteman.jboss.org to debug the performance of spark + teragen with different scripts. Some byteman scripts are already shared by HDDS-4095 or HDDS-342 but it seems to be a good practice to share the newer scripts to make it possible to reproduce performance problems.
For using byteman with Ozone, see this video:
https://www.youtube.com/watch?v=_4eYsH8F50E&list=PLCaV-jpCBO8U_WqyySszmbmnL-dhlzF6o&index=5
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4270
How was this patch tested?
I used them during real k8s deployments:
https://github.com/elek/ozone-perf-env