@@ -177,7 +177,7 @@ cd /opt
Copy the integration tests jar into the docker container
```
- docker cp packaging/hudi-integ-test-bundle/target/hudi-integ-test-bundle-0.8.0-SNAPSHOT.jar adhoc-2:/opt
+ docker cp packaging/hudi-integ-test-bundle/target/hudi-integ-test-bundle-0.10.0-SNAPSHOT.jar adhoc-2:/opt
```
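If you want to verify the copy, a quick listing inside the container will show the bundle (a sanity check only):
```
docker exec adhoc-2 ls -lh /opt | grep hudi-integ-test-bundle
```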
```
@@ -214,21 +214,29 @@ spark-submit \
--conf spark.network.timeout=600s \
--conf spark.yarn.max.executor.failures=10 \
--conf spark.sql.catalogImplementation=hive \
+ --conf spark.driver.extraClassPath=/var/demo/jars/* \
+ --conf spark.executor.extraClassPath=/var/demo/jars/* \
--class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
- /opt/hudi-integ-test-bundle-0.8.0-SNAPSHOT.jar \
+ /opt/hudi-integ-test-bundle-0.10.0-SNAPSHOT.jar \
--source-ordering-field test_suite_source_ordering_field \
--use-deltastreamer \
--target-base-path /user/hive/warehouse/hudi-integ-test-suite/output \
--input-base-path /user/hive/warehouse/hudi-integ-test-suite/input \
--target-table table1 \
--props file:/var/hoodie/ws/docker/demo/config/test-suite/test.properties \
- --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
+ --schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider \
--source-class org.apache.hudi.utilities.sources.AvroDFSSource \
--input-file-size 125829120 \
--workload-yaml-path file:/var/hoodie/ws/docker/demo/config/test-suite/complex-dag-cow.yaml \
--workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
--table-type COPY_ON_WRITE \
- --compact-scheduling-minshare 1
+ --compact-scheduling-minshare 1 \
+ --hoodie-conf hoodie.metrics.on=true \
+ --hoodie-conf hoodie.metrics.reporter.type=GRAPHITE \
+ --hoodie-conf hoodie.metrics.graphite.host=graphite \
+ --hoodie-conf hoodie.metrics.graphite.port=2003 \
+ --clean-input \
+ --clean-output
```
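Once the COPY_ON_WRITE job completes, a quick way to confirm it wrote data is to list the target base path in HDFS from inside the container. This is a sketch that assumes the HDFS client is on the path in adhoc-2, as it is in the Hudi docker demo:
```
docker exec adhoc-2 /bin/bash -c "hdfs dfs -ls /user/hive/warehouse/hudi-integ-test-suite/output"
```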
Or a Merge-on-Read job:
@@ -253,23 +261,44 @@ spark-submit \
--conf spark.network.timeout=600s \
--conf spark.yarn.max.executor.failures=10 \
--conf spark.sql.catalogImplementation=hive \
+ --conf spark.driver.extraClassPath=/var/demo/jars/* \
+ --conf spark.executor.extraClassPath=/var/demo/jars/* \
--class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
- /opt/hudi-integ-test-bundle-0.8.0-SNAPSHOT.jar \
+ /opt/hudi-integ-test-bundle-0.10.0-SNAPSHOT.jar \
--source-ordering-field test_suite_source_ordering_field \
--use-deltastreamer \
--target-base-path /user/hive/warehouse/hudi-integ-test-suite/output \
--input-base-path /user/hive/warehouse/hudi-integ-test-suite/input \
--target-table table1 \
--props file:/var/hoodie/ws/docker/demo/config/test-suite/test.properties \
- --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
+ --schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider \
--source-class org.apache.hudi.utilities.sources.AvroDFSSource \
--input-file-size 125829120 \
--workload-yaml-path file:/var/hoodie/ws/docker/demo/config/test-suite/complex-dag-mor.yaml \
--workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
--table-type MERGE_ON_READ \
- --compact-scheduling-minshare 1
+ --compact-scheduling-minshare 1 \
+ --hoodie-conf hoodie.metrics.on=true \
+ --hoodie-conf hoodie.metrics.reporter.type=GRAPHITE \
+ --hoodie-conf hoodie.metrics.graphite.host=graphite \
+ --hoodie-conf hoodie.metrics.graphite.port=2003 \
+ --clean-input \
+ --clean-output
```
+ ## Visualize and inspect the hoodie metrics and performance (local)
+ A Graphite server is already set up (and running) as part of `docker/setup_demo.sh`.
+
+ Open a browser and access the metrics at:
+ ```
+ http://localhost:80
+ ```
+ The dashboard is available at:
+ ```
+ http://localhost/dashboard
+ ```
+
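If you prefer raw data over the UI, stock Graphite also exposes an HTTP API on the same port. The sketch below uses standard Graphite endpoints; the metric path in the second command is a placeholder, since the actual names depend on your hoodie metrics configuration:
```
# List the metric namespaces Graphite has received so far.
curl "http://localhost:80/metrics/find?query=*"

# Render one metric as JSON; replace <metric.path> with a real name
# taken from the listing above (placeholder, not an actual Hudi metric).
curl "http://localhost:80/render?target=<metric.path>&format=json"
```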
## Running the long-running test suite in the local Docker environment
For the long-running test suite, validation has to be done differently. The idea is to run the same dag in a repeated manner for
@@ -279,12 +308,12 @@ contents both via spark datasource and hive table via spark sql engine. Hive val
If you have "ValidateDatasetNode" in your dag, do not replace the hive jars as instructed above. The Spark SQL
engine does not work well with hive2* jars. So, after running the docker setup, follow the steps below.
```
- docker cp packaging/hudi-integ-test-bundle/target/hudi-integ-test-bundle-0.8.0-SNAPSHOT.jar adhoc-2:/opt/
- docker cp demo/config/test-suite/test.properties adhoc-2:/opt/
+ docker cp packaging/hudi-integ-test-bundle/target/hudi-integ-test-bundle-0.10.0-SNAPSHOT.jar adhoc-2:/opt/
+ docker cp docker/demo/config/test-suite/test.properties adhoc-2:/opt/
```
Also copy your dag of interest to adhoc-2:/opt/
```
- docker cp demo/config/test-suite/complex-dag-cow.yaml adhoc-2:/opt/
+ docker cp docker/demo/config/test-suite/complex-dag-cow.yaml adhoc-2:/opt/
```
For repeated runs, two additional configs need to be set: "dag_rounds" and "dag_intermittent_delay_mins".
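For illustration, both keys sit at the top level of the dag yaml, next to the dag definition itself. The sketch below is modeled on the bundled demo dags; the node layout and values are placeholders, not recommendations:
```
# Sketch only; modeled on the dags under docker/demo/config/test-suite/.
dag_name: cow-repeated-example        # illustrative name
dag_rounds: 10                        # run the whole dag 10 times
dag_intermittent_delay_mins: 5        # wait 5 minutes between rounds
dag_content:
  first_insert:
    config:
      record_size: 1000
      num_partitions_insert: 1
      repeat_count: 2
      num_records_insert: 100
    type: InsertNode
    deps: none
```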
@@ -428,7 +457,7 @@ spark-submit \
--conf spark.driver.extraClassPath=/var/demo/jars/* \
--conf spark.executor.extraClassPath=/var/demo/jars/* \
--class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
- /opt/hudi-integ-test-bundle-0.8.0-SNAPSHOT.jar \
+ /opt/hudi-integ-test-bundle-0.10.0-SNAPSHOT.jar \
--source-ordering-field test_suite_source_ordering_field \
--use-deltastreamer \
--target-base-path /user/hive/warehouse/hudi-integ-test-suite/output \
@@ -446,6 +475,14 @@ spark-submit \
--clean-output
```
+ If you wish to enable metrics, add the properties below as well:
+ ```
+ --hoodie-conf hoodie.metrics.on=true \
+ --hoodie-conf hoodie.metrics.reporter.type=GRAPHITE \
+ --hoodie-conf hoodie.metrics.graphite.host=graphite \
+ --hoodie-conf hoodie.metrics.graphite.port=2003 \
+ ```
+
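A quick way to sanity-check that the Graphite listener is reachable from the containers is to push one dummy datapoint over Graphite's plaintext protocol. This assumes `nc` is available in the adhoc image, and the metric name is arbitrary:
```
docker exec adhoc-2 /bin/bash -c 'echo "test.hudi.metric 1 $(date +%s)" | nc -w1 graphite 2003'
```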
A few ready-to-use dags are available under docker/demo/config/test-suite/ that could give you ideas for long-running
dags.
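For instance, you can list them from the repository root:
```
ls docker/demo/config/test-suite/*.yaml
```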
```