Merged
2 changes: 1 addition & 1 deletion jmh.gradle
@@ -48,7 +48,7 @@ configure(jmhProjects) {
// Path is relative to either spark2 or spark3 folder, depending on project being tested
sourceSets {
jmh {
- java.srcDirs = ['src/jmh/java', '../spark/src/jmh/java']
+ java.srcDirs = ['src/jmh/java', '../../../spark/src/jmh/java']
compileClasspath += sourceSets.main.runtimeClasspath
}
}
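The comment in this hunk says the `srcDirs` entries are relative to the project folder being tested. As a minimal sketch of that path arithmetic, the Java program below resolves the old and new relative entries with `java.nio.file.Path.resolve(...).normalize()`. The directory names `repo/spark2` and `repo/spark/vX/spark3` are hypothetical placeholders, since the diff does not show the actual folder layout; the sketch only illustrates that `../spark/...` from a folder one level below the root and `../../../spark/...` from a folder three levels below it point at the same `repo/spark/src/jmh/java` directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class JmhSrcDirs {
  // Resolves a relative srcDirs entry against a project directory and
  // collapses the ".." segments so the target directory is easy to read.
  static Path resolve(Path projectDir, String relative) {
    return projectDir.resolve(relative).normalize();
  }

  public static void main(String[] args) {
    // Hypothetical layouts: a project folder one level below the repository
    // root (old entry) versus a project folder three levels below it (new entry).
    Path oldProject = Paths.get("repo", "spark2");
    Path newProject = Paths.get("repo", "spark", "vX", "spark3");

    System.out.println(resolve(oldProject, "../spark/src/jmh/java"));
    System.out.println(resolve(newProject, "../../../spark/src/jmh/java"));
  }
}
```

Both calls print the same normalized path, which is the point of counting one extra `../` per extra directory level.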
36 changes: 18 additions & 18 deletions site/docs/benchmarks.md
@@ -26,89 +26,89 @@ Also note that JMH benchmarks run within the same JVM as the system-under-test,
### IcebergSourceNestedListParquetDataWriteBenchmark
A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedListParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-parquet-data-write-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedListParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-parquet-data-write-benchmark-result.txt`

### SparkParquetReadersNestedDataBenchmark
A benchmark that evaluates the performance of reading nested Parquet data using Iceberg and Spark Parquet readers. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetReadersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-readers-nested-data-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetReadersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-readers-nested-data-benchmark-result.txt`

### SparkParquetWritersFlatDataBenchmark
A benchmark that evaluates the performance of writing Parquet data with a flat schema using Iceberg and Spark Parquet writers. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetWritersFlatDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-flat-data-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetWritersFlatDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-flat-data-benchmark-result.txt`

### IcebergSourceFlatORCDataReadBenchmark
A benchmark that evaluates the performance of reading ORC data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatORCDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-orc-data-read-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatORCDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-orc-data-read-benchmark-result.txt`

### SparkParquetReadersFlatDataBenchmark
A benchmark that evaluates the performance of reading Parquet data with a flat schema using Iceberg and Spark Parquet readers. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetReadersFlatDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-readers-flat-data-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetReadersFlatDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-readers-flat-data-benchmark-result.txt`

### VectorizedReadDictionaryEncodedFlatParquetDataBenchmark
A benchmark that compares the performance of reading dictionary-encoded Parquet data with a flat schema using the vectorized Iceberg read path and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=VectorizedReadDictionaryEncodedFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-dict-encoded-flat-parquet-data-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=VectorizedReadDictionaryEncodedFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-dict-encoded-flat-parquet-data-result.txt`

### IcebergSourceNestedListORCDataWriteBenchmark
A benchmark that evaluates the performance of writing nested ORC data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedListORCDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-orc-data-write-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedListORCDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-orc-data-write-benchmark-result.txt`

### VectorizedReadFlatParquetDataBenchmark
A benchmark that compares the performance of reading Parquet data with a flat schema using the vectorized Iceberg read path and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=VectorizedReadFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-flat-parquet-data-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=VectorizedReadFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-flat-parquet-data-result.txt`

### IcebergSourceFlatParquetDataWriteBenchmark
A benchmark that evaluates the performance of writing Parquet data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-write-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-write-benchmark-result.txt`

### IcebergSourceNestedAvroDataReadBenchmark
A benchmark that evaluates the performance of reading nested Avro data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedAvroDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-avro-data-read-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedAvroDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-avro-data-read-benchmark-result.txt`

### IcebergSourceFlatAvroDataReadBenchmark
A benchmark that evaluates the performance of reading Avro data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatAvroDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-avro-data-read-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatAvroDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-avro-data-read-benchmark-result.txt`

### IcebergSourceNestedParquetDataWriteBenchmark
A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-write-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-write-benchmark-result.txt`

### IcebergSourceNestedParquetDataReadBenchmark
A benchmark that evaluates the performance of reading nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-read-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-read-benchmark-result.txt`

### IcebergSourceNestedORCDataReadBenchmark
A benchmark that evaluates the performance of reading nested ORC data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedORCDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-orc-data-read-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedORCDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-orc-data-read-benchmark-result.txt`

### IcebergSourceFlatParquetDataReadBenchmark
A benchmark that evaluates the performance of reading Parquet data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-read-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-read-benchmark-result.txt`

### IcebergSourceFlatParquetDataFilterBenchmark
A benchmark that evaluates the file skipping capabilities in the Spark data source for Iceberg. This class uses a dataset with a flat schema, where the records are clustered according to the
column used in the filter predicate. The performance is compared to the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-filter-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-filter-benchmark-result.txt`

### IcebergSourceNestedParquetDataFilterBenchmark
A benchmark that evaluates the file skipping capabilities in the Spark data source for Iceberg. This class uses a dataset with nested data, where the records are clustered according to the
column used in the filter predicate. The performance is compared to the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-filter-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-filter-benchmark-result.txt`

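Both filter benchmarks rely on the records being clustered by the column used in the filter predicate. The Java sketch below illustrates why, assuming per-file minimum and maximum statistics for that column (Iceberg manifests track per-column lower and upper bounds for data files) and an equality predicate: a file can be skipped when the predicate value lies outside its range. The `FileStats` record and both file layouts are hypothetical, and this shows only the general min/max pruning idea, not Iceberg's implementation. The `record` syntax needs Java 16 or later.

```java
import java.util.List;

public class MinMaxSkipping {
  // Hypothetical per-file statistics: the smallest and largest value of the
  // filtered column within one data file.
  record FileStats(String path, long min, long max) {}

  // Counts the files that must still be read for the predicate "col = value";
  // every file whose [min, max] range excludes the value is skipped.
  static long filesToScan(List<FileStats> files, long value) {
    return files.stream()
        .filter(f -> value >= f.min() && value <= f.max())
        .count();
  }

  public static void main(String[] args) {
    // Clustered: each file holds a disjoint range of the filtered column.
    List<FileStats> clustered = List.of(
        new FileStats("f0", 0, 99),
        new FileStats("f1", 100, 199),
        new FileStats("f2", 200, 299));
    // Unclustered: values are spread across files, so every range is wide.
    List<FileStats> unclustered = List.of(
        new FileStats("f0", 0, 299),
        new FileStats("f1", 1, 298),
        new FileStats("f2", 2, 297));

    System.out.println("clustered: " + filesToScan(clustered, 150));
    System.out.println("unclustered: " + filesToScan(unclustered, 150));
  }
}
```

With clustering, only one of the three files overlaps the predicate value, while in the unclustered layout every file has to be read.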
### SparkParquetWritersNestedDataBenchmark
A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and Spark Parquet writers. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetWritersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-nested-data-benchmark-result.txt`
+ `./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetWritersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-nested-data-benchmark-result.txt`
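In every command above, `[2|3]` is a placeholder for the Spark major version, so each line stands for two concrete Gradle invocations. As a minimal sketch of that expansion, the hypothetical `JmhCommand.build` helper below assembles the argument list using only the task path and the `-PjmhIncludeRegex` / `-PjmhOutputPath` properties shown in this diff; the output path is passed in explicitly because the documented file names do not follow a single naming rule.

```java
import java.util.List;

public class JmhCommand {
  // Hypothetical helper: expands the "[2|3]" placeholder into one concrete
  // Gradle invocation. Only the two versions named in the docs are accepted.
  static List<String> build(int sparkMajorVersion, String benchmark, String outputPath) {
    if (sparkMajorVersion != 2 && sparkMajorVersion != 3) {
      throw new IllegalArgumentException("expected Spark major version 2 or 3");
    }
    return List.of(
        "./gradlew",
        ":iceberg-spark:iceberg-spark" + sparkMajorVersion + ":jmh",
        "-PjmhIncludeRegex=" + benchmark,
        "-PjmhOutputPath=" + outputPath);
  }

  public static void main(String[] args) {
    // Print the spark-3 variant of the last command listed above.
    System.out.println(String.join(" ", build(
        3,
        "SparkParquetWritersNestedDataBenchmark",
        "benchmark/spark-parquet-writers-nested-data-benchmark-result.txt")));
  }
}
```

Returning a `List<String>` rather than one string keeps each argument separate, which is the form `ProcessBuilder` expects if the command is launched from Java instead of a shell.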
@@ -61,7 +61,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=SparkParquetReadersFlatDataBenchmark
* -PjmhOutputPath=benchmark/spark-parquet-readers-flat-data-benchmark-result.txt
* </code>
@@ -61,7 +61,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=SparkParquetReadersNestedDataBenchmark
* -PjmhOutputPath=benchmark/spark-parquet-readers-nested-data-benchmark-result.txt
* </code>
@@ -53,7 +53,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=SparkParquetWritersFlatDataBenchmark
* -PjmhOutputPath=benchmark/spark-parquet-writers-flat-data-benchmark-result.txt
* </code>
@@ -53,7 +53,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=SparkParquetWritersNestedDataBenchmark
* -PjmhOutputPath=benchmark/spark-parquet-writers-nested-data-benchmark-result.txt
* </code>
@@ -27,7 +27,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=AvroWritersBenchmark
* -PjmhOutputPath=benchmark/avro-writers-benchmark-result.txt
* </code>
@@ -43,7 +43,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=IcebergSourceFlatAvroDataReadBenchmark
* -PjmhOutputPath=benchmark/iceberg-source-flat-avro-data-read-benchmark-result.txt
* </code>
@@ -44,7 +44,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=IcebergSourceNestedAvroDataReadBenchmark
* -PjmhOutputPath=benchmark/iceberg-source-nested-avro-data-read-benchmark-result.txt
* </code>
@@ -43,7 +43,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=IcebergSourceFlatORCDataReadBenchmark
* -PjmhOutputPath=benchmark/iceberg-source-flat-orc-data-read-benchmark-result.txt
* </code>
@@ -42,7 +42,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=IcebergSourceNestedListORCDataWriteBenchmark
* -PjmhOutputPath=benchmark/iceberg-source-nested-list-orc-data-write-benchmark-result.txt
* </code>
@@ -45,7 +45,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=IcebergSourceNestedORCDataReadBenchmark
* -PjmhOutputPath=benchmark/iceberg-source-nested-orc-data-read-benchmark-result.txt
* </code>
@@ -46,7 +46,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=IcebergSourceFlatParquetDataFilterBenchmark
* -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-filter-benchmark-result.txt
* </code>
@@ -42,7 +42,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=IcebergSourceFlatParquetDataReadBenchmark
* -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-read-benchmark-result.txt
* </code>
@@ -40,7 +40,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=IcebergSourceFlatParquetDataWriteBenchmark
* -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-write-benchmark-result.txt
* </code>
@@ -43,7 +43,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=IcebergSourceNestedListParquetDataWriteBenchmark
* -PjmhOutputPath=benchmark/iceberg-source-nested-list-parquet-data-write-benchmark-result.txt
* </code>
@@ -46,7 +46,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=IcebergSourceNestedParquetDataFilterBenchmark
* -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-filter-benchmark-result.txt
* </code>
@@ -42,7 +42,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=IcebergSourceNestedParquetDataReadBenchmark
* -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-read-benchmark-result.txt
* </code>
@@ -41,7 +41,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=IcebergSourceNestedParquetDataWriteBenchmark
* -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-write-benchmark-result.txt
* </code>
@@ -27,7 +27,7 @@
*
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=ParquetWritersBenchmark
* -PjmhOutputPath=benchmark/parquet-writers-benchmark-result.txt
* </code>
@@ -43,7 +43,7 @@
* <p>
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=VectorizedReadDictionaryEncodedFlatParquetDataBenchmark
* -PjmhOutputPath=benchmark/results.txt
* </code>
@@ -53,7 +53,7 @@
* <p>
* To run this benchmark for either spark-2 or spark-3:
* <code>
- * ./gradlew :iceberg-spark[2|3]:jmh
+ * ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh
* -PjmhIncludeRegex=VectorizedReadFlatParquetDataBenchmark
* -PjmhOutputPath=benchmark/results.txt
* </code>