PARQUET-107: Add option to disable summary metadata.
This adds an option to the commitJob phase of the MR OutputCommitter,
parquet.enable.summary-metadata (default true), that can be used to
disable the summary metadata files generated from the footers of all of
the files produced. This enables more control over when those summary
files are produced and makes it possible to rename MR outputs and then
generate the summaries.

Author: Ryan Blue

Closes #68 from rdblue/PARQUET-107-add-summary-metadata-option and squashes the following commits:

261e5e4 [Ryan Blue] PARQUET-107: Add option to disable summary metadata.
rdblue authored and julienledem committed Oct 1, 2014
1 parent da91299 commit be1222e
Showing 2 changed files with 19 additions and 12 deletions.
@@ -41,22 +41,24 @@ public ParquetOutputCommitter(Path outputPath, TaskAttemptContext context) throw

   public void commitJob(JobContext jobContext) throws IOException {
     super.commitJob(jobContext);
-    try {
-      Configuration configuration = ContextUtil.getConfiguration(jobContext);
-      final FileSystem fileSystem = outputPath.getFileSystem(configuration);
-      FileStatus outputStatus = fileSystem.getFileStatus(outputPath);
-      List<Footer> footers = ParquetFileReader.readAllFootersInParallel(configuration, outputStatus);
+    Configuration configuration = ContextUtil.getConfiguration(jobContext);
+    if (configuration.getBoolean(ParquetOutputFormat.ENABLE_JOB_SUMMARY, true)) {
       try {
-        ParquetFileWriter.writeMetadataFile(configuration, outputPath, footers);
+        final FileSystem fileSystem = outputPath.getFileSystem(configuration);
+        FileStatus outputStatus = fileSystem.getFileStatus(outputPath);
+        List<Footer> footers = ParquetFileReader.readAllFootersInParallel(configuration, outputStatus);
+        try {
+          ParquetFileWriter.writeMetadataFile(configuration, outputPath, footers);
+        } catch (Exception e) {
+          LOG.warn("could not write summary file for " + outputPath, e);
+          final Path metadataPath = new Path(outputPath, ParquetFileWriter.PARQUET_METADATA_FILE);
+          if (fileSystem.exists(metadataPath)) {
+            fileSystem.delete(metadataPath, true);
+          }
+        }
       } catch (Exception e) {
         LOG.warn("could not write summary file for " + outputPath, e);
-        final Path metadataPath = new Path(outputPath, ParquetFileWriter.PARQUET_METADATA_FILE);
-        if (fileSystem.exists(metadataPath)) {
-          fileSystem.delete(metadataPath, true);
-        }
       }
-    } catch (Exception e) {
-      LOG.warn("could not write summary file for " + outputPath, e);
     }
   }

@@ -73,6 +73,10 @@
  *
  * # To enable/disable dictionary encoding
  * parquet.enable.dictionary=true # false to disable dictionary encoding
+ *
+ * # To enable/disable summary metadata aggregation at the end of a MR job
+ * # The default is true (enabled)
+ * parquet.enable.summary-metadata=true # false to disable summary aggregation
  * </pre>
  *
  * If parquet.compression is not set, the following properties are checked (FileOutputFormat behavior).
@@ -99,6 +103,7 @@ public class ParquetOutputFormat<T> extends FileOutputFormat<Void, T> {
   public static final String ENABLE_DICTIONARY = "parquet.enable.dictionary";
   public static final String VALIDATION = "parquet.validation";
   public static final String WRITER_VERSION = "parquet.writer.version";
+  public static final String ENABLE_JOB_SUMMARY = "parquet.enable.summary-metadata";
 
   public static void setWriteSupportClass(Job job, Class<?> writeSupportClass) {
     getConfiguration(job).set(WRITE_SUPPORT_CLASS, writeSupportClass.getName());
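
The gate this commit adds to commitJob can be illustrated without a Hadoop classpath. The following is a minimal sketch, not the project's code: the class SummaryGateSketch and its helper methods are hypothetical, and a plain Map stands in for Hadoop's Configuration. Only the property key and its default of true are taken from the diff. The helper approximates getBoolean(name, defaultValue) by falling back to the default when the key is unset or is neither "true" nor "false".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for org.apache.hadoop.conf.Configuration.
public class SummaryGateSketch {
    // Same key as ParquetOutputFormat.ENABLE_JOB_SUMMARY in this commit.
    static final String ENABLE_JOB_SUMMARY = "parquet.enable.summary-metadata";

    // Approximation of Configuration.getBoolean(name, defaultValue):
    // an unset or unrecognized value yields the default.
    static boolean getBoolean(Map<String, String> conf, String name, boolean defaultValue) {
        String value = conf.get(name);
        if (value == null) {
            return defaultValue;
        }
        value = value.trim();
        if ("true".equalsIgnoreCase(value)) {
            return true;
        }
        if ("false".equalsIgnoreCase(value)) {
            return false;
        }
        return defaultValue;
    }

    // Mirrors the condition guarding the summary-file block in commitJob.
    static boolean shouldWriteSummary(Map<String, String> conf) {
        return getBoolean(conf, ENABLE_JOB_SUMMARY, true);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Key unset: the default (true) applies, so summaries would be written.
        System.out.println("unset    -> " + shouldWriteSummary(conf));

        // Key set to false: the committer would skip the summary files.
        conf.put(ENABLE_JOB_SUMMARY, "false");
        System.out.println("disabled -> " + shouldWriteSummary(conf));
    }
}
```

In a real job the same effect would presumably come from setting the key on the job's Configuration before submission, for example with setBoolean(ParquetOutputFormat.ENABLE_JOB_SUMMARY, false), or by passing parquet.enable.summary-metadata=false as a job property.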
