[MINOR][DOC] Add missing compression codec #22068

Closed
10110346 wants to merge 1 commit into apache:master from 10110346:nosupportlz4

@10110346
Contributor

What changes were proposed in this pull request?

Parquet provides six compression codecs: "snappy", "gzip", "lzo", "lz4", "brotli", and "zstd".
This PR adds the missing compression codecs: "lz4", "brotli", and "zstd".

How was this patch tested?

N/A

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM

@SparkQA

SparkQA commented Aug 10, 2018

Test build #94550 has finished for PR 22068 at commit 74aa80c.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Aug 10, 2018

Would it be better to update a comment in DataFrameWriter.scala, too?

   * <li>`compression` (default is the value specified in `spark.sql.parquet.compression.codec`):
   * compression codec to use when saving to file. This can be one of the known case-insensitive
   * shorten names(`none`, `snappy`, `gzip`, and `lzo`). This will override
   * `spark.sql.parquet.compression.codec`.</li>
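
For readers unfamiliar with the option described in that comment, here is a minimal sketch of how the per-write `compression` option overrides the session-wide `spark.sql.parquet.compression.codec` setting. The object name, the sample data, and the output path `/tmp/parquet-compression-example` are assumptions made for illustration. Whether "lz4", "brotli", or "zstd" actually work depends on the Parquet and Hadoop versions and native libraries on the classpath, so the sketch sticks to "snappy" and "gzip".

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: object name, data, and output path are hypothetical.
object ParquetCompressionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-compression-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // Session-wide default codec for Parquet writes.
    spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

    // The per-write option takes precedence over the session setting,
    // so this write uses gzip. Codec names are case-insensitive.
    df.write
      .mode("overwrite")
      .option("compression", "gzip")
      .parquet("/tmp/parquet-compression-example")

    spark.stop()
  }
}
```

Running this requires Spark SQL on the classpath. Swapping "gzip" for one of the codecs added to the docs by this PR should work the same way where the codec is available.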

@SparkQA

SparkQA commented Aug 10, 2018

Test build #4237 has finished for PR 22068 at commit 74aa80c.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Yeah, if more instances are found, we'd better fix them together while we are here.

@10110346
Contributor Author

I will update, thanks @kiszk @HyukjinKwon

@kiszk
Member

kiszk commented Aug 11, 2018

Thanks. BTW, I found another instance in a test, not in the docs.

Should we address it in this PR, or in a separate one?
@HyukjinKwon WDYT?

class ParquetCompressionCodecPrecedenceSuite extends ParquetTest with SharedSQLContext {
  test("Test `spark.sql.parquet.compression.codec` config") {
    Seq("NONE", "UNCOMPRESSED", "SNAPPY", "GZIP", "LZO").foreach { c =>
      withSQLConf(SQLConf.PARQUET_COMPRESSION.key -> c) {
        val expected = if (c == "NONE") "UNCOMPRESSED" else c
        val option = new ParquetOptions(Map.empty[String, String], spark.sessionState.conf)
        assert(option.compressionCodecClassName == expected)
      }
    }
  }
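
The behavior this test checks (case-insensitive short names, with "NONE" resolving to "UNCOMPRESSED") can be sketched without Spark as below. This is only an illustration under stated assumptions: `CodecNames` and `resolve` are hypothetical names, not Spark's `ParquetOptions`, and the three extra entries simply mirror the codecs this PR adds to the docs.

```scala
import java.util.Locale

// Hypothetical helper for illustration; this is not Spark's ParquetOptions.
object CodecNames {
  // Lower-case short names mapped to the codec names the test above expects.
  private val shortNames: Map[String, String] = Map(
    "none" -> "UNCOMPRESSED",
    "uncompressed" -> "UNCOMPRESSED",
    "snappy" -> "SNAPPY",
    "gzip" -> "GZIP",
    "lzo" -> "LZO",
    "lz4" -> "LZ4",
    "brotli" -> "BROTLI",
    "zstd" -> "ZSTD"
  )

  // Case-insensitive lookup; an unknown name yields a Left with a message.
  def resolve(name: String): Either[String, String] =
    shortNames
      .get(name.toLowerCase(Locale.ROOT))
      .toRight(s"Unknown codec '$name'. Known: ${shortNames.keys.toSeq.sorted.mkString(", ")}")
}

object CodecNamesDemo extends App {
  // Same shape as the test loop above, extended with the three added codecs.
  Seq("NONE", "UNCOMPRESSED", "SNAPPY", "GZIP", "LZO", "LZ4", "BROTLI", "ZSTD").foreach { c =>
    val expected = if (c == "NONE") "UNCOMPRESSED" else c
    assert(CodecNames.resolve(c) == Right(expected))
  }
  assert(CodecNames.resolve("bzip2").isLeft)
}
```

If the test were extended in the same way, the `Seq` in the suite would gain "LZ4", "BROTLI", and "ZSTD", assuming the option class accepts those names.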

@SparkQA

SparkQA commented Aug 11, 2018

Test build #94593 has finished for PR 22068 at commit 59f6080.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Eh, I think it's okay. Let's just get this in.

@SparkQA

SparkQA commented Aug 11, 2018

Test build #94594 has finished for PR 22068 at commit 5b4562a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@10110346
Contributor Author

retest this please

@SparkQA

SparkQA commented Aug 11, 2018

Test build #94605 has finished for PR 22068 at commit 5b4562a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

@asfgit asfgit closed this in 4b11d90 Aug 11, 2018
asfgit pushed a commit that referenced this pull request Aug 11, 2018
## What changes were proposed in this pull request?

Parquet provides six compression codecs: "snappy", "gzip", "lzo", "lz4", "brotli", and "zstd".
This PR adds the missing compression codecs: "lz4", "brotli", and "zstd".
## How was this patch tested?
N/A

Closes #22068 from 10110346/nosupportlz4.

Authored-by: liuxian <liu.xian3@zte.com.cn>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025