Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 38 additions & 0 deletions docs/sql-data-sources-avro.md
Original file line number Diff line number Diff line change
Expand Up @@ -283,6 +283,19 @@ Data source options of Avro can be set via:
</td>
<td>function <code>from_avro</code></td>
</tr>
<tr>
<td><code>datetimeRebaseMode</code></td>
<td>The SQL config <code>spark.sql.avro</code> <code>.datetimeRebaseModeInRead</code> which is <code>EXCEPTION</code> by default</td>
<td>The <code>datetimeRebaseMode</code> option allows to specify the rebasing mode for the values of the <code>date</code>, <code>timestamp-micros</code>, <code>timestamp-millis</code> logical types from the Julian to Proleptic Gregorian calendar.<br>
Currently supported modes are:
<ul>
<li><code>EXCEPTION</code>: fails in reads of ancient dates/timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: loads dates/timestamps without rebasing.</li>
<li><code>LEGACY</code>: performs rebasing of ancient dates/timestamps from the Julian to Proleptic Gregorian calendar.</li>
</ul>
</td>
<td>read and function <code>from_avro</code></td>
</tr>
</table>

## Configuration
Expand Down Expand Up @@ -317,6 +330,31 @@ Configuration of Avro can be done using the `setConf` method on SparkSession or
</td>
<td>2.4.0</td>
</tr>
<tr>
<td>spark.sql.avro.datetimeRebaseModeInRead</td>
<td><code>EXCEPTION</code></td>
<td>The rebasing mode for the values of the <code>date</code>, <code>timestamp-micros</code>, <code>timestamp-millis</code> logical types from the Julian to Proleptic Gregorian calendar:<br>
<ul>
<li><code>EXCEPTION</code>: Spark will fail the reading if it sees ancient dates/timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: Spark will not do rebase and read the dates/timestamps as it is.</li>
<li><code>LEGACY</code>: Spark will rebase dates/timestamps from the legacy hybrid (Julian + Gregorian) calendar to Proleptic Gregorian calendar when reading Avro files.</li>
</ul>
This config is only effective if the writer info (like Spark, Hive) of the Avro files is unknown.
</td>
<td>3.0.0</td>
</tr>
<tr>
<td>spark.sql.avro.datetimeRebaseModeInWrite</td>
<td><code>EXCEPTION</code></td>
<td>The rebasing mode for the values of the <code>date</code>, <code>timestamp-micros</code>, <code>timestamp-millis</code> logical types from the Proleptic Gregorian to Julian calendar:<br>
<ul>
<li><code>EXCEPTION</code>: Spark will fail the writing if it sees ancient dates/timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: Spark will not do rebase and write the dates/timestamps as it is.</li>
<li><code>LEGACY</code>: Spark will rebase dates/timestamps from Proleptic Gregorian calendar to the legacy hybrid (Julian + Gregorian) calendar when writing Avro files.</li>
</ul>
</td>
<td>3.0.0</td>
</tr>
</table>

## Compatibility with Databricks spark-avro
Expand Down
86 changes: 86 additions & 0 deletions docs/sql-data-sources-parquet.md
Original file line number Diff line number Diff line change
Expand Up @@ -252,6 +252,42 @@ REFRESH TABLE my_table;

</div>

## Data Source Option
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to list other datasource options here? If not, I would write it as another section like Rebasing Datetime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, other options should be here.


Data source options of Parquet can be set via:
* the `.option`/`.options` methods of `DataFrameReader` or `DataFrameWriter`
* the `.option`/`.options` methods of `DataStreamReader` or `DataStreamWriter`

<table class="table">
<tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr>
<tr>
<td><code>datetimeRebaseMode</code></td>
<td>The SQL config <code>spark.sql.parquet</code> <code>.datetimeRebaseModeInRead</code> which is <code>EXCEPTION</code> by default</td>
<td>The <code>datetimeRebaseMode</code> option allows to specify the rebasing mode for the values of the <code>DATE</code>, <code>TIMESTAMP_MILLIS</code>, <code>TIMESTAMP_MICROS</code> logical types from the Julian to Proleptic Gregorian calendar.<br>
Currently supported modes are:
<ul>
<li><code>EXCEPTION</code>: fails in reads of ancient dates/timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: loads dates/timestamps without rebasing.</li>
<li><code>LEGACY</code>: performs rebasing of ancient dates/timestamps from the Julian to Proleptic Gregorian calendar.</li>
</ul>
</td>
<td>read</td>
</tr>
<tr>
<td><code>int96RebaseMode</code></td>
<td>The SQL config <code>spark.sql.parquet</code> <code>.int96RebaseModeInRead</code> which is <code>EXCEPTION</code> by default</td>
<td>The <code>int96RebaseMode</code> option allows to specify the rebasing mode for INT96 timestamps from the Julian to Proleptic Gregorian calendar.<br>
Currently supported modes are:
<ul>
<li><code>EXCEPTION</code>: fails in reads of ancient INT96 timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: loads INT96 timestamps without rebasing.</li>
<li><code>LEGACY</code>: performs rebasing of ancient timestamps from the Julian to Proleptic Gregorian calendar.</li>
</ul>
</td>
<td>read</td>
</tr>
</table>

### Configuration

Configuration of Parquet can be done using the `setConf` method on `SparkSession` or by running
Expand Down Expand Up @@ -329,4 +365,54 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
</td>
<td>1.6.0</td>
</tr>
<tr>
<td>spark.sql.parquet.datetimeRebaseModeInRead</td>
<td><code>EXCEPTION</code></td>
<td>The rebasing mode for the values of the <code>DATE</code>, <code>TIMESTAMP_MILLIS</code>, <code>TIMESTAMP_MICROS</code> logical types from the Julian to Proleptic Gregorian calendar:<br>
<ul>
<li><code>EXCEPTION</code>: Spark will fail the reading if it sees ancient dates/timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: Spark will not do rebase and read the dates/timestamps as it is.</li>
<li><code>LEGACY</code>: Spark will rebase dates/timestamps from the legacy hybrid (Julian + Gregorian) calendar to Proleptic Gregorian calendar when reading Parquet files.</li>
</ul>
This config is only effective if the writer info (like Spark, Hive) of the Parquet files is unknown.
</td>
<td>3.0.0</td>
</tr>
<tr>
<td>spark.sql.parquet.datetimeRebaseModeInWrite</td>
<td><code>EXCEPTION</code></td>
<td>The rebasing mode for the values of the <code>DATE</code>, <code>TIMESTAMP_MILLIS</code>, <code>TIMESTAMP_MICROS</code> logical types from the Proleptic Gregorian to Julian calendar:<br>
<ul>
<li><code>EXCEPTION</code>: Spark will fail the writing if it sees ancient dates/timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: Spark will not do rebase and write the dates/timestamps as it is.</li>
<li><code>LEGACY</code>: Spark will rebase dates/timestamps from Proleptic Gregorian calendar to the legacy hybrid (Julian + Gregorian) calendar when writing Parquet files.</li>
</ul>
</td>
<td>3.0.0</td>
</tr>
<tr>
<td>spark.sql.parquet.int96RebaseModeInRead</td>
<td><code>EXCEPTION</code></td>
<td>The rebasing mode for the values of the <code>INT96</code> timestamp type from the Julian to Proleptic Gregorian calendar:<br>
<ul>
<li><code>EXCEPTION</code>: Spark will fail the reading if it sees ancient INT96 timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: Spark will not do rebase and read the dates/timestamps as it is.</li>
<li><code>LEGACY</code>: Spark will rebase INT96 timestamps from the legacy hybrid (Julian + Gregorian) calendar to Proleptic Gregorian calendar when reading Parquet files.</li>
</ul>
This config is only effective if the writer info (like Spark, Hive) of the Parquet files is unknown.
</td>
<td>3.1.0</td>
</tr>
<tr>
<td>spark.sql.parquet.int96RebaseModeInWrite</td>
<td><code>EXCEPTION</code></td>
<td>The rebasing mode for the values of the <code>INT96</code> timestamp type from the Proleptic Gregorian to Julian calendar:<br>
<ul>
<li><code>EXCEPTION</code>: Spark will fail the writing if it sees ancient timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: Spark will not do rebase and write the dates/timestamps as it is.</li>
<li><code>LEGACY</code>: Spark will rebase INT96 timestamps from Proleptic Gregorian calendar to the legacy hybrid (Julian + Gregorian) calendar when writing Parquet files.</li>
</ul>
</td>
<td>3.1.0</td>
</tr>
</table>