2 changes: 0 additions & 2 deletions website/docs/configurations.md
@@ -161,7 +161,6 @@ Options useful for writing tables via `write.format.option(...)`
| [hoodie.datasource.write.insert.drop.duplicates](#hoodiedatasourcewriteinsertdropduplicates) | false | If set to true, records from the incoming dataframe will not overwrite existing records with the same key during the write operation. <br /> **Note** For the Insert operation in Spark SQL writing only, since 0.14.0 users can switch to the config `hoodie.datasource.insert.dup.policy` instead for a simplified duplicate-handling experience. The new config will be incorporated into all other writing flows and this config will be fully deprecated in future releases.<br />`Config Param: INSERT_DROP_DUPS` |
| [hoodie.datasource.write.keygenerator.class](#hoodiedatasourcewritekeygeneratorclass) | org.apache.hudi.keygen.SimpleKeyGenerator | Key generator class that implements `org.apache.hudi.keygen.KeyGenerator`<br />`Config Param: KEYGENERATOR_CLASS_NAME` |
| [hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled](#hoodiedatasourcewritekeygeneratorconsistentlogicaltimestampenabled) | false | When set to true, a consistent value will be generated for a logical timestamp type column, like timestamp-millis and timestamp-micros, irrespective of whether the row-writer is enabled. Disabled by default so as not to break pipelines that deploy either the fully row-writer path or the non-row-writer path. For example, if it is kept disabled, a record key of timestamp type with value `2016-12-29 09:54:00` will be written as the timestamp `2016-12-29 09:54:00.0` in the row-writer path, while it will be written as the long value `1483023240000000` in the non-row-writer path. If enabled, the timestamp value will be written the same way in both cases.<br />`Config Param: KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED` |
| [hoodie.datasource.write.new.columns.nullable](#hoodiedatasourcewritenewcolumnsnullable) | false | When a non-nullable column is added to the datasource during a write operation, the write operation will fail the schema compatibility check. Setting this option to true will make the newly added column nullable so the write operation can complete successfully.<br />`Config Param: MAKE_NEW_COLUMNS_NULLABLE`<br />`Since Version: 0.14.0` |
| [hoodie.datasource.write.partitionpath.urlencode](#hoodiedatasourcewritepartitionpathurlencode) | false | Should we URL-encode the partition path value before creating the folder structure.<br />`Config Param: URL_ENCODE_PARTITIONING` |
| [hoodie.datasource.write.payload.class](#hoodiedatasourcewritepayloadclass) | org.apache.hudi.common.model.OverwriteWithLatestAvroPayload | Payload class used. Override this if you would like to roll your own merge logic when upserting/inserting. This will render any value set for PRECOMBINE_FIELD_OPT_VAL ineffective<br />`Config Param: PAYLOAD_CLASS_NAME` |
| [hoodie.datasource.write.reconcile.schema](#hoodiedatasourcewritereconcileschema) | false | This config controls how the writer's schema is selected based on the incoming batch's schema as well as the existing table's schema. When schema reconciliation is DISABLED, the incoming batch's schema is picked as the writer schema (thereby updating the table's schema). When schema reconciliation is ENABLED, the writer schema is picked such that the table's schema (after the txn) is either kept the same or extended, meaning that we always prefer the schema that either adds new columns or stays the same. This lets us always extend the table's schema during evolution and never lose data (for example, when an existing column is dropped in a new batch)<br />`Config Param: RECONCILE_SCHEMA` |
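
A minimal sketch (not part of the diff itself) of how options like those above are passed through `write.format.option(...)`. The record key, partition path, precombine field and table name options are standard Hudi write options assumed here for completeness; the DataFrame `df`, table name and path are placeholders.

```scala
// Sketch only: write a DataFrame `df` (with `uuid`, `ts`, `partitionpath` columns)
// as a Hudi table, passing a few of the options from the table above.
df.write.format("hudi").
  option("hoodie.table.name", "hudi_trips").                                   // placeholder table name
  option("hoodie.datasource.write.recordkey.field", "uuid").
  option("hoodie.datasource.write.partitionpath.field", "partitionpath").
  option("hoodie.datasource.write.precombine.field", "ts").
  option("hoodie.datasource.write.keygenerator.class",
         "org.apache.hudi.keygen.SimpleKeyGenerator").                         // default key generator
  option("hoodie.datasource.write.partitionpath.urlencode", "false").          // documented default
  option("hoodie.datasource.write.reconcile.schema", "true").                  // opt in to schema reconciliation
  mode("append").
  save("/tmp/hudi_trips")                                                      // placeholder path
```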
@@ -359,7 +358,6 @@ The following set of configurations are common across Hudi.
| [hoodie.memory.compaction.max.size](#hoodiememorycompactionmaxsize) | (N/A) | Maximum amount of memory in bytes used for compaction operations, before spilling to local storage.<br />`Config Param: MAX_MEMORY_FOR_COMPACTION` |
| [hoodie.common.diskmap.compression.enabled](#hoodiecommondiskmapcompressionenabled) | true | Turn on compression for BITCASK disk map used by the External Spillable Map<br />`Config Param: DISK_MAP_BITCASK_COMPRESSION_ENABLED` |
| [hoodie.common.spillable.diskmap.type](#hoodiecommonspillablediskmaptype) | BITCASK | When handling input data that cannot be held in memory, to merge with a file on storage, a spillable diskmap is employed. By default, we use a persistent hashmap based loosely on Bitcask, which offers O(1) inserts and lookups. Change this to `ROCKS_DB` to prefer RocksDB for handling the spill.<br />`Config Param: SPILLABLE_DISK_MAP_TYPE` |
| [hoodie.datasource.write.new.columns.nullable](#hoodiedatasourcewritenewcolumnsnullable) | false | When a non-nullable column is added to the datasource during a write operation, the write operation will fail the schema compatibility check. Setting this option to true will make the newly added column nullable so the write operation can complete successfully.<br />`Config Param: MAKE_NEW_COLUMNS_NULLABLE`<br />`Since Version: 0.14.0` |
| [hoodie.datasource.write.reconcile.schema](#hoodiedatasourcewritereconcileschema) | false | This config controls how the writer's schema is selected based on the incoming batch's schema as well as the existing table's schema. When schema reconciliation is DISABLED, the incoming batch's schema is picked as the writer schema (thereby updating the table's schema). When schema reconciliation is ENABLED, the writer schema is picked such that the table's schema (after the txn) is either kept the same or extended, meaning that we always prefer the schema that either adds new columns or stays the same. This lets us always extend the table's schema during evolution and never lose data (for example, when an existing column is dropped in a new batch)<br />`Config Param: RECONCILE_SCHEMA` |
| [hoodie.fs.atomic_creation.support](#hoodiefsatomic_creationsupport) | | This config is used to specify file systems that support atomic file creation. Atomic means that an operation either succeeds and has an effect, or fails and has no effect; this feature is used by FileSystemLockProvider to guarantee that only one writer can create the lock file at a time. Since some file systems do not support atomic file creation (e.g. S3), FileSystemLockProvider supports only HDFS, local FS and View FS by default. If you want to use FileSystemLockProvider with another file system, set this config with the FS scheme, e.g. fs1,fs2<br />`Config Param: HOODIE_FS_ATOMIC_CREATION_SUPPORT`<br />`Since Version: 0.14.0` |
| [hoodie.memory.dfs.buffer.max.size](#hoodiememorydfsbuffermaxsize) | 16777216 | Property to control the maximum memory in bytes for the DFS input stream buffer<br />`Config Param: MAX_DFS_STREAM_BUFFER_SIZE` |
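
A sketch (again not part of the diff) under the assumption that these common configs can be supplied the same way as any other write option, for example via an options map. The diskmap type is switched to `ROCKS_DB` purely for illustration; the other values shown are the documented defaults, and `df`, the table name and the path are placeholders.

```scala
// Sketch only: supply the "common" Hudi configs alongside a datasource write.
val commonOpts = Map(
  "hoodie.common.spillable.diskmap.type"      -> "ROCKS_DB",  // prefer RocksDB over the default BITCASK
  "hoodie.common.diskmap.compression.enabled" -> "true",      // documented default
  "hoodie.memory.dfs.buffer.max.size"         -> "16777216"   // documented default (16 MB)
)

df.write.format("hudi").
  options(commonOpts).
  option("hoodie.table.name", "hudi_trips").                  // placeholder table name
  mode("append").
  save("/tmp/hudi_trips")                                     // placeholder path
```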