Releases: GoogleCloudDataproc/hadoop-connectors

2018-03-19 (GCS 1.6.4, BQ 0.10.5)

19 Mar 21:50

Changelog

Cloud Storage connector:

  1. Fixed an issue where JSON auth files containing user auth (e.g. application_default_credentials.json) did not work with google.cloud.auth.service.account.json.keyfile.

  2. Honor the GOOGLE_APPLICATION_DEFAULT_CREDENTIALS environment variable for Google Application Default Credentials (but not for other defaults).

  3. Make fs.gs.project.id optional. It is still required for listing buckets, creating buckets, and the entire BigQuery connector.

  4. Disable the GCS Metadata Cache by default (i.e. the default value of the fs.gs.metadata.cache.enable property is now false).

  5. Support the GCS Requester Pays feature, which can be configured with these new properties:

    fs.gs.requester.pays.mode (default=DISABLED)
    fs.gs.requester.pays.project.id (no default value)
    fs.gs.requester.pays.buckets (no default value)
    
  6. Add support for specifying a pattern for marker files, which are copied last during a folder rename operation. The pattern is configured with the property:

    fs.gs.marker.file.pattern (no default value)
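
The "copied last" behaviour described for marker files can be illustrated with a minimal sketch. It is not the connector's implementation: the function name `order_for_rename` is hypothetical, and it assumes the pattern is a regular expression matched against the file name only, which the release note does not specify.

```python
import re


def order_for_rename(paths, marker_pattern):
    """Return paths reordered so that marker files come last.

    Assumption: marker_pattern is a regular expression applied to the
    final path component. An empty or missing pattern (the documented
    "no default value" case) leaves the order unchanged.
    """
    if not marker_pattern:
        return list(paths)

    marker = re.compile(marker_pattern)

    def is_marker(path):
        name = path.rstrip("/").rsplit("/", 1)[-1]
        return marker.fullmatch(name) is not None

    regular = [p for p in paths if not is_marker(p)]
    markers = [p for p in paths if is_marker(p)]
    # Regular files keep their relative order; markers are appended last.
    return regular + markers
```

Copying a marker such as a job-completion file last means a reader that waits for the marker should not see it at the destination before the data files have been copied.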
    

BigQuery connector:

  1. POM updates for GCS connector 1.6.4.
  2. Remove Avro and Gson classes from Hadoop 2 shaded jar because they are already included in the Hadoop 2 distribution.

2018-03-15 (GCS 1.8.0, BQ 0.12.0)

16 Mar 13:36

Changelog

Cloud Storage connector:

  1. Support the GCS Requester Pays feature, which can be configured with these new properties:

    fs.gs.requester.pays.mode (default=DISABLED)
    fs.gs.requester.pays.project.id (no default value)
    fs.gs.requester.pays.buckets (no default value)
    
  2. Change relocation package in shaded jar to be connector-specific.

  3. Add support for specifying a pattern for marker files, which are copied last during a folder rename operation. The pattern is configured with the property:

    fs.gs.marker.file.pattern (no default value)
    
  4. The minimum required Java version is now Java 8.
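
The new properties in this release are ordinary Hadoop configuration keys, so they can be set wherever other `fs.gs.*` keys are set. As a rough sketch, the snippet below renders a few of them into the standard Hadoop `<configuration>` XML layout. The helper name `render_hadoop_properties` is hypothetical, the marker pattern value `_SUCCESS` is only an illustrative assumption, and `DISABLED` is simply the documented default for the mode.

```python
import xml.etree.ElementTree as ET


def render_hadoop_properties(props):
    """Render a name -> value mapping as Hadoop-style configuration XML."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = render_hadoop_properties({
    # Documented default; other accepted values are not listed in these notes.
    "fs.gs.requester.pays.mode": "DISABLED",
    # Illustrative value only (assumption): the notes give no default or example.
    "fs.gs.marker.file.pattern": "_SUCCESS",
})
print(xml_text)
```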

BigQuery connector:

  1. POM updates for GCS connector 1.8.0.
  2. Change relocation package in shaded jar to be connector-specific.
  3. The minimum required Java version is now Java 8.

2018-02-22 (GCS 1.7.0, BQ 0.11.0)

23 Feb 14:08

Changelog

Cloud Storage connector:

  1. Fixed an issue where JSON auth files containing user auth (e.g. application_default_credentials.json) did not work with google.cloud.auth.service.account.json.keyfile.
  2. Honor the GOOGLE_APPLICATION_DEFAULT_CREDENTIALS environment variable for Google Application Default Credentials (but not for other defaults).
  3. Make fs.gs.project.id optional. It is still required for listing buckets, creating buckets, and the entire BigQuery connector.
  4. Relocate all dependencies in shaded jar.
  5. Update all dependencies to latest versions.
  6. Disable the GCS Metadata Cache by default (i.e. the default value of the fs.gs.metadata.cache.enable property is now false).

BigQuery connector:

  1. Relocate all dependencies in shaded jar.
  2. Update all dependencies to latest versions.
  3. POM updates for GCS connector 1.7.0.

2018-01-25 (GCS 1.6.3, BQ 0.10.4)

25 Jan 21:14

Changelog

Cloud Storage connector:

  1. Use new GCS batch requests endpoint.

BigQuery connector:

  1. POM updates for GCS connector 1.6.3.

2017-11-21 (GCS 1.6.2, BQ 0.10.3)

22 Nov 21:11

Changelog

Cloud Storage connector:

  1. Wire HTTP transport settings into Credential logic.

BigQuery connector:

  1. POM updates for GCS connector 1.6.2.

2017-04-20 (GCS 1.6.1, BQ 0.10.2)

21 Apr 02:10

Changelog

Cloud Storage connector:

  1. Added a polling loop when determining if a createEmptyObjects error can safely be ignored and expanded the cases in which we will attempt to determine if an empty object already exists.

    Previously, if a rate-limiting exception was encountered while creating empty objects, the connector would issue a single get request for that object. If the object existed and was zero length, the createEmptyObjects call was considered successful and the rate-limit exception was suppressed.

    The new implementation will poll for the existence of the object, up to a user-configurable maximum, and will poll when either a rate limiting error occurs or when a 500-level error occurs. The maximum can be configured by the following setting:

    fs.gs.max.wait.for.empty.object.creation.ms
    

    Any positive value for this setting will be interpreted to mean "poll for up to this many milliseconds before making a final determination". The default value will cause a maximum wait of 3 seconds. Polling can be disabled by setting this key to 0.
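
The polling behaviour described above can be sketched roughly as follows. This is an illustration rather than the connector's code: `wait_for_empty_object`, `get_size` and the 100 ms poll interval are hypothetical, and the choice to make exactly one check when the maximum wait is 0 is an assumption about what "disabled" means.

```python
import time


def wait_for_empty_object(get_size, max_wait_ms=3000, poll_interval_ms=100,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll until an object is visible or max_wait_ms has elapsed.

    get_size() is a caller-supplied function returning the object's size
    in bytes, or None while the object cannot be found.

    Returns True if a zero-length object is observed (the creation error
    can be ignored), and False if the object is non-empty or never shows
    up within the allowed time.
    """
    deadline = clock() + max_wait_ms / 1000.0
    while True:
        size = get_size()
        if size is not None:
            return size == 0
        if clock() >= deadline:
            return False
        sleep(poll_interval_ms / 1000.0)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays. With `max_wait_ms=0` the sketch falls back to a single lookup, similar to the behaviour before this release.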

BigQuery connector:

  1. POM updates for GCS connector 1.6.1.

2016-12-16 (GCS 1.6.0, BigQuery 0.10.1)

17 Dec 04:06

Changelog

Cloud Storage connector:

  1. Added a new PerformanceCachingGoogleCloudStorage. Unlike the existing CacheSupplementedGoogleCloudStorage, which only serves as an advisory cache for enforcement of list consistency, the new optional caching layer is able to serve certain metadata and listing requests purely out of a short-lived in-memory cache to enhance the performance of some workloads. This feature is disabled by default and can be controlled with the config settings:

    fs.gs.performance.cache.enable=true (default=false)
    fs.gs.performance.cache.list.caching.enable=true (default=false)
    

    The first option enables the cache to serve getFileStatus requests, while the second option additionally enables serving listStatus. The duration of cache entries can be controlled with:

    fs.gs.performance.cache.max.entry.age.ms (default=3000)
    

    It is not recommended to always run with this feature enabled; it should be used specifically to address cases where frameworks perform redundant sequential list/stat operations in a non-distributed manner, and on datasets which are not frequently changing. It is additionally advised to validate data integrity separately whenever using this feature. There is no cooperative cache invalidation between different processes when using this feature, so concurrent mutations to a location from multiple clients will produce inconsistent/stale results if this feature is enabled.
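
To make the staleness caveat concrete, here is a minimal sketch of a short-lived in-memory cache with a maximum entry age. It is not the PerformanceCachingGoogleCloudStorage implementation: the class name `ShortLivedCache`, the `loader` callback and the injectable `clock` are hypothetical, and only the 3000 ms default mirrors the setting above.

```python
import time


class ShortLivedCache:
    """Serve repeated lookups from memory until an entry is too old."""

    def __init__(self, loader, max_entry_age_ms=3000, clock=time.monotonic):
        self._loader = loader              # fetches the real value for a key
        self._max_age_s = max_entry_age_ms / 1000.0
        self._clock = clock
        self._entries = {}                 # key -> (stored_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._max_age_s:
            # Fresh enough: answer from memory, even if the backing
            # store has changed in the meantime.
            return entry[1]
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value
```

Because nothing invalidates an entry when another process mutates the underlying object, a lookup inside the age window can return an outdated value, which is the inconsistency the note warns about.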

BigQuery connector:

  1. Added a configurable write disposition when using IndirectBigQueryOutputFormat, with WRITE_APPEND as the default.
  2. POM updates for GCS connector 1.6.0.

2016-11-07 (GCS 1.5.5, BigQuery 0.10.0)

08 Nov 00:01

Changelog

Cloud Storage connector:

  1. Minor refactoring of logic in CacheSupplementedGoogleCloudStorage to extract a reusable ForwardingGoogleCloudStorage that can be used for other GCS-delegating implementations.

BigQuery connector:

  1. Update output configuration keys to conform to the format in BigQueryConfiguration and have BigQueryOutputConfiguration handle the output path resolution and configuration.
  2. POM updates for GCS connector 1.5.5.

2016-08-23 (GCS 1.5.2, BigQuery 0.7.8)

28 Aug 00:45

Changelog

Google Cloud Storage connector:

  1. Updated AbstractGoogleAsyncWriteChannel to always set the X-Goog-Upload-Desired-Chunk-Granularity header independently from the deprecated X-Goog-Upload-Max-Raw-Size; in general this improves performance of large uploads.

BigQuery connector:

  1. POM updates for GCS connector 1.5.2.