@budde budde commented Jan 30, 2017

  • Add dependency on aws-java-sdk-sts
  • Replace SerializableAWSCredentials with new SerializableCredentialsProvider interface
  • Make KinesisReceiver take SerializableCredentialsProvider as argument and
    pass credential provider to KCL
  • Add new implementations of KinesisUtils.createStream() that take STS
    arguments
  • Make JavaKinesisStreamSuite test the entire KinesisUtils Java API
  • Update KCL/AWS SDK dependencies to 1.7.x/1.11.x

What changes were proposed in this pull request?

JIRA link with detailed description.

  • Replace SerializableAWSCredentials with new SerializableKCLAuthProvider class that takes 5 optional config params for configuring AWS auth and returns the appropriate credential provider object
  • Add new public createStream() APIs for specifying these parameters in KinesisUtils

How was this patch tested?

  • Manually tested using explicit keypair and instance profile to read data from Kinesis stream in separate account (difficult to write a test orchestrating creation and assumption of IAM roles across separate accounts)
  • Expanded JavaKinesisStreamSuite to test the entire Java API in KinesisUtils

License acknowledgement

This contribution is my original work and that I license the work to the project under the project’s open source license.

SparkQA commented Jan 30, 2017

Test build #72163 has finished for PR 16744 at commit 4786cde.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SerializableKCLAuthProvider(

@budde budde force-pushed the master branch 3 times, most recently from 2401eff to 95ebd9c Compare January 30, 2017 19:47

SparkQA commented Jan 30, 2017

Test build #72164 has finished for PR 16744 at commit 2401eff.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SerializableKCLAuthProvider(

SparkQA commented Jan 30, 2017

Test build #72165 has finished for PR 16744 at commit 95ebd9c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SerializableKCLAuthProvider(

Author

budde commented Jan 30, 2017

Missed the code in python/streaming that this touches. Will update PR.

Author

budde commented Jan 30, 2017

The JIRA I opened for this issue contains further details and background. Linking to it here for good measure: https://issues.apache.org/jira/browse/SPARK-19405

Author

budde commented Jan 30, 2017

Also, on another note, the SerializableKCLAuthProvider class that SparkQA is identifying as a new public class is actually package private and replaced another package private class (SerializableAWSCredentials).

SparkQA commented Jan 31, 2017

Test build #72175 has finished for PR 16744 at commit 2298dd7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SerializableKCLAuthProvider(

Author

budde commented Jan 31, 2017

Pinging @tdas on this-- looks like you're the committer who has contributed the most to kinesis-asl.

Author

budde commented Feb 1, 2017

Pinging @zsxwing and @srowen, additional committers who have previously reviewed kinesis-asl changes.

Member

@srowen srowen left a comment

I don't feel qualified to review the substance of this but the form looks reasonable.

What are the drawbacks, if any? Any behavior change or compatibility issues to note?

Author

budde commented Feb 1, 2017

There shouldn't be any change to behavior or compatibility when using the existing implementations of KinesisUtils.createStream(). Only drawback I can think of is this is making the createStream() API more complex by introducing an additional set of optional config values, which in turn necessitates an additional set of overridden interface implementations. I think the longer-term solution here is to introduce a builder-style API for generating Kinesis streams and eventually put the existing KinesisUtils.createStream() on the deprecation path, but I've chosen to bite the bullet and just extend createStream() further in the interest of making this a minimal change.
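
For illustration, a minimal sketch of what a builder-style API could look like in place of further createStream() overloads. All names here (`StreamConfig`, `StreamConfigBuilder`, the `with...` setters) are hypothetical and do not come from this PR or from Spark; the sketch only shows how optional STS settings can be combined without a new overload per combination.

```scala
// Hypothetical sketch: these types are illustrative stand-ins, not Spark APIs.

// The immutable result of building: required stream settings plus optional STS settings.
final case class StreamConfig(
    streamName: String,
    endpointUrl: String,
    regionName: String,
    stsRoleArn: Option[String],
    stsSessionName: Option[String])

// Each setter returns a modified copy, so callers set only what they need.
final case class StreamConfigBuilder(
    streamName: Option[String] = None,
    endpointUrl: Option[String] = None,
    regionName: Option[String] = None,
    stsRoleArn: Option[String] = None,
    stsSessionName: Option[String] = None) {

  def withStreamName(name: String): StreamConfigBuilder = copy(streamName = Some(name))
  def withEndpointUrl(url: String): StreamConfigBuilder = copy(endpointUrl = Some(url))
  def withRegionName(region: String): StreamConfigBuilder = copy(regionName = Some(region))

  // The role ARN and session name are set together because one is useless without the other.
  def withStsRole(roleArn: String, sessionName: String): StreamConfigBuilder =
    copy(stsRoleArn = Some(roleArn), stsSessionName = Some(sessionName))

  // Required fields are validated once, at build time.
  def build(): StreamConfig = {
    def required(field: Option[String], label: String): String =
      field.getOrElse(throw new IllegalArgumentException(s"$label must be set"))
    StreamConfig(
      required(streamName, "streamName"),
      required(endpointUrl, "endpointUrl"),
      required(regionName, "regionName"),
      stsRoleArn,
      stsSessionName)
  }
}
```

A caller that does not need STS simply omits `withStsRole`, which is the flexibility the overload-based API can only offer by adding more method signatures.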

Author

budde commented Feb 1, 2017

Pinging @brkyvz as well, who also appears to have reviewed kinesis-asl changes in the past

Member

@srowen srowen left a comment

I think it's probably OK if it's not changing APIs and adding useful support, without complicating things too much. My only real question is about the new dependency and its license, and whether it already existed or not.

pom.xml Outdated
Member

This is probably an ignorant question but is this the first and only time something depends on the AWS SDK here? I know we had discussions about the Kinesis client already because its license was problematic. Didn't it already depend on the AWS SDK and is the license OK? Worth re-checking this situation, if you would, as a second set of eyes would be useful.

Author

I believe there was previously a direct dependency on the AWS SDK but it is currently getting pulled in as a transitive dependency of the Kinesis Client Library. The KCL dependencies don't include the aws-java-sdk-sts Maven artifact so we must add it as an explicit dependency in the pom.xml for kinesis-asl.

The AWS SDK is licensed under Apache 2.0

Author

budde commented Feb 6, 2017

Amending this PR to upgrade the KCL/AWS SDK dependencies to more-current versions (1.7.3 and 1.11.76, respectively). The RegionUtils.getRegionByEndpoint() API was removed from the SDK so I've had to replace it with a simple string split method for the examples and test suites that were utilizing it.

SparkQA commented Feb 6, 2017

Test build #72465 has finished for PR 16744 at commit eb75482.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SerializableKCLAuthProvider(

Contributor

brkyvz commented Feb 6, 2017

Hi @budde, taking a look at this now. Sorry for the wait

Contributor

@brkyvz brkyvz Feb 6, 2017

this is too brittle. I would rather use something more like:

RegionUtils.getRegionsForService("kinesis").find(_.getAvailableEndpoints.contains(endpoint)).getOrElse(
  throw new IllegalArgumentException(s"Couldn't find region for endpoint: $endpoint"))

Author

Sounds good. Went for a quick fix but this is much nicer.

Contributor

ditto
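
To make the "too brittle" concern concrete, here is a minimal sketch of resolving a region by looking up the endpoint's host in a table of known endpoints, rather than splitting the endpoint string. The table below is a hypothetical, hand-written stand-in for the region metadata that the AWS SDK would supply; `RegionLookupSketch` and `regionNameForEndpoint` are illustrative names, not code from the PR.

```scala
import java.net.URI

object RegionLookupSketch {
  // Assumption: a tiny illustrative table. Real code would read this from the SDK's region metadata.
  val kinesisEndpointsByRegion: Map[String, Set[String]] = Map(
    "us-east-1" -> Set("kinesis.us-east-1.amazonaws.com"),
    "us-west-2" -> Set("kinesis.us-west-2.amazonaws.com"),
    "eu-west-1" -> Set("kinesis.eu-west-1.amazonaws.com"))

  // Resolve the region whose known endpoints contain the URL's host.
  // Fails loudly for unknown endpoints instead of guessing from the string's shape.
  def regionNameForEndpoint(endpoint: String): String = {
    // URI.getHost is null when the input has no scheme, so fall back to the raw string.
    val host = Option(new URI(endpoint).getHost).getOrElse(endpoint)
    kinesisEndpointsByRegion
      .collectFirst { case (region, hosts) if hosts.contains(host) => region }
      .getOrElse(throw new IllegalArgumentException(
        s"Couldn't find region for endpoint: $endpoint"))
  }
}
```

A string split would return some token for any input, including malformed or non-Kinesis endpoints; the lookup rejects anything it does not recognize.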

Contributor

Can we simplify this piece of code? There are too many options. Might I suggest something like:

trait SerializableAWSAuthProvider extends Serializable {
  def getProvider: AWSCredentialsProvider
}

case class BasicCredentialProvider(
    accessKeyId: String,
    secretKey: String) extends SerializableAWSAuthProvider {
  def getProvider: ...
}

case class STSCredentialsProvider(
    roleArn: String,
    sessionName: String,
    externalId: Option[String], 
    credentials: Option[BasicCredentialProvider]) ...

case object DefaultProvider extends DefaultAWSCredentialsProviderChain

Author

Was hoping for some feedback here. I think making this an interface with split basic/STS implementations should work well. I'll give it a shot.
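
A self-contained sketch of the split being discussed, under stated assumptions: `Provider` is a hypothetical stand-in for the SDK's credentials provider type, and `ProviderSpec`, `DefaultSpec`, `BasicSpec` and `StsSpec` are illustrative names rather than the classes in this PR. The idea is that only the small, serializable description travels with the receiver, and the actual provider is constructed on demand.

```scala
object CredentialsSketch {
  // Hypothetical stand-in for a (non-serializable) credentials provider.
  trait Provider { def describe: String }

  // Sealed, so every kind of credential configuration is known in one place.
  sealed trait ProviderSpec extends Serializable {
    // Builds the provider lazily, wherever the spec has been shipped to.
    def provider: Provider
  }

  // No explicit credentials: defer to whatever the environment supplies.
  case object DefaultSpec extends ProviderSpec {
    def provider: Provider = new Provider {
      def describe: String = "default provider chain"
    }
  }

  // A static key pair.
  final case class BasicSpec(accessKeyId: String, secretKey: String) extends ProviderSpec {
    def provider: Provider = new Provider {
      def describe: String = s"static key pair $accessKeyId"
    }
  }

  // Role assumption, authenticated by another spec. Defaulting to DefaultSpec means
  // callers are not forced to supply static credentials just to reach STS.
  final case class StsSpec(
      roleArn: String,
      sessionName: String,
      externalId: Option[String] = None,
      longLived: ProviderSpec = DefaultSpec) extends ProviderSpec {
    def provider: Provider = new Provider {
      def describe: String =
        s"assume role $roleArn as session $sessionName via ${longLived.provider.describe}"
    }
  }
}
```

For example, `CredentialsSketch.StsSpec("arn:aws:iam::123456789012:role/reader", "spark")` describes role assumption backed by the default chain, while passing a `BasicSpec` as `longLived` backs it with a static key pair.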

Author

budde commented Feb 7, 2017

PR has been amended to reflect feedback. Thanks for taking a look, @brkyvz.

SparkQA commented Feb 7, 2017

Test build #72480 has finished for PR 16744 at commit d23365b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 7, 2017

Test build #72481 has finished for PR 16744 at commit 5823740.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

I would move these cred providers to their own file and make this a sealed trait.

Author

Will do

Contributor

brkyvz commented Feb 21, 2017

@budde I'm just concerned by the exponential blow-up of APIs. Here's my proposal.
For both Java and Scala, let's just add the API versions that take both the STS arguments and the AWS key pair. I'll comment on the PR to mark the APIs that we should remove. Then we can add the builder pattern in a follow-up to make things easier. If we're going to add that anyway, it's unnecessary to support every complex combination in this PR.

Contributor

@brkyvz brkyvz left a comment

Feel free to push back, but let's just keep the most comprehensive API options. We should deprecate them once we add the Builder interface

Contributor

let's remove this one

Contributor

let's keep this

Contributor

let's remove this

Contributor

let's remove this

Contributor

let's remove this

Contributor

let's keep this

Contributor

let's remove this

Contributor

let's remove this

Author

budde commented Feb 21, 2017

@brkyvz I share your concerns around expanding this API further than necessary. I think I'm okay with this as long as we're fairly confident the builder pattern work can be merged in the same Spark release. My reluctance here is that forcing users who want to use STS to provide static credentials for the long-lived auth to STS itself will be a security regression for folks using EC2 where IAM instance profiles provide a secure way of avoiding potential problems with static credential management.

As you've pointed out though, this is less of a concern if we can deprecate the brittle KinesisUtils.createStream() API with a flexible builder class.

I'll give another look over the createStream() implementations you're suggesting we remove and push an update to this PR if I don't have any objections.

Contributor

brkyvz commented Feb 21, 2017

Can't they still use null to use the DefaultProviderChain? It's still supported, right? We're only forcing them to provide a messageHandler.

Author

budde commented Feb 21, 2017

So, if these values are null we'll still be passing them to construct a BasicCredentialsProvider to pass as STSCredentialsProvider.longLivedCredentialsProvider. I could add a check to use DefaultCredentialsProvider if these params are null. It wouldn't be very good Scala style but perhaps this isn't much of a concern if we aren't expecting this to really be used much.

Contributor

brkyvz commented Feb 21, 2017

@budde The scaladocs mention

* @param awsAccessKeyId  AWS AccessKeyId (if null, will use DefaultAWSCredentialsProviderChain)
* @param awsSecretKey  AWS SecretKey (if null, will use DefaultAWSCredentialsProviderChain)

if that's not the case, we should make it so!

Contributor

I would add a null check here and add logging that we're falling back to DefaultProviderChain

Author

budde commented Feb 21, 2017

@brkyvz I actually think that Scaladoc may be outdated– I double checked the current master branch and it looks like KinesisUtils.createStream() will still provide Some(SerializableAWSCredentials(null, null)) when null values are passed. The helper method returning the AWSCredentialsProvider passed to the KCL doesn't inspect the values to make sure they are non-null, so we'd be relying on the AWS SDK implicitly falling back to DefaultAWSCredentialsProviderChain() when given null credentials, which I don't believe it does.

Regardless, the check you've suggested would restore this behavior. I'll go that route.

Author

budde commented Feb 21, 2017

@brkyvz I've updated the PR per your feedback. BasicAWSCredentials will raise a java.lang.IllegalArgumentException if either keypair value is null so I elected to wrap BasicCredentialsProvider.provider in a try/catch block where we log if an IllegalArgumentException is thrown and return DefaultAWSCredentialsProviderChain.
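
The fallback described here can be sketched as follows. This is not the PR's code: `staticKeys` is a hypothetical stand-in for a constructor that rejects null key material (as the comment above says BasicAWSCredentials does), and `Resolved`, `StaticKeys`, `DefaultChain` and the `warn` callback are illustrative names used so the example runs without the AWS SDK or a logging framework.

```scala
object FallbackSketch {
  sealed trait Resolved
  final case class StaticKeys(accessKeyId: String, secretKey: String) extends Resolved
  case object DefaultChain extends Resolved

  // Stand-in constructor: `require` throws IllegalArgumentException when a key is null.
  def staticKeys(accessKeyId: String, secretKey: String): StaticKeys = {
    require(accessKeyId != null, "Access key cannot be null.")
    require(secretKey != null, "Secret key cannot be null.")
    StaticKeys(accessKeyId, secretKey)
  }

  // Try the explicit key pair first. If it is rejected, report why and fall back
  // to the default chain instead of failing the whole stream.
  def resolve(
      accessKeyId: String,
      secretKey: String,
      warn: (String, Throwable) => Unit): Resolved =
    try staticKeys(accessKeyId, secretKey)
    catch {
      case e: IllegalArgumentException =>
        // Passing the exception along lets the log show which value was missing.
        warn("Unable to use the provided key pair; falling back to the default provider chain.", e)
        DefaultChain
    }
}
```

Catching only IllegalArgumentException keeps the fallback narrow: other failures still surface to the caller rather than being silently replaced by default credentials.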

Author

I can add the exception to the log message if you think it's appropriate

SparkQA commented Feb 22, 2017

Test build #73240 has finished for PR 16744 at commit 11b3b64.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

Author

budde commented Feb 22, 2017

Missed updating a test, my mistake. Fixing now.

Contributor

External

Author

Ugh. Fixed. Thanks!

Contributor

Would you like to add it? Does the AWS exception include what was missing, e.g. access key was null or something?

Author

It does specify that access key/secret key is null. I'll just add it.

Contributor

brkyvz commented Feb 22, 2017

Two final comments. Then I'll merge it pending tests

Author

budde commented Feb 22, 2017

Updated the PR. Thanks for the work you've done on this! Hopefully I can have a PR for the builder interface up later this week.

Contributor

Use "falling back to DefaultAWSCredentialsProviderChain.", e) instead.

Contributor

can do it in a separate PR if you like

Author

I went ahead and updated it.

…t Kinesis reads via STS

- Add dependency on aws-java-sdk-sts
- Replace SerializableAWSCredentials with new SerializableCredentialsProvider
  interface
- Make KinesisReceiver take SerializableCredentialsProvider as argument and
  pass credential provider to KCL
- Add new implementations of KinesisUtils.createStream() that take STS
  arguments
- Make JavaKinesisStreamSuite test the entire KinesisUtils Java API
- Update KCL/AWS SDK dependencies to 1.7.x/1.11.x
- Make SerializableCredentialsProvider a sealed trait and move classes to their own file

SparkQA commented Feb 22, 2017

Test build #73245 has finished for PR 16744 at commit b4bf3a8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 22, 2017

Test build #73244 has finished for PR 16744 at commit d15affb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 22, 2017

Test build #73248 has finished for PR 16744 at commit da18da0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

brkyvz commented Feb 22, 2017

Merging to master. Thanks for your patience!

@asfgit asfgit closed this in e406537 Feb 22, 2017
Yunni pushed a commit to Yunni/spark that referenced this pull request Feb 27, 2017
- Add dependency on aws-java-sdk-sts
- Replace SerializableAWSCredentials with new SerializableCredentialsProvider interface
- Make KinesisReceiver take SerializableCredentialsProvider as argument and
  pass credential provider to KCL
- Add new implementations of KinesisUtils.createStream() that take STS
  arguments
- Make JavaKinesisStreamSuite test the entire KinesisUtils Java API
- Update KCL/AWS SDK dependencies to 1.7.x/1.11.x

## What changes were proposed in this pull request?

[JIRA link with detailed description.](https://issues.apache.org/jira/browse/SPARK-19405)

* Replace SerializableAWSCredentials with new SerializableKCLAuthProvider class that takes 5 optional config params for configuring AWS auth and returns the appropriate credential provider object
* Add new public createStream() APIs for specifying these parameters in KinesisUtils

## How was this patch tested?

* Manually tested using explicit keypair and instance profile to read data from Kinesis stream in separate account (difficult to write a test orchestrating creation and assumption of IAM roles across separate accounts)
* Expanded JavaKinesisStreamSuite to test the entire Java API in KinesisUtils

## License acknowledgement
This contribution is my original work and that I license the work to the project under the project’s open source license.

Author: Budde <[email protected]>

Closes apache#16744 from budde/master.