Skip to content

Conversation

@boneill42
Copy link
Contributor

@boneill42 boneill42 commented Apr 6, 2016

What changes were proposed in this pull request?

Upgrades KCL dependency, backported to 1.6.1.

How was this patch tested?

Used kinesis word count example against a stream containing aggregated data.

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

See: SPARK-14421

pom.xml Outdated
<log4j.version>1.2.17</log4j.version>
<hadoop.version>2.2.0</hadoop.version>
<protobuf.version>2.5.0</protobuf.version>
<protobuf.version>2.6.1</protobuf.version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't necessarily OK since it changes protobuf for the whole project, and could conflict with versions used by, say, Hadoop. 2.4 and 2.5 were terribly incompatible, and I am paranoid about 2.5 vs 2.6.

The KCL update itself is probably fine. Is that helpful at all by itself? you'll have to update the deps files to reflect the updated dependencies for tests to pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed... definitely needs testing.

Just upgrading the KCL didn't work. The new KCL relies on the following class from protobuf: com/google/protobuf/ProtocolStringList.class, which doesn't exist until 2.6.1.

@srowen
Copy link
Member

srowen commented Apr 6, 2016

If it turns out this will move forward, you need to make a JIRA and connect it. https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

@boneill42
Copy link
Contributor Author

I had created a JIRA:
https://issues.apache.org/jira/browse/SPARK-14421

@srowen
Copy link
Member

srowen commented Apr 6, 2016

Yeah, this could be tough if the problem is really a protobuf difference. You need to update the title of this PR. See the link above.

@boneill42 boneill42 changed the title Upgraded KCL version to support de-aggregation of Kinesis records. SPARK-14421 Apr 6, 2016
@boneill42
Copy link
Contributor Author

Agreed. I just wanted to get it on the radar. People might experience sporadic problems consuming from kinesis streams until we can get it in there.

(title updated)

@srowen
Copy link
Member

srowen commented Apr 6, 2016

If you're actively working on this, that's fine, but if ultimately we can't merge this change because of protobuf then the PR should be closed. It stays around as does the JIRA

@boneill42
Copy link
Contributor Author

I'm not actively working it. We'll likely just fork and build a custom kinesis-asl jar.

Does it make sense to upgrade the dependencies for kinesis-asl? (and not touch the parent pom?)

@srowen
Copy link
Member

srowen commented Apr 6, 2016

I looked at the KCL dependencies, and 1.4.0 already depended on protobuf 1.6.1. What was the error you got -- clearly a version mismatch problem? just checking that it now becomes unavoidable but somehow worked before.

Yes, it might be viable to explicitly depend on protobuf 1.6.1 only in the KCL assembly. Add a version and un-mark it as provided and add a comment. That could be interesting to try. Although there's a risk of conflict in theory, it sounds like it at least fixes a real problem in practice now.

@boneill42
Copy link
Contributor Author

Here is the full stack:

Caused by: java.lang.NoClassDefFoundError: com/google/protobuf/ProtocolStringList
    at com.amazonaws.services.kinesis.clientlibrary.types.UserRecord.deaggregate(UserRecord.java:235)
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask.call(ProcessTask.java:146)
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
    at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:24)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.ProtocolStringList
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

I'll see if I can perform surgery at the kinesis-asl level and leave the top pom alone.

@boneill42
Copy link
Contributor Author

Yep, that worked (and updated PR to reflect changes)... but also acknowledge that there is a chance for conflict since both 2.5.0 and 2.6.1 will be in play.

@srowen
Copy link
Member

srowen commented Apr 7, 2016

Summoning @vanzin to ask if this would indeed only affect the Kinesis artifacts? if that's all it affects and it fixes a problem, seems OK. Seems risky of course to modify protobuf any more widely.

@vanzin
Copy link
Contributor

vanzin commented Apr 7, 2016

This would only affect the kinesis artifacts. But to actually use that at runtime, don't you need to relocate the classes in the assembly?

Otherwise, when you add the kinesis assembly to an app with "--packages", by default it will try to load protobuf from the parent class loader, which might still contain the old version.

@boneill42
Copy link
Contributor Author

That was my fear as well. Right now, we are adding the kinesis-asl-assembly jar via spark-defaults.conf.

spark.driver.extraClassPath /Users/brianoneill/tools/spark/spark-streaming-kinesis-asl-assembly_2.10-1.6.1-monetate.jar
spark.executor.extraClassPath /Users/brianoneill/tools/spark/spark-streaming-kinesis-asl-assembly_2.10-1.6.1-monetate.jar

That appears to work, but it could be luck. It could be that some of the 2.5.0 and 2.6.1 classes are compatible, or a 2.6.1-specific class is getting loaded first. I'm not familiar enough with how Spark constructs the classpaths.

Note also that we are running this from PySpark.

Here is a gist with the example that fails w/ KCL 1.4.0, but works with 1.6:
https://gist.github.com/boneill42/020dde814346c6b4ad0ba28406c3ea10

If you add the assembly to the classpath as above, you can run this gist with:

python dump.py boneill-2 my-stream https://kinesis.us-east-1.amazonaws.com us-east-1

@vanzin
Copy link
Contributor

vanzin commented Apr 7, 2016

Adding the assembly to the classpath that way means the newer protobuf will override any older version on the Spark jars. So kinesis will work, but any code in Spark that is not compatible with the new protobuf will fail. It sounds a little risky.

Doing relocation in the kinesis assembly should be pretty straight forward. Here's an example from Spark's own build: https://github.com/apache/spark/blob/master/pom.xml#L2207

@boneill42
Copy link
Contributor Author

Ah, brilliant. I just tested out the relocation and it appeared to work. (PR updated)

@boneill42 boneill42 changed the title SPARK-14421 SPARK-14421 : Upgrade Kinesis Client Library (KCL) to 1.6.2, fixes support for de-aggregation. Apr 7, 2016
<relocations>
<relocation>
<pattern>com.google.protobuf</pattern>
<shadedPattern>kcl.protobuf</shadedPattern>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be consistent, the package name should be ${spark.shade.packageName}.kcl.protobuf(or kinesis.protobuf? kcl is not immediately obvious).

@vanzin
Copy link
Contributor

vanzin commented Apr 7, 2016

Just a naming suggestion but otherwise this LGTM.

@boneill42
Copy link
Contributor Author

Good point. I updated to kinesis.protobuf. (I wanted to stay away from any potential collision with the parent, which is why I didn't re-used the spark.shade.packageName)

</includes>
</artifactSet>
<relocations>
<relocation>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this section is indented 2 spaces too far, though it won't matter

@srowen
Copy link
Member

srowen commented Apr 8, 2016

Jenkins test this please

@SparkQA
Copy link

SparkQA commented Apr 8, 2016

Test build #55346 has finished for PR 12209 at commit fe98fc1.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Apr 8, 2016

Right, you'll also have to update the build/deps files to reflect this. dev/test-dependencies.sh can help.

@boneill42
Copy link
Contributor Author

@srowen ok, i'm able to run dev/test-dependencies.sh.
I had to manually download mvn and put it in the right place though.
The script didn't give me any errors.

In looking at the Jenkins output, it looks like it can't find mvn. (exactly what I ran into)

build/mvn: line 152: /home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.3.3/bin/mvn: No such file or directory

Is that just a red herring?

I can proceed and just add the deps, but I'm not sure how to test it.

@srowen
Copy link
Member

srowen commented Apr 8, 2016

build/mvn will download Maven 3.3.9. You may need --force to override your local install. 3.3.3 is the wrong version.

@boneill42
Copy link
Contributor Author

Locally, i'm okay.

I'm seeing the 3.3.3 error in the Jenkins build:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55346/console

@srowen
Copy link
Member

srowen commented Apr 8, 2016

A-ha, well, two problems here. The underlying cause is that it can't download Maven because the ASF site is in maintenance mode now. Try http://archive.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz

But I was surprised to see 3.3.3 since we are on 3.3.9 in master, and I see now you opened the PR vs the 1.6 branch. This should be opened vs master, and then it is back-ported as needed.

@boneill42
Copy link
Contributor Author

ah, got it. I will re-open a new against master. (and reference this one)

@boneill42 boneill42 closed this Apr 8, 2016
@srowen
Copy link
Member

srowen commented Apr 12, 2016

@boneill42 if you'll reopen vs master I'll merge it

@boneill42
Copy link
Contributor Author

@srowen it looks like master is already migrated to kinesis consumer library v 1.6.2:
https://github.com/apache/spark/blob/master/pom.xml#L150

Can that be backported safely to 1.6.x?

@srowen
Copy link
Member

srowen commented Apr 15, 2016

@boneill42 Aha right. That indicates that updating the library actually entailed some significant dependency changes. I don't know if that's OK to back-port to 1.6, but it's already in master. Presumably master needs your additional changes here to resolve your issue, and you could bump to 1.6.2 while you're at it.

@srowen
Copy link
Member

srowen commented Apr 22, 2016

@boneill42 do you want to take a crack at adding this change and bumping to 1.6.2 in master, so that it gets into 2.0?

@boneill42
Copy link
Contributor Author

yeah -- sorry, been busy with the day job.

I'll move this to master.

@boneill42
Copy link
Contributor Author

OK -- I just confirmed that I still see this issue on master.

    Caused by: java.lang.ClassNotFoundException: com.google.protobuf.ProtocolStringList

I'll submit a patch now.

@boneill42
Copy link
Contributor Author

New PR available here:
#13054

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants