
Conversation

@benbromhead

Allow users to use %sql syntax to query Cassandra tables with SparkSQL without having to manually create a CassandraSQLContext.

Expanding on "Add build profile for Spark/Cassandra integration #79"

To enable Cassandra SQL context support, add -Dzeppelin.spark.useCassandraContext=true to the ZEPPELIN_JAVA_OPTS parameter in the conf/zeppelin-env.sh file. Alternatively, you can add this parameter to the Spark interpreter's parameter list in the GUI.

Also supported are the "spark.cassandra.connection.host", "spark.cassandra.auth.username", and "spark.cassandra.auth.password" parameters.
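
For readers following along, here is a minimal sketch of how such a flag could be consumed inside SparkInterpreter. It assumes the spark-cassandra-connector 1.x CassandraSQLContext (which extends SQLContext); the wrapper class and method names are illustrative only and are not part of this PR:

    // Sketch only: selecting the SQL context based on zeppelin.spark.useCassandraContext.
    import java.util.Properties;

    import org.apache.spark.SparkContext;
    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.sql.cassandra.CassandraSQLContext;

    public class CassandraContextSketch {
      static SQLContext createSQLContext(SparkContext sc, Properties property) {
        // The flag arrives either via ZEPPELIN_JAVA_OPTS
        // (-Dzeppelin.spark.useCassandraContext=true) or via the Spark
        // interpreter's parameter list in the GUI.
        boolean useCassandra = Boolean.parseBoolean(
            property.getProperty("zeppelin.spark.useCassandraContext", "false"));
        if (useCassandra) {
          // spark.cassandra.connection.host / spark.cassandra.auth.* are ordinary
          // "spark."-prefixed properties, so they reach the SparkConf the usual way.
          return new CassandraSQLContext(sc);
        }
        return new SQLContext(sc);
      }
    }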

@boneill42

Dude, nice work.

Member

I think line 268 already sets every property that starts with 'spark.' into conf. So isn't this duplicated?

@Leemoonsoo
Member

Thanks for the great contribution!

And how about adding zeppelin.spark.useCassandraContext to the property builder?

  static {
     Interpreter.register(
         "spark",
         "spark",
         SparkInterpreter.class.getName(),
         new InterpreterPropertyBuilder()
             .add("spark.app.name", "Zeppelin", "The name of spark application.")
             ...
             .add("zeppelin.spark.useHiveContext", "true",
                  "Use HiveContext instead of SQLContext if it is true.")
             ...

Both zeppelin.spark.useCassandraContext and zeppelin.spark.useHiveContext can be true or false. In that case, from the user's point of view, it's very unclear what will happen. Do you have an idea of how to handle it?

And is it going to be difficult to add a test for it?
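
For illustration, a minimal sketch of the registration with the extra property added. The "false" default and the description string are assumptions taken from this thread, not the code as merged, and the other properties are elided just as in the quoted snippet:

    static {
      Interpreter.register(
          "spark",
          "spark",
          SparkInterpreter.class.getName(),
          new InterpreterPropertyBuilder()
              .add("spark.app.name", "Zeppelin", "The name of spark application.")
              .add("zeppelin.spark.useHiveContext", "true",
                   "Use HiveContext instead of SQLContext if it is true.")
              .add("zeppelin.spark.useCassandraContext", "false",
                   "Use CassandraSQLContext instead of SQLContext if it is true.")
              // ... remaining existing properties ...
              .build());
    }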

@devsprint

While trying to make Zeppelin work with DSE 4.6, I also integrated this change. I'm able to query the Cassandra database properly. However, if there is an error, the notebook displays a generic error (java.lang.reflect.InvocationTargetException) that doesn't help at all in identifying the root cause.

If zeppelin.spark.useCassandraContext is set to true, then the value of zeppelin.spark.useHiveContext doesn't matter: even if it is set to true, Zeppelin will still bind to the Cassandra context.
I think a small change is needed to make sure that only one of the contexts can be set to true, not both....
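
A minimal sketch of the kind of guard being suggested here (a later commit in this PR describes it as raising an exception when both contexts are enabled at the same time). The exception type, message, and helper names are illustrative only, not the PR's actual code:

    // Sketch only: refuse to start if both context flags are enabled.
    import java.util.Properties;

    final class ContextGuardSketch {
      static void checkContextFlags(Properties property) {
        boolean useHive = Boolean.parseBoolean(
            property.getProperty("zeppelin.spark.useHiveContext", "true"));
        boolean useCassandra = Boolean.parseBoolean(
            property.getProperty("zeppelin.spark.useCassandraContext", "false"));
        if (useHive && useCassandra) {
          throw new IllegalStateException(
              "zeppelin.spark.useHiveContext and zeppelin.spark.useCassandraContext "
              + "cannot both be true; enable at most one of them.");
        }
      }
    }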

@benbromhead
Author

Apologies, I haven't had much bandwidth to address the comments above. After chatting with @Leemoonsoo at the Spark conference the other week, I have a good idea of how to integrate some tests and clean up the above-mentioned issues.

Stay tuned.

@devsprint

@benbromhead let me know if I can help. I have played with it all day long today, using Zeppelin forms and registering custom functions to be applied to the results. Everything works fine so far.

bbromhead added 5 commits July 1, 2015 12:09
…y for all spark.* props

Set default use of Cassandra Context to false
Conflicts:
	spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java
Maven will now only run Cassandra tests when a Cassandra profile is enabled
Add some default values for the CassandraContext
Raise exception if Cassandra and Hive contexts are enabled at the same time
… set, plain old spark should be the default with an opt-in to hive/cassandra/etc ...

Change SparkSqlInterpreterTest to use Hive context as the tests use Hive specific syntax.

Note: This changes the default SparkInterpreter behaviour and those upgrading will need to change their configuration.
@benbromhead
Author

Ok I finally got around to completing some unit tests and changing the default values for the various interpreters.

In this PR I've changed the default behaviour of the SparkInterpreter to use the standard SQLContext rather than the HiveContext. This will potentially break people's notebooks on upgrade if they rely on Hive syntax/functionality rather than Spark SQL syntax without realising it.

@doanduyhai
Contributor

@benbromhead, I'm writing a Cassandra interpreter (connecting Zeppelin directly to C* without going through Spark). Is it worth the effort to have a CassandraSQL interpreter?

@benbromhead
Author

@doanduyhai I think it is... particularly if you want to do things like joins :)

Allow Cassandra-spark-1.1 profile to run Cassandra tests
Apache license for cassandra.cql file
Add extra params to InterpreterContext
Removed wrong syntax test
Member

Is there any good way to keep this value unchanged?
Changing the default value might confuse users. What do you think?

@benbromhead
Author

@Leemoonsoo yeah, I struggled to come up with a good answer for this. On the one hand I think the default behaviour should be to use the standard Spark SQLContext, but changing this will break the default behaviour for users who rely on Hive features.

You will notice that the SQL context tests actually rely on Hive syntax rather than Spark SQL-compliant syntax. My vote would be to go with what is correct early in this project's life, but I know this will be irritating for existing users.

Happy to revert to the previous behaviour.

@Leemoonsoo
Member

zeppelin.spark.useHiveContext was originally 'false' but was changed to 'true'. The reason was to give the same experience as spark-shell, which uses HiveContext by default.

I think changing zeppelin.spark.useHiveContext needs to be handled in a separate issue, if that doesn't break this contribution.

@benbromhead
Author

Ok, makes sense. I've changed it back to use the HiveContext by default.

@Leemoonsoo
Member

+1

@benbromhead
Author

Anything left to do on this before it gets merged?

@falconair

Looks like this pull request was close to getting merged months ago. Any status update on this?

@jongyoul
Member

@Leemoonsoo @benbromhead Do you guys have any update on this? It looks like it's in a pending state.

@Leemoonsoo
Member

Can someone provide additional review?

@bbromhead

The codebase has probably shifted a little since this last passed tests, etc. Do you want me to update and test against the latest master?

@corneadoug
Contributor

@bbromhead @jongyoul @Leemoonsoo any more feedback on changes needed on this PR?

@benbromhead
Author

I've updated this PR in the past and also asked for further interest.

I'm happy to update against the latest master if there is an appetite to actually merge this into the project; it would be great to get feedback either way!

@doanduyhai
Contributor

doanduyhai commented Sep 27, 2016

@benbromhead some remarks before rebasing

  1. maybe upgrade the C* version and the Spark/Cassandra connector version
  2. maybe use Achilles-embedded (see the Cassandra interpreter unit tests) instead of cassandra-unit; the latter is no longer actively maintained
  3. the SparkInterpreter class has changed a lot, so it's probably a good idea to cherry-pick your changes and apply them instead of going through a complete rebase
  4. you've hard-coded the Spark/Cassandra connector dependency directly in spark/pom.xml. I'm not sure that's the cleanest way, because people who don't use Cassandra would still need to pull this dependency. Maybe it's better to create a separate Maven build profile for this. Until now it has been done in spark-dependencies/pom.xml, but since that will be removed soon by @AhyoungRyu (see [ZEPPELIN-1332] Remove spark-dependencies & suggest new way #1339) we'll need to find an alternative

@benbromhead
Author

I'll look to address the remarks. I've also noticed the project now has a formal tracker via Jira. I'll also look to complete the requirements listed in https://zeppelin.apache.org/contribution/contributions.html

@asfgit asfgit closed this in c38a0a0 May 9, 2018
asfgit pushed a commit that referenced this pull request May 9, 2018
close #83
close #86
close #125
close #133
close #139
close #146
close #193
close #203
close #246
close #262
close #264
close #273
close #291
close #299
close #320
close #347
close #389
close #413
close #423
close #543
close #560
close #658
close #670
close #728
close #765
close #777
close #782
close #783
close #812
close #822
close #841
close #843
close #878
close #884
close #918
close #989
close #1076
close #1135
close #1187
close #1231
close #1304
close #1316
close #1361
close #1385
close #1390
close #1414
close #1422
close #1425
close #1447
close #1458
close #1466
close #1485
close #1492
close #1495
close #1497
close #1536
close #1545
close #1561
close #1577
close #1600
close #1603
close #1678
close #1695
close #1739
close #1748
close #1765
close #1767
close #1776
close #1783
close #1799
egorklimov pushed a commit to Tinkoff/zeppelin that referenced this pull request Sep 18, 2019
…w-rest-api to V_1.0.0

* commit 'dfa8bbe35bde8ebcc9f691f48b7fbfd5d3c96142':
  [ZP-329] New rest module for tzeppelin