Skip to content

Conversation

@conker84
Copy link
Contributor

@conker84 conker84 commented Nov 2, 2016

What is this PR for?

This contribution would to introduce Neo4j Cypher intepreter and the new network visualization;
at the same time would provide base APIs that enable other graph databases (or graph framworks such as GraphX or Giraph).

What type of PR is it?

[Feature]

Todos

  • - Create the network visualization
  • - Provide base APIs to manage graph results (under: org.apache.zeppelin.interpreter.graph.*)
  • - Create the Neo4j intepreter

What is the Jira issue?

[ZEPPELIN-1604]

How should this be tested?

Donwload and execute Neo4j v3.x, you can also pull a Docker image.

In order to execute test cases, if you are running Java 7, you need to also provide an environment variable telling the tests where to find Java 8, because Neo4j-the-database needs it to run.

export NEO4J_JAVA=<path/to/java/home>

Use this statement to create a dummy dataset

%neo4j
UNWIND range(1,100) as id
CREATE (p:Person {id:id, name: "Name " + id, age: id % 3}) WITH collect(p) as people
UNWIND people as p1
UNWIND range(1,10) as friend
WITH p1, people[(p1.id + friend) % size(people)] as p2
CREATE (p1)-[:KNOWS {years: abs(p2.id - p2.id)}]->(p2)

Then you can write some simple queries like:

%neo4j
MATCH (p:Person)-[r:KNOWS]-(p1:Person)
RETURN p, r, p1
LIMIT 10;
%neo4j
MATCH (p:Person)-[r:KNOWS]-(p1:Person)
RETURN p.id AS ID_A, p.name AS NAME_A, r.years AS YEARS, p1.id AS ID_B, p1.name AS NAME_B
LIMIT 20;

Screenshots

Simple cypher query network visualization

Simple cypher query tabular visualization

Video

localhost_9000___-google-chrome-01_11_2016-15_44_24

Questions:

  • Does the licenses files need update? Yes
  • Is there breaking changes for older versions? No
  • Does this needs documentation? Yes. I have create the neo4j.md file in /docs/intepreter folder


<dependency>
<groupId>org.neo4j.driver</groupId>
<artifactId>neo4j-java-driver</artifactId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the license for these? could you add to the LICENSE file?

Copy link
Contributor Author

@conker84 conker84 Nov 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixcheung this dependency has Apache 2 license.
There is another one

<dependency>
  <groupId>org.neo4j.test</groupId>
  <artifactId>neo4j-harness</artifactId>
  <version>${neo4j.version}</version>
  <scope>test</scope>
</dependency>

which is listed both AGPL 3.0 and GPL 3.0, which of two have i to include in LICENSE file?

Another question, also must i include the license for sigma.js right?
Thanks for the support.

Copy link
Member

@Leemoonsoo Leemoonsoo Nov 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While org.neo4j.test:neo4j-harness is 'test' scope and will not bundled into the binary package, we don't need include it in zeppelin-distribution/src/bin_license/LICENSE file.

Right, sigma.js is going to be bundled in binary package and we need to include it in zeppelin-distribution/src/bin_license/LICENSE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@felixcheung
Copy link
Member

Thank you for the great contribution. Could you check https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/development/writingzeppelininterpreter.html#contributing-a-new-interpreter-to-zeppelin-releases

Specifically,

  • add a link to your documentation in the navigation menu (docs/_includes/themes/zeppelin/_navigation.html).
  • licenses of the transitive closure of all dependencies are list in license

"select2": "^4.0.3",
"github-markdown-css": "^2.4.0"
"github-markdown-css": "^2.4.0",
"sigma.js": "https://github.com/jacomyal/sigma.js.git#master"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a specific tag instead of master?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

SVG,
NULL
NULL,
NETWORK
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new interpreter result type means adding a new DisplaySystem

And every display system should accessible from interpreter's output. for example, Table DisplaySystem,

%spark
print(s"""%table
key\tvalue
sun\t100
moon\t20""")
%sh echo -e "%table key\tvalue\nsun\t100\nmoon\t20"

Any output of interpreter is serialized as a string into InterpreterResult.msg and if %table is detected, it is deserialized in a front-end side and rendered

Can we do the same with NETWORK?

Or another approach could be remove 'NETWORK' result type, describe node/edge information some how in TABLE format. And let loadNetworkData() parse and construct network.

What do you think?

Copy link
Contributor Author

@conker84 conker84 Nov 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
i updated the code base in order to manage simple graphs like this:

%spark
print(s"""
%network {
    "nodes" : [
        {"id" : 1},
        {"id" : 2}
    ],
    "edges" : [{"source" : 2, "target" : 1, "id" : 1}]
}
""")

neo4-simple-graph

I prefer to manage the graphs via json format because the goal of this contribution is to manage Property Graphs, not just simple graphs. Is this a viable way?
Thanks for the feedback!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM


<button type="button" class="btn btn-default btn-sm"
ng-if="paragraph.result.type == 'TABLE'"
ng-if="paragraph.result.type == 'TABLE' || paragraph.result.type == 'NETWORK'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all other types of visualizations - table, bar, line, area, scatter shouldn't be selectable when result type is 'NETWORK' while they can not display it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of this PR is to full advantage the property graph by the zeppelin platfom.

%spark
print(s"""
%network {
    "nodes" : [
        {"id" : 1, "label" : "User", "data" : { "fullname" : "Andrea Santurbano" }},
        {"id" : 2, "label" : "User", "data" : { "fullname" : "Moon soo Lee" }}
    ],
    "edges" : [{"source" : 2, "target" : 1, "id" : 1, "label" : "HELPS", "data" : { "project" : "Zeppelin", "githubUrl": "https://github.com/apache/zeppelin/pull/1582" } }]
}
""")

neo4-property-graph
This type of graph can be displayed in other ways like tabular, donut, etc...
neo4j-donut-chart
So i think the other visualizations must stay selectable.
Thanks for the feedback!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! could you add examples/info on this to DisplaySystem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Updated the documentation.
Updated documentation
Fixed perfomance issues
@Leemoonsoo
Copy link
Member

Front-end visualization code has been refactored from #1529. I think that's the biggest conflicts that this branch have right now.

Could you merge master branch to this branch and resolve conflict?
Basically you'll need to move network visualization into zeppelin-web/src/app/visualization/builtins like other built-in visualizations.
Let me know if you need any help on it. I'd like to help.

@conker84
Copy link
Contributor Author

@Leemoonsoo i'll study the code this we. I'll let you know if i need help! Thanks!!!

@felixcheung
Copy link
Member

Let's go with (c) & (b) and then (a)?

@Leemoonsoo
Copy link
Member

Sorry for late response.
Sounds like a plan!

@conker84
Copy link
Contributor Author

Perfect just to clarify what i have to do:

  1. Close this PR
  2. Open a new PR (and the related jira issue) for (c) & (b) (i have to made a new fork?)

I'm right?

@Leemoonsoo
Copy link
Member

Right!

@jexp
Copy link

jexp commented Mar 4, 2017

@conker84 wow what a long story. Do you need any help from us (Neo4j) or @felixcheung @Leemoonsoo is there anything we can do?

@conker84
Copy link
Contributor Author

conker84 commented Mar 5, 2017

@jexp any help is appreciated!

@conker84 conker84 closed this Mar 5, 2017
@felixcheung
Copy link
Member

@jexp it would be great if you could review the neo4j parts in this PR.
@conker84 would you be opening a new PR? please link to this one (type #1582) so I could help review and get this in soon

@conker84
Copy link
Contributor Author

conker84 commented Mar 7, 2017

@felixcheung @Leemoonsoo i'm ready to make a new PR but my travis fails in one job
Can you help me to understand why?

@conker84
Copy link
Contributor Author

conker84 commented Mar 9, 2017

Hi guys any news?

@1ambda
Copy link
Member

1ambda commented Mar 9, 2017

Zeppelin start �[60G[�[1;32m  OK  �[0;39m]
< HTTP/1.1 200 OK
�[31mException encountered when invoking run on a nested suite - The code passed to eventually never returned normally. Attempted 832 times over 3.002551618033333 minutes. Last failure message: AbstractFunctionalSuite.this.find("welcome")(AbstractFunctionalSuite.this.webDriver).isDefined was false. *** ABORTED ***�[0m
�[31m  Exception encountered when invoking run on a nested suite - The code passed to eventually never returned normally. Attempted 832 times over 3.002551618033333 minutes. Last failure message: AbstractFunctionalSuite.this.find("welcome")(AbstractFunctionalSuite.this.webDriver).isDefined was false. (AbstractFunctionalSuite.scala:44)�[0m
�[36mRun completed in 3 minutes, 18 seconds.�[0m
�[36mTotal number of tests run: 0�[0m
�[36mSuites: completed 1, aborted 1�[0m
�[36mTests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0�[0m
�[31m*** 1 SUITE ABORTED ***�[0m
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project zeppelin-server: There are test failures -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :zeppelin-server

I think this profile failed due to one of flaky tests. So, just restarting the profile would work.

@conker84
Copy link
Contributor Author

I can't understand if it is related to something made by me becouse i haven't touched the zeppelin-server

@1ambda
Copy link
Member

1ambda commented Mar 11, 2017

Then, clicking restart button in the failed job would work. It'a flaky test.

<name>Zeppelin: Neo4j interpreter</name>

<properties>
<neo4j.driver.version>1.0.4</neo4j.driver.version>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps update to 1.2.0


<properties>
<neo4j.driver.version>1.0.4</neo4j.driver.version>
<test.neo4j.kernel.version>3.0.4</test.neo4j.kernel.version>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update to 3.1.2

}
List<String> line = new ArrayList<>();
for (String col : cols) {
Object value = record.get(col);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Value instead.


private void setResultValue(Object value, Set<Node> nodes, Set<Relationship> relationships,
List<String> line) {
if (value instanceof NodeValue
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't rely on internal types or instanceof checks.

(PathValue) value : (PathValue) ((InternalPath) value).asValue();
nodes.addAll(Iterables.asList(pathVal.asPath().nodes()));
relationships.addAll(Iterables.asList(pathVal.asPath().relationships()));
} else if (value instanceof ListValue) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also handle Map and map values.

setResultValue(val, nodes, relationships, line);
}
} else {
line.add(String.valueOf(value));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value.asString()

}
}

private StatementResult execute(Session session, String cypherQuery,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Neo4j 3.1 also $param is valid parameter syntax.


@Override
public void cancel(InterpreterContext context) {
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could abort a running transaction, e.g. with session.reset()

Map<String, String> graphLabels) {
Set<String> labels = new LinkedHashSet<>();
String firstLabel = null;
for (String label : n.labels()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should they probably be sorted to assure consistency ?

public static final String NEO4J_SERVER_URL = "neo4j.url";
public static final String NEO4J_SERVER_USER = "neo4j.user";
public static final String NEO4J_SERVER_PASSWORD = "neo4j.password";
public static final String NEO4J_MAX_CONCURRENCY = "neo4j.max.concurrency";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this also be used for neo4j-driver session pool size?

@jexp
Copy link

jexp commented Mar 12, 2017

@conker84 @felixcheung I looked at the java-driver Neo4j bits and made some comments.

In general it looks great and I really like the ideas.

If you need anything else from me to make this PR happen, please let me know.

@conker84
Copy link
Contributor Author

@jexp thanks for the hints i will integrate them and when i'll be ready i'll ask you another review!

@conker84
Copy link
Contributor Author

@felixcheung @Leemoonsoo see the #2125

asfgit pushed a commit that referenced this pull request Jun 9, 2017
### What is this PR for?
This issue is about a new network visualization that can leverage the Property Graph Model (https://github.com/tinkerpop/gremlin/wiki/Defining-a-Property-Graph), but also simple graphs in order to provide a set of base apis that can be used by graph dbs (like Neo4j) or graph processing frameworks (like GraphX or Giraph).

### What type of PR is it?
[Feature]
Is related to the #1582

### Todos
* [x] - Added the intepreter apis to manage graphs (under the pakage **org.apache.zeppelin.interpreter.graph**)
* [x] - Added the frontend apis to manage graphs (via d3js)

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-2222

### How should this be tested?
You can download [this notebook](https://gist.github.com/conker84/9574127c2389d08164423894aa93b67f) to test the PR

### Screenshots (if appropriate)
![zeppelin-pr-screen](https://cloud.githubusercontent.com/assets/1833335/23830683/b883e916-0710-11e7-980d-c8ab6bf6d26b.PNG)

### Video
![zeppelin](https://cloud.githubusercontent.com/assets/1833335/23902630/1e121a0c-08c2-11e7-9f28-134866dba077.gif)

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? Yes

Author: conker84 <santand@gmail.com>

Closes #2125 from conker84/master and squashes the following commits:

b6062a0 [conker84] Removed package org.apache.zeppelin.interpreter.graph
e98ca7a [conker84] Comments of review 14/03/2017
b31b7b7 [conker84] Rebase of 07/04/2017
3257bea [conker84] Rebase 30/4
6e74eb9 [conker84] Rebase 30/04
@conker84
Copy link
Contributor Author

Hi guys,
I added the #2478 for the last part of the original PR.

@conker84 conker84 mentioned this pull request Aug 9, 2017
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants