Skip to content
Closed
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions R/pkg/NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -363,4 +363,9 @@ S3method(structField, jobj)
S3method(structType, jobj)
S3method(structType, structField)

export("newJObject")
export("callJMethod")
export("callJStatic")
export("cleanup.jobj")
Copy link
Contributor

@sun-rui sun-rui Aug 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little confused about the discussion at https://issues.apache.org/jira/browse/SPARK-16611:

    d) cleanup.jobj:
    SystemML uses the MLContext and matrixCharacteristic class that is instantiated in JVM and whose object
   ref is kept alive in the sparkR and later when systemML has done it’s computation. we cleanup the  
   objects.The way we achieve it using the References classes in R and use it’s finalize method to register the
   cleanup.jobj once we have created the jobj via newJObject(“sysml.class”)

It seems that systemML still uses automatic GC to cleanup jobjs as it is said that "use it’s finalize method to register the cleanup.jobj ". But a jobj already has cleanup.jobj registered as its finalizer.
If my understanding is correct, I don't see strong reason for exposing cleanup.jobj. If active cleanup is demanded, a call to "gc" can be made?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@aloknsingh aloknsingh Aug 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sun-rui

Thanks for pointing this out. I look at the code and you are right in mentioning that the jobj has the cleanup.jobj registered.

Initially, we were under the assumption that we have to register the created obj, cleanup table, otherwise the jvm (running scala) won't cleanup our obj, as there is the ref in the SparkR.

In summary,

  1. For the systemML, with the information you had mentioned, we would be fine with if it is made private also.
  2. Also as you had mentioned, if package writer (on the top of sparkR) feels that they have absolutely run the cleanup, they can call the R's explicit "gc".

thanks,
Alok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sun-rui and @aloknsingh - I'll update the PR today


export("install.spark")
54 changes: 49 additions & 5 deletions R/pkg/R/backend.R
Original file line number Diff line number Diff line change
Expand Up @@ -25,9 +25,23 @@ isInstanceOf <- function(jobj, className) {
callJMethod(cls, "isInstance", jobj)
}

# Call a Java method named methodName on the object
# specified by objId. objId should be a "jobj" returned
# from the SparkRBackend.
#' Call Java Methods
#'
#' Call a Java method in the JVM running the Spark driver.
#'
#' @param objId object to invoke the method on. Should be a "jobj" created by newJObject.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have a wrapper, it would be better to name this x

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#' @param methodName method name to call.
Copy link
Member

@felixcheung felixcheung Aug 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(scratch this)

#' @param ... parameters to pass to the Java method.
#' @export
#' @seealso callJStatic, newJObject
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#' @examples
Copy link
Member

@felixcheung felixcheung Aug 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need @Aliases, @note since 2.0.1?
@return
@Rdname

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this return invisible() for exported version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the @note and @return to all the methods. I am not sure we need the @aliases or @rdname because these are S3 methods ? The rdnames come out as sparkR.callJMethod.rd etc. and look fine to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roxygen2 infers the rdname pretty well, so I agree that is largely optional.
From a best practice/guideline point-of-view though I think it'd be good to have @rdname because it would make it clear which Rd and doc page it goes to. I think it would get people to think about it.

For example, instead of n->n by default we want n->count since n is an alias of count.

As for @Aliases, agree we don't need it. On a related note, I think I'm seeing some issues with it in other places we have that I want to look into shortly. Let me check back with you later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added rdname for all 3 methods now.

#' \dontrun{
#' sparkR.session() # Need to have a Spark JVM running before calling newJObject
#' # Create a Java ArrayList and populate it
#' jarray <- newJObject("java.util.ArrayList")
#' callJMethod(jarray, "add", 42L)
#' callJMethod(jarray, "get", 0L) # Will print 42
#' }
callJMethod <- function(objId, methodName, ...) {
stopifnot(class(objId) == "jobj")
if (!isValidJobj(objId)) {
Expand All @@ -37,12 +51,42 @@ callJMethod <- function(objId, methodName, ...) {
invokeJava(isStatic = FALSE, objId$id, methodName, ...)
}

# Call a static method on a specified className
#' Call Static Java Methods
#'
#' Call a static method in the JVM running the Spark driver.
#'
#' @param className class containing the static method to invoke.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java fully qualified class name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#' @param methodName name of static method to invoke.
#' @param ... parameters to pass to the Java method.
#' @export
#' @seealso callJMethod, newJObject
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#' @examples
#' \dontrun{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aliases, @note since 2.0.1?
@return
@Rdname

#' sparkR.session() # Need to have a Spark JVM running before calling callJStatic
#' callJStatic("java.lang.System", "currentTimeMillis")
#' callJStatic("java.lang.System", "getProperty", "java.home")
#' }
callJStatic <- function(className, methodName, ...) {
invokeJava(isStatic = TRUE, className, methodName, ...)
}

# Create a new object of the specified class name
#' Create Java Objects
#'
#' Create a new Java object in the JVM running the Spark driver.
#'
#' @param className name of the class to create
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when certain JVM types are created because our SerDe they don't always come back as jobj - should we call this out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java fully qualified class name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to qualify this in the comment on what gets returned. The trouble is that we dont have an externally visible documentation of what types will get converted vs. what will not. Also I think this is bound to change with versions (like the SQL decimal change for example).

However this is a very low level API that is only for advanced developers. So I wonder if we should just leave a pointer to the source file ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the behavior could use some explaining and agree that this API is really for advanced or package developers. How about we put a statement or a few words to that effect in the API doc? I think it's ok we don't put a link or describe this in details in the programming guide. Maybe a .md file later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments in a Details section. Let me know if it reads ok. I agree that having a md file might be useful in the long run.

#' @param ... arguments to be passed to the constructor
#' @export
#' @seealso callJMethod, callJStatic
Copy link
Member

@felixcheung felixcheung Aug 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Seealso needs to be in the form \link{callJMethod}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done the seealso and return - Lets discuss the alias, rdname in the other thread

#' @examples
#' \dontrun{
#' sparkR.session() # Need to have a Spark JVM running before calling newJObject
#' # Create a Java ArrayList and populate it
#' jarray <- newJObject("java.util.ArrayList")
#' callJMethod(jarray, "add", 42L)
#' callJMethod(jarray, "get", 0L) # Will print 42
#' }
#' @note newJObject since 2.0.1
newJObject <- function(className, ...) {
invokeJava(isStatic = TRUE, className, methodName = "<init>", ...)
}
Expand Down
15 changes: 14 additions & 1 deletion R/pkg/R/jobj.R
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,20 @@ getClassName.jobj <- function(x) {
callJMethod(cls, "getName")
}

cleanup.jobj <- function(jobj) {
#' Garbage collect Java Objects
#'
#' Garbage collect an object allocated on Spark driver JVM heap.
#'
#' cleanup.jobj is a low level method that lets developers manually garbage collect objects
#' allocated using newJObject. This is only to be used for advanced use cases as objects allocated
#' on the JVM heap are automatically garbage collected when the corresponding R reference goes out
#' of scope.
#'
#' @param x the Java object that should be garbage collected.
#' @note cleanup.jobj since 2.0.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is removed now

#' @export
cleanup.jobj <- function(x) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is cleanup a good descriptive name for R audience? Would gc.jobj() better?
http://stackoverflow.com/questions/8813753/what-is-the-difference-between-gc-and-rm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah thats a good point - but I've removed this as we don't need to export this for now

jobj <- x
if (isValidJobj(jobj)) {
Copy link
Member

@felixcheung felixcheung Aug 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isValidJobj doesn't currently check, I think we should add a if (class(jobj) == 'jobj') before that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess since this is back to an internal method we dont need to worry about it. But I guess I can add the check if we still want it ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are good for now if we are not exporting cleanup.jobj

objId <- jobj$id
# If we don't know anything about this jobj, ignore it
Expand Down
41 changes: 41 additions & 0 deletions R/pkg/inst/tests/testthat/test_jvm_api.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

context("JVM API")

sparkSession <- sparkR.session(enableHiveSupport = FALSE)

test_that("Create and call methods on object", {
jarr <- newJObject("java.util.ArrayList")
# Add an element to the array
callJMethod(jarr, "add", 1L)
# Check if get returns the same element
expect_equal(callJMethod(jarr, "get", 0L), 1L)
})

test_that("Call static methods", {
# Convert a boolean to a string
strTrue <- callJStatic("java.lang.String", "valueOf", TRUE)
expect_equal(strTrue, "true")
})

test_that("Manually garbage collect objects", {
jarr <- newJObject("java.util.ArrayList")
cleanup.jobj(jarr)
# Using a jobj after GC should throw an error
expect_error(print(jarr), "Error in invokeJava.*")
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add sparkR.session.stop()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done