How Jep Works
Jep uses JNI and the CPython API to start up the Python interpreter inside the JVM. When you create an Interpreter instance in Java, either a SubInterpreter or a SharedInterpreter, a Python interpreter is created for that Java Interpreter instance and remains in memory until the Interpreter instance is closed with Interpreter.close(). Internally, Jep maintains a main interpreter that is used only to initialize and shut down Python. This internal main interpreter remains in the JVM until the JVM exits.
Each Interpreter instance's Python interpreter is sandboxed from the other Python interpreters to some degree. This means a change to the global variables in one interpreter will not be reflected in other interpreters. However, this rule does not apply to CPython extensions: there is no way to strictly enforce that a CPython extension is implemented in a way that supports sandboxing. For example, suppose a CPython extension library has a global static variable that is used throughout the library. A change to that static variable in one sub-interpreter would affect the other sub-interpreters, since it is the same location in memory. The same rule applies to Java static variables and singletons. Only one exists in the JVM, so a change to it is reflected in all Python sub-interpreters. See the sections below on shared modules and shared interpreters for other ways that a module may not be sandboxed into a single interpreter.
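As a rough plain-Java analogy (not Jep's implementation), picture each interpreter as owning a private globals map. A static field, by contrast, exists once per process and is seen by every interpreter. `SharedCounter`, `ModelInterpreter`, and `SandboxDemo` are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for process-wide state: a Java static field, a singleton,
// or a static variable inside a CPython extension. Exactly one copy exists.
class SharedCounter {
    static int count = 0;
}

// Hypothetical model of one sandboxed interpreter. Each instance owns a private
// globals map, so assignments in one instance are invisible to the others.
class ModelInterpreter {
    private final Map<String, Object> globals = new HashMap<>();

    void setGlobal(String name, Object value) {
        globals.put(name, value);
    }

    Object getGlobal(String name) {
        return globals.get(name);
    }

    // Mutating static state bypasses the sandbox: every instance sees the change.
    void incrementShared() {
        SharedCounter.count++;
    }
}

class SandboxDemo {
    public static void main(String[] args) {
        ModelInterpreter a = new ModelInterpreter();
        ModelInterpreter b = new ModelInterpreter();
        a.setGlobal("x", 1);
        System.out.println(b.getGlobal("x"));    // null: globals are per instance
        a.incrementShared();
        System.out.println(SharedCounter.count); // 1: static state is shared
    }
}
```

The analogy only captures the visibility rule. Real sub-interpreters isolate far more than a single map.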
Shared modules: Since Jep 3.6
Jep 3.6 added the concept of shared modules. Shared modules intentionally step outside of the sandboxed sub-interpreters to share a module across sub-interpreters. This can be used to work around issues with CPython extensions. It can also potentially be used to share Python modules and their state with other sub-interpreters.
Shared interpreters: Since Jep 3.8
Jep 3.8 added the concept of shared interpreters. Shared interpreters share all modules while retaining their own globals dictionary. This is an alternative way to work around issues with CPython extensions. All SharedInterpreter instances share modules with one another but remain separate from Jep (sub-interpreter) instances.
Due to the need to manage a consistent Python thread state, the thread that creates an Interpreter instance must be used for all method calls to that Interpreter instance. Jep enforces this and throws an exception mentioning invalid thread access when another thread makes a call. Furthermore, each thread can have only one open (unclosed) Interpreter at a time. Closing an Interpreter instance on a thread and then reusing that thread to create a new Interpreter is allowed.
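A common way to satisfy these rules is to create the interpreter on one dedicated thread and route every call through that thread. The sketch below does not use Jep. `ConfinedInterpreter` is a hypothetical stand-in that enforces the same two rules described above, and `InterpreterWorker` (also hypothetical) confines it to a single-thread `ExecutorService`. With Jep itself, the SharedInterpreter or SubInterpreter would presumably be constructed inside the worker thread in the same way.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for an Interpreter that enforces the two rules above:
// (1) only the creating thread may call it, (2) one open instance per thread.
// This models the described behavior; it is NOT Jep's implementation.
class ConfinedInterpreter implements AutoCloseable {
    private static final ThreadLocal<ConfinedInterpreter> OPEN = new ThreadLocal<>();

    private final Thread owner = Thread.currentThread();
    private boolean closed = false;

    ConfinedInterpreter() {
        if (OPEN.get() != null) {
            throw new IllegalStateException("This thread already has an open interpreter");
        }
        OPEN.set(this);
    }

    private void checkOwner() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("Invalid thread access");
        }
    }

    String eval(String expr) {
        checkOwner();
        if (closed) {
            throw new IllegalStateException("Interpreter is closed");
        }
        return "evaluated: " + expr; // placeholder for real Python work
    }

    @Override
    public void close() {
        checkOwner();
        if (!closed) {
            closed = true;
            OPEN.remove(); // this thread may now create a new interpreter
        }
    }
}

// Funnels every call through one dedicated thread, so the owner-thread rule always holds.
class InterpreterWorker implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final ConfinedInterpreter interp;

    InterpreterWorker() throws Exception {
        try {
            // Create the interpreter on the worker thread so that thread becomes its owner.
            interp = executor.submit((Callable<ConfinedInterpreter>) ConfinedInterpreter::new).get();
        } catch (Exception e) {
            executor.shutdownNow();
            throw e;
        }
    }

    String eval(String expr) throws Exception {
        return executor.submit(() -> interp.eval(expr)).get();
    }

    @Override
    public void close() throws Exception {
        try {
            // Closing must also happen on the owning (worker) thread.
            executor.submit(() -> {
                interp.close();
                return null;
            }).get();
        } finally {
            executor.shutdown();
        }
    }
}
```

Callers on any thread can then use `worker.eval(...)`. The trade-off is that the executor serializes calls, so work on one interpreter never runs in parallel.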
Jep automatically converts Java primitives, Strings, and jep.NDArrays sent into the Python interpreter into Python primitives, strings, and numpy.ndarrays, respectively. The Python versions of these objects have no reference to their original Java counterparts; they are entirely new objects that exist solely in Python's system memory.
A Java object that does not match one of the types listed above is automatically wrapped as a PyJObject (or one of its related classes). A PyJObject wraps the reference to the original Java object and presents the Python interpreter with an interface for treating the object as a Python object. From the point of view of the Python interpreter, a PyJObject is just another Python object with a select set of attributes (fields and methods) on it.
For a more detailed discussion of PyJObject and Java object conversion, see the wiki page on Accessing Java Objects in Python.
Python strings, primitives, and numpy.ndarrays are automatically converted to their Java equivalents when passed or returned to Java. These Java objects are equivalent copies, not references to the Python objects. Interpreter.getValue(String) supports the following automatic conversions (Python object -> Java object):
- None -> null
- PyJClass (wrapped class) -> java.lang.Class
- PyJObject (wrapped object) -> java.lang.Object
- Python 3 Unicode -> java.lang.String
- True -> java.lang.Boolean
- False -> java.lang.Boolean
- Python 3 Int -> java.lang.Long
- Float -> java.lang.Double
- List -> java.util.ArrayList
- Tuple -> Collections.unmodifiableList(ArrayList)
- Dict -> java.util.HashMap
- Callable -> jep.python.PyCallable
- numpy.ndarray -> jep.NDArray
- object -> jep.python.PyObject
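Two entries in this table tend to surprise Java callers. A Python int comes back as `java.lang.Long`, so casting the result of `getValue(String)` to `Integer` fails with a `ClassCastException`. A tuple comes back as an unmodifiable `List`. The helper class below, `PyValues`, is hypothetical and uses only the standard Java types from the table, so it compiles without Jep. The values it receives stand in for results of `Interpreter.getValue(String)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for values returned by Interpreter.getValue(String).
// They rely only on the java.lang / java.util types listed in the conversion table.
class PyValues {
    // A Python int arrives as java.lang.Long, so a direct (Integer) cast would throw
    // ClassCastException. Narrow explicitly; Math.toIntExact throws ArithmeticException
    // if the value does not fit in an int.
    static int asInt(Object value) {
        if (!(value instanceof Long)) {
            throw new IllegalArgumentException("Expected a Python int (java.lang.Long), got " + describe(value));
        }
        return Math.toIntExact((Long) value);
    }

    // A Python list arrives as an ArrayList, but a tuple arrives as an unmodifiable
    // List. Copying gives a list that is safe to modify in both cases.
    static List<Object> asMutableList(Object value) {
        if (!(value instanceof List)) {
            throw new IllegalArgumentException("Expected a Python list or tuple, got " + describe(value));
        }
        return new ArrayList<>((List<?>) value);
    }

    private static String describe(Object value) {
        return value == null ? "null (Python None)" : value.getClass().getName();
    }
}
```

For example, assuming `interp` is an open Interpreter in which `x` holds a small Python int, `PyValues.asInt(interp.getValue("x"))` would yield a Java `int`.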
Since Jep 3.8
Jep 3.8 improved support for retrieving Python objects in Java by the addition of the method Interpreter.getValue(String name, Class<T> desiredType). By specifying a desired type, Jep will do its best to provide you with that type if the type conversion to a Java object is reasonable. Jep also supports retrieving references to native Python objects by using Interpreter.getValue(name, PyObject.class) or Interpreter.getValue(name, PyCallable.class). PyObjects in Java have the methods getAttr and setAttr to enable getting and setting Python attributes from Java, similar to the dot (.) operator. PyCallable is a subclass of PyObject and supports invoking Python methods. For more information, please see the javadoc.
Jep uses both Java heap memory and native (also called direct or system) memory. Java objects use heap memory as usual, and Python objects use native memory as usual. Wrapper objects such as PyJObject use both: heap memory for the Java object and native memory for the PyJObject's associated pointers and metadata.
When Jep wraps a Java object as a PyJObject, it notifies the JVM that it holds a reference to that Object, ensuring that the JVM will not garbage collect the object. When the Python garbage collector detects that there are no more references to that PyJObject (in Python at least), it will garbage collect the PyJObject wrapper. An example of this is when a variable is defined in a method scope and goes out of scope when the method returns/exits. When Python garbage collects the wrapper object, Jep will release the associated native memory of the PyJObject and notify the JVM that it no longer has a reference to the object. This then enables the JVM to garbage collect the underlying Java object if there are no more references to it.
Another way to explain Jep's memory management is to view the JVM as delegating to Python until Python is done with the object. The Java garbage collector defers collecting a Java object that is referenced from a Python interpreter until the Python garbage collector collects its wrapper. At that point, the Java garbage collector treats it as just another Java object.
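The handoff described above can be modeled in plain Java. This hypothetical sketch is not Jep's code. `GlobalRefs` stands in for the reference Jep registers with the JVM when it wraps an object. `ModelWrapper` stands in for a PyJObject whose Python reference count decides when that reference is dropped. JNI global references are the usual mechanism native code uses to keep a Java object alive. The class names and the counting scheme here are assumptions for illustration.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical model of the reference handoff between Python and the JVM (NOT Jep's code).
// A strong reference held here plays the role of the reference that keeps the
// JVM from collecting a Java object while a Python wrapper for it is alive.
class GlobalRefs {
    // Number of live wrappers per Java object, compared by identity rather than equals().
    private static final Map<Object, Integer> REFS = new IdentityHashMap<>();

    static void add(Object o) {
        REFS.merge(o, 1, Integer::sum);
    }

    static void remove(Object o) {
        // Returning null from the remapping function deletes the entry.
        REFS.computeIfPresent(o, (key, n) -> n == 1 ? null : n - 1);
    }

    static boolean isHeld(Object o) {
        return REFS.containsKey(o);
    }
}

// Hypothetical stand-in for a PyJObject: Python counts references to the wrapper,
// and the wrapper pins its Java target until that count reaches zero.
class ModelWrapper {
    private final Object target;
    private int pythonRefs = 1; // the reference handed to Python on creation

    ModelWrapper(Object target) {
        this.target = target;
        GlobalRefs.add(target); // the JVM must not collect target while Python uses it
    }

    void incRef() {
        pythonRefs++;
    }

    void decRef() {
        if (pythonRefs == 0) {
            throw new IllegalStateException("wrapper already freed");
        }
        if (--pythonRefs == 0) {
            // Python has collected the wrapper: drop the pin so ordinary Java GC rules apply.
            GlobalRefs.remove(target);
        }
    }
}
```

Once `isHeld` returns false, the Java object is governed only by ordinary Java reachability. The JVM may or may not collect it right away.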