diff --git a/doc/panama_ffi.html b/doc/panama_ffi.html
deleted file mode 100644
index 604aea7c7e3..00000000000
--- a/doc/panama_ffi.html
+++ /dev/null
@@ -1,627 +0,0 @@

State of foreign function support

March 2021

Maurizio Cimadamore

In this document we explore the main concepts behind Panama's foreign function support; as we shall see, the central abstraction in the foreign function support is the so-called foreign linker, an abstraction that allows clients to construct native method handles — that is, method handles whose invocation targets a native function defined in some native library. In other words, Panama foreign function support is completely expressed in terms of Java code and no intermediate native code is required.

Native addresses

Before we dive into the specifics of the foreign function support, it would be useful to briefly recap some of the main concepts we have learned when exploring the foreign memory access support. The Foreign Memory Access API allows clients to create and manipulate memory segments. A memory segment is a view over a memory source (either on- or off-heap) which is spatially bounded, temporally bounded and thread-confined. These guarantees ensure that dereferencing a segment that has been created by Java code is always safe, and can never result in a VM crash, or, worse, in silent memory corruption.

Now, in the case of memory segments, the above properties (spatial bounds, temporal bounds and confinement) can be known in full when the segment is created. But when we interact with native libraries we will often be receiving raw pointers; such pointers have no spatial bounds (does a char* in C refer to one char, or a char array of a given size?), no notion of temporal bounds, nor thread-confinement. Raw addresses in our interop support are modeled using the MemoryAddress abstraction.

A memory address is just what the name implies: it encapsulates a memory address (either on- or off-heap). Since, in order to dereference memory using a memory access var handle, we need a segment, it follows that it is not possible to directly dereference a memory address — to do that we need a segment first. So clients can proceed in three different ways here.

First, if the memory address is known to belong to a segment the client already owns, a rebase operation can be performed; in other words, the client can ask the address what is its offset relative to a given segment, and then proceed to dereference the original segment accordingly:
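```java
MemorySegment segment = MemorySegment.allocateNative(100);
...
MemoryAddress addr = ... //obtain address from native code
int x = MemoryAccess.getIntAtOffset(segment, addr.segmentOffset(segment));
```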

Secondly, if the client does not have a segment which contains a given memory address, it can create one unsafely, using the MemoryAddress::asSegmentRestricted method; this can also be useful to inject extra knowledge about spatial bounds which might be available in the native library the client is interacting with:
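```java
MemoryAddress addr = ... //obtain address from native code
MemorySegment segment = addr.asSegmentRestricted(100);
int x = MemoryAccess.getInt(segment);
```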

Alternatively, the client can fall back to use the so-called everything segment - that is, a primordial segment which covers the entire native heap and whose scope is always alive (the so-called global scope). Since this segment is available as a constant, dereference can happen without the need to create any additional segment instances:
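```java
MemoryAddress addr = ... //obtain address from native code
int x = MemoryAccess.getIntAtOffset(MemorySegment.ofNativeRestricted(), addr.toRawLongValue());
```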

Of course, since accessing the entire native heap is inherently unsafe, accessing the everything segment is considered a restricted operation which is only allowed after explicit opt in by setting the foreign.restricted=permit runtime flag.

MemoryAddress, like MemorySegment, implements the Addressable interface, which is a functional interface whose method projects an entity down to a MemoryAddress instance. In the case of MemoryAddress such a projection is the identity function; in the case of a memory segment, the projection returns the MemoryAddress instance for the segment's base address. This abstraction makes it possible to pass either memory addresses or memory segments where an address is expected (this is especially useful when generating native bindings).

Segment allocators

Idiomatic C code implicitly relies on stack allocation to allow for concise variable declarations; consider this example:
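```c
int arr[] = { 1, 2, 3, 4, 5 };
```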

A variable initializer such as the one above can be implemented as follows, using the Foreign Memory Access API:
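```java
// a sketch: allocate off-heap space for five ints, then set each element in turn
MemorySegment arr = MemorySegment.allocateNative(5 * 4);
for (int i = 0 ; i < 5 ; i++) {
    MemoryAccess.setIntAtIndex(arr, i, i + 1);
}
```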

There are a number of issues with the above code snippet:

To address these problems, Panama provides a SegmentAllocator abstraction, a functional interface which provides many useful operations to allocate commonly used values. For instance, the above code can be rewritten as follows:
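```java
MemorySegment arr = SegmentAllocator.ofDefault().allocateArray(C_INT, new int[] { 1, 2, 3, 4, 5 });
```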

The above code retrieves the default allocator (an allocator built on top of MemorySegment::allocateNative), and then uses this allocator to create a native array which is initialized to the values { 1, 2, 3, 4, 5}. The array initialization is more efficient, compared to the previous snippet, as the Java array is copied in bulk into the memory region associated with the newly allocated memory segment.

Memory associated with segments returned by the default allocator is released as soon as said segments become unreachable. To have better control over the lifetime of the segments returned by an allocator, clients can use the so called scoped allocator, which returns segments associated with a given scope:
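```java
try (ResourceScope scope = ResourceScope.ofConfined()) {
    MemorySegment arr = SegmentAllocator.scoped(scope).allocateArray(C_INT, new int[] { 1, 2, 3, 4, 5 });
} // 'arr' is released here
```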

A scoped allocator makes sure that all segments allocated with it are no longer usable after the scope associated with the allocator has been closed. This makes it easier to manage multiple resources which share the same lifecycle.

Custom segment allocators are also critical to achieve optimal allocation performance; for this reason, a number of predefined allocators are available via factories in the SegmentAllocator interface. For instance, it is possible to create an arena-based allocator, as follows:
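```java
try (ResourceScope scope = ResourceScope.ofConfined()) {
    SegmentAllocator allocator = SegmentAllocator.arenaUnbounded(scope);
    for (int i = 0 ; i < 100 ; i++) {
        allocator.allocateArray(C_INT, new int[] { 1, 2, 3, 4, 5 });
    }
    ...
} // all memory allocated is released here
```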

The above code creates a confined scope; inside the try-with-resources, a new unbounded arena allocator is created, associated with the existing scope. The allocator will allocate slabs of memory, of a specific size, and respond to allocation requests by returning different slices of the pre-allocated slab. If a slab does not have sufficient space to accommodate a new allocation request, a new slab will be allocated. If the scope associated with the arena allocator is closed, all memory associated with the segments created by the allocator (see the body of the for loop) will be deallocated at once. This idiom combines the advantages of deterministic deallocation (provided by the Memory Access API) with a more flexible and scalable allocation scheme, and can be very useful when writing large applications.

For these reasons, all the methods in the Foreign Linker API which produce memory segments (see CLinker::toCString) allow an optional allocator to be provided by user code — this is key in ensuring that an application using the Foreign Linker API achieves optimal allocation performance, especially in non-trivial use cases.

Symbol lookups

The first ingredient of any foreign function support is a mechanism to lookup symbols in native libraries. In traditional Java/JNI, this is done via the System::loadLibrary and System::load methods, which internally map into calls to dlopen. In Panama, library lookups are modeled more directly, using a class called LibraryLookup (similar to a method handle lookup), which provides capabilities to lookup named symbols in a given native library; we can obtain a library lookup in three different ways:

* LibraryLookup::ofDefault — returns the library lookup which can see all the symbols that have been loaded with the VM
* LibraryLookup::ofPath — creates a library lookup associated with the library found at the given absolute path
* LibraryLookup::ofLibrary — creates a library lookup associated with the library with the given name (this might require setting the java.library.path variable accordingly)

Once a lookup has been obtained, a client can use it to retrieve handles to library symbols (either global variables or functions) using the lookup(String) method, which returns an Optional<LibraryLookup.Symbol>. A lookup symbol is just a proxy for a memory address (in fact, it implements Addressable) and a name.

For instance, the following code can be used to lookup the clang_getClangVersion function provided by the clang library:
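```java
LibraryLookup libclang = LibraryLookup.ofLibrary("clang");
LibraryLookup.Symbol clangVersion = libclang.lookup("clang_getClangVersion").get();
```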

One crucial distinction between the library loading support of the Foreign Linker API and that of JNI is that JNI libraries are loaded into a class loader. Furthermore, to preserve classloader integrity, the same JNI library cannot be loaded into more than one classloader. The foreign function support described here is more primitive — the Foreign Linker API allows clients to target native libraries directly, without any intervening JNI code. Crucially, Java objects are never passed to and from native code by the Foreign Linker API. Because of this, libraries loaded through the LibraryLookup hook are not tied to any class loader and can be (re)loaded as many times as needed.

C Linker

At the core of Panama foreign function support we find the CLinker abstraction. This abstraction plays a dual role: first, for downcalls, it allows clients to model native function calls as plain MethodHandle calls (see ForeignLinker::downcallHandle); second, for upcalls, it allows clients to convert an existing MethodHandle (which might point to some Java method) into a MemorySegment which could then be passed to native functions as a function pointer (see ForeignLinker::upcallStub):
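```java
// a simplified sketch of the two CLinker operations discussed here (other overloads omitted)
interface CLinker {
    MethodHandle downcallHandle(Addressable func, MethodType type, FunctionDescriptor function);
    MemorySegment upcallStub(MethodHandle target, FunctionDescriptor function);
}
```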

In the following sections we will dive deeper into how downcall handles and upcall stubs are created; here we want to focus on the similarities between these two routines. First, both take a FunctionDescriptor instance — essentially an aggregate of memory layouts which is used to describe the signature of a foreign function in full. Speaking of C, the CLinker class defines many layout constants (one for each main C primitive type) — these layouts can be combined using a FunctionDescriptor to describe the signature of a C function. For instance, assuming we have a C function taking a char* and returning a long we can model such a function with the following descriptor:
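```java
FunctionDescriptor.of(C_LONG, C_POINTER); // returns a C long, takes a char* (a pointer)
```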

The layouts used above will be mapped to the right layout for the platform we are executing on. This also means that these layouts are platform dependent and that e.g. C_LONG will be a 32-bit value layout on Windows, while being a 64-bit value layout on Linux.

Layouts defined in the CLinker class are not only handy, as they already model the C types we want to work with; they also contain hidden pieces of information which the foreign linker support uses in order to compute the calling sequence associated with a given function descriptor. For instance, the two C types int and float might share a similar memory layout (they are both expressed as 32-bit values), but are typically passed using different machine registers. The layout attributes attached to the C-specific layouts in the CLinker class ensure that arguments and return values are interpreted in the correct way.

Another similarity between downcallHandle and upcallStub is that they both accept (either directly, or indirectly) a MethodType instance. The method type describes the Java signatures that clients will be using when interacting with said downcall handles, or upcall stubs. The C linker implementation adds constraints on which layouts can be used with which Java carrier — for instance by enforcing that the size of the Java carrier is equal to that of the corresponding layout, or by making sure that certain layouts are associated with specific carriers. The following table shows the Java carrier vs. layout mappings enforced by the Linux/macOS foreign linker implementation:

C layout       Java carrier
-------------  ----------------
C_BOOL         byte
C_CHAR         byte
C_SHORT        short
C_INT          int
C_LONG         long
C_LONGLONG     long
C_FLOAT        float
C_DOUBLE       double
C_POINTER      MemoryAddress
GroupLayout    MemorySegment
C_VA_LIST      CLinker.VaList

Aside from the mapping between primitive layouts and primitive Java carriers (which might vary across platforms), it is important to note how all pointer layouts must correspond to a MemoryAddress carrier, whereas structs (whose layout is defined by a GroupLayout) must be associated with a MemorySegment carrier; there is also a layout/carrier pair for native va_list (which is covered later in this document).

Downcalls

We will now look at how foreign functions can be called from Java using the foreign linker abstraction. Assume we wanted to call the following function from the standard C library:
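```c
size_t strlen(const char *s);
```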

In order to do that, we have to:

Here's an example of how we might want to do that (a full listing of all the examples in this and subsequent sections will be provided in the appendix):
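```java
MethodHandle strlen = CLinker.getInstance().downcallHandle(
    LibraryLookup.ofDefault().lookup("strlen").get(),
    MethodType.methodType(long.class, MemoryAddress.class),
    FunctionDescriptor.of(C_LONG, C_POINTER)
);
```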

Note that, since the function strlen is part of the standard C library, which is loaded with the VM, we can just use the default lookup to look it up. The rest is pretty straightforward — the only tricky detail is how to model size_t: typically this type has the size of a pointer, so we can use C_LONG on Linux, but we'd have to use C_LONGLONG on Windows. On the Java side, we model the size_t using a long and the pointer is modeled using a MemoryAddress parameter.

Once we have obtained the downcall native method handle, we can just use it as any other method handle:
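```java
try (ResourceScope scope = ResourceScope.ofConfined()) {
    MemorySegment hello = CLinker.toCString("Hello", scope);
    long len = (long) strlen.invokeExact(hello.address()); // 5
}
```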

Here we are using one of the helper methods in CLinker to convert a Java string into an off-heap memory segment which contains a NULL terminated C string. We then pass that segment to the method handle and retrieve our result in a Java long. Note how all this has been possible without any piece of intervening native code — all the interop code can be expressed in (low level) Java. Note also how we used an explicit resource scope to control the lifecycle of the allocated C string; while using the implicit default scope is an option, extra care must be taken when using segments featuring implicit deallocation which are then converted into MemoryAddress instances: since the address is eventually converted (by the linker support) into a raw Java long, there is no guarantee that the memory segment will be kept reachable for the entire duration of the native call.

The CLinker interface also supports linking of native functions whose address is not known at link time; when that happens, an address must be provided when the method handle returned by the linker is invoked - this is very useful to support virtual calls. For instance, the above code can be rewritten as follows:
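```java
// a sketch: the address is omitted when linking, and supplied as a leading argument at call time
MethodHandle strlen_virtual = CLinker.getInstance().downcallHandle( // address parameter missing!
    MethodType.methodType(long.class, MemoryAddress.class),
    FunctionDescriptor.of(C_LONG, C_POINTER)
);

try (ResourceScope scope = ResourceScope.ofConfined()) {
    long len = (long) strlen_virtual.invoke(
        LibraryLookup.ofDefault().lookup("strlen").get().address(), // address provided here!
        CLinker.toCString("Hello", scope).address()
    ); // 5
}
```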

Now that we have seen the basics of how foreign function calls are supported in Panama, let's add some additional considerations. First, it is important to note that, although the interop code is written in Java, the above code cannot be considered 100% safe. There are many arbitrary decisions to be made when setting up downcall method handles such as the one above, some of which might be obvious to us (e.g. how many parameters the function takes), but which cannot ultimately be verified by the Panama runtime. After all, a symbol in a dynamic library is, mostly, a numeric offset and, unless we are using a shared library with debugging information, no type information is attached to a given library symbol. This means that, in this case, the Panama runtime has to trust our description of the strlen function. For this reason, access to the foreign linker is a restricted operation, which can only be performed if the runtime flag foreign.restricted=permit is passed on the command line of the Java launcher 1.

Finally, let's talk about the life-cycle of some of the entities involved here; first, as a downcall native handle wraps a lookup symbol, the library from which the symbol has been loaded will stay loaded as long as there are reachable downcall handles referring to one of its symbols; in the above example, this consideration is less important, given the use of the default lookup object, which can be assumed to stay alive for the entire duration of the application.

Certain functions might return pointers, or structs; it is important to realize that if a function returns a pointer (or a MemoryAddress), no life-cycle whatsoever is attached to that pointer. It is then up to the client to e.g. free the memory associated with that pointer, or do nothing (in case the library is responsible for the life-cycle of that pointer). If a library returns a struct by value, things are different, as a fresh memory segment is allocated off-heap and returned to the caller. In such cases, the foreign linker API will request an additional prefix SegmentAllocator (see above) parameter which will be responsible for allocating the returned segment. The allocation will likely associate the segment with a resource scope that is known to the caller and which can then be used to release the memory associated with that segment. An additional overload of downcallHandle is also provided by CLinker where a client can specify which allocator should be used in such cases at link-time.

Performance-wise, the reader might ask how efficient calling a foreign function using a native method handle is; the answer is very. The JVM comes with some special support for native method handles, so that, if a given method handle is invoked many times (e.g. inside a hot loop), the JIT compiler might decide to just generate the snippet of assembly code required to call the native function, and execute that directly. In most cases, invoking a native function this way is as efficient as doing so through JNI 2.

Upcalls

Sometimes, it is useful to pass Java code as a function pointer to some native function; we can achieve that by using foreign linker support for upcalls. To demonstrate this, let's consider the following function from the C standard library:
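```c
void qsort(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *));
```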

This is a function that can be used to sort the contents of an array, using a custom comparator function — compar — which is passed as a function pointer. To be able to call the qsort function from Java we first have to create a downcall native method handle for it:
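```java
MethodHandle qsort = CLinker.getInstance().downcallHandle(
    LibraryLookup.ofDefault().lookup("qsort").get(),
    MethodType.methodType(void.class, MemoryAddress.class, long.class, long.class, MemoryAddress.class),
    FunctionDescriptor.ofVoid(C_POINTER, C_LONG, C_LONG, C_POINTER)
);
```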

As before, we use C_LONG and long.class to map the C size_t type, and we use MemoryAddress.class both for the first pointer parameter (the array pointer) and the last parameter (the function pointer).

This time, in order to invoke the qsort downcall handle, we need a function pointer to be passed as the last parameter; this is where the upcall support in foreign linker comes in handy, as it allows us to create a function pointer out of an existing method handle. First, let's write a function that can compare two int elements (passed as pointers):
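```java
class Qsort {
    static int qsortCompare(MemoryAddress addr1, MemoryAddress addr2) {
        return MemoryAccess.getIntAtOffset(MemorySegment.ofNativeRestricted(), addr1.toRawLongValue()) -
               MemoryAccess.getIntAtOffset(MemorySegment.ofNativeRestricted(), addr2.toRawLongValue());
    }
}
```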

Here we can see that the function is performing some unsafe dereference of the pointer contents, by using the everything segment.

Now let's create a method handle pointing to the comparator function above:
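```java
MethodHandle comparHandle = MethodHandles.lookup()
    .findStatic(Qsort.class, "qsortCompare",
                MethodType.methodType(int.class, MemoryAddress.class, MemoryAddress.class));
```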

Now that we have a method handle for our Java comparator function, we can create a function pointer, using the foreign linker upcall support — as for downcalls, we have to describe the signature of the foreign function pointer using the layouts in the CLinker class:
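```java
MemorySegment comparFunc = CLinker.getInstance().upcallStub(
    comparHandle,
    FunctionDescriptor.of(C_INT, C_POINTER, C_POINTER)
);
```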

When no resource scope is specified (as in the above case), the upcall stub segment will be associated with the default scope - a non-closeable scope which does not support deterministic deallocation. This means that the upcall stub will be uninstalled when the upcall segment becomes unreachable. In cases where this is not desirable, the API also supports associating a custom ResourceScope instance with the returned upcall segment.

So, we finally have all the ingredients to create an upcall segment, and pass it to the qsort downcall handle:
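```java
try (ResourceScope scope = ResourceScope.ofConfined()) {
    MemorySegment comparFunc = CLinker.getInstance().upcallStub(
        comparHandle,
        FunctionDescriptor.of(C_INT, C_POINTER, C_POINTER),
        scope);
    MemorySegment array = SegmentAllocator.scoped(scope)
        .allocateArray(C_INT, new int[] { 0, 9, 3, 4, 6, 5, 1, 8, 2, 7 });
    qsort.invokeExact(array.address(), 10L, 4L, comparFunc.address());
    int[] sorted = array.toIntArray(); // [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
}
```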

The above code creates a memory segment — comparFunc — containing a stub that can be used to invoke our Java comparator function. The memory segment is associated with the provided resource scope instance; this means that the stub will be uninstalled when the resource scope is closed. It is also possible (not shown here) to create upcall stubs associated with the default scope, in which case the stub will be uninstalled when the upcall segment becomes unreachable.

The snippet then creates an off-heap array from a Java array (using a SegmentAllocator), which is then passed to the qsort handle, along with the comparator function we obtained from the foreign linker. As a side-effect, after the call, the contents of the off-heap array will be sorted (as instructed by our comparator function, written in Java). We can then extract a new Java array from the segment, which contains the sorted elements. This is a more advanced example, but one that shows how powerful the native interop support provided by the foreign linker abstraction is, allowing full bidirectional interop support between Java and native code.

Varargs

Some C functions are variadic and can take an arbitrary number of arguments. Perhaps the most common example of this is the printf function, defined in the C standard library:
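```c
int printf(const char *format, ...);
```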

This function takes a format string, which features zero or more holes, and then can take a number of additional arguments that is identical to the number of holes in the format string.

The foreign function support can support variadic calls, but with a caveat: the client must provide a specialized Java signature, and a specialized description of the C signature. For instance, let's say we wanted to model the following C call:
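```c
printf("%d plus %d equals %d", 2, 2, 4);
```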

To do this using the foreign function support provided by Panama we would have to build a specialized downcall handle for that call shape 3:
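```java
MethodHandle printf = CLinker.getInstance().downcallHandle(
    LibraryLookup.ofDefault().lookup("printf").get(),
    MethodType.methodType(int.class, MemoryAddress.class, int.class, int.class, int.class),
    FunctionDescriptor.of(C_INT, C_POINTER, C_INT, C_INT, C_INT)
);
```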

Then we can call the specialized downcall handle as usual:
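```java
try (ResourceScope scope = ResourceScope.ofConfined()) {
    printf.invoke(CLinker.toCString("%d plus %d equals %d", scope).address(), 2, 2, 4); //prints "2 plus 2 equals 4"
}
```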

While this works, it is easy to see how such an approach is not very desirable:

To mitigate these issues, the standard C foreign linker comes equipped with support for C variable argument lists — or va_list. When a variadic function is called, C code has to unpack the variadic arguments by creating a va_list structure, and then accessing the variadic arguments through the va_list one by one (using the va_arg macro). To facilitate interop between standard variadic functions and va_list, many C library functions in fact define two flavors of the same function, one using a standard variadic signature, one using an extra va_list parameter. For instance, in the case of printf we can find that a va_list-accepting function performing the same task is also defined:
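```c
int vprintf(const char *format, va_list ap);
```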

The behavior of this function is the same as before — the only difference is that the ellipsis notation ... has been replaced with a single va_list parameter; in other words, the function is no longer variadic.

It is indeed fairly easy to create a downcall for vprintf:
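```java
MethodHandle vprintf = CLinker.getInstance().downcallHandle(
    LibraryLookup.ofDefault().lookup("vprintf").get(),
    MethodType.methodType(int.class, MemoryAddress.class, VaList.class),
    FunctionDescriptor.of(C_INT, C_POINTER, C_VA_LIST));
```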

Here, the notable thing is that CLinker comes equipped with the special C_VA_LIST layout (the layout of a va_list parameter) as well as a VaList carrier, which can be used to construct and represent variable argument lists from Java code.

To call the vprintf handle we need to create an instance of VaList which contains the arguments we want to pass to the vprintf function — we can do so, as follows:
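```java
try (ResourceScope scope = ResourceScope.ofConfined()) {
    vprintf.invoke(
        CLinker.toCString("%d plus %d equals %d", scope).address(),
        VaList.make(builder ->
            builder.vargFromInt(C_INT, 2)
                   .vargFromInt(C_INT, 2)
                   .vargFromInt(C_INT, 4), scope)
    ); //prints "2 plus 2 equals 4"
}
```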

While the caller has to do more work to call the vprintf handle, note that now we're back in a place where the downcall handle vprintf can be shared across multiple call sites. Note that both the format string and the VaList are associated with the given resource scope — this means that both will remain valid throughout the native function call. As for other APIs, it is also possible (not shown here) to create a VaList associated with the default scope - meaning that the resources allocated by the VaList will remain available as long as the VaList remains reachable.

Another advantage of using VaList is that this approach also scales to upcall stubs — it is therefore possible for clients to create upcall stubs which take a VaList and then, from the Java upcall, read the arguments packed inside the VaList one by one using the methods provided by the VaList API (e.g. VaList::vargAsInt(MemoryLayout)), which mimic the behavior of the C va_arg macro.

Appendix: full source code

The full source code containing most of the code shown throughout this document can be seen below:

 

 


1. In reality this is not entirely new; even in JNI, when you call a native method the VM trusts that the corresponding implementing function in C will feature compatible parameter types and return values; if not, a crash might occur.
2. As an advanced option, Panama allows the user to opt-in to remove Java to native thread transitions; while, in the general case, it is unsafe to do so (removing thread transitions could have a negative impact on GC for long running native functions, and could crash the VM if the downcall needs to pop back out into Java, e.g. via an upcall), greater efficiency can be achieved; performance sensitive users should consider this option at least for the functions that are called more frequently, assuming that these functions are leaf functions (i.e. do not go back to Java via an upcall) and are relatively short-lived.
3. On Windows, layouts for variadic arguments have to be adjusted using CLinker.Win64.asVarArg(ValueLayout); this is necessary because the Windows ABI passes variadic arguments using different rules than the ones used for ordinary arguments.
- - \ No newline at end of file diff --git a/doc/panama_ffi.md b/doc/panama_ffi.md index ccdef2a4be4..f1c7555ab84 100644 --- a/doc/panama_ffi.md +++ b/doc/panama_ffi.md @@ -1,14 +1,10 @@ ## State of foreign function support -**March 2021** - -* Rewrite section on NativeScope (now Segment allocators) and move it earlier in the doc -* Discuss life-cycle options for downcalls (struct returned by value), upcalls and valist -* Tweak examples +**May 2021** **Maurizio Cimadamore** -In this document we explore the main concepts behind Panama's foreign function support; as we shall see, the central abstraction in the foreign function support is the so called *foreign linker*, an abstraction that allows clients to construct *native* method handles — that is, method handles whose invocation targets a native function defined in some native library. As we shall see, Panama foreign function support is completely expressed in terms of Java code and no intermediate native code is required. +In this document we explore the main concepts behind Panama's foreign function support; as we shall see, the central abstraction in the foreign function support is the so called *foreign linker*, an abstraction that allows clients to construct *native* method handles — that is, method handles whose invocation targets a native function defined in some native library. In other words, Panama foreign function support is completely expressed in terms of Java code and no intermediate native code is required. ### Native addresses @@ -21,17 +17,17 @@ A memory address is just what the name implies: it encapsulates a memory address First, if the memory address is known to belong to a segment the client *already* owns, a *rebase* operation can be performed; in other words, the client can ask the address what is its offset relative to a given segment, and then proceed to dereference the original segment accordingly: ```java -MemorySegment segment = MemorySegment.allocateNative(100); +MemorySegment segment = MemorySegment.allocateNative(100, ResourceScope.newImplicitScope()); ... MemoryAddress addr = ... //obtain address from native code int x = MemoryAccess.getIntAtOffset(segment, addr.segmentOffset(segment)); ``` -Secondly, if the client does *not* have a segment which contains a given memory address, it can create one *unsafely*, using the `MemoryAddress::asSegmentRestricted`; this can also be useful to inject extra knowledge about spatial bounds which might be available in the native library the client is interacting with: +Secondly, if the client does *not* have a segment which contains a given memory address, it can create one *unsafely*, using the `MemoryAddress::asSegment`; this can also be useful to inject extra knowledge about spatial bounds which might be available in the native library the client is interacting with: ```java MemoryAddress addr = ... //obtain address from native code -MemorySegment segment = addr.asSegmentRestricted(100); +MemorySegment segment = addr.asSegment(100); int x = MemoryAccess.getInt(segment); ``` @@ -39,10 +35,10 @@ Alternatively, the client can fall back to use the so called *everything* segmen ```java MemoryAddress addr = ... 
//obtain address from native code -int x = MemoryAccess.getIntAtOffset(MemorySegment.ofNativeRestricted(), addr.toRawLongValue()); +int x = MemoryAccess.getIntAtOffset(MemorySegment.globalNativeSegment(), addr.toRawLongValue()); ``` -Of course, since accessing the entire native heap is inherently *unsafe*, accessing the *everything* segment is considered a *restricted* operation which is only allowed after explicit opt in by setting the `foreign.restricted=permit` runtime flag. +Of course, since accessing the entire native heap is inherently *unsafe*, accessing the *everything* segment is considered a *restricted* operation which is only allowed if the module performing the operation is listed in the `--enable-native-access` command-line flag. `MemoryAddress`, like `MemorySegment` , implements the `Addressable` interface, which is a functional interface whose method projects an entity down to a `MemoryAddress` instance. In the case of `MemoryAddress` such a projection is the identity function; in the case of a memory segment, the projection returns the `MemoryAddres` instance for the segment's base address. This abstraction allows to pass either memory address or memory segments where an address is expected (this is especially useful when generating native bindings). @@ -51,15 +47,17 @@ Of course, since accessing the entire native heap is inherently *unsafe*, access Idiomatic C code implicitly relies on stack allocation to allow for concise variable declarations; consider this example: ```c -int arr[] = { 1, 2, 3, 4, 5 }; +int arr[] = { 0, 1, 2, 3, 4 }; ``` -Here the function `foo` takes an output parameter, a pointer to an `int` variable. This idiom can be implemented as follows, using the Foreign Memory Access API: +A variable initializer such as the one above can be implemented as follows, using the Foreign Memory Access API: ```java -MemorySegment arr = MemorySegment.allocateNative(C_INT); -for (int i = 1 ; i <= 5 ; i++) { - MemoryAccess.setInt(arr, i); +try (ResourceScope scope = ResourceScope.newConfinedScope()) { + MemorySegment arr = MemorySegment.allocateNative(MemoryLayout.sequenceLayout(5, JAVA_INT), scope); + for (int i = 0 ; i <= 5 ; i++) { + MemoryAccess.setIntAtIndex(arr, i); + } } ``` @@ -72,56 +70,45 @@ There are a number of issues with the above code snippet: To address these problems, Panama provides a `SegmentAllocator` abstraction, a functional interface which provides many useful operation to allocate commonly used values. For instance, the above code can be rewritten as follows: ```java -MemorySegment arr = SegmentAllocator.ofDefault().allocateArray(C_INT, new int[] { 1, 2, 3, 4, 5 }); -``` - -The above code retrieves the *default allocator* (an allocator built on top of `MemorySegment::allocateNative`), and then uses this allocator to create a native array which is initialized to the values `{ 1, 2, 3, 4, 5}`. The array initialization is more efficient, compared to the previous snippet, as the Java array is copied *in bulk* into the memory region associated with the newly allocated memory segment. - -Memory associated with segments returned by the default allocator is released as soon as said segments become *unreachable*. 
To have better control over the lifetime of the segments returned by an allocator, clients can use the so called *scoped* allocator, which returns segments associated with a given scope: - -```java -try (ResourceScope scope = ResourceScope.ofConfined()) { - MemorySegment arr = SegmentAllocator.scoped(scope).allocateArray(C_INT, new int[] { 1, 2, 3, 4, 5 }); +try (ResourceScope scope = ResourceScope.newConfinedScope()) { + MemorySegment arr = SegmentAllocator.ofScope(scope).allocateArray(JAVA_INT, new int[] { 0, 1, 2, 3, 4 }); } // 'arr' is released here ``` -Scoped allocator make sure that all segments allocated with a scoped allocator are no longer usable after the scope associated with the allocator has been closed. This makes it easier to manage multiple resources which share the same lifecycle. +The above code obtains a *scoped allocator* (an allocator built on top of `MemorySegment::allocateNative`), and then uses this allocator to create a native array which is initialized to the values `{ 0, 1, 2, 3, 4 }`. The array initialization is more efficient, compared to the previous snippet, as the Java array is copied *in bulk* into the memory region associated with the newly allocated memory segment. The scoped allocator makes sure that all segments allocated with it are no longer usable after the scope associated with the allocator has been closed. This makes it easier to manage multiple resources which share the same lifecycle. Custom segment allocators are also critical to achieve optimal allocation performance; for this reason, a number of predefined allocators are available via factories in the `SegmentAllocator` interface. For instance, it is possible to create an arena-based allocator, as follows: ```java -try (ResourceScope scope = ResourceScope.ofConfined()) { - SegmentAllocator allocator = SegmentAllocator.arenaUnbounded(scope); +try (ResourceScope scope = ResourceScope.newConfinedScope()) { + SegmentAllocator allocator = SegmentAllocator.arenaAllocator(scope); for (int i = 0 ; i < 100 ; i++) { - allocator.allocateArray(C_INT, new int[] { 1, 2, 3, 4, 5 }); + allocator.allocateArray(JAVA_INT, new int[] { 0, 1, 2, 3, 4 }); } ... } // all memory allocated is released here ``` -The above code creates a confined scope; inside the *try-with-resources*, a new unbounded arena allocation is created, associated with the existing scope. The allocator will allocate slabs of memory, of a specific size, and respond to allocation request by returning different slices of the pre-allocated slab. If a slab does not have sufficient space to accommodate a new allocation request, a new one will be allocated. If the scope associated with the arena allocator is closed, all memory associated with the segments created by the allocator (see the body of the `for` loop) will be deallocated at once. This idiom combines the advantages of deterministic deallocation (provided by the Memory Access API) with a more flexible and scalable allocation scheme, and can be very useful when writing large applications. +The above code creates a confined scope; inside the *try-with-resources*, a new unbounded arena allocation is created, associated with the existing scope. The allocator will allocate slabs of memory, of a specific size, and respond to allocation requests by returning different slices of the pre-allocated slab. If a slab does not have sufficient space to accommodate a new allocation request, a new one will be allocated. 
If the scope associated with the arena allocator is closed, all memory associated with the segments created by the allocator (see the body of the `for` loop) will be deallocated at once. This idiom combines the advantages of deterministic deallocation (provided by the Memory Access API) with a more flexible and scalable allocation scheme, and can be very useful when writing large applications. For these reasons, all the methods in the Foreign Linker API which *produce* memory segments (see `CLinker::toCString`), allow an optional allocator to be provided by user code — this is key in ensuring that an application using the Foreign Linker API achieves optimal allocation performances, especially in non-trivial use cases. ### Symbol lookups -The first ingredient of any foreign function support is a mechanism to lookup symbols in native libraries. In traditional Java/JNI, this is done via the `System::loadLibrary` and `System::load` methods, which internally map into calls to `dlopen`. In Panama, library lookups are modeled more directly, using a class called`LibraryLookup` (similar to a method handle lookup), which provides capabilities to lookup named symbols in a given native library; we can obtain a library lookup in 3 different ways: +The first ingredient of any foreign function support is a mechanism to lookup symbols in native libraries. In traditional Java/JNI, this is done via the `System::loadLibrary` and `System::load` methods, which internally map into calls to `dlopen`. Unfortunately, these methods do not provide a way for clients to obtain the address associated with a given library symbol. For this reason, the Foreign Linker API introduces a new abstraction, namely `SymbolLookup` (similar in spirit to a method handle lookup), which provides capabilities to lookup named symbols; we can obtain a symbol lookup in 2 different ways 1: -* `LibraryLookup::ofDefault` — returns the library lookup which can *see* all the symbols that have been loaded with the VM -* `LibraryLookup::ofPath` — creates a library lookup associated with the library found at the given absolute path -* `LibraryLookup::ofLibrary` — creates a library lookup associated with the library with given name (this might require setting the `java.library.path` variable accordingly) +* `SymbolLookup::loaderLookup` — creates a symbol lookup which can be used to search symbols in all the libraries loaded by the caller's classloader (e.g. using `System::loadLibrary` or `System::load`) +* `CLinker::getSystemLookup` — returns a platform-specific symbol lookup which can be used e.g. to search symbols in the standard C library -Once a lookup has been obtained, a client can use it to retrieve handles to library symbols (either global variables or functions) using the `lookup(String)` method, which returns an `Optional`. A lookup symbol is just a proxy for a memory address (in fact, it implements `Addressable`) and a name. +Once a lookup has been obtained, a client can use it to retrieve handles to library symbols (either global variables or functions) using the `lookup(String)` method, which returns an `Optional`. 
For instance, the following code can be used to lookup the `clang_getClangVersion` function provided by the `clang` library: ```java -LibraryLookup libclang = LibraryLookup.ofLibrary("clang"); -LibraryLookup.Symbol clangVersion = libclang.lookup("clang_getClangVersion").get(); +System.loadLibrary("clang"); +MemoryAddress clangVersion = SymbolLookup.loaderLookup().lookup("clang_getClangVersion").get(); ``` -One crucial distinction between the library loading support of the Foreign Linker API and of JNI is that JNI libraries are loaded into a class loader. Furthermore, to preserve [classloader integrity](https://docs.oracle.com/javase/7/docs/technotes/guides/jni/jni-12.html#libmanage) integrity, the same JNI library cannot be loaded into more than one classloader. The foreign function support described here is more primitive — the Foreign Linker API allows clients to target native libraries directly, without any intervening JNI code. Crucially, Java objects are never passed to and from native code by the Foreign Linker API. Because of this, libraries loaded through the `LibraryLookup` hook are not tied to any class loader and can be (re)loaded as many times as needed. - ### C Linker At the core of Panama foreign function support we find the `CLinker` abstraction. This abstraction plays a dual role: first, for downcalls, it allows to model native function calls as plain `MethodHandle` calls (see `ForeignLinker::downcallHandle`); second, for upcalls, it allows to convert an existing `MethodHandle` (which might point to some Java method) into a `MemorySegment` which could then be passed to native functions as a function pointer (see `ForeignLinker::upcallStub`): @@ -152,7 +139,7 @@ Another similarity between `downcallHandle` and `upcallStub` is that they both a | ------------- | ---------------- | | `C_BOOL` | `byte` | | `C_CHAR` | `byte` | -| `C_SHORT` | `short` | +| `C_SHORT` | `short`, `char` | | `C_INT` | `int` | | `C_LONG` | `long` | | `C_LONGLONG` | `long` | @@ -180,27 +167,27 @@ In order to do that, we have to: * select a Java signature we want to *overlay* on the native function — this will be the signature that clients of the native method handles will interact with * create a *downcall* native method handle with the above information, using the standard C foreign linker -Here's an example of how we might want to do that (a full listing of all the examples in this and subsequent sections will be provided in the [appendix](#appendix: full-source-code)): +Here's an example of how we might want to do that (a full listing of all the examples in this and subsequent sections will be provided in the [appendix](#appendix-full-source-code)): ```java MethodHandle strlen = CLinker.getInstance().downcallHandle( - LibraryLookup.ofDefault().lookup("strlen").get(), + CLinker.systemLookup().lookup("strlen").get(), MethodType.methodType(long.class, MemoryAddress.class), FunctionDescriptor.of(C_LONG, C_POINTER) ); ``` -Note that, since the function `strlen` is part of the standard C library, which is loaded with the VM, we can just use the default lookup to look it up. The rest is pretty straightforward — the only tricky detail is how to model `size_t`: typically this type has the size of a pointer, so we can use `C_LONG` on Linux, but we'd have to use `C_LONGLONG` on Windows. On the Java side, we model the `size_t` using a `long` and the pointer is modeled using a `MemoryAddress` parameter. 
+Note that, since the function `strlen` is part of the standard C library, which is loaded with the VM, we can just use the system lookup to look it up. The rest is pretty straightforward — the only tricky detail is how to model `size_t`: typically this type has the size of a pointer, so we can use `C_LONG` on Linux, but we'd have to use `C_LONGLONG` on Windows. On the Java side, we model the `size_t` using a `long` and the pointer is modeled using a `MemoryAddress` parameter. One we have obtained the downcall native method handle, we can just use it as any other method handle: ```java -try (ResourceScope scope = ResourceScope.ofConfined()) { +try (ResourceScope scope = ResourceScope.newConfinedScope()) { long len = strlen.invokeExact(CLinker.toCString("Hello", scope).address()); // 5 } ``` -Here we are using one of the helper methods in `CLinker` to convert a Java string into an off-heap memory segment which contains a `NULL` terminated C string. We then pass that segment to the method handle and retrieve our result in a Java `long`. Note how all this has been possible *without* any piece of intervening native code — all the interop code can be expressed in (low level) Java. Note also how we used an explicit resource scope to control the lifecycle of the allocated C string; while using the implicit *default scope* is an option, extra care must be taken when using segments featuring implicitly deallocation which are then converted into `MemoryAddress` instances: since the address is eventually converted (by the linker support) into a raw Java long, there is no guarantee that the memory segment would be kept *reachable* for the entire duration of the native call. +Here we are using one of the helper methods in `CLinker` to convert a Java string into an off-heap memory segment which contains a `NULL` terminated C string. We then pass that segment to the method handle and retrieve our result in a Java `long`. Note how all this has been possible *without* any piece of intervening native code — all the interop code can be expressed in (low level) Java. Note also how we used an explicit resource scope to control the lifecycle of the allocated C string, which ensures timely deallocation of the memory segment holding the native string. The `CLinker` interfaces also supports linking of native function without an address known at link time; when that happens, an address must be provided when the method handle returned by the linker is invoked - this is very useful to support *virtual calls*. For instance, the above code can be rewritten as follows: @@ -210,21 +197,19 @@ MethodHandle strlen_virtual = CLinker.getInstance().downcallHandle( // address p FunctionDescriptor.of(C_LONG, C_POINTER) ); -try (ResourceScope scope = ResourceScope.ofConfined()) { +try (ResourceScope scope = ResourceScope.newConfinedScope()) { long len = strlen_virtual.invokeExact( - LibraryLookup.ofDefault().lookup("strlen").get() // address provided here! + (Addressable)CLinker.systemLookup().lookup("strlen").get() // address provided here! CLinker.toCString("Hello", scope).address() ); // 5 } ``` -Now that we have seen the basics of how foreign function calls are supported in Panama, let's add some additional considerations. First, it is important to note that, albeit the interop code is written in Java, the above code can *not* be considered 100% safe. There are many arbitrary decisions to be made when setting up downcall method handles such as the one above, some of which might be obvious to us (e.g. 
how many parameters does the function take), but which cannot ultimately be verified by the Panama runtime. After all, a symbol in a dynamic library is, mostly a numeric offset and, unless we are using a shared library with debugging information, no type information is attached to a given library symbol. This means that, in this case, the Panama runtime has to *trust* our description of the `strlen` function. For this reason, access to the foreign linker is a restricted operation, which can only be performed if the runtime flag `foreign.restricted=permit` is passed on the command line of the Java launcher 1. - -Finally let's talk about the life-cycle of some of the entities involved here; first, as a downcall native handle wraps a lookup symbol, the library from which the symbol has been loaded will stay loaded until there are reachable downcall handles referring to one of its symbols; in the above example, this consideration is less important, given the use of the default lookup object, which can be assumed to stay alive for the entire duration of the application. +Now that we have seen the basics of how foreign function calls are supported in Panama, let's add some additional considerations. First, it is important to note that, albeit the interop code is written in Java, the above code can *not* be considered 100% safe. There are many arbitrary decisions to be made when setting up downcall method handles such as the one above, some of which might be obvious to us (e.g. how many parameters does the function take), but which cannot ultimately be verified by the Panama runtime. After all, a symbol in a dynamic library is, mostly a numeric offset and, unless we are using a shared library with debugging information, no type information is attached to a given library symbol. This means that, in this case, the Panama runtime has to *trust* our description of the `strlen` function. For this reason, access to the foreign linker is a restricted operation, which can only be performed if the requesting module is listed in the `--enable-native-access` command-line flag 2. Certain functions might return pointers, or structs; it is important to realize that if a function returns a pointer (or a `MemoryAddress`), no life-cycle whatsoever is attached to that pointer. It is then up to the client to e.g. free the memory associated with that pointer, or do nothing (in case the library is responsible for the life-cycle of that pointer). If a library returns a struct by value, things are different, as a *fresh*, memory segment is allocated off-heap and returned to the callee. In such cases, the foreign linker API will request an additional prefix `SegmentAllocator` (see above) parameter which will be responsible for allocating the returned segment. The allocation will likely associate the segment with a *resource scope* that is known to the callee and which can then be used to release the memory associated with that segment. An additional overload of `downcallHandle` is also provided by `CLinker` where a client can specify which allocator should be used in such cases at *link-time*. -Performance-wise, the reader might ask how efficient calling a foreign function using a native method handle is; the answer is *very*. The JVM comes with some special support for native method handles, so that, if a give method handle is invoked many times (e.g, inside an *hot* loop), the JIT compiler might decide to just generate a snippet of assembly code required to call the native function, and execute that directly. 
In most cases, invoking native function this way is as efficient as doing so through JNI 3a3b. +Performance-wise, the reader might ask how efficient calling a foreign function using a native method handle is; the answer is *very*. The JVM comes with some special support for native method handles, so that, if a give method handle is invoked many times (e.g, inside an *hot* loop), the JIT compiler might decide to just generate a snippet of assembly code required to call the native function, and execute that directly. In most cases, invoking native function this way is as efficient as doing so through JNI 3. ### Upcalls @@ -239,7 +224,7 @@ This is a function that can be used to sort the contents of an array, using a cu ```java MethodHandle qsort = CLinker.getInstance().downcallHandle( - LibraryLookup.ofDefault().lookup("qsort").get(), + CLinker.systemLookup().lookup("qsort").get(), MethodType.methodType(void.class, MemoryAddress.class, long.class, long.class, MemoryAddress.class), FunctionDescriptor.ofVoid(C_POINTER, C_LONG, C_LONG, C_POINTER) ); @@ -252,8 +237,8 @@ This time, in order to invoke the `qsort` downcall handle, we need a *function p ```java class Qsort { static int qsortCompare(MemoryAddress addr1, MemoryAddress addr2) { - return MemoryAccess.getIntAtOffset(MemorySegment.ofNativeRestricted(), addr1.toRawLongValue()) - - MemoryAccess.getIntAtOffset(MemorySegment.ofNativeRestricted(), addr2.toRawLongValue()); + return MemoryAccess.getIntAtOffset(MemorySegment.globalNativeSegment(), addr1.toRawLongValue()) - + MemoryAccess.getIntAtOffset(MemorySegment.globalNativeSegment(), addr2.toRawLongValue()); } } ``` @@ -268,33 +253,21 @@ MethodHandle comparHandle = MethodHandles.lookup() MethodType.methodType(int.class, MemoryAddress.class, MemoryAddress.class)); ``` -Now that we have a method handle for our Java comparator function, we can create a function pointer, using the foreign linker upcall support — as for downcalls, we have to describe the signature of the foreign function pointer using the layouts in the `CLinker` class: +Now that we have a method handle for our Java comparator function, we finally have all the ingredients to create an upcall segment, and pass it to the `qsort` downcall handle: ```java -MemorySegment comparFunc = CLinker.getInstance().upcallStub( - comparHandle, - FunctionDescriptor.of(C_INT, C_POINTER, C_POINTER) -); -``` - -When no resource scope is specified (as in the above case), the upcall stub segment will be associated with the *default scope* - a non-closeable scope which does not support deterministic deallocation. This means that the upcall stub will be uninstalled when the upcall segment becomes *unreachable*. In cases where this is not desirable, the API also support associating a custom `ResourceScope` instance to the returned upcall segment. 
- -So, we finally have all the ingredients to create an upcall segment, and pass it to the `qsort` downcall handle: - -```java -try (ResourceScope scope = ResourceScope.ofConfined()) { +try (ResourceScope scope = ResourceScope.newConfinedScope()) { MemorySegment comparFunc = CLinker.getInstance().upcallStub( comparHandle, FunctionDescriptor.of(C_INT, C_POINTER, C_POINTER), - scope - ); - MemorySegment array = SegmentAllocator.scoped(scope).allocateArray(new int[] { 0, 9, 3, 4, 6, 5, 1, 8, 2, 7 })); - qsort.invokeExact(array.address(), 10L, 4L, comparFunc.address()); + scope); + MemorySegment array = SegmentAllocator.ofScope(scope).allocateArray(new int[] { 0, 9, 3, 4, 6, 5, 1, 8, 2, 7 })); + qsort.invokeExact(array.address(), 10L, 4L, comparFunc); int[] sorted = array.toIntArray(); // [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ] } ``` -The above code creates a memory segment — `comparFunc` — containing a stub that can be used to invoke our Java comparator function. The memory segment is associated with the provided resource scope instance; this means that the stub will be uninstalled when the resource scope is closed. It is also possible (not shown here) to create upcall stubs associated with the *default scope*, in which case the stub will be uninstalled when the upcall segment becomes *unreachable*. +The above code creates a memory segment — `comparFunc` — containing a stub that can be used to invoke our Java comparator function. The memory segment is associated with the provided resource scope instance; this means that the stub will be uninstalled when the resource scope is closed. It is also possible (not shown here) to create upcall stubs associated with an *implicit scope*, in which case the stub will be uninstalled when the upcall segment becomes *unreachable*. The snippet then creates an off-heap array from a Java array (using a `SegmentAllocator`), which is then passed to the `qsort` handle, along with the comparator function we obtained from the foreign linker. As a side-effect, after the call, the contents of the off-heap array will be sorted (as instructed by our comparator function, written in Java). We can than extract a new Java array from the segment, which contains the sorted elements. This is a more advanced example, but one that shows how powerful the native interop support provided by the foreign linker abstraction is, allowing full bidirectional interop support between Java and native. 
@@ -314,11 +287,11 @@ The foreign function support can support variadic calls, but with a caveat: the printf("%d plus %d equals %d", 2, 2, 4); ``` -To do this using the foreign function support provided by Panama we would have to build a *specialized* downcall handle for that call shape 6: +To do this using the foreign function support provided by Panama we would have to build a *specialized* downcall handle for that call shape 4: ```java MethodHandle printf = CLinker.getInstance().downcallHandle( - LibraryLookup.ofDefault().lookup("printf").get(), + CLinker.systemLookup().lookup("printf").get(), MethodType.methodType(int.class, MemoryAddress.class, int.class, int.class, int.class), FunctionDescriptor.of(C_INT, C_POINTER, C_INT, C_INT, C_INT) ); @@ -347,7 +320,7 @@ It is indeed fairly easy to create a downcall for `vprintf`: ```java MethodHandle vprintf = CLinker.getInstance().downcallHandle( - LibraryLookup.ofDefault().lookup("vprintf").get(), + CLinker.systemLookup().lookup("vprintf").get(), MethodType.methodType(int.class, MemoryAddress.class, VaList.class), FunctionDescriptor.of(C_INT, C_POINTER, C_VA_LIST)); ``` @@ -357,7 +330,7 @@ Here, the notable thing is that `CLinker` comes equipped with the special `C_VA_ To call the `vprintf` handle we need to create an instance of `VaList` which contains the arguments we want to pass to the `vprintf` function — we can do so, as follows: ```java -try (ResourceScope scope = ResourceScope.ofConfined()) { +try (ResourceScope scope = ResourceScope.newConfinedScope()) { vprintf.invoke( CLinker.toCString("%d plus %d equals %d", scope).address(), VaList.make(builder -> @@ -367,7 +340,7 @@ try (ResourceScope scope = ResourceScope.ofConfined()) { ); //prints "2 plus 2 equals 4" ``` -While the callee has to do more work to call the `vprintf` handle, note that that now we're back in a place where the downcall handle `vprintf` can be shared across multiple callees. Note that both the format string and the `VaList` are associated with the given resource scope — this means that both will remain valid throughout the native function call. As for other APIs, it is also possible (not shown here) to create a `VaList` associated with the *default scope* - meaning that the resources allocated by the `VaList` will remain available as long as the `VaList` remains *reachable*. +While the callee has to do more work to call the `vprintf` handle, note that that now we're back in a place where the downcall handle `vprintf` can be shared across multiple callees. Note that both the format string and the `VaList` are associated with the given resource scope — this means that both will remain valid throughout the native function call. As for other APIs, it is also possible (not shown here) to create a `VaList` associated with an *implicit scope* - meaning that the resources allocated by the `VaList` will remain available as long as the `VaList` remains *reachable*. Another advantage of using `VaList` is that this approach also scales to upcall stubs — it is therefore possible for clients to create upcalls stubs which take a `VaList` and then, from the Java upcall, read the arguments packed inside the `VaList` one by one using the methods provided by the `VaList` API (e.g. `VaList::vargAsInt(MemoryLayout)`), which mimic the behavior of the C `va_arg` macro. 
@@ -379,11 +352,12 @@ The full source code containing most of the code shown throughout this document import jdk.incubator.foreign.Addressable; import jdk.incubator.foreign.CLinker; import jdk.incubator.foreign.FunctionDescriptor; -import jdk.incubator.foreign.LibraryLookup; +import jdk.incubator.foreign.SymbolLookup; import jdk.incubator.foreign.MemoryAccess; import jdk.incubator.foreign.MemoryAddress; import jdk.incubator.foreign.MemorySegment; -import jdk.incubator.foreign.NativeScope; +import jdk.incubator.foreign.ResourceScope; +import jdk.incubator.foreign.SegmentAllocator; import java.lang.invoke.MethodHandle; import java.lang.invoke.MethodHandles; @@ -396,6 +370,7 @@ public class Examples { public static void main(String[] args) throws Throwable { strlen(); + strlen_virtual(); qsort(); printf(); vprintf(); @@ -403,12 +378,12 @@ public class Examples { public static void strlen() throws Throwable { MethodHandle strlen = CLinker.getInstance().downcallHandle( - LibraryLookup.ofDefault().lookup("strlen").get(), + CLinker.systemLookup().lookup("strlen").get(), MethodType.methodType(long.class, MemoryAddress.class), FunctionDescriptor.of(C_LONG, C_POINTER) ); - try (ResourceScope scope = ResourceScope.ofConfined()) { + try (ResourceScope scope = ResourceScope.newConfinedScope()) { MemorySegment hello = CLinker.toCString("Hello", scope); long len = (long) strlen.invokeExact(hello.address()); // 5 System.out.println(len); @@ -421,10 +396,10 @@ public class Examples { FunctionDescriptor.of(C_LONG, C_POINTER) ); - try (ResourceScope scope = ResourceScope.ofConfined()) { + try (ResourceScope scope = ResourceScope.newConfinedScope()) { MemorySegment hello = CLinker.toCString("Hello", scope); long len = (long) strlen_virtual.invokeExact( - LibraryLookup.ofDefault().lookup("strlen").get(), + (Addressable)CLinker.systemLookup().lookup("strlen").get(), hello.address()); // 5 System.out.println(len); } @@ -432,15 +407,15 @@ public class Examples { static class Qsort { static int qsortCompare(MemoryAddress addr1, MemoryAddress addr2) { - int v1 = MemoryAccess.getIntAtOffset(MemorySegment.ofNativeRestricted(), addr1.toRawLongValue()); - int v2 = MemoryAccess.getIntAtOffset(MemorySegment.ofNativeRestricted(), addr2.toRawLongValue()); + int v1 = MemoryAccess.getIntAtOffset(MemorySegment.globalNativeSegment(), addr1.toRawLongValue()); + int v2 = MemoryAccess.getIntAtOffset(MemorySegment.globalNativeSegment(), addr2.toRawLongValue()); return v1 - v2; } } public static void qsort() throws Throwable { MethodHandle qsort = CLinker.getInstance().downcallHandle( - LibraryLookup.ofDefault().lookup("qsort").get(), + CLinker.systemLookup().lookup("qsort").get(), MethodType.methodType(void.class, MemoryAddress.class, long.class, long.class, MemoryAddress.class), FunctionDescriptor.ofVoid(C_POINTER, C_LONG, C_LONG, C_POINTER) ); @@ -449,14 +424,14 @@ public class Examples { .findStatic(Qsort.class, "qsortCompare", MethodType.methodType(int.class, MemoryAddress.class, MemoryAddress.class)); - try (ResourceScope scope = ResourceScope.ofConfined()) { - MemorySegment comparFunc = CLinker.getInstance().upcallStub( + try (ResourceScope scope = ResourceScope.newConfinedScope()) { + MemoryAddress comparFunc = CLinker.getInstance().upcallStub( comparHandle, FunctionDescriptor.of(C_INT, C_POINTER, C_POINTER), scope); - - MemorySegment array = SegmentAllocator.scoped(scope) + + MemorySegment array = SegmentAllocator.ofScope(scope) .allocateArray(C_INT, new int[] { 0, 9, 3, 4, 6, 5, 1, 8, 2, 7 }); - 
qsort.invokeExact(array.address(), 10L, 4L, comparFunc.address()); + qsort.invokeExact(array.address(), 10L, 4L, comparFunc); int[] sorted = array.toIntArray(); // [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ] System.out.println(Arrays.toString(sorted)); } @@ -464,11 +439,11 @@ public class Examples { public static void printf() throws Throwable { MethodHandle printf = CLinker.getInstance().downcallHandle( - LibraryLookup.ofDefault().lookup("printf").get(), + CLinker.systemLookup().lookup("printf").get(), MethodType.methodType(int.class, MemoryAddress.class, int.class, int.class, int.class), FunctionDescriptor.of(C_INT, C_POINTER, C_INT, C_INT, C_INT) ); - try (ResourceScope scope = ResourceScope.ofConfined()) { + try (ResourceScope scope = ResourceScope.newConfinedScope()) { MemorySegment s = CLinker.toCString("%d plus %d equals %d\n", scope); printf.invoke(s.address(), 2, 2, 4); } @@ -477,11 +452,11 @@ public class Examples { public static void vprintf() throws Throwable { MethodHandle vprintf = CLinker.getInstance().downcallHandle( - LibraryLookup.ofDefault().lookup("vprintf").get(), + CLinker.systemLookup().lookup("vprintf").get(), MethodType.methodType(int.class, MemoryAddress.class, CLinker.VaList.class), FunctionDescriptor.of(C_INT, C_POINTER, C_VA_LIST)); - try (ResourceScope scope = ResourceScope.ofConfined()) { + try (ResourceScope scope = ResourceScope.newConfinedScope()) { MemorySegment s = CLinker.toCString("%d plus %d equals %d\n", scope); CLinker.VaList vlist = CLinker.VaList.make(builder -> builder.vargFromInt(C_INT, 2) @@ -495,8 +470,8 @@ public class Examples { -* (1): In reality this is not entirely new; even in JNI, when you call a `native` method the VM trusts that the corresponding implementing function in C will feature compatible parameter types and return values; if not a crash might occur. -* (2): As an advanced option, Panama allows the user to opt-in to remove Java to native thread transitions; while, in the general case it is unsafe doing so (removing thread transitions could have a negative impact on GC for long running native functions, and could crash the VM if the downcall needs to pop back out in Java, e.g. via an upcall), greater efficiency can be achieved; performance sensitive users should consider this option at least for the functions that are called more frequently, assuming that these functions are *leaf* functions (e.g. do not go back to Java via an upcall) and are relatively short-lived. -* (3): On Windows, layouts for variadic arguments have to be adjusted using the `CLinker.Win64.asVarArg(ValueLayout)`; this is necessary because the Windows ABI passes variadic arguments using different rules than the ones used for ordinary arguments. - +* (1): In the future, we might add more ways to obtain a symbol lookup - for instance: ``` SymbolLookup.ofLibrary(String libName, ResourceScope scope) ``` . This would allow developers to load a library and associate its lifecycle with a `ResourceScope` (rather than a classloader). That is, when the scope is closed, the library will be unloaded. However, adding these new mode will require some additional foundational work on the `CLinker` support - as we need to make sure that the memory address used by a downcall method handle cannot be unloaded while the downcall method handle is being invoked. 
+* (2): In reality this is not entirely new; even in JNI, when you call a `native` method the VM trusts that the corresponding implementing function in C will feature compatible parameter types and return values; if not a crash might occur. +* (3): As an advanced option, Panama allows the user to opt-in to remove Java to native thread transitions; while, in the general case it is unsafe doing so (removing thread transitions could have a negative impact on GC for long running native functions, and could crash the VM if the downcall needs to pop back out in Java, e.g. via an upcall), greater efficiency can be achieved; performance sensitive users should consider this option at least for the functions that are called more frequently, assuming that these functions are *leaf* functions (e.g. do not go back to Java via an upcall) and are relatively short-lived. +* (4): On Windows, layouts for variadic arguments have to be adjusted using the `CLinker.Win64.asVarArg(ValueLayout)`; this is necessary because the Windows ABI passes variadic arguments using different rules than the ones used for ordinary arguments. diff --git a/doc/panama_memaccess.html b/doc/panama_memaccess.html deleted file mode 100644 index 384f9792e88..00000000000 --- a/doc/panama_memaccess.html +++ /dev/null @@ -1,630 +0,0 @@ - - - - -foreign-memaccess_v2 - - -

State of foreign memory support

March 2021

  • Reorganize the document, starting from simpler use cases to more advanced ones
  • Reorganized section on shared segments and confinement into a brand new section on deterministic deallocation

Maurizio Cimadamore

A crucial part of any native interop story lies in the ability to access off-heap memory in an efficient fashion. Panama achieves this goal through the so called Foreign Memory Access API. This API has been made available as an incubating API in Java 14, 15 and 16 and is, to date, the most mature part of the Panama interop story.

Segments

Memory segments are abstractions which can be used to model contiguous memory regions, located either on or off the Java heap. Segments can be allocated from native memory (e.g. like a malloc), or can be wrapped around existing memory sources (e.g. a Java array or a ByteBuffer). Memory segments provide strong spatial, temporal and thread-confinement guarantees which make memory dereference operations safe (more on that later), although in most simple cases some of the properties of memory segments can safely be ignored.

For instance, the following snippet allocates 100 bytes off-heap:
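
A minimal sketch of such an allocation (using the incubating jdk.incubator.foreign API of this period) might look as follows:

```java
MemorySegment segment = MemorySegment.allocateNative(100); // allocate 100 bytes off-heap
```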

The above code allocates a 100-bytes long memory segment. The segment memory will not be freed as long as the segment instance is deemed reachable. In other words, the above factory creates a segment whose behavior closely matches that of a ByteBuffer allocated with the allocateDirect factory. Of course, the memory access API also supports deterministic memory release; we will cover that in a later section of this document.

Memory segments support slicing — that is, given a segment, it is possible to create a new segment whose spatial bounds are stricter than that of the original segment:
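
For example (a sketch of the slicing operation described above):

```java
MemorySegment segment = MemorySegment.allocateNative(10);
MemorySegment slice = segment.asSlice(4, 4); // covers bytes 4..7 of the original segment
```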

The above code creates a slice that starts at offset 4 and has a length of 4 bytes. Generally speaking, slices have the same temporal bounds as the parent segment (we will refine this concept later in this document). In this example, the memory associated with the parent segment will not be released as long as there is at least one reachable slice derived from that segment.

The Foreign Memory Access API provides ready-made static accessors in the MemoryAccess class, which allow a segment to be dereferenced in various ways. The following example reads pairs of 32-bit values (as Java ints) and uses them to construct an array of points:
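
A sketch of such an example might look as follows (the Point record and the 80-byte sizing follow the explanation below):

```java
record Point(int x, int y) { }

MemorySegment segment = MemorySegment.allocateNative(10 * 4 * 2); // 10 points, two 4-byte ints each
Point[] values = new Point[10];
for (int i = 0 ; i < values.length ; i++) {
    int x = MemoryAccess.getIntAtIndex(segment, i * 2);       // int element at logical index i*2
    int y = MemoryAccess.getIntAtIndex(segment, (i * 2) + 1); // int element at logical index i*2 + 1
    values[i] = new Point(x, y);
}
```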

The above snippet allocates a flat array of 80 bytes using MemorySegment::allocateNative. Then, inside the loop, elements in the array are accessed using the MemoryAccess.getIntAtIndex method, which accesses int elements in a segment at a certain logical index (in other words, the segment offset being accessed is obtained by multiplying the index by 4, which is the stride of a Java int array). Thus, all coordinates x and y are collected into instances of a Point record.

Memory segments are pretty flexible when it comes to interacting with existing memory sources and APIs. For instance it is possible to create a ByteBuffer view out of an existing memory segment, as follows:
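
A sketch, assuming the MemorySegment::asByteBuffer view method:

```java
MemorySegment segment = MemorySegment.allocateNative(100);
ByteBuffer bb = segment.asByteBuffer(); // a buffer view backed by the same off-heap memory
```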

Creating buffer views out of existing segments is a crucial tool enabling interoperability with existing APIs (especially those dealing with I/O) which might be expressed in terms of the ByteBuffer API.

Layouts and structured access

Expressing byte offsets (as in the example above) can lead to code that is hard to read, and very fragile — as memory layout invariants are captured, implicitly, in the constants used to scale offsets. To address this issue, we add a memory layout API which allows clients to define memory layouts programmatically. For instance, the layout of the array used in the above example can be expressed using the following code 1:
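
A sketch of such a layout definition (factory names follow the incubating API of this period):

```java
MemoryLayout points = MemoryLayout.ofSequence(10,
    MemoryLayout.ofStruct(
        MemoryLayouts.JAVA_INT.withName("x"),
        MemoryLayouts.JAVA_INT.withName("y")
    )
);
```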

That is, our layout is a repetition of 10 struct elements, each struct element containing two 32-bit values. The advantage of defining a memory layout upfront, using an API, is that we can then query the layout — for instance we can compute the offset of the y coordinate in the 4th element of the points array:
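
For instance, a sketch of such a query, assuming the MemoryLayout::byteOffset(PathElement...) method and the element at index 4:

```java
long yOffset = points.byteOffset(PathElement.sequenceElement(4),
                                 PathElement.groupElement("y")); // 4 * 8 + 4 = 36 bytes
```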

To specify which nested layout element should be used for the offset calculation we use a so called layout path - that is, a selection expression that navigates the layout, from the root layout, down to the leaf layout we wish to select; in this case we need to select the 4th layout element in the sequence, and then select the layout named y inside the selected group layout.

One of the things that can be derived from a layout is a so called memory access var handle. A memory access var handle is a special kind of var handle which takes a memory segment access coordinate, together with a byte offset — the offset, relative to the segment's base address at which the dereference operation should occur. With memory access var handles we can rewrite our example above as follows:
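
A sketch of the rewritten example (the exact varHandle factory overload varied across incubator iterations):

```java
MemorySegment segment = MemorySegment.allocateNative(points);
VarHandle xHandle = points.varHandle(PathElement.sequenceElement(), PathElement.groupElement("x"));
VarHandle yHandle = points.varHandle(PathElement.sequenceElement(), PathElement.groupElement("y"));
Point[] values = new Point[10];
for (int i = 0 ; i < values.length ; i++) {
    int x = (int) xHandle.get(segment, (long) i); // i is cast to long: the free sequence dimension
    int y = (int) yHandle.get(segment, (long) i);
    values[i] = new Point(x, y);
}
```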

In the above, xHandle and yHandle are two var handle instances whose type is int and which take two access coordinates:

  1. a MemorySegment instance; the segment whose memory should be dereferenced
  2. a logical index, which is used to select the element of the sequence we want to access (as the layout path used to construct these var handles contains one free dimension)

Note that memory access var handles (as any other var handle) are strongly typed; and to get maximum efficiency, it is generally necessary to introduce casts to make sure that the access coordinates match the expected types — in this case we have to cast i into a long; similarly, since the signature polymorphic method VarHandle::get notionally returns Object a cast is necessary to force the right return type for the var handle operation 2.

In other words, manual offset computation is no longer needed — offsets and strides can in fact be derived from the layout object; note how yHandle is able to compute the required offset of the y coordinate in the flat array without the need of any error-prone arithmetic computation.

Deterministic deallocation

In addition to spatial bounds, memory segments also feature temporal bounds as well as thread-confinement. In the examples shown so far, we have always used the API in its simpler form, leaving the runtime to handle details such as whether it was safe or not to reclaim memory associated with a given memory segment. But there are cases where this behavior is not desirable: consider the case where a large memory segment is mapped from a file (this is possible using MemorySegment::map); in this case, an application would probably prefer to deterministically release (e.g. unmap) the memory associated with this segment, to ensure that memory doesn't remain available for longer than it needs to (which could otherwise impact the performance of the application).

Memory segments support deterministic deallocation, through an abstraction called ResourceScope. A resource scope models the lifecycle associated with one or more resources (in this document, by resources we mean mostly memory segments); a resource scope has a state: it starts off in the alive state, which means that all the resources it manages can be safely accessed - and, at the user's request, it can be closed. After a resource scope is closed, access to resources managed by that scope is no longer allowed. Resource scopes support the AutoCloseable interface, which means that users can use resource scopes with the try-with-resources construct, as demonstrated in the following code:
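
A sketch of such a try-with-resources pattern (the file name and sizes are placeholders):

```java
try (ResourceScope scope = ResourceScope.ofConfined()) {
    MemorySegment mapped = MemorySegment.map(Path.of("someFile"), 0, 100000,
                                             MapMode.READ_WRITE, scope);
    // ... use the mapped segment ...
} // segment is unmapped here
```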

Here, we create a new confined resource scope, which is then used when creating a mapped segment; this means that the lifecycle of the mapped segment will be tied to that of the resource scope, and that accessing the segment (e.g. dereference) after scope has been closed will not be possible.

As this example alludes to, resource scopes can come in many flavors: they can be confined (where access is restricted to the thread which created the scope), shared 3 (where access can occur in any thread) and can be optionally associated with a Cleaner object, which would take care of performing implicit deallocation, in case the resource scope becomes unreachable and the close method has not been called by the user. In fact, all the memory segments we have seen previously were associated with the so called default scope: a shared scope which does not support deterministic deallocation (e.g. calling close will fail), and whose resources are managed by a Cleaner.

Resource scopes are very handy when managing the lifecycle of multiple resources:
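
For example (a sketch):

```java
try (ResourceScope scope = ResourceScope.ofConfined()) {
    MemorySegment segment1 = MemorySegment.allocateNative(100, scope);
    MemorySegment segment2 = MemorySegment.allocateNative(100, scope);
    // ... more segments, all associated with the same scope ...
} // memory for all the segments above is reclaimed here
```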

Here we create another confined scope, and then, inside the try-with-resources we use the scope to create many segments; all such segments share the same resource scope - meaning that when such scope is closed, the memory associated with all these segments will be reclaimed at once.

Dealing with shared access and deterministic deallocation at the same time is tricky, and poses new problems for the user code; consider the case where a method receives a segment and has to write two values in that segment (e.g. two point coordinates):
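
A sketch of such a method, here called writePoint:

```java
void writePoint(MemorySegment segment, int x, int y) {
    MemoryAccess.setIntAtIndex(segment, 0, x); // write the first coordinate
    MemoryAccess.setIntAtIndex(segment, 1, y); // write the second coordinate
}
```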

If the segment is associated with a confined scope, no problem arises: the thread that created the segment is the same thread that performs the dereference - as such, when writePoint is called, the segment's scope is either alive (and will remain so for the duration of the call), or already closed (in which case some exception will be thrown, and no value will be written).

But, if the segment is associated with a shared scope, there is a new problem we are faced with: the segment might be closed (concurrently) in between the two accesses! This means that the method might end up writing only one value instead of two; in other words, the behavior of the method is no longer atomic. Note that this cannot happen in the case where the scope is shared but associated with the default scope - as that scope does not support explicit deallocation.

To avoid this problem, clients can acquire a so called resource scope handle. A resource scope handle effectively prevents a scope from being closed, until said handle is released by the application. Let's illustrate how that works in practice:
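
A sketch of such an acquire/release pattern:

```java
void writePointSafe(MemorySegment segment, int x, int y) {
    try (var handle = segment.scope().acquire()) {
        MemoryAccess.setIntAtIndex(segment, 0, x);
        MemoryAccess.setIntAtIndex(segment, 1, y);
    } // handle released here; the scope can now be closed again
}
```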

A resource scope handle acts as a more restricted version 4 of an atomic reference count; each time a scope is acquired, its acquired count goes up; conversely the count goes down each time a handle associated with that scope is released. A scope can only be closed if its acquired count is exactly zero - meaning that no other client is attempting to access that (shared) segment. In our example above, the semantics of resource scope handles guarantee that the method will be able to either acquire the handle successfully, and write both values, or fail to acquire the handle, and write no value.

Parallel processing

The contents of a memory segment can be processed in parallel (e.g. using a framework such as Fork/Join) — by obtaining a Spliterator instance out of a memory segment. For instance, to sum all the 32-bit values of a memory segment in parallel, we can use the following code:
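
A sketch of such a parallel sum (the layout definitions and the shared scope are illustrative assumptions):

```java
SequenceLayout seq = MemoryLayout.ofSequence(1_000_000, MemoryLayouts.JAVA_INT);
SequenceLayout seq_bulk = MemoryLayout.ofSequence(10_000,
        MemoryLayout.ofSequence(100, MemoryLayouts.JAVA_INT)); // same overall size, grouped by 100
MemorySegment segment = MemorySegment.allocateNative(seq, ResourceScope.ofShared());

int sum = StreamSupport.stream(MemorySegment.spliterator(segment, seq_bulk), true)
        .mapToInt(slice -> {
            int res = 0;
            for (int i = 0; i < 100 ; i++) {
                res += MemoryAccess.getIntAtIndex(slice, i);
            }
            return res;
        }).sum();
```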

The MemorySegment::spliterator method takes a segment and a sequence layout, and returns a spliterator instance which splits the segment into chunks corresponding to the elements in the provided sequence layout. Here, we want to sum elements in an array which contains a million elements; now, doing a parallel sum where each computation processes exactly one element would be inefficient, so instead we use the layout API to derive a bulk sequence layout. The bulk layout is a sequence layout which has the same size as the original layout, but where the elements are arranged into groups of 100 elements — which should make it more amenable to parallel processing.

Once we have the spliterator, we can use it to construct a parallel stream and sum the contents of the segment in parallel. Since the segment operated upon by the spliterator is shared, the segment can be accessed from multiple threads concurrently; the spliterator API ensures that the access occurs in a regular fashion: a slice is created from the original segment, and given to a thread to perform some computation — thus ensuring that no two threads can ever operate concurrently on the same memory region.

Combining memory access handles

We have seen in the previous sections how memory access var handles dramatically simplify user code when structured access is involved. While deriving memory access var handles from layouts is the most convenient option, the Foreign Memory Access API also allows clients to create such memory access var handles in a standalone fashion, as demonstrated in the following code:
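
A sketch, assuming the MemoryHandles::varHandle factory:

```java
VarHandle intHandle = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder()); // (MemorySegment, long offset) -> int
```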

The above code creates a memory access var handle which reads/writes int values at a certain byte offset in a segment. To create this var handle we have to specify a carrier type — the type we want to use e.g. to extract values from memory, as well as whether any byte swapping should be applied when contents are read from or stored to memory. Additionally, the user can supply an extra alignment parameter (not shown here) — this can be useful to impose additional constraints on how memory dereferences should occur; for instance, a client might want to prevent access to misaligned 32-bit values. Of course, when deriving memory access var handles from layouts, all the above information can more simply be inferred from the selected layout.

The attentive reader might have noted how rich the var handles returned by the layout API are, compared to the simple memory access var handle we have constructed above. How do we go from a simple access var handle that takes a byte offset to a var handle that can dereference a complex layout path? The answer is, by using var handle combinators. Developers familiar with the method handle API know how simpler method handles can be combined into more complex ones using the various combinator methods in the MethodHandles API. These methods allow clients, for instance, to insert (or bind) arguments into a target method handle, filter return values, permute arguments, and much more.

Sadly, none of these features are available when working with var handles. The Foreign Memory Access API rectifies this, by adding a rich set of var handle combinators in the MemoryHandles class; with these tools, developers can express var handle transformations such as:

  • map a var handle carrier type into a different one, using an embedding/projection method handle pair
  • filter one or more var handle access coordinates using unary filters
  • permute var handle access coordinates
  • bind concrete access coordinates to an existing var handle

Without diving too deep, let's consider how we might want to take a basic memory access handle and turn it into a var handle which dereferences a segment at a specific offset (again using the points layout defined previously):
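
A sketch, assuming the MemoryHandles::insertCoordinates combinator and the offset query shown earlier:

```java
// basic handle: takes a segment and a byte offset
VarHandle intHandle = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());
// fixed offset of the y coordinate of the element at index 4
long yOffset = points.byteOffset(PathElement.sequenceElement(4), PathElement.groupElement("y"));
// bind the offset coordinate (position 1), obtaining a handle that only takes a segment
VarHandle yAt4Handle = MemoryHandles.insertCoordinates(intHandle, 1, yOffset);
int y = (int) yAt4Handle.get(segment); // reads the y coordinate at the fixed offset
```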

We have been able to derive, from a basic memory access var handle, a new var handle that dereferences a segment at a given fixed offset. It is easy to see how other, richer, var handles obtained using the layout API can be constructed manually using the var handle combinator API.

Unsafe segments

The memory access API provides basic safety guarantees for all memory segments created using the API. More specifically, dereferencing memory should either succeed, or result in a runtime exception - but, crucially, should never result in a VM crash, or, more subtly, in memory corruption occurring outside the region of memory associated with a memory segment. This is possible, since all segments have immutable spatial bounds, and, as we have seen, are associated with a resource scope which make sure that the segment cannot be dereferenced after the scope has been closed, or, in case of a confined scope, that the segment is dereferenced from the very same thread which created the scope.

That said, it is sometimes necessary to create a segment out of an existing memory source, which might be managed by native code. This is the case, for instance, if we want to create a segment out of memory managed by a custom allocator.

The ByteBuffer API allows such a move, through a JNI method, namely NewDirectByteBuffer. This native method can be used to wrap a long address in a fresh byte buffer instance which is then returned to unsuspecting Java code.

Memory segments provide a similar capability - that is, given an address (which might have been obtained through some native calls), it is possible to wrap a segment around it, with given spatial bounds and resource scope; a cleanup action to be executed when the segment is closed might also be specified.

For instance, assuming we have an address pointing at some externally managed memory block, we can construct an unsafe segment, as follows:
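
A sketch (someLongAddr stands for a raw address obtained elsewhere, e.g. from a native call):

```java
try (ResourceScope scope = ResourceScope.ofShared()) {
    MemoryAddress addr = MemoryAddress.ofLong(someLongAddr);
    var unsafeSegment = addr.asSegmentRestricted(10, scope); // 10 bytes, tied to the shared scope
    // ... dereference unsafeSegment while the scope is alive ...
}
```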

The above code creates a shared scope and then, inside the try-with-resources it creates a new unsafe segment from a given address; the size of the segment is 10 bytes, and the unsafe segment is associated with the current shared scope. This means that the unsafe segment cannot be dereferenced after the shared scope has been closed.

Of course, segments created this way are completely unsafe. There is no way for the runtime to verify that the provided address indeed points to a valid memory location, or that the size of the memory region pointed to by addr is indeed 10 bytes. Similarly, there are no guarantees that the underlying memory region associated with addr will not be deallocated prior to the call to ResourceScope::close.

For these reasons, creating unsafe segments is a restricted operation in the Foreign Memory Access API. Restricted operations can only be performed if the running application has set a read-only runtime property — foreign.restricted=permit. Any attempt to call restricted operations without said runtime property will fail with a runtime exception.

We plan, in the future, to make access to restricted operations more integrated with the module system; that is, certain modules might require restricted native access; when an application which depends on said modules is executed, the user might need to provide permissions to said modules to perform restricted native operations, or the runtime will refuse to build the application's module graph.

 


1 In general, deriving a complete layout from a C struct declaration is no trivial matter, and it's one of those areas where tooling can help greatly.

2 Clients can enforce stricter type checking when interacting with VarHandle instances, by obtaining an exact var handle, using the VarHandle::withInvokeExactBehavior method.

3 Shared segments rely on VM thread-local handshakes (JEP 312) to implement lock-free, safe, shared memory access; that is, when it comes to memory access, there should be no difference in performance between a shared segment and a confined segment. On the other hand, MemorySegment::close might be slower on shared segments than on confined ones.

4 The main difference between reference counting and the mechanism proposed here is that reference counting is symmetric - meaning that any client is able to both increment and decrement the reference count at will. The resource scope handle mechanism is asymmetric, since only the client acquiring a handle has the capability to release that handle. This avoids situations where a client might be tempted to e.g. decrement the reference count multiple times in order to perform some task which would otherwise be forbidden.
- - \ No newline at end of file diff --git a/doc/panama_memaccess.md b/doc/panama_memaccess.md index 0a53ddd2b7a..bff108ea685 100644 --- a/doc/panama_memaccess.md +++ b/doc/panama_memaccess.md @@ -1,9 +1,6 @@ ## State of foreign memory support -**March 2021** - -* Reorganize the document, starting from simpler use cases to more advanced ones -* Reorganized section on shared segments and confinement into a brand new section on deterministic deallocation +**May 2021** **Maurizio Cimadamore** @@ -16,15 +13,15 @@ Memory segments are abstractions which can be used to model contiguous memory re For instance, the following snippet allocates 100 bytes off-heap: ```java -MemorySegment segment = MemorySegment.allocateNative(100); +MemorySegment segment = MemorySegment.allocateNative(100, ResourceScope.newImplicitScope()); ``` -The above code allocates a 100-bytes long memory segment. The segment memory will not be *freed* as long as the segment instance is deemed *reachable*. In other words, the above factory creates a segment whose behavior closely matches that of a `ByteBuffer` allocated with the `allocateDirect` factory. Of course, the memory access API also supports deterministic memory release; we will cover that in a later section of this document. +The above code allocates a 100-bytes long memory segment. The lifecycle of a memory segment is controlled by an abstraction called `ResourceScope`. In this example, the segment memory will not be *freed* as long as the segment instance is deemed *reachable*, as specified by the `newImplicitScope()` parameter. In other words, the above factory creates a segment whose behavior closely matches that of a `ByteBuffer` allocated with the `allocateDirect` factory. Of course, the memory access API also supports deterministic memory release; we will cover that in a later section of this document. Memory segments support *slicing* — that is, given a segment, it is possible to create a new segment whose spatial bounds are stricter than that of the original segment: ```java -MemorySegment segment = MemorySement.allocateNative(10); +MemorySegment segment = MemorySement.allocateNative(10, ResourceScope.newImplicitScope()); MemorySegment slice = segment.asSlice(4, 4); ``` @@ -34,7 +31,7 @@ The Foreign Memory Access API provides ready-made static accessors in the `Memor ```java record Point(int x, int y); -MemorySegment segment = MemorySement.allocateNative(10 * 4 * 2); +MemorySegment segment = MemorySement.allocateNative(10 * 4 * 2, ResourceScope.newImplicitScope()); Point[] values = new Point[10]; for (int i = 0 ; i < values.length ; i++) { int x = MemoryAccess.getIntAtIndex(segment, i * 2); @@ -64,8 +61,8 @@ Creating buffer views out of existing segment is a crucial tool enabling interop Expressing byte offsets (as in the example above) can lead to code that is hard to read, and very fragile — as memory layout invariants are captured, implicitly, in the constants used to scale offsets. To address this issue, we add a *memory layout* API which allows clients to define memory layouts *programmatically*. 
For instance, the layout of the array used in the above example can be expressed using the following code 1: ```java -MemoryLayout points = MemoryLayout.ofSequence(10, - MemoryLayout.ofStruct( +MemoryLayout points = MemoryLayout.sequenceLayout(10, + MemoryLayout.structLayout( MemoryLayouts.JAVA_INT.withName("x"), MemoryLayouts.JAVA_INT.withName("y") ) @@ -83,7 +80,7 @@ To specify which nested layout element should be used for the offset calculation One of the things that can be derived from a layout is a so called *memory access var handle*. A memory access var handle is a special kind of var handle which takes a memory segment access coordinate, together with a byte offset — the offset, relative to the segment's base address at which the dereference operation should occur. With memory access var handles we can rewrite our example above as follows: ```java -MemorySegment segment = MemorySegment.allocateNative(points); +MemorySegment segment = MemorySegment.allocateNative(points, ResourceScope.newImplicitScope()); VarHandle xHandle = points.varHandle(PathElement.sequenceElement(), PathElement.groupElement("x")); VarHandle yHandle = points.varHandle(PathElement.sequenceElement(), PathElement.groupElement("y")); Point[] values = new Point[10]; @@ -109,19 +106,21 @@ In addition to spatial bounds, memory segments also feature temporal bounds as w Memory segments support deterministic deallocation, through an abstraction called `ResourceScope`. A resource scope models the lifecycle associated with one or more resources (in this document, by resources we mean mostly memory segments); a resource scope has a state: it starts off in the *alive* state, which means that all the resources it manages can be safely accessed - and, at the user request, it can be *closed*. After a resource scope is closed, access to resources managed by that scope is no longer allowed. Resource scope support the `AutoCloseable` interface, which means that user can use resource scopes with the *try-with-resources* construct, as demonstrated in the following code: ```java -try (ResourceScope scope = ResourceScope.ofConfined()) { +try (ResourceScope scope = ResourceScope.newConfinedScope()) { MemorySegment mapped = MemorySegment.map(Path.of("someFile"), 0, 100000, MapMode.READ_WRITE, scope); } // segment is unmapped here ``` Here, we create a new *confined* resource scope, which is then used when creating a mapped segment; this means that the lifecycle of the `mapped` segment will be tied to that of the resource scope, and that accessing the segment (e.g. dereference) *after* `scope` has been closed will not be possible. -As this example alludes to, resource scopes can come in many flavors: they can be *confined* (where access is restricted to the thread which created the scope), *shared* 3 (where access can occur in any thread) and can be optionally associated with a `Cleaner` object, which would take care of performing implicit deallocation, in case the resource scope becomes *unreachable* and the `close` method has not been called by the user. In fact, all the memory segments we have seen previously were associated with the so called *default* scope: a shared scope which does not support deterministic deallocation (e.g. calling `close` will fail), and whose resources are managed by a `Cleaner`. 
+As this example alludes to, resource scopes can come in many flavors: they can be *confined* (where access is restricted to the thread which created the scope), *shared* 3 (where access can occur in any thread) and can be optionally associated with a `Cleaner` object, which would take care of performing implicit deallocation, in case the resource scope becomes *unreachable* and the `close` method has not been called by the user. + +Some resource scopes do *not* support deterministic deallocation. Such scopes are called *implicit* scopes. Calling `close` on an implicit scope will fail; instead, resources associated with implicit scopes are *always* managed by a `Cleaner`. A new implicit scope can be obtained using the `ResourceScope::newImplicitScope` factory (which has been used in many examples throughout this document). Resource scopes are very handy when managing the lifecycle of multiple resources: ```java -try (ResourceScope scope = ResourceScope.ofConfined()) { +try (ResourceScope scope = ResourceScope.newConfinedScope()) { MemorySegment segment1 = MemorySegment.allocateNative(100, scope); MemorySegment segment2 = MemorySegment.allocateNative(100, scope); ... @@ -142,16 +141,19 @@ void writePoint(MemorySegment segment, int x, int y) { If the segment is associated with a confined scope, no problem arises: the thread that created the segment is the same thread that performs the dereference - as such, when `writePoint` is called, the segment's scope is either alive (and will remain so for the duration of the call), or already closed (in which case some exception will be thrown, and no value will be written). -But, if the segment is associated with a shared scope, there is a new problem we are faced with: the segment might be closed (concurrently) in between the two accesses! This means that, the method ends up writing only one value instead of two; in other words, the behavior of the method is no longer atomic. Note that this cannot happen in the case where the scope is shared but associated with the *default scope* - as that scope does not support explicit deallocation. +But, if the segment is associated with a shared scope, there is a new problem we are faced with: the segment might be closed (concurrently) in between the two accesses! This means that, the method ends up writing only one value instead of two; in other words, the behavior of the method is no longer atomic. Note that this cannot happen in the case where the scope is shared but associated with an *implicit scope* - as implicit scopes do not support explicit deallocation. -To avoid this problem, clients can acquire a so called resource scope *handle*. A resource scope handle effectively prevents a scope to be closed, until said handle is released by the application. Let's illustrate how that works in practice: +To avoid this problem, clients can acquire a so called resource scope *handle*. A resource scope handle effectively prevents a scope from being closed, until said handle is released by the application. 
Let's illustrate how that works in practice: ```java void writePointSafe(MemorySegment segment, int x, int y) { - try (var handle = segment.scope().acquire()) { + var handle = segment.scope().acquire(); + try { MemoryAccess.setIntAtIndex(segment, 0, x); MemoryAccess.setIntAtIndex(segment, 1, y); - } // handle released here + } finally { + segment.scope().release(handle); + } } ``` @@ -162,11 +164,12 @@ A resource scope handle acts as a more restricted version 4 { int res = 0; for (int i = 0; i < 100 ; i++) { @@ -174,15 +177,16 @@ int sum = StreamSupport.stream(MemorySegment.spliterator(segment, seq_bulk), tru } return res; }).sum(); +} ``` -The `MemorySegment::spliterator` takes a segment, a *sequence* layout and returns a spliterator instance which splits the segment into chunks which corresponds to the elements in the provided sequence layout. Here, we want to sum elements in an array which contains a million of elements; now, doing a parallel sum where each computation processes *exactly* one element would be inefficient, so instead we use the layout API to derive a *bulk* sequence layout. The bulk layout is a sequence layout which has the same size of the original layouts, but where the elements are arranged into groups of 100 elements — which should make it more amenable to parallel processing. +The `MemorySegment::elements` method takes an element layout and returns a new stream. The stream is built on top of a spliterator instance (see `MemorySegment::spliterator`) which splits the segment into chunks which corresponds to the elements in the provided layout. Here, we want to sum elements in an array which contains a million of elements; now, doing a parallel sum where each computation processes *exactly* one element would be inefficient, so instead we use a *bulk* element layout. The bulk element layout is a sequence layout containing a group of 100 elements — which should make it more amenable to parallel processing. -Once we have the spliterator, we can use it to construct a parallel stream and sum the contents of the segment in parallel. Since the segment operated upon by the spliterator is shared, the segment can be accessed from multiple threads concurrently; the spliterator API ensures that the access occurs in a regular fashion: a slice is created from the original segment, and given to a thread to perform some computation — thus ensuring that no two threads can ever operate concurrently on the same memory region. +Since the segment operated upon by the spliterator is associated with a shared scope, the segment can be accessed from multiple threads concurrently; the spliterator API ensures that the access occurs in a disjoint fashion: a slice is created from the original segment, and given to a thread to perform some computation — thus ensuring that no two threads can ever operate concurrently on the same memory region. ### Combining memory access handles -We have seen in the previous sections how memory access var handle dramatically simplify user code when structured access is involved. While deriving memory access var handles from layout is the most convenient option, the Foreign Memory Access API also allows to create such memory access var handles in a standalone fashion, as demonstrated in the following code: +We have seen in the previous sections how memory access var handles dramatically simplify user code when structured access is involved. 
While deriving memory access var handles from layout is the most convenient option, the Foreign Memory Access API also allows to create such memory access var handles in a standalone fashion, as demonstrated in the following code: ```java VarHandle intHandle = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder()) @@ -222,9 +226,9 @@ Memory segments provide a similar capability - that is, given an address (which For instance, assuming we have an address pointing at some externally managed memory block, we can construct an *unsafe* segment, as follows: ```java -try (ResourceScope scope = ResourceScope.ofShared()) { +try (ResourceScope scope = ResourceScope.newSharedScope()) { MemoryAddress addr = MemoryAddress.ofLong(someLongAddr); - var unsafeSegment = addr.asSegmentRestricted(10, scope); + var unsafeSegment = addr.asSegment(10, scope); ... } ``` @@ -233,11 +237,10 @@ The above code creates a shared scope and then, inside the *try-with-resources* Of course, segments created this way are completely *unsafe*. There is no way for the runtime to verify that the provided address indeed points to a valid memory location, or that the size of the memory region pointed to by `addr` is indeed 10 bytes. Similarly, there are no guarantees that the underlying memory region associated with `addr` will not be deallocated *prior* to the call to `ResourceScope::close`. -For these reasons, creating unsafe segments is a *restricted* operation in the Foreign Memory Access API. Restricted operations can only be performed if the running application has set a read-only runtime property — `foreign.restricted=permit`. Any attempt to call restricted operations without said runtime property will fail with a runtime exception. - -We plan, in the future, to make access to restricted operations more integrated with the module system; that is, certain modules might *require* restricted native access; when an application which depends on said modules is executed, the user might need to provide *permissions* to said modules to perform restricted native operations, or the runtime will refuse to build the application's module graph. +For these reasons, creating unsafe segments is a *restricted* operation in the Foreign Memory Access API. Restricted operations can only be performed from selected modules. To grant a given module `M` the permission to execute restricted methods, the option `--enable-native-access=M` must be specified on the command line. Multiple module names can be specified in a comma-separated list, where the special name `ALL-UNNAMED` is used to enable restricted access for all code on the class path. Any attempt to call restricted operations from a module not listed in the above flag will fail with a runtime exception. * (1): In general, deriving a complete layout from a C `struct` declaration is no trivial matter, and it's one of those areas where tooling can help greatly. * (2): Clients can enforce stricter type checking when interacting with `VarHandle` instances, by obtaining an *exact* var handle, using the `VarHandle::withInvokeExactBehavior` method. * (3): Shared segments rely on VM thread-local handshakes (JEP [312](https://openjdk.java.net/jeps/312)) to implement lock-free, safe, shared memory access; that is, when it comes to memory access, there should no difference in performance between a shared segment and a confined segment. On the other hand, `MemorySegment::close` might be slower on shared segments than on confined ones. 
* (4): The main difference between reference counting and the mechanism proposed here is that reference counting is *symmetric* - meaning that any client is able to both increment and decrement the reference count at will. The resource scope handle mechanism is *asymmetric*, since only the client acquiring a handle has the capability to release that handle. This avoids situation where a client might be tempted to e.g. decrement the reference count multiple times in order to perform some task which would otherwise be forbidden. +