JCuda

Code samples



This page contains code samples demonstrating different features of JCuda and the JCuda runtime libraries.

NOTE: Some of these examples still use CUBIN files. As explained in the Tutorial, using PTX files may be more flexible. Additionally, most of the driver examples still use the kernel invocation mechanisms of CUDA 3.2, which were deprecated in CUDA 4.0. The samples still work, and their main parts are unaffected by these changes. However, the respective samples will be updated as soon as possible.


JCublasSample.java A JCublas sample, which performs a 'sgemm' operation, once in plain Java and once in JCublas, and verifies the result.
JCublas2Sample.java A JCublas2 sample, which performs a 'sgemm' operation, once in plain Java and once in JCublas2, and verifies the result.
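
For orientation, the plain-Java half of such a comparison might look like the following sketch. It assumes square matrices stored in column-major order (the layout CUBLAS uses) and computes C = alpha * A * B + beta * C. The class and method names are illustrative and are not taken from the samples.

```java
// Hypothetical plain-Java reference for C = alpha * A * B + beta * C.
// Assumes n x n matrices stored in column-major order.
final class SgemmReference {

    static void sgemm(int n, float alpha, float[] a, float[] b,
                      float beta, float[] c) {
        for (int col = 0; col < n; col++) {
            for (int row = 0; row < n; row++) {
                float dot = 0.0f;
                for (int k = 0; k < n; k++) {
                    // A(row, k) * B(k, col) with column-major indexing
                    dot += a[k * n + row] * b[col * n + k];
                }
                c[col * n + row] = alpha * dot + beta * c[col * n + row];
            }
        }
    }

    // Largest absolute element-wise difference between two results,
    // for example between the Java result and a result copied back
    // from the device.
    static float maxAbsDiff(float[] x, float[] y) {
        float max = 0.0f;
        for (int i = 0; i < x.length; i++) {
            max = Math.max(max, Math.abs(x[i] - y[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        float[] a = {1f, 3f, 2f, 4f}; // [[1, 2], [3, 4]] in column-major order
        float[] b = {1f, 0f, 0f, 1f}; // 2 x 2 identity
        float[] c = new float[4];
        sgemm(2, 2.0f, a, b, 0.0f, c);
        System.out.println(java.util.Arrays.toString(c));
    }
}
```

A verification step would then compare this result against the device result with a small tolerance rather than for exact equality, since the summation order on the GPU may differ.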
JCublas2PointerModes.java A sample that shows how to obtain the results of a CUBLAS computation once in a pointer to host memory and once in a pointer to device memory.
JCufftSample.java Shows how to perform an in-place 1D real-to-complex transform using JCufft, and compares the result to a reference solution that is computed with JTransforms.
JCurandSample.java A simple example of how to use JCurand.
JCusparseSample.java An example showing how to use JCusparse. This example is a direct port of the example from the CUSPARSE documentation.
JCudppSample.java A sample that uses JCudpp to sort an array of integers and verifies the result.

(Note that JCudpp is no longer part of the main JCuda package)
JCudppHashSample.java A sample that shows how to use the hash functions that have been introduced with CUDPP 2.0.

(Note that JCudpp is no longer part of the main JCuda package)
JCudaVectorAdd.java
JCudaVectorAddKernel.cu

The example that is used in the Tutorial:

This sample shows how to load and execute a simple vector addition kernel using the JCuda driver bindings.

The CUDA source file is compiled into a PTX file at runtime using NVCC. The PTX file is then loaded as a module, and the kernel function is executed.
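
The runtime compilation step can be sketched as follows. This is only an assumption about how such a step may be structured: the class and method names are hypothetical, and the sketch merely builds an `nvcc -ptx` command line. It only starts the process when `--run` is passed, because that requires NVCC to be installed and on the PATH.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of a runtime compilation step: build an "nvcc -ptx" command line.
// Class and method names are hypothetical, not taken from the sample.
final class PtxCompileSketch {

    // Replaces the file extension (if any) with ".ptx".
    static String ptxFileNameFor(String cuFileName) {
        int dot = cuFileName.lastIndexOf('.');
        String base = (dot == -1) ? cuFileName : cuFileName.substring(0, dot);
        return base + ".ptx";
    }

    // bits is the target word size passed to nvcc as -m32 or -m64.
    static List<String> nvccCommand(String cuFileName, int bits) {
        return Arrays.asList(
            "nvcc", "-m" + bits, "-ptx", cuFileName,
            "-o", ptxFileNameFor(cuFileName));
    }

    public static void main(String[] args) throws Exception {
        List<String> command = nvccCommand("JCudaVectorAddKernel.cu", 64);
        System.out.println(String.join(" ", command));
        if (args.length > 0 && args[0].equals("--run")) {
            // Requires nvcc on the PATH and the .cu file in the working directory.
            Process process = new ProcessBuilder(command).inheritIO().start();
            System.out.println("nvcc exit code: " + process.waitFor());
        }
    }
}
```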
JCudaReduction.java
reduction.cu

An example that performs a reduction, using a kernel that is based on the reduction example from the CUDA SDK.
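
To illustrate the idea behind a parallel reduction, the following CPU-only sketch sums an array by repeatedly folding the upper half onto the lower half. It is not the SDK kernel, and the class name is hypothetical. It only mirrors the halving pattern that a GPU reduction typically applies within a thread block, where each inner loop iteration would be handled by one thread.

```java
// CPU sketch of a pairwise (tree) sum, mirroring the halving pattern
// of a parallel reduction. Hypothetical helper, not the SDK kernel.
final class TreeReductionSketch {

    static float reduce(float[] input) {
        if (input.length == 0) {
            return 0.0f;
        }
        float[] data = input.clone(); // leave the caller's array untouched
        int n = data.length;
        while (n > 1) {
            int half = (n + 1) / 2; // rounds up, so odd sizes keep the middle element
            for (int i = 0; i < n / 2; i++) {
                // On a GPU, each of these additions would be done by one thread.
                data[i] += data[i + half];
            }
            n = half;
        }
        return data[0];
    }

    public static void main(String[] args) {
        System.out.println(reduce(new float[] {1f, 2f, 3f, 4f, 5f}));
    }
}
```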
JCudaDriverSample.java
JCudaSampleKernel.cu

This sample is similar to the vector addition example, but also shows how to pass a 2D array (i.e. an array of pointers) to a kernel function.
JCudaDeviceQuery.java
A program that queries and prints all attributes of all available devices.
JCudaRuntimeDriverMixSample.java

The kernel that is executed: invertVectorElements.cu
invertVectorElements.cubin (for 32 bit)
Since CUDA 3.0, it is possible to mix runtime and driver API calls. This is a simple example that shows how data may be allocated and modified with a mixed sequence of runtime and driver operations.


Please refer to the CUDA Programming Guide for more information about how driver and runtime calls may be mixed.
JCudaBandwidthTest.java
A sample program that measures the bandwidth of host-to-device memory copies.
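
The arithmetic behind such a measurement can be sketched as follows. The sketch times host-side array copies as a stand-in for the actual host-to-device copies, which would require a CUDA device. The class name and the convention of 1 MB = 1024 * 1024 bytes are assumptions.

```java
// Sketch of a bandwidth computation. System.arraycopy stands in for the
// host-to-device copy that the real sample would time.
final class BandwidthSketch {

    // Assumes 1 MB = 1024 * 1024 bytes.
    static double megabytesPerSecond(long bytes, long elapsedNanos) {
        double seconds = elapsedNanos / 1e9;
        double megabytes = bytes / (1024.0 * 1024.0);
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        byte[] source = new byte[32 << 20]; // 32 MB
        byte[] target = new byte[source.length];
        int runs = 10;

        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            System.arraycopy(source, 0, target, 0, source.length);
        }
        long elapsed = System.nanoTime() - start;

        long totalBytes = (long) runs * source.length;
        System.out.printf("%.1f MB/s%n", megabytesPerSecond(totalBytes, elapsed));
    }
}
```

Averaging over several runs, as done here, reduces the influence of timer resolution and one-time startup costs on the result.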
TestPointerToBuffer.java
This is a sample and test class that points out the difference between the Pointer#to(Buffer) method and the Pointer#toBuffer(Buffer) method.
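
As far as the JCuda API documentation describes it, the difference concerns the position and array offset of the buffer: Pointer#to(Buffer) ignores them, whereas Pointer#toBuffer(Buffer) takes them into account. Treat this as an assumption to verify against the Javadoc. The following sketch uses only java.nio (no JCuda calls) to show where these two values come from; the class and method names are hypothetical.

```java
import java.nio.FloatBuffer;

// java.nio-only sketch: how position and array offset determine which
// element of the backing array a buffer "starts" at.
final class BufferOffsetSketch {

    // Index into the backing array of the element at the current position.
    static int backingIndexOfPosition(FloatBuffer buffer) {
        return buffer.arrayOffset() + buffer.position();
    }

    public static void main(String[] args) {
        float[] array = {0f, 1f, 2f, 3f, 4f, 5f, 6f, 7f};

        // Wrapping a sub-range sets the position to 2; the array offset stays 0.
        FloatBuffer wrapped = FloatBuffer.wrap(array, 2, 3);

        // Slicing moves the start into the array offset; the position is 0 again.
        FloatBuffer sliced = wrapped.slice();

        System.out.println("wrapped: position=" + wrapped.position()
            + ", arrayOffset=" + wrapped.arrayOffset());
        System.out.println("sliced:  position=" + sliced.position()
            + ", arrayOffset=" + sliced.arrayOffset());

        // Both buffers logically start at array[2].
        System.out.println(backingIndexOfPosition(wrapped));
        System.out.println(backingIndexOfPosition(sliced));
    }
}
```

A pointer that ignores these values would refer to the start of the backing array in both cases, which is why the choice between the two methods matters for sliced or partially consumed buffers.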
JCudaAsyncCopyTest.java
This is a sample and test class showing synchronous and asynchronous memory copies between different kinds of source and target memory. It summarizes the information from the section about asynchronous operations.
JCudaMatrixCgSample20141023.zip

This sample contains some classes that show the implementation of CG solvers for sparse matrices using JCusparse and JCublas. It compares the convergence of a default CG solver to a CG solver that uses an incomplete Cholesky preconditioner.

The core of the solver classes is based on the 'conjugateGradientPrecond' sample from NVIDIA, which was originally ported to JCuda by Kashif Rasul.

NOTE: These classes are only intended as a sample. They are not an official part of JCuda, and should not be considered to have a public API.
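
For readers unfamiliar with the method, the following is a minimal CPU-only sketch of an unpreconditioned CG iteration for a small, dense, symmetric positive definite system. It is not the code from the archive: the sample works on sparse matrices with JCusparse and JCublas and additionally offers an incomplete Cholesky preconditioner. All names here are hypothetical.

```java
// Minimal dense, unpreconditioned conjugate gradient sketch for A*x = b,
// where A is assumed to be symmetric positive definite.
final class ConjugateGradientSketch {

    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    static double[] multiply(double[][] a, double[] x) {
        double[] result = new double[x.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = dot(a[i], x);
        }
        return result;
    }

    static double[] solve(double[][] a, double[] b,
                          double tolerance, int maxIterations) {
        int n = b.length;
        double[] x = new double[n]; // initial guess x = 0
        double[] r = b.clone();     // residual r = b - A*x = b
        double[] p = r.clone();     // first search direction
        double rsOld = dot(r, r);

        for (int iter = 0;
             iter < maxIterations && Math.sqrt(rsOld) > tolerance; iter++) {
            double[] ap = multiply(a, p);
            double alpha = rsOld / dot(p, ap); // step length along p
            for (int i = 0; i < n; i++) {
                x[i] += alpha * p[i];
                r[i] -= alpha * ap[i];
            }
            double rsNew = dot(r, r);
            double beta = rsNew / rsOld; // makes the next direction A-conjugate
            for (int i = 0; i < n; i++) {
                p[i] = r[i] + beta * p[i];
            }
            rsOld = rsNew;
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] a = {{4, 1}, {1, 3}};
        double[] b = {1, 2};
        System.out.println(java.util.Arrays.toString(solve(a, b, 1e-10, 100)));
    }
}
```

A preconditioned variant applies an approximate inverse of the matrix to the residual in each iteration, which usually reduces the number of iterations needed to reach the tolerance.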

JCublasMatrixInvert.java

This is an example showing how a matrix may be inverted using JCublas.

(based on the code from this forum post)
Matrix Inversion

32bit CUBIN files
64bit CUBIN files
This is an example of a matrix inversion. It inverts a matrix by calling several kernels that perform the individual steps of a Gaussian elimination. To load and launch the kernels, it uses the KernelLauncher class from the Utilities package.

The first archive contains the Java source files, and the source files of the CUDA kernels.

The other archives contain the precompiled CUBIN files for 32 and 64 bit, respectively. If you have a C compiler installed (for example, Visual Studio or GCC) then you do not need the CUBIN files, since they will be compiled automatically from the source files at program startup.
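
The individual elimination steps that the kernels perform can be illustrated with a sequential sketch. The following plain-Java version uses Gauss-Jordan elimination with partial pivoting on an augmented matrix. It is an assumption about the general structure of such an algorithm, not a port of the kernels, and the class name is hypothetical.

```java
// Sequential Gauss-Jordan sketch: invert a square matrix by reducing
// the augmented matrix [A | I] to [I | A^-1].
final class GaussJordanSketch {

    static double[][] invert(double[][] matrix) {
        int n = matrix.length;
        double[][] aug = new double[n][2 * n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(matrix[i], 0, aug[i], 0, n);
            aug[i][n + i] = 1.0; // right half starts as the identity
        }

        for (int col = 0; col < n; col++) {
            // Partial pivoting: pick the row with the largest entry in this column.
            int pivot = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(aug[row][col]) > Math.abs(aug[pivot][col])) {
                    pivot = row;
                }
            }
            if (Math.abs(aug[pivot][col]) < 1e-12) {
                throw new ArithmeticException("Matrix is singular or nearly singular");
            }
            double[] tmp = aug[col];
            aug[col] = aug[pivot];
            aug[pivot] = tmp;

            // Normalize the pivot row so that the pivot element becomes 1.
            double pivotValue = aug[col][col];
            for (int j = 0; j < 2 * n; j++) {
                aug[col][j] /= pivotValue;
            }

            // Eliminate the current column from all other rows.
            for (int row = 0; row < n; row++) {
                if (row == col) {
                    continue;
                }
                double factor = aug[row][col];
                for (int j = 0; j < 2 * n; j++) {
                    aug[row][j] -= factor * aug[col][j];
                }
            }
        }

        double[][] inverse = new double[n][n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(aug[i], n, inverse[i], 0, n);
        }
        return inverse;
    }

    public static void main(String[] args) {
        double[][] inverse = invert(new double[][] {{4, 7}, {2, 6}});
        System.out.println(java.util.Arrays.deepToString(inverse));
    }
}
```

The normalization and elimination loops operate on independent rows and columns, which is what makes these steps suitable for separate GPU kernels.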


Acknowledgements:

The original kernels and the host implementation were developed by Christoph Wagner (Hochschule Mannheim) in his diploma thesis in the ZAFH-AMSER project, based on a presentation (PDF file) by Christian Heinrich (Fraunhofer SCAI). The source code of the kernels was first published in this NVIDIA forum thread, and is redistributed here with permission of the original author.
JCudaDriverGLSample3.java
simpleGL_kernel.cu

A sample application demonstrating basic JCuda/JOGL interoperability. It puts a simple, animated sine wave pattern onto a grid of 512x512 points which are stored in a vertex buffer object. The CUDA kernel function is called during each rendering pass to update the vertex positions inside the vertex buffer object, and then the vertex buffer object is rendered using JOGL.
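
What the kernel computes for each grid point can be sketched on the CPU as follows. The formula and the frequency constant are assumptions modeled loosely on NVIDIA's Simple OpenGL sample, and the class name is hypothetical. Each vertex is stored as four floats (x, y, z, w), and the nested loops take the place of the one-thread-per-point execution on the GPU.

```java
// CPU sketch of an animated sine wave pattern on a width x height grid.
// Formula and constants are assumptions, not copied from the kernel file.
final class SineWaveGridSketch {

    static float[] computeVertices(int width, int height, float time) {
        float[] vertices = new float[width * height * 4];
        float freq = 4.0f; // assumed spatial frequency of the wave
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Map grid indices to coordinates in [-1, 1).
                float u = x / (float) width * 2.0f - 1.0f;
                float v = y / (float) height * 2.0f - 1.0f;
                // Height of the wave at (u, v) for the current time.
                float w = (float) (Math.sin(u * freq + time)
                                 * Math.cos(v * freq + time)) * 0.5f;
                int base = (y * width + x) * 4;
                vertices[base] = u;
                vertices[base + 1] = w;
                vertices[base + 2] = v;
                vertices[base + 3] = 1.0f;
            }
        }
        return vertices;
    }

    public static void main(String[] args) {
        float[] vertices = computeVertices(512, 512, 0.0f);
        System.out.println(vertices.length / 4 + " vertices computed");
    }
}
```

In the interoperability setting, the output array corresponds to the mapped vertex buffer object, so the positions are written directly into memory that the rendering API then draws from.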


In order to compile and run this sample, you will have to download JOGL from JogAmp.org.

The kernel file is taken from the Simple OpenGL sample from the NVIDIA CUDA samples web site.

JCudaDriverLWJGLSample3.java
simpleGL_kernel.cu

A sample application demonstrating basic JCuda/LWJGL interoperability. It puts a simple, animated sine wave pattern onto a grid of 512x512 points which are stored in a vertex buffer object. The CUDA kernel function is called during each rendering pass to update the vertex positions inside the vertex buffer object, and then the vertex buffer object is rendered using LWJGL.


In order to compile and run this sample, you will have to download LWJGL.

The kernel file is taken from the Simple OpenGL sample from the NVIDIA CUDA samples web site.

JCudaDriverTextureSample.java
volumeRender_kernel.sm_10.cubin (for 32 bit)

The volume data set that is loaded:
Bucky.raw
A sample application demonstrating how to use textures with JCuda. The application loads a RAW volume data file, stores the volume data in a 3D texture, uses a CUDA kernel to render the volume data into a pixel buffer object and displays the pixel buffer object using JOGL.

In order to compile and run this sample, you will have to download JOGL from JogAmp.org.

The CUBIN file for this sample is created from the Volume rendering sample from the NVIDIA CUDA samples web site. It is compiled for 32 bit architectures. For 64 bit architectures, you may have to compile your own CUBIN file.
Additional data sets may be obtained from http://volvis.org/

JCudaDriverTextureTest.java
JCudaDriverTextureTestKernels.cu

JCudaDriverTextureTestKernels.cubin (for 32bit)
This is a sample and test class for texture handling. It shows how 1D, 2D and 3D arrays of float and float4 values may be accessed via texture references.