The new generation of the P3DFFT library (dubbed P3DFFT++ or P3DFFT v.3) generalizes the P3DFFT concept to more arbitrary data formats and transform functions.
P3DFFT++ is written in C++ and contains wrappers providing easy interfaces with C and Fortran.
For C++ users all P3DFFT++ objects are defined within the p3dfft namespace, in order to avoid confusion with user-defined objects. For example, to initialize P3DFFT++ it is necessary to call the function p3dfft::setup(), and to exit P3DFFT++ one should call p3dfft::cleanup(). Alternatively, one can declare using namespace p3dfft and then call setup() and cleanup() directly. From here on in this document we will omit the implicit p3dfft:: prefix from all C++ names.
In C and Fortran these functions become p3dfft_setup and p3dfft_cleanup. While C++ users can directly access P3DFFT++ objects such as the grid class, C and Fortran users access them through handles provided by the corresponding wrappers (see more details below).
P3DFFT++ currently operates on four main data types:
While P3DFFT assumed predetermined 2D pencils in the X and Z dimensions as the primary data storage, P3DFFT++ relaxes this assumption to include more general formats, such as 2D pencils of arbitrary shape and memory order, as well as 3D blocks. Below is the technical description of how to specify the data layout formats.
A basic P3DFFT++ descriptor is the "grid" construct. It holds all the necessary information about the decomposition of a grid among parallel tasks/processors. In C++ it is defined as a class, while in C and in Fortran it is accessed through handles to a C++ object, provided by inter-language wrappers. Below is the technical description of the definition for each language.
The following is the main constructor call for the grid class:
grid(int gdims[3],int pgrid[3],int proc_order[3],int mem_order[3],MPI_Comm mpicomm);
int gdims[3],pgrid[3],proc_order[3],mem_order[3];
grid mygrid(gdims, pgrid, proc_order, mem_order, mpicomm);
Upon construction the grid object defines several useful parameters, available by accessing the following public class members:
int ldims[3] : dimensions of the local portion of the grid (ldims[0]=gdims[0]/pgrid[0] etc)
int nd : number of dimensions of the processor grid (1, 2 or 3).
int L[3] : 0 to 3 local dimensions (i.e. not split).
int D[3] : 0 to 3 split dimensions
int glob_start[3] : coordinates of the lowest element of the local grid within the global array
and other useful information. The grid class also provides a copy constructor.
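To make the relationship between gdims, pgrid, ldims and glob_start concrete, here is a minimal standalone sketch of a block decomposition. It is not P3DFFT++ code: LocalExtent, local_extent and the pcoord argument (the position of a task within the processor grid) are hypothetical names, and the rule that leftover points go to the lowest-coordinate tasks is an assumption rather than documented library behavior.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper type for illustration; not part of P3DFFT++.
struct LocalExtent {
    std::array<int, 3> ldims;       // sizes of the local block
    std::array<int, 3> glob_start;  // global coordinates of its lowest element
};

// Split each global dimension among pgrid[d] tasks and return the block
// owned by the task at position pcoord in the processor grid.
// Assumption: points that do not divide evenly go to the lowest coordinates.
LocalExtent local_extent(const std::array<int, 3>& gdims,
                         const std::array<int, 3>& pgrid,
                         const std::array<int, 3>& pcoord) {
    LocalExtent e{};
    for (int d = 0; d < 3; ++d) {
        assert(pgrid[d] > 0 && pcoord[d] >= 0 && pcoord[d] < pgrid[d]);
        const int base = gdims[d] / pgrid[d];  // size every task gets
        const int rem = gdims[d] % pgrid[d];   // leftover points
        e.ldims[d] = base + (pcoord[d] < rem ? 1 : 0);
        e.glob_start[d] = pcoord[d] * base + (pcoord[d] < rem ? pcoord[d] : rem);
    }
    return e;
}
```

A dimension with pgrid[d] equal to 1 stays entirely local, which corresponds to the "local" dimensions reported by the grid object; dimensions with pgrid[d] greater than 1 are split.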
For C users grid initialization is accomplished by a call to p3dfft_init_grid, returning a pointer to an object of type Grid. This type is a C structure containing a large part of the C++ class grid. Calling p3dfft_init_grid initializes the C++ grid object and also copies the information into a Grid object accessible from C, returning its pointer. For example:
grid1 = p3dfft_init_grid(gdims, pgrid, proc_order, mem_order, mpicomm);
xdim = grid1->ldims[0]; /* Size of zero logical dimension of the local portion of the grid for a given processor */
For Fortran users the grid object is represented as a handle of type integer(C_INT). For example:
grid1 = p3dfft_init_grid(ldims, glob_start, gdims, pgrid, proc_order, mem_order, mpicomm)
This call initializes a C++ grid object as a global variable and assigns an integer ID, returned in this example as grid1. In addition this call also returns the dimensions of the local portion of the grid (ldims) and the position of this portion within the global array (glob_start).
Other elements of the C++ grid object can be accessed through respective functions, such as p3dfft_grid_get_...
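The integer-handle mechanism described above is a common pattern for exposing C++ objects to C and Fortran. The sketch below illustrates the general idea only. GridStub, stub_init_grid and stub_grid_get_gdim are hypothetical names invented for this example, and the actual P3DFFT++ wrappers may be organized differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a C++ object that C or Fortran cannot hold directly.
struct GridStub {
    int gdims[3];
};

// Global table owning every object created through the wrapper.
static std::vector<GridStub> g_grid_table;

// Create an object and return its integer handle (its index in the table).
extern "C" int stub_init_grid(const int gdims[3]) {
    GridStub g;
    for (int d = 0; d < 3; ++d) g.gdims[d] = gdims[d];
    g_grid_table.push_back(g);
    return static_cast<int>(g_grid_table.size()) - 1;
}

// Accessor in the style of a "get" function: look the object up by handle.
extern "C" int stub_grid_get_gdim(int handle, int d) {
    assert(handle >= 0 && handle < static_cast<int>(g_grid_table.size()));
    assert(d >= 0 && d < 3);
    return g_grid_table[handle].gdims[d];
}
```

Because only plain integers cross the language boundary, the caller never needs to know the layout of the C++ object.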
P3DFFT++ aims to provide a versatile toolkit of algorithms/transforms in frequent use for solving multiscale problems. To give the user maximum flexibility, it offers a range of algorithms, from top-level algorithms operating on the entire 3D array to 1D algorithms that serve as building blocks the user can arrange to suit their needs. In addition, inter-processor exchanges/transposes are provided, enabling the user to rearrange the data from one orientation of pencils to another, as well as to perform other types of exchanges. In P3DFFT++ the one-dimensional transforms are assumed to be expensive in terms of memory bandwidth, and therefore such transforms are performed on local data (i.e. in a dimension that is not distributed across the processor grid). Transforms in three dimensions consist of three transforms in one dimension, interspersed with inter-processor exchanges as needed to rearrange the data. The 3D transforms are a convenient tool that saves the user the work of arranging the 1D transforms and transposes, and often provides superior performance. We recommend using 3D transforms if they fit within the user's program.
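The decomposition of a 3D transform into three passes of 1D transforms can be illustrated on a single process with a naive DFT. The sketch below is not P3DFFT++ code: dft_1d and dft_3d are hypothetical names, the array is assumed to be stored with x varying fastest, and the comments mark where a distributed implementation would need an inter-processor transpose to make the next dimension local.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(n^2) forward DFT of n elements spaced `stride` apart, starting at `first`.
void dft_1d(std::vector<cplx>& a, std::size_t first, std::size_t stride, std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle =
                -2.0 * pi * static_cast<double>(k * j) / static_cast<double>(n);
            sum += a[first + j * stride] * cplx(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    for (std::size_t k = 0; k < n; ++k) a[first + k * stride] = out[k];
}

// In-place 3D DFT of an nx*ny*nz array stored as index = x + nx*(y + ny*z).
void dft_3d(std::vector<cplx>& a, std::size_t nx, std::size_t ny, std::size_t nz) {
    assert(a.size() == nx * ny * nz);
    // Pass 1: 1D transforms along x.
    for (std::size_t z = 0; z < nz; ++z)
        for (std::size_t y = 0; y < ny; ++y)
            dft_1d(a, nx * (y + ny * z), 1, nx);
    // (A distributed code would transpose here so that y becomes local.)
    // Pass 2: 1D transforms along y.
    for (std::size_t z = 0; z < nz; ++z)
        for (std::size_t x = 0; x < nx; ++x)
            dft_1d(a, x + nx * ny * z, nx, ny);
    // (A distributed code would transpose here so that z becomes local.)
    // Pass 3: 1D transforms along z.
    for (std::size_t y = 0; y < ny; ++y)
        for (std::size_t x = 0; x < nx; ++x)
            dft_1d(a, x + nx * y, nx * ny, nz);
}
```

On one process all three dimensions are already local, so the passes differ only in stride. In the parallel case the strided passes are what the transposes replace.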
As mentioned above, three-dimensional (3D) transforms consist of three one-dimensional transforms in sequence (one for each dimension), interspersed by inter-processor transposes. In order to specify a 3D transform, three main things are needed:
In order to define the 3D transform type one needs to know the three 1D transform types that comprise it. (See the next section for possible 1D types.)
Usage of 3D transforms is different depending on the language used.
In C++ the 3D transform type is interfaced through the class trans_type3D, which is constructed as follows:
trans_type3D name_type3D(int types1D[3]);
Here types1D is the array of three 1D transform types. A copy constructor is also provided for this class.
3D transforms are provided as the class template:
template<class Type1,class Type2> class transform3D;
Here Type1 and Type2 are the initial and final data types. Most of the time these will be the same; however, some transforms have different types on input and output, for example a real-to-complex FFT. In all cases the floating point precision (single/double) of the initial and final types should match.
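The precision-matching rule can be expressed as a compile-time check. The following is a sketch under stated assumptions, not the library's implementation: base_precision, same_precision and transform_stub are hypothetical names, and std::complex stands in for whatever complex types the library actually uses.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical trait: maps a data type to its underlying floating-point type.
template <class T>
struct base_precision { using type = T; };
template <class T>
struct base_precision<std::complex<T>> { using type = T; };

// True when two data types share the same floating-point precision.
template <class Type1, class Type2>
struct same_precision
    : std::is_same<typename base_precision<Type1>::type,
                   typename base_precision<Type2>::type> {};

// Hypothetical stand-in for a transform template parameterized on input and
// output types; instantiating it with mixed precision fails to compile.
template <class Type1, class Type2>
struct transform_stub {
    static_assert(same_precision<Type1, Type2>::value,
                  "input and output types must have the same precision");
};
```

With this check, transform_stub<double, std::complex<double>> (a real-to-complex pairing in double precision) compiles, while transform_stub<float, std::complex<double>> is rejected at compile time.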
The constructor of transform3D takes the following arguments:
Here type is a 3D transform type, inplace is a bool variable indicating whether this is an in-place transform (i.e. whether the input can be overwritten), and grid1 and grid2 are the initial and final grid objects. Calling a transform3D constructor creates a detailed step-by-step plan for execution of the 3D transform.
Once a 3D transform has been defined and planned, execution of a 3D transform can be done by calling
my_transform_name.exec(Type1 *in,Type2 *out, int OW);
Here in and out are initial and final data arrays of appropriate types. These are assumed to be one-dimensional contiguous arrays containing the three-dimensional grid for input and output, according to the dimensions and memory ordering specified in the grid1 and grid2 objects, respectively.
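To show how a logical (i, j, k) position could map into such a one-dimensional contiguous array, here is a small sketch. It is not taken from the library: local_offset is a hypothetical helper, and it assumes that mem_order[d] gives the storage rank of logical dimension d, with 0 meaning the fastest-varying dimension in memory. That convention should be checked against the description of mem_order before relying on it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Offset of logical index idx within a contiguous local array.
// Assumption: mem_order[d] is the storage rank of logical dimension d,
// where rank 0 varies fastest in memory.
std::size_t local_offset(const std::array<int, 3>& idx,
                         const std::array<int, 3>& ldims,
                         const std::array<int, 3>& mem_order) {
    std::array<int, 3> sdims{};  // local sizes, arranged in storage order
    std::array<int, 3> sidx{};   // indices, arranged in storage order
    for (int d = 0; d < 3; ++d) {
        assert(mem_order[d] >= 0 && mem_order[d] < 3);
        assert(idx[d] >= 0 && idx[d] < ldims[d]);
        sdims[mem_order[d]] = ldims[d];
        sidx[mem_order[d]] = idx[d];
    }
    return static_cast<std::size_t>(sidx[0]) +
           static_cast<std::size_t>(sdims[0]) *
               (static_cast<std::size_t>(sidx[1]) +
                static_cast<std::size_t>(sdims[1]) *
                    static_cast<std::size_t>(sidx[2]));
}
```

Under this assumption, mem_order = {0, 1, 2} stores the first logical dimension contiguously, while {2, 1, 0} stores the third logical dimension contiguously.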