Performance/Useability problem in my C++ matrix library

aerosayan · January 11, 2021, 04:12

Summary :

I had a performance problem in my C++ matrix library. I found a basic solution. I need to make it easier to use.

Body:

I'm using Fortran-C++ hybrid programming model for my solver.
Fortran *only* does the core calculations in the solver.
C++ handles everything else like file IO, memory management, error handling.

For each 2D matrix, I allocate a NROWS*NCOLS size linear contiguous block of memory using malloc in C++.

I can then send the pointer to the first element of the allocated memory block to Fortran and use it directly as a NROWS*NCOLS matrix.

However in C++, I have to write my own methods to calculate the array index of an element (i,j) to access the data stored at index (i,j). For higher dimensions, I have functions that calculate the array index from (i,j,k) in 3D, (i,j,k,q) in 4D etc.

Unfortunately, the address/index calculation becomes notoriously complex and slow for the higher dimensions. That's a biiiiiiiiiiiiiiiiiiiiiiiiggggg problem.

For 1D:

Code:

    /// Linear accessor
    /// NOTE : used when the operator() becomes slow for higher dimensions
    ///
    inline REAL& operator [] (int i)
    {
        return _head[i];
    }

For 2D:

Code:

    /// Access 2D matrices
    ///
    inline REAL& operator () (int irow, int icol)
    {
        return _head[nrows*icol + irow];
    }

For 4D:

Code:

    /// Access 4D matrices
    ///
    inline REAL& operator () (int i, int j, int k, int q)
    {
        return _head[ndim1*ndim2*ndim3*q + ndim1*ndim2*k + ndim1*j + i];
    }

The multiplication operations nrows*ncols, ndim1*ndim2, etc are going to have an overhead. I don't want that. I could pre-calculate the values of ndim1*ndim2*xyz etc, but still we can't get rid of the addition operations.

This is slow. This is horrendously slow. A linear memory access shouldn't be this much complex. I hate it.

I found a basic solution that works for now.

Instead of accessing the elements through the operator() overloads, I'm directly calculating the index to the first element of the column (fortran style memory allignment is used) that I want to see, then I use the 1D accessor (operator [] overload) shown above to access the elements.

This is fast, but I want to know if there is any other way to do it.

Now, the final code looks like:

Code:

/// Fast 2D matrix access *pseudocode for now*
/// I haven't tested it, so the logic could have bugs, 

/// but kindly assume it's correct.
///

for(j=0; j<NCOLS; ++j)
{
    // We get the index to the first element of each column

    int col_start_index = matrix_object.get_element_index(i,j);



    for(i=0; i<NROWS; ++i)
    {
        cout << matrix_object[col_start_index + i] << endl; // We access the matrix linearly

    }

}

The code should work. But there are still some big issues.

- Notice how i,j need to be defined outside of the loops (like in fortran). I don't like it.

I'm solving it by doing : for(int i=0, j=0; j<NCOLS; ++j) { for(i=0; i<NROWS; ++i) { CODE(); } }

EDIT : If I think about it, for(int i=0, j=0; j<NCOLS; ++j){ i=matrix_object.get_element_index(i,j); for( ; i<i+NROWS; ++i) { cout << matrix_object[i] << endl; }} could be used.

- Notice how writing stencils will quickly become more complex.

This will become more complex in higher dimensions. The only solution I can think of, is to use C++ iterators. I'm looking for other better options.

I want to improve this. What do?

January 11, 2021, 04:12	Performance/Useability problem in my C++ matrix library	#1
aerosayan Senior Member Sayan Bhattacharjee Join Date: Mar 2020 Posts: 495 Rep Power: 8	Summary : I had a performance problem in my C++ matrix library. I found a basic solution. I need to make it easier to use. Body: I'm using Fortran-C++ hybrid programming model for my solver. Fortran only does the core calculations in the solver. C++ handles everything else like file IO, memory management, error handling. For each 2D matrix, I allocate a NROWSNCOLS size linear contiguous block of memory using malloc in C++. I can then send the pointer to the first element of the allocated memory block to Fortran and use it directly as a NROWSNCOLS matrix. However in C++, I have to write my own methods to calculate the array index of an element (i,j) to access the data stored at index (i,j). For higher dimensions, I have functions that calculate the array index from (i,j,k) in 3D, (i,j,k,q) in 4D etc. Unfortunately, the address/index calculation becomes notoriously complex and slow for the higher dimensions. That's a biiiiiiiiiiiiiiiiiiiiiiiiggggg problem. For 1D: Code: /// Linear accessor /// NOTE : used when the operator() becomes slow for higher dimensions /// inline REAL& operator [] (int i) { return _head[i]; } For 2D: Code: /// Access 2D matrices /// inline REAL& operator () (int irow, int icol) { return _head[nrowsicol + irow]; } For 4D: Code: /// Access 4D matrices /// inline REAL& operator () (int i, int j, int k, int q) { return _head[ndim1ndim2ndim3q + ndim1ndim2k + ndim1j + i]; } The multiplication operations nrowsncols, ndim1ndim2, etc are going to have an overhead. I don't want that. I could pre-calculate the values of ndim1ndim2xyz etc, but still we can't get rid of the addition operations. This is slow. This is horrendously slow. A linear memory access shouldn't be this much complex. I hate it. I found a basic solution that works for now. Instead of accessing the elements through the operator() overloads, I'm directly calculating the index to the first element of the column (fortran style memory allignment is used) that I want to see, then I use the 1D accessor (operator [] overload) shown above to access the elements. This is fast, but I want to know if there is any other way to do it. Now, the final code looks like: Code: /// Fast 2D matrix access pseudocode for now* /// I haven't tested it, so the logic could have bugs, /// but kindly assume it's correct. /// for(j=0; j<NCOLS; ++j) { // We get the index to the first element of each column int col_start_index = matrix_object.get_element_index(i,j); for(i=0; i<NROWS; ++i) { cout << matrix_object[col_start_index + i] << endl; // We access the matrix linearly } } The code should work. But there are still some big issues. - Notice how i,j need to be defined outside of the loops (like in fortran). I don't like it. I'm solving it by doing : for(int i=0,* j=0; j<NCOLS; ++j) { for(i=0; i<NROWS; ++i) { CODE(); } }* EDIT : If I think about it, for(int i=0, j=0; j<NCOLS; ++j){ i=matrix_object.get_element_index(i,j); for( ; i<i+NROWS; ++i) { cout << matrix_object[i] << endl; }} could be used. - Notice how writing stencils will quickly become more complex. This will become more complex in higher dimensions. The only solution I can think of, is to use C++ iterators. I'm looking for other better options. I want to improve this. What do? Last edited by aerosayan; January 11, 2021 at 05:01. Reason: formatting is really hard

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
ERROR: unable to find library	HJH	CFX	6	February 26, 2019 06:52
Is there any library for computing the inverse of a sparse matrix using parallelism?	HectorRedal	Main CFD Forum	7	November 23, 2018 04:07
Forcing a solver to use your custom library.	cdunn6754	OpenFOAM Programming & Development	0	March 30, 2017 16:05
Problem compiling custom boundary condition as a library	leroyv	OpenFOAM Programming & Development	2	January 21, 2014 10:44
CG, BICGSTAB(2) : problem with matrix operation and boundary conditions	moomba	Main CFD Forum	2	February 17, 2010 03:37