share your best C/C++ trick/tips

aerosayan · June 27, 2021, 02:59

Developers try to pick up C/C++ for coding scientific programs, and usually fall into one of the thousand different traps waiting for them. Time is short, and bugs are many. I'm starting this thread, and inviting everyone to share their best tricks/tips. I'm starting with simple tips; maybe in future, I will add more complex things.

In order to avoid spam, I will update it only about once a month.

Kindly mark your tips in order, so that anyone can find them easily in future.

>>> TIP 1 : std::cin, std::cout, std::endl are generally very slow if used incorrectly.

std::cin, std::cout and std::endl were made with user comfort in mind, not performance. So, they suck. By default, std::cin and std::cout are synchronized with the old C stdio functions like printf and scanf. Because of this synchronization, cin and cout can't use their own buffering: every operation is passed straight through to the C stdio layer, so that mixed C and C++ I/O shows up in the right order.

This is extremely slow. If you read or write a very large file, like a CFD solution file, through cin and cout after every N iterations, you're wasting a lot of time doing it with the default configuration. I have seen a lot of production code that wasted 4-5 seconds while loading or writing an extremely big solution from an ASCII file.

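The trick is to tell cin and cout not to synchronize with printf and scanf. However, that means you can no longer safely mix printf/scanf with cin/cout in the same program, and the call has to be made before any input or output is done. Also note that this setting only affects the standard streams; file streams such as std::ifstream and std::ofstream are never synchronized with C stdio.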

Here's how to speedup cin and cout. Include this in your code, and see results:

Code:

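std::ios::sync_with_stdio(false); // call this before doing any I/O
std::cin.tie(nullptr);            // cout.tie(0) is not needed: cout isn't tied to anything by default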

std::endl, on the other hand, is slow because it writes a newline and then does a flush, to ensure the newline is actually written to the console or file. The flush operation is expensive. It is faster to just print a newline, then do the flush once at the end of a big loop.

Code:

// std::endl is basically std::cout << "\n" << std::flush;


// slow: flushes on every iteration
for(int i=0; i<n; ++i) { std::cout << i << std::endl; }


// faster: flushes once, after the loop
for(int i=0; i<n; ++i) { std::cout << i << "\n"; } std::cout << std::flush;

>>> TIP 2 : Use macros in development, to make your life easier.

Macros are evil. We should know that before we start.
However, they can be used during development, to make our life easier.

I got severe hand pain from using a horrible keyboard at my office, some 1.5 years ago, and I still haven't recovered. So, my keyboard shortcuts, and aliases, are probably the best optimized for speed, and less key crunching.

One of the things I very much hate, is writing for loops in C/C++. Fortran's do loops are significantly better. So, I came up with my own. And, by Gods, I love them.

Code:

// forward loop
//
#define xdo(var, lo, hi) for(decltype(hi) var=(lo); var<(hi) ; ++var)


// reverse loop
//
#define xro(var, hi, lo) for(decltype(hi) var=(hi); var>=(lo); --var)

You use them like:

Code:

long long n = 10;
xdo(i,0,n)   cout << i << " "; cout << endl;
xro(i,n-1,0) cout << i << " "; cout << endl;

You don't need to define types, you don't need to define those pesky inequality symbols, and you can type them out within a second. The type of the variable is defined by the type of n. And, you can create nested loops, as you can change the variable name i, to j,k,l,m,n etc, and it will be reflected back.

However, since this is a macro, use carefully, and with judgement. If you mess up, everything's gonna blow up to high heaven.

Additionally, there's a limitation that OpenMP can't detect these for loops, so you can't easily use #pragma omp parallel directives with this form. You have to write the loop out in normal form. But that's okay for me, as I only use it for rapid development.

praveen · June 28, 2021, 05:53

I am very interested to know how people allocate multi-dimensional arrays.

For example, on a 3d structured grid, one needs to store data like this

double sol[nx][ny][nz][nvar];

This is easy in fortran, but not so in C/C++ as there are no in-built multi-d arrays. What are the best ways you have found ?

piu58 · June 28, 2021, 06:24

Cumbersome to write, but dynamic and located at the stack:

Code:

vector<vector<vector<double>>> myVar(X, vector<vector<double>>(Y, vector<double>(Z)));

aerosayan · June 28, 2021, 07:32

Quote:

Originally Posted by praveen

I am very interested to know how people allocate multi-dimensional arrays. ... For example, on a 3d structured grid, one needs to store data like this double sol[nx][ny][nz][nvar]; ... This is easy in fortran, but not so in C/C++ as there are no in-built multi-d arrays.

In case you're not using the data structure for performance critical operations, you can follow the method provided by piu58. That seems correct, and a really good time-saving trick.

In case you're using the data structure for performance critical operations, it's simple. You don't, if you don't understand the underlying memory layout of the nested std::vectors.

Fortran multi dimensional arrays have the good property of using flat memory, i.e. the whole multi dimensional array is flattened out into a single contiguous 1D block, on the heap (if your array is dynamic and/or very large) or on the stack (if your array is small). I'm not an expert on this topic, so I can't guarantee that's what happens in every implementation. However, as per my experience this is how it works.

However, C++ doesn't have built-in multi dimensional arrays whose sizes are chosen at run time. Built-in arrays like double a[3][4] need their sizes at compile time. So, in most general cases, you'll use std::vector to represent your dynamic 1D arrays. You can also nest them, as std::vector<std::vector<double>>, to represent a 2 dimensional array or matrix.

The problem is that std::vector only guarantees contiguous storage for its own elements, i.e. for 1 level of std::vector. That is, in your 2D nested std::vector, only the innermost layer, i.e. each std::vector<double>, will lay out its doubles in a single chunk of memory. The outer vector just holds the small header objects of the inner vectors, and each inner vector allocates its own separate block.

If we consider the std::vector<double> to be the rows in std::vector<std::vector<double>>, there's only a guarantee of uniform memory chunk allocation for each row, and every different row can be far away from each other in memory.

The performance implication being: you can get severe cache misses if the different rows are far away from each other in memory. This is especially bad when your matrix is small (like 4*4, or 5*5). Stored contiguously, such a matrix would fit in a handful of cache lines, but with nested std::vectors you might get severe cache misses even for simple mathematical operations across different rows of the matrix.

And since there's no guarantee of how far the different std::vector rows will be in memory, you have to consider the worst case possibility that they'll have to be loaded from RAM.

That's why I manually allocate memory locations in huge chunks of 1D arrays of size NROWS*NCOLS, for 2D matrices. This way, even if your matrix is small or big, your data will be contiguous in memory, so you'll have extremely good performance, provided you access the data correctly.

However, data access is a little bit more complicated, so maybe I'll write about it in future.

Quote:

Originally Posted by piu58

located at the stack

Unfortunately, they aren't : https://stackoverflow.com/questions/8036474

Only the header of the std::vector, i.e. the basic housekeeping information (typically a few pointers), lives inside the vector object itself. For a local variable, that is on the stack.

However, your data is stored in a separate 1D contiguous memory block on the heap, and the header only holds a pointer to it. This allows the vector to point to a different location on the heap whenever it needs to grow.

Of course, the memory layout for multi dimensional vectors is more complicated, as explained above.

JBeilke · June 28, 2021, 09:46

Is there a reason not to use the boost library?

https://www.boost.org/doc/libs/1_63_.../doc/user.html

aerosayan · June 28, 2021, 09:57

Quote:

Originally Posted by JBeilke

Is there a reason not to use the boost library?

https://www.boost.org/doc/libs/1_63_.../doc/user.html

If you can guarantee good/acceptable performance for your code, then there's nothing wrong with using any library. Personally I don't like using any library except the most standard ones (even there, I don't use every part of STL for my critical sections), simply because I don't feel comfortable with the hidden abstractions of each library. Most of these libraries do a whole lot of Template Meta-programming and Object Oriented Programming, which in my experience worsens compilation time, and can hurt code performance when the abstractions aren't optimized away.

Other than that, boost seems like a well used library for the portions of the codes that aren't as performance critical. Can be very useful in programming different things for the GUI.

Personally, I don't want the headache of downloading the correct boost version when deploying my code to a different machine.

aerosayan · July 4, 2021, 03:34

I remembered why I used to hate boost library.

Here's a "design rationale" for c++ program for calculating distance between 2 points : https://archive.md/2014.04.28-125041...ry/design.html

Read through it. Read through it and realize how badly they messed up even a function as simple as calculating the distance between 2 points. Their whole codebase is bloated, and is now a mental gymnastics routine.

Realize that the boost developers are such geniuses, they've overflowed their integer IQ score variables, and now each have a negative IQ score.

This is why I will never trust boost, or use it in my performance critical code.

piu58 · July 4, 2021, 09:21

Dear aerosayan

that example is great! It shows how badly the code is bloated in a so-called well structured library.

Thank you for correcting my mistake (stack) and for your opinions when using vector. It is not too hard multiplying the indices into a flat memory as you mentioned.

Quote:

Personally I don't like using any library except the most standard ones

Me too. We should keep in mind that our code may still be useful when most of today's libraries are gone or cannot be found anymore. Every use of a library makes your code less readable. Decent C++ code is like a mathematics textbook: it is valid forever.

aerosayan · August 28, 2021, 17:27

>>> TIP 3 : Stop calculating element indices for your matrix class

EDIT1 : After some thorough review, it seems like modern compilers are able to optimize index calculations very well. So, this trick isn't really necessary if your code is being compiled with the latest and greatest compilers. However, if you're supporting older compilers and older hardware, this might be helpful. Still, do your own performance profiling and assembly code analysis, and don't take everything that's given here as fact.

C++ doesn't have matrices, so many tend to create matrix classes that store the data inside long arrays, but access the matrix data using row-major or column-major address formulation, and overloading the parenthesis operators. It is a horrible way to do it, since the matrix element's address calculation requires many addition and multiplication operations, and they are not trivial.

Here's a significantly better method, that's easy to use, and performs well, as per my preliminary analysis. Needs more testing... Needs lots and lots of testing, before I can say for sure that it performs great.

Code:

#include <iostream>

int main()
{
    // linear 1d array containing data for 4x4 2d matrix
    int array[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};

    // we store the beginning of each row, into another array
    int * matrix[4] = { &array[0],  &array[4], &array[8], &array[12] };

    // now access the 1d array, as if it was a 2d matrix
    for(int i=0; i<4; ++i)
    {
        for(int j=0; j<4; ++j)
        {
            // courtesy of c style arrays, you can access pointers as arrays
            // so, we access pointer to the head of each row, as a new array;
            // thus, we essentially access the 1d array as a 2d matrix.
            std::cout << matrix[i][j] << " ";
        }
        std::cout << std::endl;
    }

    return 0;
}

Code:

1 2 3 4 
5 6 7 8 
9 10 11 12 
13 14 15 16

aerosayan · September 1, 2021, 19:17

Improved the previous code, and made it more efficient : no extra memory required + we can do linear traversal through the matrix data (row by row in the example below), resulting in more efficient code.

Code:

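#include <cstdio>
#include <iostream>
using std::cout;
using std::endl;

// thin wrapper around a flat array; operator[] takes the flat offset
// of the first element of a row, and returns a pointer to that row
template<typename tx>
struct dymatrix
{
    tx * head;
    int ecols, erows;
    inline tx* operator[](int i) { return &head[i]; }
    inline const tx* operator[](int i) const { return &head[i]; }
};
//

int main()
{
    printf("matrix use demonstration...\n");
    //
    float a[100]; for(int i=0; i<100; ++i) a[i] = i;
    dymatrix<float> h;

    h.head = a; // the array decays to float*, no cast needed
    h.ecols = 10;
    h.erows = 10;

    // i is the flat offset of each row start : 0, 10, 20, ...
    for(int i=0; i<100; i+=h.ecols)
    {
        for(int j=0; j<10; ++j)
        {
            const float x = h[i][j]; // mark
            cout << x << " ";
        }
        cout << endl;
    }
    //
    return 0;
}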

Code:

matrix use demonstration...
0 1 2 3 4 5 6 7 8 9 
10 11 12 13 14 15 16 17 18 19 
20 21 22 23 24 25 26 27 28 29 
30 31 32 33 34 35 36 37 38 39 
40 41 42 43 44 45 46 47 48 49 
50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 
70 71 72 73 74 75 76 77 78 79 
80 81 82 83 84 85 86 87 88 89 
90 91 92 93 94 95 96 97 98 99

Code:

             const float x = h[i][j]; // mark
  401c99:    vmovss xmm0,DWORD PTR [rsp+r14*4+0x28] // <- NOTICE THIS!!!

Naive C++ OOP overloaded accessors using formulas like address = ncols*i + j; generally use assembly ADD,MUL instructions to calculate the address, and thus tend to be slow. They're meant for random access, not for linear traversal. Fortran compilers seem to be smarter, and recognize linear traversal through memory of a 2D matrix.

The assembly generated here does the whole address calculation inside a single load instruction (base register + index register*4 + constant offset), so most likely it's extremely fast. I have seen Fortran compilers generate such code, and previously I thought that it was slow. But since the address calculation is done using registers, it was actually extremely fast.

obviously, more test required. i'm tired. bye.

EDIT : Modern C++ compilers can optimize naive implementations very well. Although, I like to write code that works well with older compilers too. So, kindly do your own profiling, as you might not need this technique. But it's a good trick, and useful for writing matrix/tensor math libraries.

