share your best C/C++ trick/tips

Old   June 27, 2021, 02:59
Default share your best C/C++ trick/tips
  #1
Senior Member
 
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8
aerosayan is on a distinguished road
Developers who pick up C/C++ for coding scientific programs usually fall into one of the thousand different traps waiting for them. Time is short, and bugs are many. I'm starting this thread and inviting everyone to share their best tricks/tips. I'll start with simple tips and may add more complex things in future.



In order to avoid spam, I will update it only about once a month.


Kindly number your tips in order, so that anyone can find them easily in future.


>>> TIP 1 : std::cin, std::cout, std::endl are generally very slow if used incorrectly.

std::cin, std::cout and std::endl were made with user comfort in mind, not performance. So, out of the box, they suck. By default, std::cin and std::cout are synchronized with the old C functions printf and scanf. In practice, this synchronization means that the C++ streams do no buffering of their own, and every operation on cin or cout is passed straight through to the corresponding C stream.


This is extremely slow. If you want to read or write something very large, like a CFD solution file, through cin and cout after every N iterations, you're wasting a lot of time with their default configuration. (Only the standard streams are affected; std::ifstream and std::ofstream are not synchronized with C stdio.) I have seen a lot of production code that wasted 4-5 seconds while loading or writing an extremely big solution from an ASCII file.



The trick is to tell cin and cout not to synchronize with printf and scanf. However, that means you can no longer safely mix printf or scanf with cin or cout in the same program, because the order of their input and output is then not guaranteed.


Here's how to speed up cin and cout. Include this at the start of your program, before any I/O, and see the results:

Code:
std::ios::sync_with_stdio(false);  // call once, before any I/O
std::cin.tie(nullptr);             // cin no longer flushes cout before every read
std::endl, on the other hand, is slow because it writes a newline and then flushes the stream, to ensure the newline reaches the console or file. The flush is the expensive part. It is faster to just print a newline inside the loop, and do the flush once at the end of a big loop.


Code:
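// std::endl is basically std::cout << '\n' << std::flush;


// slow: flushes on every iteration
for (int i = 0; i < n; ++i) { std::cout << i << std::endl; }


// faster: newline only inside the loop, one flush at the end
for (int i = 0; i < n; ++i) { std::cout << i << '\n'; }
std::cout << std::flush;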
>>> TIP 2 : Use macros in development, to make your life easier.

Macros are evil. We should know that before we start.
However, they can be used during development, to make our life easier.

I got severe hand pain from using a horrible keyboard at my office some 1.5 years ago, and I still haven't recovered. So my keyboard shortcuts and aliases are heavily optimized for speed and fewer keystrokes.


One of the things I very much hate, is writing for loops in C/C++. Fortran's do loops are significantly better. So, I came up with my own. And, by Gods, I love them.



Code:
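// forward loop: var runs from lo up to hi-1
// note: if hi is a const variable, decltype(hi) is const too and ++var will not compile
#define xdo(var, lo, hi) for(decltype(hi) var=(lo); var<(hi) ; ++var)


// reverse loop: var runs from hi down to lo, inclusive
// note: never terminates for unsigned types when lo is 0
#define xro(var, hi, lo) for(decltype(hi) var=(hi); var>=(lo); --var)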
You use them like:
Code:
long long n = 10;
xdo(i,0,n)   cout << i << " "; cout << endl;
xro(i,n-1,0) cout << i << " "; cout << endl;
You don't need to define types, you don't need to type those pesky inequality symbols, and you can type them out within a second. The type of the loop variable is taken from the type of the hi argument (here n, or n-1). And you can create nested loops, since the variable name is a macro argument: change i to j, k, l, m, etc., and the loop uses that name.


However, since this is a macro, use carefully, and with judgement. If you mess up, everything's gonna blow up to high heaven.



Additionally, there's a limitation that OpenMP can't detect these for loops, so you can't easily use #pragma omp parallel directives with this form. You have to write the loop out in normal form. But that's okay for me, as I only use it for rapid development.
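
Two things can bite with the macros as written: `decltype(hi)` keeps `const` when `hi` is a const variable, so `++var` fails to compile, and the `var>=(lo)` test in the reverse loop is always true for unsigned types when `lo` is 0. A possible hardened sketch, with the made-up names `xdo2` and `xro2` (these are not the original author's macros), assuming C++14 for `std::decay_t`:

```cpp
#include <type_traits>

// Hypothetical variants of the macros above (names xdo2/xro2 are made up).
// std::decay_t strips const and references, so a const bound still gives
// a mutable loop counter.
#define xdo2(var, lo, hi) \
    for (std::decay_t<decltype(hi)> var = (lo); var < (hi); ++var)

// Reverse loop from hi down to lo, inclusive. The test "var-- > lo" never
// relies on var >= 0, so it also terminates for unsigned counters.
#define xro2(var, hi, lo) \
    for (std::decay_t<decltype(hi)> var = (hi) + 1; var-- > (lo); )

// Sums 0..n-1 with the forward macro; n is const on purpose.
long long sum_forward(const long long n)
{
    long long s = 0;
    xdo2(i, 0, n) { s += i; }
    return s;
}

// Counts the iterations of the reverse macro over the unsigned range n-1..0.
unsigned count_reverse(unsigned n)
{
    unsigned c = 0;
    xro2(i, n - 1, 0u) { (void)i; ++c; }
    return c;
}
```

These are still macros, so the usual warnings apply: arguments are pasted in textually, and `hi` is evaluated on every iteration of the forward loop.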
sbaffini, ssh123 and aero_head like this.

Old   June 28, 2021, 05:53
Default
  #2
Super Moderator
 
Praveen. C
Join Date: Mar 2009
Location: Bangalore
Posts: 342
Blog Entries: 6
Rep Power: 18
praveen is on a distinguished road
I am very interested to know how people allocate multi-dimensional arrays.

For example, on a 3d structured grid, one needs to store data like this

double sol[nx][ny][nz][nvar];

This is easy in Fortran, but not so in C/C++ as there are no in-built multi-d arrays. What are the best ways you have found?

Old   June 28, 2021, 06:24
Default
  #3
Senior Member
 
piu58's Avatar
 
Uwe Pilz
Join Date: Feb 2017
Location: Leipzig, Germany
Posts: 744
Rep Power: 15
piu58 is on a distinguished road
Cumbersome to write, but dynamic and located at the stack:

Code:
vector<vector<vector<double>>> myVar(X, vector<vector<double>>(Y, vector<double>(Z)));
__________________
Uwe Pilz
--
The fluctuating motion superimposed on the main motion is so hopelessly complicated in its details that calculating it theoretically appears futile. (Hermann Schlichting, 1950)

Old   June 28, 2021, 07:32
Default
  #4
Senior Member
 
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8
aerosayan is on a distinguished road
Quote:
Originally Posted by praveen View Post
I am very interested to know how people allocate multi-dimensional arrays. ... For example, on a 3d structured grid, one needs to store data like this double sol[nx][ny][nz][nvar]; ... This is easy in Fortran, but not so in C/C++ as there are no in-built multi-d arrays.

In case you're not using the data structure for performance critical operations, you can follow the method provided by piu58. That seems correct, and a really good time-saving trick.



In case you're using the data structure for performance critical operations, it's simple: don't use nested std::vectors unless you understand their underlying memory layout.


Fortran multi dimensional arrays have the good property of using flat memory, i.e. the whole multi dimensional array is flattened out into a single contiguous 1D block on the heap (if your array is dynamic and/or very large) or on the stack (if your array is small). I'm not an expert on this topic, so I can't guarantee that's what happens in every implementation. However, in my experience this is how it works.


However, C++ doesn't have built-in multi dimensional arrays whose sizes are only known at run time; built-in arrays like double a[4][4] need their dimensions at compile time. So, in most general cases, you'll use std::vector to represent your dynamic 1D arrays. You can also nest std::vector, as std::vector<std::vector<double>>, to represent a 2 dimensional array or matrix.


The problem is that std::vector only guarantees contiguous storage for its own elements, i.e. for one level of std::vector. That is, in your 2D nested std::vector, only the innermost layer, each std::vector<double>, will be laid out as a single contiguous chunk of memory.


If we consider each std::vector<double> to be a row of the std::vector<std::vector<double>>, contiguous memory is only guaranteed within each row, and different rows can be far away from each other in memory.


The performance implication is that you'll have severe cache misses if the different rows are far away from each other in memory. This is especially bad when your matrices are small (like 4*4 or 5*5), because with nested std::vectors you might get severe cache misses even when doing simple mathematical operations on different rows of the matrix.


And since there's no guarantee of how far the different std::vector rows will be in memory, you have to consider the worst case possibility that they'll have to be loaded from RAM.


That's why I manually allocate a single 1D array of size NROWS*NCOLS for each 2D matrix. This way, whether your matrix is small or big, your data will be contiguous in memory, so you'll get very good performance, provided you access the data in order.


However, data access is a little bit more complicated, so maybe I'll write about it in future.
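
As a rough illustration of that data access, here is a sketch for the `sol[nx][ny][nz][nvar]` case from the question. The struct name `Field4D` and its members are made up for this example; it assumes row-major ordering, with the last index (the variable) varying fastest.

```cpp
#include <cstddef>
#include <vector>

// Illustrative flat 4D container (hypothetical name, not from the thread).
// All values live in one contiguous std::vector; the last index v varies fastest.
struct Field4D
{
    std::size_t nx, ny, nz, nvar;
    std::vector<double> data;

    Field4D(std::size_t nx_, std::size_t ny_, std::size_t nz_, std::size_t nvar_)
        : nx(nx_), ny(ny_), nz(nz_), nvar(nvar_),
          data(nx_ * ny_ * nz_ * nvar_, 0.0) {}

    // Row-major offset of element (i, j, k, v) in the flat array.
    std::size_t index(std::size_t i, std::size_t j,
                      std::size_t k, std::size_t v) const
    {
        return ((i * ny + j) * nz + k) * nvar + v;
    }

    double& operator()(std::size_t i, std::size_t j,
                       std::size_t k, std::size_t v)
    {
        return data[index(i, j, k, v)];
    }

    double operator()(std::size_t i, std::size_t j,
                      std::size_t k, std::size_t v) const
    {
        return data[index(i, j, k, v)];
    }
};
```

With this layout, loops should be nested as i, j, k, v from outermost to innermost, so that consecutive accesses touch neighbouring memory. If your solver sweeps over one variable at a time, a different index order may suit it better.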


Quote:
Originally Posted by piu58 View Post
located at the stack

Unfortunately, they aren't : https://stackoverflow.com/questions/8036474


Only the std::vector object itself, i.e. its basic housekeeping information (typically a pointer, the size and the capacity), will be on the stack.


However, your data itself is stored in a contiguous 1D block of memory on the heap, and the vector only holds a pointer to it. This allows the vector to point to a different location on the heap in case it needs to change its size.


Of course, the memory layout for multi dimensional vectors is more complicated, as explained above.
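
One way to convince yourself of this is to compare the address of the vector object with the address of its element buffer. The function name below is made up for this sketch, and comparing addresses as integers like this is only meant as a quick experiment on ordinary platforms.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only): returns true if the element buffer
// lies outside the memory occupied by the std::vector object itself.
bool buffer_is_outside_object(const std::vector<double>& v)
{
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&v);
    const auto obj_end   = obj_begin + sizeof(v);
    const auto buf       = reinterpret_cast<std::uintptr_t>(v.data());
    return buf < obj_begin || buf >= obj_end;
}
```

For a local `std::vector<double> v(1000)`, the object `v` sits on the stack and `sizeof(v)` is the same no matter how many elements it holds, while `v.data()` points to the separately allocated buffer.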
piu58 and aero_head like this.

Old   June 28, 2021, 09:46
Default
  #5
Senior Member
 
Joern Beilke
Join Date: Mar 2009
Location: Dresden
Posts: 507
Rep Power: 20
JBeilke is on a distinguished road
Is there a reason not to use the boost library?

https://www.boost.org/doc/libs/1_63_.../doc/user.html

Old   June 28, 2021, 09:57
Default
  #6
Senior Member
 
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8
aerosayan is on a distinguished road
Quote:
Originally Posted by JBeilke View Post
Is there a reason not to use the boost library?

https://www.boost.org/doc/libs/1_63_.../doc/user.html

If you can guarantee good/acceptable performance for your code, then there's nothing wrong with using any library. Personally I don't like using any library except the most standard ones (even there, I don't use every part of the STL for my critical sections), simply because I don't feel comfortable with the hidden abstractions of each library. Most of these libraries do a whole lot of Template Meta-programming and Object Oriented Programming, which tends to worsen compilation time and can hurt code performance.


Other than that, boost seems like a well used library for the portions of the codes that aren't as performance critical. Can be very useful in programming different things for the GUI.


Personally, I don't want the headache of downloading the correct boost version when deploying my code to a different machine.
piu58 and aero_head like this.

Old   July 4, 2021, 03:34
Default
  #7
Senior Member
 
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8
aerosayan is on a distinguished road
I remembered why I used to hate boost library.


Here's a "design rationale" for c++ program for calculating distance between 2 points : https://archive.md/2014.04.28-125041...ry/design.html


Read through it. Read through it and realize how badly they messed up, even for a function as simple as calculating the distance between 2 points. Their whole codebase is bloated, and is now a mental gymnastics routine.


Realize that the boost developers are such geniuses, they've overflowed their integer IQ score variables, and now each have a negative IQ score.


This is why I will never trust boost, or use it in my performance critical code.
sbaffini and mb.pejvak like this.

Old   July 4, 2021, 09:21
Default
  #8
Senior Member
 
piu58's Avatar
 
Uwe Pilz
Join Date: Feb 2017
Location: Leipzig, Germany
Posts: 744
Rep Power: 15
piu58 is on a distinguished road
Dear aerosayan

that example is great! It shows how badly the code is bloated by a so-called well-structured library.

Thank you for correcting my mistake (stack) and for your opinions on using vector. It is not too hard to map the indices into flat memory by multiplication, as you mentioned.


Quote:
Personally I don't like using any library except the most standard ones
Me too. We should keep in mind that our code may still be useful when most of today's libraries are gone or cannot be found anymore. Every use of a library makes your code less readable. Decent C++ code is like a mathematics textbook: it is valid forever.
aerosayan likes this.

Old   August 28, 2021, 17:27
Default
  #9
Senior Member
 
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8
aerosayan is on a distinguished road
>>> TIP 3 : Stop calculating element indices for your matrix class

EDIT1 : After some thorough review, it seems that modern compilers are able to optimize index calculations very well. So, this trick isn't really necessary if your code is being compiled with the latest and greatest compilers. However, if you're supporting older compilers and older hardware, this might be helpful. Do your own performance profiling and assembly code analysis, and don't take everything that's given here as fact.

C++ doesn't have matrices, so many tend to create matrix classes that store the data inside long arrays, but access the matrix data using a row-major or column-major address formula inside an overloaded parenthesis operator. It is a horrible way to do it, since each matrix element's address calculation requires additions and multiplications, and they are not free.

Here's a significantly better method, that's easy to use, and performs well, as per my preliminary analysis. Needs more testing... Needs lots and lots of testing, before I can say for sure that it performs great.

Code:
#include <iostream>

int main()
{
    // linear 1d array containing data for 4x4 2d matrix
    int array[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};

    // we store the beginning of each row, into another array
    int * matrix[4] = { &array[0],  &array[4], &array[8], &array[12] };

    // now access the 1d array, as if it was a 2d matrix
    for(int i=0; i<4; ++i)
    {
        for(int j=0; j<4; ++j)
        {
            // courtesy of c style arrays, you can access pointers as arrays
            // so, we access pointer to the head of each row, as a new array;
            // thus, we essentially access the 1d array as a 2d matrix.
            std::cout << matrix[i][j] << " ";
        }
        std::cout << std::endl;
    }

    return 0;
}
Code:
1 2 3 4 
5 6 7 8 
9 10 11 12 
13 14 15 16
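
The same row-pointer idea with sizes chosen at run time might be sketched as follows. The struct name `RowTable` is made up for this example; it assumes the data is owned by a `std::vector` that is never resized after construction, since resizing would invalidate the stored row pointers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical run-time-sized version of the row-pointer table shown above.
struct RowTable
{
    std::vector<int>  data;  // flat storage, nrows * ncols elements
    std::vector<int*> rows;  // rows[i] points at the first element of row i

    RowTable(std::size_t nrows, std::size_t ncols)
        : data(nrows * ncols, 0), rows(nrows)
    {
        for (std::size_t i = 0; i < nrows; ++i) {
            rows[i] = data.data() + i * ncols;
        }
    }

    // A copy would keep row pointers into the original object's data,
    // so copying is disabled in this sketch.
    RowTable(const RowTable&) = delete;
    RowTable& operator=(const RowTable&) = delete;

    int*       operator[](std::size_t i)       { return rows[i]; }
    const int* operator[](std::size_t i) const { return rows[i]; }
};
```

Elements are then read and written as `m[i][j]`. Note that the pointer table costs extra memory and one extra memory load per row lookup, so it is worth profiling against plain index arithmetic.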
piu58 and aero_head like this.

Last edited by aerosayan; September 4, 2021 at 06:15.

Old   September 1, 2021, 19:17
Default
  #10
Senior Member
 
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
Rep Power: 8
aerosayan is on a distinguished road
I improved the previous code and made it more efficient: no extra memory is required, and we can do a linear traversal through the flat storage, row by row, resulting in more efficient code.


Code:
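#include <cstdio>
#include <iostream>

// thin, non-owning view over a flat array
template<typename tx>
struct dymatrix
{
    tx * head;
    int ecols, erows;
    // note: i is the linear offset of the row start (0, ecols, 2*ecols, ...), not a row index
    inline tx* operator[](int i) { return &head[i]; }
    inline const tx* operator[](int i) const { return &head[i]; }
};

int main()
{
    std::printf("matrix use demonstration...\n");

    float a[100]; for(int i=0; i<100; ++i) a[i] = i;
    dymatrix<float> h;

    h.head = a;   // the array decays to float*, no cast needed
    h.ecols = 10;
    h.erows = 10;

    // i steps through the row starts, j walks along the row
    for(int i=0; i<h.erows*h.ecols; i+=h.ecols)
    {
        for(int j=0; j<h.ecols; ++j)
        {
            const float x = h[i][j]; // mark
            std::cout << x << " ";
        }
        std::cout << std::endl;
    }

    return 0;
}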
Code:
matrix use demonstration...
0 1 2 3 4 5 6 7 8 9 
10 11 12 13 14 15 16 17 18 19 
20 21 22 23 24 25 26 27 28 29 
30 31 32 33 34 35 36 37 38 39 
40 41 42 43 44 45 46 47 48 49 
50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 
70 71 72 73 74 75 76 77 78 79 
80 81 82 83 84 85 86 87 88 89 
90 91 92 93 94 95 96 97 98 99
Code:
             const float x = h[i][j]; // mark
  401c99:    vmovss xmm0,DWORD PTR [rsp+r14*4+0x28] // <- NOTICE THIS!!!
Naive C++ OOP overloaded accessors using formulas like address = ncols*i + j; generally use assembly ADD and MUL instructions to calculate the address, and thus tend to be slow. They're meant for random access, not for linear traversal. Fortran compilers seem to be smarter, and recognize linear traversal through the memory of a 2D matrix.

The assembly generated here computes the address with register based addressing (rsp + r14*4 + 0x28), so most likely it's extremely fast. I have seen Fortran generate such code, and previously I thought that it was slow. But since the address calculation uses registers, it was actually extremely fast.

Obviously, more tests are required. I'm tired. Bye.


EDIT : Modern C++ compilers can optimize naive implementations very well. Although, I like to write code that works well with older compilers too. So, kindly do your own profiling, as you might not need this technique. But it's a good trick, and useful for writing matrix/tensor math libraries.
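
Note that in the dymatrix code above, the argument of `operator[]` is the linear offset of a row start (0, 10, 20, ...), not a row index. A variant that takes a row index and does one multiplication per row lookup could be sketched like this; the name `RowView` is made up for the example, and whether it is any slower is something to profile, as the EDIT says.

```cpp
// Hypothetical variant of dymatrix (name RowView is made up): operator[]
// takes a row index and returns a pointer to the start of that row.
template <typename T>
struct RowView
{
    T*  head;   // non-owning pointer to flat, row-major storage
    int nrows;
    int ncols;

    T*       operator[](int i)       { return head + i * ncols; }
    const T* operator[](int i) const { return head + i * ncols; }
};

// Sums every element by walking the rows in order, so memory is read linearly.
template <typename T>
T sum_all(const RowView<T>& m)
{
    T s = T(0);
    for (int i = 0; i < m.nrows; ++i) {
        const T* row = m[i];   // one multiplication per row
        for (int j = 0; j < m.ncols; ++j) {
            s += row[j];
        }
    }
    return s;
}
```

Fetching the row pointer once per row keeps the inner loop free of index arithmetic, which is close to what the original code achieves by stepping i in units of ecols.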

Tags
c++ tips and tricks
