Motivation and introduction for learning assembly coding

January 14, 2021, 15:32   #1
Senior Member
 
Sayan Bhattacharjee
Join Date: Mar 2020
Posts: 495
I'm really thankful to this forum and its members for helping me out with my hobby project. However, I have seen that many members, even very experienced ones, don't want to get into understanding the machine code (let's say assembly, for brevity) generated by the compiler.


I understand why diving deep into the assembly code might not seem very interesting, or even worth the effort, for most people. It is very useful though, and people are missing out on the benefits of understanding and analyzing the assembly code.


This post is meant to start a discussion, and I'm willing to help anyone interested in learning more.

PART 1 : What are the benefits of understanding assembly code?



Here are a few benefits of understanding assembly code :


- We can estimate how fast a small portion of code will run, since the Intel/AMD CPU manuals document the throughput and latency of each machine instruction (IMUL, ADD, SUB, etc.), and we can check whether the generated assembly uses the widest SIMD vector registers available: XMM (128-bit, okay), YMM (256-bit, fast and most common on the Intel i3/i5/i7 family), or ZMM (512-bit, really fast, and only available on CPUs with AVX-512).


So, if you see that your generated assembly code uses YMM registers, you're in for a good time. That means your code is using AVX/AVX2 vectorization and not the slower SSE vectorization.
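To make that concrete, here is a minimal sketch of the kind of loop worth inspecting. The function name `axpy` and the file name in the comment are made up for illustration, and the compile line assumes GCC or Clang on x86-64. Whether the compiler actually emits YMM instructions depends on its version and flags, so treat this as something to try, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: y[i] = a * x[i] + y[i] over contiguous 1D arrays.
// Assuming GCC or Clang, a command along the lines of
//   g++ -O3 -mavx2 -S axpy.cpp
// writes the assembly to axpy.s. In the loop body, look for instructions
// such as vmulpd / vaddpd operating on ymm registers (256-bit) rather
// than xmm registers (128-bit).
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A simple loop over linear data like this is usually the easiest case for the auto-vectorizer, which makes it a good first thing to look at in the assembly output.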


- If you know how the assembly code works, you know how your data needs to be laid out in order to gain maximum performance. (Hint: 1D arrays are best. Linear access is best.)



You might want to store the conservative variables (CSV) interleaved cell by cell, as vector = [rho, rho*u, rho*v, rho*E][rho, rho*u, rho*v, rho*E]


You might think that having the data close to each other like that will improve your cache performance. You're right. So what's the problem? The problem is that you're most likely stuck with the slower XMM registers. That's what the compiler will generate in order to be safe.



If you want to force the compiler to generate AVX2 vectorized code that uses YMM registers, you can. However, the problem is that in order to use the data in that form (say a CSV to PSV, i.e. conservative to primitive variables, conversion is required), the AVX2 vectorized code has to use machine instructions that do complex operations: interleaving the data, permuting the data, rotating the data, and so on.


This can cause your AVX2 "optimized" code to run slower than the code first generated by the compiler.
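Here is a rough sketch of what the interleaved layout looks like in a CSV to PSV loop. The function name `csv_to_psv_interleaved`, the output ordering [rho, u, v, p] and the ideal-gas pressure formula with `gamma` are my assumptions for illustration, not something fixed by the layout itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interleaved layout: U = [rho, rho*u, rho*v, rho*E] repeated per cell.
// Hypothetical conversion to primitive variables W = [rho, u, v, p],
// assuming an ideal gas with ratio of specific heats gamma.
void csv_to_psv_interleaved(const std::vector<double>& U,
                            std::vector<double>& W,
                            double gamma) {
    assert(U.size() % 4 == 0 && U.size() == W.size());
    const std::size_t ncells = U.size() / 4;
    for (std::size_t i = 0; i < ncells; ++i) {
        // Each variable is read with a stride of 4 doubles. Filling a
        // vector register with "rho of 4 consecutive cells" therefore
        // needs shuffle/permute/gather style instructions.
        const double rho  = U[4 * i + 0];
        const double u    = U[4 * i + 1] / rho;
        const double v    = U[4 * i + 2] / rho;
        const double rhoE = U[4 * i + 3];
        W[4 * i + 0] = rho;
        W[4 * i + 1] = u;
        W[4 * i + 2] = v;
        W[4 * i + 3] = (gamma - 1.0) * (rhoE - 0.5 * rho * (u * u + v * v));
    }
}
```

The arithmetic is the same for every cell, but the stride-4 access pattern is what forces the extra data shuffling when the compiler tries to process several cells per instruction.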


What's the "right" way in this context?
Store the data as vector = [rho, rho, rho, rho][rho*u, rho*u, rho*u, rho*u][rho*v, rho*v, rho*v, rho*v][rho*E, rho*E, rho*E, rho*E] and access the data as arrays U1[i], U2[i], U3[i], U4[i] etc., where you use pointers to set the first element of each of those arrays.


I have done this, and every loop is vectorized to use AVX2 instructions by the compiler!
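A minimal sketch of that layout, under my own assumptions: the `Fields` struct, its `block()` accessor and the function name `csv_to_psv_split` are hypothetical, and the pressure formula assumes an ideal gas with ratio of specific heats `gamma`. This is not my actual solver code, just the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: one 1D buffer holding four contiguous blocks,
// [rho ... rho][rho*u ... rho*u][rho*v ... rho*v][rho*E ... rho*E].
struct Fields {
    std::size_t n;             // number of cells
    std::vector<double> data;  // 4 * n values
    explicit Fields(std::size_t ncells) : n(ncells), data(4 * ncells, 0.0) {}
    // Pointer to the first element of block k (k = 0..3).
    double* block(std::size_t k) { return data.data() + k * n; }
    const double* block(std::size_t k) const { return data.data() + k * n; }
};

// Conservative -> primitive conversion over the split layout.
void csv_to_psv_split(const Fields& cons, Fields& prim, double gamma) {
    assert(cons.n == prim.n);
    const double* U1 = cons.block(0);  // rho
    const double* U2 = cons.block(1);  // rho*u
    const double* U3 = cons.block(2);  // rho*v
    const double* U4 = cons.block(3);  // rho*E
    double* W1 = prim.block(0);        // rho
    double* W2 = prim.block(1);        // u
    double* W3 = prim.block(2);        // v
    double* W4 = prim.block(3);        // p
    for (std::size_t i = 0; i < cons.n; ++i) {
        // Every array is read and written with unit stride, so
        // consecutive cells map directly onto vector lanes.
        const double u = U2[i] / U1[i];
        const double v = U3[i] / U1[i];
        W1[i] = U1[i];
        W2[i] = u;
        W3[i] = v;
        W4[i] = (gamma - 1.0) * (U4[i] - 0.5 * U1[i] * (u * u + v * v));
    }
}
```

Since the compiler cannot always prove that the input and output pointers don't overlap, it may add a runtime overlap check before the vectorized loop. Checking the generated assembly is the way to find out what it actually did.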

In the ideal case, this is up to 4X faster than the serial code for double precision data and up to 8X faster than serial code for single precision data.


That's because YMM registers are 256 bits wide and can fit 4 double precision or 8 single precision values. Every vector operation then adds/multiplies/subtracts/divides all of those numbers (4 or 8) at once in a single instruction!
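The lane arithmetic can be written down in a few lines. The helper name `lanes` is made up for illustration, and the checks assume the usual x86-64 sizes of 8 bytes for `double` and 4 bytes for `float`.

```cpp
#include <cstddef>

// Number of values of type T that fit into a SIMD register of the
// given width in bits (128 for XMM, 256 for YMM, 512 for ZMM).
template <typename T>
constexpr std::size_t lanes(std::size_t register_bits) {
    return register_bits / (8 * sizeof(T));
}

// Assuming 64-bit double and 32-bit float, as on x86-64:
static_assert(lanes<double>(256) == 4, "a YMM register holds 4 doubles");
static_assert(lanes<float>(256) == 8, "a YMM register holds 8 floats");
```

These lane counts are only the upper bound on the speedup per instruction. A loop that is limited by memory bandwidth will see less than that.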

If your code isn't using SIMD vectorization, you're wasting performance.

Sorry, that's mathematically proven.



PART 2 : How do we compile and study the assembly code... coming soon..

 

Tags
assembly code, introduction, tutorial

