www.cfd-online.com
Home > Forums > Software User Forums > SU2

Weak scaling of SU2

May 25, 2021, 05:43   #1
Weak scaling of SU2
Ioannis Mandralis
New Member, Join Date: May 2021, Posts: 7
Dear all,

I am trying to perform a weak scaling analysis of the SU2 code.

I am observing a rapid decrease in weak scaling efficiency as the number of processors increases.

For example, a channel flow with 100,000 points on 1 core is only 78% efficient at 800,000 points on 8 cores. If I go to 27 cores (again keeping a constant 100,000 points per core), the efficiency drops to 38%.
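For reference, these percentages follow from the usual weak-scaling definition E(N) = t(1) / t(N) at fixed points per core. A minimal shell sketch, using hypothetical wall-clock times chosen only to reproduce the figures above (the timings are not real measurements):

```shell
# Weak-scaling efficiency E(N) = t(1) / t(N), points per core held constant.
# Hypothetical timings (seconds), chosen to illustrate 78% and 38%:
t1=100   # 100k points on 1 core
t8=128   # 800k points on 8 cores
t27=263  # 2.7M points on 27 cores
awk -v a="$t1" -v b="$t8"  'BEGIN { printf "8 cores:  %.0f%%\n", 100*a/b }'   # prints 78%
awk -v a="$t1" -v b="$t27" 'BEGIN { printf "27 cores: %.0f%%\n", 100*a/b }'  # prints 38%
```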

I have compiled the code using gcc/10.2.0 with the following flags to tune specifically for an AMD EPYC 7742 architecture:

CXXFLAGS="-O3 -march=znver1 -mtune=znver1 -mfma -mavx2 -m3dnow -fomit-frame-pointer -funroll-loops"

I then configure SU2 using meson:

./meson.py build --optimization=2 -Denable-openblas=true

And I compile using ninja: ./ninja -C build install

Could I be doing something wrong, or is this weak scaling performance expected?

Thanks in advance

May 25, 2021, 07:04   #2
Pedro Gomes (pcg)
Senior Member, Join Date: Dec 2017, Posts: 465
Are those cores on the same processor?

If they are, you should not expect perfect scaling, because you are not perfectly scaling the available resources. For example, the same amount of L3 cache will be divided over more cores, the same for memory bandwidth (which is the main bottleneck for low order codes).

Even for a compute-bound task, as you put more load on the processor the frequency drops...

When we measure scaling we typically increase the number of cluster nodes that we use; I don't think it makes sense to partially load the hardware.
https://www.researchgate.net/publica...zations_in_SU2
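A minimal sketch of what node-level weak scaling could look like under SLURM, fully loading each node (the node and core counts and the config file name are assumptions for illustration, not details from this thread):

```shell
#!/bin/bash
# Hypothetical SLURM job: weak scaling measured by adding fully loaded nodes.
# 128 cores per node assumes a dual-socket EPYC 7742 (2 x 64 cores).
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
#SBATCH --exclusive

# SU2_CFD is SU2's flow solver; channel.cfg is a placeholder config name.
mpirun -n "$SLURM_NTASKS" SU2_CFD channel.cfg
```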

Please read the link I sent you the other day... -O3 gives you worse performance, and the architecture of that processor is znver2, not znver1.
And if you use -march and -mtune, you don't need -mfma -mavx2 -m3dnow.

May 25, 2021, 07:07   #3
Ioannis Mandralis
New Member, Join Date: May 2021, Posts: 7
Dear Pedro,

Yes, the cores are on the same processor. I suspected that this might be causing problems.

So I should distribute over multiple cluster nodes, and each node should be fully loaded (i.e. using all available cores)?

Furthermore, I took the CXXFLAGS from the specification sheet for the particular chips I am using. I will change back to -O2 and adjust the flags you mentioned.

Thanks

May 25, 2021, 08:04   #4
Pedro Gomes (pcg)
Senior Member, Join Date: Dec 2017, Posts: 465
That is how I do it, but those CPUs have a high ratio of cores to memory channels (8:1), so I would not be surprised if at some point the performance got better by using only 32 or 48 cores per processor.
Domain decomposition also introduces some overhead (load balancing issues, redundant computations, etc.). I imagine it will not be a problem at 100k points per core, but it might be something to consider at 5-10k points per core.

To be clear, you only need "-march=znver2 -mtune=znver2 -funroll-loops"; -O2 is implied by --optimization=2.

If you are using schemes that support vectorization, then you may also consider "-ffast-math -fno-finite-math-only" (gcc's fast-math is very mild), because some operations cannot be fully vectorized without it.

Finally, for most fluid problems it is fine to use single precision linear algebra (meson.py ... -Denable-mixedprec=true). I've used it on meshes with 100M+ nodes and very stretched boundary layers without problems.
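Putting the suggestions in this thread together, the reconfigured build might look like the sketch below (the openblas option is carried over from the original post; treat this as an illustration, not a verified recipe):

```shell
# Architecture flags only; -O2 comes from meson's --optimization=2.
export CXXFLAGS="-march=znver2 -mtune=znver2 -funroll-loops"
# Optional, for vectorizable schemes: -ffast-math -fno-finite-math-only

./meson.py build --optimization=2 \
    -Denable-openblas=true \
    -Denable-mixedprec=true    # single-precision linear algebra
./ninja -C build install
```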

Regards,
Pedro

