CFD workstation configuration calling for help 2

July 5, 2020, 02:38   #1
Freewill1
New Member
Join Date: Aug 2014
Posts: 18
Hi,
I posted a thread here previously:

CFD workstation configuration calling for help

After reading Alex's constructive suggestions and doing some research over the past few days, I think my main goals are clearer:
to achieve good overall performance and good scalability for CFD-related jobs, i.e.,
  • OpenFOAM runs of up to 10~100M FV cells
  • algorithm testing and verification for my own CFD code development, aiming at good parallel performance

Both jobs involve a lot of parallel code solving large-scale linear systems, and I don't want hardware limitations to get in the way.
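As a side note on why I keep coming back to memory bandwidth: here is a back-of-envelope roofline estimate for a sparse matrix-vector product, the dominant kernel in most iterative linear solvers. The 12 bytes of traffic per nonzero is my own assumption for double-precision CSR (vector reads ignored), not a measured value.

Code:
# Roofline-style estimate: why iterative solvers hit the memory wall.
# Assumed traffic per nonzero: 8 B matrix value + 4 B column index;
# work per nonzero: 2 flops (one multiply + one add).
bytes_per_nnz = 12.0
flops_per_nnz = 2.0
intensity = flops_per_nnz / bytes_per_nnz      # ~0.17 flop/byte

peak_bw = 204.8e9                              # B/s per EPYC Rome socket
ceiling = intensity * peak_bw / 1e9
print(f"arithmetic intensity: {intensity:.2f} flop/byte")
print(f"bandwidth-bound ceiling: {ceiling:.0f} GFLOP/s per socket")
# The socket's floating-point peak is far higher (~1 TFLOP/s),
# so SpMV saturates memory long before it saturates the cores.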

Therefore, I am wondering whether my key goals can be reached with a well-scaling 64-core, 256-GB-RAM machine (or machines).
If so, I don't need to spend a big budget on the expensive and uneconomical hardware mentioned in the previous thread ($15,000~$20,000: costs a lot but is of little extra use).
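As a quick sanity check that 256 GB covers the mesh sizes I have in mind, a rough estimate; the kB-per-cell figures are my own guesses for a typical double-precision FV run (fields + mesh + solver workspace), not OpenFOAM measurements.

Code:
# Back-of-envelope memory footprint for 10~100M FV cells.
for mcells in (10, 50, 100):
    for kb_per_cell in (0.5, 1.0, 1.5):
        gb = mcells * 1e6 * kb_per_cell * 1024 / 1e9
        print(f"{mcells:3d}M cells @ {kb_per_cell:.1f} kB/cell -> {gb:6.1f} GB")
# Even the pessimistic case (100M x 1.5 kB ~ 154 GB) fits in 256 GB.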

So I think it's rational to pursue good parallelism (scalability).

Given that the per-socket theoretical memory bandwidth (204.8 GB/s for EPYC Rome: 8 channels × 25.6 GB/s for DDR4-3200) acts as the memory wall, it seems that cores per RAM channel is the key figure for CFD scalability.

According to other people's tests (Benchmarking Epyc, Ryzen, and Xeon: Tyranny of Memory), one should not put more than 2~3 cores per channel in a machine.

[Attachment: scalability test.png]

Bearing this in mind, a good hardware configuration for CFD should include as many memory channels as possible for a given budget.

Thus, taking one CPU on one socket as the basic computing unit, the above test suggests (using EPYC Rome CPUs as an example) that the ideal core count per unit, to squeeze out its potential before hitting the 204.8 GB/s memory wall, is:

8 channels × (2~3) cores/channel = 16~24 cores
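To make that rule concrete, a few lines of arithmetic on the per-core share of the theoretical socket bandwidth (where scaling actually falls off in practice is what the linked benchmark measures):

Code:
# Per-core share of the 204.8 GB/s theoretical Rome socket bandwidth.
PEAK_GBS = 204.8      # 8 channels x 25.6 GB/s (DDR4-3200)
CHANNELS = 8
for cores in (8, 16, 24, 32):
    print(f"{cores:2d} cores: {cores / CHANNELS:.1f} cores/channel, "
          f"{PEAK_GBS / cores:5.1f} GB/s per core")
# 16~24 cores leave each core 8.5~12.8 GB/s; at 32 cores the share
# drops to 6.4 GB/s, where bandwidth-bound CFD kernels stop scaling.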

If so, computing units built from >24-core CPUs are not ideal for the desired scalability: one pays much more for extra cores of little use (e.g., the 32-core EPYC 7452).

On the other hand, computing units built from <16-core CPUs (e.g., the 8-core EPYC 7262) are also better avoided because of their low density (only 8 cores occupying a whole socket on the motherboard).

Besides, lower-core-count CPUs have the extra benefit of being cheaper (both in price and in price/core). For each dual-socket node, prices for the key components in my country are:
-CPU:
  • EPYC 7262 ( 8c/3.20GHz)x2: $ 420x2 = $840
  • EPYC 7302 (16c/3.00GHz)x2: $ 970x2 = $1,940
  • EPYC 7402 (24c/2.80GHz)x2: $1,310x2 = $2,620
  • EPYC 7452 (32c/2.35GHz)x2: $1,770x2 = $3,540
  • EPYC 7371 (32c/3.60GHz)x2: $ 950x2 = $1,900 (with 3.6GHz all-core turbo capability, and cheaper)
-RAM:
  • 3200MHz, RECC, 16GBx16 = 256GB (for one dual-socket node): ~$170x16 = $2,720
  • 3200MHz, RECC, 8GBx32 = 256GB (for two dual-socket nodes): ~$105x32 = $3,360

-Motherboard: Supermicro H11DSi Rev2.0: ~$560
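A throwaway script comparing price per core across these dual-socket pairs, using the prices just listed:

Code:
# Price per core for each dual-socket CPU pair (local prices, USD).
pairs = {
    "EPYC 7262 ( 8c)": (2 * 8,  840),
    "EPYC 7302 (16c)": (2 * 16, 1940),
    "EPYC 7402 (24c)": (2 * 24, 2620),
    "EPYC 7452 (32c)": (2 * 32, 3540),
    "EPYC 7371 (32c)": (2 * 32, 1900),
}
for name, (cores, usd) in pairs.items():
    print(f"{name}: ${usd:5d} / {cores} cores = ${usd / cores:5.1f} per core")
# The 7371 pair is by far the cheapest per core (~$30); the Rome
# parts all land in the ~$52-61 per core range.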

There are several options to build a ≥64-core machine or machines:
  • option 1: one-node configuration (4 cores/channel): EPYC 7452 (32 cores each)x2 CPUs x1 node + 16GBx16 RAM + H11DSix1: ~$6,800
  • option 2: one-node configuration (4 cores/channel): EPYC 7371 (32 cores each)x2 CPUs x1 node + 16GBx16 RAM + H11DSix1: ~$5,200
  • option 3: two-node configuration (3 cores/channel): EPYC 7402 (24 cores each)x2 CPUs x2 nodes + 8GBx16x2 RAM + H11DSix2: ~$9,800 (NOTE: 96 cores in total)
  • option 4: two-node configuration (2 cores/channel): EPYC 7302 (16 cores each)x2 CPUs x2 nodes + 8GBx16x2 RAM + H11DSix2: ~$8,400
  • option 5: four-node configuration (1 core/channel): EPYC 7262 (8 cores each)x2 CPUs x4 nodes + 8GBx16x4 RAM + H11DSix4: ~$12,500 (NOTE: a little reluctant here to fill all 64 memory slots with 512GB just to feed the EPYC CPUs' 8-channel capability)
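To keep the totals honest, a short script tabulating the five options from the per-node component prices above:

Code:
# Total cores, cores per memory channel, and rough total price
# per option (2 sockets x 8 channels = 16 channels per node).
options = [
    # (label, cores per CPU, nodes, CPU pair $, RAM $/node, board $/node)
    ("1: 7452x2, 1 node ", 32, 1, 3540, 2720, 560),
    ("2: 7371x2, 1 node ", 32, 1, 1900, 2720, 560),
    ("3: 7402x2, 2 nodes", 24, 2, 2620, 1680, 560),
    ("4: 7302x2, 2 nodes", 16, 2, 1940, 1680, 560),
    ("5: 7262x2, 4 nodes",  8, 4,  840, 1680, 560),
]
for label, cpc, nodes, cpu, ram, board in options:
    cores = 2 * cpc * nodes
    total = nodes * (cpu + ram + board)
    print(f"option {label}: {cores:3d} cores, "
          f"{cores / (16 * nodes):.1f} cores/channel, ~${total:,}")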

Questions:
- What configuration can achieve the best scalability, putting cost aside?
- In view of the rule of thumb that favors no more than 2~3 cores/channel, should I stick to option 3, 4, or even 5? Again, slow node-to-node interconnect, and the potentially cumbersome hardware/software setup, are my concerns (a toy model below puts rough numbers on this).
- Although options 1 and 2 are not good for scalability, their advantages seem obvious:
  • no node-to-node interconnect needed (10Gbps/InfiniBand gear, several OS installs... sounds tedious and repetitive for a two/four-node mini cluster)
  • only one case needed instead of two (also tedious and repetitive otherwise)
  • cheaper as a whole (although budget is not my main concern)
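Here is the toy model I mentioned above: a latency-bandwidth (Hockney-type) estimate of one halo exchange inside a node versus over a 10 Gb/s link. Every number in it (message size, latencies, link speed) is an assumption for illustration, not a measurement.

Code:
# One halo exchange: shared memory in-node vs. a 10 Gb/s network link.
def t_exchange(nbytes, latency_s, bw_bytes_per_s):
    return latency_s + nbytes / bw_bytes_per_s

msg = 200_000                            # ~25k doubles of halo data (assumed)
intra = t_exchange(msg, 1e-6, 10e9)      # ~1 us latency, ~10 GB/s in-node
inter = t_exchange(msg, 50e-6, 1.25e9)   # ~50 us latency, 10 Gb/s link
print(f"intra-node exchange: {intra * 1e6:6.1f} us")   # ~21 us
print(f"inter-node exchange: {inter * 1e6:6.1f} us")   # ~210 us
# ~10x slower across the wire; whether that hurts overall scaling
# depends on how much compute each rank does between exchanges.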
Any suggestions?

Last edited by Freewill1; July 10, 2020 at 02:44.

 

