Dual Epyc 7742 is vastly slower than single threadripper.. help

May 5, 2023, 12:08
  #1
New Member
 
juyoung
Join Date: May 2023
Posts: 4
I've got two systems: a dual Epyc 7742 node and a single Threadripper 5965WX system.

Both have 256 GB of 3200 MHz RAM and run Windows 10 Pro for Workstations.

My main use for ANSYS is explicit dynamics.

However, today I saw that the dual 7742 system is 4x slower than the Threadripper with exactly the same settings; only the core count differs (64 for the 7742 system, 20 for the Threadripper).

Moreover, the 7742 system gets slower as the core count goes up: with 64 cores it shows 2000 h remaining, while with only 16 cores it shows 700 h (the Threadripper shows 400 h).

I have no idea what the problem is.

Could anyone guide me through this?

May 5, 2023, 13:53
  #2
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 316
You have observed water flowing uphill. This cannot be, but without further info and some benchmark test results it is not possible to help.

May 5, 2023, 14:39
  #3
New Member
 
juyoung
Join Date: May 2023
Posts: 4
Quote:
Originally Posted by wkernkamp View Post
You have observed water flowing uphill. This cannot be, but without further info and some benchmark test results it is not possible to help.
PassMark shows slightly low but normal values: CPU Mark = 91858, Single Thread = 2100, Floating Point = 379838.

Cinebench R23 also shows normal values, greatly exceeding the Threadripper.

Exact setup: Dual 7742 / Supermicro H12DSi / 3200mhz samsung 256g

If I use 127 cores (SMT off), the same simulation shows ~4000 h remaining.
For 60 cores it shows 1400-2000 h, and for 16 cores, 700-900 h.

I have no idea what the problem might be.

May 5, 2023, 15:07
  #4
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Quote:
However, today I saw that the dual 7742 system is 4x slower than the Threadripper with exactly the same settings; only the core count differs (64 for the 7742 system, 20 for the Threadripper).

Moreover, the 7742 system gets slower as the core count goes up: with 64 cores it shows 2000 h remaining, while with only 16 cores it shows 700 h (the Threadripper shows 400 h).
You should probably test the scaling behavior on the TR system as well.
I.e. see what time it reports for e.g. 4, 8, 20 and 24 threads. Maybe we see the same trend there, in which case this is not a hardware issue.
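
A minimal sketch of how such a scaling test could be evaluated once real wall-clock times are available. The function name `scaling_table` and the timings in `example` are placeholders for illustration, not measurements from either system in this thread.

```python
def scaling_table(wall_times: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (threads, speedup, efficiency) relative to the smallest thread count."""
    base_threads = min(wall_times)
    base_time = wall_times[base_threads]
    rows = []
    for threads in sorted(wall_times):
        # Speedup compared with the baseline run.
        speedup = base_time / wall_times[threads]
        # Efficiency: achieved speedup divided by the ideal speedup (threads / base_threads).
        efficiency = speedup * base_threads / threads
        rows.append((threads, speedup, efficiency))
    return rows


# Placeholder wall-clock times in hours, NOT measurements from this thread.
example = {4: 40.0, 8: 22.0, 20: 12.0, 24: 11.5}
for threads, speedup, efficiency in scaling_table(example):
    print(f"{threads:3d} threads: speedup {speedup:5.2f}x, efficiency {efficiency:6.1%}")
```

If efficiency drops off in a similar way on both machines, that would suggest the solver or the model limits scaling rather than the Epyc hardware.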

Aside from that: memory population matters A LOT on dual-socket Epyc systems.
"3200mhz samsung 256g" still leaves a lot of room for interpretation. On an H12DSi motherboard, this should be populated as 16x16GB. Can you confirm that?

May 5, 2023, 15:11
  #5
New Member
 
juyoung
Join Date: May 2023
Posts: 4
Quote:
Originally Posted by flotus1 View Post
You should probably test the scaling behavior on the TR system as well.
I.e. see what time it reports for e.g. 4, 8, 20 and 24 threads. Maybe we see the same trend there, in which case this is not a hardware issue.

Aside from that: memory population matters A LOT on dual-socket Epyc systems.
"3200mhz samsung 256g" still leaves a lot of room for interpretation. On an H12DSi motherboard, this should be populated as 16x16GB. Can you confirm that?
Thank you. I will try to test the behavior.

And yes, the RAM is populated as 16 x 16 GB, one DIMM in each of the 16 slots. I ran some more benchmarks and the hardware at least seems fine…
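
To illustrate why the population matters, here is a rough sketch of theoretical peak memory bandwidth. It assumes "3200mhz" means DDR4-3200 (3200 MT/s), 8 memory channels per Epyc 7742 socket, and one DIMM slot per channel on the board. The 8 x 32 GB case is a hypothetical comparison, not a configuration from this thread, and sustained bandwidth in practice is lower than these peak figures.

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s: channels x transfer rate x 8 bytes per 64-bit transfer."""
    return channels * mega_transfers_per_s * bus_bytes / 1000


# 16 x 16 GB on two sockets: all 8 channels per socket populated (assumption stated above).
full = 2 * peak_bandwidth_gb_s(channels=8, mega_transfers_per_s=3200)

# Hypothetical 8 x 32 GB: same capacity, but only 4 channels per socket populated.
half = 2 * peak_bandwidth_gb_s(channels=4, mega_transfers_per_s=3200)

print(f"16 x 16 GB: {full:.1f} GB/s peak, 8 x 32 GB: {half:.1f} GB/s peak")
```

Same 256 GB capacity either way, but the half-populated layout would have only half the peak bandwidth, which is why the DIMM count was worth confirming.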

May 5, 2023, 16:45
  #6
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Quick question: how many nodes does your mesh contain?

May 5, 2023, 16:48
  #7
New Member
 
juyoung
Join Date: May 2023
Posts: 4
Quote:
Originally Posted by flotus1 View Post
Quick question: how many nodes does your mesh contain?
The model has about 600k nodes in total.

May 5, 2023, 16:50
  #8
Super Moderator
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
All right, just wanted to make sure this is not a memory capacity problem. Which it isn't.

May 6, 2023, 06:04
  #9
New Member
 
Gokhan
Join Date: Dec 2020
Location: Stockholm
Posts: 5
The core count may be factored into the estimate, like this:
2000 hours / 64 cores = 31.25 hours

May 6, 2023, 15:14
  #10
Senior Member
 
Will Kernkamp
Join Date: Jun 2014
Posts: 316
Quote:
Originally Posted by GC94 View Post
The core count may be factored into the estimate, like this:
2000 hours / 64 cores = 31.25 hours

64 x 31.25 = 2000

So the estimate is based on the number of rows solved by one processor compared to the total number of rows that need to be solved for the requested number of iterations. In future, you can simply divide the time estimate by the number of processors engaged to get the true time estimate.

The Threadripper does appear to be faster (if you use my estimator). This may be because at 16 cores, eight memory channels are not a limitation, so the Threadripper's higher clock then makes a difference. In addition, the problem size is smallish, so the cache is more effective, further reducing bandwidth issues. It would be nice to learn the actual run times for the cases that you provided an ANSYS estimate for.
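
A quick sketch applying this divide-by-cores rule to the figures reported earlier in the thread. Whether ANSYS really sums the estimate over all cores is an assumption that the thread does not confirm. The function name `per_core_estimate` is made up for illustration, and the 127-core figure was given only as approximate.

```python
# "Time remaining" figures as reported in the thread: (system, cores, hours shown).
# Where a range was given later, the figure from the first post is used.
reported = [
    ("Dual Epyc 7742", 127, 4000),  # reported as "~4000h"
    ("Dual Epyc 7742", 64, 2000),
    ("Dual Epyc 7742", 16, 700),
    ("Threadripper 5965WX", 20, 400),
]


def per_core_estimate(hours: float, cores: int) -> float:
    """Assumed rule of thumb: divide the displayed estimate by the cores in use."""
    return hours / cores


for system, cores, hours in reported:
    adjusted = per_core_estimate(hours, cores)
    print(f"{system:22s} {cores:4d} cores: {hours:5d} h shown -> {adjusted:6.2f} h")
```

Under that assumption the 64-core Epyc run comes out at 31.25 h and the 20-core Threadripper run at 20 h, so the Threadripper would still be ahead, but by far less than the raw numbers suggest. Actual run times are still needed to confirm this.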
