CPU+GPU workstation for chess

Old   November 10, 2022, 20:56
Default
  #21
New Member
 
Tansel
Join Date: Nov 2022
Posts: 21
Rep Power: 3
tturgut is on a distinguished road
I am having difficulty installing Windows 10 in UEFI mode on this system.
Do I need to change any of the BIOS settings?

Old   November 11, 2022, 02:54
Default
  #22
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
I also had some issues when installing Win10 on an H11DSi back in the day.
I don't remember exactly, but toggling the setting for "above 4G decoding" and/or "IOMMU" fixed it for me. Maybe this still helps on the H12 series.

Old   November 11, 2022, 21:28
Default
  #23
New Member
 
Tansel
Join Date: Nov 2022
Posts: 21
Rep Power: 3
tturgut is on a distinguished road
I am stuck. :-(

I can't install Windows on this system.

The W10 installation didn't work.
W11 went through, but when it starts, it shows a blue screen that says HYPERVISOR ERROR.

Any help is greatly appreciated...

Old   November 12, 2022, 05:12
Default
  #24
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
I am not going into Windows 11.

With 10, do you have some of its hardware limitations in mind?
There are different values floating around on the internet. With the way Win10 handles high core count CPUs -splitting them into "sockets" artificially- while only supporting 2 sockets, 256 threads might just be too much for Windows 10 Pro.
Did you try disabling SMT, to see if it boots into Windows at all with a lower thread and socket count?
And I'm moderately certain that disabling "above 4G decoding" is what helps Supermicro boards boot into unsupported Windows versions. It shouldn't, but many people report that it does.
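
As a rough sketch of the arithmetic behind that concern (in Python; the function names are made up for illustration, and the 2 x 64-core layout is an assumption chosen to match the 256 threads mentioned above): Windows schedules logical processors in processor groups of at most 64, so a machine like this needs several groups, and disabling SMT halves the count.

```python
import math

# Windows places logical processors into "processor groups" of at most 64.
MAX_LOGICAL_PER_GROUP = 64


def logical_processors(sockets, cores_per_socket, smt):
    """Total hardware threads the OS will see."""
    threads_per_core = 2 if smt else 1
    return sockets * cores_per_socket * threads_per_core


def min_processor_groups(logical):
    """Smallest number of 64-wide groups needed to hold all threads."""
    return math.ceil(logical / MAX_LOGICAL_PER_GROUP)


# Assumed layout: 2 sockets with 64 cores each.
for smt in (True, False):
    n = logical_processors(sockets=2, cores_per_socket=64, smt=smt)
    print(f"SMT {'on' if smt else 'off'}: {n} threads, "
          f"at least {min_processor_groups(n)} processor groups")
```

This only counts threads. It says nothing about whether a particular Windows edition will actually boot with that many.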

Then there is the basic stuff. The number of times I had to just redo a USB stick for Windows 10 to get the installation working is surprisingly high.
Does the system boot into Linux from a USB stick? At which point exactly does the installation fail for Windows 10?
But start with clearing CMOS. On these boards, that involves removing the CMOS battery, and then bridging two contact pads on the board. Check the manual. Then apply only the settings you need.
Updating to the latest BIOS version is the next step.

Last edited by flotus1; November 12, 2022 at 07:20.

Old   November 14, 2022, 07:18
Default
  #25
New Member
 
Tansel
Join Date: Nov 2022
Posts: 21
Rep Power: 3
tturgut is on a distinguished road
The setup:
1) SUPERMICRO MBD-H12DSi-N6-O motherboard

https://www.supermicro.com/en/produc...ard/H12DSi-NT6 https://www.supermicro.com/manuals/m...0/MNL-2363.pdf

2) CPU: 2x AMD Epyc 7V12 processors (64 cores each, 240 W TDP, 2.45 GHz)

3) Memory: 16x 32GB 4DRx4 DDR4-2133 (PC4-17000) LRDIMM ECC

https://www.ebay.com/itm/224440216145

4) CPU coolers: Noctua NH-U14S TR4-SP3

https://www.amazon.com/dp/B074DX2SX7...roduct_details

5) PSU: Corsair HX1500i

https://www.amazon.com/dp/B074DX2SX7...roduct_details

I did research before buying every single component.

The board POSTs and I can get into the BIOS.

But there are big problems. I have a hard time downloading Windows 10, and the MemTest86 run fails (it stops right at the beginning).

Supermicro IPMI sees both processors and all 16 memory sticks and reports them as healthy. (I also reduced to 2 RAM sticks, one for each processor, and they are reported as healthy too.)

Sometimes, before POST completes, "CPU initialization" fails and the screen gets stuck at code 68.

------------------------------------------------------------------------------
Questions:
1) Where is the problem? Does anyone have any ideas?

2) CPU installation: what is the correct CPU torque? I had an s20-tip screwdriver, but I am not sure about the amount of force. What do I do?

3) MemTest86 flash drive: it reports "CPU found 0, CPU started 1" and stops immediately (I have a screenshot, if it would help). I tried with 16 RAM sticks and with 2.
???
_____________________________________________

thanks!!

Old   November 14, 2022, 10:58
Default
  #26
New Member
 
Tansel
Join Date: Nov 2022
Posts: 21
Rep Power: 3
tturgut is on a distinguished road
Where I am right now:
It POSTs and goes into the BIOS fine.
IPMI: I see 2x CPUs and the memory as healthy. (I tried 16 and 2 memory sticks; all are shown as healthy.)

MemTest86 (latest version) starts and stops immediately, like a hardware failure.

Also, occasionally during POST it gets stuck (and needs rebooting) at "CPU initialization", code 68.
CPU installation was fine.

The case standoffs were very small; I wonder if it may be related to that.
----------------------------------
I will try 1 CPU and 1 RAM stick tonight and see what happens.

thanks for all the input!!

Old   November 14, 2022, 11:26
Default
  #27
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
If you suspect problems with standoffs or clearance on the back of the board, it is definitely a good idea to do further testing outside of the case. You would not be the first one with that problem. Lay the board on a flat, non-conductive surface. Some cardboard for example.
And like we already discussed previously via PM, re-seating the CPUs while carefully following the installation instructions and torque specs might help.
Maybe one of your friends can lend you some RDIMM modules that are known to be healthy. Good luck!

Last edited by flotus1; November 15, 2022 at 04:54.

Old   November 16, 2022, 13:38
Default
  #28
New Member
 
Tansel
Join Date: Nov 2022
Posts: 21
Rep Power: 3
tturgut is on a distinguished road
I think it is most likely a RAM problem.

I was occasionally getting stuck at the "CPU initialization" ... 68 screen.

I removed the RAM sticks and left one on each CPU (3rd slot); with 2 RAM sticks it worked. Then 4 and 8 worked.

But 16 x 16GB fails. This is 4x4 rank memory. 16 x 16GB of 4x4 RAM is not compatible for some reason.

IPMI sees all 16 RAM sticks as healthy.

I ordered new RAM, 16x16GB dual rank.

Old   November 17, 2022, 19:29
Default
  #29
New Member
 
Tansel
Join Date: Nov 2022
Posts: 21
Rep Power: 3
tturgut is on a distinguished road
1) Supermicro says: POST Code 68 is PCI host bridge initialization

2) new ram coming tomorrow (16x16GB, dual rank)

3) CPU temps are 36C, but the NB temp is 76C! How high is OK?
This is a server motherboard; shouldn't it run 24/7? Isn't this temp a problem?

thanks!

Old   November 18, 2022, 03:42
Default
  #30
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
With it being a server board, the assumption on everything cooling related is that there is a significant amount of airflow over all components of the board. Hidden deep within the documentation, there might even be a spec for linear airflow over the board.
That's one more reason why you can't have too many case fans when using such a board in a workstation.
But 75°C on the NB is nothing to worry about. If it goes above 90°C when you start using the thing at full power you might need to rethink the cooling situation. And don't be alarmed, VRMs can go over 100°C with your setup. That's only a problem if it starts throttling the CPUs.

Last edited by flotus1; November 18, 2022 at 05:18.

Old   November 21, 2022, 11:45
Default
  #31
New Member
 
Tansel
Join Date: Nov 2022
Posts: 21
Rep Power: 3
tturgut is on a distinguished road
1) SUPERMICRO MBD-H12DSi-N6-O motherboard

https://www.supermicro.com/en/produc...ard/H12DSi-NT6 https://www.supermicro.com/manuals/m...0/MNL-2363.pdf

How many NVMe slots does it have?

There is one M.2 slot on the board.
---------------------------------------------------------------------------------
But there are also CN1/CN2 (next to the 24-pin power connector).

It says NVMe. Can I use these 2 as well?

Old   November 21, 2022, 13:35
Default
  #32
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
Maybe if you found the right cables/adapters for it. I don't think you can plug an m.2 SSD into these ports.

Old   November 21, 2022, 14:01
Default
  #33
New Member
 
Tansel
Join Date: Nov 2022
Posts: 21
Rep Power: 3
tturgut is on a distinguished road
something like these?
Does anyone have an idea which cable?

https://www.amazon.com/Bewinner-Conn...050869&sr=8-13

https://www.amazon.com/dp/B07VB6L8SJ...v_ov_lig_dp_it

Old   November 21, 2022, 16:04
Default
  #34
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
That's one of those questions the folks over on https://forums.servethehome.com/index.php could answer.

Old   November 21, 2022, 22:15
Default
  #35
New Member
 
Tansel
Join Date: Nov 2022
Posts: 21
Rep Power: 3
tturgut is on a distinguished road
Someone said it looks like an "OCuLink SFF-8611"? Not sure.

Now I have another question: when I run chess programs on 256 cores, CPU0 is running at 100%, but CPU1 is running at 50%. Why?

(Torque screwdriver, HT1 thermal paste etc. were used. Temps are good on the CPUs and RAM.)
Why are both not running at 100%?

Old   November 21, 2022, 22:21
Default
  #36
New Member
 
Tansel
Join Date: Nov 2022
Posts: 21
Rep Power: 3
tturgut is on a distinguished road
Now I have another question, about running chess programs on 256 cores.
All 16 RAM slots are full with 16GB each. IPMI shows all 16 RAM sticks and 2 CPUs as healthy.

But CPU0 is running at 100%, while CPU1 is running at 50%. Why?

I don't think there were any CPU installation errors. (Torque screwdriver, HT1 thermal paste etc. were used.)
Temps are good on the CPUs and RAM.

Why are both not running at 100%?

Old   November 22, 2022, 05:54
Default
  #37
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
Now those are definitely questions for experts in the chess program you use. Remember, this is a CFD forum.
Maybe it just doesn't scale well with this many threads? Or maybe it needs some special settings to run on NUMA machines or high thread count in general? Even if I knew which programs you are using, I probably could not help you with that.

As for "temps are good":
With this kind of workstation, you need to monitor additional values. The CPU VRMs in particular, and the memory VRMs to a lesser extent. And CPU core frequencies as well, because that's what matters in the end. If anything else caused throttling, you will see it there.

Old   November 23, 2022, 22:55
Default
  #38
New Member
 
Tansel
Join Date: Nov 2022
Posts: 21
Rep Power: 3
tturgut is on a distinguished road
This computer is too strong for chess players; unfortunately, they do not have much experience with this kind of setup.

1) With IPMI, I am able to get the VRM temps (they are high: 65-75 at most).
How do I decrease these temps?

I put a bunch of Noctua air coolers everywhere I could, and an RTX 4090 video card is installed now. (It is the liquid-cooled MSI Suprim Liquid AIO, which takes up the space of 240 mm fans, but cools well.)

2) VRM: is putting a couple of small (10-15 mm) passive heat sinks on them OK? (Or would it help at all?)
Some people are putting a 40 mm fan in between the 2 processors.

3) The fans (especially the CPU fans) keep cycling at idle, spinning up every 10-15 seconds. Is there a way I can silence them? (CPU temps are 35, very good, but it does not want to stay idle.)

thanks!

Old   November 24, 2022, 12:25
Default
  #39
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
Quote:
1) With IPMI, I am able to get the VRM temps (they are high: 65-75 at most).
How do I decrease these temps?
Again, that's a perfectly fine temperature for VRMs. The components should be designed for something like 120°C. Throttling territory begins sooner, but definitely not lower than 90°C.
Better cooling can be done with a separate water block on the VRMs in case of water cooling.
For air cooling, two 40mm fans with 10mm height can be wedged between the heatsink on the CPU VRMs and the CPU coolers.

Quote:
2) VRM: is putting a couple of small (10-15 mm) passive heat sinks on them OK? (Or would it help at all?)
Some people are putting a 40 mm fan in between the 2 processors.
Been there, done that.
Keep in mind that the CPU VRMs already have heatsinks on them. Those could be cooled with additional small fans if necessary.
And I glued some small heatsinks on the bare memory VRMs, because I have a memory-intensive workload and low airflow. Hard to tell how much of a difference that makes, I had too much time on my hands during a lockdown.
Again, none of this is necessary with 75°C on the VRMs.
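
For anyone who wants to automate that check, here is a minimal sketch in Python. It assumes temperature readings in the text layout that `ipmitool sdr type Temperature` prints. The sensor names and values in `SAMPLE` are made up for illustration, and the 90°C limit is just the rule of thumb from this post, not a datasheet value.

```python
import re

# Rule-of-thumb figure from this thread, not a datasheet value.
THROTTLE_WATCH_C = 90

# Illustrative sample only: sensor names and readings are invented,
# laid out in the style of `ipmitool sdr type Temperature` output.
SAMPLE = """\
CPU1 Temp        | 01h | ok  |  3.1 | 36 degrees C
CPU2 Temp        | 02h | ok  |  3.2 | 35 degrees C
CPU1_VRM Temp    | 10h | ok  |  8.1 | 75 degrees C
CPU2_VRM Temp    | 11h | ok  |  8.2 | 93 degrees C
"""

# First column is the sensor name, last column is "<number> degrees C".
LINE = re.compile(r"^(?P<name>[^|]+?)\s*\|.*\|\s*(?P<temp>\d+)\s+degrees C\s*$")


def parse_temps(text):
    """Map sensor name -> temperature in degrees C; skip unparsable lines."""
    temps = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            temps[m.group("name")] = int(m.group("temp"))
    return temps


def hot_sensors(temps, limit=THROTTLE_WATCH_C):
    """Names of sensors at or above the limit, sorted alphabetically."""
    return sorted(name for name, t in temps.items() if t >= limit)


print(hot_sensors(parse_temps(SAMPLE)))
```

With the invented sample, only the sensor at 93°C is flagged. A reading of 75°C stays well below the limit.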

Quote:
3) The fans (especially the CPU fans) keep cycling at idle, spinning up every 10-15 seconds. Is there a way I can silence them? (CPU temps are 35, very good, but it does not want to stay idle.)
Known and old problem with Supermicro boards.
You can search in the IPMI web interface if they allow you to change fan thresholds there. It was not possible in the past.
Otherwise the solution is this: https://calvin.me/quick-how-to-decre...-fan-threshold
No idea how to do any of this in Windows.
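
As far as I recall, the usual fix is to lower the lower RPM thresholds of the fan sensors with `ipmitool sensor thresh`, so that slow-spinning fans no longer trip the BMC's alarm. A hedged sketch in Python that only builds the command lines and does not run anything; the sensor names and RPM values are hypothetical placeholders, so check what your BMC actually reports before applying anything like this.

```python
def fan_threshold_cmd(sensor, lnr, lcr, lnc):
    """Build an `ipmitool sensor thresh <sensor> lower ...` command.

    lnr / lcr / lnc are the lower non-recoverable, lower critical and
    lower non-critical thresholds in RPM, and must be in ascending order.
    """
    if not (lnr <= lcr <= lnc):
        raise ValueError("thresholds must satisfy lnr <= lcr <= lnc")
    return ["ipmitool", "sensor", "thresh", sensor, "lower",
            str(lnr), str(lcr), str(lnc)]


# Hypothetical sensor names and RPM values, for illustration only.
for fan in ("FAN1", "FAN2"):
    print(" ".join(fan_threshold_cmd(fan, 100, 200, 300)))
```

Since ipmitool can also talk to the BMC over the network (`-I lanplus -H <bmc-ip> -U <user> -P <password>`), this should in principle not depend on the OS installed on the workstation itself.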

Old   November 24, 2022, 12:54
Default
  #40
Super Moderator
 
flotus1's Avatar
 
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,399
Rep Power: 46
flotus1 has a spectacular aura aboutflotus1 has a spectacular aura about
Also, my guess is that you are using Stockfish. There are quite a few people running it on high-end hardware like this.
https://ipmanchess.yolasite.com/amd-...-stockfish.php
https://github.com/official-stockfis...sh/issues/2448
https://forums.servethehome.com/inde...3/#post-355864

There is even a discord and a forum linked on https://stockfishchess.org/get-involved/
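
If it is Stockfish, thread count and hash size are set through the standard UCI options `Threads` and `Hash`. A minimal sketch in Python that builds the command sequence a GUI or script would send to the engine. The hash size used here is an arbitrary placeholder, not a recommendation, and whether 256 threads actually scales is a question for the Stockfish folks.

```python
def uci_setup(threads, hash_mb):
    """Return the UCI commands that set thread count and hash size (in MB)."""
    if threads < 1 or hash_mb < 1:
        raise ValueError("threads and hash_mb must be positive")
    return [
        "uci",                                      # start the UCI handshake
        f"setoption name Threads value {threads}",  # number of search threads
        f"setoption name Hash value {hash_mb}",     # hash table size in MB
        "isready",                                  # wait until options are applied
    ]


# 256 threads as discussed in this thread; 65536 MB is a placeholder value.
for cmd in uci_setup(threads=256, hash_mb=65536):
    print(cmd)
```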