SM issues H12DSi-N6

February 8, 2023, 19:34   #21
wkernkamp (Will Kernkamp), Senior Member
Join Date: Jun 2014
Posts: 343
There is an AMI BIOS POST code manual on the Supermicro website. The one I found applies to boards up to the X11 generation; I am not sure whether anything has changed since. Code 94 indicates "PCI Bus Enumeration", so it might be worth checking whether there are any other PCI devices in your system besides the NVMe SSD that you already removed and reinstalled.


The problem is almost never with the CPU. The RDIMMs usually are fine too. Even the up-clocked components from China did work at the up-clocked speed. The sellers didn't realize they were dealing with Sherlock Holmes at the time, haha.


Memory problems in most cases are caused by the motherboard. The number one cause is bent pins in the socket. This can sometimes be addressed by carefully straightening the pins. Use your phone to take a good photo that you can inspect at your leisure. If you find any issue, send the photo to the seller and, if you are up to it, ask whether they want you to attempt to straighten a pin. I have taken this approach with a cooperative seller. Very satisfying when it succeeded. However, I think you are getting to the point where you are entitled to return the motherboard (in my opinion).

February 8, 2023, 19:42   #22
klove007 (QC), New Member
Join Date: Feb 2023
Posts: 16
Appreciate that, I found the same guide. It was just the last code I saw before posting.

I have since removed the NVMe (even though I don't think it was causing issues, just in case).

I have narrowed it down to 7 DIMMs that work in the slots where the other DIMMs don't, so I believe it's not the motherboard.

I have 5 that seem to be installed in the empty slots but are not detected, and 4 that cause the system not to POST.

I am going to try them in all the other slots tonight and see if I can get it to POST with more than 448 GB.

Each DIMM-per-slot test takes about 90 seconds, which is the POST time to the BIOS when it works.

I have also cleaned the ones that do not POST, to ensure good contact.
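
As a rough check on the numbers in this post, here is a minimal Python sketch of the bookkeeping. The module size (64 GB) and the 90-second POST time come from the thread. The 16-slot count is an assumption taken from the 16 modules mentioned in the earlier post, and the function names are made up for illustration.

```python
from collections import Counter

DIMM_GB = 64           # module size mentioned in the thread
SECONDS_PER_TEST = 90  # POST time per attempt, as reported above

# Outcome counts reported in the post: 7 + 5 + 4 = 16 modules.
results = Counter({"works": 7, "not_detected": 5, "no_post": 4})

def usable_capacity_gb(results, dimm_gb=DIMM_GB):
    """Capacity the board should report with only the working modules installed."""
    return results["works"] * dimm_gb

def retest_minutes(n_dimms, n_slots, seconds_per_test=SECONDS_PER_TEST):
    """Time to try each listed module in each listed slot, one POST per attempt."""
    return n_dimms * n_slots * seconds_per_test / 60

suspect = results["not_detected"] + results["no_post"]
print(usable_capacity_gb(results))   # 448
print(retest_minutes(suspect, 16))   # 216.0 (assumes 16 slots)
```

So trying all 9 suspect modules in every slot, one at a time, would take on the order of three and a half hours of POST time alone, which may be a reason to test only a few representative slots first.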

Last edited by klove007; February 8, 2023 at 20:50.

February 8, 2023, 23:50   #23
wkernkamp (Will Kernkamp), Senior Member
Nice progress!


Once you have a known configuration that works, the memory training time will probably be shorter on subsequent boots.


Is there anything special about the DIMMs that do not work?

February 14, 2023, 01:18   #24
aparangement (Yan), Member
Join Date: Dec 2013
Location: Milano
Posts: 42
On most motherboards designed for ordinary desktop computers, booting the system without any memory installed automatically resets the BIOS.

I am not sure whether an EPYC motherboard would do the same, but you could try.

Quote:
Originally Posted by klove007
Been troubleshooting all night and can't seem to figure this out. I checked out others with similar problems and reviewed all the posts here.

I cannot locate the jumper to reset the BIOS settings, and it appears not to be in the latest online manual (even though the manual makes reference to it).

I currently only have the board installed externally, with two 7542s. It would not POST with 16 x 64 GB memory modules on the board.

Tried one in the C1 slot of CPU1: it detects 64 GB on boot-up and 64 GB in the BMC, when it POSTs (most of the time).

Tried two in the C1 and D1 slots of CPU1: it detects 64 GB on boot-up and 64 GB in the BMC, when it POSTs (sometimes).

Sometimes I can get it to POST with 4 on CPU1 and 4 on CPU2 as per the manual; the BIOS shows 192 GB, and the BMC shows 1 stick of 64 GB.

I'll continue to try different combos to rule out whether it is the memory.

Appreciate the insight and help from you all.

February 19, 2023, 01:25   #25
sharonyue (Dongyue Li), Senior Member
Join Date: Jun 2012
Location: Beijing, China
Posts: 841
Well, this is a common, even very common, issue when a workstation is assembled by hand (whether by a vendor's technician or by you). I would suggest installing the memory modules one by one. I assume that you cannot remove the CPUs, so you have to install one module for CPU1 and one for CPU2. If that works, install more, until it does not work anymore. From my experience (I have assembled around 1000 workstations like this):

1) If some modules are not detected, there are two minor causes: the CPU is not seated well, or the memory is not seated well. There are also several serious causes: a CPU problem (it loses one or several memory channels), a motherboard problem (it loses one or several memory channels), or a memory problem (but the module can be replaced easily).
2) Sometimes the system detects all the modules, then reboots by itself and one module is missing, but that module comes back after several boots. In this case, the module is nearly dead. Replace it. There are no other issues.
3) All these issues should be handled BEFORE the workstation is shipped to the user. Before shipping, workstations should also be tested at full 100% CPU load for dozens of hours to catch ANY hardware failure. Ideally, you should never become aware that such an issue existed, since the technician has already fixed it.
4) The reason I say this issue is pretty common: for an experienced technician, about one in 10 workstations has this kind of problem. For a newcomer, nearly every workstation does. It simply comes down to their installation technique and methodology.
5) If you need help, send me an email and I can try to help you out.
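
Once the machine boots into Linux, one way to check which modules were actually detected after each change is to parse the output of `dmidecode -t memory` (run as root). The sketch below is only an illustration under assumptions: the slot names in the sample text are made up, the helper names are hypothetical, and real output contains many more fields than the two used here.

```python
import re

def parse_memory_devices(dmidecode_text):
    """Return one dict per 'Memory Device' record in `dmidecode -t memory` output."""
    devices = []
    current = None
    for line in dmidecode_text.splitlines():
        stripped = line.strip()
        if stripped == "Memory Device":
            current = {}
            devices.append(current)
        elif not stripped:
            current = None          # a blank line ends the current record
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return devices

def size_gb(size_field):
    """Convert a Size field such as '64 GB' or '65536 MB' to GB; 0 for empty slots."""
    match = re.fullmatch(r"(\d+)\s*(MB|GB)", size_field)
    if not match:
        return 0                    # e.g. 'No Module Installed'
    value, unit = int(match.group(1)), match.group(2)
    return value if unit == "GB" else value // 1024

def summarize(dmidecode_text):
    """Map each populated slot locator to its size in GB."""
    populated = {}
    for dev in parse_memory_devices(dmidecode_text):
        gb = size_gb(dev.get("Size", ""))
        if gb:
            populated[dev.get("Locator", "?")] = gb
    return populated

# Made-up sample in the general shape of dmidecode output (not from a real board).
SAMPLE = """\
Memory Device
\tSize: 64 GB
\tLocator: DIMMA1

Memory Device
\tSize: No Module Installed
\tLocator: DIMMB1

Memory Device
\tSize: 65536 MB
\tLocator: DIMMC1
"""

found = summarize(SAMPLE)
print(found)                 # {'DIMMA1': 64, 'DIMMC1': 64}
print(sum(found.values()))   # 128
```

Comparing this summary against the slots you physically populated, boot after boot, would show both the "never detected" case from point 1 and the "comes and goes" case from point 2.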
__________________
My OpenFOAM algorithm website: http://dyfluid.com
By far the largest Chinese CFD-based forum: http://www.cfd-china.com/category/6/openfoam
We provide lots of clusters to Chinese customers, and we are considering to do business overseas: http://dyfluid.com/DMCmodel.html

March 16, 2023, 05:27   #26
klove007 (QC), New Member
Finally got replacement Samsung RAM, and it turns out the issue was that the Hynix modules either weren't 100% supported on the board or were well used and defective.

I do continue to have an issue on reboot, where the system does not POST and the screen stays black after the reboot.

Typically a power cycle through the BMC/IPMI, or disconnecting the power cables, corrects this, and the system boots and POSTs just fine until the next reboot.
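
For reference, the BMC power cycle described here can also be scripted with `ipmitool`. The following is a minimal sketch, assuming `ipmitool` is installed on another machine, the BMC is reachable over the network, and the password is supplied in the `IPMI_PASSWORD` environment variable (which is what the `-E` option reads). The host address, user name, and function names are placeholders, not values from this thread.

```python
import subprocess

def ipmi_command(host, user, *args):
    """Build an ipmitool command line for a remote BMC over the lanplus interface.

    -E makes ipmitool read the password from the IPMI_PASSWORD environment variable.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *args]

def power_status(host, user):
    """Return the BMC's reported chassis power state as text."""
    cmd = ipmi_command(host, user, "chassis", "power", "status")
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()

def power_cycle(host, user):
    """Ask the BMC to power the chassis off and back on."""
    cmd = ipmi_command(host, user, "chassis", "power", "cycle")
    subprocess.run(cmd, check=True)

# Example use (placeholder address and user, not executed here):
#   print(power_status("192.0.2.10", "ADMIN"))
#   power_cycle("192.0.2.10", "ADMIN")
```

Scripting it this way mainly helps if the box ends up in a datacenter, where pulling the power cables is not an option.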

March 16, 2023, 07:52   #27
flotus1 (Alex), Super Moderator
Join Date: Jun 2012
Location: Germany
Posts: 3,401
Thanks for keeping us updated. Glad that the memory brought some improvements.

For the warm boot issue, maybe check this: https://forums.servethehome.com/inde...up-fine.39009/
The original poster abandoned the thread, but someone else posted a few settings that allegedly solved the problem on an H12DSI.
Or maybe bump OP there if they have resolved the issue through Supermicro support.

March 16, 2023, 16:35   #28
klove007 (QC), New Member
Started the day trying those BIOS settings. They seemed to work once again, but now even power cycling seems to give inconsistent results.

The BIOS is updated; next is to try a BMC update and a RAID card update. I have never done a BMC update before. Is it difficult?

March 16, 2023, 17:17   #29
flotus1 (Alex), Super Moderator
Quote:
Originally Posted by klove007
Never done a BMC update before, is it difficult?
Me neither, so no idea.
This is one of those issues I would ignore, instead of spending a lot of time trying to solve it. How often do you really need to restart the PC once everything is set up...

March 16, 2023, 20:25   #30
klove007 (QC), New Member
Once it's loaded up, rarely, except for updates.

My challenge is that the GPUs and drives get installed next (which could compound the issues), and then it goes into a datacenter. I'd rather not have this issue in production...

The BMC update is complete, and it did not resolve the issue...

March 17, 2023, 02:49   #31
flotus1 (Alex), Super Moderator
I see, that's a different story of course.
Are you already in contact with Supermicro support? I don't expect much from them for a single end user of their products. But still worth a shot.

March 17, 2023, 11:43   #32
klove007 (QC), New Member
I haven't yet, but that is the next step. I always like to exhaust my own experience and try everything first.

From my experience, most companies end up telling me it's my unsupported/untested devices, like third-party memory and drives.

4 warm reboots this morning and no issue so far. I doubt it's resolved on my end, but I will keep you posted on my progress.
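
To put "4 warm reboots and no issue" in perspective, here is a small sketch of how many clean reboots it takes before an intermittent fault can reasonably be ruled out. It assumes each reboot fails independently with a fixed probability. The 25% failure rate used below is purely an illustrative guess, not a figure from this thread.

```python
import math

def prob_all_clean(p_fail, n_reboots):
    """Chance of n consecutive clean reboots if each fails independently with p_fail."""
    return (1.0 - p_fail) ** n_reboots

def reboots_needed(p_fail, confidence=0.95):
    """Smallest n such that a fault with rate p_fail shows up at least once
    in n reboots with probability >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))

print(round(prob_all_clean(0.25, 4), 3))  # 0.316
print(reboots_needed(0.25))               # 11
```

Under that assumed rate, four clean reboots in a row would still happen by luck almost a third of the time, so a few dozen scripted warm reboots would be a more convincing test before the machine goes into production.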

All times are GMT -4.