February 8, 2023, 19:34 |
|
#21 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 343
Rep Power: 13 |
There is an AMI BIOS POST code manual on the Supermicro website. The one I found is applicable to boards up to the X11 generation. Not sure if there have been any changes since. Code 94 indicates "PCI Bus Enumeration". So it might be worth checking whether there are any other PCI devices in your system besides the NVMe SSD that you already removed and reinstalled.
The problem is almost never with the CPU. The RDIMMs usually are fine too. Even the up-clocked components from China did work at the up-clocked speed. The sellers didn't realize they were dealing with Sherlock Holmes at the time, haha. Memory problems in most cases are caused by the motherboard. The number one cause is bent pins in the socket. This can sometimes be addressed by carefully straightening the pins. Use your phone to take a good photo that you can inspect at your leisure. If you find any issue, send the photo to the seller and, if you are up to it, ask if they want you to attempt to straighten a pin. I have taken this approach with a cooperative seller. Very satisfying when it succeeded. However, I think you are getting to the point where you are entitled to return the motherboard (in my opinion). |
|
February 8, 2023, 19:42 |
|
#22 | |
New Member
QC
Join Date: Feb 2023
Posts: 16
Rep Power: 3 |
Quote:
I have since removed the NVMe (even though I don't think it was causing issues, just in case). I have narrowed it down to 7 DIMMs that work in the slots where the other DIMMs don't... so I believe it's not the motherboard. I have 5 that seem to be installed in the empty slots but are not detected, and 4 that cause the system not to POST. I am going to try them in all the other slots tonight and see if I can get past 448 GB at POST. Each DIMM test per slot takes about 90 seconds, which is the POST time to the BIOS when it works. I have also cleaned the ones that do not POST to ensure good contact.
Last edited by klove007; February 8, 2023 at 20:50. |
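As a rough aid for planning that slot sweep, the arithmetic can be sketched as below. This is only an estimate under stated assumptions: the 90 seconds per test comes from the post, the 9 untested DIMMs are the 5 undetected plus the 4 that do not POST, and the 16-slot count is a hypothetical value, since the thread does not state how many slots the board has.

```python
def slot_sweep_minutes(num_dimms: int, num_slots: int,
                       seconds_per_test: float = 90.0) -> float:
    """Worst-case time to try every DIMM in every slot, one boot per test.

    seconds_per_test defaults to the ~90 s POST time quoted in the post.
    Reseating time between tests is not included.
    """
    return num_dimms * num_slots * seconds_per_test / 60.0


# 5 undetected + 4 non-POSTing DIMMs; 16 slots is an assumed, hypothetical count.
suspect_dimms = 5 + 4
assumed_slots = 16
print(f"{slot_sweep_minutes(suspect_dimms, assumed_slots):.0f} minutes")
```

Under those assumptions a full sweep is a matter of hours, not minutes, so it may be worth testing only a few slots per DIMM first.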
||
February 8, 2023, 23:50 |
|
#23 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 343
Rep Power: 13 |
Nice progress!
Once you have a known configuration that works, the memory training time will probably decrease for subsequent boots. Is there anything special about the DIMMs that do not work? |
|
February 14, 2023, 01:18 |
|
#24 | |
Member
|
On most motherboards designed for normal desktop computers, booting the system without any memory installed automatically resets the BIOS.
Not sure if an Epyc motherboard would do the same, but you could try.
|
||
February 19, 2023, 01:25 |
|
#25 |
Senior Member
Dongyue Li
Join Date: Jun 2012
Location: Beijing, China
Posts: 841
Rep Power: 18 |
Hmm, this is a common, or even pretty common, issue when a workstation is assembled by hand (whether by a vendor's technician or by you). I would suggest you install the memory one module at a time. I assume that you cannot remove the CPUs, so you have to start with one module for CPU1 and one for CPU2. If that works, install more, until it does not work anymore. From my experience (I have installed around 1000 workstations like this):
1) If some memory modules are not detected, there are two minor causes: the CPU is not seated well, or the memory is not seated well. There are also several worse causes: a CPU problem (it loses one or several memory channels), a motherboard problem (it loses one or several memory channels), or a memory problem (but the module can be replaced easily).
2) Sometimes the system detects all the modules, then reboots automatically and one module is missing, but that module comes back after several boots. In this case, the module is nearly dead. Replace it. There are no other issues.
3) All these issues should be handled BEFORE the workstation is shipped to the user. Before shipping, workstations should also be tested at full 100% CPU load for dozens of hours to catch ANY hardware failure. Ideally, you should never become aware that such issues existed, since the technician has already fixed them.
4) I say this issue is pretty common because even for an experienced technician, about one in 10 workstations has this kind of problem. For a newcomer, nearly every workstation has it. It simply comes down to their installation technique/methodology.
5) If you need help, send me an email and I can try to help you out.
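The one-module-at-a-time procedure described above can be sketched as a simple loop. This is a simplified illustration, not a tool: `posts_ok` is a hypothetical stand-in for physically booting the machine and checking that it POSTs with all installed modules detected, the module labels are made up, and the sketch ignores which CPU or slot each module goes into.

```python
from typing import Callable, List, Sequence, Tuple


def sort_dimms(dimms: Sequence[str],
               posts_ok: Callable[[List[str]], bool]) -> Tuple[List[str], List[str]]:
    """Add modules one at a time; set aside any module that breaks the boot.

    posts_ok(installed) is a hypothetical check: in practice it means
    "power on with these modules installed and see whether the system
    POSTs and detects all of them".
    """
    good: List[str] = []
    bad: List[str] = []
    for dimm in dimms:
        if posts_ok(good + [dimm]):
            good.append(dimm)   # keep it installed and move on
        else:
            bad.append(dimm)    # remove it again and label it as suspect
    return good, bad


# Simulated example: modules "D3" and "D7" are pretended to be faulty.
faulty = {"D3", "D7"}
labels = [f"D{i}" for i in range(1, 9)]
good, bad = sort_dimms(labels, lambda installed: not (set(installed) & faulty))
print("good:", good)
print("bad:", bad)
```

Because each new module is only ever tested against a set that is already known to work, a failure points at the module just added (or its slot), which is the point of adding them incrementally.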
__________________
My OpenFOAM algorithm website: http://dyfluid.com By far the largest Chinese CFD-based forum: http://www.cfd-china.com/category/6/openfoam We provide lots of clusters to Chinese customers, and we are considering to do business overseas: http://dyfluid.com/DMCmodel.html |
|
March 16, 2023, 05:27 |
|
#26 |
New Member
QC
Join Date: Feb 2023
Posts: 16
Rep Power: 3 |
I finally got replacement Samsung RAM, and it turns out the issue was that the Hynix RAM modules either weren't 100% supported on the board or were well used and defective.
I do continue to have an issue on reboot where the system does not POST and the screen stays black after reboot. Typically a power cycle in the BMC/IPMI or disconnecting the power cables corrects this, and the system boots and POSTs just fine until the next reboot. |
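For reference, the BMC power cycle mentioned above can also be triggered from another machine with `ipmitool`. The sketch below is only an assumption about how one might script it: the host address and user name are placeholders, it assumes `ipmitool` is installed and IPMI over LAN is enabled on the BMC, and the password is taken from the `IPMI_PASSWORD` environment variable via ipmitool's `-E` option.

```python
import os
import subprocess
from typing import List


def power_cycle_cmd(bmc_host: str, user: str) -> List[str]:
    """Build the ipmitool command line for a chassis power cycle.

    -I lanplus selects the IPMI v2.0 LAN interface; -E makes ipmitool
    read the password from the IPMI_PASSWORD environment variable.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user,
            "-E", "chassis", "power", "cycle"]


def power_cycle(bmc_host: str, user: str) -> str:
    """Run the power cycle; raises if ipmitool fails or no password is set."""
    if "IPMI_PASSWORD" not in os.environ:
        raise RuntimeError("set IPMI_PASSWORD before calling power_cycle()")
    result = subprocess.run(power_cycle_cmd(bmc_host, user),
                            check=True, capture_output=True, text=True)
    return result.stdout


# Placeholder address and user name, shown only to illustrate the command line.
print(" ".join(power_cycle_cmd("192.0.2.10", "ADMIN")))
```

This only automates the workaround. It does not address why the warm reboot fails in the first place.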
|
March 16, 2023, 07:52 |
|
#27 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,401
Rep Power: 47 |
Thanks for keeping us updated. Glad that the memory brought some improvements.
For the warm boot issue, maybe check this: https://forums.servethehome.com/inde...up-fine.39009/ The original poster abandoned the thread, but someone else posted a few settings that allegedly solved the problem on an H12DSI. Or maybe bump OP there if they have resolved the issue through Supermicro support. |
|
March 16, 2023, 16:35 |
|
#28 | |
New Member
QC
Join Date: Feb 2023
Posts: 16
Rep Power: 3 |
Quote:
The BIOS is updated; next is to try a BMC update and a RAID card update. I've never done a BMC update before. Is it difficult? |
||
March 16, 2023, 17:17 |
|
#29 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,401
Rep Power: 47 |
||
March 16, 2023, 20:25 |
|
#30 | |
New Member
QC
Join Date: Feb 2023
Posts: 16
Rep Power: 3 |
Quote:
My challenge is that it will be getting the GPUs and drives installed next (which could compound issues), and then it will be going into a datacenter. I'd rather not have this issue in production...
BMC update complete, and it did not resolve the issue... |
||
March 17, 2023, 02:49 |
|
#31 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,401
Rep Power: 47 |
I see, that's a different story of course.
Are you already in contact with Supermicro support? I don't expect much from them for a single end user of their products. But still worth a shot. |
|
March 17, 2023, 11:43 |
|
#32 | |
New Member
QC
Join Date: Feb 2023
Posts: 16
Rep Power: 3 |
Quote:
In my experience, most companies end up telling me it's my unsupported/untested devices, like 3rd-party memory and drives.
4 warm reboots this morning and no issue so far. I doubt it's resolved on my end, but I will keep you posted on my progress. |
||