CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   FLUENT (https://www.cfd-online.com/Forums/fluent/)
-   -   What causes the error below in a Fluent calculation on a cluster? It just happens abruptly. How can it be solved? (https://www.cfd-online.com/Forums/fluent/243957-what-cause-below-fluent-calculation-cluster-just-happens-abruptly-how-t.html)

hitzhwan July 14, 2022 04:47

What causes the error below in a Fluent calculation on a cluster? It just happens abruptly.
 
What causes the error below in a Fluent calculation on a cluster? It just happens abruptly. How can I solve it?

Fatal error has happened to some of the processes!
Exiting ...



===============Message from the Cortex Process================================

Fatal error in one of the compute processes.

==============================================================================

==============================================================================
Stack backtrace generated for process id 2717 on signal 11 :
*** Error in `fluent': corrupted double-linked list: 0x0000000001f847c0 ***
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x7bd95)[0x2af635affd95]
/usr/lib64/libc.so.6(+0x7de35)[0x2af635b01e35]
/usr/lib64/libc.so.6(__libc_malloc+0x4c)[0x2af635b0387c]
/usr/lib64/libc.so.6(__backtrace_symbols+0x10e)[0x2af635b8e33e]
fluent(print_back_trace_to_file+0x5a)[0x68d76a]
*** Error in `fluent': corrupted double-linked list: 0x0000000001f84760 ***
fluent[0x67f3b9]
/usr/lib64/libc.so.6(+0x35670)[0x2af635ab9670]
/usr/lib64/libc.so.6(+0x38dcd)[0x2af635abcdcd]
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x38eb5)[0x2af635abceb5]
fluent[0x677829]
/usr/lib64/libc.so.6(+0x35670)[0x2af635ab9670]
/usr/lib64/libc.so.6(__select+0x33)[0x2af635b71943]
fluent[0x65eb86]
fluent(lreadf+0x29)[0x6e1b99]
/usr/lib64/libc.so.6(+0x7bd95)[0x2af635affd95]
/usr/lib64/libc.so.6(+0x7cec6)[0x2af635b00ec6]
/opt/application/ansys19/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x9bfbc)[0x2af634afafbc]
/opt/application/ansys19/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(+0x9ca27)[0x2af634afba27]
/opt/application/ansys19/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(PyDict_SetItem+0x67)[0x2af634afd487]
fluent(eval+0x497)[0x6db8d7]
/opt/application/ansys19/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(_PyModule_Clear+0x14c)[0x2af634b015bc]
/opt/application/ansys19/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(PyImport_Cleanup+0x24f)[0x2af634b8288f]
/opt/application/ansys19/v192/fluent/../commonfiles/CPython/2_7_13/linx64/Release/python/lib/libpython2.7.so.1.0(Py_Finalize+0xfe)[0x2af634b948de]
/opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/libExpr.so(_ZN13PyInitializerD1Ev+0x6)[0x2af630f3a716]
/usr/lib64/libc.so.6(__cxa_finalize+0x9a)[0x2af635abd1da]
/opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/libExpr.so(+0x635c3)[0x2af630ecb5c3]
======= Memory map: ========
00400000-0124d000 r-xp 00000000 00:26 425291452 /opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/cortex.19.2.0
0144d000-01477000 r--p 00e4d000 00:26 425291452 /opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/cortex.19.2.0
01477000-014fb000 rw-p 00e77000 00:26 425291452 /opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/cortex.19.2.0
014fb000-0164d000 rw-p 00000000 00:00 0
01eb2000-022cb000 rw-p 00000000 00:00 0 [heap]
2af62e099000-2af62e0ba000 r-xp 00000000 08:03 17043974 /usr/lib64/ld-2.17.so
2af62e0ba000-2af62e219000 rw-p 00000000 00:00 0
2af62e219000-2af62e220000 r--s 00000000 08:03 17305077 /usr/lib64/gconv/gconv-modules.cache
2af62e220000-2af62e299000 rw-p 00000000 00:00 0
2af62e29a000-2af62e29b000 rw-p 00000000 00:00 0
2af62e2ba000-2af62e2bb000 r--p 00021000 08:03 17043974 /usr/lib64/ld-2.17.so
2af62e2bb000-2af62e2bc000 rw-p 00022000 08:03 17043974 /usr/lib64/ld-2.17.so
2af62e2bc000-2af62e2bd000 rw-p 00000000 00:00 0
2af62e2bd000-2af62e550000 r-xp 00000000 00:26 867021368 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libimf.so
2af62e550000-2af62e74f000 ---p 00293000 00:26 867021368 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libimf.so
2af62e74f000-2af62e755000 r--p 00292000 00:26 867021368 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libimf.so
2af62e755000-2af62e7aa000 rw-p 00298000 00:26 867021368 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libimf.so
2af62e7aa000-2af62f489000 r-xp 00000000 00:26 867021450 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libsvml.so
2af62f489000-2af62f688000 ---p 00cdf000 00:26 867021450 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libsvml.so
2af62f688000-2af62f6c3000 r--p 00cde000 00:26 867021450 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libsvml.so
2af62f6c3000-2af62f6c8000 rw-p 00d19000 00:26 867021450 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libsvml.so
2af62f6c8000-2af62f730000 r-xp 00000000 00:26 867021370 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libintlc.so.5
2af62f730000-2af62f930000 ---p 00068000 00:26 867021370 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libintlc.so.5
2af62f930000-2af62f931000 r--p 00068000 00:26 867021370 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libintlc.so.5
2af62f931000-2af62f932000 rw-p 00069000 00:26 867021370 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libintlc.so.5
2af62f932000-2af62f933000 rw-p 00000000 00:00 0
2af62f933000-2af62fa92000 r-xp 00000000 00:26 867021443 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libirng.so
2af62fa92000-2af62fc92000 ---p 0015f000 00:26 867021443 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libirng.so
2af62fc92000-2af62fc93000 r--p 0015f000 00:26 867021443 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libirng.so
2af62fc93000-2af62fca6000 rw-p 00160000 00:26 867021443 /opt/application/ansys19/v192/tp/IntelCompiler/2017.6.256/linx64/lib/intel64/libirng.so
2af62fca6000-2af62fcbc000 r-xp 00000000 08:03 17044007 /usr/lib64/libpthread-2.17.so
2af62fcbc000-2af62febc000 ---p 00016000 08:03 17044007 /usr/lib64/libpthread-2.17.so
2af62febc000-2af62febd000 r--p 00016000 08:03 17044007 /usr/lib64/libpthread-2.17.so
2af62febd000-2af62febe000 rw-p 00017000 08:03 17044007 /usr/lib64/libpthread-2.17.so
2af62febe000-2af62fec2000 rw-p 00000000 00:00 0
2af62fec2000-2af62fee8000 r-xp 00000000 00:26 425291467 /opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/libCxHoops.so
2af62fee8000-2af6300e7000 ---p 00026000 00:26 425291467 /opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/libCxHoops.so
2af6300e7000-2af6300e8000 r--p 00025000 00:26 425291467 /opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/libCxHoops.so
2af6300e8000-2af6300e9000 rw-p 00026000 00:26 425291467 /opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/libCxHoops.so
2af6300e9000-2af630517000 r-xp 00000000 00:26 425291458 /opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/libStateEngine.so
2af630517000-2af630717000 ---p 0042e000 00:26 425291458 /opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/libStateEngine.so
2af630717000-2af630719000 r--p 0042e000 00:26 425291458 /opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/libStateEngine.so
2af630719000-2af630733000 rw-p 00430000 00:26 425291458 /opt/application/ansys19/v192/fluent/fluent19.2.0/cortex/lnamd64/libStateEngine.so
2af630733000-2af630734000 rw-p 00000000 00:00 0
2af630734000-2af6307b9000 r-xp 00000000 00:26 425291459

LuckyTran July 14, 2022 11:42

This is the libc version of a segmentation fault, which means the program tried to access memory it wasn't allowed to. Frankly, this could be caused by anything. Maybe someone unplugged your RAM or spilt coffee on it. Or maybe you have code that tries to access variables that haven't been declared yet.
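For what it's worth, the "signal 11" in the backtrace is SIGSEGV, the segmentation-fault signal. A minimal sketch of what the OS does to such a process (the child command here is a deliberate null-pointer read through ctypes, purely for illustration, not anything Fluent does):

```python
import signal
import subprocess
import sys

# Spawn a child Python process that reads address 0 through ctypes,
# a classic way to trigger a segmentation fault (signal 11).
child = subprocess.run(
    [sys.executable, "-c", "import ctypes; ctypes.string_at(0)"],
    capture_output=True,
)

# On Linux, a negative return code is the number of the signal that
# killed the process, so a segfaulted child reports -11 here.
print(child.returncode == -signal.SIGSEGV)
```

The parent process only learns "a child died on signal 11", which is exactly the level of detail in the Cortex message above; the actual cause (bad pointer, heap corruption, hardware fault) has to be diagnosed on the failing node.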

hitzhwan July 14, 2022 21:24

Hello, do you use a cluster? Do you find that a cluster is faster than a single PC?
 
Quote:

Originally Posted by LuckyTran (Post 831693)
This is the libc version of a segmentation fault, which means the program tried to access memory it wasn't allowed to. Frankly, this could be caused by anything. Maybe someone unplugged your RAM or spilt coffee on it. Or maybe you have code that tries to access variables that haven't been declared yet.

Thank you. Do you use a cluster? Do you find that a cluster is faster than a single PC?

LuckyTran July 14, 2022 21:46

Yes I use a cluster. And I employ just a tiny bit of common sense when I do. Most problems that I run on a cluster don't fit on one PC. So yes, it's infinitely faster.

hitzhwan July 14, 2022 21:53

What is the reason that it does not fit on one PC? Why is it faster than a PC?
 
Quote:

Originally Posted by LuckyTran (Post 831721)
Yes I use a cluster. And I employ just a tiny bit of common sense when I do. Most problems that I run on a cluster don't fit on one PC. So yes, it's infinitely faster.

What is the reason that it does not fit on one PC? Why is it faster than a PC?

LuckyTran July 14, 2022 22:04

Do you still have a Pentium CPU or do you have a modern multi-core CPU? Applications in general can run with more throughput via multithreading. Clusters are just massive versions of that.

I often need upwards of 200 GB of RAM to open my model. I only have 96 GB of RAM on my workstation. Even my smaller models that I can open on my workstation would take weeks to run. I run it on a cluster to get results on the same day.
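The sizing arithmetic behind that trade-off can be sketched quickly. The ~2 GB of RAM per million cells figure used below is a rough rule-of-thumb assumption, not a Fluent guarantee; actual memory use varies a lot with the solver, physics models, and precision:

```python
# Ballpark assumption: roughly 2 GB of RAM per million mesh cells.
GB_PER_MILLION_CELLS = 2.0

def est_ram_gb(cells_millions: float) -> float:
    """Estimated peak RAM needed to load a mesh of the given size."""
    return cells_millions * GB_PER_MILLION_CELLS

workstation_ram_gb = 96      # the workstation described above
model_cells_millions = 100   # a large industrial case (hypothetical size)

need = est_ram_gb(model_cells_millions)
print(need)                       # 200.0 GB estimated
print(need > workstation_ram_gb)  # True: the case cannot be loaded locally
```

Under these assumptions a ~100-million-cell case needs on the order of 200 GB just to load, which no 96 GB workstation can hold regardless of how fast its CPU is; a cluster pools the memory of many nodes as well as their cores.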

hitzhwan July 15, 2022 03:51

Hi, my workstation has a Xeon Platinum 8273CL CPU; what about your workstation and cluster?
 
Quote:

Originally Posted by LuckyTran (Post 831725)
Do you still have a Pentium CPU or do you have a modern multi-core CPU? Applications in general can run with more throughput via multithreading. Clusters are just massive versions of that.

I often need upwards of 200 GB of RAM to open my model. I only have 96 GB of RAM on my workstation. Even my smaller models that I can open on my workstation would take weeks to run. I run it on a cluster to get results on the same day.

Hi, my workstation has a Xeon Platinum 8273CL CPU; what about your workstation and cluster?
If the cluster and the workstation have the same hardware, do they run at the same speed?

LuckyTran July 15, 2022 11:15

If you have two computers with the same hardware that are not running at the same speed, then either one is defective or you really did something wrong.

But instead of asking how others avoid issues, how about providing details about your own setup that might elucidate the issues you are having? That would be more helpful to you.

hitzhwan July 16, 2022 22:29

Sorry, maybe I did not explain the details. The cluster has an E5-2650 v4 CPU, and the other PC a 6226R.
 
Quote:

Originally Posted by LuckyTran (Post 831776)
If you have two computers with the same hardware that are not running at the same speed, then either one is defective or you really did something wrong.

But instead of asking how others avoid issues, how about providing details about your own setup that might elucidate the issues you are having? That would be more helpful to you.

Sorry, maybe I did not explain the details. The cluster has an E5-2650 v4 CPU and the other PC has a 6226R. Both use 24 cores for the calculation and all the other settings of the model are the same, yet I find the cluster is two times faster than the other PC. What is the reason?

LuckyTran July 17, 2022 01:45

An E5-2650 v4 has 12 cores, 24 with hyperthreading. A 6226R has 16 cores.


If you use 24 cores on an E5-2650 v4, you'll likely saturate the CPU and it runs at 100%. If you use 24 processes on a 6226R, it will be very suboptimal: the first block of 16 will do their work, and then the remaining 8 must wait for the first block of 16 to finish. Not only does this guarantee at least 25% idle time, it doubles the number of CPU cycles needed to complete one iteration.
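That scheduling argument can be written out as a quick sketch (pure arithmetic; no Fluent specifics assumed):

```python
import math

def scheduling_rounds(n_procs: int, n_cores: int) -> int:
    """Full time-slice rounds needed so every process runs once."""
    return math.ceil(n_procs / n_cores)

def min_idle_fraction(n_procs: int, n_cores: int) -> float:
    """Fraction of core-time guaranteed idle when the last round is partial."""
    capacity = scheduling_rounds(n_procs, n_cores) * n_cores
    return (capacity - n_procs) / capacity

# 24 Fluent processes on a 16-core 6226R:
print(scheduling_rounds(24, 16))  # 2 rounds -> ~2x the cycles per iteration
print(min_idle_fraction(24, 16))  # 0.25 -> at least 25% idle time

# 24 processes on an E5-2650 v4 (24 hyperthreads) fit in one round:
print(scheduling_rounds(24, 24))  # 1
```

With 24 tightly-coupled solver processes, every iteration waits for the slowest one, so the 6226R pays for two full scheduling rounds while the E5-2650 v4 finishes in one; that alone is consistent with the ~2x gap reported above.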


Since I can't really tell what else might be wrong, I recommend running at less than capacity, e.g. 11 cores on both machines, to do a fairer comparison.

