CFD Online Discussion Forums

Nikolopoulos September 16, 2011 03:18

Fluent parallel in CENTOS
 
Hi,

I'm running a case with heavy CPU and RAM requirements.
My simulation is transient, pure Eulerian, with 5 granular Eulerian phases (~20 UDFs).
I have two 4-core machines and I run this simulation in parallel on these two interconnected machines. I'm running my simulations in Fluent 13 text mode.

Everything is OK, but after some hours of calculation I get the following message for no obvious reason (the case converges just fine and there seems to be nothing wrong with the calculation).

fluent_mpi.13.0.0: Rank 0:10: MPI_Allreduce: MPI BUG: MPI_Recv: request not done
MPI Application rank 10 exited before MPI_Finalize() with status 1
MPI Application rank 34 exited before MPI_Finalize() with status 0
bash: line 0: kill: (8481) - No such process
bash: line 0: kill: (8482) - No such process

When I read the last autosaved data and continue the calculation, everything is OK, but after some more hours I get the same message again and Fluent crashes.

On the same machines, simulations that are less CPU- and RAM-intensive run without any problem.

I use -mpi=hp, but I have also tried -mpi=openmpi (same behavior, but a different error message saying that it couldn't allocate 30 GB of RAM).

The OS is CentOS, and I have enough RAM for this case.

Any hints?

Thank you in advance!

P.S. I have some user-defined memory allocated (C_UDMI()), but as I said, I calculated the RAM needs and there is enough even for running the simulation on just one of the servers.

Nikolopoulos September 21, 2011 08:23

I installed some updates for CentOS...

Any hints, anyone? Even ideas about what may be wrong would help me.

Aris!

m2montazari September 23, 2011 03:27

Hi,
I haven't used Fluent 13 on Linux yet, but with Fluent 6.3.26 I had some problems somewhat similar to yours. I had a system with 5 RAM modules installed and one slot free. I checked and found that the free slot was slot No. 4, so I rearranged the modules so that the free slot was No. 6 (the last slot). After that the problem was almost solved! Similar RAM-related problems occurred at the university HPC centre.
Yours,
mohammad

Nikolopoulos September 26, 2011 03:04

Thank you for your reply!

Following your suggestion, I tested the same case running on one machine only and there were no crashes. I did that on both machines, so I don't believe it's a memory problem.

I'll now try deleting all UDMIs. Does anybody know what User Defined Node Memory Locations is for? I run Fluent in parallel on 2 machines, and I set this value to 0 while User Defined Memory Locations is set to 10 (11 UDMIs).


All times are GMT -4.