999999 (../../src/mpsystem.c@1123):mpt_read: failed:errno = 11

November 21, 2011, 10:51
#1
UDS_rambler (Giuse), New Member
Join Date: Jul 2010
Location: Italy
Posts: 18
Hi everybody!

I'm facing a serious problem while trying to simulate a complex multiphase species transport model in an axisymmetric domain. To model it I'm using 3 different UDFs: 2 are imposed as boundary conditions (consumption terms) and 1 is executed at the end of each time step to compute the variables used by the other two.
These UDFs compile without errors, and when I run the simulation in serial they work correctly. The error arises when I run the simulation in parallel. In particular, at the end of the first time step, the following error appears:
==============================================================================
Stack backtrace generated for node id 4 on signal 11 :

==============================================================================
Stack backtrace generated for node id 5 on signal 11 :
MPI Application rank 4 killed before MPI_Finalize() with signal 11
node 999999 retrying on zero socket read.....
[....]
node 999999 retrying on zero socket read.....

999999 (../../src/mpsystem.c@1123): mpt_read: failed: errno = 11

999999: mpt_read: error: read failed trying to read 4 bytes: Resource temporarily unavailable
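
A note on reading the output above: the two "11"s are different numbers. On Linux, signal 11 is SIGSEGV (a segmentation fault in the process for rank 4), while errno 11 is EAGAIN, whose message is the "Resource temporarily unavailable" text in the last line. The small C sketch below only decodes the two numbers; the helper names are made up for illustration, and the numeric values assume a typical Linux system (they differ on some other platforms). Reading the `mpt_read` failure as a follow-on effect of the crashed node is an interpretation of the log order, not something confirmed in this thread.

```c
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: true if the signal number from a
   "killed ... with signal N" line is a segmentation fault. */
int is_segfault_signal(int signum)
{
    return signum == SIGSEGV;
}

/* Hypothetical helper: true if the errno value from a failed read means
   "no data available right now" rather than a permanent failure. */
int is_retryable_read_errno(int errnum)
{
    return errnum == EAGAIN || errnum == EWOULDBLOCK;
}

/* Returns the C library's text for an errno value,
   e.g. the value printed after "errno =" in the log. */
const char *describe_errno(int errnum)
{
    return strerror(errnum);
}
```

On a Linux machine, `is_segfault_signal(11)` and `is_retryable_read_errno(11)` both return 1, which suggests looking for the segmentation fault on the compute nodes first.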

I'm running Fluent on a 64-bit Linux cluster with 8 processors (lnamd64 architecture). When I run the same simulation on a 32-bit Linux cluster with 4 processors, the error doesn't occur.

The "mpt_read: error: read failed trying to read 4 bytes" message makes me suspect a 32-bit vs 64-bit library problem (since 4 bytes are 32 bits).
Could you help me?
The simulation is really heavy and I need to run it on as many processors as I can.
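
A quick check on the "4 bytes means 32-bit" reasoning: in C, a plain `int` is 4 bytes on both 32-bit and 64-bit Linux builds; only `long` and pointers grow to 8 bytes on 64-bit. So a 4-byte read does not by itself point to 32-bit libraries. The sketch below just reports the type widths for whatever platform it is compiled on; the function names are illustrative, and whether `mpt_read` is actually reading an `int` is an assumption.

```c
#include <limits.h>
#include <stddef.h>

/* Width in bits of a plain int on the platform this is compiled for. */
size_t int_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Width in bits of a long; this is the type that differs between
   32-bit and 64-bit Linux builds. */
size_t long_bits(void)
{
    return sizeof(long) * CHAR_BIT;
}

/* Width in bits of a data pointer (32 on a 32-bit build, 64 on a 64-bit one). */
size_t pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}
```

Compiling this once with `-m32` and once with `-m64` (where both toolchains are installed) shows `int_bits()` unchanged while `pointer_bits()` changes.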

Thank you in advance

UDS_rambler

November 22, 2011, 10:23
#2
ronaldalau (Ronald A. Lau), New Member
Join Date: Jul 2009
Location: Chicago
Posts: 27
We've seen this error.

Our cluster is a 64-bit Windows HPC system on a GigE network. We've been told by cluster experts that MPI performance depends on network latency, not bandwidth. A GigE network will have a latency of ~5 ms, whereas an InfiniBand network has a latency of microseconds.
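
To illustrate the latency point, here is a rough sketch using the simplest cost model for one message: time = latency + size / bandwidth. The model and the function names are assumptions for illustration, not anything taken from Fluent or its MPI layer. The 5 ms latency is the figure quoted above, and 125e6 bytes/s is the nominal 1 Gbit/s line rate of GigE.

```c
#include <assert.h>

/* Estimated time in seconds to deliver one message under the simple
   model: fixed latency plus payload size divided by bandwidth. */
double message_time_s(double latency_s, double bandwidth_bytes_per_s,
                      double payload_bytes)
{
    assert(bandwidth_bytes_per_s > 0.0);
    return latency_s + payload_bytes / bandwidth_bytes_per_s;
}

/* Fraction (0 to 1) of that time that is spent on latency alone. */
double latency_fraction(double latency_s, double bandwidth_bytes_per_s,
                        double payload_bytes)
{
    double total = message_time_s(latency_s, bandwidth_bytes_per_s,
                                  payload_bytes);
    return latency_s / total;
}
```

For a 4-byte message, `latency_fraction(5e-3, 125e6, 4.0)` is above 0.999, so under this model the transfer time is almost entirely latency. For a 1 MB payload the fraction drops below one half. A solver that exchanges many small messages per iteration would therefore be far more sensitive to latency than to bandwidth.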

We've also been told to use 'Message Passing' for the DPM parallel scheme.

And if you haven't done so, compile your UDFs for 64-bit when running on the 64-bit cluster.
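
One way to check which word size a compiled UDF library was built for is to look at the ELF header of the shared object: byte 4 of the file is 1 for a 32-bit object and 2 for a 64-bit one. The sketch below does only that check; the function names are made up for illustration, and it assumes the library is a Linux ELF file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Classifies an ELF object from its first bytes.
   Returns 32 or 64 for a 32-bit or 64-bit ELF object, 0 otherwise. */
int elf_bits(const unsigned char *header, size_t len)
{
    assert(header != NULL);
    if (len < 5)
        return 0;
    if (header[0] != 0x7f || header[1] != 'E' ||
        header[2] != 'L' || header[3] != 'F')
        return 0;               /* not an ELF file */
    if (header[4] == 1)
        return 32;              /* ELFCLASS32 */
    if (header[4] == 2)
        return 64;              /* ELFCLASS64 */
    return 0;
}

/* Reads the first bytes of a file and classifies it.
   Returns -1 if the file cannot be opened. */
int elf_bits_of_file(const char *path)
{
    unsigned char header[5];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    n = fread(header, 1, sizeof header, f);
    fclose(f);
    return elf_bits(header, n);
}
```

Calling `elf_bits_of_file()` on the compiled UDF library (for example a path such as `libudf/lnamd64/2ddp/libudf.so`, which is a hypothetical location; use wherever your build actually put the file) should return 64 on the 64-bit cluster. The `file` command reports the same information.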

Hope this helps

R.

November 22, 2011, 10:46
#3
UDS_rambler (Giuse), New Member
Join Date: Jul 2010
Location: Italy
Posts: 18
Thank you so much Ronald.

I was going mental because of this problem. Now I know I should ask the staff who maintain the network for the information you mentioned above. I'll report back when I learn more.

Thank you again

G.
All times are GMT -4.