
parallel fluent runs being killed at partitioning

#1  September 23, 2005, 14:51
Ben Aga (Guest)
We have suddenly started seeing parallel Fluent runs on our cluster die very early, generally during or right after partitioning.

We are running Red Hat E3 on a 64-bit Opteron cluster. We use a beefy head node to host our runs and farm out the gmpi processes to compute nodes, with PBSPro as our scheduler. I've got a ticket in to Fluent; they are concerned that the OS is causing this issue but haven't been specific as to why. PBS's vendor thinks the kernel on the head nodes may be running out of memory and killing these jobs to preserve itself.

We've been running Fluent jobs this way for several months with no problems. This issue cropped up Tuesday and kills jobs intermittently. There seems to be no rhyme or reason to what can run and what can't. Once a job starts iterating it seems to be OK (unless it starts to partition again).

Below is the output we get on stdout when these processes are killed. It looks pretty much identical to what happens when someone kills one or more of the MPI processes from the command line while a job is running. Has anyone here run into this issue, or does anybody have any possible culprits?

Thanks, r/ben
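One rough way to test the out-of-memory theory is to look for OOM-killer messages in the head node's kernel log around the time a job dies. The sketch below is only an illustration: the regular expression is an assumption (the kernel's wording differs between versions), and the default log path `/var/log/messages` and the function names are hypothetical choices, not anything taken from the cluster described here.

```python
import re

# Assumed pattern: OOM-killer wording varies by kernel version, so this
# matches loosely and case-insensitively ("Kill process" / "Killed process").
OOM_PATTERN = re.compile(
    r"out of memory: kill(?:ed)? process (\d+)(?: \(([^)]+)\))?",
    re.IGNORECASE,
)


def find_oom_kills(log_lines):
    """Return (pid, process_name_or_None) for each line that looks like an OOM kill."""
    hits = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


def scan_file(path="/var/log/messages"):
    """Scan a log file; the default path is an assumption and may need root access."""
    with open(path, errors="replace") as handle:
        return find_oom_kills(handle)
```

If a PID returned here matches the PID the shell reports as "Killed" in the job output, that would support the vendor's explanation; an empty result does not rule it out, since the pattern may simply not fit the kernel's message format.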

--------------------------------------------------

Parallel variables... Building...
grid,
auto partitioning mesh by Principal Axes,
distributing mesh
parts..,
faces..,
nodes..,
cells..,
materials,
interface,
domains,
mixture
liquid-phase
vapor-phase
interaction
zones,
fluid (liquid-phase)
outlet (liquid-phase)
inlet (liquid-phase)
internal.5 (liquid-phase)
symm2 (liquid-phase)
symm1 (liquid-phase)
wall (liquid-phase)
default-interior (liquid-phase)
fluid (vapor-phase)
outlet (vapor-phase)
inlet (vapor-phase)
internal.5 (vapor-phase)
symm2 (vapor-phase)
symm1 (vapor-phase)
wall (vapor-phase)
default-interior (vapor-phase)
default-interior
wall
symm1
symm2
internal.5
inlet
outlet
fluid
parallel,
shell conduction zones, Done.

>
:

> iter continuity x-velocity y-velocity z-velocity k epsilon vf-vapor-pnode 999999 retrying on zero socket read.....
node 999999 retrying on zero socket read.....
node 999999 retrying on zero socket read.....
...
node 999999 retrying on zero socket read.....
node 999999 retrying on zero socket read.....

999999 (mpsystem.c@1228): mpt_read: failed: errno = 11

999999: mpt_read: error: read failed trying to read 8 bytes: Resource temporarily unavailable
/apps/Fluent/Fluent.Inc/bin/fluent: line 3875: 6678 Killed $NO_RUN $EXE_CMD $MPI_ENABLED_OPTIONS
[bt] Execution path:
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(Process_Stackframe+0x17) [0x9f6e97]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(mpt_error+0x109) [0x9e50e9]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(mpt_read+0xc6) [0x9e88b6]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(mpt_tcpip_crecv_raw+0x28) [0x9ea408]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(mpt_tcpip_crecv_all+0x28) [0x9ec948]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(MPT_crecv_double+0x112) [0x9d6ee2]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16 [0x5e81d8]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(Models_Send_update_solve+0xbe) [0x56769e]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(Flow_Iterate+0x19e) [0x4e143e]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16 [0x546788]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(eval+0x773) [0xa27403]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(eval+0x860) [0xa274f0]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(eval+0x460) [0xa270f0]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(eval+0x49a) [0xa2712a]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16 [0xa2873c]
[bt] /apps/Fluent/Fluent.Inc/fluent6.2.16/lnamd64/3ddp_host/fluent.6.2.16(eval_errprotect+0x32) [0xa280d2]
The fluent process could not be started.

time/iter
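For anyone reading the output above: on Linux, errno 11 is EAGAIN, which is where the "Resource temporarily unavailable" text comes from, and the shell's "6678 Killed" line is what it prints when a child process is ended by SIGKILL. The small sketch below decodes an errno value and pulls the killed PID out of such a line. The function names and the regular expression are illustrative assumptions, and errno numbering is platform-specific, so the lookup should be run on the same OS as the cluster.

```python
import errno
import os
import re


def errno_names(code):
    """All symbolic errno names that map to this numeric code on this platform."""
    return sorted(
        name
        for name in dir(errno)
        if name.startswith("E") and getattr(errno, name) == code
    )


def describe_errno(code):
    """Return (symbolic names, OS message) for a numeric errno value."""
    return errno_names(code), os.strerror(code)


# Assumed shape of the shell's report for a job ended by SIGKILL,
# based on the "line 3875: 6678 Killed" text in the output above.
KILLED_PATTERN = re.compile(r"line (\d+): (\d+) Killed")


def killed_pid(text):
    """Extract the PID from a shell 'Killed' report, or None if absent."""
    match = KILLED_PATTERN.search(text)
    return int(match.group(2)) if match else None
```

Calling `describe_errno(11)` on the head node and `killed_pid(...)` on the captured stdout gives a PID that can be compared against the kernel log. The failed socket read itself is consistent with the host process losing a peer that was killed, rather than being a root cause.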