Error message when running in parallel
I am running FLUENT 6.3.26 in parallel with MPICH2 on two PCs (4 cores and 8 GB of RAM each), connected through a gigabit Ethernet switch. Both PCs run Windows XP 64-bit SP2 with the firewall disabled. I get the following error (thanks in advance for any help):
Checking the status of SMPD for MPICH2 on the local machine ...
'\\NT-1BEE049B35EB\codedol20C'
job aborted:
rank: node: exit code[: error message]
0: NT-1BEE049B35EB: 13
1: NT-1BEE049B35EB: 13
2: NT-1BEE049B35EB: 13: Fatal error in MPI_Barrier: Invalid buffer pointer, error stack:
MPI_Barrier(406)....................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(76)....................:
MPIC_Sendrecv(142)..................:
MPID_Isend(97)......................: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(152)................:
MPIDI_CH3I_VC_post_connect(540).....:
MPIDU_Sock_get_conninfo_from_bc(232): no space for the host description
(unknown)(): Invalid buffer pointer
3: NT-1BEE049B35EB: 13: Fatal error in MPI_Barrier: Invalid buffer pointer, error stack:
MPI_Barrier(406)....................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(76)....................:
MPIC_Sendrecv(142)..................:
MPID_Isend(97)......................: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(152)................:
MPIDI_CH3I_VC_post_connect(540).....:
MPIDU_Sock_get_conninfo_from_bc(232): no space for the host description
(unknown)(): Invalid buffer pointer
4: NT-B38516325C1E: 13
5: NT-B38516325C1E: 13
6: NT-B38516325C1E: 13
7: NT-B38516325C1E: 13
Interrupting...
Interrupting client...
The Parallel FLUENT process could not be started.
Done.
I have partially resolved the problem by setting the environment variable MPICH2_CHANNEL.
However, I now get a new error. I am using the discrete ordinates (DO) radiation model, and I change the DO discretization with a Scheme command during the run to speed up the calculation. After 3 or 4 changes of the DO settings, the run crashes. For example:

do 1 1 1 1 ---> OK
do 2 2 2 2 ---> OK
do 3 3 3 3 ---> OK
do 4 4 4 4 ---> OK
do 5 5 5 5 ---> error

Another example:

do 2 2 2 2 ---> OK
do 4 4 4 4 ---> OK
do 6 6 6 6 ---> OK
do 8 8 8 8 ---> OK
do 10 10 10 10 ---> error

Example of the error message:

170 2.0357e-03 6.7638e-05 6.3893e-05 1.2912e-04 2.6070e-07 2.5162e-05 0:38:56 10
Library "\\NT-1BEE049B35EB\Fluent.Inc\custom\Shared\LIBUDF\win64\3d_node\libudf.dll" opened
999999 (..\..\src\mpsystem.c@1123): mpt_read: failed: errno = 10054
999999: mpt_read: error: read failed trying to read 4 bytes: No such file or directory
job aborted:
rank: node: exit code[: error message]
0: NT-1BEE049B35EB: 123
1: NT-1BEE049B35EB: -1073741819: process 1 exited without calling finalize
2: NT-1BEE049B35EB: 123
3: NT-1BEE049B35EB: 123
4: NT-B38516325C1E: 123
5: NT-B38516325C1E: 123
6: NT-B38516325C1E: 123
7: NT-B38516325C1E: 123
The Parallel FLUENT process could not be started.
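Two quick calculations may help put the DO sequence and the crash log in context. This is a sketch under assumptions: it assumes that "do N N N N" sets, in order, the theta divisions, phi divisions, theta pixels and phi pixels of the DO angular discretization (the post does not say so explicitly). The FLUENT theory documentation states that the 3D DO model solves 8·Nθ·Nφ directions (4·Nθ·Nφ in 2D), and pixelation does not add directions, so the number of radiative transfer equations grows with the square of N. The second helper reinterprets the signed exit code -1073741819 reported for rank 1 as an unsigned 32-bit Windows status code; 0xC0000005 is STATUS_ACCESS_VIOLATION. Both function names are hypothetical, for illustration only.

```python
def do_directions(n_theta: int, n_phi: int, three_d: bool = True) -> int:
    """Number of DO directions: 8*Ntheta*Nphi in 3D, 4*Ntheta*Nphi in 2D.

    Assumes the first two numbers of "do N N N N" are Ntheta and Nphi;
    the pixel counts do not change the number of directions.
    """
    return (8 if three_d else 4) * n_theta * n_phi


def windows_status_hex(code: int) -> str:
    """Reinterpret a signed 32-bit process exit code as an unsigned hex status."""
    return f"0x{code & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    # The two refinement sequences from the post; the last one in each crashed.
    for sequence in ((1, 2, 3, 4, 5), (2, 4, 6, 8, 10)):
        for n in sequence:
            print(f"do {n} {n} {n} {n}: {do_directions(n, n)} directions")
        print()
    # Exit code of rank 1 in the second log; prints 0xC0000005
    print(windows_status_hex(-1073741819))
```

For reference, errno = 10054 in the mpt_read lines is the Winsock error WSAECONNRESET (connection reset by peer), which would fit the host losing its connection after a node process exited; this reading is a hedged interpretation, not a confirmed diagnosis.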