Message truncated, error stack: MPIDI_CH3U_Receive_data_found
Hi Bruno,
I was trying to install OF 2.0.0 on CentOS version 4.x. It did not work; I think I will update the OS and install it later. However, I have another question, about running codes in parallel in OF. My solver runs fine in serial mode, but when I run it in parallel, I get the following error message:
Code:
Fatal error in MPI_Recv: Message truncated, error stack:
MPIDI_CH3U_Receive_data_found(257): Message from rank 2 and tag 1 truncated; 8 bytes received but buffer size is 4

I tried to search for any similar errors encountered by OF users before, but could not find any good suggestion. Please let me know what the source of this error is and how to fix it. Thanks

Regards,
Vishal |
Hi Vishal,
I suspect that you need to find the correct environment variable for setting the buffer size for the MPI toolbox to use. For example, for Open-MPI, OpenFOAM sets the variable "MPI_BUFFER_SIZE": https://github.com/OpenFOAM/OpenFOAM...ttings.sh#L580

Best regards,
Bruno |
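To illustrate what that looks like in practice — the value below is only an example, so check the etc/settings.sh of your own OpenFOAM version for the value it actually uses, and note that whether the variable is honoured at all depends on the MPI toolbox (mvapich2 may simply ignore it):

```shell
# Example sketch: export a buffer size before launching the solver,
# the same way OpenFOAM's etc/settings.sh does for Open-MPI.
# 20000000 is an example value, not a recommendation.
export MPI_BUFFER_SIZE=20000000

# Confirm it is visible to child processes (mpirun inherits the environment):
echo "MPI_BUFFER_SIZE=$MPI_BUFFER_SIZE"
# prints: MPI_BUFFER_SIZE=20000000
```

The export has to happen in the same shell (or job script) that launches mpirun/mpiexec, otherwise the MPI processes never see it.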
Hi Bruno,
Could you provide more details on this one? I am not that familiar with how to identify the environment variables that set the buffer size. I did look at the settings.sh file, and MPI_BUFFER_SIZE is set the same way as in the link you mentioned. It would be great if you could help me out with this one. Thanks

Regards,
Vishal |
Hi Vishal,
I've moved this line of conversation from http://www.cfd-online.com/Forums/ope...eleased-4.html to this new thread, because the other one refers to installing OpenFOAM, not to running it in parallel ;) As for your problem, I need to know a few things:

1. Which Linux distribution are you using?
2. Which OpenFOAM version are you using?
3. Which MPI toolbox is being used with OpenFOAM?
3.2. Check which mpirun is being found: which mpirun
3.3. Check which version of MPI is being used: mpirun --version
3.4. What exact command are you using for launching the application in parallel?
Bruno |
Hi Bruno,
Below are the replies to all your questions.

1. Linux distribution:
Code:
Linux taubh1 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 11:13:47 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux
Scientific Linux release 6.1 (Carbon)

2. OpenFOAM version: OpenFOAM-1.7.1

3. Which MPI toolbox is being used with OpenFOAM?
The command echo $FOAM_MPI did not work, so I used echo $FOAM_MPI_LIBBIN instead:
Code:
/home/nandiga1/OpenFOAM/OpenFOAM-1.7.1/lib/linux64GccDPOpt/mvapich2-1.6-gcc+ifort

3.2. Check which mpirun is being found:
Code:
which mpirun
/usr/local/mvapich2-1.6-gcc+ifort/bin/mpirun
ls -l $(which mpirun)
lrwxrwxrwx 1 394298 394298 13 2011-11-18 16:53 /usr/local/mvapich2-1.6-gcc+ifort/bin/mpirun -> mpiexec.hydra

3.3. Check which version of MPI is being used:
Code:
mpirun --version
HYDRA build details:
    Version: 1.6rc3
    Release Date: unreleased development copy
    CC: gcc -fpic
    CXX: g++ -fpic
    F77: ifort -fpic
    F90: ifort -fpic
    Configure options: '--prefix=/usr/local/mvapich2-1.6-gcc+ifort' 'CC=gcc -fpic' 'CXX=g++ -fpic' 'F77=ifort -fpic' 'F90=ifort -fpic' 'FC=ifort -fpic' '--with-mpe' '--enable-sharedlibs=gcc' '--disable-checkerrors' '--with-atomic-primitives=auto_allow_emulation' 'CFLAGS= -DNDEBUG -O2' 'LDFLAGS= ' 'LIBS= -lpthread -libverbs -libumad -ldl -lrt ' 'CPPFLAGS= -I/usr/local/src/mvapich/mvapich2-1.6/src/openpa/src -I/usr/local/src/mvapich/mvapich2-1.6/src/openpa/src'
    Process Manager: pmi
    Launchers available: ssh rsh fork slurm ll lsf sge none persist
    Binding libraries available: hwloc plpa
    Resource management kernels available: none slurm ll lsf sge pbs
    Checkpointing libraries available:
    Demux engines available: poll select

3.4. What exact command are you using for launching the application in parallel?
Code:
#PBS -q cse
#PBS -l nodes=1:ppn=12
#PBS -l walltime=60:30:00
#PBS -j oe
#PBS -o simout
#PBS -N 2D_circular
cd ${PBS_O_WORKDIR}
module load mvapich2/1.6-gcc+ifort
mpiexec -np 12 circularFoam_full -parallel

Hope this gives you some idea of the problem. Thanks

Regards,
Vishal |
Hi Vishal,
OK, I've got a better idea of the system you have, but no clear notion as to why this error occurs. Some searching online indicated that it could be a memory limitation on the machines themselves; in other words, perhaps the mesh is too big for the machines you want to use. Another indication was that there is no way to control the buffer size in mvapich2. I suggest that you do a basic communication test on the cluster, following the instructions given here on how to test if MPI is working: post #4 of "openfoam 1.6 on debian etch", and/or post #19 of "OpenFOAM updates". Then try to run one of OpenFOAM's tutorials in parallel, such as "multiphase/interFoam/laminar/damBreak".

Best regards,
Bruno |
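For what it's worth, here is a job-script sketch for that damBreak test, pieced together from Vishal's own PBS header earlier in the thread. The queue name, module line and core count are assumptions carried over from his script, and the solver and pre-processing steps are the ones the damBreak tutorial normally uses, so double-check against the tutorial's own Allrun script:

```shell
#PBS -q cse
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:30:00
#PBS -j oe
cd ${PBS_O_WORKDIR}
module load mvapich2/1.6-gcc+ifort

blockMesh       # build the tutorial mesh
setFields       # initialise the water column
decomposePar    # numberOfSubdomains in system/decomposeParDict must match -np
mpiexec -np 4 interFoam -parallel
```

If this tutorial runs cleanly in parallel but your own solver does not, the MPI installation itself is probably fine and the problem is in the case or solver.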
Hello,
I am also getting the same error in my parallel simulations:
Code:
Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(184).......................: MPI_Recv(buf=0x12e3180, count=21, MPI_PACKED, src=1, tag=1, MPI_COMM_WORLD, status=0x7fff4975d160) failed
MPIDI_CH3U_Request_unpack_uebuf(691): Message truncated; 7776 bytes received but buffer size is 21

I am also using mvapich2 on my cluster. But I ran the tutorial "multiphase/interFoam/laminar/damBreak" and that case runs fine in parallel without any error. My domain is 2D and very small (1.5m x 0.4m) with a 500x150 mesh. I am not sure why I am getting that error for some specific cases. Does anyone have the solution? Thanks |
Greetings mmmn036,
After some more researching online, that seems to be a problem with mvapich2 1.9. Which version are you using? Beyond that, my guess is that the problem is related to a wrongly configured shell environment for using mvapich2; check its manual for more details. There is a way to test running in parallel in OpenFOAM, namely by compiling and using the Test-parallel application. More details are available here: Quote:
Bruno |
I ran the following command:
Code:
which mpirun
/opt/apps/intel14/mvapich2/2.0b/bin/mpirun |
Then please try the Test-parallel application.
|
Quote:
Is there any other way to test running in parallel in that version of OpenFOAM? |
Hi mmmn036,
Sigh... you could have stated that sooner ;) And I had forgotten that foam-extend doesn't have the test folder, for some reason... OK, run the following commands to get and build the application:
Code:
mkdir -p $FOAM_RUN

If it works, it should output something like this:
Code:
Create time

Bruno
Quote:
I ran the following command:
Code:
foamJob -p -s parallelTest

and the output was:
Code:
Parallel processing using MV2MPI with 16 processors

It is still showing the same error when I run in parallel. |
It's strange that nothing got written into the log file...
What happens if you run it like this: Code:
mpirun -np 16 parallelTest -parallel |
1 Attachment(s)
Quote:
Now I got something in my log file, which looks similar to the thread you mentioned. Please see the attached log file. But I am still seeing the same error in the parallel simulation. |
Quote:
And since you're using mvapich2 2.0b, it's not a problem related to the version itself. You wrote in your first post on this topic: Quote:
By the way, have you tried using one of the latest versions of OpenFOAM, such as 2.4.0 or 2.4.x, to see if it works with mvapich2? I ask this for the same reason as above: this could be a corner case that is not contemplated in foam-extend 3.1, but might already be contemplated in OpenFOAM.
3 Attachment(s)
Quote:
Here is the information you asked for: Which solver/application are you using? I ask this because there are some settings in "system/fvSchemes" that might help with the problem, and usually that depends on the simulation being done. Quote:
|
OK, with any luck I've found the answer that might help with your problem, as reported in these two locations:
Code:
export MV2_ON_DEMAND_THRESHOLD=16

I had seen this solution before, but the answer in the second location referred to another error message, which is why I hadn't suggested it before. |
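In a PBS job like the one posted earlier in this thread, that export just needs to happen before mpiexec. A small sketch — the value 16 comes from the reports above, and I can't say what threshold suits other run sizes:

```shell
# Make the MVAPICH2 setting visible to the MPI processes before launch.
# 16 is the value from the linked reports; tune it for your own runs.
export MV2_ON_DEMAND_THRESHOLD=16

# In the job script this line would sit right before something like:
#   mpiexec -np 16 circularFoam_full -parallel
echo "MV2_ON_DEMAND_THRESHOLD=$MV2_ON_DEMAND_THRESHOLD"
# prints: MV2_ON_DEMAND_THRESHOLD=16
```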
Quote:
I got an answer in another thread: http://www.cfd-online.com/Forums/ope...tml#post519793 Making the change quoted below fixed my issue; I can run my cases in parallel now. Quote:
|
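For readers who can't follow the link: as far as I can tell, the change being quoted is the commsType optimisation switch in the global controlDict. A hedged sketch of the edit only — the file location and the default value differ between OpenFOAM and foam-extend versions, so verify against your own installation:

```
// Sketch: in $WM_PROJECT_DIR/etc/controlDict (global settings file)
OptimisationSwitches
{
    // Valid options: blocking, nonBlocking, scheduled.
    // Switching from nonBlocking to blocking is the reported
    // workaround for the MPI_Recv truncation with MVAPICH2.
    commsType       blocking;
}
```

Blocking communications are generally slower but more conservative, which is consistent with this being a workaround rather than a root-cause fix.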
I'm glad you've found the solution!
Changing the blocking/non-blocking option had already crossed my mind, but it always felt like the issue was on the side of MVAPICH2. The other reason I didn't suggest it is that I thought foam-extend was set to non-blocking by default, because OpenFOAM has been like that since at least 1.5.x!? But apparently it was changed in foam-extend without an explicit explanation in the respective commit, on 2010-09-21 15:32:04...
Hi everybody,
I am facing a similar problem and I really cannot understand what is going wrong. I am using OpenFOAM 2.2.x, and I am trying to implement the PatchFlowRateInjection of OpenFOAM 2.4.x in my version of the code. The code works fine if I use the setPositionAndCell member function of version 2.2.x. But when I try to use the patchInjectionBase::setPositionAndCell of OpenFOAM 2.4.x, which I've managed to use successfully in another injector I created, the code, after running for some time steps, suddenly crashes with the following error:
Code:
Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(198)...........................: MPI_Recv(buf=0x7ffff2cf26c0, count=4, MPI_PACKED, src=0, tag=1, MPI_COMM_WORLD, status=0x7ffff2cf2650) failed
MPIDI_CH3_PktHandler_EagerShortSend(443): Message from rank 0 and tag 1 truncated; 8 bytes received but buffer size is 4

The same code runs fine in serial. I noticed that the problem occurs at Pstream::scatter(areaFraction), so one of the processors fails to receive the data from the master proc. commsType is set to nonBlocking. I would really appreciate any help with this.

Thanks,
Pante |
Quote:
Hi Pante,
I'm facing a similar problem. Do you remember how you solved it?

Thanks,
Yanyan |