CFD Online Discussion Forums > OpenFOAM Running, Solving & CFD
Message truncated, error stack: MPIDI_CH3U_Receive_data_found
(https://www.cfd-online.com/Forums/openfoam-solving/125655-message-truncated-error-stack-mpidi_ch3u_receive_data_found.html)

nandiganavishal October 27, 2013 16:05

Message truncated, error stack: MPIDI_CH3U_Receive_data_found
 
Hi Bruno,

I was trying to install OF 2.0.0 on CentOS 4.x. It did not work; I think I will update the OS and install it at some point in the future.

However, I have another question, this time about running codes in parallel in OF. My solver runs fine in serial mode; however, when I run it in parallel, I get the following error message.

Fatal error in MPI_Recv:
Message truncated, error stack:
MPIDI_CH3U_Receive_data_found(257): Message from rank 2 and tag 1 truncated; 8 bytes received but buffer size is 4


I searched for similar errors encountered by other OF users, but could not find any good suggestions. Please let me know what the source of this error is and how to fix it.

Thanks

Regards,
Vishal

wyldckat October 28, 2013 02:47

Hi Vishal,

I suspect that you need to find the correct environment variable for setting the buffer size that the MPI toolbox should use. For example, for Open-MPI OpenFOAM sets the variable "MPI_BUFFER_SIZE": https://github.com/OpenFOAM/OpenFOAM...ttings.sh#L580
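A quick way to experiment with this is to check what is currently exported and try a larger value before launching the run. The sketch below is only an illustration: the value and "yourSolver" are placeholders, and whether your particular MPI toolbox honours this variable at all is an open question:
Code:

echo $MPI_BUFFER_SIZE                  # what the OpenFOAM environment currently exports, if anything
export MPI_BUFFER_SIZE=200000000       # try a larger buffer for this shell session only
mpirun -np 4 yourSolver -parallel      # re-run the parallel case with the new value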

Best regards,
Bruno

nandiganavishal October 28, 2013 10:28

Hi Bruno,

Could you provide more details on this? I am not that familiar with how to identify the environment variables that set the buffer size. I did look at the settings.sh file, and MPI_BUFFER_SIZE is set exactly as in the link you mentioned. It would be great if you could help me out with this one.

Thanks

Regards,
Vishal

wyldckat October 28, 2013 16:12

Hi Vishal,

I've moved this line of conversation from http://www.cfd-online.com/Forums/ope...eleased-4.html to this new thread, because the other one is about installing OpenFOAM, not about running it in parallel ;)

As for your problem, I need to know a few things:
  1. On which Linux Distribution are you trying to run OpenFOAM in parallel?
  2. Which OpenFOAM version are you using for this?
  3. Which MPI toolbox is being used with OpenFOAM? To ascertain this:
    1. Check which MPI is chosen on OpenFOAM's environment:
      Code:

      echo $FOAM_MPI
    2. Check which mpirun is being found:
      Code:

      which mpirun
      ls -l $(which mpirun)

    3. Check which version of MPI is being used:
      Code:

      mpirun --version
  4. What exact command are you using for launching the application in parallel?
Best regards,
Bruno

nandiganavishal October 29, 2013 11:38

Hi Brunos,

Below are the replies to all your questions.

1. Linux distribution:

Linux taubh1 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 11:13:47 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux
Scientific Linux release 6.1 (Carbon)

2. OpenFOAM version: OpenFOAM-1.7.1

3. Which MPI toolbox is being used with OpenFOAM?

Command: echo $FOAM_MPI (did not work)

Command used: echo $FOAM_MPI_LIBBIN
message: /home/nandiga1/OpenFOAM/OpenFOAM-1.7.1/lib/linux64GccDPOpt/mvapich2-1.6-gcc+ifort

3.2. Check which mpirun is being found:

which mpirun
/usr/local/mvapich2-1.6-gcc+ifort/bin/mpirun

ls -l $(which mpirun)
lrwxrwxrwx 1 394298 394298 13 2011-11-18 16:53 /usr/local/mvapich2-1.6-gcc+ifort/bin/mpirun -> mpiexec.hydra

3.3. Check which version of MPI is being used:

mpirun --version

HYDRA build details:
Version: 1.6rc3
Release Date: unreleased development copy
CC: gcc -fpic
CXX: g++ -fpic
F77: ifort -fpic
F90: ifort -fpic
Configure options: '--prefix=/usr/local/mvapich2-1.6-gcc+ifort' 'CC=gcc -fpic' 'CXX=g++ -fpic' 'F77=ifort -fpic' 'F90=ifort -fpic' 'FC=ifort -fpic' '--with-mpe' '--enable-sharedlibs=gcc' '--disable-checkerrors' '--with-atomic-primitives=auto_allow_emulation' 'CFLAGS= -DNDEBUG -O2' 'LDFLAGS= ' 'LIBS= -lpthread -libverbs -libumad -ldl -lrt ' 'CPPFLAGS= -I/usr/local/src/mvapich/mvapich2-1.6/src/openpa/src -I/usr/local/src/mvapich/mvapich2-1.6/src/openpa/src'
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge none persist
Binding libraries available: hwloc plpa
Resource management kernels available: none slurm ll lsf sge pbs
Checkpointing libraries available:
Demux engines available: poll select

3.4 What exact command are you using for launching the application in parallel?

#PBS -q cse
#PBS -l nodes=1:ppn=12
#PBS -l walltime=60:30:00
#PBS -j oe
#PBS -o simout
#PBS -N 2D_circular
cd ${PBS_O_WORKDIR}
module load mvapich2/1.6-gcc+ifort
mpiexec -np 12 circularFoam_full -parallel

I hope this gives you some idea of the problem.

Thanks

Regards,
Vishal

wyldckat November 2, 2013 16:24

Hi Vishal,

OK, I got a better idea of the system you've got, but no clear notion as to why this error occurs.

Some searching online gave me the indication that it could be a memory limitation problem on the machines themselves. In other words, perhaps the mesh is too big for the machines you want to use.
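To get a rough feel for whether memory is the limiting factor, you could compare the size of the case against the RAM available on a node. A minimal sketch (how much memory each cell needs depends heavily on the solver and the number of fields):
Code:

checkMesh | grep -i "cells:"    # total number of cells in the mesh
free -g                         # memory available on the current node, in GiB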

Another indication was that there is no way to control the buffer size on mvapich2.

I suggest that you do a basic communication test on the cluster, following the instructions given here on how to test if MPI is working: post #4 of "openfoam 1.6 on debian etch", and/or post #19 of "OpenFOAM updates"

Then try to run one of OpenFOAM's tutorials in parallel, such as the tutorial "multiphase/interFoam/laminar/damBreak".
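For reference, running that tutorial in parallel usually comes down to something like the sketch below (paths and the 4-core decomposition are assumptions based on the stock tutorial; use your cluster's launcher, e.g. mpiexec through PBS, where appropriate):
Code:

cd $FOAM_RUN
cp -r $FOAM_TUTORIALS/multiphase/interFoam/laminar/damBreak .
cd damBreak
blockMesh                          # generate the mesh
setFields                          # initialise the water column
decomposePar                       # split the case according to system/decomposeParDict
mpirun -np 4 interFoam -parallel   # run the solver in parallel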

Best regards,
Bruno

mmmn036 August 19, 2015 01:21

Hello,

I am also getting the same error for my parallel simulations:

Fatal error in MPI_Recv:
Message truncated, error stack:
MPI_Recv(184).......................: MPI_Recv(buf=0x12e3180, count=21, MPI_PACKED, src=1, tag=1, MPI_COMM_WORLD, status=0x7fff4975d160) failed
MPIDI_CH3U_Request_unpack_uebuf(691): Message truncated; 7776 bytes received but buffer size is 21

I am also using MVAPICH2 on my cluster.

But I ran the tutorial "multiphase/interFoam/laminar/damBreak" and that case runs fine in parallel without any error.

My domain is 2D and very small (1.5 m x 0.4 m), with a 500 x 150 mesh.

I am not sure why I am getting that error for some specific cases.

Does anyone have a solution?

Thanks

wyldckat August 19, 2015 07:58

Greetings mmmn036,

After some more research online, this seems to be a problem with MVAPICH2 1.9. Which version are you using?
Beyond that, my guess is that the problem is related to a wrongly configured shell environment for using MVAPICH2; check its manual for more details.
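A few quick sanity checks of the cluster's shell environment could look like this (a sketch; the module and solver names are just examples of what might be on your system):
Code:

module list                            # is the same MVAPICH2 module loaded that foam-extend was built against?
which mpirun mpicc                     # do these point into the expected MVAPICH2 installation?
echo $FOAM_MPI $FOAM_MPI_LIBBIN        # which MPI the OpenFOAM environment thinks it is using
ldd $(which interFoam) | grep -i mpi   # which MPI library a solver actually links against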

There is a way to test running in parallel in OpenFOAM, namely by compiling and using the Test-parallel application. More details are available here:
Quote:

On how to test if MPI is working: post #4 of "openfoam 1.6 on debian etch", and/or post #19 of "OpenFOAM updates" - Note: As of OpenFOAM 2.0.0, the application "parallelTest" is now called "Test-parallel".
Best regards,
Bruno

mmmn036 August 19, 2015 11:44

I ran the following command:

which mpirun
/opt/apps/intel14/mvapich2/2.0b/bin/mpirun

wyldckat August 19, 2015 16:07

Then please try the Test-parallel application.

mmmn036 August 19, 2015 18:11

Quote:

Originally Posted by wyldckat (Post 560263)
Then please try the Test-parallel application.

I have installed foam-extend-3.1, where I could not find the "test/parallel" folder.

Is there any other way to test parallel runs in that version of OpenFOAM?

wyldckat August 20, 2015 13:16

Hi mmmn036,

Sigh... you could have stated that sooner ;) And I had forgotten that foam-extend didn't have the test folder for some reason...

OK, run the following commands to get and build the application:
Code:

mkdir -p $FOAM_RUN
cd $FOAM_RUN
mkdir parallelTest
cd parallelTest
wget https://raw.githubusercontent.com/OpenCFD/OpenFOAM-1.7.x/master/applications/test/parallel/parallelTest.C
mkdir Make
cd Make/
wget https://raw.githubusercontent.com/OpenCFD/OpenFOAM-1.7.x/master/applications/test/parallel/Make/options
wget https://raw.githubusercontent.com/OpenCFD/OpenFOAM-1.7.x/master/applications/test/parallel/Make/files
cd ..
wmake    # build the parallelTest application

Then run the parallelTest application the same way you ran the solver and in the same case folder.
If it works, it should output something like this:
Code:

Create  time

[1]
Starting transfers
[1]
[1] slave sending to master 0
[1] slave receiving from master 0
[0]
Starting transfers
[0]
[0] master receiving from slave 1
[0] (0 1 2)
[0] master sending to slave 1
End

[1] (0 1 2)
Finalising parallel run

Best regards,
Bruno

mmmn036 August 20, 2015 19:34

Quote:

Originally Posted by wyldckat (Post 560394)
OK, run the following commands to get and build the application: [...] Then run the parallelTest application the same way you ran the solver and in the same case folder.

I followed the instructions and compiled parallelTest.
I ran the following command:
Code:

foamJob -p -s parallelTest
and got the following results:
Code:

Parallel processing using MV2MPI with 16 processors
Executing: mpirun -np 16 /work/02813/jzb292/foam/foam-extend-3.1/bin/foamExec parallelTest -parallel | tee log

Nothing is written in the "log" file.

It still shows the same error when I run in parallel.

wyldckat August 20, 2015 20:53

It's strange that nothing got written into the log file...

What happens if you run it like this:
Code:

mpirun -np 16 parallelTest -parallel

mmmn036 August 20, 2015 22:25

1 Attachment(s)
Quote:

Originally Posted by wyldckat (Post 560444)
It's strange that nothing got written into the log file...

What happens if you run it like this:
Code:

mpirun -np 16 parallelTest -parallel

On my system I had to run "ibrun" instead of "mpirun" for parallel runs.

Now I get something in my log file that looks similar to the thread you mentioned. Please see the attached log file.

But I am still seeing the same error in the parallel simulation.

wyldckat August 21, 2015 11:28

Quote:

Originally Posted by mmmn036 (Post 560448)
Now I get something in my log file that looks similar to the thread you mentioned. Please see the attached log file.

But I am still seeing the same error in the parallel simulation.

OK, this was to try and figure out the origin of the problem. It seems that the problem has to do with the specific cases you are having trouble with, because the error is not triggered by the test application, nor by the other cases you tested.
And since you're using MVAPICH2 2.0b, it's not a problem related to that version itself.

You wrote on your first post on this topic:
Quote:

Originally Posted by mmmn036 (Post 560133)
I am also using MVAPICH2 on my cluster.

But I ran the tutorial "multiphase/interFoam/laminar/damBreak" and that case runs fine in parallel without any error.

My domain is 2D and very small (1.5 m x 0.4 m), with a 500 x 150 mesh.

Then the mesh has 75000 cells. A few questions:
  1. Into how many sub-domains are you dividing the mesh and what decomposition settings are you using?
    • In other words: please provide the "system/decomposeParDict" file you're using, so that we can see your settings (there is a sketch of what such a file looks like at the end of this post).
    • If you're using 16 sub-domains, as reported by the test, then each sub-domain has around 4687 cells. Can you please also provide the text output that decomposePar gave you? That tells us how many cells and faces were assigned to each sub-domain; there might be one or two sub-domains that share very few cells with their neighbours.
  2. How is the case distributed among the various machines on the cluster? In other words, is the case running in a single machine or 2, 3 or 4 machines?
  3. Does your mesh have any special boundary conditions? Such as baffles or cyclic patches?
  4. Which solver/application are you using? I ask this because there are some settings in "system/fvSchemes" that might help with the problem and usually that depends on the simulation being done.
I ask all of these questions because the error message you're getting is related to an improperly defined memory size on the receiving side of one of the processes. Some MPI toolboxes don't have this issue, because they use some explicit or implicit size reset/definition, but other MPI toolboxes don't take some corner cases into account. Therefore, this is usually related to a situation that is not contemplated in the standard operation of OpenFOAM or foam-extend.

By the way, have you tried using one of the latest versions of OpenFOAM, such as 2.4.0 or 2.4.x, to see if it works with MVAPICH2? I ask this for the same reason as above: this could be a corner case that is not handled in foam-extend 3.1 but is already handled in OpenFOAM.
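For reference regarding question 1, a minimal "system/decomposeParDict" for 16 sub-domains might look like the sketch below; the method and coefficients shown are purely illustrative assumptions, not necessarily what you are using:
Code:

cat system/decomposeParDict

FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains  16;

method              simple;

simpleCoeffs
{
    n               (4 4 1);
    delta           0.001;
}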

mmmn036 August 22, 2015 03:03

3 Attachment(s)
Quote:

Originally Posted by wyldckat (Post 560540)
[...] Then the mesh has 75000 cells. A few questions: [...]

First of all, I appreciate your support on this issue.
Here is the information you asked for:

Which solver/application are you using? I ask this because there are some settings in "system/fvSchemes" that might help with the problem and usually that depends on the simulation being done.
Quote:

I am using a modified buoyantBoussinesqPimpleFoam solver with the immersed boundary treatment implemented by Hrvoje Jasak. The solver works fine on my local workstation with 16 processors, but it gives me this error when run in parallel on the cluster. I have also included the error message file below.
Into how many sub-domains are you dividing the mesh and what decomposition settings are you using?
Quote:

The mesh in my domain is now 600 x 80 x 1. I am testing with 16 sub-domains, but I also tried 128 and 256 sub-domains, which give the same error. I attached the decomposeParDict below.
If you're using 16 sub-domains, as reported by the test, then each sub-domain has around 4687 cells. Can you please also provide the text output that decomposePar gave you?
Quote:

I have attached the log file with the decomposePar output.
How is the case distributed among the various machines on the cluster? In other words, is the case running on a single machine or 2, 3 or 4 machines?
Quote:

In the cluster, each node has 16 processors.
Does your mesh have any special boundary conditions, such as baffles or cyclic patches?
Quote:

I have immersed boundaries for all the fields. No cyclic boundary conditions, but I will use cyclic boundaries for the 3D simulation later on.
By the way, have you tried using one of the latest versions of OpenFOAM, such as 2.4.0 or 2.4.x, to see if it works with MVAPICH2?
Quote:

Since the immersed boundary method is only implemented in foam-extend, I couldn't use that solver in OpenFOAM 2.4.0 or 2.4.x. But all other cases run fine in parallel in OpenFOAM 2.3 on the same cluster.
Thanks

wyldckat August 22, 2015 08:25

OK, with any luck I have found the answer that might help with your problem, as reported in two other locations.
As reported in both posts, try running this command before running in parallel with 16 cores on the cluster:
Code:

export MV2_ON_DEMAND_THRESHOLD=16
When you need to run with more cores, change the 16 to the number you need.
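In a batch job this would go just before the MPI launch line, for example (a sketch loosely based on the job script posted earlier in this thread; the module name, launcher and "yourSolver" are placeholders for whatever your cluster actually uses):
Code:

module load mvapich2/2.0b            # or whichever MVAPICH2 module your build uses
export MV2_ON_DEMAND_THRESHOLD=16    # match the total number of MPI ranks
ibrun yourSolver -parallel           # e.g. ibrun or mpiexec -np 16, depending on the cluster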


I had seen this solution before, but the answer in the second location referred to another error message, which was why I hadn't suggested this before.

mmmn036 August 22, 2015 12:50

Quote:

Originally Posted by wyldckat (Post 560636)
[...] try running this command before running in parallel with 16 cores on the cluster: export MV2_ON_DEMAND_THRESHOLD=16 [...]

This didn't fix my error, but I really appreciate your support.
I got an answer in another thread: http://www.cfd-online.com/Forums/ope...tml#post519793

The change below fixed my issue. I can now run my cases in parallel.
Quote:

Edit $WM_PROJECT_DIR/etc/controlDict and change commsType to nonBlocking.
Thanks

wyldckat August 22, 2015 13:24

I'm glad you've found the solution!

Changing the blocking/non-blocking option had already crossed my mind, but it always felt like the issue was on the MVAPICH2 side.

The other reason is that I thought foam-extend was set to non-blocking by default, because OpenFOAM has been like that since at least 1.5.x!? But apparently it was changed in foam-extend without an explicit explanation in the respective commit, on 2010-09-21 15:32:04...
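For anyone landing on this thread with the same error, checking and switching the setting from the shell looks roughly like the sketch below (back up the file first, and edit by hand instead if grep shows more than one match, e.g. in comments):
Code:

grep -n "commsType" $WM_PROJECT_DIR/etc/controlDict                     # see the current setting
cp $WM_PROJECT_DIR/etc/controlDict $WM_PROJECT_DIR/etc/controlDict.bak  # keep a backup
sed -i 's/commsType .*/commsType       nonBlocking;/' $WM_PROJECT_DIR/etc/controlDict
grep -n "commsType" $WM_PROJECT_DIR/etc/controlDict                     # should now read nonBlocking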

pante June 24, 2016 04:51

Hi everybody,

I am facing a similar problem and I really cannot understand what is going wrong. I am using OpenFOAM 2.2.x and I am trying to implement the PatchFlowRateInjection of OpenFOAM 2.4.x in my version of the code.
The code works fine if I use the setPositionAndCell member function of version 2.2.x. But when I try to use patchInjectionBase::setPositionAndCell of OpenFOAM 2.4.x,
which I've managed to use successfully in another injector I created without any errors,
the code, after running for some time steps, suddenly crashes with the following error:
"Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(198)...........................: MPI_Recv(buf=0x7ffff2cf26c0, count=4, MPI_PACKED, src=0, tag=1, MPI_COMM_WORLD, status=0x7ffff2cf2650) failed
MPIDI_CH3_PktHandler_EagerShortSend(443): Message from rank 0 and tag 1 truncated; 8 bytes received but buffer size is 4".


The same code runs fine in serial. I notice that the problem occurs at
Pstream::scatter(areaFraction);
so one of the processors fails to receive the data from the master process.
commsType is set to nonBlocking.

I would really appreciate any help on that.
Thanks, Pante

imastrid November 19, 2021 01:20

Quote:

Originally Posted by pante (Post 606400)
I am facing a similar problem and I really cannot understand what is going wrong. [...] The same code runs fine in serial. I notice that the problem occurs at Pstream::scatter(areaFraction); [...] commsType is set to nonBlocking.


Hi Pante,
I'm facing a similar problem. Do you remember how you solved it?
Thanks,
Yanyan

