CFD Online Discussion Forums

Prad January 14, 2009 12:14

Parallel computing quad core

I am running my CFD code on a quad-core machine. It terminates with the following error: "rank 1 in job 78 host_40793 caused collective abort of all ranks exit status of rank 1: killed by signal 11"

My machine is as follows:

Processor (CPU): Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz

OS: Linux x86_64

System: openSUSE 10.3 (x86_64)

KDE: 3.5.7 "release 72.9"

MPI library: MPICH

I have already run this code on an IBM cluster under the AIX operating system with 8 processors, and there was no problem (even with a higher number of control volumes).

When I reduce the number of control volumes, it works even on the quad core.

For example, I can run on a single processor with 1 million control volumes, but I cannot use more than a few thousand (say 40K) control volumes on four processors all together. My code is block structured (it reads the blocks and the divisions on each line). It is written in Fortran 77, and I am using gcc and gfortran to compile it.

In short, I am able to run on four processors of the quad-core machine with a very small number of control volumes, but when I increase the number of control volumes, it crashes with the above error.

I don't know much about MPI, so it would be great if somebody could throw some light on how to run calculations on a quad-core machine with a higher number of control volumes.

Please let me know if you need more information to answer.

Regards Prad
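
For reference, "killed by signal 11" on Linux means the rank received SIGSEGV (a segmentation fault), i.e. it touched memory it was not allowed to access. A minimal sketch for confirming the number-to-name mapping, assuming a Python interpreter is available on the same machine (the helper name `signal_name` is only illustrative):

```python
import signal


def signal_name(signum: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(signum).name


if __name__ == "__main__":
    # The number 11 is taken from the error message quoted in the post.
    print(signal_name(11))
```

A segmentation fault that only appears above a certain problem size often points at an array bound or a stack-size limit rather than at MPI itself, although that is only a general rule of thumb and not a diagnosis of this code.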

Velan January 14, 2009 13:54

Re: Parallel computing quad core
I had a similar problem. I ran my job with 1 or 2 million grid points on one machine, but the same job would not run on another machine. I spent a lot of time on it and found one solution. It's not a machine problem; it's a compiler problem.

Check the link below (it may help you)

The job that ran was built with the PGI compilers, and the job that failed was built with the Intel compilers. This error usually comes up in do loops. PGI sends the data as vectors, but Intel won't do it. With the Intel compilers the job will not run even though it doesn't require much memory. So try some other compilers.

Hope this provides you some help

- Velan

chandra January 23, 2009 08:59

Re: Parallel computing quad core
Well, are you sure you can use MPI on a quad-core system? As far as I know, quad-core systems are shared-memory systems suited to OpenMP. MPI is suited to distributed-memory systems, like Linux clusters.

Please check it out in detail.

Prad January 23, 2009 09:05

Re: Parallel computing quad core
Hi Chandra, as I mentioned, I am able to use MPI on the quad-core machine, and it runs with a small number of grid points (or control volumes). Moreover, my colleagues are able to run their codes on similar quad-core machines, but their solver is different.

I think the problem may be with the compiler, as Velan mentioned, or with the memory allocation, or with something else that I am not able to figure out.

Regards Prad

Prad January 23, 2009 09:12

Re: Parallel computing quad core
Hi Velan,

I tried to change to PGI, but I have some problems with the makefile, so I am not able to compile with PGI. I am not sure how to write a makefile suitable for PGI.

Can you suggest how to write the makefile for PGI? I changed the existing file; below you can see how my new makefile for PGI looks. What is wrong with it?

SYSNAME = pgi.x86Linux

CPPDFLAGS = -traditional-cpp -E -P -M
CPPD = gcc
CPPFLAGS = -traditional-cpp -E -P
CPP = gcc

FFLAGSFAST = -fast -tp p6 -Mdalign -c -byteswapio
FFLAGSDEBUG = -g -c -byteswapio -Mbounds
FFLAGSPAR = (need only be set if EXTRAPAROBJS is TRUE)
FFLAGSPARPRF = (need only be set if EXTRAPAROBJS is TRUE)
FC = pgf77
FCPAR = (need only be set if EXTRAPAROBJS is TRUE)

CFLAGSPAR = (need only be set if EXTRAPAROBJS is TRUE)
CFLAGSPARPRF = (need only be set if EXTRAPAROBJS is TRUE)
CFLAGSPARDBG = (need only be set if EXTRAPAROBJS is TRUE)
CC = gcc
CCPAR = (need only be set if EXTRAPAROBJS is TRUE)

LINK = pgf77
LINKPAR = mpif77

chandra January 23, 2009 09:20

Re: Parallel computing quad core
If so, the compiler may be the problem. I've also faced problems in the past because of the compiler. When I changed my compiler from GCC to Intel's ICC for my OpenMP code, the same code ran very well on the same machine. So, if possible, please try changing the compiler and re-running the code.

Tom January 23, 2009 09:26

Re: Parallel computing quad core
Have you tried running your code in debug mode? I've run MPI using the Intel compiler without any problems (you may need to type ulimit -s unlimited before running the code though!).

Also, as chandra says above, it's not a good idea to run MPI on an Intel quad core (my experience is that it will actually run slower than a single core, due to each CPU flushing the shared cache and reloading it with its own data).

The shared cache is really a big problem on Intel quad cores, since you only tend to get good scaling when the data that all 4 cores are using fits into cache at the same time.
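
The `ulimit -s unlimited` hint above concerns the per-process stack limit. A minimal sketch for checking that limit, assuming Python is available in the same shell that launches the job; whether the solver in question actually places its large arrays on the stack is an assumption, and the helper names are only illustrative:

```python
import resource


def stack_limit_bytes():
    """Return the (soft, hard) stack limits in bytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def as_optional(value):
        return None if value == resource.RLIM_INFINITY else value

    return as_optional(soft), as_optional(hard)


def fits_on_stack(n_values, bytes_per_value=8):
    """Rough check: would an array of n_values fit under the soft stack limit?"""
    soft, _ = stack_limit_bytes()
    return soft is None or n_values * bytes_per_value < soft


if __name__ == "__main__":
    print("stack limits (soft, hard):", stack_limit_bytes())
    # 1 million double-precision values, as in the single-processor case.
    print("1e6 doubles fit on the stack:", fits_on_stack(1_000_000))
```

The limit reported here is inherited from the launching shell, so it should be checked in the same session that starts the MPI job.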

Velan January 23, 2009 10:20

Re: Parallel computing quad core
Hi Prad,

I used the Rocks version of PGI, which is very simple to compile :). For a fast reply, post your queries and errors in

They will help you in more detail about how to compile it.

Jed January 23, 2009 14:51

Re: Parallel computing quad core
The memory bandwidth issue is pretty fundamental, not specifically an MPI issue. You should get fine multicore performance for matrix assembly and residual evaluation. Everything will be poor for sparse linear algebra since one core can pretty much saturate the memory bandwidth for the entire socket. Getting significant benefit from multiple cores in the sparse matrix kernels requires quite a lot of tricks, see and note that several techniques that make the final pthreads implementation impressive can also be applied to the MPI version.

The current advice is to get a Nehalem (Core i7) if you want better memory bandwidth. Otherwise, just buy sockets: the number of cores and their speed are much less relevant than the number of sockets and the speed of the bus.
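
To illustrate why sparse matrix kernels lean so heavily on memory bandwidth, here is a minimal sketch of a sparse matrix-vector product in CSR format together with a crude traffic estimate. The storage sizes (8-byte values, 4-byte indices), the assumption of a square matrix, and the assumption that every array is streamed from memory exactly once are simplifications chosen for illustration, not measurements:

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def bytes_per_flop(nnz, n_rows, value_bytes=8, index_bytes=4):
    """Estimate memory traffic per floating-point operation for one product."""
    traffic = nnz * (value_bytes + index_bytes)   # matrix values and column indices
    traffic += (n_rows + 1) * index_bytes         # row pointers
    traffic += 2 * n_rows * value_bytes           # read x once, write y once
    flops = 2 * nnz                               # one multiply and one add per entry
    return traffic / flops


if __name__ == "__main__":
    # 3x3 tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]]
    values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
    col_idx = [0, 1, 0, 1, 2, 1, 2]
    row_ptr = [0, 2, 5, 7]
    print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
    # Five entries per row, as for a 5-point stencil on a large grid.
    print(bytes_per_flop(nnz=5_000_000, n_rows=1_000_000))
```

Under these assumptions a matrix with five entries per row moves roughly 8 bytes for every floating-point operation, so the cores spend most of their time waiting for data rather than computing.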

Tom January 24, 2009 20:23

Re: Parallel computing quad core
Well, since I've got unlimited access to 3 supercomputers which are (essentially) free of the problems I described to the original poster, that's not particularly good advice - I just use a quad core at home for messing about. Basically, on a quad core it's a bad idea to use MPI and, as Intel have reported in their own research, once your data exceeds a certain size you essentially aren't any better off using >2 cores.

TG January 25, 2009 12:00

Re: Parallel computing quad core
MPI on quad cores is neither good nor bad. It's just a means to communicate between different processes. If your algorithm is memory bandwidth intensive, no means of inter-process communication will keep the pipes full and your performance will suffer. If your algorithm is compute intensive and your memory bandwidth needs are low, it will work just fine. It's not MPI that is the problem - it's the algorithm that determines whether it will scale well on quads or not.
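
The distinction drawn above between compute-intensive and bandwidth-intensive algorithms can be sketched with a simple roofline-style bound: attainable performance is the smaller of the peak compute rate and the memory bandwidth multiplied by the flops performed per byte moved. The peak and bandwidth figures below are hypothetical placeholders, not measurements of any machine in this thread:

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline-style upper bound on sustained performance in GFLOP/s."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


def is_bandwidth_bound(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """True when the memory system, not the cores, sets the limit."""
    return bandwidth_gb_s * flops_per_byte < peak_gflops


if __name__ == "__main__":
    PEAK = 40.0       # hypothetical peak of all cores together, GFLOP/s
    BANDWIDTH = 8.0   # hypothetical memory bandwidth of the socket, GB/s
    for name, intensity in [("low intensity kernel", 0.125),
                            ("high intensity kernel", 10.0)]:
        bound = attainable_gflops(PEAK, BANDWIDTH, intensity)
        print(name, bound, is_bandwidth_bound(PEAK, BANDWIDTH, intensity))
```

In this model adding cores raises only the peak, so a kernel that sits under the bandwidth limit gains nothing from them, whichever communication library is used.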

Tom January 25, 2009 14:52

Re: Parallel computing quad core
"If your algorithm is compute intensive and your memory bandwidth needs are low, it will work just fine"

That's the point (and the fact that the MPI send/receive can cause problems with the shared cache): most CFD calculations are going to run into the bandwidth problem on quad cores fairly quickly.

A simple example is to use Jacobi iteration (a highly scalable, "bit reproducible" algorithm) to solve Poisson's equation on an Intel quad core: you'll get perfect scaling on a 360x360 grid. Now redo the calculation on a 720x720 grid and you'll find it difficult to even halve the computational time (basically two cores is almost optimal).

In contrast, you also get the occasional "super scaling" by going to 2 cores from 1 (try the same problem on a 720x360 grid!) and no further improvement for 3 or 4 cores.

This is just something that you need to be wary of when your code has to run efficiently on a number of different parallel architectures.
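
The Jacobi experiment described above can be sketched as follows, with a helper that estimates how much memory each grid size touches per sweep. This is a serial, pure-Python illustration rather than the code used in the post; the count of three double-precision arrays (old solution, new solution, right-hand side) is an assumption, and the cache size to compare against is left to the reader because the thread does not state it:

```python
def jacobi_sweep(u, f, h):
    """One Jacobi sweep for -laplace(u) = f on a uniform grid.

    Boundary values of u are kept fixed (Dirichlet conditions).
    """
    ny, nx = len(u), len(u[0])
    new = [row[:] for row in u]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            new[j][i] = 0.25 * (u[j][i - 1] + u[j][i + 1]
                                + u[j - 1][i] + u[j + 1][i]
                                + h * h * f[j][i])
    return new


def working_set_mib(nx, ny, arrays=3, bytes_per_value=8):
    """Memory touched per sweep in MiB, assuming `arrays` full-size arrays."""
    return nx * ny * arrays * bytes_per_value / 2**20


if __name__ == "__main__":
    for nx, ny in [(360, 360), (720, 360), (720, 720)]:
        print(f"{nx}x{ny}: {working_set_mib(nx, ny):.2f} MiB per sweep")
```

Comparing these figures with the cache available to the cores taking part gives a first indication of whether a run is likely to be cache resident or limited by main memory.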

hahnpv January 29, 2009 18:47

Re: Parallel computing quad core
Hi Prad,

It is entirely possible there is something wrong in your algorithms that is a function of number of cores and number of control volumes.

I have my own home brew CFD code which I recently parallelized in MPI and ran into similar issues. I wouldn't blame your Intel box until you run the exact same case on the supercomputer with the same number of ranks.

I agree with the arguments posed by Tom and others but that shouldn't cause it to crash in this manner.
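
One way a crash can depend on both the number of ranks and the number of control volumes is a fixed-size local array that a particular decomposition overflows. The sketch below is purely an illustration of that idea and not a diagnosis of the code in question: the names `max_local` (standing in for a compile-time array dimension) and `halo` (extra ghost cells per rank) are hypothetical.

```python
def cells_per_rank(n_cells, n_ranks):
    """Split n_cells as evenly as possible over n_ranks."""
    base, extra = divmod(n_cells, n_ranks)
    return [base + (1 if rank < extra else 0) for rank in range(n_ranks)]


def ranks_over_limit(n_cells, n_ranks, max_local, halo=0):
    """Return the ranks whose local cell count plus halo exceeds max_local."""
    return [rank
            for rank, n_local in enumerate(cells_per_rank(n_cells, n_ranks))
            if n_local + halo > max_local]


if __name__ == "__main__":
    # 40K control volumes on four ranks, with a hypothetical local limit.
    print(cells_per_rank(40_000, 4))
    print(ranks_over_limit(40_000, 4, max_local=10_000))
    print(ranks_over_limit(40_000, 4, max_local=10_000, halo=1))
```

A check of this kind, run for the same case and rank count on both machines, can help separate a decomposition problem from a platform problem.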


Prad February 9, 2009 15:28

Re: Parallel computing quad core
Hi Philip,

I was out of station for the last two weeks. I think your suggestion is the most suitable to my case, as this is a very old code. Many people have worked on it and added a lot of stuff, and now it is a really huge code with lots of problems. But I am supposed to work with this code only. It is also block structured, so it reads the mesh as blocks. Sometimes the same number of mesh points with a different number of blocks also causes compilation problems. Older computers sometimes allow you to compile a higher number of mesh points than newer processors and newer operating systems.

I ran the code on the supercomputer. It works much better there, and it doesn't give any compilation or run-time problems. The only problem is on local machines, both single-core and quad-core.

Can you elaborate on the issues you faced and how you solved them? That may be helpful to me. Thanks in advance, Prad

All times are GMT -4.