CFD Online (www.cfd-online.com), Main CFD Forum

Parallel computing quad core

#1 | Prad | January 14, 2009, 12:14

Hi,

I am running my CFD code on a quad-core machine. It terminates with the following error: "rank 1 in job 78 host_40793 caused collective abort of all ranks exit status of rank 1: killed by signal 11"

My machine configuration is:

Processor (CPU): Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
OS: Linux 2.6.22.19-0.1-default x86_64
System: openSUSE 10.3 (x86_64)
KDE: 3.5.7 "release 72.9"
MPI library: mpich

I have already run this code on an IBM cluster running AIX with 8 processors, and there was no problem there, even with a higher number of control volumes.

When I reduce the number of control volumes, it works on the quad core as well.

For example, I can run on a single processor with 1 million control volumes, but I am not able to use more than a few thousand (say 40K) control volumes on four processors all together. My code is block-structured (it reads the blocks and the divisions on each line). It is written in Fortran 77, and I am compiling it with gcc and gfortran.

In short, I can run on four processors of the quad-core machine with a very small number of control volumes, but when I increase the number of control volumes, it crashes with the above error.

I don't know much about MPI, so it would be great if somebody could shed light on how to run calculations on a quad-core machine with a higher number of control volumes.

Please let me know if you need more information to answer.

Regards,
Prad

#2 | Velan | January 14, 2009, 13:54

I had a similar problem. I could run my job with 1 or 2 million grid points on one machine, but the same job would not run on another machine. I spent a lot of time on it and found one possible solution. It is not a machine problem; it is a compiler problem.

Check the link below (it may help you): http://www.clusterresources.com/pipe...er/004457.html

The job that ran was built with the PGI compilers, and the job that failed was built with the Intel compilers. This error usually comes up in do loops. PGI sends the data as vectors, but Intel won't do it. With the Intel compilers the job will not run even though it doesn't require much memory. So try some other compilers.

Hope this helps.

- Velan

#3 | chandra | January 23, 2009, 08:59

Well, are you sure you can use MPI on a quad-core system? As far as I know, quad-core systems are shared-memory systems suitable for OpenMP. MPI is suitable for distributed-memory systems, like Linux clusters.

Please check this in detail.

#4 | Prad | January 23, 2009, 09:05

Hi Chandra,

As I mentioned, I am able to use MPI on the quad-core machine, and it runs with a small number of grid points (or control volumes). Moreover, my colleagues are able to run their codes on similar quad-core machines, but their solver is different.

I think the problem may be with the compiler, as Velan mentioned, or with memory allocation, or something else that I am not able to figure out.

Regards,
Prad

#5 | Prad | January 23, 2009, 09:12

Hi Velan,

I tried to switch to PGI, but I have some problems with the makefile, so I am not able to compile with PGI. I am not sure how to write a makefile suitable for PGI.

Can you suggest how to set up the makefile for PGI? I changed the existing file; below you can see how my new makefile for PGI looks. What is wrong in it?

SYSNAME = pgi.x86Linux
DEFTARGET = fast
USECPP = FALSE
MOVEFOBJS = FALSE
MOVECOBJS = FALSE
USEINLINE = FALSE
AUTOINLINE = FALSE
EXTRAPAROBJS = FALSE
FDEFINES = -DU77 -DX86LINUX $(XTRADEF)
CPPDFLAGS = -traditional-cpp -E -P -M
CPPD = gcc
CPPDTYP = CPPGNU
CPPFLAGS = -traditional-cpp -E -P
CPP = gcc
CPPTYP = CPPGNU
MACHOPT =
EXPSUB =
EXPFILE =
FFLAGSFAST = -fast -tp p6 -Mdalign -c -byteswapio
FFLAGSPROF = -pg $(FFLAGSFAST)
FFLAGSDEBUG = -g -c -byteswapio -Mbounds
FFLAGSPAR = (need only be set if EXTRAPAROBJS is TRUE)
FFLAGSPARPRF = (need only be set if EXTRAPAROBJS is TRUE)
FFLAGSPARDBG =
FFLAGOBJNAM = -o
FC = pgf77
FCPAR = (need only be set if EXTRAPAROBJS is TRUE)
CDEFINES = -DSUBNAMUNDERSCORE -DGNU
CFLAGSFAST = -c
CFLAGSPROF = -pg $(CFLAGSFAST)
CFLAGSDEBUG = -g -c
CFLAGSPAR = (need only be set if EXTRAPAROBJS is TRUE)
CFLAGSPARPRF = (need only be set if EXTRAPAROBJS is TRUE)
CFLAGSPARDBG = (need only be set if EXTRAPAROBJS is TRUE)
CC = gcc
CCPAR = (need only be set if EXTRAPAROBJS is TRUE)
LIBS =
LIBSPROF = $(LIBS)
LIBSPAR =
LIBSPARPROF = $(LIBSPAR)
LDFLAGSFAST =
LDFLAGSPROF = -pg
LDFLAGSDEBUG =
LDFLAGSPAR =
LDFLAGSPARPRF = -pg
LDFLAGSPARDBG =
LINK = pgf77
LINKPAR = mpif77

#6 | chandra | January 23, 2009, 09:20

If so, the compiler may be the problem. I have also faced problems in the past because of the compiler. When I changed my compiler from GCC to Intel's ICC for my OpenMP code, the same code ran very well on the same machine. So, if possible, please try changing the compiler and re-running the code.

#7 | Tom | January 23, 2009, 09:26

Have you tried running your code in debug mode? I've run MPI using the Intel compiler without any problems (you may need to type ulimit -s unlimited before running the code, though!).

Also, as chandra says above, it's not a good idea to run MPI on an Intel quad core (my experience is that it will actually run slower than a single core, due to each CPU flushing the shared cache and reloading it with its own data).

The shared cache is really a big problem on Intel quad cores, since you only tend to get good scaling when the data that all 4 cores are using fits into cache at the same time.
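
On the ulimit -s unlimited remark above: on Linux, signal 11 is SIGSEGV (a segmentation fault), and one common cause in Fortran codes with large local arrays is a stack limit that is too small. The following is a minimal Python sketch for checking both on the machine in question. It only inspects the limit of its own process, and the helper names are illustrative, not part of any MPI tool.

```python
import resource
import signal


def describe_signal(signum):
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(signum).name


def stack_limit_description():
    """Report the soft stack limit of this process in kB, as `ulimit -s` does."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{soft // 1024} kB"


if __name__ == "__main__":
    print("signal 11 is", describe_signal(11))
    print("soft stack limit:", stack_limit_description())
```

If the reported limit is small (8192 kB is a frequent default on Linux distributions), raising it with ulimit -s unlimited in the same shell that launches the MPI job is a cheap thing to try before switching compilers.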

#8 | Velan | January 23, 2009, 10:20

Hi Prad,

I used the Rocks version of PGI, which makes compiling very simple. For a fast reply, post your queries and errors at

http://www.pgroup.com/userforum/index.php

They will help you in more detail with how to compile it.

#9 | Jed | January 23, 2009, 14:51

The memory bandwidth issue is pretty fundamental, not specifically an MPI issue. You should get fine multicore performance for matrix assembly and residual evaluation. Everything will be poor for sparse linear algebra, since one core can pretty much saturate the memory bandwidth for the entire socket. Getting significant benefit from multiple cores in the sparse matrix kernels requires quite a lot of tricks; see http://crd.lbl.gov/~oliker/papers/SIAMPP08-oliker.pdf and note that several of the techniques that make the final pthreads implementation impressive can also be applied to the MPI version.

The current advice is to get a Nehalem (Core i7) if you want better memory bandwidth. Otherwise, just buy sockets: the number of cores and their speed are much less relevant than the number of sockets and the speed of the bus.
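
A rough way to see why sparse linear algebra is bandwidth bound is to count bytes moved per floating-point operation. The sketch below assumes a sparse matrix-vector product in CSR storage with 8-byte values and 4-byte column indices, and counts only the matrix data (vector and row-pointer traffic are ignored, so the bound is optimistic). The 6.0 GB/s bandwidth figure is a hypothetical placeholder, not a measurement of any machine in this thread.

```python
def spmv_intensity(value_bytes=8, index_bytes=4, flops_per_nonzero=2):
    """Flops per byte for CSR sparse matrix-vector product (matrix data only).

    Each stored nonzero costs one multiply and one add, and requires
    streaming its value and its column index from memory.
    """
    return flops_per_nonzero / (value_bytes + index_bytes)


def bandwidth_bound_gflops(bandwidth_gb_per_s, intensity=None):
    """Upper bound on GFLOP/s when the kernel is limited by memory bandwidth."""
    if intensity is None:
        intensity = spmv_intensity()
    return bandwidth_gb_per_s * intensity


if __name__ == "__main__":
    assumed_bandwidth = 6.0  # GB/s, hypothetical value for illustration only
    print("flops per byte:", spmv_intensity())
    print("bound (GFLOP/s):", bandwidth_bound_gflops(assumed_bandwidth))
```

Under these assumptions the bound depends only on the memory bandwidth of the socket, not on how many cores share it, which is the point made above.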

#10 | Tom | January 24, 2009, 20:23

Well, since I've got unlimited access to 3 supercomputers which are (essentially) free of the problems I described to the original poster, that's not particularly good advice; I just use a quad core at home for messing about. Basically, on a quad core it's a bad idea to use MPI and, as Intel have reported in their own research, once your data exceeds a certain size you essentially aren't any better off using more than 2 cores.

#11 | TG | January 25, 2009, 12:00

MPI on quad cores is neither good nor bad. It's just a means of communicating between different processes. If your algorithm is memory-bandwidth intensive, no means of inter-process communication will keep the pipes full, and your performance will suffer. If your algorithm is compute intensive and your memory bandwidth needs are low, it will work just fine. It's not MPI that is the problem; it's the algorithm that determines whether it will scale well on quads or not.
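
The distinction between compute-intensive and bandwidth-intensive algorithms can be illustrated with a simple roofline-style estimate. This is a sketch under stated assumptions: compute throughput grows linearly with the number of cores while memory bandwidth is shared and fixed, and all numbers used below are hypothetical.

```python
def attainable_gflops(n_cores, peak_per_core, bandwidth, intensity):
    """Roofline estimate: the lower of the compute roof and the bandwidth roof.

    peak_per_core : GFLOP/s one core can deliver (assumed value)
    bandwidth     : GB/s shared by all cores (assumed fixed)
    intensity     : flops performed per byte moved from memory
    """
    return min(n_cores * peak_per_core, bandwidth * intensity)


def predicted_speedup(n_cores, peak_per_core, bandwidth, intensity):
    """Speedup over one core predicted by the roofline estimate."""
    one = attainable_gflops(1, peak_per_core, bandwidth, intensity)
    many = attainable_gflops(n_cores, peak_per_core, bandwidth, intensity)
    return many / one


if __name__ == "__main__":
    # Hypothetical machine: 10 GFLOP/s per core, 8 GB/s shared bandwidth.
    for label, intensity in [("bandwidth-bound", 0.25), ("compute-bound", 8.0)]:
        s = predicted_speedup(4, 10.0, 8.0, intensity)
        print(f"{label}: predicted speedup on 4 cores = {s}")
```

In this model the parallelization method (MPI, OpenMP, pthreads) does not appear at all; only the flops-per-byte ratio of the algorithm decides whether extra cores help.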

#12 | Tom | January 25, 2009, 14:52

"If your algorithm is compute intensive and your memory bandwidth needs are low, it will work just fine"

That's the point (along with the fact that the MPI send/receive can cause problems with the shared cache): most CFD calculations are going to run into the bandwidth problem on quad cores fairly quickly.

A simple example is to use Jacobi iteration (a highly scalable, "bit reproducible" algorithm) to solve Poisson's equation on an Intel quad core: you'll get perfect scaling on a 360x360 grid. Now redo the calculation on a 720x720 grid and you'll find it difficult to even halve the computational time (basically, two cores is almost optimal).

In contrast, you also get the occasional "super scaling" when going from 1 core to 2 (try the same problem on a 720x360 grid!), with no further improvement for 3 or 4 cores.

This is just something that you need to be wary of when your code has to run efficiently on a number of different parallel architectures.
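
To relate the grid sizes above to cache capacity, it helps to estimate the working set of a Jacobi solver. The sketch below assumes three double-precision arrays per grid (old iterate, new iterate, right-hand side); the actual code behind the timings above is not shown in the thread, so this layout is an assumption. A plain-Python Jacobi sweep for the 5-point Poisson stencil is included only to make the memory access pattern concrete, not as a performance benchmark.

```python
def working_set_bytes(nx, ny, n_arrays=3, bytes_per_value=8):
    """Memory touched per sweep, assuming n_arrays full-grid arrays of doubles."""
    return nx * ny * n_arrays * bytes_per_value


def jacobi_sweep(u, f, h):
    """One Jacobi sweep for -laplace(u) = f on a uniform grid with spacing h.

    Boundary values of u are kept fixed; a new grid is returned.
    """
    ny, nx = len(u), len(u[0])
    new = [row[:] for row in u]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            new[j][i] = 0.25 * (u[j - 1][i] + u[j + 1][i]
                                + u[j][i - 1] + u[j][i + 1]
                                + h * h * f[j][i])
    return new


if __name__ == "__main__":
    for nx, ny in [(360, 360), (720, 360), (720, 720)]:
        mib = working_set_bytes(nx, ny) / 2**20
        print(f"{nx}x{ny}: about {mib:.1f} MiB")
```

Under the three-array assumption the working sets come to roughly 3, 6, and 12 MiB respectively, which can be compared against the shared cache size of the CPU in question to judge when the sweep starts streaming from main memory.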

#13 | hahnpv | January 29, 2009, 18:47

Hi Prad,

It is entirely possible that there is something wrong in your algorithms that is a function of the number of cores and the number of control volumes.

I have my own home-brew CFD code, which I recently parallelized with MPI, and I ran into similar issues. I wouldn't blame your Intel box until you run the exact same case on the supercomputer with the same number of ranks.

I agree with the arguments posed by Tom and others, but that shouldn't cause it to crash in this manner.

Philip

#14 | Prad | February 9, 2009, 15:28

Hi Philip,

I was away for the last two weeks. I think your suggestion is the most suitable for my case, as this is a very old code. Many people have worked on it and added a lot of stuff, and now it is a really huge code with lots of problems. But I am supposed to work with this code only. It is also block-structured, so it reads the mesh as blocks. Sometimes the same number of mesh points with a different number of blocks also causes problems at compilation. Older computers sometimes allow you to compile with a higher number of mesh points than newer processors and newer operating systems.

I ran the code on the supercomputer. It works much better there, and it doesn't give any compilation or run-time problems. The only problem is on local machines, both single-core and quad-core.

Can you elaborate on the issues you faced and how you solved them? That may be helpful to me.

Thanks in advance,
Prad
