CFD Online Discussion Forums


ivandipia January 27, 2009 13:15

parallel performance
I am testing CFX 11 for parallel computing on an Intel quad-core machine with 4 GB of RAM. My domain is a pipe with about 10,000 elements in the cross section and 256 nodes streamwise (about 3 million nodes). I have found that the serial run performs very well, fully loading a single processor, while the parallel run is a disaster: the load drops and the wall-clock time per timestep is higher than in the serial run! Could it depend on the domain shape, which is almost topologically cubic, or is there something else? Some mistake? I use PVM and automatic partitioning.

Glenn Horrocks January 27, 2009 17:22

Re: parallel performance

Are you running on Windows? PVM does not run too well on Windows. Try MPICH.

Glenn Horrocks

andy2o January 27, 2009 18:12

Re: parallel performance
Here's one possibility. Each solver process needs some extra memory, in addition to the memory for the matrices and variable values on the mesh. I'll call these two types the 'solver memory' and 'job memory'. So

a) For 1 CFX process you need (100% job memory + solver memory)

b) For 2 CFX processes, dividing the job 50% per process, you need 2*(50% job memory + solver memory)

c) For 3 CFX processes, dividing the job 33% per process, you need 3*(33% job memory + solver memory)

etc. So your parallel task will use more memory than the single-process version because of the extra copies of the 'solver memory'. Do you have enough memory? Are you using pagefiles in the parallel runs (but not in the single-processor case)?

Cheers, andy
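
A rough way to put numbers on the memory argument in the post above. This is only a sketch: the function name is made up, and the 3 GB of job memory and 0.3 GB of per-process solver memory are illustrative guesses, not measured CFX figures.

```python
def total_memory_gb(job_gb, solver_gb, n_procs):
    """Total memory when the job is split evenly over n_procs processes.

    Each process holds job_gb / n_procs of 'job memory' plus its own
    full copy of 'solver memory', as in cases a) to c) above.
    """
    return n_procs * (job_gb / n_procs + solver_gb)


# Hypothetical values, chosen only to illustrate the trend.
JOB_GB = 3.0
SOLVER_GB = 0.3
MACHINE_GB = 4.0

for n in (1, 2, 3, 4):
    need = total_memory_gb(JOB_GB, SOLVER_GB, n)
    status = "fits" if need <= MACHINE_GB else "exceeds RAM, likely paging"
    print(f"{n} process(es): {need:.1f} GB ({status})")
```

The job memory term stays constant while the solver memory term grows linearly with the number of processes, so a case that fits in RAM as a serial run can start paging once it is split.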

CycLone January 28, 2009 12:53

Re: parallel performance
As Glenn suggested, use MPICH. PVM hangs on Windows, which is probably the greatest source of delay.

The case you are running is only 10k nodes, so it probably won't scale that well. Available memory won't be an issue, but there is additional computational overhead for the solver and some communication overhead. On large models, these are amortized over a large number of nodes and aren't noticeable, but you will generally see parallel efficiency drop off as your partitions drop below 100k nodes each.

That said, with MPICH you should see some improvement in run time; I just wouldn't expect 2 processors to be twice as fast (maybe 1.2 to 1.5 times faster). Make your mesh bigger and you'll see better parallel efficiency.
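
For reference, the figures in the post above can be turned into parallel efficiency and partition size with a few lines. This is a minimal sketch: the helper names are hypothetical, and the 100k-node threshold is the rule of thumb quoted above, not a hard limit.

```python
def parallel_efficiency(speedup, n_procs):
    """Efficiency = speedup / number of processes (1.0 means ideal scaling)."""
    return speedup / n_procs


def nodes_per_partition(total_nodes, n_procs):
    """Average partition size for an even split of the mesh."""
    return total_nodes / n_procs


RULE_OF_THUMB_NODES = 100_000  # threshold quoted in the post above

# A 1.2x to 1.5x speedup on 2 processors, as suggested above.
for speedup in (1.2, 1.5):
    eff = parallel_efficiency(speedup, 2)
    print(f"speedup {speedup} on 2 procs: efficiency {eff:.0%}")

# Partition sizes for a 10k-node mesh and for a 3-million-node mesh.
for total in (10_000, 3_000_000):
    for n in (2, 4):
        size = nodes_per_partition(total, n)
        verdict = "below" if size < RULE_OF_THUMB_NODES else "above"
        print(f"{total} nodes on {n} procs: {size:.0f} per partition ({verdict} threshold)")
```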


andy2o January 28, 2009 17:44

Re: parallel performance
You are probably right about PVM - I've never used Windows with CFX, and you have a knack for answering questions well here! However, I'll just point out the OP's problem is 10,000 elements in *each* of 256 mesh cross sections, giving ~3 million nodes in total, as the original post said. (It sounds like an extruded or structured mesh.)

I would certainly agree that 10,000 nodes is too small to scale well in parallel. However, the actual problem size of 3 million nodes does sound about the size that would use most of a 4GB machine's memory. Hence my suggestion - however I don't have access to a suitable problem to estimate the actual memory consumption accurately just now, so I freely admit it's just a half-educated guess!

Best wishes, andy2o

CycLone January 29, 2009 12:33

Re: parallel performance
Ah! I missed that. I thought it was 10k nodes total.

3 million nodes (hex) will probably require ~3GB RAM. The solver memory overhead is still pretty small, so I doubt it would push him over but it may be pushing the limit with other applications running on the same machine. PVM will definitely be an issue (it is gone from v12beta altogether), so let's see how MPICH works for him.
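
A back-of-the-envelope version of that estimate, as a sketch only. It assumes roughly 1 GB per million hex nodes (which is what "3 million nodes, ~3GB" implies, not an official CFX figure) and treats the 10,000 elements per cross section as roughly 10,000 nodes; both are assumptions.

```python
GB_PER_MILLION_NODES = 1.0  # assumed from "3 million nodes ~ 3GB" above


def estimate_nodes(nodes_per_section, n_sections):
    """Node count of an extruded mesh: cross-section nodes times sections."""
    return nodes_per_section * n_sections


def estimate_memory_gb(n_nodes, gb_per_million=GB_PER_MILLION_NODES):
    """Rough solver memory for a hex mesh under the assumed ratio."""
    return n_nodes / 1e6 * gb_per_million


nodes = estimate_nodes(10_000, 256)   # figures from the original post
memory = estimate_memory_gb(nodes)
headroom = 4.0 - memory               # 4 GB machine, ignoring OS and other apps

print(f"nodes: {nodes}")
print(f"estimated memory: {memory:.2f} GB")
print(f"headroom on a 4 GB machine: {headroom:.2f} GB")
```

With so little headroom, the operating system and other applications could be enough to push the run into swap.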


Nathan January 29, 2009 16:26

Re: parallel performance
On the Intel Core 2, the front side bus can cause bottleneck issues as well. This might be part of the problem.

All times are GMT -4.