CFD Online - www.cfd-online.com
Home > Forums > CFX

parallel performance


Old   January 27, 2009, 13:15
Default parallel performance
  #1
ivandipia
Guest
 
Posts: n/a
I am testing CFX 11 for parallel computing on an Intel QuadCore with 4GB of RAM. My domain is a pipe with about 10,000 elements in the cross-section and 256 nodes streamwise (about 3 million nodes in total). I have found that the serial run performs very well, fully loading a single processor, while the parallel run is a disaster: the load drops and the wall-clock time per timestep is higher than for the serial run! Could this depend on the domain shape, which is almost topologically cubic, or is it something else? Some mistake on my part? I use PVM and automatic partitioning.

Old   January 27, 2009, 17:22
Default Re: parallel performance
  #2
Glenn Horrocks
Guest
 
Posts: n/a
Hi,

Are you running on Windows? PVM does not run well on Windows. Try MPICH.

Glenn Horrocks

Old   January 27, 2009, 18:12
Default Re: parallel performance
  #3
andy2o
Guest
 
Posts: n/a
Here's one possibility. Each solver process needs some extra memory in addition to the memory for the matrices and the variable values on the mesh. I'll call these two types the 'solver memory' and the 'job memory'. So:

a) For 1 CFX process you need (100% job memory + solver memory)

b) For 2 CFX processes, dividing the job 50% per process, you need 2*(50% job memory + solver memory)

c) For 3 CFX processes, dividing the job 33% per process, you need 3*(33% job memory + solver memory)

etc. So your parallel run has higher total memory requirements than the single-process version because of the extra copies of the 'solver memory'. Do you have enough memory? Are you hitting the pagefile in the parallel runs (but not in the single-processor case)?
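A quick back-of-the-envelope sketch of that memory model in Python (the 3 GB job size and 0.3 GB per-process 'solver memory' are illustrative assumptions, not measured CFX figures):

```python
def total_memory_gb(job_gb, solver_gb, n_procs):
    """Total RAM needed when a job of job_gb is split evenly across
    n_procs processes, each carrying a fixed solver_gb overhead."""
    return n_procs * (job_gb / n_procs + solver_gb)

# Illustrative numbers only: a 3 GB job with a hypothetical
# 0.3 GB fixed overhead per solver process.
for n in (1, 2, 3, 4):
    print(n, total_memory_gb(3.0, 0.3, n))  # 3.3, 3.6, 3.9, 4.2 GB
```

The job memory divides across the processes, but the fixed per-process overhead is duplicated, so the total grows with every extra process - which is how a run that fits in RAM serially can spill into the pagefile in parallel.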

Cheers, andy

Old   January 28, 2009, 12:53
Default Re: parallel performance
  #4
CycLone
Guest
 
Posts: n/a
As Glenn suggested, use MPICH. PVM hangs on Windows, which is probably the greatest source of delay.

The case you are running is only 10k nodes, so it probably won't scale that well. Available memory won't be an issue, but there is additional computational overhead for the solver and some communication overhead. On large models these are amortized over a large number of nodes and aren't noticeable, but you will generally see parallel efficiency drop off as your partitions fall below 100k nodes each.

That said, with MPICH you should see some improvement in run time; I just wouldn't expect 2 processors to be twice as fast (maybe 1.2 to 1.5 times faster). Make your mesh bigger and you'll see better parallel efficiency.
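That drop-off can be illustrated with a simple Amdahl-style model; here's a sketch in Python (the 30% non-parallelisable fraction is an arbitrary assumption chosen to land in the 1.2-1.5x range, not a measured CFX figure):

```python
def speedup(n_procs, serial_frac):
    """Amdahl's law: achievable speedup when a fraction serial_frac
    of the work (communication, per-process solver overhead) cannot
    be parallelised."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_procs)

# With a hypothetical 30% serial share, 2 processes give ~1.54x,
# not 2x, and extra processes hit diminishing returns quickly.
print(round(speedup(2, 0.3), 2))
```

Bigger meshes shrink the effective serial fraction (the overheads are amortized over more nodes), which is why efficiency improves with problem size.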

-CycLone

Old   January 28, 2009, 17:44
Default Re: parallel performance
  #5
andy2o
Guest
 
Posts: n/a
You are probably right about PVM - I've never used Windows with CFX, and you have a knack for answering questions well here! However, I'll just point out the OP's problem is 10,000 elements in *each* of 256 mesh cross sections, giving ~3 million nodes in total, as the original post said. (It sounds like an extruded or structured mesh.)

I would certainly agree that 10,000 nodes is too small to scale well in parallel. However, the actual problem size of 3 million nodes does sound about the size that would use most of a 4GB machine's memory. Hence my suggestion - however, I don't have access to a suitable problem to estimate the actual memory consumption accurately just now, so I freely admit it's just a half-educated guess!

Best wishes, andy2o

Old   January 29, 2009, 12:33
Default Re: parallel performance
  #6
CycLone
Guest
 
Posts: n/a
Ah! I missed that. I thought it was 10k nodes total.

3 million nodes (hex) will probably require ~3GB of RAM. The solver memory overhead is still pretty small, so I doubt it would push him over the limit, but it may be close with other applications running on the same machine. PVM will definitely be an issue (it is gone from v12beta altogether), so let's see how MPICH works for him.
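As a sanity check on that figure, here's the arithmetic in Python (the 1 kB-per-node rate is just inferred from the ~3GB / 3 million node estimate above; actual CFX consumption depends on physics, precision mode, and element type):

```python
def mesh_ram_gb(n_nodes, kb_per_node=1.0):
    """Rough RAM estimate at an assumed memory rate per mesh node."""
    return n_nodes * kb_per_node / 1024**2  # kB -> GB

# ~2.86 GB for 3 million nodes at 1 kB/node - most of a 4GB box
# once the OS and other applications take their share.
print(round(mesh_ram_gb(3_000_000), 2))
```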

-CycLone

Old   January 29, 2009, 16:26
Default Re: parallel performance
  #7
Nathan
Guest
 
Posts: n/a
On the Intel Core 2, the front-side bus can cause bottlenecks as well. This might be part of the problem.

