Home > Forums > Software User Forums > Siemens

star 4.06 memory on linux cluster


April 20, 2009, 12:27   #1
New Member
 
john mck
Join Date: Apr 2009
Posts: 3
I'm trying to run Star 4.06 on a Linux cluster with PBS, on 900,000 cells modelling incompressible transient flow. Each node of the cluster has two processors with 4 cores each, and 8 GB of shared memory. The model is partitioned using METIS.

Each processor is an Intel(R) Xeon(R) CPU E5430 @ 2.66GHz

uname -a gives: Linux 2.6.9-55.0.2.ELsmp #1 SMP Tue Jun 26 14:14:47 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

The compiler is Absoft 9.0 EP 64 bit.

My question is this:

If I use 8 processes on each of 2 nodes, i.e. 16 processes in total, each process takes 860 MB of virtual memory (mostly data and stack).
If I use 8 processes on each of 4 nodes, i.e. 32 processes in total, each process takes 820 MB.

How come each process doesn't use proportionally less memory when I use more processes? I would have expected the total memory used to stay almost constant. As it stands, I can't add many more cells before the nodes run out of memory and start to swap.

Any advice appreciated. My apologies if I've missed something obvious like an option.
Regards
John Mck.

April 20, 2009, 12:49   #2
Senior Member
 
Aroon
Join Date: Apr 2009
Location: Racine WI
Posts: 148
The memory used is not related only to the problem size. As you use more processors, the communication overhead between them increases, so you won't see a linear decrease in memory usage per process. At some point, using more processors may even result in slower performance because of the communication overhead.

April 20, 2009, 16:17   #3
Yes, a trade-off - but not yet?
New Member
 
john mck
Join Date: Apr 2009
Posts: 3
Yes, I'd agree that eventually there is a trade-off, once using more nodes adds a greater communication overhead than the computational benefit they bring.

But I didn't think I'd reached that point yet. I'm finding that I can't run 1,000,000 cells on two nodes, each with 8 GB and 8 processors. Other threads indicate that I should be able to do this on a single processor with 2 GB of memory.

Any ideas?

Regards
John

April 20, 2009, 17:25   #4
Senior Member
 
Aroon
Join Date: Apr 2009
Location: Racine WI
Posts: 148
That was my initial thought too. I use similar machines (my Linux machine shows the same specs as yours, except for a different Linux version), and I frequently run meshes of about 1 million cells on a single processor.

April 20, 2009, 19:15   #5
f-w
Senior Member
 
Join Date: Apr 2009
Posts: 153
johnmck,

Just out of curiosity, have you benchmarked your quad-cores? I was advised to go with dual-cores instead of quad-cores because of the inherent performance loss when using all 4 cores (which I confirmed on my head-node with Star-CCM+). What is your "speedup" going from 7 to 8 cores on one of your nodes?
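(For reference, by "speedup" I just mean serial wall-clock time divided by parallel wall-clock time. The timings below are made up purely to illustrate the calculation, not measurements from any real node:)

```python
# Hypothetical wall-clock times (seconds) for the same job on
# 1..8 cores of one node -- illustrative numbers only, not a benchmark.
times = {1: 800.0, 4: 230.0, 7: 150.0, 8: 145.0}

for cores in sorted(times):
    speedup = times[1] / times[cores]   # serial time / parallel time
    efficiency = speedup / cores        # ideal value is 1.0
    print(f"{cores} cores: speedup {speedup:4.1f}x, efficiency {efficiency:4.0%}")
```

If your 7-to-8-core step looks like the made-up numbers above (5.3x to 5.5x), the eighth core is adding almost nothing, which usually points at a memory-bandwidth bottleneck rather than the solver itself.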

Thanks,
f-w

April 21, 2009, 03:48   #6
Senior Member
 
Mark Olesen
Join Date: Mar 2009
Location: https://olesenm.github.io/
Posts: 1,679
Quote:
Originally Posted by f-w View Post
I was advised to go with dual-cores instead of quad-cores because of the inherent performance loss when using all 4 cores
I don't think the issue is dual vs. quad core per se, but rather the bottleneck in accessing memory. We have several dual-CPU/quad-core machines in our cluster and found that using a single process per CPU gave us about 30-35% better performance than using all of the cores (no swapping occurred). In _our_ test case, the memory bottleneck was worse than the network overhead incurred by spreading the job over more machines. As always, though, don't trust anybody else's benchmarks; benchmark with your own problems.

With the changes in memory access in the Nehalem CPUs, the memory bottleneck should become less significant in the future ... it might even be better already in the current generation of AMD CPUs.

April 21, 2009, 08:58   #7
More Results
New Member
 
john mck
Join Date: Apr 2009
Posts: 3
I ran some more tests (mesh 96x99x96 = 912,384 cells), and yes, we are reaching a trade-off:


Nodes x processes per node = total processes: memory per process
1x1 =  1 (i.e. serial): 1870 MB
1x2 =  2: 1340 MB
1x4 =  4: 1060 MB
1x8 =  8:  940 MB
2x4 =  8:  930 MB
2x8 = 16:  860 MB
4x8 = 32:  820 MB
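These numbers are consistent with a simple two-term model, per-process memory ≈ fixed overhead + (divisible data) / N. A minimal sketch of the fit (NumPy assumed available; the split into "fixed" and "divisible" is the model's assumption, not anything reported by Star):

```python
import numpy as np

# Per-process memory (MB) reported above for N parallel processes.
n   = np.array([1, 2, 4, 8, 8, 16, 32], dtype=float)
mem = np.array([1870, 1340, 1060, 940, 930, 860, 820], dtype=float)

# Model: mem(N) = fixed + data / N, fitted by least squares.
# "fixed" = per-process overhead that does not shrink with more
# processes (solver code, buffers, halo storage); "data" = the part
# of the problem that actually divides across processes.
A = np.column_stack([np.ones_like(n), 1.0 / n])
(fixed, data), *_ = np.linalg.lstsq(A, mem, rcond=None)

print(f"fixed per-process overhead ~ {fixed:.0f} MB")
print(f"divisible data             ~ {data:.0f} MB")
```

The fit lands close to the measurements (roughly 800 MB of per-process overhead plus about 1.1 GB of divisible data), which would explain why per-process memory levels off near 820 MB instead of shrinking toward zero.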

For our work using 32 licences, the memory per process doesn't fall much below half the serial memory requirement. So for big jobs we'll have to populate the nodes only partly, in order to have enough memory per process.

The memory overhead due to parallel working seems surprisingly high, to me at least.

Many Thanks
Regards
John mck

April 22, 2009, 14:02   #8
TMG
Member
 
Join Date: Mar 2009
Posts: 44
Your model is too small to make your conclusion valid. With 32 cores you only have about 28,500 cells on each core, which is a very small number. At that size the overhead of all the "halo" cells (the cells duplicated at the boundaries between two domains) just isn't going to decrease any further. If you run a much larger model (an order of magnitude bigger, say), you will see the memory scaling you are looking for.
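The halo-cell point can be roughly quantified. The sketch below idealises each partition as a cube with a one-cell-deep halo on every face; METIS partitions aren't cubes (and a single partition has no halo at all), so treat the numbers only as an order-of-magnitude estimate:

```python
# Rough estimate of one-cell-deep halo overhead when a mesh of this
# size is split into p roughly cube-shaped partitions. Idealised:
# real METIS partitions are irregular, so real overhead is higher.
total_cells = 96 * 99 * 96   # 912,384 cells, as in the tests above

overhead_by_p = {}
for p in (2, 4, 8, 16, 32):
    local = total_cells / p          # interior cells per partition
    side = local ** (1.0 / 3.0)      # edge of an equivalent cube
    with_halo = (side + 2.0) ** 3    # one extra cell layer on every face
    overhead_by_p[p] = with_halo / local - 1.0
    print(f"p={p:2d}: {local:8.0f} cells/part, "
          f"halo overhead ~ {overhead_by_p[p]:.0%}")
```

At 32 partitions the halo layer alone adds on the order of 20% extra cells per process, and the fraction grows as partitions shrink, which fits the observation that per-process memory refuses to drop much further.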


Tags
linux, memory, parallel, star, xeon



