www.cfd-online.com

Specify number of cores that CFX should use.

Old   March 29, 2010, 08:39
Default Specify number of cores that CFX should use.
  #1
Senior Member
 
Lance
Join Date: Mar 2009
Posts: 522
Rep Power: 11
Lance is on a distinguished road
Hi, is there a command to specify the number of cores that CFX should use, similar to Fluent's -tX, where X is an integer?

Right now CFX uses all eight cores on the compute node because I specify -par-dist $(hostlist -e -s, -a'*8' $SLURM_NODELIST) in my start script, which resolves to node_name*8. What I want is for CFX to use only four cores and Ansys the remaining four for my FSI simulation. But if I change -a'*8' to -a'*4', Ansys doesn't understand that it should use the remaining four cores available on the compute node. Instead, both CFX and Ansys try to share four cores, which makes the simulation run very slowly.

So, is there a way to say how many of the available cores CFX should use?
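
For reference, the host string that the start script passes to -par-dist can be sketched in Python. This is only an illustration of the node_name*8 form quoted above; the function name `par_dist_hosts` is hypothetical, and the comma-separated multi-node form is an assumption based on the `-s,` separator in the hostlist call.

```python
def par_dist_hosts(nodes, cores_per_node):
    """Build a host string of the form 'node_a*4,node_b*4'.

    `nodes` is a list of host names; `cores_per_node` is how many
    solver partitions should be started on each of them.
    """
    if cores_per_node < 1:
        raise ValueError("cores_per_node must be at least 1")
    # One 'name*count' entry per node, joined by commas (assumed separator).
    return ",".join(f"{node}*{cores_per_node}" for node in nodes)


if __name__ == "__main__":
    # Single node, as in the post: eight partitions, then four.
    print(par_dist_hosts(["node_name"], 8))
    print(par_dist_hosts(["node_name"], 4))
```

Changing the count here only changes how many CFX partitions are requested; it does not by itself tell the other solver which cores to use.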

Cheers
Lance

Old   March 29, 2010, 20:13
Default
  #2
Super Moderator
 
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 10,823
Rep Power: 85
ghorrocks has a spectacular aura about
I don't understand your question. If you have 8 cores and tell CFX to use 4, then CFX will use 4 and there are another 4 available for ANSYS. So what is the problem?

Have you tried to use process affinity? That way you can assign specific processes to specific cores and can reduce conflicts. But 99% of the time this is not required as both Windows and Linux do a good job by default of sharing the load around.
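
As a rough illustration of process affinity, here is a minimal Python sketch that restricts a process to a chosen set of cores. It assumes Linux, since `os.sched_setaffinity` is not available on Windows or macOS; the helper name `pin_to_cores` is hypothetical, and in practice you would pass the PID of a solver process rather than 0 (the current process).

```python
import os


def pin_to_cores(pid, wanted):
    """Restrict process `pid` (0 means the current process) to `wanted` cores.

    Cores that the process is not allowed to use anyway are ignored.
    Returns the affinity set that is in effect afterwards.
    """
    available = os.sched_getaffinity(pid)
    target = set(wanted) & available
    if not target:
        raise ValueError("none of the requested cores is available")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # Try to keep this process on cores 0-3; fall back to whatever
    # core is available if the machine exposes none of them.
    fallback = {min(os.sched_getaffinity(0))}
    print(pin_to_cores(0, {0, 1, 2, 3} | fallback))
```

The command-line equivalent on Linux is `taskset`. As noted above, the default scheduler usually does well enough that this is not needed.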

Old   March 30, 2010, 02:05
Default
  #3
Senior Member
 
Lance
Join Date: Mar 2009
Posts: 522
Rep Power: 11
Lance is on a distinguished road
Hi Glenn,
I'll try to be more clear:
When running top on the compute node, I see four CFX processes at 100% CPU but only one Ansys process, which runs at ~400% CPU. I was expecting to see four Ansys processes at 100% each, which led me to believe that Ansys only runs on one node? Or is this behavior normal?

Code:
top - 07:57:12 up 11 days, 15:37,  3 users,  load average: 5.22, 3.19, 1.41
Tasks: 195 total,   5 running, 189 sleeping,   0 stopped,   1 zombie
Cpu(s): 50.7%us,  0.4%sy,  0.0%ni, 48.8%id,  0.0%wa,  0.1%hi,  0.1%si,  0.0%st
Mem:  16439436k total,  3295936k used, 13143500k free,   270088k buffers
Swap: 33999992k total,        0k used, 33999992k free,   684480k cac
 
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                               
17112 user_xx   25   0 9553m 474m  25m R 336.7  3.0   2:29.71 ansys.e120                                                            
17161 user_xx   25   0  426m 355m  23m R 100.1  2.2   4:46.80 solver-hpmpi.ex                                                       
17160 user_xx   25   0  442m 371m  24m R 99.8  2.3   4:46.48 solver-hpmpi.ex                                                        
17162 user_xx   25   0  435m 364m  24m R 99.5  2.3   4:45.90 solver-hpmpi.ex                                                        
16893 user_xx   16   0 8510m  33m  20m S  0.7  0.2   0:04.61 SolverManager.e

Old   March 30, 2010, 02:13
Default
  #4
Super Moderator
 
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 10,823
Rep Power: 85
ghorrocks has a spectacular aura about
No, this behaviour is normal. The tech-heads may wish to correct me here, but this is my understanding of what is happening:

CFX runs as 4 separate processes. This appears on the task manager as four processes. Each process has its own memory allocation and the processes communicate with each other through MPICH or whatever multi-processor framework you are working within.

ANSYS runs as a single process, but is multi-threaded so it can run four threads simultaneously. This appears on the task manager as one process with 400% CPU load. The process accesses one big chunk of memory, and all threads can access the full memory. Communication between threads is taken care of internally in the ANSYS executable.
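
To make the difference visible, here is a small Python sketch that groups the process rows from the top output in post #3 by command and adds up their %CPU. The rows are copied from that post (the paste shows only three of the solver processes); the function name `cpu_by_command` is just for illustration.

```python
from collections import defaultdict

# Process rows as pasted in post #3 (trailing whitespace removed).
TOP_ROWS = """\
17112 user_xx   25   0 9553m 474m  25m R 336.7  3.0   2:29.71 ansys.e120
17161 user_xx   25   0  426m 355m  23m R 100.1  2.2   4:46.80 solver-hpmpi.ex
17160 user_xx   25   0  442m 371m  24m R 99.8  2.3   4:46.48 solver-hpmpi.ex
17162 user_xx   25   0  435m 364m  24m R 99.5  2.3   4:45.90 solver-hpmpi.ex
16893 user_xx   16   0 8510m  33m  20m S  0.7  0.2   0:04.61 SolverManager.e
"""


def cpu_by_command(rows):
    """Return {command: {"processes": n, "cpu": total %CPU}}."""
    summary = defaultdict(lambda: {"processes": 0, "cpu": 0.0})
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 12:
            continue  # not a process row
        # top columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        command, cpu = fields[11], float(fields[8])
        summary[command]["processes"] += 1
        summary[command]["cpu"] += cpu
    return dict(summary)


if __name__ == "__main__":
    for command, info in cpu_by_command(TOP_ROWS).items():
        print(f"{command:16s} {info['processes']} process(es), "
              f"{info['cpu']:.1f}% CPU in total")
```

Several single-threaded solver processes and one multi-threaded ANSYS process end up with a comparable total load; top just reports them differently.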

Old   March 30, 2010, 02:21
Default
  #5
Senior Member
 
Lance
Join Date: Mar 2009
Posts: 522
Rep Power: 11
Lance is on a distinguished road
wow, fast response from the other side of the globe.
Many thanks Glenn, what would we do without you

Old   May 14, 2010, 12:26
Default
  #6
Senior Member
 
Join Date: Feb 2010
Posts: 145
Rep Power: 8
Jade M is on a distinguished road
Hi Lance,

I was wondering about this myself. Do you have an HPC license? If not, then my understanding is that CFX will only utilize, in total, 100% of one core. There are two levels of the HPC license, one where two cores can be utilized, and one where all cores can be utilized. My CFX technical support says:

There are a couple of ways of speeding up simulations, but unfortunately in general nothing comes for free...:
1) Run in parallel. With CFX, you can take a single problem and partition it into multiple pieces to be run on several cores or even several networked computers. This requires those cores/computers as well as ANSYS parallel licensing keys.
2) Reduce mesh size. Of course this can compromise your solution.
3) Mesh with more efficient methods. It is possible that instead of using 3M elements to mesh, e.g., a pipe, using a hex mesh or swept mesh method may result in a more efficient filling of the volume with elements, yielding the same solution with fewer elements, faster run times, less memory usage... This is most prevalent in areas of tight gaps or small, high aspect ratio features, where the isotropic fill behaviour of tet elements can drive very high mesh counts.

Old   May 15, 2010, 06:59
Default
  #7
Super Moderator
 
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 10,823
Rep Power: 85
ghorrocks has a spectacular aura about
Quote:
my understanding is that CFX will only utilize, in total, 100% of one core.
If running in serial, yes, it will use 100% of one core. (Provided nothing is stuffing things up, like excessive page file usage or virus checking!)

Quote:
There are two levels of the HPC license, one where two cores can be utilized, and one where all cores can be utilized.
This is for the ANSYS FEA package. CFX is different. Any parallel in CFX requires a parallel license.

Quote:
There are a couple of ways of speeding up simulations
Yes, there are many. Here's my list. Firstly things which do not compromise simulation accuracy:
1) Run parallel
2) Get a faster CPU
3) Spend the time to optimise your simulation. This can take months of work for a complex simulation.

Here are things which gain speed at the price of accuracy:
1) Coarser mesh (do a mesh sensitivity study and work out which mesh gives you the accuracy you require, i.e. optimise)
2) Looser convergence (likewise do a sensitivity study to determine what is required)
3) Bigger time steps if transient (likewise do a sensitivity study to determine what is required)
4) Run single precision (but some simulations need double precision to run)
5) Simpler physics (e.g. degassing boundaries in multiphase, using incompressible buoyancy rather than fully compressible flow with buoyancy)
6) Don't do CFD at all and develop an analytical model or dimensional model or empirical model or similar.

Old   May 17, 2010, 08:40
Default
  #8
Senior Member
 
Join Date: Feb 2010
Posts: 145
Rep Power: 8
Jade M is on a distinguished road
The statement "There are two levels of the HPC license, one where two cores can be utilized, and one where all cores can be utilized" is correct. Yes, this applies to parallel licenses.

Old   November 12, 2013, 08:24
Default Best number of Cores per job
  #9
New Member
 
Hassan Adel
Join Date: Oct 2013
Location: Egypt
Posts: 17
Rep Power: 3
hassan1201 is on a distinguished road
Can anyone tell me how I can calculate the optimum number of cores for a given problem?
__________________
H.Elsheshtawy

Old   November 12, 2013, 17:58
Default
  #10
Super Moderator
 
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 10,823
Rep Power: 85
ghorrocks has a spectacular aura about
Define "optimum" - everybody has a different definition. A commercial user who has to pay full commercial price for licenses might define it as simulations per $ spent. An academic user who has far cheaper licenses might define it as the quickest simulation time.

Old   March 31, 2014, 04:12
Default
  #11
New Member
 
Join Date: Nov 2013
Posts: 18
Rep Power: 3
magicalmarshmallow is on a distinguished road
Is there a general correlation between the number of cores used and computational time? I can imagine this is analysis dependent, not necessarily node or mesh density dependent.

I'm told that there isn't a huge difference between 8 and 16 cores, due to the slow interconnect between processors on our HPC. Maybe my information is incorrect, but is this a factor? (Yes, we're academic users.)

Old   March 31, 2014, 05:50
Default
  #12
Senior Member
 
Lance
Join Date: Mar 2009
Posts: 522
Rep Power: 11
Lance is on a distinguished road
As you say, it is problem and hardware dependent. I generally see a linear speed-up between 8 and 16 cores for my problems, but we have InfiniBand, which removes the interconnect bottleneck. If you use Ethernet, I'm sure that it will slow things down. Why not test to see for yourself?
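
If you do run that test, the comparison can be sketched as below: run the same case on different core counts, note the wall-clock times, and compute speed-up and parallel efficiency relative to the smallest run. The timings in the example are made-up placeholders, not measurements, and `scaling_table` is a hypothetical helper name.

```python
def scaling_table(timings, baseline_cores):
    """Compute speed-up and parallel efficiency from wall-clock times.

    `timings` maps core count -> wall-clock seconds for the same case.
    Speed-up is relative to the run on `baseline_cores`; efficiency is
    the speed-up divided by the ideal (linear) speed-up.
    """
    base_time = timings[baseline_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        ideal = cores / baseline_cores
        rows.append((cores, speedup, speedup / ideal))
    return rows


if __name__ == "__main__":
    # Placeholder numbers only: replace with your own measured times.
    measured = {8: 1000.0, 16: 520.0}
    for cores, speedup, efficiency in scaling_table(measured, 8):
        print(f"{cores:3d} cores: speed-up {speedup:.2f}, "
              f"efficiency {efficiency:.0%}")
```

An efficiency close to 100% means near-linear scaling; a sharp drop when going from 8 to 16 cores would point to the interconnect (or the problem size) as the bottleneck.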

Old   March 31, 2014, 05:55
Default
  #13
New Member
 
Join Date: Nov 2013
Posts: 18
Rep Power: 3
magicalmarshmallow is on a distinguished road
I should do, all right. Mind you, the wait time for 16 cores on the HPC might wipe out the gain. Suck it and see, hey!

Old   March 31, 2014, 05:59
Default
  #14
Super Moderator
 
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 10,823
Rep Power: 85
ghorrocks has a spectacular aura about
CFX V15 has had parallel enhancements which allow it to scale well up to thousands of cores. I forget the exact number, but I am sure it was in the thousands. Have a look in the V15 promotional stuff.

But to get that you need fast interconnects (like InfiniBand), a simulation large enough to scale to that number of processors, and physics which parallelises well.

And then there is the issue of how to mesh these big simulations, set them up and post-process them, but that is another topic.

Old   April 1, 2014, 10:43
Default
  #15
Senior Member
 
Bruno
Join Date: Mar 2009
Location: Brazil
Posts: 236
Rep Power: 12
brunoc is on a distinguished road
If you're looking for speed, the usual guideline is to use 50-100k nodes/core (not elements) for hex meshes and 70-100k nodes/core for tet meshes. But this is a really simple rule of thumb. For instance, things such as number of equations being solved is obviously important, but so are the number of domains and GGI interfaces in your model, which is something people usually don't take into account.
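
As a quick sketch of that rule of thumb, the snippet below turns a mesh node count into a range of core counts. The nodes/core figures are the ones quoted above; the function name `core_range` and the 3,000,000-node example mesh are hypothetical, and none of the other factors mentioned (equations, domains, GGI interfaces) are taken into account.

```python
import math

# Rule-of-thumb load per core (mesh nodes, not elements), as quoted above.
NODES_PER_CORE = {
    "hex": (50_000, 100_000),
    "tet": (70_000, 100_000),
}


def core_range(mesh_nodes, mesh_type):
    """Return (fewest, most) cores suggested by the nodes/core guideline."""
    low_load, high_load = NODES_PER_CORE[mesh_type]
    # Heaviest load per core gives the fewest cores, and vice versa.
    fewest = max(1, math.ceil(mesh_nodes / high_load))
    most = max(fewest, mesh_nodes // low_load)
    return fewest, most


if __name__ == "__main__":
    # Hypothetical mesh with 3,000,000 nodes.
    print("hex:", core_range(3_000_000, "hex"))
    print("tet:", core_range(3_000_000, "tet"))
```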

If you're looking for the best speed/investment ratio, lots of things need to be considered. First and foremost, is your equipment able to handle a simulation with a large number of cores? You need to check things like connection speed (forget about using a conventional Ethernet connection), CPU access to memory (processors with lots of cores used to not have enough bandwidth to access the memory at full speed, but I'm not sure about current models) and, in the case of transient runs, fast read/write speed to the disk. Also, with CFX, pre- and post-processing are done in serial, so you also need an extra computer with LOTS of RAM just to execute these steps. And then you have to buy the parallel licenses.

As you can see, the price of parallel licenses becomes just one variable. In that regard, ANSYS has an 'HPC Pack' license that has its advantages and disadvantages. The main advantage is that it is cost effective for simulations with a huge number of cores, since each additional license multiplies the number of cores allowed by 4 (1 lic = up to 8 cores, 2 lics = up to 32, then 128, 512 and so on). But the 'up to' thing is important: if you run a simulation with, say, 5 cores, you'll still use one HPC Pack license. 9-32 cores? Two licenses.
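
The pack arithmetic above (8, 32, 128, 512, ... cores) can be sketched as a small function. It only encodes the numbers quoted in this post, for parallel runs of two or more cores; the name `hpc_packs_needed` is hypothetical, and actual licensing terms should be checked with ANSYS.

```python
def hpc_packs_needed(cores):
    """Number of HPC Pack licenses for a parallel run on `cores` cores.

    Follows the progression quoted in the post: one pack allows up to
    8 cores, and each additional pack multiplies the limit by 4.
    """
    if cores < 2:
        raise ValueError("this sketch only covers parallel runs (2+ cores)")
    packs, capacity = 1, 8
    while capacity < cores:
        packs += 1
        capacity *= 4
    return packs


if __name__ == "__main__":
    for cores in (5, 8, 9, 32, 33, 128, 512):
        print(f"{cores:4d} cores -> {hpc_packs_needed(cores)} pack(s)")
```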

If you only have one solver license and a single machine with lots of cores, just follow the ~70-100k nodes/core up to the number of licenses available (or the number of cores on your computer, whichever is smaller) and you should be close to the best performance your machine can deliver.

All times are GMT -4.