CFD Online Discussion Forums


Lance March 29, 2010 08:39

Specify number of cores that CFX should use.
Hi, is there a command to specify the number of cores that CFX should use, similar to Fluent's -tX, where X is an integer?

Right now CFX uses all eight cores on the compute node because I specify -par-dist $(hostlist -e -s, -a'*8' $SLURM_NODELIST) in my start script, which resolves to node_name*8. What I want is for CFX to use only four cores and Ansys the remaining four for my FSI simulation. But if I change -a'*8' to -a'*4', Ansys doesn't understand that it should use the remaining four cores available on the compute node. Instead, both CFX and Ansys try to share four cores, which makes the simulation run very slowly.

So, is there a way to specify how many of the available cores CFX should use?


ghorrocks March 29, 2010 20:13

I don't understand your question. If you have 8 cores and tell CFX to use 4, then CFX will use 4 and there are another 4 available for ANSYS. So what is the problem?

Have you tried to use process affinity? That way you can assign specific processes to specific cores and can reduce conflicts. But 99% of the time this is not required as both Windows and Linux do a good job by default of sharing the load around.
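
For anyone who does want to try process affinity on Linux, here is a minimal sketch of the idea, not a tested CFX/ANSYS recipe. It splits the core IDs of an 8-core node into two disjoint groups and builds a `taskset -c` prefix for each. The helper names `split_cores` and `taskset_prefix` and the `<... start command>` placeholders are made up for illustration, and MPI launchers may apply their own binding on top of this.

```python
def split_cores(total_cores, cores_for_first):
    """Split core IDs 0..total_cores-1 into two disjoint groups."""
    if not 0 < cores_for_first < total_cores:
        raise ValueError("cores_for_first must be between 1 and total_cores - 1")
    all_ids = list(range(total_cores))
    return all_ids[:cores_for_first], all_ids[cores_for_first:]


def taskset_prefix(core_ids):
    """Build a 'taskset -c' prefix that pins a launched command to core_ids."""
    return ["taskset", "-c", ",".join(str(c) for c in core_ids)]


# Example: 8-core node, four cores for each solver (placeholders, not real commands).
cfx_cores, ansys_cores = split_cores(8, 4)
print(" ".join(taskset_prefix(cfx_cores)), "<cfx start command>")
print(" ".join(taskset_prefix(ansys_cores)), "<ansys start command>")
```

Child processes normally inherit the affinity mask of the process that started them, which is why pinning the launching command is usually enough.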

Lance March 30, 2010 02:05

Hi Glenn,
I'll try to be more clear:
When running top on the compute node, I see four CFX processes at 100% CPU but only one Ansys process, which runs at ~400% CPU. I was expecting to see four Ansys processes at 100% each, which led me to believe that Ansys only runs on one node? Or is this behavior normal?


top - 07:57:12 up 11 days, 15:37,  3 users,  load average: 5.22, 3.19, 1.41
Tasks: 195 total,  5 running, 189 sleeping,  0 stopped,  1 zombie
Cpu(s): 50.7%us,  0.4%sy,  0.0%ni, 48.8%id,  0.0%wa,  0.1%hi,  0.1%si,  0.0%st
Mem:  16439436k total,  3295936k used, 13143500k free,  270088k buffers
Swap: 33999992k total,        0k used, 33999992k free,  684480k cac
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                             
17112 user_xx  25  0 9553m 474m  25m R 336.7  3.0  2:29.71 ansys.e120                                                           
17161 user_xx  25  0  426m 355m  23m R 100.1  2.2  4:46.80 solver-hpmpi.ex                                                     
17160 user_xx  25  0  442m 371m  24m R 99.8  2.3  4:46.48 solver-hpmpi.ex                                                       
17162 user_xx  25  0  435m 364m  24m R 99.5  2.3  4:45.90 solver-hpmpi.ex                                                       
16893 user_xx  16  0 8510m  33m  20m S  0.7  0.2  0:04.61 SolverManager.e

ghorrocks March 30, 2010 02:13

No, this behaviour is normal. The tech-heads may wish to correct me here, but this is my understanding of what is happening:

CFX runs as 4 separate processes. This appears on the task manager as four processes. Each process has its own memory allocation and the processes communicate with each other through MPICH or whatever multi-processor framework you are working within.

ANSYS runs as a single process, but is multi-threaded so it can run four threads simultaneously. This appears on the task manager as one process with 400% CPU load. The process has one big chunk of memory, and all threads can access all of it. Communication between threads is handled internally in the ANSYS executable.
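
To make the process-versus-thread picture concrete, here is a small sketch that sums the %CPU column per command for the process rows copied from the top listing above. It assumes the default top column order (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND); the function name `cpu_by_command` is just an illustrative choice.

```python
from collections import defaultdict

# Process rows copied from the top output posted earlier in the thread.
TOP_LINES = """\
17112 user_xx  25  0 9553m 474m  25m R 336.7  3.0  2:29.71 ansys.e120
17161 user_xx  25  0  426m 355m  23m R 100.1  2.2  4:46.80 solver-hpmpi.ex
17160 user_xx  25  0  442m 371m  24m R 99.8  2.3  4:46.48 solver-hpmpi.ex
17162 user_xx  25  0  435m 364m  24m R 99.5  2.3  4:45.90 solver-hpmpi.ex
16893 user_xx  16  0 8510m  33m  20m S  0.7  0.2  0:04.61 SolverManager.e
"""


def cpu_by_command(top_text):
    """Return {command: (process_count, total_cpu_percent)} from top process rows."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in top_text.splitlines():
        fields = line.split()
        # Skip anything that is not a 12-column process row starting with a PID.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        command, cpu = fields[11], float(fields[8])
        totals[command][0] += 1
        totals[command][1] += cpu
    return {cmd: (count, round(cpu, 1)) for cmd, (count, cpu) in totals.items()}


for cmd, (count, cpu) in cpu_by_command(TOP_LINES).items():
    print(f"{cmd}: {count} process(es), {cpu:.1f}% CPU, about {cpu / 100:.1f} cores busy")
```

A multi-threaded program shows up as one row with a large %CPU, while an MPI program shows up as several rows near 100% each; the per-command totals are what tell you how many cores each code is really using.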

Lance March 30, 2010 02:21

wow, fast response from the other side of the globe.
Many thanks Glenn, what would we do without you :)

Jade M May 14, 2010 12:26

Hi Lance,

I was wondering about this myself. Do you have an HPC license? If not, then my understanding is that CFX will only utilize, in total, 100% of one core. There are two levels of the HPC license, one where two cores can be utilized, and one where all cores can be utilized. My CFX technical support says:

There are a couple of ways of speeding up simulations, but unfortunately in general nothing comes for free:
1) Run in parallel. With CFX, you can take a single problem and partition it into multiple pieces to be run on several cores or even several networked computers. This requires those cores/computers as well as ANSYS parallel licensing keys.
2) Reduce mesh size. Of course, this can compromise your solution.
3) Mesh with more efficient methods. Instead of using 3M elements to mesh, e.g., a pipe, a hex mesh or swept mesh method may fill the volume more efficiently, yielding the same solution with fewer elements, faster run times, and less memory usage. This is most prevalent in areas of tight gaps or small, high aspect ratio features, where the isotropic fill behaviour of tet elements can drive very high mesh counts.

ghorrocks May 15, 2010 06:59


my understanding is that CFX will only utilize, in total, 100% of one core.
If running in serial, yes, it will use 100% of one core. (Providing nothing is stuffing things up, like excessive page file usage or virus checking!)


There are two levels of the HPC license, one where two cores can be utilized, and one where all cores can be utilized.
This is for the ANSYS FEA package. CFX is different. Any parallel in CFX requires a parallel license.


There are a couple of ways of speeding up simulations
Yes, there are many. Here's my list. Firstly things which do not compromise simulation accuracy:
1) Run parallel
2) Get a faster CPU
3) Spend the time to optimise your simulation. This can take months of work for a complex simulation.

Here's things which gain speed at the price of accuracy:
1) Coarser mesh (do a mesh sensitivity study and work out which mesh gives you the accuracy you require - ie optimise)
2) Looser convergence (likewise do a sensitivity study to determine what is required)
3) Bigger time steps if transient (likewise do a sensitivity study to determine what is required)
4) Run single precision (but some simulations need double precision to run)
5) Simpler physics (eg degassing boundaries in multiphase, using incompressible buoyancy rather than fully compressible flow with buoyancy)
6) Don't do CFD at all and develop an analytical model or dimensional model or empirical model or similar.

Jade M May 17, 2010 08:40

The statement "There are two levels of the HPC license, one where two cores can be utilized, and one where all cores can be utilized" is correct. Yes, this applies to parallel licenses.

hassan1201 November 12, 2013 08:24

Best number of Cores per job
Can anyone tell me how I can calculate the optimum number of cores for a given problem?

ghorrocks November 12, 2013 17:58

Define "optimum" - everybody has a different definition. A commercial user who has to pay full commercial price for licenses might define it as simulations per $ spent. An academic user who has far cheaper licenses might define it as the quickest simulation time.

magicalmarshmallow March 31, 2014 04:12

Is there a general correlation between the number of cores used and computational time? I can imagine this is analysis dependent, not necessarily node or mesh density dependent.

I'm told that there isn't a huge pile of difference between 8 & 16 cores, due to the slow interconnection between processors on our HPC. Maybe my information is incorrect, but is this a factor? (Yes we're academic users)

Lance March 31, 2014 05:50

As you say, it is problem and hardware dependent. I generally see a linear speed-up between 8 and 16 cores for my problems, but we have InfiniBand, which removes the interconnect bottleneck. If you use Ethernet, I'm sure that it will slow things down. Why not test to see for yourself?
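
If you do run that test, a quick way to judge the result is to compute speed-up and parallel efficiency relative to your smallest run. The sketch below assumes you have measured wall-clock times for the same case at each core count; the timings in the example are hypothetical placeholders, not measurements, and `scaling_table` is just an illustrative name.

```python
def scaling_table(wall_times):
    """Return (cores, speedup, efficiency) rows relative to the smallest core count.

    wall_times maps core count to wall-clock time for the same case.
    """
    base_cores = min(wall_times)
    base_time = wall_times[base_cores]
    rows = []
    for cores in sorted(wall_times):
        speedup = base_time / wall_times[cores]
        ideal = cores / base_cores  # linear scaling would give exactly this
        rows.append((cores, speedup, speedup / ideal))
    return rows


# Hypothetical timings in seconds; replace with your own measurements.
timings = {8: 3600.0, 16: 2000.0}
for cores, speedup, efficiency in scaling_table(timings):
    print(f"{cores} cores: speed-up {speedup:.2f}, efficiency {efficiency:.0%}")
```

An efficiency close to 100% between 8 and 16 cores corresponds to the linear speed-up described above; a value well below that points at the interconnect or at a case too small to split further.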

magicalmarshmallow March 31, 2014 05:55

I should do, alright. Mind you, the wait time for 16 cores on the HPC might cancel out the gain. Suck it and see, hey!

ghorrocks March 31, 2014 05:59

CFX V15 has had parallel enhancements which allow it to scale well up to thousands of cores. I forget the exact number, but I am sure it was in the thousands. Have a look in the V15 promotional stuff.

But to get that you need fast interconnects (like InfiniBand), a simulation large enough to scale to that number of processors, and physics which multiprocess well.

And then there is the issue of how to mesh these big simulations, set them up and post process them - but that is another topic.

brunoc April 1, 2014 10:43

If you're looking for speed, the usual guideline is to use 50-100k nodes/core (not elements) for hex meshes and 70-100k nodes/core for tet meshes. But this is a really simple rule of thumb. For instance, the number of equations being solved is obviously important, but so are the number of domains and GGI interfaces in your model, which is something people usually don't take into account.

If you're looking for the best speed/investment ratio, lots of things need to be considered. First and foremost, is your equipment able to handle a simulation with a large number of cores? You need to check stuff like connection speed (forget about using a conventional Ethernet connection), CPU access to memory (processors with lots of cores used to not have enough bandwidth to access the memory at full speed, but I'm not sure about current models) and, in case of transient runs, a fast read/write speed to the disk. Also, with CFX, pre and post processing are done in serial, so you also need an extra computer with LOTS of RAM just to execute these steps. And then you have to buy the parallel licenses.

As you can see, the price of parallel licenses becomes just one variable. In that regard, ANSYS has an 'HPC Pack' license that has its advantages and disadvantages. The main advantage is that it is cost effective for simulations with a huge number of cores, since each additional license multiplies the number of cores allowed by 4 (1 lic = up to 8 cores, 2 lics = up to 32, then 128, 512 and so on). But the 'up to' thing is important: if you run a simulation with, say, 5 cores, you'll still use one HPC Pack license. 9-32 cores? Two licenses.
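
The pack arithmetic above can be sketched in a few lines. This only encodes the numbers quoted in this post (8 cores for the first pack, times 4 for each additional pack); actual licensing terms may differ, so check with your ANSYS contact. The function names are illustrative.

```python
def max_cores_for_packs(packs):
    """Cores allowed by a number of HPC Packs, per the 8, 32, 128, 512 progression."""
    if packs < 1:
        raise ValueError("packs must be at least 1")
    return 8 * 4 ** (packs - 1)


def packs_needed(cores):
    """Smallest number of HPC Packs whose core limit covers the requested cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    packs = 1
    while max_cores_for_packs(packs) < cores:
        packs += 1
    return packs


for cores in (5, 8, 9, 32, 33, 128):
    print(f"{cores} cores need {packs_needed(cores)} pack(s)")
```

The jump at 9 and 33 cores is the 'up to' effect: one extra core past a limit costs a whole extra pack.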

If you only have one solver license and a single machine with lots of cores, just follow the ~70-100k nodes/core up to the number of licenses available (or the number of cores on your computer, whichever is smaller) and you should be close to the best performance your machine can deliver.
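
As a rough sketch of that rule of thumb, the snippet below turns a mesh node count into a suggested range of core counts and caps it at whatever your licenses or machine allow. The nodes-per-core ranges are the ones quoted in this post; `suggested_cores` and its `max_cores` parameter are illustrative names, and the result is only a starting point for your own timing tests.

```python
import math

# Nodes-per-core guideline from the post: (lightest load, heaviest load) per core.
NODES_PER_CORE = {"hex": (50_000, 100_000), "tet": (70_000, 100_000)}


def suggested_cores(mesh_nodes, mesh_type, max_cores):
    """Return (fewest, most) cores suggested by the nodes/core guideline.

    max_cores is the smaller of the cores your licenses allow and the cores
    physically available.
    """
    low_load, high_load = NODES_PER_CORE[mesh_type]
    fewest = max(1, math.ceil(mesh_nodes / high_load))  # keeps load at or below the upper bound
    most = max(fewest, mesh_nodes // low_load)          # keeps load at or above the lower bound
    return min(fewest, max_cores), min(most, max_cores)


print(suggested_cores(1_000_000, "hex", 32))
print(suggested_cores(1_000_000, "tet", 8))
```

When both numbers come back equal to `max_cores`, the guideline is saying the case could use more cores than you have, so simply use everything available.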

All times are GMT -4.