CFD Online Discussion Forums

CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   FLUENT (http://www.cfd-online.com/Forums/fluent/)
-   -   Fluent speed up studies (http://www.cfd-online.com/Forums/fluent/129445-fluent-speed-up-studies.html)

Anna Tian February 5, 2014 08:29

Fluent speed up studies
 
Hi,

Has anyone done speedup studies for Fluent? For example, I have 16 simulations with 10 million cells each to run, and only 32 processors. Should I run 4 jobs at a time with 8 processors each, 8 jobs with 4 processors each, or 2 jobs with 16 processors each? Which way lets me finish all the simulations soonest? Or does this also depend on the kind of CPU I use? Any indications on that?

This should be a fairly general topic that has been discussed several times before, but I didn't find any threads on it. Could someone give a link to them?

flotus1 February 5, 2014 09:01

The theoretical linear speedup can never be reached due to communication losses and the parts of the code that cannot be executed in parallel.
So it is fastest to run as many cases at the same time as possible, as long as you don't run out of memory.

Anna Tian February 5, 2014 10:52

Quote:

Originally Posted by flotus1 (Post 473542)
The theoretical linear speedup can never be reached due to communication losses and the parts of the code that cannot be executed in parallel.
So it is fastest to run as many cases at the same time as possible, as long as you don't run out of memory.

OK, your answer helps me. But I think I was referring to something else.

I once saw speedup study results from an academic institute. They showed that increasing the number of processors from 8 to 10 does not speed up the simulation as much as increasing it from 6 to 8. They even plotted a curve showing a kink at 8 processors: below 8, adding processors is quite effective, but above 8 it is much less so. Is there anywhere that describes this kind of speedup study methodology, so that I can follow it or see the test results directly?

Sorry, this area is quite new to me, so I'm not sure whether I'm using the correct technical terms.

flotus1 February 5, 2014 13:24

Speedup is the correct term for this.

The procedure is quite simple.
You run the case of interest on a single core and take the time.
Then you increase the number of cores and run the same case again. For a case run on n cores, we will call the time taken T_n.

Now the speedup S_n for n cores is simply
S_n=\frac{T_1}{T_n}
The ideal speedup would be a straight line with slope 1.
Usually, due to the losses already mentioned, the real curve will lie below this line.

Personally, I find the parallel efficiency more intuitive:
E_n=\frac{T_1}{n \cdot T_n}

I should have checked first; I just wrote down part of the Wikipedia article on this topic. But it has some more information, so it's worth visiting.
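To make the two definitions concrete, here is a minimal Python sketch; the timings are invented purely for illustration, not measured from any real Fluent run:

```python
# Wall-clock times T_n (in seconds) for the same case run on n cores.
# These numbers are made up to illustrate the formulas, not measured.
timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0, 16: 110.0}

t1 = timings[1]  # single-core reference time T_1
for n in sorted(timings):
    s_n = t1 / timings[n]        # speedup S_n = T_1 / T_n
    e_n = t1 / (n * timings[n])  # parallel efficiency E_n = T_1 / (n * T_n)
    print(f"{n:2d} cores: speedup {s_n:5.2f}, efficiency {e_n:.2f}")
```

Note how the efficiency drops as cores are added even while the speedup still grows; that gap is exactly the distance between the real curve and the ideal slope-1 line.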

Quote:

Is there anywhere that describes this kind of speedup study methodology, so that I can follow it or see the test results directly?
I don't quite get what you want. Please rephrase.

Anna Tian February 5, 2014 16:14

Quote:

Originally Posted by flotus1 (Post 473605)
Speedup is the correct term for this.

The procedure is quite simple.
You run the case of interest on a single core and take the time.
Then you increase the number of cores and run the same case again. For a case run on n cores, we will call the time taken T_n.

Now the speedup S_n for n cores is simply
S_n=\frac{T_1}{T_n}
The ideal speedup would be a straight line with slope 1.
Usually, due to the losses already mentioned, the real curve will lie below this line.

Personally, I find the parallel efficiency more intuitive:
E_n=\frac{T_1}{n \cdot T_n}

I should have checked first; I just wrote down part of the Wikipedia article on this topic. But it has some more information, so it's worth visiting.


I don't quite get what you want. Please rephrase.

Thank you for your answer, flotus1. That's very helpful. I meant: can I just use the test results that other people obtained, or do I need to do the testing myself? Does it also depend on the algorithm I choose? For different grids with the same cell count, will a steady-state simulation give the same test results (with a fixed maximum iteration number)? Does the result depend on the CPUs? For example, if we have 32 of CPU A and another group has 32 of CPU B, will we obtain the same test results?

flotus1 February 5, 2014 16:53

Quote:

Can I just use the test results that other people obtained, or do I need to do the testing myself?
That also depends on WHY these results are so important to you.
The only time I did such an analysis was to compare the parallel efficiency of an in-house CFD code with a commercial one.

Apart from that, as you already supposed, there are many factors affecting the result of such an analysis.
It is impossible to list them all, so let's stick with the conclusion that group A and group B from your example will definitely not get the same result.
They might not even get the same result if they were using the same CPUs, because many other issues can affect parallel performance.

Quote:

For different grids with the same cell count, will a steady-state simulation give the same test results
Not necessarily. Imagine an all-hex mesh and compare it to a polyhedral mesh.
The polyhedral cells have more faces at the interfaces between the partitions, so the performance loss due to communication will be higher.

Anna Tian February 5, 2014 17:16

Quote:

Originally Posted by flotus1 (Post 473628)
That also depends on WHY these results are so important to you.
The only time I did such an analysis was to compare the parallel efficiency of an in-house CFD code with a commercial one.

Apart from that, as you already supposed, there are many factors affecting the result of such an analysis.
It is impossible to list them all, so let's stick with the conclusion that group A and group B from your example will definitely not get the same result.
They might not even get the same result if they were using the same CPUs, because many other issues can affect parallel performance.


Not necessarily. Imagine an all-hex mesh and compare it to a polyhedral mesh.
The polyhedral cells have more faces at the interfaces between the partitions, so the performance loss due to communication will be higher.

Sorry that I didn't state the conditions clearly.

1. The software is fixed: Fluent.

2. I understand the CPU issue; the test results will depend on the CPU, memory speed, and many other factors.

3. Only structured grids will be used.

Questions:

Are there any other motivations for doing speedup studies?

For different structured grids with the same cell count, will a steady-state simulation give the same test results (with a fixed maximum iteration number)? I ask because I'd like to know whether I can run the tests and my simulations at the same time.

flotus1 February 5, 2014 17:55

Quote:

whether I can run the tests and my simulations at the same time
Do I understand this correctly? You want to find out the parallel speedup of Ansys Fluent for whatever reason,
and now you want to run several parts of the test at the same time?
Like the runs on 1, 2, 4, and 8 cores all at the same time, and the 16-core test later? That would be inadvisable.

Quote:

For different structured grids but same grids number, will steady-state simulation give the same testing results
There is still the domain decomposition itself that can make a difference, but the results should be comparable.

Anna Tian February 6, 2014 06:24

Quote:

Originally Posted by flotus1 (Post 473640)
Do I understand this correctly? You want to find out the parallel speedup of Ansys Fluent for whatever reason,
and now you want to run several parts of the test at the same time?
Like the runs on 1, 2, 4, and 8 cores all at the same time, and the 16-core test later? That would be inadvisable.

Why can't I do them at the same time? I have identical CPUs.

By the way, will the speedup efficiency also depend on the cell count I'm running?

flotus1 February 6, 2014 07:26

You cannot run them at the same time because it is exactly the influence of parallel load that you are trying to examine. So running the "serial" one-core case in parallel with other simulations will spoil the result.

Let's take a very simple example: a 4-core CPU supplied with only one stick of RAM, so we only have one memory channel.
Running the test for one core alone, this simulation can use the whole memory bandwidth and will be rather fast. That is how it should be done.
If you run another simulation on two cores at the same time, the simulation on one core will only have a fraction of the memory bandwidth available and will run slower.
The time taken will depend on how many other simulations you ran at the same time.
Memory bandwidth is only one example; there are many possible influences besides the CPU type.

Quote:

By the way, will the speedup efficiency also depend on the cell count I'm running?
Yes, it will. The speedup is usually better with higher cell counts, so it usually makes no sense to run a mesh with 10,000 cells on 128 cores.
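The "kink" described earlier in the thread is what Amdahl's law predicts when part of the runtime is serial or spent on communication. A hedged sketch in Python, assuming a fixed serial fraction of 5% (real cases also have communication costs that grow with the number of partitions, so measured curves look worse):

```python
def amdahl_speedup(n, serial_fraction=0.05):
    """Predicted speedup on n cores when a fixed fraction of the
    runtime cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

for n in (1, 2, 4, 8, 16, 32):
    print(f"{n:2d} cores: predicted speedup {amdahl_speedup(n):5.2f}")

# The curve flattens quickly: no matter how many cores are added,
# the speedup can never exceed 1 / serial_fraction (here, 20).
```

A larger mesh typically means a smaller effective serial/communication fraction per core, which is why speedup is usually better at higher cell counts.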

Anna Tian February 6, 2014 12:16

Quote:

Originally Posted by flotus1 (Post 473715)
You cannot run them at the same time because it is exactly the influence of parallel load that you are trying to examine. So running the "serial" one-core case in parallel with other simulations will spoil the result.

Let's take a very simple example: a 4-core CPU supplied with only one stick of RAM, so we only have one memory channel.
Running the test for one core alone, this simulation can use the whole memory bandwidth and will be rather fast. That is how it should be done.
If you run another simulation on two cores at the same time, the simulation on one core will only have a fraction of the memory bandwidth available and will run slower.
The time taken will depend on how many other simulations you ran at the same time.
Memory bandwidth is only one example; there are many possible influences besides the CPU type.


Yes, it will. The speedup is usually better with higher cell counts, so it usually makes no sense to run a mesh with 10,000 cells on 128 cores.


How many cores per million cells are suggested for Fluent? What value do you like to choose?

flotus1 February 6, 2014 12:38

I don't know if there is an official Fluent recommendation for a minimum number of cells per core.
I run Fluent cases on the maximum number of cores or licenses available regardless of the cell count, except for very simple problems.

Anna Tian February 9, 2014 09:08

Quote:

Originally Posted by flotus1 (Post 473640)
Do I understand this correctly? You want to find out the parallel speedup of Ansys Fluent for whatever reason,
and now you want to run several parts of the test at the same time?
Like the runs on 1, 2, 4, and 8 cores all at the same time, and the 16-core test later? That would be inadvisable.

Why is it inadvisable? Because different simulations using different CPUs will somehow interact with each other? How?

flotus1 February 9, 2014 16:33

Quote:

Originally Posted by Anna Tian (Post 474115)
Why is it inadvisable? Because different simulations using different CPUs will somehow interact with each other? How?

You already answered the question yourself.
So did I, a few posts ago.

Quote:

Originally Posted by flotus1 (Post 473715)
You cannot run them at the same time because it is exactly the influence of parallel load that you are trying to examine. So running the "serial" one-core case in parallel with other simulations will spoil the result.

Let's take a very simple example: a 4-core CPU supplied with only one stick of RAM, so we only have one memory channel.
Running the test for one core alone, this simulation can use the whole memory bandwidth and will be rather fast. That is how it should be done.
If you run another simulation on two cores at the same time, the simulation on one core will only have a fraction of the memory bandwidth available and will run slower.
The time taken will depend on how many other simulations you ran at the same time.
Memory bandwidth is only one example; there are many possible influences besides the CPU type.


