Fluent speed up studies

February 5, 2014, 08:29
#1
Anna Tian (Meimei Wang)
Senior Member
Join Date: Jul 2012
Posts: 494
Hi,

Has anyone done any speedup studies for Fluent? For example, I have 16 simulations with 10 million cells each to run, and I only have 32 processors. Should I run 4 jobs at a time with 8 processors each, 8 jobs at a time with 4 processors each, or 2 jobs with 16 processors each? Which option lets me finish all the simulations soonest? Does this also depend on the kind of CPU I use? Any pointers?

This should be a fairly general topic that has been discussed several times before, but I couldn't find any threads on it. Could someone link to them?
__________________
Best regards,
Meimei

February 5, 2014, 09:01
#2
flotus1 (Alex)
Senior Member
Join Date: Jun 2012
Location: Germany
Posts: 1,104
The theoretical linear speedup can practically never be reached, because of communication losses and parts of the code that cannot be executed in parallel.
So it is fastest to run as many cases at the same time as possible, as long as you don't run out of memory.
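
A minimal sketch of this argument for the 16-job, 32-core case from the question. It assumes Amdahl's law with a hypothetical serial fraction of 5% (not a measured Fluent value), assumes every job has the same single-core time, and ignores memory limits and any contention between concurrent jobs, so the numbers only illustrate the trend.

```python
import math


def job_time(cores, t1=1.0, serial_fraction=0.05):
    """Wall time of one job on `cores` cores under Amdahl's law.

    t1 is the single-core time; serial_fraction is a hypothetical value.
    """
    return t1 * (serial_fraction + (1.0 - serial_fraction) / cores)


def campaign_time(cores_per_job, total_jobs=16, total_cores=32, **model):
    """Total wall time to finish all jobs, run in equal-sized batches."""
    concurrent_jobs = total_cores // cores_per_job
    batches = math.ceil(total_jobs / concurrent_jobs)
    return batches * job_time(cores_per_job, **model)


for k in (1, 2, 4, 8, 16):
    print(f"{k:2d} cores per job -> total time {campaign_time(k):.3f} x T_1")
```

Under these assumptions, 2 cores per job (all 16 jobs at once) finishes first, and the total time grows as more cores are given to each job. With 1 core per job, half of the 32 cores sit idle, which is why that option is slower again.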

February 5, 2014, 10:52
#3
Anna Tian
OK, your answer helps. But I think I was asking about something slightly different.

I once saw the results of a speedup study from an academic institute. It showed that going from 8 to 10 processors sped the simulation up less than going from 6 to 8. They even plotted a curve showing a kink at 8 processors: up to 8, adding processors is quite effective, but beyond 8 it is much less so. Is there a description of this kind of speedup study methodology that I can follow, or published test results that I can look at directly?

Sorry, this area is quite new to me, so I'm not sure whether I'm using the correct technical terms.

February 5, 2014, 13:24
#4
flotus1
Speedup is the correct term for this.

The procedure is quite simple.
You run the case of interest on a single core and record the time taken, T_1.
Then you increase the number of cores and run the same case again. For a case run on n cores, we call the time taken T_n.

Now the speedup S_n for n cores is simply
S_n=\frac{T_1}{T_n}
Plotted against n, the ideal speedup would be a straight line with slope 1.
Usually, due to the losses already mentioned, the real curve lies below this line.

Personally, I find the parallel efficiency more intuitive:
E_n=\frac{T_1}{n \cdot T_n}

I should have checked first: I just wrote down part of the Wikipedia article on this topic. It has some more information, though, so it's worth visiting.
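
The two formulas can be evaluated directly from a table of measured run times. The sketch below uses made-up timings purely for illustration; the function name `speedup_table` and the numbers are hypothetical, not Fluent measurements.

```python
def speedup_table(timings):
    """Return (n, S_n, E_n) rows from a dict {cores: wall time in seconds}.

    The dict must contain the single-core time under key 1.
    """
    t1 = timings[1]
    rows = []
    for n in sorted(timings):
        speedup = t1 / timings[n]   # S_n = T_1 / T_n
        efficiency = speedup / n    # E_n = T_1 / (n * T_n)
        rows.append((n, speedup, efficiency))
    return rows


# Hypothetical wall times in seconds, for illustration only.
timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0, 16: 110.0}

for n, s, e in speedup_table(timings):
    print(f"n={n:2d}  S_n={s:5.2f}  E_n={e:4.2f}")
```

An efficiency of 1 corresponds to the ideal straight line; values below 1 show how much of each added core is lost.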

Quote:
Is there a description of this kind of speedup study methodology that I can follow, or published test results that I can look at directly?
I don't quite get what you want. Please rephrase.

February 5, 2014, 16:14
#5
Anna Tian
Thank you for your answer, flotus1. That's very helpful. What I meant was: can I just use test results that other people have produced, or do I need to do the testing myself? Does the result also depend on the algorithm I choose? For different grids with the same number of cells, will a steady-state simulation give the same test results (with a fixed maximum number of iterations)? Do the test results depend on the CPUs? For example, if we have 32 CPUs of type A and another group has 32 CPUs of type B, will we obtain the same test results?

February 5, 2014, 16:53
#6
flotus1
Quote:
Can I just use test results that other people have produced, or do I need to do the testing myself?
That also depends on WHY these results are so important to you.
The only time I did such an analysis was to compare the parallel efficiency of an in-house CFD code with a commercial one.

Apart from that, as you already suspected, many factors affect the result of such an analysis.
It is impossible to list them all, so let's stick with the conclusion that group A and group B from your example will definitely not get the same result.
They might not even get the same result if they were using the same CPUs, because many other factors could still affect the parallel performance.

Quote:
For different grids with the same number of cells, will a steady-state simulation give the same test results
Not necessarily. Imagine an all-hex mesh and compare it to a polyhedral mesh.
The polyhedral cells have more faces at the interfaces between the partitions, so the performance loss due to communication will be higher.

February 5, 2014, 17:16
#7
Anna Tian
Sorry, I didn't state the conditions clearly.

1. The software is fixed: Fluent.

2. I understand the CPU issue: the test results will depend on the CPU, the memory speed, and many other factors.

3. Only structured grids will be used.

Questions:

Are there any other motivations for doing speedup studies?

For different structured grids with the same number of cells, will a steady-state simulation give the same test results (with a fixed maximum number of iterations)? I ask because I'd like to know whether I can do the tests and run my simulations at the same time.

February 5, 2014, 17:55
#8
flotus1
Quote:
whether I can do the tests and run my simulations at the same time
Do I understand this correctly? You want to find out the parallel speedup of Ansys Fluent, for whatever reason,
and now you want to run several parts of the test at the same time?
For example, the runs with 1, 2, 4 and 8 cores all at the same time, and the test with 16 cores later? That would be unadvisable.

Quote:
For different structured grids with the same number of cells, will a steady-state simulation give the same test results
The domain decomposition itself can still make a difference, but the results should be comparable.

February 6, 2014, 06:24
#9
Anna Tian
Quote:
Originally Posted by flotus1
Do I understand this correctly? You want to find out the parallel speedup of Ansys Fluent, for whatever reason,
and now you want to run several parts of the test at the same time?
For example, the runs with 1, 2, 4 and 8 cores all at the same time, and the test with 16 cores later? That would be unadvisable.
Why can't I run them at the same time? I have the same CPUs.

By the way, will the parallel efficiency also depend on the number of cells I'm running?

February 6, 2014, 07:26
#10
flotus1
You cannot run them at the same time, because the influence of parallel load is exactly what you are trying to examine. Running the "serial" single-core case alongside other simulations will spoil the result.

Let's take a very simple example: a 4-core CPU fitted with only one RAM module, so there is only one memory channel.
If you run the single-core test alone, the simulation can use the whole memory bandwidth and will be rather fast. That is how it should be done.
If you run another simulation on two cores at the same time, the single-core simulation will have only a fraction of the memory bandwidth available and will run slower.
The time taken will depend on how many other simulations you ran at the same time.
Memory bandwidth is only an example; there are many possible influences besides the CPU type.
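
One way to make sure the timing runs never overlap is to launch them strictly one after another from a small script. This is only a sketch: `placeholder_command` is a hypothetical stand-in that starts a trivial Python process, and it has to be replaced with whatever command line launches your solver on n cores on your system.

```python
import subprocess
import sys
import time


def time_runs_sequentially(make_command, core_counts):
    """Run one benchmark per core count, one at a time, and time each run.

    make_command(n) must return the argument list that launches the case
    on n cores. subprocess.run blocks until the process exits, so no two
    benchmark runs share the machine.
    """
    timings = {}
    for n in core_counts:
        start = time.perf_counter()
        subprocess.run(make_command(n), check=True)
        timings[n] = time.perf_counter() - start
    return timings


def placeholder_command(n):
    # Hypothetical stand-in: ignores n and starts a Python process that
    # exits immediately. Replace with the real solver launch line.
    return [sys.executable, "-c", "pass"]


timings = time_runs_sequentially(placeholder_command, [1, 2, 4])
print(timings)
```

The machine should be otherwise idle while this runs; any other job running at the same time distorts the measured times in the way described above.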

Quote:
By the way, will the parallel efficiency also depend on the number of cells I'm running?
Yes, it will. The speedup is usually better with higher cell counts, so it usually makes no sense to run a mesh with 10,000 cells on 128 cores.
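
A rough way to see why is to compare the number of cells in a partition with the number of faces on its boundary. The sketch below assumes an idealized structured hex mesh split into equal cubic partitions, and counts all six sides of each cube as partition interfaces, which overestimates the real value. It is an illustration only, not how Fluent partitions a mesh.

```python
def interface_to_cell_ratio(total_cells, partitions):
    """Boundary faces per cell for one idealized cubic partition."""
    cells_per_partition = total_cells / partitions
    side = cells_per_partition ** (1.0 / 3.0)   # cells along one edge
    boundary_faces = 6.0 * side ** 2            # six sides of the cube
    return boundary_faces / cells_per_partition


for cells, cores in ((10_000, 128), (10_000_000, 32)):
    ratio = interface_to_cell_ratio(cells, cores)
    print(f"{cells:>10,d} cells on {cores:3d} cores: "
          f"{cells / cores:,.0f} cells per core, "
          f"{ratio:.2f} boundary faces per cell")
```

In this model, 10,000 cells on 128 cores leaves fewer than 100 cells per core and more boundary faces than cells, so communication dominates. With 10 million cells on 32 cores the ratio is far below 1.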

February 6, 2014, 12:16
#11
Anna Tian

How many cores per million cells does Fluent support recommend? What value do you prefer to use?

February 6, 2014, 12:38
#12
flotus1
I don't know whether there is an official Fluent recommendation for the minimum number of cells per core.
I run Fluent cases on the maximum number of cores or licenses available regardless of the cell count, except for very simple problems.

February 9, 2014, 09:08
#13
Anna Tian
Quote:
Originally Posted by flotus1
Do I understand this correctly? You want to find out the parallel speedup of Ansys Fluent, for whatever reason,
and now you want to run several parts of the test at the same time?
For example, the runs with 1, 2, 4 and 8 cores all at the same time, and the test with 16 cores later? That would be unadvisable.
Why is it unadvisable? Because different simulations using different CPUs will somehow interact with each other? How?

February 9, 2014, 16:33
#14
flotus1
Quote:
Originally Posted by Anna Tian
Why is it unadvisable? Because different simulations using different CPUs will somehow interact with each other? How?
You already answered the question yourself.
So did I a few posts ago.

Quote:
Originally Posted by flotus1
You cannot run them at the same time, because the influence of parallel load is exactly what you are trying to examine. Running the "serial" single-core case alongside other simulations will spoil the result.

Let's take a very simple example: a 4-core CPU fitted with only one RAM module, so there is only one memory channel.
If you run the single-core test alone, the simulation can use the whole memory bandwidth and will be rather fast. That is how it should be done.
If you run another simulation on two cores at the same time, the single-core simulation will have only a fraction of the memory bandwidth available and will run slower.
The time taken will depend on how many other simulations you ran at the same time.
Memory bandwidth is only an example; there are many possible influences besides the CPU type.

All times are GMT -4.