CFD Online (www.cfd-online.com) > Forums > OpenFOAM Running, Solving & CFD

8x icoFoam speed up with Cufflink CUDA solver library

#21 | June 13, 2012, 07:17
Lukasz Miroslaw (Lukasz), Member | Join Date: Dec 2009 | Location: Poland | Posts: 64
I fully agree that communication matters. On high-end HPC clusters we were only able to get about 1.5x acceleration (see report) for a motorbike test with 32M cells (n GPUs vs. n CPUs, where n was the number of threads/GPU cards). This is probably due to communication. Please note that GAMG was used on the CPU.

This is also the reason why we implemented PISO and SIMPLE fully on the GPU. Now we can run a whole case of up to 10M cells on a single GPU card, and the inter-node communication problem becomes an intra-node one (GPU<->CPU transfers on a single node, bounded by memory bandwidth).
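A rough way to see why keeping the whole PISO/SIMPLE loop on the card helps is to estimate what a single host-device copy of one field costs. The Python sketch below is only a back-of-envelope estimate under stated assumptions: double precision, a caller-supplied sustained bandwidth (the 4 GB/s used here is a placeholder, not a figure from the report), and no latency or matrix-coefficient traffic.

```python
# Rough host<->device copy-time estimate for one cell-centred field.
# Assumptions (not from the report): double precision (8 bytes per value)
# and a sustained bandwidth supplied by the caller; latency is ignored.

BYTES_PER_DOUBLE = 8


def field_bytes(n_cells: int, components: int = 1) -> int:
    """Size in bytes of a cell field: 1 component for p, 3 for U."""
    return n_cells * components * BYTES_PER_DOUBLE


def transfer_seconds(n_bytes: int, bandwidth_gb_s: float) -> float:
    """Time to move n_bytes at a sustained rate in GB/s (1 GB = 1e9 bytes)."""
    return n_bytes / (bandwidth_gb_s * 1e9)


if __name__ == "__main__":
    n_cells = 10_000_000       # the ~10M-cell single-card limit mentioned above
    assumed_bandwidth = 4.0    # GB/s, an illustrative placeholder, not a measurement
    for name, components in (("p", 1), ("U", 3)):
        size = field_bytes(n_cells, components)
        ms = transfer_seconds(size, assumed_bandwidth) * 1e3
        print(f"{name}: {size / 1e6:.0f} MB, about {ms:.0f} ms per copy")
```

With these assumptions it prints roughly 80 MB / 20 ms for p and 240 MB / 60 ms for U. A solver that ships data across the bus for every linear solve pays a cost of that order each time, whereas a fully GPU-resident loop largely avoids paying it on every solve.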

#22 | June 13, 2012, 08:46
Kyle Mooney (kmooney), Senior Member | Join Date: Jul 2009 | Location: Amherst, MA USA - San Diego, CA USA | Posts: 278
Quote:
Originally Posted by looyirock
I had gone through the post. The speedup is highly dependent on the hardware and the problem being solved. We might even see some variability in the numbers if we ran the test a few times. One can have a really amazing setup but a mediocre cluster if the communication between nodes is slow. Please post more attachments on the topic with detailed information.

This was just a quick timing comparison. I wasn't trying to deliver a full benchmark on this stuff, nor am I planning to.

The point of all of this is that after about 20 minutes of library installations I was able to get an OpenFOAM solver to run in 1/8th of the run time on a relatively old GPU, for free.

#23 | August 12, 2012, 14:24 | Anyone figure out how to compile the testCufflinkFoam application?
Karl (atg), Member | Join Date: Jan 2011 | Posts: 36
OpenFOAM, CUDA and the nvcc test scripts are all working on Ubuntu 10.04. I'm sure I'm missing something obvious, but I can't see how to compile the testCufflinkFoam application to run the test examples. What am I missing? Thanks.

PS: this is all from the GettingStarted page HERE

#24 | August 12, 2012, 14:29
Daniel P. Combest (chegdan), Senior Member | Join Date: Mar 2009 | Location: St. Louis, USA | Posts: 543
Quote:
Originally Posted by atg
OpenFOAM, CUDA and the nvcc test scripts are all working on Ubuntu 10.04. I'm sure I'm missing something obvious, but I can't see how to compile the testCufflinkFoam application to run the test examples. What am I missing? Thanks.

PS: this is all from the GettingStarted page HERE
Did you install Cusp and try to compile a Cusp example? Also, make sure you are using the -extend version of OpenFOAM.
__________________
Dan

Find me on twitter @dancombest and LinkedIn

#25 | August 12, 2012, 14:51
Karl (atg), Member | Join Date: Jan 2011 | Posts: 36
OpenFOAM 1.6-ext

Yes, the Cusp test looks pretty much like the result in the Getting Started document:

(OF:1.6-ext Opt) caelinux@karl-OF:cufflink-library-read-only$ nvcc -o testcusp testcg.cu
/usr/local/cuda/bin/../include/thrust/detail/tuple_transform.h(130): Warning: Cannot tell what pointer points to, assuming global memory space
/usr/local/cuda/bin/../include/thrust/detail/tuple_transform.h(130): Warning: Cannot tell what pointer points to, assuming global memory space
(OF:1.6-ext Opt) caelinux@karl-OF:cufflink-library-read-only$ ./testcusp
Solver will continue until residual norm 0.01 or reaching 100 iterations
Iteration Number | Residual Norm
0 1.000000e+01
1 1.414214e+01
2 1.093707e+01
3 8.949320e+00
4 6.190057e+00
5 3.835190e+00
6 1.745482e+00
7 5.963549e-01
8 2.371135e-01
9 1.152524e-01
10 3.134469e-02
11 1.144416e-02
12 1.824177e-03
Successfully converged after 12 iterations.

Quote:
Originally Posted by chegdan
Did you install Cusp and try to compile a Cusp example? Also, make sure you are using the -extend version of OpenFOAM.

#26 | August 12, 2012, 14:57
Karl (atg), Member | Join Date: Jan 2011 | Posts: 36
For some reason the test cannot find blockMesh, or at least I think that is what is going on:

(OF:1.6-ext Opt) caelinux@karl-OF:testCases$ sudo ./runSerialTests
./runSerialTests: line 3: /bin/tools/RunFunctions: No such file or directory
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_CG/N10
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_CG/N50
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_CG/N100
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_CG/N500
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_CG/N1000
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_CG/N2000
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_DiagPCG/N10
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_DiagPCG/N50
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_DiagPCG/N100
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_DiagPCG/N500
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_DiagPCG/N1000
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_DiagPCG/N2000
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_SmAPCG/N10
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_SmAPCG/N50
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_SmAPCG/N100
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_SmAPCG/N500
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_SmAPCG/N1000
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/cufflink/cufflink_SmAPCG/N2000
./runSerialTests: line 16: runApplication: command not found
./runSerialTests: line 17: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/CG/N10
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/CG/N50
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/CG/N100
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/CG/N500
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/CG/N1000
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/CG/N2000
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/DPCG/N10
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/DPCG/N50
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/DPCG/N100
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/DPCG/N500
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/DPCG/N1000
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/DPCG/N2000
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/GAMG/N10
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/GAMG/N50
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/GAMG/N100
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/GAMG/N500
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/GAMG/N1000
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found
/home/caelinux/OpenFOAM/caelinux-1.6-ext/run/cufflinkTest/testCases/OpenFOAM/GAMG/N2000
./runSerialTests: line 34: runApplication: command not found
./runSerialTests: line 35: runApplication: command not found

#27 | August 12, 2012, 15:05
Karl (atg), Member | Join Date: Jan 2011 | Posts: 36
nvccWmakeAll log:

https://dl.dropbox.com/u/34549456/Link%20to%20make.log

#28 | August 12, 2012, 15:22
Daniel P. Combest (chegdan), Senior Member | Join Date: Mar 2009 | Location: St. Louis, USA | Posts: 543
Quote:
Originally Posted by atg

Looking at your error log, I can tell you forgot the lduInterface.H step that is discussed on the Getting Started page. Go there, follow those steps and you should be good.

#29 | August 12, 2012, 17:03
Karl (atg), Member | Join Date: Jan 2011 | Posts: 36
Sorry, I thought that was only required for multi-GPU. My mistake.

Thanks Very Much!

Karl

#30 | August 13, 2012, 01:35
Karl (atg), Member | Join Date: Jan 2011 | Posts: 36
Linked results: M2090, Quadro600

Six cores of the E1280 beat one CPU core plus a Quadro 600 (96 GPU cores). In fact, four CPU cores on the E1280 are faster than six, and either is faster than the Quadro 600.

With the M2090, however, its 512 CUDA cores (plus the Quadro? It reported 2 processes and the Quadro was heating up) apparently beat the six-core CPU by about a factor of five, at least for these test tasks: 75 seconds vs. ~360 seconds.

I hope this is of some benefit for an incompressible simpleFoam case. Time will tell, I suppose.

Thanks Dan and Kyle for your help.

Karl

Last edited by atg; August 13, 2012 at 04:33. Reason: Added Link to Tesla data; prior post was Quadro 600 only

#31 | August 13, 2012, 05:02
Daniel P. Combest (chegdan), Senior Member | Join Date: Mar 2009 | Location: St. Louis, USA | Posts: 543
Karl,

I'm glad to help. Just to add a little to your last statement:

Quote:
I hope this is of some benefit for an incompressible simpleFoam case. Time will tell I suppose.
From my experience (I may be repeating myself from an earlier post in this thread, but I haven't read it in a while), when you use a solution strategy based on a relative residual with more outer iterations, the GPU loses its speed-up. The reason is that such strategies require more back-and-forth data transfer between the host and the device (GPU). This back-and-forth is the bottleneck, and it reduces the speed-up to a minor one. For transient cases where you need to drive down the residuals (i.e. lots of inner iterations), the GPU is better suited to your problem and you will most likely see a better speed-up.
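Dan's argument can be illustrated with a toy cost model. In the hypothetical Python sketch below (not derived from Cufflink's code), every outer iteration pays one fixed transfer cost and then runs some inner iterations on the GPU; the 10:1 transfer-to-iteration cost ratio is an arbitrary assumption chosen only to make the trend visible.

```python
# Toy cost model (hypothetical, not measured): each outer SIMPLE/PISO iteration
# pays a fixed host<->device transfer, then runs some inner linear-solver
# iterations on the GPU. Times are in arbitrary units.


def gpu_wall_time(n_outer: int, inner_per_outer: int,
                  t_transfer: float, t_inner: float) -> float:
    """Total time when every outer iteration copies data once, then iterates."""
    return n_outer * (t_transfer + inner_per_outer * t_inner)


def transfer_share(inner_per_outer: int, t_transfer: float, t_inner: float) -> float:
    """Fraction of wall time spent moving data instead of solving."""
    return t_transfer / (t_transfer + inner_per_outer * t_inner)


if __name__ == "__main__":
    # Assume one transfer costs as much as 10 inner iterations.
    for inner in (5, 50, 500):
        share = transfer_share(inner, t_transfer=10.0, t_inner=1.0)
        print(f"{inner:4d} inner iterations per outer: {share:.1%} of time in transfers")
```

Under this assumption the transfer share falls from about 67% with 5 inner iterations per outer iteration to about 2% with 500, which is the sense in which many inner iterations amortise the host-device traffic.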

#32 | August 13, 2012, 05:32
Anton Kidess (akidess), Senior Member | Join Date: May 2009 | Location: Delft, Netherlands | Posts: 919
Quote:
Originally Posted by chegdan
For transient cases where you need to drive down the residuals (i.e. lots of inner-iterations) then the GPU will be better suited for your problem and you will most likely see better speed-up.
I have never run GPU simulations myself, but this statement clashes with every published result I've seen so far.
__________________
*On twitter @akidTwit
*Spend as much time formulating your questions as you expect people to spend on their answer.
*Help define the OpenFOAM stackexchange Q&A site: http://area51.stackexchange.com/prop...oam-technology

#33 | August 13, 2012, 06:39
Daniel P. Combest (chegdan), Senior Member | Join Date: Mar 2009 | Location: St. Louis, USA | Posts: 543
Quote:
Originally Posted by akidess
...but this statement clashes with every published result I've seen so far.
As you will see, my post says:

Quote:
From my experience...
I'd be interested to see where they show phenomenal speed-up for a GPU solution strategy that is dominated by data transfer rather than by actually solving a Navier-Stokes problem. If you have sources, it would be great to share them.

Steady-state solution strategies that drive down convergence with dominant outer iterations and a relative residual convergence criterion (simpleFoam is one example) will have more instances where data has to be transferred to the GPU. Therefore, for a large system, the solver will spend less time solving the linear system and more time transferring data back and forth. It's also highly dependent on your hardware setup.
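The "relative residual" criterion Dan mentions corresponds to the relTol entry of a solver in OpenFOAM's fvSolution, alongside the absolute tolerance. The Python sketch below is a simplified model of that stopping test, not OpenFOAM's source, and it assumes an idealised solver that cuts the residual in half on every inner iteration.

```python
# Simplified stopping test modelled on the tolerance / relTol / maxIter
# entries of an OpenFOAM fvSolution solver block (an approximation, not
# OpenFOAM's source code).


def converged(initial_residual: float, residual: float,
              tolerance: float, rel_tol: float) -> bool:
    if residual < tolerance:                 # absolute criterion
        return True
    # relative criterion; rel_tol = 0 disables it
    return rel_tol > 0 and residual < rel_tol * initial_residual


def inner_iterations(initial_residual: float, reduction: float,
                     tolerance: float, rel_tol: float, max_iter: int = 1000) -> int:
    """Inner iterations needed by an idealised solver whose residual shrinks
    by the constant factor `reduction` per iteration (an assumption)."""
    residual = initial_residual
    for iteration in range(max_iter):
        if converged(initial_residual, residual, tolerance, rel_tol):
            return iteration
        residual *= reduction
    return max_iter


if __name__ == "__main__":
    for rel_tol in (0.05, 0.0):
        n = inner_iterations(1.0, 0.5, tolerance=1e-8, rel_tol=rel_tol)
        print(f"relTol={rel_tol}: {n} inner iterations")
```

With relTol 0.05 the idealised solver stops after 5 inner iterations; with relTol 0 it has to reach the absolute 1e-8 and needs 27. A loose relTol therefore means few inner iterations per outer iteration, i.e. less GPU work per host-device exchange.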

Quote:
I have never run GPU simulations myself,...
Grab SpeedIT, grab the Symscape plugin, download Cufflink and try them all out.

#34 | August 13, 2012, 07:03
Anton Kidess (akidess), Senior Member | Join Date: May 2009 | Location: Delft, Netherlands | Posts: 919
Dan, as an example, have a look at this document, which compares the performance of simpleFoam and pisoFoam using a GPU-accelerated linear solver: http://am.ippt.gov.pl/index.php/am/article/view/516/196 (or also http://www.slideshare.net/LukaszMiro...el-performance).

#35 | August 13, 2012, 07:30
Daniel P. Combest (chegdan), Senior Member | Join Date: Mar 2009 | Location: St. Louis, USA | Posts: 543
1) Areal offloads the pressure-velocity coupling completely onto the GPU, so there is not as much back-and-forth data transfer (this is how they get around the problem I suggested).

2) The first link makes no mention of relative residuals or of the convergence criteria they used. So there is no way to know whether they drove down the residual with many outer iterations and a relative residual convergence criterion, like I am suggesting.

#36 | August 13, 2012, 18:44
Karl (atg), Member | Join Date: Jan 2011 | Posts: 36
Thanks, Dan. I had read your earlier comment but am just coming to terms with the outer vs. inner iteration part. It is just a reflection of my poor grasp of the fundamentals at play here.

Along the CPU/GPU communication line, however, I should add that the Quadro600 in the above test was only running at x8, whereas the Tesla was in the x16 slot (I only have one x16 slot on the board). So the Quadro might do better against the CPU in the faster slot.

It will be interesting to see how much better PCIe 3.0 slots perform, though the overall hardware seems likely to take a quantum leap at about the same time with the Kepler stuff.
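The x8 vs. x16 difference can be put in numbers using the nominal PCIe link rates: PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding, PCIe 3.0 at 8 GT/s per lane with 128b/130b encoding. The Python sketch below computes theoretical one-direction peaks only; the post does not say which generation the slots or cards negotiate, and sustained rates are lower.

```python
# Nominal one-direction PCIe link bandwidth from the per-lane signalling rate
# and line encoding. These are theoretical peaks; packet and protocol
# overhead make sustained host<->GPU rates lower in practice.

PCIE_GENERATIONS = {
    # generation: (transfer rate per lane in GT/s, encoding efficiency)
    2: (5.0, 8 / 10),       # PCIe 2.0: 8b/10b encoding
    3: (8.0, 128 / 130),    # PCIe 3.0: 128b/130b encoding
}


def link_gb_per_s(generation: int, lanes: int) -> float:
    """Peak payload bandwidth in GB/s for one direction of an xN link."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency * lanes / 8  # 8 bits per byte


if __name__ == "__main__":
    for gen, lanes in ((2, 8), (2, 16), (3, 16)):
        print(f"PCIe {gen}.0 x{lanes}: {link_gb_per_s(gen, lanes):.2f} GB/s")
```

It gives 4.00 GB/s for a PCIe 2.0 x8 link, 8.00 GB/s for x16, and about 15.75 GB/s for PCIe 3.0 x16, so an x8 slot halves the peak bandwidth available to the card.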

I will post some comparative run times when I get a case going in simpleFoam.

Quote:
Originally Posted by chegdan
Karl,

I'm glad to help. Just to add a little to your last statement:

From my experience (I may be repeating myself from an earlier post in this thread, but I haven't read it in a while), when you use a solution strategy based on a relative residual with more outer iterations, the GPU loses its speed-up. The reason is that such strategies require more back-and-forth data transfer between the host and the device (GPU). This back-and-forth is the bottleneck, and it reduces the speed-up to a minor one. For transient cases where you need to drive down the residuals (i.e. lots of inner iterations), the GPU is better suited to your problem and you will most likely see a better speed-up.

#37 | September 15, 2012, 13:31
Karl (atg), Member | Join Date: Jan 2011 | Posts: 36
In keeping with Dan's advice to compare the best CPU solver with the best GPU solver, I ran a few tests. On the cavity tutorial, re-blockMeshed to 1000x1000 cells, I get a speedup of 1.5 (310 vs. 463 s) for one hundred timesteps with this:

p solver cufflink SmAPCG preconditioner FDIC
U solver cufflink DiagPBiCGStab
U preconditioner DILU

over a CPU run on a single core of a Xeon E1280 with:
p: FDIC/GAMG
U: DILU/PBiCG

I am still at the stage where I don't really know the solvers very well, so take it with a grain of salt, but all runs used the same relative tolerances: 1e-08 for p and 1e-05 for U.

I tried the parallel CPU version, but it was slower: 491 vs. 463 s for 4 cores vs. 1. The GPU is an M2090 Tesla with 512 cores or so and, I think, 6 GB of memory. I didn't try anything with multiple GPUs.

So it would appear that a modest speedup is obtained for this example in icoFoam. I do not know why my parallel CPU runs are slower on 4 cores than on 1; I expected them to be at least somewhat faster.
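The timings in this post, and the 75 s vs. ~360 s comparison from post #30, reduce to two simple ratios. This short Python sketch uses only the numbers reported in the thread; the helper names are illustrative.

```python
# Speedup and parallel efficiency from the wall times reported in this thread.


def speedup(t_reference: float, t_new: float) -> float:
    """How many times faster the new run is than the reference (>1 = faster)."""
    return t_reference / t_new


def parallel_efficiency(t_one_core: float, t_n_cores: float, n_cores: int) -> float:
    """Speedup divided by core count; 1.0 would be perfect linear scaling."""
    return speedup(t_one_core, t_n_cores) / n_cores


if __name__ == "__main__":
    print(f"M2090 vs 1 CPU core (cavity 1000x1000): {speedup(463, 310):.2f}x")
    print(f"M2090 vs six-core CPU (test cases):     {speedup(360, 75):.1f}x")
    print(f"4 CPU cores vs 1: efficiency {parallel_efficiency(463, 491, 4):.0%}")
```

It prints a 1.49x GPU speedup for the cavity case, 4.8x for the test cases, and a four-core CPU efficiency of about 24%. Any efficiency below 1/4 on four cores means the parallel run took longer than the serial one, which matches the 491 s vs. 463 s result.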

#38 | September 20, 2012, 11:28
Daniel P. Combest (chegdan), Senior Member | Join Date: Mar 2009 | Location: St. Louis, USA | Posts: 543
Quote:
Originally Posted by atg
p solver cufflink SmAPCG preconditioner FDIC
U solver cufflink DiagPBiCGStab
U preconditioner DILU
For this setup, although you set the preconditioner to DILU, it will actually be diagonally preconditioned. You might get more speedup if you use the Ainv-preconditioned PBiCGStab, with AinvPBiCGStab as your solver name in Cufflink.

Edit: the FDIC preconditioner for SmAPCG will not actually be used either; SmAPCG uses the smoothed aggregation AMG preconditioner. I guess I need to change this in the examples to be more straightforward. The preconditioners aren't runtime-selectable by the preconditioner name, but rather by the name of the solver.
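For context, a diagonal (Jacobi) preconditioner, which is what the DiagPCG test cases use and what Dan says the DILU setting falls back to for DiagPBiCGStab, just scales each residual entry by the inverse of the matrix diagonal at every iteration. The pure-Python sketch below shows diagonally preconditioned CG on a tiny, made-up symmetric positive-definite system; it illustrates the technique and is not Cufflink's or Cusp's implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x: Vector, y: Vector) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def diag_pcg(a: Matrix, b: Vector, tol: float = 1e-10,
             max_iter: int = 100) -> Tuple[Vector, int]:
    """Conjugate gradient preconditioned with M = diag(A) (Jacobi).

    Stops when the residual 2-norm falls below `tol`; returns (x, iterations).
    """
    inv_diag = [1.0 / a[i][i] for i in range(len(b))]
    x = [0.0] * len(b)
    r = list(b)                                   # r = b - A x, with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # z = M^-1 r: just a scaling
    p = list(z)
    rz = dot(r, z)
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, iteration
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    # Small symmetric, diagonally dominant (hence positive-definite) matrix;
    # b is chosen so that the exact solution is x = [1, 1, 1].
    A = [[4.0, -1.0, 0.0], [-1.0, 3.0, -1.0], [0.0, -1.0, 2.0]]
    b = [3.0, 1.0, 1.0]
    x, iters = diag_pcg(A, b)
    print(x, iters)
```

On this 3x3 example it recovers x = [1, 1, 1] in at most three iterations (CG's exact-arithmetic bound for a 3x3 system). Because the preconditioner is only a per-row scaling it is cheap and trivially parallel, but it is generally weaker than incomplete factorisations or AMG.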
Last edited by chegdan; September 20, 2012 at 11:59. Reason: more info added

#39 | September 20, 2012, 11:50
Karl (atg), Member | Join Date: Jan 2011 | Posts: 36
Quote:
Originally Posted by chegdan
For this setup, although you set the preconditioner to DILU, it will actually be diagonally preconditioned. You might get more speedup if you use the Ainv-preconditioned PBiCGStab, with AinvPBiCGStab as your solver name in Cufflink.
OK, I'll try that. It looked like the Cufflink solvers were specifying their own preconditioners for most of the runs I did, but apparently I missed that one. Thanks.

#40 | November 5, 2012, 20:07
alquimista, Member | Join Date: Apr 2010 | Posts: 61
Hello everyone,

I have been using Cufflink for some incompressible-flow applications, with good results on one Tesla C2050. However, I don't get any speedup from more than one GPU with the parallel versions of the Cufflink solvers; unexpectedly, using 2 GPUs is slower than the serial run.

I discussed some of these issues on the cufflink-users list linked from the code page.

Since I saw the same behavior on different machines and GPUs, I want to share a test case in case anyone would like to check it or has experienced the same thing. The test case is the one Dan provided with the testCufflinkFoam application. I have fixed some wrong values in some blockMeshDict files, and the tolerances and maxIter for the parallel solvers, since they were different.

The test case can be reproduced by running the same scripts as in the original folder:

./runParallelGPU
./runSerialGPU
./runGetTimes

I attach the case here, together with some figures for CG and diagPCG tested on a GeForce GTX 690. For smAPCG, for some reason the number of iterations doesn't match between 1 GPU and 2 GPUs, and I must check that first.

Thank you very much. I find this library useful. In my experience there are situations where one can't use the GAMG solver for p, especially in applications with bad-quality meshes, so I guess the speedup can't always be measured against GAMG; sometimes the comparison has to be with CG, since it should be robust. There the speedup is considerable.

Regards
Attached Files
File Type: pdf CG.pdf (5.0 KB, 32 views)
File Type: pdf diagPCG.pdf (5.0 KB, 23 views)