OpenFOAM benchmarks on various hardware

wkernkamp · June 6, 2022, 22:59

Quote:

Originally Posted by Simbelmynë

You cannot make comparisons like that. There is a huge difference between some systems with identical theoretical bandwidth.

Yes, but only when there is something wrong in the setup so that the possible bandwidth is not achieved. Otherwise the bandwidth is a key factor that translates directly into OpenFOAM performance.

What I was saying is that his performance is in the right ballpark, except that, given the more modern CPU and higher clock, I would expect a bit better. Maybe it is thermal throttling, maybe WSL2. Maybe his CPU was having a slow day. I don't know.

masb · June 7, 2022, 01:23

Quote:

Originally Posted by masb

AMD Threadripper 1950X under WSL Ubuntu 20.04

# cores Wall time (s):
------------------------
Meshing Times:
1 1056.81
2 701.65
4 496.73
6 393.98
8 381.59
10 360.49
12 339.13
14 323.9
16 343.45

Flow Calculation:
1 822.07
2 498.66
4 350.45
6 326.8
8 324.14
10 319.38
12 314.45
14 315.73
16 324.57

Ubuntu 20.04, no WSL:

# cores Wall time (s):
------------------------
Meshing Times:
1 1026.86
2 697.82
4 397
6 294.65
8 251.36
10 231.26
12 210.35
14 201.72
16 207.07
Flow Calculation:
1 852.77
2 510.34
4 220.9
6 181.68
8 160.85
10 153.79
12 144.88
14 145.64
16 143.53
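The gap between the two setups can be quantified per core count by dividing the WSL wall time by the native one. A minimal sketch for the flow-calculation times in the two tables above (numbers copied from the post; the function name `slowdown` is only illustrative):

```python
# Flow-calculation wall times (s) on masb's Threadripper 1950X,
# copied from the WSL and native Ubuntu 20.04 tables above.
wsl_flow = {1: 822.07, 2: 498.66, 4: 350.45, 6: 326.8, 8: 324.14,
            10: 319.38, 12: 314.45, 14: 315.73, 16: 324.57}
native_flow = {1: 852.77, 2: 510.34, 4: 220.9, 6: 181.68, 8: 160.85,
               10: 153.79, 12: 144.88, 14: 145.64, 16: 143.53}


def slowdown(wsl: dict[int, float], native: dict[int, float]) -> dict[int, float]:
    """WSL time divided by native time per core count (>1 means WSL is slower)."""
    return {n: wsl[n] / native[n] for n in sorted(wsl) if n in native}


if __name__ == "__main__":
    for cores, ratio in slowdown(wsl_flow, native_flow).items():
        print(f"{cores:2d} cores: WSL takes {ratio:.2f}x the native time")
```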

flotus1 · June 7, 2022, 03:04

That's A LOT of performance left on the table with WSL. I wonder if it can be tweaked in any way to yield better results, or if that's just the price of convenience.

wkernkamp · June 7, 2022, 20:08

Seems that WSL is OK for 1 or 2 cores, but loses performance as you go beyond that. Is there some limitation on the amount of resources that gets allocated to WSL? (It looks like 50% in your case, masb.)

Simbelmynë · June 9, 2022, 17:52

Quote:

Originally Posted by wkernkamp

Yes, but only when there is something wrong in the setup so that the possible bandwidth is not achieved. Otherwise the bandwidth is a key factor that translates directly into OpenFOAM performance.

And I am saying this is not true.

As a general indicator, bandwidth is by far the most important metric for CFD.

However, recent CPUs from AMD (and possibly Intel) have shown that bandwidth is not the entire story.

Check out the results from the 5800X3D, for instance. It is really good in terms of performance per bandwidth.

It started to become visible with Zen 2, most likely because Intel had only produced minor upgrades to its desktop CPUs for several years.

wkernkamp · June 12, 2022, 20:03

Quote:

Originally Posted by Simbelmynë

5800X3D, 2 x 8 GB DDR4 Rank1 @ 3200 MT/s (14-14-14-14-28,1T)
OFv9, OpenSUSE Tumbleweed, GCC 11.2, kernel 5.17.4

Code:

 cores       Simulation     Meshing
#                (s)      (min.sec)
1             314.21        12m23s
2             201.98        8m21s
4             149.98        5m05s
6             138.55        4m02s

Will update if I manage to push the memory and IF to 1800 MHz.

EDIT:
2 x 8 GB DDR4 Rank1 @3800 MT/s (16-16-16-16-32, 1T)

Code:

cores    Simulation         Meshing
#           (s)             (min.sec)
1            304              12m14
2            188              8m12
4            135              4m58
6            124              3m55
8            122              3m28

The 5800X3D itself gets almost proportionally better with bandwidth.

Quote:

Originally Posted by wkernkamp

2xE5-2697 v2 16x 8GB DDR-1866 MHz OF v2112

Flow:
20 85.73
22 84.44
24 84.02

The memory bandwidth of the 2xE5-2697v2 is just under twice the bandwidth of your 5800X3D. The performance ratio is 122/84=1.4, so there has been improvement, probably related to cache organization and cache capacity. The more cache can be utilized, the more your effective bandwidth goes up. So the improvement you are talking about is 40% in ten years.

Simbelmynë · June 13, 2022, 01:48

Quote:

Originally Posted by wkernkamp

The memory bandwidth of the 2xE5-2697v2 is just under twice the bandwidth of your 5800X3D. The performance ratio is 122/84=1.4, so there has been improvement, probably related to cache organization and cache capacity. The more cache can be utilized, the more your effective bandwidth goes up. So the improvement you are talking about is 40% in ten years.

I think more recent CPUs should be compared as well.

Quote:

Originally Posted by Simbelmynë

5800X3D, 2 x 8 GB DDR4 Rank1 @ 3200 MT/s (14-14-14-14-28,1T)
OFv9, OpenSUSE Tumbleweed, GCC 11.2, kernel 5.17.4

Code:

 cores       Simulation     Meshing
#                (s)      (min.sec)
1             314.21        12m23s
2             201.98        8m21s
4             149.98        5m05s
6             138.55        4m02s

Here are some CPUs from 2017. All of them have rank 2 memory (compared to rank 1 on the 5800X3D). If we look at the 3200 MT/s results, the first two HEDT CPUs have double the theoretical bandwidth and the 8700K has identical theoretical bandwidth.
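For reference, theoretical peak bandwidth is commonly estimated as channels x transfer rate x 8 bytes, since a DDR4 channel is 64 bits wide. A minimal sketch, assuming quad-channel memory on the 7940X and 1950X and dual-channel on the 8700K and 5800X3D (channel counts come from the platforms' specifications, not from the posts):

```python
# Theoretical peak DRAM bandwidth = channels x transfer rate x 8 bytes,
# because each DDR4 channel is 64 bits (8 bytes) wide.
BYTES_PER_TRANSFER = 8

# name: (memory channels, DIMM speed in MT/s); channel counts are assumed
# from each platform's memory controller, not stated in the thread.
systems = {
    "7940X": (4, 3200),
    "Threadripper 1950X": (4, 3200),
    "8700K": (2, 3200),
    "5800X3D": (2, 3200),
}


def peak_bandwidth_gbs(channels: int, mts: float) -> float:
    """Peak bandwidth in GB/s (1e9 bytes/s)."""
    return channels * mts * 1e6 * BYTES_PER_TRANSFER / 1e9


if __name__ == "__main__":
    for name, (ch, mts) in systems.items():
        print(f"{name:20s} {peak_bandwidth_gbs(ch, mts):6.1f} GB/s")
```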

Quote:

Originally Posted by Simbelmynë

7940X, 32 (4x8) GB 3200 MHz RAM, CentOS 7.x, kernel 3.10.0

Code:

# cores   Wall time (s):
------------------------
1 764.36
2 419.98
4 233.26
6 188.29
8 169
12 160.28
14 168.73

Threadripper 1950X, 32 (4x8) GB 3200 MHz RAM, CentOS 7.x, kernel 4.14.5 (SMT on)

Code:

# cores   Wall time (s):
------------------------
1 827.21
2 465.01
4 235.17
6 198.81
8 170.73
12 154.26
16 154.9

8700K, 32 (4x8) GB 3200 MHz RAM, Mint 18.3, kernel 4.13.0

Code:

# cores   Wall time (s):
------------------------
1 531.44
2 312.15
4 249.55
6 247.83

Clearly there is a huge improvement where bandwidth is not the only answer. Memory latency and cache size likely play an important role as well.

If you wish to compare HEDT with HEDT, then look at the results from the 3990X. This also gives an indication of how good the architecture is, even though it is one generation older than the 5800X3D.

Quote:

Originally Posted by Geon-Hong

My testing environment is as follows.

- CPU: AMD Ryzen Threadripper 3990x
- RAM: 128GB (32GB x 4 / DDR4 / 2,666MHz)
- M/B : TRX40 (Gigabyte TRX40 AORUS Pro Wifi)
- SSD: SAMSUNG 1TB M.2
- OF : OpenFOAM-v2006
(function objects for generating stream lines were deactivated)

And the results are here:

Code:

# cores   Wall time (s):
------------------------
1      620.9
2      355.72
4      177.92
8      110.08
16     66
24     66.08
32     62.7
40     63.79
48     63.18
56     63.97
64     63.11

As you can see, the parallel performance was saturated around 16 cores.

Many thanks.

With a similar architecture and a huge cache, bandwidth is king.
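"Saturated around 16 cores" can be made a little more precise by picking the smallest core count whose wall time is within some tolerance of the best time. The 10% tolerance below is an arbitrary choice for illustration, not a convention of this thread; with it, Geon-Hong's 3990X numbers give 16 cores.

```python
# Geon-Hong's Threadripper 3990X flow wall times (s), copied from the quote above.
tr3990x = {1: 620.9, 2: 355.72, 4: 177.92, 8: 110.08, 16: 66.0, 24: 66.08,
           32: 62.7, 40: 63.79, 48: 63.18, 56: 63.97, 64: 63.11}


def saturation_cores(times: dict[int, float], tolerance: float = 0.10) -> int:
    """Smallest core count whose wall time is within `tolerance` of the best time."""
    best = min(times.values())
    return min(n for n, t in times.items() if t <= best * (1.0 + tolerance))


if __name__ == "__main__":
    print("3990X saturates at", saturation_cores(tr3990x), "cores")
```

A tighter tolerance moves the saturation point outward (with 5%, the 16- and 24-core times no longer qualify), so the threshold should be stated alongside any "saturates at N cores" claim.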

wkernkamp · June 14, 2022, 03:35

I think Geon-Hong misstated his configuration. He must have 8 channels active. There is a comparable Threadripper 3960x in the data. Its single-core performance is better than Geon-Hong's, but it is bandwidth-limited at 93 seconds. That one has four channels:

Quote:

Originally Posted by spwater

Here is my result. Newly configured workstation with Threadripper 3960x, 3.8 GHz 24C, 64 G memory (4 channel)

# cores Wall time (s):
------------------------
1 550.49
2 299.15
4 161.65
6 120.55
8 101.56
12 99.13
16 93.74
20 93.71
24 93.65

wkernkamp · June 14, 2022, 05:05

I was wrong about the 3990x: it has only 4 memory channels.

8x1866 = 14928 MT/s for E5-2697v2(x2)
4x3200 = 12800 MT/s for 3960x
2x3200 = 6400 MT/s for 5800X3D
4x2666 = 10664 MT/s for 3990x

MT to complete benchmark:

Code:

CPU        DIMM  CH     MT/s    Benchm.     MT       
E5-2697v2  1866   8    14928  x   84s  =  1253952
3960x      3200   4    12800  x   93s  =  1190400
5800X3D    3200   2     6400  x  139s  =   889600
3990x      2666   4    10664  x   63s  =   671832

E5-2697v2 = 1.41 x more MT to complete than 5800X3D
3960x = 1.34 x more MT to complete than 5800X3D
3990x = 1.32 x fewer MT to complete than 5800X3D
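The "MT to complete" metric is the aggregate theoretical transfer rate (DIMM speed x channels) multiplied by the flow wall time at saturation. A small sketch that recomputes the table and the ratios against the 5800X3D from the numbers in the post (the function name is illustrative):

```python
# "MT to complete" = DIMM speed (MT/s) x channels x wall time (s),
# i.e. millions of transfers the memory could deliver at peak during the run.
systems = {
    # name: (DIMM speed in MT/s, memory channels, flow wall time in s)
    "E5-2697v2 (x2)": (1866, 8, 84),
    "3960x": (3200, 4, 93),
    "5800X3D": (3200, 2, 139),
    "3990x": (2666, 4, 63),
}


def mt_to_complete(dimm_mts: int, channels: int, seconds: float) -> float:
    """Millions of transfers available at theoretical peak over the run."""
    return dimm_mts * channels * seconds


if __name__ == "__main__":
    reference = mt_to_complete(*systems["5800X3D"])
    for name, spec in systems.items():
        mt = mt_to_complete(*spec)
        print(f"{name:15s} {mt:>10,.0f} MT  ({mt / reference:.2f}x the 5800X3D)")
```

Lower values mean the run needed fewer peak-rate transfers, which is the effect the thread attributes to larger or better-organized caches; the metric ignores how much of the theoretical bandwidth is actually achieved.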

Level 3 Caches are:

Code:

CPU        L3 Cache   Cores     Cache per      Work per
                      at Sat.   Core at Sat.   Core at Sat.
E5-2697v2    60 MB      24        2.5 MB          4.1%
3960x       128 MB      16        8   MB          6.2%
5800X3D      96 MB       6       16   MB         16.7%
3990x       256 MB      32        8   MB          3.1%
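The cache table follows from two divisions: total L3 divided by the core count at saturation, and the work share per core, which appears to be 1/cores (e.g. 1/6 = 16.7% for the 5800X3D). A sketch under that reading, using the values from the table:

```python
# L3 per core and share of the work per core at the core count where
# each system saturates. Reading "work per core" as 1/cores is an assumption.
systems = {
    # name: (total L3 in MB, cores at saturation)
    "E5-2697v2": (60, 24),
    "3960x": (128, 16),
    "5800X3D": (96, 6),
    "3990x": (256, 32),
}


def per_core(total_l3_mb: float, cores: int) -> tuple[float, float]:
    """Return (L3 in MB per core, percent of the total work per core)."""
    return total_l3_mb / cores, 100.0 / cores


if __name__ == "__main__":
    for name, (l3, cores) in systems.items():
        cache, work = per_core(l3, cores)
        print(f"{name:10s} {cache:5.1f} MB/core  {work:5.1f}% of work/core")
```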

Simbelmynë · June 14, 2022, 11:11

@wkernkamp

I like the idea of total MT to run the benchmark. Even if we have no idea what the actual bandwidth usage was during the simulation, this at least gives a relation that is based on theoretical bandwidth as well as actual simulation time. It also illustrates the, sometimes subtle, differences between different architectures.

I was surprised by the large difference between the 3960X and 3990X; they both have the same L3 per core and the same architecture. I would have guessed that the 3960X is faster due to its faster memory, but there may be other factors in play here. My guess is RAM timings, perhaps rank, and the Linux kernel being used.

Kailee71 · June 14, 2022, 11:28

Quote:

Originally Posted by flotus1

That's A LOT of performance left on the table with WSL. I wonder if it can be tweaked in any way to yield better results, or if that's just the price of convenience.

I did some rough comparisons of VMs and containers. LXC (using Proxmox) was the clear winner, costing only a couple of percent in performance compared to bare metal. Next was VMware, which did a surprisingly good job and was very nicely tweakable through the GUI: performance almost on par with Proxmox/LXC, with a loss of 5-7%. Behind that came TrueNAS SCALE (KVM), but it really suffered from the NFS implementation (Ganesha performance really sucks at the moment, but I understand why SCALE uses it). The price tag was somewhere around 15%, if I remember correctly.

Way behind (not just on a different field, but in a different park) came WSL. Admittedly, this was about a year ago and I understand stuff probably has moved along, but it was clearly not a viable alternative unless you're just interested in tinkering.

Of my 60 cores in total, 20 live on my VMware (data) server, which runs TrueNAS Core (4 cores) for the data plus a compute VM with 16 cores; 32 are on a dedicated 4-socket bare-metal compute node; and a further 8 are in my workstation. This compromise works surprisingly well in a 10Gb environment.

Sorry for the anecdotal-only data. I'll try to find actual numbers.

Kai.

AlexKaz · June 14, 2022, 22:10

Quote:

Originally Posted by AlexKaz

Dual e5 2683v4, JGINYUE X99-D8 Server from Aliexpress, DDR4 RDIMM 2133 8x8 default timings
v1806, Linux Mint 19.3

HT on, NUMA off, CoD off

Code:

cores  speedup mesh  speedup flow  mesh sec.  flow sec.   power
    1             1             1    1649.57    1256.06   94.77
    2          1.48         1.782    1117.49     705.03   97.73
    4          2.78         4.034     593.14     311.35  111.14
    6          3.63         5.960     454.04     210.75  122.62
    8          4.42         7.524     372.84     166.95  129.69
   12          5.31         9.708     310.83     129.38  147.65
   16          6.07         11.23     271.89     111.89  161.83
   20          6.66         11.98     247.87     104.88  175.18
   24          7.76         12.52     212.62     100.29  186.94
   28          7.96         12.62     207.12      99.53  198.46
   30          7.19         12.55     229.57     100.07  203.85

HT off, NUMA on, CoD on

Code:

cores  speedup mesh  speedup flow  mesh sec.  flow sec.
    1             1             1    1649.57    1256.06
    2
    4
    6
    8
   12
   16          6.47         14.09     254.92      89.17
   20          7.15         15.40     230.72      81.56
   24          8.41         16.19     196.11      77.57
   28          8.55         16.62     193.05      75.59
   30          7.69         15.67     214.58      80.18

Quote:

Originally Posted by AlexKaz

After reset BIOS to default settings, ht on, numa on, cod off, timings 12-11-11-24...

Code:

cores  speedup mesh  speedup flow  mesh sec.  flow sec.   power
    1          0.88          0.81    1455.79    1017.61   88.44
    2
    4
    6
    8
   12
   16          5.79         11.37     251.33      89.52  166.46
   20          6.40         12.40     227.45      82.08  179.20
   24          7.51         13.04     193.94      78.06  191.52
   28          7.70         13.21     189.06      77.03  204.56
   30          6.99         13.17     208.26      77.26  208.33

After some optimizations, the dual 2683v4 runs the 32-thread solution in 67-68 seconds: HT on, NUMA on, CoD on, 2133 2-rank, 8 DIMMs, foam v1812 (about the same for v2112). I think the main reasons are NUMA being on and the earliest microcode for CPUID 406F1 (0x0B00000B).
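The speedup columns above are T(1)/T(n); dividing by n as well gives the parallel efficiency, which shows more directly where extra cores stop paying off. A minimal sketch using the flow times from the "HT on, NUMA off, CoD off" table (the function name is illustrative):

```python
# Flow wall times (s) from AlexKaz's "HT on, NUMA off, CoD off" table.
flow = {1: 1256.06, 2: 705.03, 4: 311.35, 6: 210.75, 8: 166.95, 12: 129.38,
        16: 111.89, 20: 104.88, 24: 100.29, 28: 99.53, 30: 100.07}


def speedup_and_efficiency(times: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map core count n to (speedup T(1)/T(n), parallel efficiency speedup/n)."""
    t1 = times[1]
    return {n: (t1 / t, t1 / t / n) for n, t in sorted(times.items())}


if __name__ == "__main__":
    for n, (s, e) in speedup_and_efficiency(flow).items():
        print(f"{n:2d} cores: speedup {s:6.2f}, efficiency {e:6.1%}")
```

An efficiency above 100% means superlinear scaling, which can happen when the per-core share of the mesh starts fitting in cache.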

wkernkamp · June 14, 2022, 22:42

Can you publish the full curve for the optimized machine? By the way, you should use 2400 MHz RDIMMs for best performance. I am interested in the result for 24 cores for comparison to the 2xE5-2697v2.

AlexKaz · June 15, 2022, 03:19

Quote:

Originally Posted by wkernkamp

Can you publish the full curve for the optimized machine? By the way, you should use 2400 MHz RDIMMs for best performance. I am interested in the result for 24 cores for comparison to the 2xE5-2697v2.

Sorry, in my case it does not run at 2400 with 8 DIMMs; only 7 DIMMs work at 2400. It is such a silicon lottery with used CPUs.

AlexKaz · June 15, 2022, 11:24

I can only add times for 2133, 2 rank, 13-12-12-....

Code:

cores  mesh sec.  flow sec.
    1    1535.27    1098.81
    2    1018.75     550.63
    4     573.74     257.45
    8        364     135.52
   10     339.37     101.29
   12     321.41       97.4
   14     266.07      94.89
   16     258.09      82.39
   18     237.39       84.1
   20     210.51      75.66
   22     236.61      78.39
   24     200.13      71.75
   26     213.59      76.73
   28     186.62      69.07
   30     189.12      73.23
   31     195.98      70.99
   32     182.98      68.03

wkernkamp · June 15, 2022, 15:53

Thanks for posting. Interesting that there is quite a bit of fluctuation up and down as the number of cores goes up.
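The up-and-down pattern can be listed mechanically by flagging every core count whose wall time is higher than at the previous measured count. A sketch using AlexKaz's 2133 MT/s data, assuming (as in his earlier tables) that the first value is the meshing time and the second the flow time:

```python
# AlexKaz's 2133 MT/s, 2-rank run: cores -> (mesh s, flow s).
# Assumption: column order mesh, flow, as in his earlier tables.
runs = {
    1: (1535.27, 1098.81), 2: (1018.75, 550.63), 4: (573.74, 257.45),
    8: (364.0, 135.52), 10: (339.37, 101.29), 12: (321.41, 97.4),
    14: (266.07, 94.89), 16: (258.09, 82.39), 18: (237.39, 84.1),
    20: (210.51, 75.66), 22: (236.61, 78.39), 24: (200.13, 71.75),
    26: (213.59, 76.73), 28: (186.62, 69.07), 30: (189.12, 73.23),
    31: (195.98, 70.99), 32: (182.98, 68.03),
}


def regressions(times: dict[int, float]) -> list[int]:
    """Core counts whose wall time is higher than at the next smaller measured count."""
    counts = sorted(times)
    return [b for a, b in zip(counts, counts[1:]) if times[b] > times[a]]


if __name__ == "__main__":
    mesh = {n: m for n, (m, _) in runs.items()}
    flow = {n: f for n, (_, f) in runs.items()}
    print("mesh slower than previous count at:", regressions(mesh))
    print("flow slower than previous count at:", regressions(flow))
```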

DVSoares · June 16, 2022, 07:58

Hey guys,

Kudos to all for keeping this thread active. I am looking to (finally) replace my Galago Ultrapro bought in 2014. I have been using it until it got close to being fubar, and decided to run the benchmark on it to get a sense of the upgrade today's options offer.

System has an Intel(R) Core(TM) i7-4750HQ (clock 2GHz - 3.2GHz), data from lscpu:

Code:

Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
    CPU family:          6
    Model:               70
    Thread(s) per core:  2
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            1
    CPU max MHz:         3200.0000
    CPU min MHz:         800.0000
...
Caches (sum of all):     
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    1 MiB (4 instances)
  L3:                    6 MiB (1 instance)
  L4:                    128 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-7

Memory is DDR3 1600MHz 2x4GB in dual channel, supported by that large L4 cache. Bench results are:

Code:

Meshing Times:
1 1522.67
2 971.47
3 740.59
4 584.75
Flow Calculation:
1 914.75
2 512.87
3 236.5
4 363.65

Cache hierarchy plays a central role in keeping the cores properly fed with the right data (better prefetching, etc.). See how this CPU is best fed with 3 threads, which shows that no rule applies 100% to every CPU in terms of OF performance.

Now moving to one of these DDR5-equipped notebooks with a reasonable GPU, and letting this guy here rest in pieces.

Cheers

flotus1 · June 17, 2022, 05:49

Sorry for barging right into the middle of this conversation, but the benchmark running faster on 3 cores than on 4 cores on a laptop can have so many other reasons. "Cache hierarchy" would be way down on my list for checking potential causes.

Thread placement, especially since Hyperthreading is enabled
Thermal throttling
TDP throttling
Background processes
General variance of benchmark results
...
Anything related to CPU caches

DVSoares · June 17, 2022, 10:07

Hey flotus1, your comments are always most welcome, no need to apologize

I’ve repeated the runs at least 5 times, without even X11 running and separately (in order to control temperatures). Results didn’t vary by more than 5%, so I just took the last run and posted it here.
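One way to apply a "didn't vary by more than 5%" criterion is to compare the spread of repeated wall times against the fastest run. The definition of spread used here, (max - min) / min, is an assumption, since the post does not say how the variation was measured, and the function names are hypothetical:

```python
def relative_spread(times: list[float]) -> float:
    """(max - min) / min over repeated wall times for the same core count."""
    if not times:
        raise ValueError("need at least one run")
    fastest, slowest = min(times), max(times)
    return (slowest - fastest) / fastest


def is_repeatable(times: list[float], tolerance: float = 0.05) -> bool:
    """True if every repeated run is within `tolerance` of the fastest one."""
    return relative_spread(times) <= tolerance
```

Feed it the wall times recorded at a single core count; a 3-core vs. 4-core difference far larger than this spread would then be hard to blame on run-to-run noise alone.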

At the end of the day, one has to assess the entire platform (hardware and host software) - Simbelmynë’s last post is all about that too.

I confess that laptop still serves my coding needs very well (no local compiling/running on it, though), but its time has come.

wkernkamp · June 20, 2022, 12:31

Quote:

Originally Posted by DVSoares

Hey guys,

Kudos to all for keeping this thread active. I am looking to (finally) replace my Galago Ultrapro bought in 2014 - have been using it until it gets too close to be fubar, decided to run the benchmark on it to get a sense of upgrade with today's options.

System has an Intel(R) Core(TM) i7-4750HQ (clock 2GHz - 3.2GHz), data from lscpu:........
Cheers

Your machine is very interesting for the current discussion, because it has an exceptionally large cache. If we analyze the number of transactions required to complete the benchmark same as I did above, we get:

Code:

CPU         DIMM MT/s  Channels MT/s    Benchm.      MT
i7-4750HQ     1600        2     3200    236.5s    756800

The low value of required transactions to complete (MT) is in line with the modern "large cache" AMD CPUs. Nice confirmation of the effect of cache on benchmark completion from an older CPU.
