
A machine specifically for meshing?

July 2, 2020, 17:01
A machine specifically for meshing?
Allen_ (New Member)
Join Date: Jan 2020
Posts: 8
My basic problem is that I have a little cluster that can solve much more than I'm able to mesh. Each node has 24 cores and 128 GB of RAM, and I just dropped a graphics card into the first node and do most of my meshing on it via Parsec. I'm trying to really crank up the resolution of some of my models, but I seem to be running out of RAM, so I've decided to build a "workstation node" specifically for this problem. This is something I'm having a hard time finding specific information on. There is clearly some disconnect between "what makes a good CFD machine" and "what makes a good meshing machine," but where exactly do the differences lie? Obviously CFD solving depends directly on memory bandwidth, but does mesh generation? It can certainly use lots of memory, but is memory performance critical in the same way it is during solving?

OK, suppose you had access to a small cluster made up of numerous but lower-clocked compute nodes, and all of your CAD work is well handled by workstation laptops with docks. You need to build a machine that can generate larger meshes in a reasonable amount of time. What do you look for?

To start with, I'd assume you'd want ECC memory just for reliability; we wouldn't want to come back the following day to find failed meshes and wonder why, when there was nothing actually wrong with the setup. And what about memory channels? In CFD we know more is better, but is it that important in mesh generation? I know I need a lot of RAM, but would, for instance, a fast 8-core CPU with 256 GB over two channels be slower than a cheaper, slower (but also 8-core) CPU with the same amount of RAM over four channels?

As for CPU selection, we know that Threadripper is a bad buy for solving CFD, but would it be an ideal chip for meshing? If the number of memory channels isn't important (or is it?) and ECC isn't important (again, is it?), then would 8-core Ryzen chips like the 3800X make for good meshing machines?

I'd appreciate it if anyone with insight could chime in. System selection in this forum seems centered on the solving side of CFD. However, in many cases different parts of the problem are done on different computers, so it follows that different systems would be optimized for these different tasks. I just don't see it discussed here, and there are very few places with the same kind of knowledge base as this forum.

July 2, 2020, 18:52
flotus1 (Super Moderator)
Join Date: Jun 2012
Location: Germany
Posts: 2,756
Having developed both CFD solvers and grid generators, I feel qualified to make an educated guess.

First things first: the easy way would be to just drop more RAM into that first node. It might not be quite as fast for meshing as a purpose-built machine, but it would definitely be cheaper. And depending on what exactly those compute nodes look like, the performance difference compared to a purpose-built workstation might not be huge.

Grid generation, especially with commercial "general purpose" software, does not scale well across many cores. There are exceptions, massively parallel grid generators even for distributed memory, but these are mostly for scientific applications.
This shapes what the ideal pre-processing hardware looks like. Memory bandwidth is not as important as it is for CFD solvers: a few cores running below maximum utilization cannot saturate the full memory bandwidth of, say, an AMD Epyc CPU.
What matters more here is latency. Grid generators do not have the benefit of, for lack of a better term, sequential memory access that CFD solvers can have. Data access is scattered across the whole data set, often in very small chunks. This access pattern also disqualifies NUMA machines, due to the high latency penalty when accessing memory that is not local to the node.
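To make that access-pattern point concrete, here is a small sketch of my own (pure Python, not taken from any actual grid generator): a pointer chase through a randomly permuted array, where every load depends on the previous one, versus a plain sequential pass. In compiled code the chase runs at roughly one memory latency per element once the array outgrows the caches; in Python the interpreter overhead blurs the numbers, so treat this as an illustration of the pattern rather than a measurement.

```python
# Sketch of latency-bound vs. prefetch-friendly access (illustrative only).
import array
import random
import time

N = 1 << 20  # ~1M elements; increase until the array outgrows your caches

def sattolo_cycle(n):
    """Random permutation forming one single cycle (Sattolo's algorithm)."""
    p = list(range(n))
    for i in range(n - 1, 0, -1):
        j = random.randrange(i)  # j < i keeps the permutation one cycle
        p[i], p[j] = p[j], p[i]
    return array.array("q", p)

def chase(p, steps):
    """Dependent loads: each index comes from the previous load."""
    i = 0
    for _ in range(steps):
        i = p[i]
    return i

perm = sattolo_cycle(N)
seq = array.array("q", range(N))

t0 = time.perf_counter()
total = sum(seq)        # sequential pass: hardware prefetchers can hide latency
t1 = time.perf_counter()
end = chase(perm, N)    # scattered, dependent loads: latency-bound
t2 = time.perf_counter()

print(f"sequential pass: {t1 - t0:.3f} s")
print(f"pointer chase:   {t2 - t1:.3f} s")
```

Because the permutation is one single cycle, the chase lands back at index 0 after exactly N steps, which doubles as a correctness check.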

Are mainstream CPUs a good choice? I don't think so. They are currently limited to a maximum of 256 GB of memory. And while memory bandwidth is not as important, I still would not go below quad-channel memory. One factor here, apart from the higher overall bandwidth, is "loaded memory latency": the closer you get to maxing out memory bandwidth, the higher memory latency becomes.

So what's the ideal machine for grid generation, assuming the grid generator you are using can even run in parallel?
A single monolithic CPU with a high clock speed and a moderate number of cores (8-12). Caches should not be segmented like they are on most current AMD CPUs, and it should be capable of addressing as much memory as you need. Since you "really" want to crank up the resolution and 128 GB is not enough, I would say at least 512 GB.
Intel's Xeon W lineup for socket 2066 would be the weapon of choice, for example the 8-core Xeon W-2245.
How much faster this might be compared to just upgrading the memory in one of your compute nodes depends on the hardware of said node.

Last edited by flotus1; July 3, 2020 at 04:55.

July 3, 2020, 04:31
Simbelmynė (Senior Member)
Join Date: May 2012
Posts: 471
I'll quote myself from the benchmark thread; it may give some indication, at least for OpenFOAM meshing. I think it shows that bandwidth may not play a very large role. Parallel performance is also not the best, so high single-core performance seems important.

The quad-channel 16-core 1950X is faster than the 8700K, but that may be down to core count alone. A 9900K (or Ryzen 3900X) may well be faster for meshing than a 1950X.

The soon-to-be-released 4700G paired with high-speed memory may be a cheap option for meshing and lighter simulations, considering that it seems able to reach really low memory latencies and high bandwidth for dual channel while still offering 8 cores at reasonable frequencies. This is speculation on my side though, and CFD tests are needed first.

If memory capacity is important (>128 GB) then you obviously disregard all these options. From a cost perspective, memory becomes a very large part of the budget once you opt for high-speed memory in large amounts, so the cheaper options may not end up that much cheaper. With that in mind:

Flotus1 has the most reasonable suggestion, but I thought I might chime in with some other options.

Originally Posted by Simbelmynė View Post
It is also interesting to analyze the meshing time.

For the 8700K system we have:

# cores    real time
1          16m35s
2          10m56s
4          07m01s
6          05m30s

While the 1950X performs as:

# cores    real time
1          23m32s
2          16m01s
4          08m44s
6          06m50s
8          05m48s
12         04m38s
16         04m12s
It seems that the meshing part is not as memory bound as the CFD solver.
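For what it's worth, those numbers can be turned into parallel efficiency (speedup relative to one core, divided by core count). A quick sketch of mine, with the times transcribed by hand from the table above:

```python
# Parallel efficiency from the meshing times quoted above
# (transcribed by hand; "16m35s" -> 995 seconds, and so on).
def to_seconds(mmss):
    m, s = mmss.split("m")
    return int(m) * 60 + int(s.rstrip("s"))

timings_8700k = {1: "16m35s", 2: "10m56s", 4: "07m01s", 6: "05m30s"}
timings_1950x = {1: "23m32s", 2: "16m01s", 4: "08m44s", 6: "06m50s",
                 8: "05m48s", 12: "04m38s", 16: "04m12s"}

def efficiency(timings):
    """speedup / cores, relative to the single-core run."""
    base = to_seconds(timings[1])
    return {n: base / (n * to_seconds(t)) for n, t in timings.items()}

for label, data in (("8700K", timings_8700k), ("1950X", timings_1950x)):
    for n, eff in sorted(efficiency(data).items()):
        print(f"{label}: {n:2d} cores -> efficiency {eff:.2f}")
```

On these numbers the 8700K sits at roughly 50% efficiency on 6 cores and the 1950X at roughly 35% on 16, which supports the point that meshing scales, but nowhere near linearly.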

July 3, 2020, 15:13
Allen_ (New Member)
Join Date: Jan 2020
Posts: 8
Jeez, this is what I was afraid of. 64GB RDIMMs are not cheap.

Originally Posted by flotus1 View Post
...depending on what exactly these compute nodes look like, the performance difference might not be huge compared to a purpose-built workstation.
They are (2x) E5-2690 v3 nodes with 128 GB of RAM each. So far the entire "cluster" sits in a single Supermicro FatTwin 6028TR-DTR server with a direct QDR InfiniBand connection between the nodes. One immediately apparent issue is that these boards only have 4 DIMM slots per socket, so just dropping in more RAM is out. I have also been eyeing the W-22xx chips, but as RAM speeds increase, the price of large DIMMs really goes up, so I think that's out too.

So it looks like whatever solution I come up with is going to start with buying eight to sixteen 32 GB sticks. And since I have nowhere to put them, I'll be needing a new system as well (making sure, of course, that I have room to expand). I do have a Dell R720xd on hand that is woefully underutilized, and it takes DDR3 RAM, which is almost half the price. I could also explore making it the head/storage node; I wouldn't think filling those roles on v2 hardware should matter (or am I wrong?).

This also means it's time for an IB switch, which I've been trying to learn more about as well. Mainly: would I need a managed switch, or would unmanaged be fine, and when would one opt for one over the other? Once that's figured out, a QDR switch is pretty cheap nowadays.

Thanks for y'all's help, and I'm open to further suggestions.

Last edited by Allen_; July 3, 2020 at 18:02.