A machine specifically for meshing?

Allen_ · July 2, 2020, 17:01

My basic problem is that I have a little cluster that seems able to solve much more than I'm able to mesh. Each node has 24 cores and 128GB of RAM, and I just dropped a graphics card in the first node and do most of my meshing on it via Parsec. I'm trying to really crank up the resolution of some of my models, but I seem to be running out of RAM, so I've decided to build a "workstation node" specifically for this problem. This is something I'm having a hard time finding specific information on. Certainly there's some disconnect between "what makes a good CFD machine" and "what makes a good meshing machine," but where exactly do the differences lie? Obviously the CFD work is directly dependent on memory bandwidth, but is mesh generation? It can certainly use lots of memory, but is the memory performance critical in the same way that it is during solving?

OK, suppose you had access to a small cluster comprised of numerous but lower clocked compute nodes, and all of your CAD work is well-handled by w-laptop/dock stations. You'd need to build a machine that can generate larger meshes in a reasonable amount of time, so what do you look for?

To start with, I'd assume that you'd want ECC memory just for reliability... we wouldn't want to come back in the following day to find failed meshes and wonder why when there was nothing actually wrong with the setup. And what about memory channels? In CFD we know that more is better, but is it that important in mesh generation? I know I need a lot of RAM, but for instance, would a fast CPU (let's say 8 cores) with 256GB over two channels be slower than a cheaper, slower (but also 8-core) CPU with the same amount of RAM over four channels?

As for CPU selection, we know that TR (Threadripper) is a bad buy for solving CFD, but would it be an ideal chip for meshing? If the number of memory channels isn't important (or is it???) and ECC isn't important (again, ???), then would the 8-core Ryzen chips like the 3800X make for good meshing machines?

I'd appreciate anyone who knows or has any insights to chime in. System selection in this forum seems to be centered on the solving aspect of CFD. However, it seems that in many cases different parts of the problem are done on different computers, so it follows that different systems would be better optimized for these different tasks. I just don't see where it's ever discussed here, and there are very few places to look that have the same kind of knowledge base as this forum.

flotus1 · July 2, 2020, 18:52

Having developed both CFD solvers and grid generators, I feel qualified to make an educated guess.

First things first: the easy way would be to just drop in more RAM into that first node. It might not be quite as fast for meshing as a purpose-built machine, but definitely cheaper. And depending on what exactly these compute nodes look like, the performance difference might not be huge compared to a purpose-built workstation.

Grid generation, especially with commercial "general purpose" software, does not scale well across many cores. There are exceptions of massively parallel grid generators, even for distributed memory. But these are mostly for scientific applications.
This impacts what the perfect pre-processing hardware should look like. Memory bandwidth is not as important as for CFD solvers. A few cores running below maximum utilization cannot saturate the full memory bandwidth of e.g. an AMD Epyc CPU.
What's more important here is latency. Grid generators do not have the benefit of -for lack of a better term- sequential memory access that CFD solvers can have. Data access will be scattered across the whole data set, and often in very small chunks. This access pattern disqualifies NUMA machines due to their high memory latency penalty when accessing memory that is not local to the node.

Are mainstream CPUs a good choice? I don't think so. They are currently limited to a maximum of 256GB of memory. And while memory bandwidth is not as important, I still would not go below quad-channel memory. One factor here, apart from the higher overall bandwidth, is "loaded memory latency". The closer you are to maxing out memory bandwidth, the higher memory latency gets.
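To put rough numbers on the dual-channel versus quad-channel question, here is a minimal sketch of the usual peak-bandwidth arithmetic (channels x transfer rate x 8 bytes per transfer on a 64-bit channel). The two memory configurations are assumptions chosen for illustration, not hardware named in this thread, and the result is a theoretical ceiling only. It says nothing about latency, which is the more important factor for meshing.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                       bus_width_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    Assumes a 64-bit (8-byte) data bus per channel, as on standard DDR4 DIMMs.
    """
    return channels * mega_transfers_per_s * 1e6 * bus_width_bytes / 1e9


# Hypothetical example configurations (assumed, not taken from the thread).
configs = {
    "dual-channel DDR4-3200": (2, 3200),
    "quad-channel DDR4-2933": (4, 2933),
}

for name, (channels, mts) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(channels, mts):.1f} GB/s peak")
```

With fewer cores pulling on a wider memory interface, each core sits further from the saturation point, which is where the loaded-latency argument comes from.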

So what's the ideal machine for grid generation, assuming the grid generator you are using can even run in parallel?
A single monolithic CPU with a high clock speed and a moderate number of cores (8-12). Caches should not be segmented like they are on most current AMD CPUs. And it should be capable of addressing as much memory as you need. Since you "really" want to crank up the resolution, and 128GB is not enough, I would say at least 512GB.
Intel's Xeon W lineup for socket 2066 would be the weapon of choice. For example, the 8-core Xeon W-2245.
How much faster this might be compared to just upgrading the memory on one of your compute nodes depends on the hardware of said node.

Simbelmynë · July 3, 2020, 04:31

I'll quote myself from the benchmark thread. It may give some indication at least for OpenFOAM meshing. I think this indicates that bandwidth may not play a very large role. Also, parallel performance may not be the best so high single core performance seems important.

The quad-channel 16-core 1950X is faster than the 8700K, but that may be because of core count alone. A 9900K (or Ryzen 3900X) may possibly be faster for meshing than a 1950X.

The soon-to-be-released 4700G paired with high speed memory may be a cheap option for meshing and lighter simulations considering that it seems to be able to reach really low memory latencies and high bandwidth for dual channel, while still boasting 8 cores at reasonable frequencies. This is speculation from my side though and CFD tests are needed first.

If you need a large amount of memory (>128 GB), then you obviously disregard all these options. From a cost perspective, memory becomes a very large part of the budget when you opt for high-speed memory in large amounts, so the cheaper options may not be that much cheaper. With that in mind:

Flotus1 has the most reasonable suggestion, but I thought I might chime in with some other options.

Quote:

Originally Posted by Simbelmynë

It is also interesting to analyze the meshing time.

For the 8700K system we have:

Code:

# cores   real time:
------------------------
1            16m35s
2            10m56s
4            07m01s
6            05m30s

While the 1950X performs as:

Code:

# cores   real time:
------------------------
1            23m32s
2            16m01s
4            08m44s
6            06m50s
8            05m48s
12           04m38s
16           04m12s

It seems that the meshing part is not as memory bound as the CFD solver.
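The scaling in the two tables above can be quantified with a short script. This is a sketch only: it uses the wall-clock times exactly as posted, and the serial-fraction estimate (the Karp-Flatt metric, derived from Amdahl's law) is an editorial addition for illustration, not part of the original benchmark.

```python
def to_seconds(stamp: str) -> int:
    """Convert a wall-clock string like '16m35s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


# Meshing times copied from the two tables above.
timings = {
    "8700K": {1: "16m35s", 2: "10m56s", 4: "07m01s", 6: "05m30s"},
    "1950X": {1: "23m32s", 2: "16m01s", 4: "08m44s", 6: "06m50s",
              8: "05m48s", 12: "04m38s", 16: "04m12s"},
}


def scaling_table(runs):
    """Return (cores, speedup, efficiency, serial_fraction) for each run."""
    base = to_seconds(runs[1])
    rows = []
    for cores, stamp in sorted(runs.items()):
        speedup = base / to_seconds(stamp)
        efficiency = speedup / cores
        # Karp-Flatt: experimentally determined serial fraction.
        # Undefined for a single core, so report None there.
        serial = None
        if cores > 1:
            serial = (1 / speedup - 1 / cores) / (1 - 1 / cores)
        rows.append((cores, speedup, efficiency, serial))
    return rows


for cpu, runs in timings.items():
    print(cpu)
    for cores, speedup, eff, serial in scaling_table(runs):
        serial_txt = "-" if serial is None else f"{serial:.2f}"
        print(f"  {cores:>2} cores  speedup {speedup:4.2f}  "
              f"efficiency {eff:4.2f}  serial fraction {serial_txt}")
```

With these numbers the 8700K reaches roughly a 3x speedup on 6 cores and the 1950X roughly 5.6x on 16 cores, which is consistent with the observation that single-core performance matters a lot for meshing.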

Allen_ · July 3, 2020, 15:13

Jeez, this is what I was afraid of. 64GB RDIMMs are not cheap.

Quote:

Originally Posted by flotus1

...depending on what exactly these compute nodes look like, the performance difference might not be huge compared to a purpose-built workstation.

They are (2x) E5-2690 v3 nodes with 128GB ram. So far the entire "cluster" sits in a single Supermicro FatTwin 6028TR-DTR server with a direct QDR IB connection between the nodes. One issue here that is immediately apparent is that these boards only have 4 dimm slots per socket, so just dropping in more ram is out. Also I have been ogling over the W-22xx chips, but as the ram speeds increase the price of large dimms really goes up, so I think that's out.

So, it looks like whatever solution I come up with is going to start with buying eight to sixteen 32GB sticks. And since I have nowhere to put them, I'll be needing a new system as well (of course making sure I have room to expand). I do have a Dell R720xd on hand that is being woefully underutilized, and that allows for buying DDR3 RAM, which is almost half the price. I could also explore going ahead and making it the head/storage node; I wouldn't think filling those roles on v2 hardware should matter (or am I wrong?).
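The DIMM arithmetic behind this can be sketched as follows. The assumptions are that every slot is populated with identical modules, that the target is the 512GB suggested earlier in the thread, and that a replacement board would offer 16 slots (a hypothetical figure for illustration); the helper name and the list of module sizes are likewise illustrative.

```python
def min_dimm_size_gb(target_gb, slots, sizes=(8, 16, 32, 64, 128)):
    """Smallest module size (GB) that reaches target_gb with all slots filled.

    Returns None if even the largest size in `sizes` is not enough.
    """
    per_slot = target_gb / slots
    for size in sizes:
        if size >= per_slot:
            return size
    return None


# Current nodes: 2 sockets x 4 DIMM slots per socket = 8 slots.
current_slots = 2 * 4
# Hypothetical replacement board with 16 slots (assumed for illustration).
new_slots = 16

for label, slots in (("current node", current_slots), ("16-slot board", new_slots)):
    size = min_dimm_size_gb(512, slots)
    print(f"{label}: {slots} slots -> {size}GB modules for 512GB")
```

This is why the existing 8-slot boards push toward 64GB RDIMMs for 512GB, while a board with more slots can get there on 32GB sticks.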

This also means it's time for an IB switch, which I've been trying to find out more about as well. Mainly, would I need a managed switch or would unmanaged be fine, and when would one opt for one over the other? Once that's figured out though a QDR switch is pretty cheap nowadays.

Thanks for y'all's help, and I'm open to further suggestions.

