CFD Online Discussion Forums - RAM drive for rapid i/o buffering

I’ve got some spare RAM on my head node and was considering partitioning it into a virtual RAM drive to accelerate file i/o for transient simulations. Just trying to determine if it’s worth the effort. Has anyone tried this before? I guess I’d also need some script to move files to disk periodically so I wouldn’t overflow the RAM drive buffer.
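For what it's worth, the "move files to disk periodically" part can be quite small. Below is a minimal sketch, not a tested production script: the function name `flush_old_files` and the 60-second age threshold are made up for illustration, and it assumes a file that has not been modified for that long is no longer being written by the solver.

```python
import shutil
import time
from pathlib import Path


def flush_old_files(ram_dir, disk_dir, min_age_s=60.0, now=None):
    """Move files from ram_dir to disk_dir once they have been idle for min_age_s.

    Assumption: a file untouched for min_age_s seconds is complete.
    Returns the names of the files that were moved.
    """
    ram_dir, disk_dir = Path(ram_dir), Path(disk_dir)
    disk_dir.mkdir(parents=True, exist_ok=True)
    now = time.time() if now is None else now
    moved = []
    for path in sorted(ram_dir.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        if now - path.stat().st_mtime < min_age_s:
            continue  # recently modified, may still be in use
        # shutil.move copies and deletes when crossing filesystems
        shutil.move(str(path), str(disk_dir / path.name))
        moved.append(path.name)
    return moved
```

Calling something like `flush_old_files("/dev/shm/fluent_out", "/data/fluent_out")` (both paths hypothetical) from cron or a sleep loop every minute or so would keep the RAM drive from filling up, as long as the disk can drain files faster on average than the solver writes them.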

I’m running Fluent in MPI mode and typically write files from the head node in serial (default i/o config). I haven’t messed with the parallel data file i/o option since I often need to move the model to more or fewer nodes mid-run and repartition the mesh (I assume this would cause problems, but I’m not 100% sure).

At first glance, this does not seem like a particularly good idea. At least if you value your time and data.
What type of interconnect is used to connect the compute nodes to the file system on the head node? There might already be a bottleneck here, negating any advantage of a lightning fast file system.
Have you identified file I/O as a relevant bottleneck? What type of drives are you currently using?
I think these questions need to be answered before you go down the rabbit hole that is RAM drives.
And also consider that fast, high endurance SSDs can be had rather cheap these days.

It's a very complex field to go into. However, at least with ZFS as backing there is the option to use NVMe as a SLOG device and speed up (NFS) sync writes dramatically, which will normally put the bottleneck back on the network, certainly with 10G. If you care about your data then you will want sync writes in any case, which become torturously slow with normal SSDs or spinning disks.

The RAM disk would be a very quick thing to try out (literally not even a reboot needed when using Linux). It would be interesting to see if your results benefit from using one. Then, depending on that, you could see if speeding up your file system would provide comparable benefits. Do you have any numbers?
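To get some numbers, a rough sequential-write comparison between a RAM-backed directory and the current storage is easy to script. This is a sketch under assumptions: `write_throughput_mb_s` is a made-up helper, `/dev/shm` is assumed to be a tmpfs mount (it is on most Linux distributions), and a single large sequential write only approximates what Fluent does when it saves a data file.

```python
import os
import time


def write_throughput_mb_s(directory, size_mb=256, chunk_mb=4):
    """Write about size_mb MiB to a scratch file in directory and return MiB/s.

    The file is fsync'ed before the clock stops and removed afterwards.
    """
    # Random data so a compressing file system cannot shortcut the write
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    n_chunks = max(1, size_mb // chunk_mb)
    path = os.path.join(directory, "throughput_test.bin")
    try:
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # wait until the data reaches the device
        elapsed = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)
    return n_chunks * chunk_mb / elapsed
```

Running it once with `"/dev/shm"` and once with the directory on the NAS or a local disk gives a first idea of how much headroom a RAM disk would actually offer. Use a test size well above any write cache on the target, otherwise the disk figure will look too good.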

Normally the drives are not used during computation for CFD. Almost all data is kept in RAM during the solution process.
The drives of course are used during post processing though. I can see this perhaps helping with that portion of transient analyses. I remember post processing transient animations and transient graphs, and SSDs made this process MUCH faster. I would consider trying a RAM drive for this purpose perhaps.

Insightful comments from all. I think what I am hearing is that I need to do some more analysis to determine how long I’m actually spending in file i/o. Also, as I mentioned, this only really matters in transient cases where I am frequently writing to disk, since otherwise everything stays in local RAM of course.

I’m using 1 Gbps Ethernet for the network and standard HDDs in the NAS (don’t have the specs handy, but it’s middling read/write performance I believe). There is room for improvement on both fronts, and I’m definitely on the low end of network speeds. Since I have a relatively small setup, last I checked I was better off adding nodes than investing in a faster network, but that may change as I expand.

I like the idea of just adding a fast SSD to the head node instead of writing directly to the NAS. Seems like a simpler solution (and safer for the data) than configuring a RAM drive.

I did a rudimentary test on my current setup where I saved a ~8GB Fluent case+data a few times and it took ~15 minutes for each save, but I need to look at that result a bit more closely since that is about an order of magnitude slower than the theoretical performance of my network and HDDs by my calculations. Perhaps the bottleneck is the MPI gather operation transferring the data from each node to the head node prior to the write, although again it appears to be way slower than what the network can handle, so that does not seem to be a satisfactory explanation either. I’m somewhat new to Fluent, but it seems like the performance data provided to the user is fairly rudimentary relative to other codes I’ve used. I don’t have a great sense for how much time is being spent in various operations.
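For reference, the back-of-the-envelope estimate behind "an order of magnitude" looks roughly like this. The 8 GB size, 15 minute save time, and 1 Gbps link are from my test; the 100 MB/s HDD rate is only an assumed ballpark since I don't have the drive specs handy, and protocol overhead is ignored.

```python
def transfer_time_s(size_gb, rate_mb_s):
    """Ideal time in seconds to move size_gb (decimal GB) at rate_mb_s MB/s."""
    return size_gb * 1000.0 / rate_mb_s


SAVE_SIZE_GB = 8.0         # approximate case+data size
OBSERVED_S = 15 * 60       # measured time per save
ETHERNET_MB_S = 1000 / 8   # 1 Gbps is 125 MB/s with zero overhead
HDD_MB_S = 100.0           # ASSUMED sequential HDD write rate

network_s = transfer_time_s(SAVE_SIZE_GB, ETHERNET_MB_S)  # 64.0 s
hdd_s = transfer_time_s(SAVE_SIZE_GB, HDD_MB_S)           # 80.0 s

# Compare the observed time against the slower of the two ideal figures
slowdown = OBSERVED_S / max(network_s, hdd_s)             # 11.25

print(f"network: {network_s:.0f} s, hdd: {hdd_s:.0f} s, slowdown: {slowdown:.1f}x")
```

Even if the network transfer and the disk write happened strictly one after the other (64 s + 80 s = 144 s), the save would still be about 6 times slower than expected, so raw bandwidth alone does not explain the 15 minutes.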

I still think the RAM drive may be a fun project, but I’ll probably kick that down the road for a few years until the rest of my setup is better optimized.