Solid-state drive as a write buffer

JDR · July 9, 2012, 17:11

During long transient simulations where the entire simulation waits while a timestep is written to the hard drive, the write time can be considerable. Writing to a solid-state drive (SSD) could speed up this step and reduce the overall run time significantly. However, SSDs are currently expensive, and I would like to try a less expensive solution.

What I would like to do is use a solid-state drive as an intermediate storage buffer that the code writes to before the data is transferred to a traditional hard drive. Has anyone tried this before?
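As a rough sketch of this buffering idea in Python: the solver writes each timestep file to a directory on the SSD, and a background worker then moves finished files to the hard drive. The `start_archiver` helper, the queue-based hand-off, and the file names are assumptions made for illustration; they are not part of any solver. Note that the solver still waits for the SSD write itself; only the slower transfer to the hard drive happens in the background.

```python
import queue
import shutil
import threading
from pathlib import Path


def start_archiver(hdd_dir):
    """Start a background worker that moves queued files into hdd_dir.

    Returns the queue to put finished file paths on, and the worker thread.
    (Hypothetical helper, for illustration only.)
    """
    pending = queue.Queue()

    def worker():
        while True:
            path = pending.get()
            if path is None:  # sentinel: no more files will arrive
                break
            # Move the finished file from the SSD to the hard drive.
            shutil.move(str(path), str(Path(hdd_dir) / Path(path).name))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return pending, thread
```

Once a timestep file has been closed on the SSD, the calling code would do `pending.put(path)`. At the end of the run, `pending.put(None)` followed by `thread.join()` waits for the last transfer to finish.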

I'm currently looking at a new workstation and my rep said this was possible, but I haven't seen any indication that people have done this before. All opinions are welcome.

Thanks,
JDR

kyle · July 10, 2012, 16:10

I struggled with this too. Most solutions that use a fast drive as a cache for a slower drive are optimized for reading many small files quickly, which is exactly the opposite of what we care about. We want to write large files quickly.

In typical computing, writes are extremely important, because if the power goes out or a machine crashes and recently written data is lost, you could lose a customer purchase or a record of a stock trade. There are all kinds of safeguards in typical RAID and hybrid disk arrays to ensure that if you write something and the power goes out or the computer crashes, that data will be preserved.

In CFD, we don't really care if we lose all the writes we did in the last 5 minutes. It only means that we lost 5 minutes of simulation time. Additionally, we know that our writes are going to be happening at somewhat consistent intervals (we won't be just saving data constantly for hours on end, there are breaks while the solver is iterating). What we want to be able to do is save data in 2GB-20GB bursts as quickly as possible.
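Whether a write cache can absorb these bursts depends on the disk being able to flush one burst before the next one arrives. A back-of-the-envelope check in Python, assuming decimal units (1 GB = 1000 MB); the function names, the 100 MB/s disk throughput, and the 5-minute save interval are illustrative assumptions, not measurements:

```python
def seconds_to_flush(burst_gb, disk_mb_per_s):
    """Time for the disk to write out one cached burst, in seconds."""
    return burst_gb * 1000.0 / disk_mb_per_s


def cache_keeps_up(burst_gb, save_interval_s, disk_mb_per_s):
    """True if one burst can be flushed before the next one arrives."""
    return seconds_to_flush(burst_gb, disk_mb_per_s) <= save_interval_s


# Assumed numbers: a 20 GB save every 5 minutes, disk sustaining 100 MB/s.
print(seconds_to_flush(20, 100))     # 200.0
print(cache_keeps_up(20, 300, 100))  # True
```

If the check fails, the cache fills up over successive saves and the solver ends up waiting on the disk again.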

The best solution I have found is to use a fileserver with at least 1.5x-2.0x as much RAM as the largest file you will ever save. You can get 32GB of RAM for less than $200. Then, you have to tell the kernel that you don't need it to be so paranoid about losing your writes. On Linux this is done by raising the writeback thresholds in /proc/sys/vm: set "dirty_ratio" (the share of memory that may hold unwritten data before a writing process is made to wait; 20% is the usual default) to something like 75%, and keep "dirty_background_ratio" (the share at which the kernel starts flushing in the background; ~10% is default) below it. That allows roughly 75% of your RAM to be used as a filesystem write cache. Now whenever you store a timestep, the solver can dump the results extremely quickly to the fileserver's RAM, and the fileserver can take its time writing out to disk while the solver continues on its way.
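As a rough check on these numbers, the share of RAM that must be allowed to hold unwritten data can be estimated from the file size and the installed memory. The sketch below is an illustration only. The function names are made up for this example, and it uses total RAM as an approximation, whereas the kernel applies the "dirty_ratio" and "dirty_background_ratio" percentages to available (free plus reclaimable) memory, so the real threshold is somewhat lower. `read_vm_setting` simply reads the current value of a setting from /proc/sys/vm on a Linux machine.

```python
import math
from pathlib import Path


def min_cache_percent(largest_file_gb, ram_gb):
    """Smallest whole percentage of RAM that can hold one file of this size."""
    return math.ceil(100.0 * largest_file_gb / ram_gb)


def read_vm_setting(name):
    """Read an integer writeback setting such as 'dirty_ratio' (Linux only)."""
    return int((Path("/proc/sys/vm") / name).read_text())


# Assumed numbers: a 20 GB timestep file, fileserver with 32 GB of RAM.
print(min_cache_percent(20, 32))  # 63
```

On the fileserver, `read_vm_setting("dirty_ratio")` and `read_vm_setting("dirty_background_ratio")` show what is currently configured; changing the values requires root.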

Intel's "Smart Response Technology" is a piece of software for Windows that does exactly what you are asking about, but RAM is orders of magnitude faster than even SSDs. I think it only works on the Z68 chipset though (which is bogus, because it is just a software technology). There is also a young Linux project called bcache that does much the same thing.

JDR · July 10, 2012, 18:38

Kyle,

Thanks for your response. I'm glad others have been thinking about this as well.

Your solution seems interesting. Regular RAM would clearly be faster than an SSD, but I'm curious what type of configuration you're talking about. On a large cluster you have a fileserver, but what about a single workstation with the hard drive located internally? I suppose you could write to an external fileserver that had its own memory, but you would want to make sure you can still use SATA data speeds and not drop down to 1 Gbps Ethernet speeds. Is there a way to allocate memory to the hard drive as an I/O buffer?

I have heard a little bit about the whole hybrid drive concept, but based upon what you said, it seems unlikely that they contain the volume of memory required for our applications.

kyle · July 10, 2012, 19:53

I am using a small cluster with a dedicated fileserver. It is connected with 20Gb InfiniBand, but even when I switched to just gig-e to test it, I was still able to pretty much overwhelm the cheap hard drives whenever it saved.

You should be able to apply a similar technique to a single workstation setup, but the RAM requirements go up a lot because the RAM has to hold not only your transient history files but also the simulation itself.
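The workstation case can be put into numbers with a small Python sketch. It assumes the 1.5x-2.0x headroom suggested earlier in the thread for the write cache; the function name and the example sizes are hypothetical:

```python
def workstation_ram_gb(solver_gb, largest_file_gb, cache_factor=2.0):
    """RAM for the running simulation plus a write cache for one saved file."""
    return solver_gb + cache_factor * largest_file_gb


# Assumed numbers: the solver needs 24 GB and the largest saved file is 10 GB.
print(workstation_ram_gb(24, 10))  # 44.0
```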

Anna Tian · July 13, 2014, 05:29

Quote:

Originally Posted by kyle

I struggled with this too. Most solutions that use a fast drive as a cache for a slower drive are optimized for reading many small files quickly, which is exactly the opposite of what we care about. We want to write large files quickly.

In typical computing, writes are extremely important, because if the power goes out or a machine crashes and recently written data is lost, you could lose a customer purchase or a record of a stock trade. There are all kinds of safeguards in typical RAID and hybrid disk arrays to ensure that if you write something and the power goes out or the computer crashes, that data will be preserved.

In CFD, we don't really care if we lose all the writes we did in the last 5 minutes. It only means that we lost 5 minutes of simulation time. Additionally, we know that our writes are going to be happening at somewhat consistent intervals (we won't be just saving data constantly for hours on end, there are breaks while the solver is iterating). What we want to be able to do is save data in 2GB-20GB bursts as quickly as possible.

The best solution I have found is to use a fileserver with at least 1.5x-2.0x as much RAM as the largest file you will ever save. You can get 32GB of RAM for less than $200. Then, you have to tell the kernel that you don't need it to be so paranoid about losing your writes. On Linux this is done by raising the writeback thresholds in /proc/sys/vm: set "dirty_ratio" (the share of memory that may hold unwritten data before a writing process is made to wait; 20% is the usual default) to something like 75%, and keep "dirty_background_ratio" (the share at which the kernel starts flushing in the background; ~10% is default) below it. That allows roughly 75% of your RAM to be used as a filesystem write cache. Now whenever you store a timestep, the solver can dump the results extremely quickly to the fileserver's RAM, and the fileserver can take its time writing out to disk while the solver continues on its way.

Intel's "Smart Response Technology" is a piece of software for Windows that does exactly what you are asking about, but RAM is orders of magnitude faster than even SSDs. I think it only works on the Z68 chipset though (which is bogus, because it is just a software technology). There is also a young Linux project called bcache that does much the same thing.

I'm going to purchase new hardware for CFD, and I'm thinking about an SSD.

I have noticed that it always takes a while to open or save a case file. If the case file is large, it takes more than 10 seconds. That breaks my concentration, and the accumulated waiting time is not small. We actually save and read files very frequently, because we don't want to lose our work and we need to run tests to make sure the simulation won't diverge and that nothing is wrong with it. My idea is to use a small SSD (e.g. 32 GB or 60 GB, not expensive) as the disk for the current working files. Once it is full, I would move the finished simulation files to the usual disks. I also think that if I install the CFD software on the SSD, the waiting time could be reduced even further. Is this correct?
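The "move finished files once the SSD is full" step could be automated with a short Python script along these lines. This is only a sketch under assumptions: the `offload_oldest` helper, the size limit, and the choice to move the oldest files first (by modification time) are illustrative, and it only looks at files directly inside the working directory.

```python
import shutil
from pathlib import Path


def offload_oldest(ssd_dir, hdd_dir, max_bytes):
    """Move the oldest files from ssd_dir to hdd_dir until the files left
    in ssd_dir total at most max_bytes. Returns the names of moved files."""
    ssd_dir, hdd_dir = Path(ssd_dir), Path(hdd_dir)
    # Oldest files first, judged by modification time.
    files = sorted((p for p in ssd_dir.iterdir() if p.is_file()),
                   key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in files)
    moved = []
    for path in files:
        if total <= max_bytes:
            break
        size = path.stat().st_size
        shutil.move(str(path), str(hdd_dir / path.name))
        total -= size
        moved.append(path.name)
    return moved
```

Case files that are still in use should be kept out of the working directory's top level or excluded by name before running something like this, since the script cannot tell a finished simulation from an active one.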

I'm not a CFD hardware expert. Could anyone comment on this idea?

Btw, I use Fluent as the solver and ICEM to generate grids.


