CFD Online Discussion Forums > OpenFOAM
Binary gives significant performance advantage (Mesh & Solve)
(https://www.cfd-online.com/Forums/openfoam/136983-binary-gives-significant-performance-advantage-mesh-solve.html)

glypo June 8, 2014 08:17

Binary gives significant performance advantage (Mesh & Solve)
 
I've run a handful of different external aero cases and am very surprised to see both meshing and the solution running significantly faster when output is set to binary compared to ASCII. The only difference in these cases is the setting of 'writeFormat' in the controlDict. The times stated below are user time. Please note that each case is an entirely different geometry, block mesh and setup.

Case1 - snappyHexMesh - 4.4 million cells
Binary: 1 hr 24 min
ASCII: 3 hr 0 min
Percentage decrease: 53%

Case2 - snappyHexMesh - 1.5 million cells
Binary: 20 min
ASCII: 23 min
Percentage decrease: 10%

Case2 - simpleFoam - 1700 steps
Binary: 9 hr 04 min
ASCII: 11 hr 52 min
Percentage decrease: 24%

Case3 - snappyHexMesh - 4.1 million cells
Binary: 9 hr 18 min
ASCII: 10 hr 15 min
Percentage decrease: 10%

I appreciate that writing ASCII might slow the system, perhaps by a minute or so over a long run, but nothing like the significant and repeatable differences I've encountered. Surely OpenFOAM can only 'understand' binary internally, so even when running in 'ASCII' these files are read into system memory as binary? There shouldn't be any major performance difference, but there is. Has anybody else experienced this? Any explanations or solutions, other than running foamFormatConvert before and after each run, would be greatly appreciated.

wyldckat June 8, 2014 08:54

Greetings Jason,

There are a few important details that aren't clear in your description:
  1. How frequently are the field files written during the run?
  2. What OpenFOAM version are you using and what steps did you follow for installing it?

As for the frequency, the following details come to mind:
  1. When you used snappyHexMesh, were all stages executed? Namely castellation, snapping and layer addition?
  2. Was snappyHexMesh executed in parallel or serial mode?
  3. Was the debug flag turned on in "system/snappyHexMeshDict"?
  4. If you check the log file for snappyHexMesh, how long did it take to write each meshing iteration? It should tell you something like this on each major stage:
    Code:

    Writing mesh to time constant
    Wrote mesh in = 0.83 s.
    Mesh refined in = 5.05 s.

  5. When simpleFoam is used, how frequently are the fields saved to disk? More specifically, what settings do you use for the following variables in "controlDict":
    Code:

    endTime
    deltaT
    writeControl
    writeInterval

    Overall, knowing the contents of the file "system/controlDict" would help to diagnose the issue; a typical write-control block is sketched just after this list.
  6. Did you use simpleFoam in serial or parallel?
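For reference, this is roughly what I mean by the write-control block in "system/controlDict" (the values here are purely illustrative, not a recommendation):
  Code:

    endTime          1000;
    deltaT           1;
    writeControl     timeStep;
    writeInterval    100;
    writeFormat      binary;   // or ascii
    writeCompression off;      // or on
    writePrecision   12;
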
Best regards,
Bruno

glypo June 8, 2014 12:35

Bruno,

In response to your questions: this was performed with OpenFOAM 2.3 on both CentOS 6.3 and Ubuntu 12.04, using the OpenFOAM repositories on both. Across the three cases there is a mixture of serial and parallel meshing; the flow solutions were run in parallel. I varied the write intervals between the cases, from infrequent to frequent, with little impact on the time.

Having run this on desktop and cluster, in parallel and serial, on different operating systems (and therefore different binaries), with different geometries and meshes... I don't believe it's an anomaly isolated to me, the way I have set up OpenFOAM, or my cases. In fact, after I shared my results someone else ran their own test on yet another system, entirely removed from me, and he also found the same performance advantage of binary over ASCII.

Of course I'm happy to share the values from the files if you wish, but I believe that if anyone tries switching between binary and ASCII on their own cases they will see an unexpected and disproportionate performance difference.

Many thanks,
Jason

blais.bruno June 18, 2014 11:01

Expected
 
This is to be expected. It is noticeably faster to write binary files than ASCII files, which should be why you see this increase in speed.

This is due to "two" main factors :
-> Binary files are much more compressed, the size difference between an ASCII and binary file is important
-> C++ file streams are much faster flushing out binary data than ASCII. This is the same for any languages anyway.
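As a quick illustration of the second point, here is a toy sketch of my own (nothing to do with OpenFOAM's actual writers): a formatted write versus a raw binary dump of the same array.
Code:

    // Toy comparison of formatted (ASCII) output vs a raw binary dump
    #include <cstddef>
    #include <fstream>
    #include <vector>

    int main()
    {
        std::vector<double> field(1000000);
        for (std::size_t i = 0; i < field.size(); ++i)
            field[i] = 1.0/(i + 1.0);                  // arbitrary values

        // ASCII: every value goes through a text-formatting call
        std::ofstream asciiFile("field.txt");
        asciiFile.precision(12);
        for (double v : field)
            asciiFile << v << '\n';

        // Binary: the raw bytes are streamed out in a single call
        std::ofstream binFile("field.bin", std::ios::binary);
        binFile.write(reinterpret_cast<const char*>(field.data()),
                      field.size()*sizeof(double));

        return 0;
    }
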

Therefore, I am not sure I understand why this would be surprising, or a problem.

glypo June 18, 2014 11:15

It is to be expected? In some cases, three hours extra to write a handful of files in ASCII rather than binary?? I'm sorry, but I don't believe it is! I'd expect the read-write overhead of ASCII to be on the order of seconds or minutes for a typical 3D aero CFD solution.

As I stated in my initial post, an overhead is to be expected; I am not surprised to see one. What I am surprised by is the magnitude of the performance difference. It's massive, and surely can't be attributed solely to the simple writing of files??

blais.bruno June 18, 2014 11:30

It's true that the overhead seems very large. On what hardware did you run those tests? What kind of hard drive did you use? How much RAM does the computer/cluster have? You mentioned Ubuntu, so I presume this is a personal computer/workstation, right?

I cannot comment much on snappyHexMesh because I do not know how often it writes to disk, but for a simpleFoam simulation the 24% is plausible depending on the number of writes you do and whether it is run in parallel or serial. You also did not mention the size of the simpleFoam case. Is it also in the range of a couple of million cells?

Just an addition:
ASCII wastes a lot of bytes, especially if you are using a large numerical precision (say, outputting 8 digits) or if you are not using plain ASCII but UTF-8 as the encoding format. For a large case (say, millions of cells) and the same precision, you could easily expect writing each file to take twice the time or more (10 s binary vs 20 s or more ASCII). It's a relative problem: the bigger the file you are writing, the more you will notice the difference.
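To put a rough number on the waste (a toy example of my own, not OpenFOAM's writers): one double stored as text at 12 significant digits takes roughly twice the space of its 8 raw bytes.
Code:

    // Toy example: size of one double as formatted text vs raw binary
    #include <cstdio>

    int main()
    {
        double v = 0.123456789012;
        char buf[64];
        int textBytes = std::snprintf(buf, sizeof(buf), "%.12g\n", v);  // ~15 bytes
        std::printf("text: %d bytes, binary: %zu bytes\n",
                    textBytes, sizeof(v));                              // 8 bytes
        return 0;
    }
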

glypo June 18, 2014 12:18

The answers to those questions are in my reply to Bruno. I ran this across a number of desktops (Ubuntu) and on a cluster (CentOS). The hardware is relatively powerful: the desktops are 6-core (12-thread) Intel i7 Extreme machines with 64 GB RAM and SSDs; the cluster nodes are quad-core Core i7s with 16 GB each, and I ran across two nodes, hence 16 threads and 32 GB. I kept a close eye on RAM utilisation; as expected for cases of this size (1-4 million cells), usage was in the 2-6 GB range. All with high-end motherboards and good HDDs and SSDs.

The simpleFoam case was 'Case 2' in my original post, hence the stated 1.5 million cells. I have tried this, as have other people, with completely different cases, yet all show a similarly significant advantage of binary over ASCII.

Let me phrase my confusion in a different way... there is a utility in OpenFOAM called 'foamFormatConvert' that reads the settings in system/controlDict and converts the relevant files in the case to that format. On the 1.5 million cell simpleFoam case this conversion from binary to ASCII takes, in CPU time, just a couple of minutes for every written time step (there are just 10). I would imagine this is essentially the same process OpenFOAM uses within simpleFoam, so I would expect running simpleFoam in ASCII, compared to binary, to carry this overhead. This is also in line with common sense and my own computing experience. The reality, however, is that the ASCII penalty is orders of magnitude higher.

blais.bruno June 18, 2014 13:29

Then I am at a loss here; this is indeed abnormal. The cost difference of writing ASCII instead of binary should not exceed that of the foamFormatConvert utility, because in the latter you obviously have to re-read the file from scratch AND write it again...

I have no idea. Do you know if the same writing method is used in both the ASCII and binary cases (meaning, is MPI-IO functionality used in the binary case that would not be used in the ASCII case)?

In all cases, this is very surprising and a bit troubling...

olivierG June 19, 2014 03:40

hello,

I did this test a long time ago with OF 1.6 and concluded mostly the same (and the same order of magnitude): ascii compressed was fastest, then binary compressed, then binary, then ascii (ascii compressed << binary compressed << binary << ascii).
So also try compressed files (ascii and binary) to see if you get the same.

regards,
olivier

kalle June 23, 2014 02:52

Hi!

Do you have the standard output from the simpleFoam runs? One could maybe do some statistical analysis of the execution and wall clock times to try to point out where the difference occurs.

Kalle

glypo June 23, 2014 09:55

Olivier, I haven't looked at the difference between compressed and uncompressed. Thank you for the suggestion; I'll run some cases now to see if the same is true with OF 2.3.

Kalle, I used the Linux 'time' utility, which reports real time (wall clock), user time (CPU time) and system time (overhead). The times I reported above are user times. I still have the output files, but have not done a comparison to see at which point they start to diverge; this is a good suggestion, thank you. I'll perform additional runs based on Olivier's suggestion of write compression and then report back here with the findings and an analysis of where they diverge.

In the meantime I would suggest OpenFOAM users run in binary and use the foamFormatConvert utility to convert to ASCII if they require it.

Many thanks,
Jason

kalle June 23, 2014 14:58

Could you share the log files? I guess they are a few hundred megabytes in total, but if you could, I would be interested in looking at them!

Regards,
Kalle

glypo June 25, 2014 10:08

In case anybody is still interested, following Olivier's comment I ran the 1.5 million cell simpleFoam case as mentioned in my first post (Case 2) with the following results:
(to refresh memories, this is with OpenFOAM 2.3; exactly the same case other than the ASCII/binary and compression settings in system/controlDict)

Binary Uncompressed: 9 hr 04 min
Binary Compressed: 19 hr 24 min

ASCII Uncompressed: 11 hr 52 min
ASCII Compressed: 11 hr 04 min

Based on Kalle's suggestion I compared the log files: these cases diverge in time from the very first time step, long before any writing, and continue to diverge with each time step. Very odd!! Compressing binary being expensive is one thing, but why is it slower between steps which aren't being written or read?! In summary, I've found it significantly beneficial to run OpenFOAM in binary uncompressed and convert using foamFormatConvert after the run. In terms of runtime:

Binary Uncompressed < ASCII Compressed < ASCII Uncompressed << Binary Compressed


Kind Regards,
Jason

kalle June 26, 2014 01:47

Quote:

Originally Posted by glypo (Post 498633)
Binary Uncompressed: 9 hr 04 min
Binary Compressed: 19 hr 24 min

ASCII Uncompressed: 11 hr 52 min
ASCII Compressed: 11 hr 04 min

Is that really 19 hours?

Indeed a strange thing you've run into here. What if you disable write-out altogether? Do you start both simulations from identical cases, i.e. do both cases start from either ASCII data or binary data... or maybe your fields are uniform, so there is no difference between ASCII and binary at the start?

Kalle

glypo June 26, 2014 04:32

It's difficult to explain, but it's not a rogue case: others have experienced this, and judging by Olivier's reply it dates back to early OpenFOAM. The time I stated is correct in that, consistent with the rest of my data, it is the recorded Linux user time; this is a parallel case (8 processors), so the time difference is exaggerated.

The fields are uniform, to make it fairer, and as I say the times diverge long before any write. I've not tried a single write or no write at all; good suggestion, that would indeed be an interesting experiment. Thank you, I will give it a go.

P.S. If you prefer real time (i.e. wall-clock time):

Binary Uncompressed: 1 hr 14 min
Binary Compressed: 2 hr 37 min

ASCII Uncompressed: 1 hr 36 min
ASCII Compressed: 1 hr 27 min

Different numbers, but same improvement.

wyldckat December 30, 2014 14:51

Greetings to all!

:eek: OK, only after 6 months did I manage to find enough time to run some tests of my own on a stable machine (mine isn't as stable :().
The results are now available here: https://github.com/wyldckat/wyldckat...nce_Analysis_2

The summary results:
  1. If no file is written, it doesn't matter if it's configured to write in "binary" or "ascii".
  2. Example timings of running 200 iterations with simpleFoam and writing every 10 iterations:
    • ascii precision 12: 184.65 s
    • binary: 167.26 s
  3. Example timings of running 50 iterations with simpleFoam and writing every iteration:
    • ascii precision 12: 86.62 s
    • binary: 42.7 s
  4. Writing to disk did not affect performance, since memory cache came into play to assist in writing as fast as possible, due to the case not being all that big. We're talking about 3GB in files versus a machine with 24GB of RAM... hurray for memory cache :D
  5. Then comes the 6th diagnostic approach I took, where I wrote my own simplistic code in C++ to write a double array of 5000000 (5 million) values to file in ascii and binary (a rough sketch of this kind of test is given below this list). Results:
    • ascii: 3.102s - 52180111 (byte)
    • ascii_sprintf_f: 3.911s - 69767509 (byte)
    • ascii_sprintf_g: 2.609s - 52180111 (byte)
    • binary: 0.142s - 40000000 (byte)
  6. And here's the kick to the nuts, if we run gzip on the resulting files for these 4 tests:
    • ascii: 1.062s - 2452385 (byte)
    • ascii_sprintf_f: 6.208s - 29881711 (byte)
    • ascii_sprintf_g: 1.061s - 2452395 (byte)
    • binary: 2.689s - 27629527 (byte)
Are you guys happy now? If I or anyone else had bothered to do this last approach first, it would all have been as clear as water a long time ago :rolleyes:
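For anyone who wants to try this at home, a minimal sketch along the lines of that 6th approach could look like this (this is not the exact code from the repository linked above, just an illustrative re-creation):
Code:

    // Rough re-creation of the 5-million-double write test: time the ASCII
    // conversion and the raw binary dump separately, then gzip the files by hand
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <fstream>
    #include <vector>

    int main()
    {
        const std::size_t n = 5000000;
        std::vector<double> data(n);
        for (std::size_t i = 0; i < n; ++i)
            data[i] = 1.0/(i + 1.0);              // arbitrary non-trivial values

        auto t0 = std::chrono::steady_clock::now();
        {
            std::ofstream out("test_ascii.dat");
            out.precision(12);
            for (double v : data)
                out << v << '\n';
        }
        auto t1 = std::chrono::steady_clock::now();
        {
            std::ofstream out("test_binary.dat", std::ios::binary);
            out.write(reinterpret_cast<const char*>(data.data()),
                      n*sizeof(double));
        }
        auto t2 = std::chrono::steady_clock::now();

        std::printf("ascii : %.3f s\n",
                    std::chrono::duration<double>(t1 - t0).count());
        std::printf("binary: %.3f s\n",
                    std::chrono::duration<double>(t2 - t1).count());
        return 0;
    }

Timing "gzip test_ascii.dat" and "gzip test_binary.dat" afterwards gives the equivalent of the second list above.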

Want more performance out of OpenFOAM? Then please suggest proven file storage software/technology, instead of simply complaining about it ;)


... And I'll go first :cool:: LZO and LZ4 are high-speed compression algorithms with impressive compression/decompression speeds, where decompression is almost as fast as memcpy. These algorithms don't offer compression ratios as high as gzip and bz2, but in most cases they could offer improved throughput to disk when writing such files with the data in binary format.
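For the curious, this is roughly what compressing a binary field with LZ4 before writing would look like (a minimal sketch of my own, assuming liblz4 is installed and linked with -llz4; not an OpenFOAM patch, just an illustration of the idea):
Code:

    // Hypothetical sketch: compress a binary double array with LZ4, then write it
    #include <lz4.h>
    #include <fstream>
    #include <vector>

    int main()
    {
        std::vector<double> field(5000000, 3.14159);
        const char* src = reinterpret_cast<const char*>(field.data());
        const int srcSize = static_cast<int>(field.size()*sizeof(double));

        // Allocate the worst-case output buffer and compress in one call
        std::vector<char> compressed(LZ4_compressBound(srcSize));
        const int compSize =
            LZ4_compress_default(src, compressed.data(), srcSize,
                                 static_cast<int>(compressed.size()));

        if (compSize > 0)
        {
            std::ofstream out("field.lz4", std::ios::binary);
            out.write(compressed.data(), compSize);
        }
        return 0;
    }
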

Best regards,
Bruno

glypo January 1, 2015 11:25

Very interesting tests. I'm glad they explain why, in my previously highlighted cases, binary is significantly faster than ASCII. I tend to use OpenFOAM with bespoke utilities that require ASCII output. Since my original 'discovery' 6 months ago I've been running all of my OpenFOAM calculations in binary and later running foamFormatConvert if I need any output in ASCII. It may seem a roundabout way of going about the problem, but it's very fast.

Perhaps I'm misinterpreting what I've read on your GitHub page, but your experiments don't explain why running ASCII compressed is faster than running ASCII uncompressed? My findings, which are repeatable:
Binary Uncompressed - Fastest
ASCII Compressed - ~25% Slower
ASCII Uncompressed - ~40% Slower
Binary Compressed - ~115% Slower

wyldckat January 1, 2015 12:36

Hi Jason,

Quote:

Originally Posted by glypo (Post 525811)
It may seem a roundabout way of going about the problem but it's very fast.

There is a nice expression in the engineering community: the optimum is the enemy of the good. And that's a good example of what happens when we need an optimal solution :)

Quote:

Originally Posted by glypo (Post 525811)
Perhaps I'm misinterpreting what I've read on your github page, but your experiments don't explain why running ASCII compressed is faster than running in uncompressed?

Oops, sorry, it felt so obvious to me when I saw the results that I was simply too mad to even explain exactly what's happening.

If we look at my artificial results from the 6th approach:
Quote:

  1. Then comes the 6th diagnostic approach I took, where I wrote my own simplistic code in C++ to write a double array of 5000000 (5 million) values to file in ascii and binary. Results:
    • ascii: 3.102s - 52180111 (byte)
    • ascii_sprintf_f: 3.911s - 69767509 (byte)
    • ascii_sprintf_g: 2.609s - 52180111 (byte)
    • binary: 0.142s - 40000000 (byte)
  2. And here's the kick to the nuts, if we run gzip on the resulting files for these 4 tests:
    • ascii: 1.062s - 2452385 (byte)
    • ascii_sprintf_f: 6.208s - 29881711 (byte)
    • ascii_sprintf_g: 1.061s - 2452395 (byte)
    • binary: 2.689s - 27629527 (byte)

Thanks to the RAM cache, the file-saving part is pretty much negligible. So if we add the two timings, i.e. array conversion to ascii + compression, we get these times:
  • ascii compressed: 3.102s + 1.062s = 4.164s
  • ascii_sprintf_f compressed: 3.911s + 6.208s = 10.119s
  • ascii_sprintf_g compressed: 2.609s + 1.061s = 3.67s
  • binary compressed: 0.142s + 2.689s = 2.831s

Now if we sort the timings, ignoring the "ascii_sprintf_f" test (which is really bad):
  1. binary: 0.142s
  2. ascii_sprintf_g: 2.609s
  3. binary compressed: 2.831s
  4. ascii: 3.102s
  5. ascii_sprintf_g compressed: 3.67s
  6. ascii compressed: 4.164s
In comparison to yours:
Quote:

Originally Posted by glypo (Post 525811)
My findings, which are repeatable:
Binary Uncompressed - Fastest
ASCII Compressed - ~25% Slower
ASCII Uncompressed - ~40% Slower
Binary Compressed - ~115% Slower

Ah, now I see why it isn't very clear. The problem with artificial results is that they can easily be sub-optimal or over-optimized. In addition, disk speed and cache flushing are not taken into account in these specific tests.

The explanation - based on these results and my experience on this topic (data storage and compression) - is as follows:
  • The binary case in this artificial example is heavily optimized, which is why it's not obvious why it becomes the slowest once compressed. Compressing binary data with zip-style algorithms usually isn't as fast as compressing text data, because binary data is less likely to have repeating patterns that can be added to the dictionary. The compressor therefore spends more time checking the dictionary back and forth, whereas text data usually has a much higher repeatability of patterns, i.e. faster lookups in the dictionary being built.
    • The result: binary data takes a whole lot of time to compress, especially CFD data, where patterns are rarer. Nonetheless, saving to disk can be faster in most cases, since the resulting files are usually smaller.
      • OK, still not very clear on this one. What I mean is that, in comparison to uncompressed binary, compressed binary is usually smaller and is therefore faster when it comes to the actual write to disk.
  • The "ascii" examples show various ambiguities, since it all depends on how optimized the conversion is, how easily the numbers are converted to text, and finally the subsequent storage on disk. In addition, the resulting files can be considerably larger, so disk speed comes into play, which is not accounted for in these artificial tests.
  • The artificial tests use the gzip application, which adds an additional layer of interpretation of the data, i.e. it has an overhead which isn't as big in OpenFOAM... or at least it shouldn't be. Nonetheless, such overhead is probably negligible.
  • Now, if we compare the artificial results with the ones you have obtained, what happened is this:
    1. The "ASCII compressed" case is the second fastest due to the final file size being smaller than when uncompressed. The compression is fast enough to overlap and bypass the slower disk storage speed.
    2. The "binary compressed" case is the worst one not due to disk speed, but due to the complexity of the data being stored, as demonstrated by the artificial tests. Two of the "ascii" compressions were roughly 2.6 times faster than the binary compression. And as indicated above, the cost of compressing binary data is proportional to how complex the stored data is. Quoting the gzip man page:
      Quote:

      The default compression level is -6 (that is, biased towards high compression at expense of speed).

      The compression levels run from 1 to 9, fastest to slowest. If OpenFOAM uses the same default, that could easily explain why this happens.
      • Perhaps the best option for OpenFOAM would be to make the compression level controllable, so that in binary mode we could set it to fast, which would likely not search for every possible pattern and instead pick up only the easiest ones.
        But again, "the optimum is the enemy of the good", because the fastest compression can sometimes actually make the file larger...
Let's see... another comparison to make this whole issue easier to visualize:
  • Hard-disk write speeds are currently still in the 60-150 MB/s range, slower if there are a lot of small files.
  • SSD write speeds are in the 200-500 MB/s range (or more).
  • RAM speeds are somewhere between 5 and 60 GB/s (or more).
The visual perspective comes up when we think about storing, for example, 5GB of data:
  • Hard-disk: 83-33 s
  • SSD: 25-10 s
  • RAM: 1-0.083 s
Keep in mind that compression deals mostly with CPU and RAM speeds, so the gap between disk speed and compression speed can still be a factor of 10 or 100 on the time scale, at least for easy compressions.


I hope this is now clearer? If not, I can try and do some theoretical graphs, to demonstrate how much time is spent on each operation.

Best regards,
Bruno

glypo January 1, 2015 13:41

Bruno,

This is now crystal clear; thank you for taking the time to explain. This kind of information is very interesting. It's incredible to see that the time spent on data conversion can affect the calculation time so noticeably. It is also pleasing to learn that this isn't a problem specific to OpenFOAM, but a computing phenomenon that is reproducible outside of OpenFOAM.

Many thanks for your investigation, I hope many other read this thread, very useful knowledge to have for speeding up calculation time.

Happy New Year,
Jason

