CFD Online Discussion Forums

CFD Online Discussion Forums (http://www.cfd-online.com/Forums/)
-   OpenFOAM Programming & Development (http://www.cfd-online.com/Forums/openfoam-programming-development/)
-   -   HDF5 IO library for OpenFOAM (http://www.cfd-online.com/Forums/openfoam-programming-development/122579-hdf5-io-library-openfoam.html)

haakon August 22, 2013 10:32

HDF5 IO library for OpenFOAM
 
I have for some time been developing an HDF5 IO library for OpenFOAM. This library can write the results from a simulation into an HDF5 archive instead of the usual (horrible) file-based structure OpenFOAM uses by default. The major benefits show up when you increase the number of processes (say to the range 1000-10000) and want to write more than a few timesteps to disk. It is also highly useful if you are using the Lagrangian particle functionality of OpenFOAM, as this produces ~50 files per timestep per process. A nice addition is that the savings in terms of disk space are significant; however, this depends on what IO format you compare against (ASCII, binary, with or without compression).

When the simulation is finished, the HDF5 archive can be parsed and an XDMF metadata file written. This XDMF file can be opened in, for example, ParaView, VisIt or EnSight, and the visualization is performed as for any other OpenFOAM case.

Another benefit is the ability to easily load the data into a tool like Matlab or Python to perform calculations or processing of the results. Personally I have used this to process data from fluid-particle simulations.
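As a small illustration of that workflow, here is a sketch of pulling a velocity field out of the archive with h5py and NumPy. The dataset path in the usage example below is a hypothetical placeholder, not the writer's actual layout; inspect your own file first (e.g. with h5ls or f.visit(print)).

```python
import h5py
import numpy as np

def mean_speed(h5file, dataset):
    """Mean velocity magnitude of an (N, 3) velocity dataset.

    'dataset' is whatever path the writer used inside the archive --
    the example path shown in the usage note is an assumption.
    """
    with h5py.File(h5file, "r") as f:
        U = f[dataset][:]  # read the whole (N, 3) array into memory
    return float(np.linalg.norm(U, axis=1).mean())
```

Usage would look something like `mean_speed("h5Data/h5Data0.h5", "FIELDS/0/processor0/U")`, with the path adjusted to match the real group structure.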

The code is found in a GitHub repository: https://github.com/hakostra/IOH5Write together with some installation instructions and hints. I hope that this code can be useful for the OpenFOAM community, especially those of you who have access to an HPC system. In case any of you have suggestions for improvements, please feel free to use this thread to discuss them.

ngj August 25, 2013 06:29

Hi Håkan,

That is interesting, and I have myself been thinking of how to change the write format in OF. My motivation was long simulations, where I needed to output 2400 time folders for a lot of post-processing. The simulation was decomposed on 6 processors and each time folder contained 35-45 individual files, thus 0.5M-0.65M files.

Essentially, this should not have been a problem, because most of the files are based on the faMesh, so they are pretty small (say 1 KB), but if you are downloading these from a cluster with an overloaded/slow infrastructure, it will take ages with a lot of small files.

I see that you are doing it through the functionObjects, so essentially OF keeps doing its own outputting. My question is therefore whether you have considered making your code an additional option in the controlDict at the same level as ascii, binary, compressed, uncompressed? It would be somewhat more intrusive in the core of OF, but on the other hand the outputting would not be a dual process.

Kind regards

Niels

haakon August 26, 2013 03:19

Thank you for your interest in my work. If you ever compile it and try it out, I would appreciate your feedback and suggestions for improvement.

Your case is an example of why I developed this code. Even though I have never been in the situation where I need 2400 timesteps written to disk, I often want to decompose the case massively, running it on several hundred or thousands of processes. As in your case, my simulations produce approx. 40-50 files per process per timestep, hence the total number of files would become 45*1000 = 45 000 per timestep for 1000 processors. This is very problematic, especially on a parallel file system designed to handle a few very large files. As far as I know, there are no HPC file systems on the market that are designed to cope with this number of files of that size in an efficient manner.

Regarding the implementation, I think the current way is a good one, as it allows for (relatively) easy transitions between OpenFOAM versions. I do not need a single modification to the OpenFOAM core, and hopefully the amount of work needed when a new OF version is released is limited. For example, when going from 2.1.1 to 2.2.0 (or from 2.1.x to 2.2.x if you prefer that), I only needed to change one single line of code (if my memory is not playing tricks on me).

Another factor for doing it the current way is that there is no restart functionality in the HDF5 plugin, i.e. currently you cannot take a field from the HDF5 file and use it as an initial condition for the restart of a simulation. Therefore, I always specify a few writes in the "native" way (perhaps once every approx. 6-24 hours of walltime); this way I can always restart a simulation in case of a crash.

akidess August 26, 2013 03:40

Quote:

Originally Posted by haakon (Post 448022)
Another factor for doing it the current way is that there is no restart functionality in the HDF5 plugin, i.e. currently you cannot take a field from the HDF5 file and use it as an initial condition for the restart of a simulation. Therefore, I always specify a few writes in the "native" way (perhaps once every approx. 6-24 hours of walltime); this way I can always restart a simulation in case of a crash.

I think this was pretty much Niels' point - having full HDF5 capabilities (not just for postprocessing) would be great! I'd even try submitting it to the OpenFOAM foundation: http://www.openfoam.org/contrib/unsupported.php

In any case, thanks for sharing your code! Even "just" for postprocessing, it's quite a nice thing to have :)

haakon August 26, 2013 07:22

Quote:

Originally Posted by akidess (Post 448026)
I think this was pretty much Niels' point - having full HDF5 capabilities (not just for postprocessing) would be great! I'd even try submitting it to the OpenFOAM foundation: http://www.openfoam.org/contrib/unsupported.php

I think implementing full HDF5 capabilities in OF is a major undertaking, since, as far as I can see, there is no central IO class or library. As long as the various parts of the code just dump data into streams ending up in files, it is a big job to change this behaviour, since every single part of the code that does IO needs to be modified.

Quote:

Originally Posted by akidess (Post 448026)
In any case, thanks for sharing your code! Even "just" for postprocessing, it's quite a nice thing to have :)

I would dare to say that "just" postprocessing is the main purpose of writing data. I really cannot find many good reasons for writing gigabytes (or terabytes) of data to disk without the need for any postprocessing or visualization of the data.

And in case there is a need for e.g. restarting simulations, I think it would be fairly easy to create an "HDF5ToFoam" converter, based on the many "xxxxToFoam" converters already available.

haakon November 18, 2013 05:31

A short update: I have now made a simple Python program that uses h5py to read the metadata from the HDF5 files and write the corresponding XDMF files. Both field data and Lagrangian clouds are supported, and all attributes present will be included in the file. One XDMF file is generated for the field (mesh) data, and one for each cloud.

The program is called 'writeXDMF.py', and a help message is displayed if you run it with the '--help' argument. It requires Python 3. The program/script is installed to $FOAM_USER_APPBIN when you run the ./Allwmake script.

'writeXDMF.py' makes the attached Matlab files obsolete; however, I have not yet removed them from the repository in case someone wants to use them as a basis for further work in Matlab.

My next area of focus will be to clear up some really, really, bad code in the writer module itself...

wyldckat November 23, 2013 09:04

Greetings to all!

@Håkon: I picked up on this thread when you made the recent post above. This is a very nice function object and I've taken the liberty of adding a quick reference page for it at openfoamwiki.net: http://openfoamwiki.net/index.php/Contrib/IOH5Write - Feel free to update the wiki page!
It's accessible from here: http://openfoamwiki.net/index.php/Ma...nction_objects


And I have also been wondering how to add an optional input/output file format for field files in OpenFOAM, but I was thinking more along the lines of having an in-place replacement for OpenFOAM's IOstream-related classes, in fact by using SQLite.
But HDF5 makes a lot more sense! Although using HDF5 would require considerably more hacking than a mere replacement for IOstream... mmm... then again, maybe it wouldn't be all that hard.

Best regards,
Bruno

ganeshv December 1, 2013 16:41

Dude,

This is super awesome! I will try this out. I have a couple of questions though.. feel free to ignore them. What you've done is more than enough!!

1. Have you benchmarked/recorded the speed-up in write time, especially for large parallel cases? I'm running cases with 1760 and 4600 procs now... will be super happy if this speeds things up!
2. Does it make loading of large data sets any faster in ParaView? I have a case running with 60 million cells and another with 150 million. I'd be blessed if this works out to be faster!

Thanks again. Big fan!

haakon December 2, 2013 03:56

1: I have done some benchmarking, yes. My conclusions are:
  • With few processes (relative to the case size), there is no difference in performance.
  • With a large number of processes, my HDF5 writer is faster. But it is difficult to estimate how much faster, since that of course will depend on how many timesteps you want to write, the number of variables, the disk system, numerical precision etc.
  • I have done all my benchmarks on a parallel file system capable of doing MASSIVE parallel I/O, so my conclusions might not be valid on a small cluster with serial IO (on such a platform I suspect that HDF5 IO might be faster even for smaller numbers of processes, due to the savings in space = savings in the amount to be written).
  • The number of variables to write is also significant. "My method" gives the user the opportunity not to write variables that are of no interest. If you are only interested in, say, velocity and pressure, and not omega/k/epsilon etc., the gain might be larger. The comparisons and benchmarks are, however, based on a case where all variables present are written.
  • The main advantage is really the ability to store more data in a more optimal way. Compared to the uncompressed OpenFOAM ASCII format you can store ~5 times as much data in the same space on disk!
2: I think ParaView is dead slow anyway; that is more a ParaView memory-handling issue than anything else. To be honest, I haven't benchmarked this, but for everyday purposes I don't think there is that big a difference.

If you end up testing it, I would really appreciate some feedback! But please remember that there are some limitations... I mainly developed this as a way of storing large amounts of particle data (order of magnitude 200 GB) and have not cared too much about flow fields.

ganeshv December 2, 2013 12:13

Thanks. I think this is awesome and the way to go for future large parallel datasets.

As far as your comparison to uncompressed ASCII goes, I think it would be better to compare against the binary output in OpenFOAM. I think switching from uncompressed ASCII to compressed ASCII to binary itself results in savings like you mention. However, I think the IO would be greatly improved simply because of writing to one file using optimized HDF5 rather than multiple thousands of files... not to mention the ease of handling the files if you're transferring them to a different visualization cluster.

You mention in your README file that you haven't implemented writing out the boundary mesh and data simply because you are lazy. Could you tell me how to do that? I wouldn't mind implementing it.

btw.. I got your code to run on OpenFOAM-2.1.x and python 2.6. Required some changes. I think I'll fork your repo and upload it there.

ganeshv December 2, 2013 13:54

Never mind explaining the boundary data part. I just realized that as far as XDMF is concerned, there's no difference between a volume element and a face element.... it treats both as cells. You've already written the point data out. So I just need to add the boundary topology at the end with quad/tri elements, corresponding data and almost no modifications to the XDMF file. I think that will work.
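To sketch what that looks like on the XDMF side, here is a minimal Python snippet that builds a boundary-patch grid with quad cells, using only the standard library. The HDF5 dataset paths ('/boundary/faces', '/boundary/points') and the grid name are hypothetical placeholders; they would have to match whatever the writer actually stores.

```python
import xml.etree.ElementTree as ET

def boundary_grid_xdmf(h5name, n_points, n_quads):
    """Build a minimal XDMF tree exposing a boundary patch as quad cells.

    XDMF treats a surface element like any other cell: a quad boundary
    face is simply a cell with TopologyType="Quadrilateral". The dataset
    paths below are assumptions, not the real archive layout.
    """
    xdmf = ET.Element("Xdmf", Version="2.0")
    domain = ET.SubElement(xdmf, "Domain")
    grid = ET.SubElement(domain, "Grid", Name="boundaryPatch", GridType="Uniform")

    # Connectivity: n_quads rows of 4 point indices, stored in the HDF5 file.
    topo = ET.SubElement(grid, "Topology",
                         TopologyType="Quadrilateral",
                         NumberOfElements=str(n_quads))
    conn = ET.SubElement(topo, "DataItem",
                         Dimensions=f"{n_quads} 4",
                         NumberType="Int", Format="HDF")
    conn.text = f"{h5name}:/boundary/faces"

    # Point coordinates: n_points rows of (x, y, z).
    geom = ET.SubElement(grid, "Geometry", GeometryType="XYZ")
    pts = ET.SubElement(geom, "DataItem",
                        Dimensions=f"{n_points} 3",
                        NumberType="Float", Format="HDF")
    pts.text = f"{h5name}:/boundary/points"
    return xdmf
```

Appending a grid like this next to the volume grid in the existing XDMF file should be enough for ParaView to pick the patch up as a separate block.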

However I have a mesh that's rotating with no topology change but the geometry points are changing. So this currently requires a lot more work. I'll get to it some day!

haakon December 2, 2013 16:51

Quote:

Originally Posted by ganeshv (Post 464448)
Never mind explaining the boundary data part. I just realized that as far as XDMF is concerned, there's no difference between a volume element and a face element.... it treats both as cells. You've already written the point data out. So I just need to add the boundary topology at the end with quad/tri elements, corresponding data and almost no modifications to the XDMF file. I think that will work.

Yes, that is more or less correct. It should not be too difficult, I have just not found time to do it myself yet. Programming CFD codes is not among my core tasks, and since I have not yet needed the boundary fields, I have not written that code.

Quote:

Originally Posted by ganeshv (Post 464448)
However I have a mesh that's rotating with no topology change but the geometry points are changing. So this currently requires a lot more work. I'll get to it some day!

I certainly don't think that is too much work either. I have implemented some moving-mesh functionality in the code, but it writes both the points and cells if it detects a transient simulation. And the Python script that creates the XDMF files will need some updates too for this to work.

Anyways, see https://github.com/hakostra/IOH5Writ...h5Write.C#L174
BTW: I have actually NEVER tried to use this on a dynamic mesh...

And as you correctly state in your previous post, the point of this code is not to save space on the disk, it is to make one's life easier when working on large clusters and postprocessing these large datasets. As an example, I am working on a simulation with 50 million Lagrangian particles at the moment, and opening the HDF5 dataset in Python, calculating statistics and making plots and distributions based on these particle data is EASY. Parsing the OpenFOAM file format to do the same would have required a lot of coding just to read in particle locations and velocities.
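For a flavour of how little code that takes, here is a sketch of computing particle-speed statistics for one time level of a cloud with h5py and NumPy. The group layout implied by 'cloud_path' is an assumption (cloud names and dataset paths vary), so list the real structure with f.visit(print) first.

```python
import h5py
import numpy as np

def particle_speed_stats(h5file, cloud_path):
    """Mean and max particle speed plus a speed histogram for one cloud.

    'cloud_path' must point at an (nParticles, 3) velocity dataset inside
    the archive -- the path used in the test/usage is hypothetical.
    """
    with h5py.File(h5file, "r") as f:
        U = f[cloud_path][:]              # (nParticles, 3) velocities
    speed = np.linalg.norm(U, axis=1)     # per-particle speed
    counts, edges = np.histogram(speed, bins=50)
    return speed.mean(), speed.max(), counts, edges
```

From there it is a one-liner to hand `counts` and `edges` to matplotlib for a speed distribution plot.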

gigilentini8 January 23, 2014 07:50

1 Attachment(s)
Congrats!! Very interesting project!! I hope it can improve the parallel visualization of big simulations.
However, I am not able even to run the tutorial; please find attached the log file of the compilation.
I am using Ubuntu 12.04 + OF 2.2.2 + system HDF5 and system OMPI.
This is the error I am getting when running:
Code:

HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) MPI-process 3:
  #000: ../../../src/H5D.c line 141 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: ../../../src/H5Gloc.c line 241 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value


haakon January 23, 2014 07:55

I most certainly think that your HDF5 version is too old. I know that I am using some features of the HDF5 library that were introduced recently, but I do not know exactly where the version cut-off is wrt. compatibility. Perhaps you can try version 1.8.9 or newer?

gigilentini8 January 23, 2014 11:47

I tried to install the new version from source but it still gives the same error.
Did you check the warning that I get during compilation? It could be related to that.

Quote:

Originally Posted by haakon (Post 471389)
I most certainly think that your HDF5 version is too old. I know that I am using some features of the HDF5 library that were introduced recently, but I do not know exactly where the version cut-off is wrt. compatibility. Perhaps you can try version 1.8.9 or newer?


gigilentini8 January 24, 2014 09:55

Thanks Haakon
I solved it by switching to OF22x and using Gcc instead of Intel.
Looking forward to testing it in big test cases.

Quote:

Originally Posted by gigilentini8 (Post 471472)
I tried to install the new version from source but it still gives the same error.
Did you check the warning that I get during compilation? It could be related to that.


gigilentini8 January 27, 2014 04:19

Changing computer, same error:
Code:

HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 1:
  #000: hdf5-1.8.12/src/H5D.c line 141 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: hdf5-1.8.12/src/H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value

Tried both with Icc and Gcc, OF222 and OF22x.
It only worked on my laptop with gcc4.6; icc and gcc4.7 gave me this error.

any hint?

dhuckaby January 28, 2014 18:00

I think you can fix this error by commenting out line 222 in h5WriteCloud.C, which currently reads
" H5Sclose(fileSpace); "

Thanks Haakon for developing and releasing this tool.

haakon February 3, 2014 07:44

I am sorry for my late reply to the issues that have come up here. I want to comment on a few things:

1: Line 222 of h5WriteCloud.C is now removed. Thanks for the bug report!

2: I am doing all development on Gcc, so if anyone has problems with the Intel compilers, please let me know and I will check it out. I have access to Icc as well, but do not use it on a daily basis.

3: I think you will need HDF5 version 1.8.9 or above independent of this error/bug, but do not take that version for granted.

It now works for me with Gcc 4.8, Linux Mint 16 and OpenFOAM 2.2.x; please let me know if anyone else encounters any issues.

gigilentini8 February 3, 2014 10:10

Thanks Haakon, but it is still not working with OF 2.2.2 and Icc:
Code:

h5Write::fileCreate:
HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 0:
  #000: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5F.c line 1503 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5F.c line 1274 in H5F_open(): unable to open file: time = Mon Feb  3 17:05:34 2014
, name = 'h5Data/h5Data0.h5', tent_flags = 13
    major: File accessibilty
    minor: Unable to open file
  #002: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5FD.c line 987 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #003: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5FDmpio.c line 1057 in H5FD_mpio_open(): MPI_File_open failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
  #004: /home/icardim/SOFTWARE/hdf5-1.8.12/src/H5FDmpio.c line 1057 in H5FD_mpio_open(): MPI_ERR_OTHER: known error not in list
    major: Internal error (too specific to document in detail)
    minor: MPI Error String

Quote:

Originally Posted by haakon (Post 473106)
I am sorry for my late reply to the issues that have come up here. I want to comment on a few things:

1: Line 222 of h5WriteCloud.C is now removed. Thanks for the bug report!

2: I am doing all development on Gcc, so if anyone has problems with the Intel compilers, please let me know and I will check it out. I have access to Icc as well, but do not use it on a daily basis.

3: I think you will need HDF5 version 1.8.9 or above independent of this error/bug, but do not take that version for granted.

It now works for me with Gcc 4.8, Linux Mint 16 and OpenFOAM 2.2.x; please let me know if anyone else encounters any issues.
