HDF5 IO library for OpenFOAM
For some time I have been developing an HDF5 IO library for OpenFOAM. This library can write the results from a simulation into an HDF5 archive instead of the usual (horrible) file-based structure OpenFOAM uses by default. The major benefits show up when you increase the number of processes (say in the range 1000-10000) and want to write more than a few timesteps to disk. It is also highly useful if you are using the Lagrangian particle functionality of OpenFOAM, as this produces ~50 files per timestep per process. A nice addition is that the savings in disk space are significant, although this depends on what IO format you compare against (ASCII, binary, with or without compression).
When the simulation is finished, the HDF5 archive can be parsed and an XDMF metadata file written. This XDMF file can be opened in, for example, ParaView, VisIt or EnSight, and the visualization is performed as for any other OpenFOAM case. Another benefit is the ability to easily load the data into a tool like Matlab or Python to perform calculations or processing of the results. Personally I have used this to process data from fluid-particle simulations. The code is found in a GitHub repository: https://github.com/hakostra/IOH5Write together with some installation instructions and hints. I hope that this code can be useful for the OpenFOAM community, and in particular for those of you who have access to an HPC system. If any of you have suggestions for improvements, please feel free to use this thread to discuss them. |
Hi Håkan,
That is interesting, and I have myself been thinking about how to change the write format in OF. My motivation was long simulations, where I needed to output 2400 time folders for a lot of post-processing. The simulation was decomposed on 6 processors and each time folder contained 35-45 individual files, thus 0.5M-0.65M files in total. Essentially, this should not have been a problem, because most of the files are based on the faMesh, so they are pretty small (say 1KB), but if you are downloading these from a cluster with an overloaded/slow infrastructure, it will take ages with a lot of small files. I see that you are doing it through the functionObjects, so essentially OF keeps on doing its own outputting. My question is therefore whether you have considered making your code an additional option in the controlDict at the same level as ascii, binary, compressed, uncompressed? It would interfere somewhat more with the core of OF, but on the other hand the outputting would not be a dual process. Kind regards Niels |
Thank you for your interest in my work. If you ever try to compile it and try it out, I would appreciate your feedback and suggestions for improvement.
Your case is an example of why I developed this code. Even though I have never been in the situation where I need 2400 timesteps written to disk, I often want to decompose the case massively, running it on several hundreds or thousands of processes. As in your case, my simulations produce approx. 40-50 files per process per timestep, hence the total number of files would become 45*1000=45 000 per timestep for 1000 processors. This is very problematic, especially on a parallel file system designed to handle a few, very large files. As far as I know, there are no HPC file systems on the market that are designed to cope efficiently with this number of files of that size. Regarding the implementation, I think the current way is a good way, as it allows for (relatively) easy transitions between OpenFOAM versions. I do not need a single modification to the OpenFOAM core, and hopefully the amount of work needed when a new OF version is released is limited. For example, when going from 2.1.1 to 2.2.0 (or from 2.1.x to 2.2.x if you prefer that), I only needed to change one single line of code (if my memory is not playing with me). Another reason for doing it the current way is that there is no restart functionality in the HDF5 plugin, i.e. currently you cannot take a field from the HDF5 file and use it as an initial condition for the restart of a simulation. Therefore, I always specify a few writes in the "native" way (perhaps once every approx. 6-24 hours of walltime), so that I can always restart a simulation in case of a crash. |
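The file counts quoted in the two posts above follow from a simple product. A minimal sketch in Python, where the function name `native_file_count` is made up for illustration and the numbers are the ones given in the posts:

```python
def native_file_count(files_per_process, processes, timesteps=1):
    """Files written by a decomposed case: one set of files per process per write."""
    return files_per_process * processes * timesteps


# About 45 files per process on 1000 processes, for a single timestep.
print(native_file_count(45, 1000))      # 45000

# 35 to 45 files per time folder, 6 processors, 2400 time folders.
print(native_file_count(35, 6, 2400))   # 504000
print(native_file_count(45, 6, 2400))   # 648000
```

The count grows linearly in each factor, which is why both many processes and many written timesteps become a problem on file systems tuned for a few large files.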
In any case, thanks for sharing your code! Even "just" for postprocessing, it's quite a nice thing to have :) |
And in case there is a need for f.ex. restarting of simulations, I think it would be fairly easy to create a "HDF5ToFoam" converter, based on many of the "xxxxToFoam" converters already available. |
A short update: I have now made a simple Python program that uses h5py to read the metadata from the HDF5 files and write the corresponding XDMF files. Both field data and Lagrangian clouds are supported, and all attributes present will be included in the file. One XDMF file is generated for the field (mesh) data, and one for each cloud.
The program is called 'writeXDMF.py', and a help message is displayed if you run it with the '--help' argument. It requires Python 3. The program/script is installed to $FOAM_USER_APPBIN when you run the ./Allwmake script. 'writeXDMF.py' makes the attached Matlab files obsolete, however I have not yet removed them from the repository in case someone wants to use them as a basis for further work in Matlab. My next area of focus will be to clean up some really, really bad code in the writer module itself... |
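To illustrate what such an XDMF metadata file contains, here is a minimal sketch that builds one with Python's standard library. It is not the actual 'writeXDMF.py': the dataset paths (`/MESH/0/POINTS`, `/MESH/0/CELLS`, `/FIELDS/0/...`), the file name `case.h5` and the helper names are hypothetical, and the sizes are passed in by hand, whereas a real script would read them from the archive with h5py.

```python
import xml.etree.ElementTree as ET


def hdf_item(parent, h5file, path, dims, number_type="Float", precision="8"):
    """Add a DataItem that points at a dataset inside the HDF5 archive."""
    item = ET.SubElement(parent, "DataItem", Format="HDF",
                         NumberType=number_type, Precision=precision,
                         Dimensions=" ".join(str(d) for d in dims))
    item.text = f"{h5file}:{path}"
    return item


def build_xdmf(h5file, n_points, n_cells, topo_len, fields):
    """Return an XDMF tree for one timestep of an unstructured mesh.

    fields maps a field name to (number of components, HDF5 dataset path).
    All dataset paths used here are hypothetical placeholders.
    """
    root = ET.Element("Xdmf", Version="2.0")
    grid = ET.SubElement(ET.SubElement(root, "Domain"), "Grid",
                         Name="mesh", GridType="Uniform")
    topo = ET.SubElement(grid, "Topology", TopologyType="Mixed",
                         NumberOfElements=str(n_cells))
    hdf_item(topo, h5file, "/MESH/0/CELLS", (topo_len,), "Int", "4")
    geom = ET.SubElement(grid, "Geometry", GeometryType="XYZ")
    hdf_item(geom, h5file, "/MESH/0/POINTS", (n_points, 3))
    for name, (ncomp, path) in fields.items():
        kind = "Scalar" if ncomp == 1 else "Vector"
        attr = ET.SubElement(grid, "Attribute", Name=name,
                             AttributeType=kind, Center="Cell")
        dims = (n_cells,) if ncomp == 1 else (n_cells, ncomp)
        hdf_item(attr, h5file, path, dims)
    return root


# One hexahedral cell: 8 points, and 1 type ID + 8 point indices in the topology.
tree = build_xdmf("case.h5", n_points=8, n_cells=1, topo_len=9,
                  fields={"p": (1, "/FIELDS/0/p"), "U": (3, "/FIELDS/0/U")})
print(ET.tostring(tree, encoding="unicode"))
```

The XDMF file holds only "light" metadata (names, types, dimensions); the heavy arrays stay in the HDF5 archive and are referenced as `file.h5:/path/to/dataset`.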
Greetings to all!
@Håkon: I picked up on this thread when you made the recent post above. This is a very nice function object and I've taken the liberty of adding a quick reference page for it at openfoamwiki.net: http://openfoamwiki.net/index.php/Contrib/IOH5Write - Feel free to update the wiki page! It's accessible from here: http://openfoamwiki.net/index.php/Ma...nction_objects And I have also been wondering how to add an optional input/output file format for field files in OpenFOAM, but I was thinking more along the lines of an in-place replacement for OpenFOAM's IOstream related classes, in fact by using SQLite. But HDF5 makes a lot more sense! Although using HDF5 would require considerably more hacking than a mere replacement for IOstream... mmm... then again, maybe it wouldn't be all that hard. Best regards, Bruno |
Dude,
This is super awesome! I will try this out. I have a couple of questions though.. feel free to ignore them. What you've done is more than enough!!
1. Have you benchmarked/recorded the speed up in write time? Esp. for large parallel cases? I'm running a case with 1760 and 4600 procs now... will be super happy if this will speed things up!
2. Does it make loading of large data sets any faster in ParaView? I have a case running with 60 million cells and another with 150 million. I'd be blessed if this works out to be faster!
Thanks again. Big fan! |
1: I have done some benchmarking, yes. My conclusion is:
If you end up testing it, I would really appreciate some feedback! But please remember that there are some limitations... I mainly developed this as a way of storing large amounts of particle data (order of magnitude 200 GB) and have not cared too much about flow fields. |
Thanks. I think this is awesome and the way to go for future large parallel datasets.
As far as your comparison to uncompressed ASCII goes, I think it would be better to compare against the binary output in OpenFOAM. I think switching from uncompressed ASCII to compressed ASCII to binary by itself results in savings like the ones you mention. However, I think the IO would be greatly improved simply because of writing to one file using optimized HDF5 rather than multiple thousand files... not to mention the ease of handling the files if you're transferring them to a different visualization cluster. You mention in your README file that you haven't implemented writing out the boundary mesh and data simply because you are lazy. Could you tell me how to do that? I wouldn't mind implementing it. btw.. I got your code to run on OpenFOAM-2.1.x and Python 2.6. It required some changes. I think I'll fork your repo and upload it there. |
Never mind explaining the boundary data part. I just realized that as far as XDMF is concerned, there's no difference between a volume element and a face element.... it treats both as cells. You've already written the point data out. So I just need to add the boundary topology at the end with quad/tri elements, corresponding data and almost no modifications to the XDMF file. I think that will work.
However I have a mesh that's rotating with no topology change but the geometry points are changing. So this currently requires a lot more work. I'll get to it some day! |
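The idea of appending boundary faces as quad/tri elements can be sketched as follows. This is only an illustration, not code from the library: in an XDMF "Mixed" topology every element is prefixed by a type ID, and the IDs used here (4 for a triangle, 5 for a quadrilateral) are assumed from the XDMF documentation and should be checked against it before use. The function name `faces_to_mixed` is made up.

```python
# XDMF "Mixed" topology prefixes every element with a type ID.
# Assumed IDs (verify against the XDMF documentation): triangle 4, quadrilateral 5.
XDMF_TYPE_ID = {3: 4, 4: 5}


def faces_to_mixed(faces):
    """Flatten boundary faces (lists of point indices) into a Mixed topology array."""
    flat = []
    for face in faces:
        if len(face) not in XDMF_TYPE_ID:
            raise ValueError(f"unsupported face with {len(face)} points")
        flat.append(XDMF_TYPE_ID[len(face)])
        flat.extend(face)
    return flat


# Two quadrilateral faces and one triangular face, given as point indices.
boundary = [[0, 1, 5, 4], [1, 2, 6, 5], [2, 3, 7]]
print(faces_to_mixed(boundary))
# [5, 0, 1, 5, 4, 5, 1, 2, 6, 5, 4, 2, 3, 7]
```

Because the faces reuse the point indices of the volume mesh, the existing point data can be referenced unchanged; only this extra topology array and the matching face data would need to be written.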
Anyways, see https://github.com/hakostra/IOH5Writ...h5Write.C#L174 BTW: I have actually NEVER tried to use this on a dynamic mesh... And as you correctly state in your previous post, the point of this code is not to save space on the disk, it is to make one's life easier when working on large clusters and postprocessing these large datasets. As an example, I work on a simulation with 50 million Lagrangian particles at the moment, and opening the HDF5 dataset in Python, calculating statistics, making plots and distributions based on these particle data is EASY. Parsing the OpenFOAM file format to do the same would have required a lot of coding just to read in particle locations and velocities. |
congrats!! very interesting project!! I hope it could improve the parallel visualization of big simulations.
However I am not able even to run the tutorial, please find attached the log file of compilation I am using Ubuntu 12.04 + OF 2.2.2 + system HDF5 and system OMPI That's the error I am getting during running: Code:
HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) MPI-process 3: |
I most certainly think that your HDF5 version is too old. I know that I am using some features of the HDF5 library that were introduced recently, but I do not know exactly where the version cut-off is wrt. compatibility. Perhaps you can try version 1.8.9 or newer?
|
I tried to install the new version from source but it still gives the same error.
did you check the warning that I get during the compilation? it could be related to that Quote:
|
Thanks Haakon
I solved it by switching to OF22x and using Gcc instead of Intel. Looking forward to testing it in big test cases. Quote:
|
changing computer, same error:
Code:
HDF5-DIAG: Error detected in HDF5 (1.8.12) MPI-process 1:
It only worked on my laptop with gcc4.6; icc and gcc4.7 gave me this error. Any hint? |
I think you can fix this error by commenting out line 222 in h5WriteCloud.C, which currently reads
" H5Sclose(fileSpace); " Thanks Haakon for developing and releasing this tool. |
I am sorry for my late reply on the issues that have come up here. I want to comment on a few things:
1: Line 222 of h5WriteCloud.C is now removed. Thanks for the bug report!
2: I am doing all development on Gcc, so if anyone has any problems with the Intel compilers, please let me know, and I will check it out. I have access to Icc as well, but do not use it on a daily basis.
3: I think you will need an HDF5 version equal to or above 1.8.9 independent of this error/bug, but do not take that version for granted.
It now works for me, with Gcc 4.8, Linux Mint 16 and OpenFOAM 2.2.x. Please let me know if anyone else encounters any issues. |
Thanks Haakon but it is still not working with OF 2.2.2 and Icc
Code:
h5Write::fileCreate: Quote:
|
I will try HDF5 1.8.12 myself tomorrow or Wednesday (I am very busy tomorrow) and see what I get. I think I am using HDF5 1.8.11 myself right now. Have you checked that you have write permissions in the folder?
|
thanks Haakon
sure, with OF22x and Gcc it works |
Thanks Håkon for the HDF5 Plugin.
I am currently porting it to the foam-extend-3.0 version. I have modified all the environment variables different from OF-2.2.x and foam-extend-3.0. I am currently getting the following error in the core hd5Write.H, could you please comment on this error. Cheers, Krishna |
Have there been further developments since the last post (almost 1 year ago)? This is a pace-setting issue for true HPC (1000+ core) simulations. My HPC admins are demanding that we come up with a solution, and I wanted to check on status before digging in.
|
The latest update is on GitHub (https://github.com/hakostra/IOH5Write). It does work on OF 2.3.x as far as I know. As with any software, there are some installation problems if your environment is not configured properly. Use a recent HDF5 version. Try to read the installation guidelines.
The "known bugs and limitations" section in the README file (see GitHub) does still apply. I should also probably emphasize that due to missing support in the XDMF format, polyhedral cells are not supported. I would love to see polyhedral cell support in XDMF, but I doubt that's going to happen any time soon. One possible workaround for this issue is perhaps to rewrite the whole thing to use CGNS, which has support for polyhedral cells. However, it does not have support for discrete particles, and that was a showstopper for me when I wrote this library, hence the current HDF5+XDMF approach. |
Writing Boundary Data to HDF5?
Hi Haakon,
How do we implement writing of the boundary data? Cheers! |
It's a great tool you've written! I've been writing a CGNS exporter myself for a while. I did this since I need polyhedral support. Now ParaView's CGNS reader doesn't support polyhedrals, so I had to write my own reader. I got it to work rather well, but when I started to implement a parallel version I got into all sorts of trouble with CGNS. Now since CGNS uses HDF5 in the background, I've come to the conclusion that it might be better to just stick to HDF5. Since I need polyhedrals I think I need to develop my own HDF5 writer. My idea is basically to write the mesh fields just as they are and keep a similar structure to the one OpenFOAM already has. Then I would have to develop a reader for ParaView which would read the HDF5 file and use vtkPolyhedron. Do you see any problems with this approach, or can I leverage some of the work you have done? Best Regards Nicolas |
I think there is another way. :)
You could extend the XDMF format to handle polyhedral cells. :) There is already a bug report concerning this topic and I discussed this recently with some XDMF developers, see here: http://www.kitware.com/pipermail/xdm...er/000895.html I would start by modifying XdmfTopologyType.cpp, i.e. add polyhedral cell support similar to the POLYGON implementation. Extending the VTK/ParaView readers to handle them shouldn't be too difficult either, since polyhedral cells are already supported in VTK. You can get the latest XDMF source like this: git clone git://public.kitware.com/Xdmf2.git However, I would probably recommend that you implement this directly in VTK, which contains a more or less full copy of the XDMF source tree. You get the latest VTK source from here: https://gitlab.kitware.com/vtk/vtk And this is the file where I would start to implement the polyhedral cell support: https://gitlab.kitware.com/vtk/vtk/b...gyType.cpp#L70 Hope this helps! -Armin |
Thanks for the insight. However I don't really see the benefit of going via XDMF. It could be that I'm just not enlightened. Since we already have to write a reader, why not skip XDMF? That was my main lesson learned when I implemented the CGNS reader/writer. It took quite some time to go through the docs. And in the end I discovered that the standard is so free that it is near impossible to write a general reader. So I decided that my reader would have to be specific to my foamwriter. And then I had problems compiling CGNS for parallel applications. It was too obvious that I should ditch the CGNS standard and just go with HDF5, which is what CGNS uses in the background anyway. Now as I see it, the benefit of not using XDMF is that we could make a reader that can read while the simulation is running. And we can make it smart about how to read the results in parallel, since it would keep the decomposition. And there would be no need to have an intermediate step between simulation and postprocessing. Now I've only started to work on it, will see how much I get done. It's a spare time project so it will take a while. If you or someone else is working on something similar or wants to pitch in, I'm happy to discuss the details! Nicolas |
Thanks for elaborating your view.
I certainly agree that the CGNS standard is "too free" and writing a general reader is a rather big task. However, the XDMF standard is far more specific, and readers already exist for VTK/ParaView and VisIt. So, in my opinion, the XDMF standard is very well suited for CFD applications. I have a few comments below to further the discussion. Quote:
Concerning the VTK/ParaView XDMF reader: I think they are already quite smart concerning the parallel data handling, but I have no general overview of that. Quote:
It's not (yet) publicly available, but I'm happy to share it if you are interested. (I actually should just put it in some public git repo.) -Armin |
If it includes patches I would definitely be interested! Even if I don't choose to go with XDMF it would be interesting to see how you solved some of the issues I currently have. Right now if I use my function object from a custom app "foamToHDF5" then everything is fine. I can write in parallel and see no issues. However if I include my function object in say icoFoam, I get a segmentation fault when icoFoam does "finalize". The writing goes fine. It's just a bit unclean to exit on a segmentation fault. The only difference I can see is that my app includes hdf5.h and dynamically links libhdf5.so at compile time, while icoFoam would do it at runtime. Is it something you've seen? Another point I have is that organizing the data very close to how foam already does it simplifies converting back. We would have faces, owners, neighbours, points exactly as they are now. Is that possible with XDMF? The link you posted to the discussion of polyhedral cells for XDMF is private so I can't see it. I can offer one point of experience from CGNS. At least in CGNS you can't mix polyhedral cells with structured types. For structured types (tetras, hexas and so on), all you need is a list of points for each cell (it's actually slightly more complicated but not much). But for a polyhedral you need two lists: one defining all the faces in the mesh and one defining the cells. (Although I've come to realize that defining the faces in a separate list is redundant.) In any case that makes merging a polyhedral reader with the current CGNS reader difficult, since the structure is totally different. I suspect we might run into similar issues if we implement polyhedrals in XDMF. But I have no experience of XDMF so I don't really know. Best Regards Nicolas |
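The difference between the two layouts described above can be made concrete with a small sketch. It assumes an OpenFOAM-style face-addressed mesh (a face list plus owner and neighbour lists, internal faces first) and shows how the per-cell lists can be derived from it. The function name `cells_from_faces` and the toy mesh of two tetrahedra sharing one triangle are made up for illustration, and face orientation is ignored.

```python
def cells_from_faces(faces, owner, neighbour, n_cells):
    """Return, per cell, the list of face indices and the sorted point indices."""
    cell_faces = [[] for _ in range(n_cells)]
    for face, cell in enumerate(owner):        # every face has an owner cell
        cell_faces[cell].append(face)
    for face, cell in enumerate(neighbour):    # only internal faces have a neighbour
        cell_faces[cell].append(face)
    cell_points = [sorted({p for f in fs for p in faces[f]}) for fs in cell_faces]
    return cell_faces, cell_points


# Points 0, 1, 2 form the shared triangle; points 3 and 4 are the two apexes.
faces = [(0, 1, 2),                        # internal face, listed first
         (0, 1, 3), (1, 2, 3), (2, 0, 3),  # boundary faces of cell 0
         (0, 1, 4), (1, 2, 4), (2, 0, 4)]  # boundary faces of cell 1
owner = [0, 0, 0, 0, 1, 1, 1]
neighbour = [1]

cell_faces, cell_points = cells_from_faces(faces, owner, neighbour, n_cells=2)
print(cell_faces)    # [[0, 1, 2, 3], [4, 5, 6, 0]]
print(cell_points)   # [[0, 1, 2, 3], [0, 1, 2, 4]]
```

For a fixed cell shape the point list alone is enough, because the shape implies the faces. For a general polyhedron the point list loses information, which is why a polyhedral format needs the face list (or an equivalent) in addition to the cells.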
Just a few quick comments now, I will get back to you concerning the XDMF surface writer for OpenFOAM.
Quote:
http://www.kitware.com/pipermail/xdm...er/000895.html Quote:
http://www.xdmf.org/index.php/XDMF_Model_and_Format Cheers, Armin |
Has anyone been able to update this library to the new function object structure of openfoam>3.0? I would like to contribute to the implementation of this function object in OpenFOAM v16, though it would also be nice to know if someone has already worked on this.
When using OpenFOAM for big unsteady computations, the number of files written by OpenFOAM makes it impossible to use clusters with file-count limits, so it seems really interesting to use HDF5 to write the solution data |
Quick note to ssss: In OpenFOAM+, they have been implementing the interface with ADIOS, for optimized data storage: http://www.openfoam.com/version-v161...ty-parallel-io
Nonetheless, support for HDF5 will likely be welcome too, given that having other options for IO is always good to have :) |
Thank you for your answer :) I have already tried to use ADIOS with OpenFOAMv16.12+ and we saw important reductions in the time used by the solver to write the fields. However, ADIOS does not change the file structure or the file count of OpenFOAM's solution files. When dealing with thousands of cores and unsteady simulations, the file count can grow to millions of files pretty quickly; that's why HDF5 could be a good solution to the problem, although the current HDF5 function object is not capable of reading HDF5 files into OpenFOAM. I will definitely try to implement this function object in OpenFOAM>3.0 when I have time |
Hi ssss,
Another quick note for you, in case you haven't seen it yet, a new feature was introduced into OpenFOAM's Foundation development repository yesterday: https://openfoam.org/news/parallel-io/ I haven't tested it yet, but the information provided there seems to indicate that this new feature will store each field into a single file for all time steps. Best regards, Bruno |
Hello everybody,
I want to spend some time until Christmas to update the IOH5Write library from Hakon so that it is compatible with new openfoam versions (e.g. OF6,7 or OFV1806/1812/1904) and also writes boundary data. I think it is a useful library and it is worth updating it and sharing it with the community so that it can also be improved by the community. I would not consider myself a good programmer or a programmer at all, and that's why I want to ask the experts if somebody can give me some hints on how best to start. Is somebody maybe also interested in this and wants to collaborate? Has somebody already extended the code to write out boundary data? What I have realized up to now: regarding OF6, the code in outputFilterFunctionObject.H must be replaced and updated to the new OF structure. I still have not got an overview of where to attack =) I appreciate any comments/help ... =) |
Hello dear openfoam community,
I managed to extend Hakon's code so that it is now possible to write out patch/boundary field data as well. It is only a prototype and therefore has some limitations. Although it should be quite easy to get rid of the limitations, I first want to update the code in such a way that it is usable with newer openfoam versions ... let's say from openfoam6 or v1806 on (the current state is that it works for openfoam versions 2.2.x and 2.3.x, only tested with 2.3.1). But right now I have no clue how to do that and I would really appreciate any help regarding that issue!!! I think the main part of the work is done regarding the patch/boundary writing, and it would be awesome if the community could help to get the code working in the latest openfoam versions so that everybody can benefit from it. If the code works in the latest OF versions, the next step would be to optimize it so that the h5 writing is more efficient. One main advantage over collated writing (fileHandler collated) is that we do not have to reconstruct the fields to be able to use the data in paraview (decomposed view in paraview is not working with the collated fileHandler). For me, the reconstructing takes a long time because I need a lot of time steps in my simulations. Attached you'll find the code "IOH5Write_b". The limitations are the same as for Hakon's code (only works in parallel, tensor fields not implemented, no clean simulation ending) but with some additional limitations for patch/boundary writing (works only for quadrilateral faces, does not work for boundaries with boundary condition "empty"). There are two tutorials included (h5cavity_b, h5pitzDaily_b) that show how to write patch/boundary data. After the simulation, you can use the python code "writeXDMF_b.py" to generate xdmf files for using internal field and patch/boundary field data in paraview. |
Dear openfoam community,
I extended my previous code that was based on Hakon's work. It is now possible to use newer versions of openfoam. As in the previous version, it is possible to write patch or boundary data to the hdf5 archive (in addition to the internal field data). I adapted the functionObject for openfoam6.0. The code might also work with other versions but it was only tested with openfoam6. Attached you'll find the code "IOH5Write_b_OF6". Again, the limitations are the same as for Hakon's code (only works in parallel, tensor fields not implemented yet, no clean simulation ending) but with some additional limitations for patch/boundary writing (works only for quadrilateral faces, does not work for boundaries with boundary condition "empty"). Again, there are two OF6 tutorials included (cavity_IOh5Write_b_OFv6, pitzDaily_IOh5Write_b_OFv6) that show how to use the functionObject and how to write patch/boundary data. After the simulation, you can use the python code "writeXDMF_b_OF6.py" to generate xdmf files for using internal field and patch/boundary field data in paraview. I excluded "h5WriteCloud.C" in Make/files because I get an error (see below) when compiling the code. I am not familiar with particle clouds, so maybe the community can help to fix it and knows what Uc is and how to replace Uc so that the particle part of the code works in openfoam6. Code:
h5Write_b_OFv6/h5WriteCloud.C:380:68: error: ‘class Foam::KinematicParcel<Foam::particle>’ has no member named ‘Uc’; did you mean ‘U’? |
Dear OpenFOAM community,
you can find the latest code "IOH5Write_b_OF6" (works with openfoam6) in the github branch: https://github.com/ipat-fau/IOH5Writ...H5Write_b_OFv6 The previous code "IOH5Write_b" (works with openfoam-2.3.1) can be found in the github branch: https://github.com/ipat-fau/IOH5Write/tree/IOH5Write_b I think Hakon's "IOH5Write" is a useful library and it is worth updating it and sharing it with the community so that it can also be improved by the community. I hope it is useful and maybe helps somebody. Cheers |