www.cfd-online.com

[OpenFOAM.org] Utilities work with 32-bit labels but not with 64-bit labels


#1 me3840, October 16, 2018, 17:52
I'm working with some larger meshes and need 64-bit integer labels. I have an existing installation built with a label size of 32. When I set it to 64 and recompile, everything seems to compile fine.


However, none of the mesh utilities seem to work. For example, with the 32-bit labels, mergeMeshes runs, but with 64-bit labels I just get "command not found". Can anyone help me figure out why this happens?

#2 wyldckat (Bruno Santos), October 20, 2018, 15:16
Quick questions:
  1. How are you switching between 32- and 64-bit labels?
  2. What exact command did you use to build with 64-bit labels?
  3. With the 64-bit environment active, run Allwmake once again: it will either complain that something went wrong during the build or tell you that everything is up to date. Either way, sending the output from Allwmake to a log file usually makes it easier to spot errors, e.g.:
    Code:
    ./Allwmake > log.make 2>&1

#3 me3840, October 20, 2018, 22:45
1. I just changed $WM_LABEL_SIZE in the /etc/bashrc file to 64.
2. Nothing special, just ./Allwmake -j 8
3. I will have to do this tomorrow and look at the logs, thanks for your help.
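A quick way to confirm which build a shell is actually using. As far as I know, in OpenFOAM.org releases of this era the label size becomes part of WM_OPTIONS (e.g. linux64GccDPInt32Opt vs linux64GccDPInt64Opt), and the utilities are installed under platforms/$WM_OPTIONS/bin, exported as FOAM_APPBIN. If so, the two label sizes have separate bin directories, and an incomplete 64-bit application build would leave mergeMeshes missing from the Int64 one, which would match the "command not found" symptom. Editing etc/bashrc only takes effect once it is sourced again, e.g. in a fresh shell. The helper name label_size_of is hypothetical; check the variable names against your own etc/bashrc.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the label size encoded in a WM_OPTIONS string
# such as linux64GccDPInt64Opt (assumes the ...Int32... / ...Int64... naming).
label_size_of() {
  case "$1" in
    *Int64*) echo 64 ;;
    *Int32*) echo 32 ;;
    *)       echo unknown ;;
  esac
}

# With an OpenFOAM environment sourced, show which build this shell uses and
# whether mergeMeshes is on the PATH. Skipped when WM_OPTIONS is unset.
if [ -n "${WM_OPTIONS:-}" ]; then
  echo "WM_OPTIONS=$WM_OPTIONS (label size: $(label_size_of "$WM_OPTIONS"))"
  echo "FOAM_APPBIN=${FOAM_APPBIN:-unset}"
  # command -v prints the full path when found, nothing otherwise
  command -v mergeMeshes || echo "mergeMeshes not found on PATH"
fi
```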

#4 wyldckat (Bruno Santos), October 24, 2018, 17:19
Quote:
Originally Posted by me3840
1. I just changed $WM_LABEL_SIZE in the /etc/bashrc file to 64.
2. Nothing special, just ./Allwmake -j 8
3. I will have to do this tomorrow and look at the logs, thanks for your help.
Quick answer: OK, the logs should tell us what's going wrong; right now there is no clear indication of the problem.

Mmm... I forgot to ask: which exact version/variant/fork of OpenFOAM are you using?

#5 me3840, October 31, 2018, 10:01
This is vanilla OpenFOAM 5 with just one or two minor changes.

Both builds now run the same applications (or at least recognize that they exist). I'm not quite sure what made the difference: after going through the install guide several times and reconciling package differences with the cluster, something changed and it worked. I know that's not helpful.

Perhaps this sheds some new light, though. Now I have a new problem: the 64-bit build gives a "failure reading binary block" error when opening a case that the 32-bit build has no problem with. The 32-bit build does crash after reading the mesh for some time, with a std::bad_array_new_length error, because the mesh is too large to open in serial; but it can at least read the mesh, which the 64-bit build apparently cannot.

Am I correct in thinking that if I run Allwmake twice and the second run gives no errors, the build was successful? There are a few errors like:
cp: cannot stat '../bin/d[agm]*': no such file or directory
in the stderr for scotch.

I am just doing a clean build and storing the stderr and stdout separately so I can review them more carefully this time.

#6 me3840, October 31, 2018, 22:58
Since the 32-bit build can read the binary mesh and the 64-bit one cannot, could it simply be that the mesh was written with 32-bit integers, which the 64-bit build doesn't know how to read? So in theory, if I used the 32-bit build to convert to ASCII and then the 64-bit build to convert back to binary, everything would be OK?

#7 wyldckat (Bruno Santos), November 1, 2018, 11:26
Quick answers:
Quote:
Originally Posted by me3840
Perhaps this sheds some new light, though. Now I have a new problem: the 64-bit build gives a "failure reading binary block" error when opening a case that the 32-bit build has no problem with. The 32-bit build does crash after reading the mesh for some time, with a std::bad_array_new_length error, because the mesh is too large to open in serial; but it can at least read the mesh, which the 64-bit build apparently cannot.
From your description, it does seem that the case was originally meshed with 64-bit labels, which would explain why the 32-bit build crashes when trying to reconstruct it. More details after the last quote.

Quote:
Originally Posted by me3840
Am I correct in thinking that if I run Allwmake twice and the second run gives no errors, the build was successful? There are a few errors like:
cp: cannot stat '../bin/d[agm]*': no such file or directory
in the stderr for scotch.
That's effectively a warning and not an error: it (almost?) always happens, and the Scotch libraries we need are still built without problems.

Quote:
Originally Posted by me3840
I am just doing a clean build and storing the stderr and stdout separately so I can review them more carefully this time.
The downside is that you lose the context of when each error happens. I usually search the combined log for the word "Error" and find the errors fairly easily.
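A minimal sketch of that search, assuming a log written with both streams interleaved, as in ./Allwmake > log.make 2>&1. The function names are hypothetical wrappers around grep; the case-insensitive match catches both compiler messages ("error:") and make's "Error 1" lines, and -C 2 keeps a little surrounding context.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scanning a build log in which stdout and stderr
# were captured together, e.g. ./Allwmake > log.make 2>&1

# Print lines mentioning "error" (any case) with line numbers and two lines
# of context on each side, so the surrounding wmake output stays visible.
show_build_errors() {
  grep -n -i -C 2 'error' "$1"
}

# Print only the number of matching lines (0 if none match).
count_build_errors() {
  grep -c -i 'error' "$1"
}

# Usage:
#   ./Allwmake > log.make 2>&1
#   count_build_errors log.make
#   show_build_errors log.make | less
```

The Scotch "cp: cannot stat" message does not contain the word "error", so it does not show up in this search.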

Quote:
Originally Posted by me3840
Since the 32-bit build can read the binary mesh and the 64-bit one cannot, could it simply be that the mesh was written with 32-bit integers, which the 64-bit build doesn't know how to read? So in theory, if I used the 32-bit build to convert to ASCII and then the 64-bit build to convert back to binary, everything would be OK?
Yes, in principle you can convert from binary to ASCII with the build that wrote the files (same label size) and then read the ASCII files with the other build.
However, there are a few limitations:
  1. If there are more than 2^31 - 1 (2147483647) elements of anything (e.g. points, faces, cells), the 32-bit build will not be able to process the data, because the count goes beyond the range of a signed 32-bit integer (one of the 32 bits is reserved for the sign).
  2. Reading in binary means that the arrays in memory have exactly the same size as the ones in the files. If the label sizes don't match, there will be problems: either a crash because the wrong number of elements is read in the first place, or the reader only gets half of the data it expects (a 64-bit build reading 32-bit integers).
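To illustrate point 2: the snippet below writes four 32-bit integers as raw bytes and reads the same bytes back as 32-bit and as 64-bit integers, mimicking a 64-bit build reading a 32-bit label list. It assumes a little-endian host (e.g. x86_64), since od decodes in host byte order. As far as I understand OpenFOAM's binary format, a list is stored as an element count followed by the raw bytes, so a 64-bit reader would expect twice as many bytes as a 32-bit writer produced; that would be consistent with the "failure reading binary block" error, though this thread does not confirm it.

```shell
#!/usr/bin/env bash
# Write the 32-bit little-endian integers 1, 2, 3, 4 as 16 raw bytes (octal
# escapes), then read the same bytes as 32-bit and as 64-bit signed integers.
# Assumes a little-endian host (e.g. x86_64): od decodes in host byte order.
f=$(mktemp)
printf '\001\000\000\000\002\000\000\000\003\000\000\000\004\000\000\000' > "$f"

# -An drops the offset column; tr/sed collapse od's padding to single spaces.
as_int32=$(od -An -t d4 "$f" | tr -s ' ' | sed 's/^ //; s/ $//')
as_int64=$(od -An -t d8 "$f" | tr -s ' ' | sed 's/^ //; s/ $//')

echo "read as 32-bit labels: $as_int32"   # 1 2 3 4
echo "read as 64-bit labels: $as_int64"   # 8589934593 17179869187
rm -f "$f"
```

Each 64-bit value is two neighbouring 32-bit labels fused together (e.g. 2 x 2^32 + 1), and only half as many values come out.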
If I remember correctly, you can use foamFormatConvert to convert between formats (ascii or binary):
  1. Set the "writeFormat" entry in "system/controlDict" to the target format you want.
  2. Run the application foamFormatConvert.
  3. Use the argument "-help" for more details, e.g.:
    Code:
    foamFormatConvert -help
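The format switch can be scripted. This sketch assumes GNU sed and a controlDict in which writeFormat sits on its own line; set_write_format is a hypothetical helper, not an OpenFOAM tool. The commented lines outline the round trip discussed above: ASCII conversion with the build that wrote the files, then binary conversion with the other build. For a decomposed case, foamFormatConvert would presumably be run under mpirun with -parallel, as other OpenFOAM utilities are.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an OpenFOAM tool): set the writeFormat entry of a
# controlDict to ascii or binary. Assumes GNU sed (-i, -E) and that the entry
# sits on its own line, e.g.  writeFormat     binary;
set_write_format() {
  local dict="$1" fmt="$2"
  case "$fmt" in
    ascii|binary) ;;
    *) echo "writeFormat must be ascii or binary, got: $fmt" >&2; return 1 ;;
  esac
  sed -i -E "s/^([[:space:]]*writeFormat[[:space:]]+)[A-Za-z]+;/\1${fmt};/" "$dict"
}

# Outline of the round trip, run inside the case directory:
#   # shell with the 32-bit environment sourced (the build that wrote the files)
#   set_write_format system/controlDict ascii
#   foamFormatConvert
#   # fresh shell with the 64-bit environment sourced
#   set_write_format system/controlDict binary
#   foamFormatConvert
```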

#8 me3840, November 1, 2018, 13:22
Thanks Bruno. Yes, foamFormatConvert is what I was using (though for some reason it doesn't work on the faces file; I had to hack it to get it working, see the thread Trouble using foamFormatConvert).


Minor correction: the mesh was generated with 32-bit labels by a third-party mesher, and it reads into OpenFOAM with 32-bit labels, but only in parallel. Any serial operation (besides decomposition) crashes with the error std::bad_array_new_length.


In my potentially naive interpretation, some data structure built when the grid is loaded (but not the grid itself) becomes too large for a 32-bit integer. When loaded in parallel, this structure is split among different processes, so it never hits the limit. In serial, however, the whole structure lives in one process's memory, which causes the crash. That got me thinking I should try the 64-bit build; I've encountered this sort of problem with other codes on very large meshes.

I will try the ASCII conversion shortly, but from what you say, reading one build's binary files directly with the other build will never work, and that makes sense to me. I would have preferred the latter, but it is what it is. I suppose I could just make everything I do 64-bit, though I wonder what the performance impact will be.

I don't understand which data structure causes the serial session to crash but not the parallel one. In this case I was trying to use mergeMeshes, but simply running checkMesh or anything of that nature makes a serial process crash. Interestingly enough, there aren't any problems with decomposePar (and I think reconstructPar works too), and I'm guessing there will be no problem with foamFormatConvert...

#9 wyldckat (Bruno Santos), November 1, 2018, 16:30
Quote:
Originally Posted by me3840
Thanks Bruno. Yes, foamFormatConvert is what I was using (though for some reason it doesn't work on the faces file; I had to hack it to get it working, see the thread Trouble using foamFormatConvert).
Interesting... Which specific version/fork of OpenFOAM is this happening with?

Quote:
Originally Posted by me3840
Minor correction: the mesh was generated with 32-bit labels by a third-party mesher, and it reads into OpenFOAM with 32-bit labels, but only in parallel. Any serial operation (besides decomposition) crashes with the error std::bad_array_new_length.


In my potentially naive interpretation, some data structure built when the grid is loaded (but not the grid itself) becomes too large for a 32-bit integer. When loaded in parallel, this structure is split among different processes, so it never hits the limit. In serial, however, the whole structure lives in one process's memory, which causes the crash. That got me thinking I should try the 64-bit build; I've encountered this sort of problem with other codes on very large meshes.
OK, that makes sense, and it sounds like you're on the right track, although I'm a bit confused about how many elements the mesh really has. Do you have a count of its points, faces and cells?

Quote:
Originally Posted by me3840
I will try the ASCII conversion shortly, but from what you say, reading one build's binary files directly with the other build will never work, and that makes sense to me.
If you're able to convert from binary to ASCII in parallel, then it should work OK. Don't forget to increase "writePrecision" in "controlDict" to something higher than 14 (17 significant digits are enough to reproduce any double-precision value exactly), so that you don't lose floating-point precision between formats.
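Why the precision matters: the sketch below writes the double 0.1 + 0.2 with a given number of significant digits and checks whether parsing the text back gives the identical value. It uses awk, whose numbers are IEEE-754 doubles, and assumes writePrecision acts like a significant-digit count in general number format; roundtrips is a hypothetical helper name. 17 significant digits are always enough to round-trip a double, while 16 may not be, as this value shows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does a double survive a write/read cycle at a given
# number of significant digits? awk stores numbers as IEEE-754 doubles, so
# this mimics writing a field value as ascii text and parsing it back.
roundtrips() {
  awk -v d="$1" 'BEGIN {
    x = 0.1 + 0.2                   # not exactly 0.3 in binary floating point
    s = sprintf("%." d "g", x)      # the text that would land in the file
    print ((s + 0 == x) ? "yes" : "no")
  }'
}

for digits in 14 16 17; do
  echo "writePrecision $digits: exact round trip? $(roundtrips "$digits")"
done
```

With this value, 14 and 16 digits both print "0.3" and lose the last bit; only 17 digits reproduce it exactly.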

Quote:
Originally Posted by me3840
I would have preferred the latter, but it is what it is. I suppose I could just make everything I do 64-bit, though I wonder what the performance impact will be.
It costs more RAM, but it shouldn't affect CPU performance too much, assuming the CPU handles 64-bit integers in registers at the same cost as 32-bit integers... although the larger integer lists may take up more of the L1-L3 caches in the CPU...
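A back-of-the-envelope sketch of the RAM cost: a label list holding one entry per face (an owner list, for example) doubles in size when labels go from 32 to 64 bits, while scalar fields, which are doubles either way, are unaffected. The 600 million faces used here match the mesh size reported later in this thread; label_list_mb is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: memory (decimal MB) for a list holding one label per
# entry, for a given label size in bits. bash arithmetic is 64-bit, so the
# intermediate byte counts do not overflow here.
label_list_mb() {
  local count="$1" bits="$2"
  echo $(( count * (bits / 8) / 1000000 ))
}

faces=600000000   # one owner label per face
echo "32-bit labels: $(label_list_mb "$faces" 32) MB"   # 2400 MB
echo "64-bit labels: $(label_list_mb "$faces" 64) MB"   # 4800 MB
```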

Quote:
Originally Posted by me3840
I don't understand which data structure causes the serial session to crash but not the parallel one. In this case I was trying to use mergeMeshes, but simply running checkMesh or anything of that nature makes a serial process crash. Interestingly enough, there aren't any problems with decomposePar (and I think reconstructPar works too), and I'm guessing there will be no problem with foamFormatConvert...
My guess is that there is a collision between a signed and an unsigned 32-bit integer:
  • A signed 32-bit integer holds values from -2^31 (-2147483648) to 2^31 - 1 (2147483647); the upper bound ends in 7 rather than 8 because zero takes up one of the non-negative values.
  • An unsigned 32-bit integer goes from 0 to 2^32 - 1 (4294967295).
So if your exported mesh has more than 2^31 - 1 elements, you will have to switch to 64-bit integers for proper processing, or always run things in parallel.
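To make the signed/unsigned point concrete, this sketch simulates storing values in a signed 32-bit integer with two's-complement wrap-around (bash arithmetic is 64-bit, so the wrap is done explicitly; to_int32 is a hypothetical helper). A count just past 2147483647 turns into a large negative number, and in C++ a negative length passed to new[] makes it throw std::bad_array_new_length. That is one plausible route to the error seen here, not something this thread establishes.

```shell
#!/usr/bin/env bash
# Simulate storing a value in a signed 32-bit integer (two's complement).
# bash arithmetic is 64-bit, so the wrap-around is computed explicitly.
INT32_MIN=$(( -(1 << 31) ))       # -2147483648
INT32_MAX=$(( (1 << 31) - 1 ))    #  2147483647
UINT32_MAX=$(( (1 << 32) - 1 ))   #  4294967295

# Hypothetical helper: keep the low 32 bits, then treat the top bit as the sign.
to_int32() {
  local v=$(( $1 & 0xFFFFFFFF ))
  if (( v > INT32_MAX )); then
    v=$(( v - (1 << 32) ))
  fi
  echo "$v"
}

echo "signed range: $INT32_MIN .. $INT32_MAX, unsigned max: $UINT32_MAX"
to_int32 "$INT32_MAX"               # 2147483647
to_int32 $(( INT32_MAX + 1 ))       # -2147483648
to_int32 "$UINT32_MAX"              # -1
```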

#10 me3840, November 1, 2018, 16:57
This is just on OpenFOAM 5.0. I made a few modifications to a source term, but nothing more, so all the utilities (besides foamFormatConvert, as stated before) should be plain vanilla.

From running checkMesh, it appears the numbers are around:
points: 200M
faces: 600M
cells: 200M

However, I expect to run cases much larger than that. These counts are under the 32-bit limit, which, combined with the fact that everything is fine when I only read the grid (decomposePar), makes me suspect that the problem only occurs when some utilities (e.g. mergeMeshes) create a temporary structure that is far too big.
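A quick check of the reported counts against the largest signed 32-bit label. The individual counts fit comfortably, but a derived total can exceed the limit; the face-point figure below is a hypothetical example (every face a quad, stored in one flat list), not a structure that mergeMeshes is known to build. check_label_fit is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Compare mesh sizes with the largest signed 32-bit label.
INT32_MAX=$(( (1 << 31) - 1 ))   # 2147483647

# Hypothetical helper: report whether a count fits, and how much of the
# 32-bit range it uses (integer percentage).
check_label_fit() {
  local name="$1" count="$2"
  if (( count > INT32_MAX )); then
    echo "$name=$count exceeds 32-bit labels"
  else
    echo "$name=$count fits ($(( count * 100 / INT32_MAX ))% of INT32_MAX)"
  fi
}

check_label_fit points 200000000
check_label_fit faces 600000000
check_label_fit cells 200000000
# Hypothetical derived total: if every face were a quad, one flat list of
# all face-point labels would hold 4 x 600M entries.
check_label_fit face_point_entries $(( 4 * 600000000 ))
```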

#11 wyldckat (Bruno Santos), November 3, 2018, 09:56
It may be that there is some critical flaw in the mesh exported by the third-party mesher, which is only exposed by some applications. For example, an incomplete cell definition or an excess number of points or faces...

The other possibility is that since you're merging two meshes, one of them has a critical flaw that does not allow the mergeMeshes utility to properly walk along all faces on each mesh. For example, for whatever reason, the walking algorithm may crash due to an index staying set to -1, because it didn't find the correct correspondence...

The 64-bit build may be able to hide some flaws, for example, the need for extra memory. For instance, accessing index -1 might not crash with the 64-bit build, simply because more arrays happened to be allocated in a contiguous block of RAM...

Either way, I would have to be able to look into the meshes myself and check with a debugger or some old school outputs in the right places to pinpoint the breaking point and the reason for it.
All times are GMT -4.