segFault when accessing elements of surfaceField in parallel

May 1, 2014, 06:57   #1
Artur (Senior Member, Southampton, UK)
Dear Foamers,

I am writing a postprocessing utility which needs to access the time derivative of the velocity field and sum it over all the faces of a faceZone (which represents a user-defined control surface).

In order to achieve this, I first create a time derivative of U and then interpolate it onto the faces. I then iterate over each face of the face zone and accumulate the end result. This works fine in serial, but in parallel I get segfaults.

I've pinpointed the problem to a non-existent element of the time-derivative field being accessed (see the line marked "PROBLEM HERE" in the listing below; this sits in the write() method of the utility, which is called at the end of each time step).

Code:
        // get the field references
        const volVectorField& U = obr_.lookupObject<volVectorField>(UName_);
        
        // get the mesh reference
        const fvMesh& mesh = U.mesh();
    
        // create the output file if needed
        makeFile(fwhAcousticsFilePtr_);

        // create the time derivative field
        volVectorField dUdt
        (
            IOobject
            (
                "dUdt",
                obr_.time().timeName(),
                mesh,
                IOobject::NO_READ,
                IOobject::AUTO_WRITE
            ),
            fvc::ddt(U)
        );
        
        // sort out the processor patches for the newly created fields so that the process may run in parallel
        dUdt.correctBoundaryConditions();
        
        // interpolate the fields on the faces of the control surface
        surfaceVectorField Uface = fvc::interpolate(U);
        surfaceVectorField dUdtface = fvc::interpolate(dUdt);
    
        // iterate over each face on the CS
        for (int i = 0; i < faces_.size(); i++)
        {
            Pout << "accessing face " << faces_[i] << endl;

            // face-normal unit vector
            vector n = mesh.Sf()[faces_[i]] / mesh.magSf()[faces_[i]];

            scalar Un = (Uface[faces_[i]] & n);
            Pout << "Got Un " << Un << " for face " << faces_[i] << " out of " << Uface.size() << endl;
    
            // PROBLEM HERE !!!
            scalar Undot = (dUdtface[faces_[i]] & n);
            Pout << "Got Undot " << Undot << " for face " << faces_[i] << " out of " << dUdtface.size() << endl;
        }
    }
The faces_ class member, of type const labelList&, is initialised in the constructor initialiser list as follows:

Code:
faces_( refCast<const fvMesh>(obr_).faceZones()[zoneLabel_] )
The resulting error printout (including the messages from the "Pout <<" statements) is as follows:

Code:
[5] accessing face 332523
[5] Got Un -55.9077 for face 332523 out of 334736
[5] Got Undot -17707.7 for face 332523 out of 334736
[5] accessing face 396735
[5] #0  Foam::error::printStack(Foam::Ostream&)[5] Got Un 0.0179708 for face 396735 out of 334736
 in "/opt/openfoam222/platforms/linux64GccDPOpt/lib/libOpenFOAM.so"
[5] #1  Foam::sigSegv::sigHandler(int) in "/opt/openfoam222/platforms/linux64GccDPOpt/lib/libOpenFOAM.so"
[5] #2   in "/lib/x86_64-linux-gnu/libc.so.6"
[5] #3  Foam::fwhACoustics::write() in "/home/artur/OpenFOAM/artur-2.2.2/platforms/linux64GccDPOpt/lib/libfwhACoustics.so"
[5] #4  Foam::OutputFilterFunctionObject<Foam::fwhACoustics>::execute(bool) in "/home/artur/OpenFOAM/artur-2.2.2/platforms/linux64GccDPOpt/lib/libfwhACoustics.so"
[5] #5  Foam::functionObjectList::execute(bool) in "/opt/openfoam222/platforms/linux64GccDPOpt/lib/libOpenFOAM.so"
[5] #6  Foam::Time::run() const in "/opt/openfoam222/platforms/linux64GccDPOpt/lib/libOpenFOAM.so"
[5] #7  
[5]  in "/opt/openfoam222/platforms/linux64GccDPOpt/bin/pimpleFoam"
[5] #8  __libc_start_main in "/lib/x86_64-linux-gnu/libc.so.6"
[5] #9  
[5]  in "/opt/openfoam222/platforms/linux64GccDPOpt/bin/pimpleFoam"
[artur:22150] *** Process received signal ***
[artur:22150] Signal: Segmentation fault (11)
[artur:22150] Signal code:  (-6)
[artur:22150] Failing at address: 0x3e800005686
[artur:22150] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fe658e394a0]
[artur:22150] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7fe658e39425]
[artur:22150] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fe658e394a0]
[artur:22150] [ 3] /home/artur/OpenFOAM/artur-2.2.2/platforms/linux64GccDPOpt/lib/libfwhACoustics.so(_ZN4Foam12fwhACoustics5writeEv+0x509) [0x7fe649608929]
[artur:22150] [ 4] /home/artur/OpenFOAM/artur-2.2.2/platforms/linux64GccDPOpt/lib/libfwhACoustics.so(_ZN4Foam26OutputFilterFunctionObjectINS_12fwhACousticsEE7executeEb+0x8d) [0x7fe6496262ad]
[artur:22150] [ 5] /opt/openfoam222/platforms/linux64GccDPOpt/lib/libOpenFOAM.so(_ZN4Foam18functionObjectList7executeEb+0x59) [0x7fe659eb7789]
[artur:22150] [ 6] /opt/openfoam222/platforms/linux64GccDPOpt/lib/libOpenFOAM.so(_ZNK4Foam4Time3runEv+0xb0) [0x7fe659ec3dd0]
[artur:22150] [ 7] pimpleFoam() [0x41976d]
[artur:22150] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fe658e2476d]
[artur:22150] [ 9] pimpleFoam() [0x41cebd]
[artur:22150] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 5 with PID 22150 on node artur exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I am guessing that the problem is associated with the way in which I create the field and the face indexing somehow being mismatched, but I am clueless as to how this can be overcome. One thing I've noticed which I think might be relevant is that when the case is run in serial the faceZone has 3680 elements, while in parallel it comes out to be 3683 elements long. Just to be absolutely sure, I ran checkMesh with all options, both in serial and in parallel, and it came out fine.

I have also noticed that the problem may be connected to some of the zone faces lying on or near the processor boundaries, and I am currently looking more closely at how this could lead to the behaviour above.
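
A minimal, untested diagnostic along these lines (using the same mesh and faces_ members as in the listing above) should show whether any of the zone's face labels exceed the local internal-face count:

Code:
        // Untested diagnostic sketch: count zone faces whose labels fall
        // outside the internal-face range of the local (decomposed) mesh.
        const label nInternal = mesh.nInternalFaces();
        label nNonInternal = 0;

        forAll(faces_, i)
        {
            if (faces_[i] >= nInternal)
            {
                nNonInternal++;
                Pout << "zone face " << faces_[i]
                     << " is not internal (nInternalFaces = " << nInternal << ")"
                     << endl;
            }
        }

        Pout << "faces in zone: " << faces_.size()
             << ", of which non-internal: " << nNonInternal << endl;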

Any suggestions on how to fix this are welcome.

Peace,

A

May 2, 2014, 03:40   #2
Tomislav Maric (Senior Member, Darmstadt, Germany)
Hi Artur,

judging by your run-time error, you are running the code compiled in the 'Opt' mode. Compiling both OF and your library in 'Debug' mode (see $WM_PROJECT_DIR/etc/bashrc) would result in a more informative error - you would be able to see right away where the problem lies, without using output statements.

You state that the problem is here:

Code:
// PROBLEM HERE !!!
scalar Undot = (dUdtface[faces_[i]] & n);
Pout << "Got Undot " << Undot << " for face " << faces_[i] << " out of " << dUdtface.size() << endl;
However, the output in your error never gets past:

Code:
accessing face 396735
The listing above that:

Code:
[5] accessing face 332523
[5] Got Un -55.9077 for face 332523 out of 334736
[5] Got Undot -17707.7 for face 332523 out of 334736
represents the output of the previous, successful loop iteration.

If you examine this line from your previous successful iteration:

Code:
[5] Got Un -55.9077 for face 332523 out of 334736
it becomes clear that the size of the 'Uface' field is 334736. This brings me to the conclusion that this is the number of internal faces of your mesh, since 'Uface' is a surface field.

From this conclusion I believe that your error is actually in this line:

Code:
// face-normal unit vector
vector n = mesh.Sf()[faces_[i]] / mesh.magSf()[faces_[i]];
because 'magSf()' returns a constant reference to a 'surfaceScalarField' of size 334736 (easily checked with checkMesh), and you are trying to access an internal face with the label 'faces_[i]', which is:

Code:
[5] accessing face 396735
Hence the segmentation fault. In short, your face zone contains labels of boundary faces, and you are using such a label to index the internal part of a 'surface*Field', which maps only to internal faces.

This would be my initial guess based on your post. Compiling the code in Debug and using 'gdb' might prove it wrong; I would definitely try that.
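
To make this concrete, here is a minimal, untested sketch using the names from your first post, which simply skips the faces that are not internal on the local processor and so avoids the out-of-bounds access:

Code:
        // Untested sketch: only index the internal part of the surface
        // fields for faces that really are internal on this processor.
        forAll(faces_, i)
        {
            const label faceI = faces_[i];

            if (faceI >= mesh.nInternalFaces())
            {
                continue;  // boundary/processor face - needs separate handling
            }

            const vector n = mesh.Sf()[faceI]/mesh.magSf()[faceI];
            const scalar Un = (Uface[faceI] & n);
            const scalar Undot = (dUdtface[faceI] & n);
            // ... accumulate Un, Undot ...
        }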
__________________
When asking a question, prepare a SSCCE.

Last edited by tomislav_maric; May 2, 2014 at 05:44.

May 2, 2014, 04:29   #3
Artur (Senior Member, Southampton, UK)
Hi Tomislav,

I have come to similar conclusions as you. Here is my understanding of what actually happens:

1. When the mesh is decomposed, additional patches are created and the internal faces on the decomposition boundary are split using baffles.
2. The indices of the faces stored in the face zone are automatically updated to point to the newly created faces; however, since two of my faces happen to lie on the processor boundary, they too become split (the number of faces in the face zone changes, as I mentioned in the first post) and are shifted to the end of the face list to reside with the other "external" faces (they are now part of a patch and no longer internal).
3. As you've pointed out, magSf is a surface scalar field whose internal part does not have as many elements as some of the indices in the faces_ list imply, so when one of these faces is iterated over, a segfault occurs.

My conclusion is that there are two ways in which this may be overcome:
a) make very sure that the face zone stays well away from processor boundaries
b) store the geometry data over which the summation is conducted differently (preferably using cell indices and cell-value interpolation) so that the indices don't change and the fact that the case is run in parallel can be more or less ignored from this point of view

Thanks a lot for spending the time to answer my question,

Peace,

A

P.S. Thanks for the debug compilation advice. I'll consider doing it if it turns out I'll be doing more development; it sounds helpful indeed.

May 2, 2014, 08:18   #4
Tomislav Maric (Senior Member, Darmstadt, Germany)
Quote:
Originally Posted by Artur
Hi Tomislav,

My conclusion is that there are two ways in which this may be overcome:
a) make very sure that the face zone stays well away from processor boundaries
b) store the geometry data over which the summation is conducted differently (preferably using cell indices and cell-value interpolation) so that the indices don't change and the fact that the case is run in parallel can be more or less ignored from this point of view

Thanks a lot for spending the time to answer my question,

Peace,

A
No problem, I'm glad it was of help to you. Two hints.

#1 Zones and regions are an older part of OF, so I'm sure there are ways of writing parallel code that uses them - check out the public interface of 'faceZone'; there are even functions that perform parallel sync operations (see the sketch below).

#2 Using a different way of storing data for interpolation: I'm curious to hear what you are actually trying to do with zones. For most problems, owner-neighbour addressing based numerics suffice, and there you get almost automatic parallelism (you have to write a boundary-field loop and cross-communicate data, but that's it).
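
Regarding #1, what I had in mind is something along these lines (untested sketch, reusing obr_ and zoneLabel_ from your code; checkParallelSync is the sync-related function I remember from the faceZone interface):

Code:
        // Untested sketch: warn if the face zone is not synchronised
        // across processors before its face labels are used.
        const fvMesh& mesh = refCast<const fvMesh>(obr_);
        const faceZone& fz = mesh.faceZones()[zoneLabel_];

        if (fz.checkParallelSync(true))   // returns true on error
        {
            WarningIn("fwhACoustics::write()")
                << "faceZone " << fz.name()
                << " is not synchronised across processors" << endl;
        }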
__________________
When asking a question, prepare a SSCCE.

May 3, 2014, 09:02   #5
Artur (Senior Member, Southampton, UK)
Hi,

As to #1, I think the way to do it would be to access the appropriate faces of the boundary field instead of the internal one; the thing here would be to take each face of the control surface into account just once (since they get split into baffles if they lie on the processor boundary) and to check, each time a face is accessed, whether it belongs to the internal or the boundary field. Something like the sketch below is what I mean.
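
A rough, untested sketch (assuming processorPolyPatch::owner() behaves as I expect; counting processor-boundary faces only on the owning side is just one possible choice, and Sf/magSf would need the same treatment as Uface):

Code:
        // Rough, untested sketch: take the face value from the internal
        // field or from the matching boundary patch, counting each
        // processor-boundary face only once.
        forAll(faces_, i)
        {
            const label faceI = faces_[i];
            vector Uf;

            if (faceI < mesh.nInternalFaces())
            {
                Uf = Uface[faceI];
            }
            else
            {
                const label patchI = mesh.boundaryMesh().whichPatch(faceI);
                const polyPatch& pp = mesh.boundaryMesh()[patchI];
                const label localFaceI = faceI - pp.start();

                // only count processor-boundary faces on the owning side
                if (isA<processorPolyPatch>(pp)
                 && !refCast<const processorPolyPatch>(pp).owner())
                {
                    continue;
                }

                Uf = Uface.boundaryField()[patchI][localFaceI];
            }

            // ... use Uf as before ...
        }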

With regards to #2, I'm writing an acoustic analogy utility which integrates certain flow quantities over an arbitrary surface. I figured it would be easier to use surface fields since when combined with a face zone they readily give me what I want, i.e. an arbitrary surface with known field values at each location.

Peace,

A
