CFD Online Logo CFD Online URL
www.cfd-online.com
[Sponsors]
Home > Forums > General Forums > Main CFD Forum

Limiters at interfaces in MPI

Register Blogs Members List Search Today's Posts Mark Forums Read

Like Tree1Likes
  • 1 Post By naffrancois

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
Old   May 28, 2018, 03:28
Default Limiters at interfaces in MPI
  #1
New Member
 
Nived M R
Join Date: May 2018
Posts: 6
Rep Power: 4
nived_mr is on a distinguished road
We have a 3D unstructured grid serial code which we are trying to parallelize using MPI. For the cases involving shocks we have problems of convergence when using a 2nd order upwind scheme. The code is blowing up.

We have tried communicating the higher order reconstructed values at the interface faces. But there are oscilations along the shock in case of parallel code w.r.t the serial code.

Can anyone suggest a possible solution to the issue?

Thanks.
nived_mr is offline   Reply With Quote

Old   May 28, 2018, 03:43
Default Blowing up
  #2
Senior Member
 
Selig
Join Date: Jul 2016
Posts: 213
Rep Power: 7
selig5576 is on a distinguished road
When you say "blow up" are you referring to numerical blow up or your simulation crashing due to memory leaks? If its due to blow up, I would confirm that your serial code under the exact same conditions to ensure its not an implementation issue. In terms of debugging MPI codes I would use TotalView, it is pretty much the standard.
selig5576 is offline   Reply With Quote

Old   May 28, 2018, 04:09
Default
  #3
New Member
 
Nived M R
Join Date: May 2018
Posts: 6
Rep Power: 4
nived_mr is on a distinguished road
The blow-up is a numerical blow-up. The parallel code converges for a few time step iterations and then diverges. We have set the same conditions in both the cases. The first order accurate simulations converge well. We have issues in 2nd order accurate convective scheme implementation.
nived_mr is offline   Reply With Quote

Old   May 28, 2018, 11:53
Default
  #4
Senior Member
 
Join Date: Oct 2011
Posts: 130
Rep Power: 11
naffrancois is on a distinguished road
Did you check at first order if you have exact same results (up to machine accuracy) between serial and // code ? You did not mention it.

If so, then second order. You mention that you have a communication point for the limited recontructed states, I guess prior to the Riemann solver.

How do you compute the reconstructed states ? Assuming that first you compute the gradients with Green-Gauss or Least-Squares, both methods needs average values coming from a neighborhood. Did you communicate these average values prior gradient computation ?

Depending on your implementation, same remark may hold for the limiter, for example BJ limiter needs Min and Max over a neighborhood.
naffrancois is offline   Reply With Quote

Old   May 28, 2018, 12:01
Default
  #5
Senior Member
 
Join Date: Dec 2017
Posts: 150
Rep Power: 4
AliE is on a distinguished road
Hi,

As people here is writing, you have a bug in the parallel code. Trust me, I have done the same job with my code at the resulta between the two codes must be the same. Since for the second order you need the upwind and the downwind cell, have you cheked that the ghost nodes are exchanged correctly? Naturally this is just a guess!
AliE is offline   Reply With Quote

Old   May 29, 2018, 01:42
Default
  #6
New Member
 
Nived M R
Join Date: May 2018
Posts: 6
Rep Power: 4
nived_mr is on a distinguished road
Thanks for the replies.

I would like to clarify that we are getting a maximum error of about 3.47 x 10^-3 percentage in case of first order calculations while using 16 processors. I have attached the images of results showing the same for both 1st order and second order results. Is this fine? We are having a code with Implicit Jacobians solved using LU-SGS method and we haven't modified the Jacobian part till now for parallel computations.

Regarding the communication of reconstructed values, we communicate the solution at interface cells once we get the solution at a new time step, then we use these to find average values of the variables. These are then used to calculate Green-Gauss Gradient at all cell centres. Then we calculate the limited reconstructed values at faces, which are communicated prior to solving for the solution at the next time set.

We use the Venkatkrishnan limiter and we are calculating minimum and maximum across the neighbourhood considering the neighbour cells in other zones also, which does not change the result.

Attached Images
File Type: png U_x_1stord.png (66.2 KB, 12 views)
File Type: png U_x_1stord_zoom.png (67.1 KB, 11 views)
File Type: png U_vs_x_10p_shock.png (75.1 KB, 10 views)
nived_mr is offline   Reply With Quote

Old   May 29, 2018, 02:11
Default
  #7
Senior Member
 
Join Date: Dec 2017
Posts: 150
Rep Power: 4
AliE is on a distinguished road
Hi,

Unfortunatly a bug in mpi is very tedious to find. In my case for example I have done a contour of the central coeff of the matrices for the serial and the parallel version and the same thing for the rhs. Then I have looked at this results togwther. With these method you can see immeditlely if your equations are the same and I mean exactly the same at the 16 decimal digit maybe you can try to do the same if rhs are different than you can start the debug by going back.
AliE is offline   Reply With Quote

Old   May 29, 2018, 04:10
Default
  #8
Senior Member
 
Join Date: Oct 2011
Posts: 130
Rep Power: 11
naffrancois is on a distinguished road
Your first results are in my opinion NOT fine. Even if it may look almost the same on the graphs, the error is much higher than real precision.

Then before switching to second order, you should first fix it with first order. Can you switch your temporal scheme to a simple explicit time integrator like 1st order Euler scheme for testing purpose ? If it works, then you add once at a time new ingredients, implicit, 2nd order etc

edit: you say you fill in ghost cells at the END of the time step. Is the initial solution communicated for the first time step ?
naffrancois is offline   Reply With Quote

Old   May 30, 2018, 01:01
Default
  #9
New Member
 
Nived M R
Join Date: May 2018
Posts: 6
Rep Power: 4
nived_mr is on a distinguished road
Dear naffrancois,

As per your suggestion, I tried running the same case using explicit Euler time stepping and it gives exact match(13 decimal places) in case of first order calculations. Please see the attached results. But once I use the 2nd order limiter, it gives a result similar to the one in case of implicit code even with explicit Euler time stepping.

I tried running explicit code with a limiter value of 1.0, giving match up to 11-12 decimal places suggesting that the error is not in the gradient calculations.
Attached Images
File Type: png U_x_1stord_expl.png (64.4 KB, 2 views)
File Type: png U_x_1stord_expl_zoom.png (77.2 KB, 2 views)
File Type: png U_x_2ndord_expl_zoom.png (74.2 KB, 5 views)
nived_mr is offline   Reply With Quote

Old   May 31, 2018, 07:42
Default
  #10
New Member
 
Nived M R
Join Date: May 2018
Posts: 6
Rep Power: 4
nived_mr is on a distinguished road
We have finally fixed a bug in the 2nd order convective flux limiter calculation, thus serial and parallel results of explicit Euler time-stepping are matching up to decimal places of machine precision.
Now we have a slight difference in the results of serial and parallel codes in case of implicit time-stepping using LU-SGS solver for the same case. Will it be because of a mismatch in the calculation in the LU-SGS solver caused due to decomposition of the computational domain in the parallel code? Kindly suggest a way to fix it.
nived_mr is offline   Reply With Quote

Old   May 31, 2018, 17:47
Default
  #11
Senior Member
 
Join Date: Oct 2011
Posts: 130
Rep Power: 11
naffrancois is on a distinguished road
Well you have done a big step finding the bug for the explicit solver. I am not familiar with LUSGS time stepping, but clearly extra care has to be taken. I parallelized not so long time ago an unstructured implicit scheme using 1st order in space Jacobian with gmres, I did it that way: each cpu owns a submatrix of size Nelem*Nelem_overlap, like an augmented matrix. In krylov linear solvers, typical operations involve matrix-vector products Jac*x=b and scalar products. Communication of x vector is then necessary prior matrix-vector product and sum reduction (MPI_ALLREDUCE) for scalar products.

I dont know how this applies to LUSGS, maybe you forgot contributions coming from ghost cells in the Upper part ?

By the way I assume you are testing LUSGS with 1st order
naffrancois is offline   Reply With Quote

Old   June 1, 2018, 02:30
Default
  #12
New Member
 
Nived M R
Join Date: May 2018
Posts: 6
Rep Power: 4
nived_mr is on a distinguished road
Right now we are not using any MPI Functions inside the LU-SGS solver. We are working out a possible solution to account for missing neighbour cell information in the LU-SGS Matrix in all the processes.
nived_mr is offline   Reply With Quote

Old   June 4, 2018, 15:22
Default
  #13
Senior Member
 
Join Date: Oct 2011
Posts: 130
Rep Power: 11
naffrancois is on a distinguished road
I would say as long as you have off-diagonal blocks into your matrix, e.g. not a block-diagonal matrix, means that you have for each element neighbor information. Then at a certain point in your LUSGS solver you will need MPI calls to synchronize values between the partitions. Anytime you need values coming from a neighbor element you have to ask yourself if this value may belong to a different CPU.
AliE likes this.
naffrancois is offline   Reply With Quote

Old   June 5, 2018, 05:17
Default
  #14
Senior Member
 
Join Date: Dec 2017
Posts: 150
Rep Power: 4
AliE is on a distinguished road
Quote:
Originally Posted by naffrancois View Post
I would say as long as you have off-diagonal blocks into your matrix, e.g. not a block-diagonal matrix, means that you have for each element neighbor information. Then at a certain point in your LUSGS solver you will need MPI calls to synchronize values between the partitions. Anytime you need values coming from a neighbor element you have to ask yourself if this value may belong to a different CPU.
naffrancois is right. Have you tried to use PETSC in order to solve the linear system? It resolved a lot of problem related to the linear system parallelization in my case and it is super-fast!
AliE is offline   Reply With Quote

Reply

Tags
mpi

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Doubt regarding MUSCL scheme implementation and associated limiters SangeethCFD Main CFD Forum 0 October 7, 2017 09:00
mpirun, best parameters pablodecastillo Hardware 18 November 10, 2016 12:36
[OpenFOAM.org] MPI compiling and version mismatch pki OpenFOAM Installation 7 June 15, 2015 16:21
Error using LaunderGibsonRSTM on SGI ALTIX 4700 jaswi OpenFOAM 2 April 29, 2008 10:54
Is Testsuite on the way or not lakeat OpenFOAM Installation 6 April 28, 2008 11:12


All times are GMT -4. The time now is 20:43.