
January 17, 2012, 23:13 
Parallelization and Processor Interaction

#1 
Member

Friends,
I was wondering how exactly parallelization works in CFD. I understand that the domain is decomposed and each subdomain is assigned to a particular processor. It seems obvious that the simulation of the downstream domain cannot proceed without the results from the upstream domain. Does that mean that the other processors sit idle until the iteration has finished on the first processor, then take that information and proceed? I know that is not what happens, but I am not really sure what does. Can someone explain what exactly happens?

January 18, 2012, 12:37 

#2 
Member
Join Date: Jul 2011
Location: US
Posts: 39
Rep Power: 7 
Generally speaking, the domain is decomposed, but a small overlap is kept on both processors at each interface. A time step is taken, the updates in the overlap are synced via MPI (or similar), and the solution continues. Thus, each process knows what its neighbors' solutions look like at the interfaces. This is only possible because the Navier-Stokes equations are hyperbolic (local stencils); if they were parabolic, the updating would be more complicated. This is exactly what happens in an explicit solver, but implicit methods also have to exchange data during the linear system solve to update the deltaQ values, or the solution will not converge.
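To make the overlap idea concrete, here is a minimal sketch in plain Python (no MPI) of an explicit first-order upwind step for 1D linear advection on a periodic domain. The function names, the grid size, and the copy that stands in for the MPI exchange are assumptions made for illustration; they are not taken from any code discussed in this thread.

```python
import math

def upwind_step(left_ghost, u, c):
    """One explicit upwind step for u_t + a*u_x = 0 with a > 0, c = a*dt/dx."""
    upstream = [left_ghost] + u[:-1]
    return [ui - c * (ui - up) for ui, up in zip(u, upstream)]

def run_serial(u0, c, steps):
    """Reference: the whole periodic domain advanced as one block."""
    u = list(u0)
    for _ in range(steps):
        u = upwind_step(u[-1], u, c)  # periodic: left ghost is the last cell
    return u

def run_decomposed(u0, c, steps, nparts):
    """Same scheme on nparts subdomains with a one-cell overlap (ghost cell)."""
    size = len(u0) // nparts  # assumes len(u0) is divisible by nparts
    parts = [list(u0[p * size:(p + 1) * size]) for p in range(nparts)]
    for _ in range(steps):
        # Stand-in for the MPI exchange: each part receives the last cell of
        # its left neighbour, taken from the previous time level.
        ghosts = [parts[p - 1][-1] for p in range(nparts)]
        # After the exchange, every part can be updated independently.
        parts = [upwind_step(ghosts[p], parts[p], c) for p in range(nparts)]
    return [value for part in parts for value in part]

n = 40
u0 = [math.exp(-((i - 10) / 3.0) ** 2) for i in range(n)]
serial = run_serial(u0, 0.5, 25)
decomposed = run_decomposed(u0, 0.5, 25, 4)
max_diff = max(abs(a - b) for a, b in zip(serial, decomposed))
```

In this sketch no part waits for an upstream part to finish: all of them advance together and only trade one interface value per step, which is why the decomposed run reproduces the serial one.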
__________________
CFD engineering resource 

January 18, 2012, 12:42 

#3  
Senior Member
cfdnewbie
Join Date: Mar 2010
Posts: 557
Rep Power: 13 
Quote:
Cheers! 

January 18, 2012, 14:15 

#4 
Member
Join Date: Jul 2011
Location: US
Posts: 39
Rep Power: 7 
Yes. You are correct that the Navier-Stokes equations aren't purely hyperbolic, but my statement above still stands. The problem can be, and is, still handled in a localized way. Especially when considering the approximations made via the RANS equations, the viscous terms not driving the solution, and generally a few Newton iterations when solving time-accurate (transient) flows, we aren't committing an enormous error by treating them as approximately local.

January 18, 2012, 14:33 

#5 
Senior Member
cfdnewbie
Join Date: Mar 2010
Posts: 557
Rep Power: 13 
I agree with you, and didn't doubt your statement.
Just as a tangent to that: if you use diffusive Riemann solvers for the viscous terms, there's no difference in terms of parallelization between hyperbolic and parabolic terms.

January 18, 2012, 15:54 

#6 
Senior Member
Martin Hegedus
Join Date: Feb 2011
Posts: 479
Rep Power: 12 
I thought steady-state incompressible inviscid flow was elliptic?
And pressure waves go upstream. So I'm confused about the original post with regard to downstream and upstream. Are we talking purely supersonic flow? If that is the case, then a space-marching Euler method can be used. Yes, compressible flow is hyperbolic, i.e. the eigenvalues are real, in theory. However, I thought in practice it depends on the speed of the waves and how they are bouncing around. If you put two bodies close together (or have a lot of interference), things may get a little stiff. Either the grid needs to get fine, or the problem is easier to solve with central differencing. Is this not true?

January 18, 2012, 16:11 

#7  
Senior Member
cfdnewbie
Join Date: Mar 2010
Posts: 557
Rep Power: 13 
Quote:
Quote:
Quote:
Quote:
Cheers! 

January 18, 2012, 16:39 

#8 
Senior Member
Martin Hegedus
Join Date: Feb 2011
Posts: 479
Rep Power: 12 
Yes, I was talking about pure Euler. Viscous terms are not hyperbolic.
With regard to the last part, I was referring to inviscid flow. Solid boundaries are modeled by reflecting the waves. When two solid boundary conditions "see" each other, there is the opportunity for a lot of reflections to occur, and information is passed between them very rapidly. From what I understand, this can cause an issue (i.e. stability) with flux splitting methods. I don't know the ins and outs of it though.

January 19, 2012, 06:18 

#9 
Senior Member
cfdnewbie
Join Date: Mar 2010
Posts: 557
Rep Power: 13 
Yeah, I can see that "too many waves, too little resolution" might throw any Riemann solver off track... Especially with the upwind bias that convective discretizations usually have, right? So if there are two waves crossing in a single cell, the lower part of the face might be "upwind", while the upper half might be "downwind". I can see how that would be a problem!
The only solution I can think of is the one you mentioned: more resolution, or higher-order schemes on the same grid. (Yeah, I like them, I admit.) Cheers!

January 19, 2012, 12:04 

#10 
Senior Member
Martin Hegedus
Join Date: Feb 2011
Posts: 479
Rep Power: 12 
With regard to flux splitting, it would be interesting if someone would try it... I've not seen it addressed in papers, but that does not mean it does not exist.
With regard to the original topic, there are two types of parallelization: parallelization between machines and parallelization between processors. Between machines one needs to use domain decomposition. Between processors one can use either domain decomposition or multiple threads on one big domain.
The two types of solution methodology are implicit and explicit. Explicit methods only rely on information from the previous time step. They are just one big loop in which each point is updated individually, so they lend themselves to domain decomposition and GPUs. Implicit methods rely on surrounding information from the current time step and require a matrix inversion. Because of this, they are more challenging to decompose and to run on GPUs. From what I understand, matrix inversion does not lend itself to large-scale parallelization, such as on GPUs.
In all cases, processors do not wait, nor do they remain silent. In general, each domain is calculated separately, with boundary conditions set from the values of the previous iteration. In other words, the boundary values are lagged. This is not an issue with explicit methods, since that's what they do anyway. But it does introduce an error (and instability) with implicit methods. Well, those are the methodologies I'm familiar with. There probably are others.
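The effect of lagged interface values on an implicit step can be sketched with a small serial example. It takes one backward-Euler step of the 1D heat equation, once as a single system and once on two blocks whose interface values are lagged. The names, the grid size, and the use of repeated block-Jacobi style sweeps to remove the lag are assumptions for illustration; production solvers typically exchange interface data inside their linear solver rather than this way.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm."""
    n = len(rhs)
    d, b = list(diag), list(rhs)
    for i in range(1, n):
        w = lower[i] / d[i - 1]
        d[i] -= w * upper[i - 1]
        b[i] -= w * b[i - 1]
    x = [0.0] * n
    x[-1] = b[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (b[i] - upper[i] * x[i + 1]) / d[i]
    return x

def implicit_block_solve(u_old, r, left_value, right_value):
    """Backward-Euler step of u_t = nu*u_xx on one block, r = nu*dt/dx**2.
    left_value/right_value are the cell values just outside the block."""
    n = len(u_old)
    rhs = list(u_old)
    rhs[0] += r * left_value
    rhs[-1] += r * right_value
    return thomas([-r] * n, [1 + 2 * r] * n, [-r] * n, rhs)

def decomposed_step(u_old, r, sweeps):
    """Two blocks; interface values are lagged by one sweep."""
    half = len(u_old) // 2
    guess = list(u_old)  # first sweep uses the previous time level
    for _ in range(sweeps):
        left = implicit_block_solve(u_old[:half], r, 0.0, guess[half])
        right = implicit_block_solve(u_old[half:], r, guess[half - 1], 0.0)
        guess = left + right
    return guess

n, r = 20, 2.0
u_old = [math.sin(math.pi * (i + 1) / (n + 1)) for i in range(n)]
reference = implicit_block_solve(u_old, r, 0.0, 0.0)  # undecomposed solve

def max_error(sweeps):
    approx = decomposed_step(u_old, r, sweeps)
    return max(abs(a - b) for a, b in zip(approx, reference))

error_lagged = max_error(1)     # interface lagged from the old time level
error_iterated = max_error(40)  # interface refreshed 40 times
```

With a single sweep the decomposed result differs from the undecomposed solve; refreshing the interface values repeatedly drives that difference towards round-off in this toy problem, at the cost of extra communication.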

January 19, 2012, 13:53 

#11  
Senior Member
cfdnewbie
Join Date: Mar 2010
Posts: 557
Rep Power: 13 
Quote:
Just a remark along those lines: for explicit methods, the limiting factor (in terms of parallelization efficiency) is the message-passing latency (i.e. the communication time), while for implicit methods the limiting factor is often the RAM available, since they need to fit a large matrix (or parts of it) in memory. Explicit methods are generally easier to parallelize and scale better than implicit ones, which require more sophisticated strategies. With regard to this, there's somewhat of a paradigm shift in supercomputing: the trend is not towards higher clock frequencies, but towards many more CPUs with less RAM each. The idea is to have O(10⁵ to 10⁶) simple, not too fast procs with only a little RAM. Since implicit methods are in desperate need of RAM, it will be a real challenge to get these methods to scale well on the next generation of supercomputers. I realize that this is not of relevance for (most) engineering applications, but I find it interesting! Cheers!

January 19, 2012, 16:19 

#12 
Member
Join Date: Jul 2011
Location: US
Posts: 39
Rep Power: 7 
A bit of a note here. Implicit methods are generally what "big" CFD codes as well as research codes use. With implicit methods the time step is limited only by the physics you wish to capture, while explicit methods are severely limited by stability and thus take a very long time to converge. Implicit methods are nearly always cheaper in the long run. Also, implicit methods have been scaled satisfactorily to well over 100,000 cores. This is not a trivial task, but it is well within the grasp of modern CFD practitioners.
On a side note, I have a research code which is implicit and scales well up to several hundred processors (separate machines). This is not beyond the realm of possibility and, in fact, is the norm.

January 19, 2012, 16:33 

#13 
Senior Member
Martin Hegedus
Join Date: Feb 2011
Posts: 479
Rep Power: 12 
Is the code implicit structured or implicit unstructured?
I'm familiar with structured implicit codes, and they do scale up well. Structured implicit solvers, when they use a factorization scheme, also don't have a high memory overhead. I'm not sure about non-factorized schemes, and I'm not sure about the ins and outs of parallelizing unstructured implicit methods.

January 19, 2012, 16:37 

#14 
Member
Join Date: Jul 2011
Location: US
Posts: 39
Rep Power: 7 
It is implicit unstructured with multi-polyhedral element type capability. I do store the entire left-hand-side matrix structure using a CRS (compressed row storage) approach. It is expensive, but it is the only way to do it if you don't want to sit around watching the solver waste machine time all day. Explicit methods are just not robust enough to do the large-scale simulations we are interested in.
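For readers unfamiliar with CRS, the following is a minimal sketch of the storage layout and a matrix-vector product in plain Python. The function names and the small test matrix are made up for illustration; a real implicit unstructured solver would normally store small dense blocks per entry and build the structure from mesh connectivity rather than from a dense matrix.

```python
def to_crs(dense):
    """Convert a dense matrix (list of rows) to compressed row storage."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)   # nonzero entries, row by row
                col_idx.append(j)  # column index of each stored entry
        row_ptr.append(len(values))  # where the next row starts in values
    return values, col_idx, row_ptr

def crs_matvec(values, col_idx, row_ptr, x):
    """Compute y = A*x touching only the stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        total = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            total += values[k] * x[col_idx[k]]
        y.append(total)
    return y

# Small tridiagonal example standing in for a left-hand-side matrix.
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
values, col_idx, row_ptr = to_crs(A)
y = crs_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Only the nonzeros and two index arrays are kept, so memory grows with the number of cell-to-neighbour couplings rather than with the square of the number of cells.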

January 19, 2012, 16:38 

#15  
Senior Member
cfdnewbie
Join Date: Mar 2010
Posts: 557
Rep Power: 13 
Quote:
Quote:
Quote:
Quote:
This is not a trivial Quote:
Interesting discussion, folks. Seems a few of us tend to hijack threads lately. Cheers!

January 19, 2012, 16:40 

#16 
Senior Member
cfdnewbie
Join Date: Mar 2010
Posts: 557
Rep Power: 13 
Could you describe the type of physics you are interested in a little bit? You are the first person I've heard describe explicit methods as not robust, so I'm curious to find out. In my community, people tend to see implicit as fickle and explicit as robust....


January 19, 2012, 16:52 

#17 
Member
Join Date: Jul 2011
Location: US
Posts: 39
Rep Power: 7 
By robust I mean robust at reasonable time steps. For instance, when modeling turbomachinery or combustion chambers, we are interested in the temperature loading of the walls or the aeroelastic effects on the blades, etc. Resolving extremely small time scales is superfluous even in unsteady cases. Explicit methods require very small time steps, and with complicated physics the time step is driven down to a ridiculously small size at which the physics are not interesting to us.
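As a rough illustration of that gap, the sketch below compares a convective CFL-limited explicit time step with a time step chosen from the physics. Every number in it (cell size, velocities, period of interest, steps per period) is a hypothetical value picked only to show orders of magnitude, not data from any of the codes mentioned here.

```python
def explicit_dt_limit(dx, velocity, sound_speed, cfl=1.0):
    """Largest explicit step allowed by the convective CFL condition."""
    return cfl * dx / (abs(velocity) + sound_speed)

# Hypothetical values, for illustration only.
dx_min = 1.0e-5       # smallest cell size in the mesh [m]
velocity = 100.0      # local flow speed [m/s]
sound_speed = 340.0   # local speed of sound [m/s]
dt_explicit = explicit_dt_limit(dx_min, velocity, sound_speed)

period = 1.0e-4             # time scale of interest, e.g. one blade passing [s]
dt_physics = period / 100   # resolve that period with 100 steps
steps_ratio = dt_physics / dt_explicit  # explicit steps per physics-based step

print(f"explicit limit: {dt_explicit:.3e} s, physics-based: {dt_physics:.3e} s")
print(f"explicit steps needed per physics-based step: {steps_ratio:.1f}")
```

Because the smallest cell in the mesh sets the explicit limit for the whole domain, local refinement (for example near walls) widens this gap further.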
Implicit methods, on the other hand, allow us to make decisions based on the physics we are trying to capture, almost independent of numerical stability concerns. Hence I call them robust. Also, please note that explicit methods do not allow us to have time-accurate BCs, whereas implicit methods allow us to place each step in a Newton loop and therefore keep the BCs time accurate as well. I'll definitely give you that if your time scale of interest is already below the explicit stability limit, then it makes no sense to run an implicit method. We are almost never in this regime, even with combustion modeling.
I can't give you specific examples because most of the codes I'm referring to are not for public release and you wouldn't recognize the names anyway. However, they do exhibit good scaling. As far as strong vs. weak scaling goes: strong scaling is nearly impossible to show from 1 out to 100,000 procs. We can't even load the case on one machine. Also, if we could load it on one machine, the whole case would be in cache by the time we reached 100,000 procs, and that is hardly a fair comparison. So, my code does show strong scaling within these limits. That is, from a case small enough to load on a single node out to the point where we get superlinear speedup due to cache effects. I'd definitely say that it is more than plausible if you can take care of the I/O at that level.
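Strong scaling measured from a baseline larger than one processor, as described above, can be sketched like this. The helper name and the wall-clock timings are invented for illustration and are not measurements from any real code.

```python
def strong_scaling(timings):
    """timings maps processor count -> wall time for the SAME problem size.
    Returns processor count -> (speedup, efficiency), both measured relative
    to the smallest run, since a one-processor run may not fit in memory."""
    base_procs = min(timings)
    base_time = timings[base_procs]
    table = {}
    for procs in sorted(timings):
        speedup = base_time / timings[procs]   # measured gain over the baseline
        ideal = procs / base_procs             # gain expected from linear scaling
        table[procs] = (speedup, speedup / ideal)
    return table

# Hypothetical timings in seconds for a fixed-size case.
timings = {64: 1000.0, 128: 500.0, 256: 240.0, 512: 130.0}
table = strong_scaling(timings)
for procs, (speedup, efficiency) in table.items():
    print(f"{procs:4d} procs: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

An efficiency above 1.0 in such a table corresponds to the superlinear speedup mentioned above, which usually indicates that each part has shrunk enough to fit in cache rather than that the algorithm itself has improved.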

January 19, 2012, 17:04 

#18  
Senior Member
cfdnewbie
Join Date: Mar 2010
Posts: 557
Rep Power: 13 
Quote:
I just always notice that people working on implicit solvers spend way more time optimizing their time integrator than doing simulations or analyzing physics. At least that's my impression; maybe I'm wrong. It just seems like the whole implicitness brings with it so many parameters to tune that it is hardly worth doing, and that it is kind of arbitrary how you set your limits. For example, I recently overheard two very well-known professors from well-known US institutions argue about whether 10E-8 or 10E-6 should be set as the convergence criterion for their implicit solver, and whether one solution was "correct" and the other was not. That's what really makes me shake my head when it comes to implicit methods, but I can see that they do make sense in certain situations.
Quote:
Quote:
Quote:
Cheers! 

January 19, 2012, 17:05 

#19 
Senior Member
Martin Hegedus
Join Date: Feb 2011
Posts: 479
Rep Power: 12 
In general I do agree with what you're saying, but I didn't understand this part.
As the local CFL number for an implicit solver gets much below one, it becomes explicit, i.e. the off-diagonal terms go to zero, or so I thought. I think that an explicit scheme could be used within a Newton loop. Not that one would want to, but that is a different story.
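The small-CFL limit can be checked on a toy problem. The sketch below applies forward Euler and backward Euler with first-order upwinding to 1D linear advection and compares the two updates as the CFL number c shrinks. For backward Euler each row reads (1 + c)*u_i_new - c*u_(i-1)_new = u_i_old, so the off-diagonal to diagonal ratio is c/(1 + c). The scheme, the sine profile, and the function names are assumptions chosen for illustration.

```python
import math

def explicit_upwind(u_old, c, inflow):
    """Forward Euler, first-order upwind, c = a*dt/dx with a > 0."""
    upstream = [inflow] + u_old[:-1]
    return [u - c * (u - up) for u, up in zip(u_old, upstream)]

def implicit_upwind(u_old, c, inflow):
    """Backward Euler, first-order upwind. The matrix is lower bidiagonal,
    so it can be solved by forward substitution from the inflow boundary."""
    u_new = []
    upstream_new = inflow
    for u in u_old:
        upstream_new = (u + c * upstream_new) / (1.0 + c)
        u_new.append(upstream_new)
    return u_new

def max_difference(c, n=32):
    """Largest difference between the explicit and implicit updates."""
    u_old = [math.sin(2 * math.pi * i / n) for i in range(n)]
    exp_u = explicit_upwind(u_old, c, 0.0)
    imp_u = implicit_upwind(u_old, c, 0.0)
    return max(abs(a - b) for a, b in zip(exp_u, imp_u))

for c in (1.0, 0.1, 0.01):
    ratio = c / (1.0 + c)  # off-diagonal / diagonal of the implicit matrix
    print(f"CFL {c:5.2f}: off-diag ratio {ratio:.4f}, max diff {max_difference(c):.3e}")
```

In this toy case the difference between the two updates shrinks roughly like c squared, which is consistent with the implicit scheme behaving like the explicit one at very small CFL numbers.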

January 19, 2012, 17:12 

#20 
Senior Member
Martin Hegedus
Join Date: Feb 2011
Posts: 479
Rep Power: 12 
Oh, the implicit solvers I'm aware of lag the boundary conditions. Also, for domain decomposition, aren't the fringe boundaries usually lagged?



