Parallel implementation of k-e model

June 26, 2008, 03:38

Hello,

I have coded a 3D vertex-centered finite volume solver for the Navier-Stokes equations that runs either as serial code or in parallel on a cluster. I would like to add a k-epsilon turbulence model with a two-layer approach to near-wall treatment and I'm facing a problem with the parallel version. The problem is this:

To compute the convective and viscous fluxes on a process, I need the states of vertices that may have been allocated to other processes/processors. For any process P, I define two subsets of the global vertex set: P_on, the vertices allocated to this process by the domain decomposition, and P_ass, the associated vertices, where an associated vertex is one that is allocated to another process but belongs to an element with at least one vertex allocated to P. I then exchange vertex states so that every process receives the states of all vertices in its P_ass from the other processes. This lets me compute the convective fluxes on an edge-by-edge basis and the viscous fluxes by a standard Galerkin approach, and it works fine.
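The two vertex sets described above can be sketched in a few lines. This is only an illustration, not the poster's code: the function name `build_vertex_sets`, the `owner` list (vertex id to owning rank) and the tiny two-triangle mesh are all assumptions made for the example.

```python
def build_vertex_sets(elements, owner, rank):
    """Return (P_on, P_ass) for process `rank`.

    elements: iterable of tuples of global vertex ids (hypothetical layout)
    owner:    list mapping vertex id -> rank that owns it
    """
    # P_on: vertices allocated to this process by the decomposition.
    p_on = {v for v, r in enumerate(owner) if r == rank}
    # P_ass: vertices owned elsewhere that share an element with a P_on vertex.
    p_ass = set()
    for elem in elements:
        if any(owner[v] == rank for v in elem):
            p_ass.update(v for v in elem if owner[v] != rank)
    return p_on, p_ass


# Two triangles sharing the edge (1, 2); vertices 0, 1 on rank 0, 2, 3 on rank 1.
elements = [(0, 1, 2), (1, 3, 2)]
owner = [0, 0, 1, 1]
p_on, p_ass = build_vertex_sets(elements, owner, rank=0)
```

The states of the vertices in `p_ass` are what each process would request from the owning processes before assembling fluxes.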

The same thing won't work when I add my turbulence model. To compute the viscous fluxes in the Navier-Stokes equations, I now need the turbulent (eddy) viscosity, mu_t, at the vertices in P_ass. However, I cannot compute mu_t locally on a process, because I need the local Reynolds number, y+, to distinguish between high-Re and low-Re regions and to compute the scales in the low-Re region. To compute y+ I need information from the nearest wall vertex of every vertex in the union P_on U P_ass, i.e., computing y+ for a vertex in this union may require data from a wall vertex that is not itself a member of the union. This means that I may not have the required information available even after exchanging information with other processes as described above. Clearly, I could exchange the required wall information between processes, but in contrast to the usual exchange for P_ass, which is a small set, this exchange could be very large.
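To make the data dependence concrete, here is a minimal sketch that assumes the conventional definition y+ = rho * u_tau * y / mu with friction velocity u_tau = sqrt(|tau_w| / rho). The post does not state which definition is used; some two-layer formulations switch on Re_y = rho * sqrt(k) * y / mu instead, which needs only the wall distance. The function name and the numbers are hypothetical.

```python
import math


def y_plus(rho, mu, tau_wall, wall_distance):
    """Conventional y+ from the shear stress at the nearest wall vertex.

    tau_wall and wall_distance come from the nearest wall vertex, which may
    live on another process; rho and mu are taken as constants here.
    """
    u_tau = math.sqrt(abs(tau_wall) / rho)   # friction velocity
    return rho * u_tau * wall_distance / mu


# Illustrative values only.
yp = y_plus(rho=1.0, mu=1.0e-5, tau_wall=0.25, wall_distance=1.0e-4)
```

Either way, each vertex needs at least the location of its nearest wall vertex, and with this definition also the wall shear stress there.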

I don't see any clever solution to this. Can anyone help me out here? If my question is too convoluted, please let me know.

Thanks, Martin

June 26, 2008, 16:02

How about this?

(1) I assume that if the entire problem were solved serially on a single processor, you have an algorithm for finding the nearest wall vertex for any given vertex in the mesh. The algorithm must involve the spatial locations (x,y,z) of the vertices. I further assume that all that matters is finding the nearest wall point, that the angle made by the vector to the nearest wall point with the wall does not matter.

(2) "v" stands for vertex, not vector. First, within every subdomain, do the following computation. For every vertex v_P in {P_on .U. P_ass}, find (using the above-mentioned, hopefully efficient, algorithm) the nearest wall vertex v_P_wnear among all the wall vertices v_P_w belonging to this set {P_on .U. P_ass}. If such a wall vertex exists, store its location (x,y,z) together with any other information needed to calculate y+ at v_P, such as the shear stress (and density, etc.) at v_P_wnear or any path integral along the straight line connecting v_P with v_P_wnear. This information should be stored in array(s) at a principal array index equal to the vertex number of v_P. Use a zero flag or an infinite location if no such nearest wall vertex exists within the current subdomain for the given v_P.

(3) Next, transfer this y+-related information for the border vertices in P_ass using your usual means of interprocessor communication (MPI?). Note that, at least for the border vertices, you will need array space for both the info calculated on P and the info transferred from the neighboring processor. Effectively, neighboring processors exchange such information for the border vertices they share. At the end of this step, each border vertex will have (on both of its processors) both a v_P_wnear (calculated on the current domain P) and a v_N_wnear (calculated on the neighboring processor N). Either or both of these values may be nil (i.e., infinite wall distance).

(4) Now here is where the efficient algorithm hoped for in (1) would come in handy. In every subdomain P, for every vertex v_P in {P_on .U. P_ass}, find the shortest vector to the wall that passes through the set of border vertices of P. You do this by, for each v_P_ass, adding the vector from v_P to v_P_ass to the vector from v_P_ass to its corresponding v_N_wnear (repeat this also with v_P_wnear of v_P_ass), and finding the length of the resultant vector. Keep track of the shortest such distance found during the enumeration of the P_ass set (which includes the border with all neighboring processors), and call it v_P_wbord. For this v_P, replace the corresponding v_P_wnear with the smaller of v_P_wnear and v_P_wbord. For v_P belonging to the border set P_ass, next replace v_N_wnear with the new v_P_wnear. This addition of vectors uses the fact that the vector from v_P to its nearest wall point v_wnear lying outside P intersects the border of P in a point v_P_ass whose corresponding v_wnear also coincides with the v_wnear of v_P. If the wall is facing toward v_P_ass but away from v_P, there should be a wall segment which is closer to v_P and facing it. Perhaps this assumption would be violated near the lip of an inlet.

(5) Repeat steps (3) and (4) for all subdomains until the nearest wall distance information has propagated through all intervening subdomains. At the end of this process, each vertex in every subdomain should have the location and properties of the wall vertex nearest to it, regardless of whether the latter lies on the same subdomain or not. This should enable you to calculate y+ and mu_t, etc.
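Steps (2) through (5) can be tried out serially before any MPI is written. The sketch below is an assumption-laden illustration, not a tested implementation of the proposal: subdomains are plain Python sets of vertex ids (P_on .U. P_ass), the shared vertices of two overlapping sets play the role of border vertices, the exchange of step (3) is simulated by reading a snapshot ("old" arrays) of the neighbor's data, and the local search is brute force. All names are hypothetical.

```python
import math


def propagate_wall_info(coords, subdomains, wall_vertices):
    """Return ({rank: {vertex: (distance, wall_vertex)}}, number_of_sweeps)."""
    # Step (2): nearest wall vertex using only walls inside each subdomain.
    near = {}
    for rank, verts in subdomains.items():
        walls = [w for w in verts if w in wall_vertices]
        near[rank] = {
            v: min(((math.dist(coords[v], coords[w]), w) for w in walls),
                   default=(math.inf, None))   # no wall in this subdomain
            for v in verts
        }
    sweeps, changed = 0, True
    while changed:                      # Step (5): repeat until nothing changes
        changed = False
        sweeps += 1
        # Step (3): exchange at shared vertices, reading "old" neighbor values.
        old = {r: dict(n) for r, n in near.items()}
        for r, verts in subdomains.items():
            for n, nverts in subdomains.items():
                if n == r:
                    continue
                for b in verts & nverts:
                    if old[n][b][0] < near[r][b][0]:
                        near[r][b] = old[n][b]
                        changed = True
        # Step (4): test every wall vertex known at the border against v_P.
        # The sum of the vectors v_P -> border -> wall is the direct vector.
        for r, verts in subdomains.items():
            border = set().union(
                *(verts & nv for n, nv in subdomains.items() if n != r))
            candidates = {near[r][b][1] for b in border} - {None}
            for v in verts:
                for w in candidates:
                    d = math.dist(coords[v], coords[w])
                    if d < near[r][v][0]:
                        near[r][v] = (d, w)
                        changed = True
    return near, sweeps


# Six vertices on a line, wall at vertex 0, three overlapping subdomains.
coords = {i: (float(i), 0.0, 0.0) for i in range(6)}
subdomains = {0: {0, 1, 2}, 1: {2, 3, 4}, 2: {4, 5}}
near, sweeps = propagate_wall_info(coords, subdomains, wall_vertices={0})
```

In this toy case the wall information reaches the last subdomain after a few sweeps. In general the result is only the nearest wall vertex among the candidates that reached the border, so the geometric assumption stated in step (4) still has to be checked for the actual mesh.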

If your mesh is static, the information can be divided into purely geometric which would be unchanging, and flow-dependent (such as wall shear stress) which would change at every global iteration or time step. It would be efficient to do steps (2) through (5) only at the start of time-marching to find the nearest wall vertex for every vertex. Alternatively, this could be achieved using the efficient algorithm in (1) on a single processor serially without partitioning into subdomains. Then, for each subdomain you can form lists of the subdomains (not necessarily neighbors) that it needs wall info from, and can then supply wall shear stress etc. by direct exchange of wall info between the subdomains once every few time-marching steps. For a static mesh, this would be more efficient (though not as elegant) than repeating steps (2) through (5) at each time step.
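For the static-mesh variant, the per-subdomain request lists might be built as follows. This sketch assumes that the nearest wall vertex of every vertex has already been found once (by whichever method) and that an `owner` list maps each vertex to its rank; both names, and `build_wall_requests` itself, are made up for the example.

```python
from collections import defaultdict


def build_wall_requests(local_vertices, nearest_wall, owner, rank):
    """Group the remote wall vertices this rank needs by their owning rank."""
    requests = defaultdict(set)
    for v in local_vertices:
        w = nearest_wall[v]
        if owner[w] != rank:            # wall data lives on another process
            requests[owner[w]].add(w)
    return dict(requests)


# Six vertices, wall at vertex 0 (owned by rank 0), two vertices per rank.
owner = [0, 0, 1, 1, 2, 2]
nearest_wall = {v: 0 for v in range(6)}
requests_rank2 = build_wall_requests([4, 5], nearest_wall, owner, rank=2)
```

Each rank would then receive wall shear stress (and density, etc.) only for the wall vertices in its request lists, at whatever interval the time-marching scheme tolerates.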

Well, I just dreamt up all this stuff on seeing your question today. So do not accept it as correct, but only as a suggestion to be examined. Particularly, examine where and how the assumptions might be violated. Also, in step (4), you might need "old" and "new" arrays to avoid working with new values when you need the old ones for consistency.

June 26, 2008, 16:04

The more systematic way to do it starts with computation of the medial surface of the domain, which is a huge topic in itself.

June 26, 2008, 18:40

My friend says that my approach is too idealistic. He uses a multiblock structured-mesh code with MPI. He says you work your way outward from the wall, projecting the distance onto the normal from the wall, marking each cell with its y+ (not so easy to do on an unstructured mesh). Normally, your wall boundary layer region should be within the subdomain where that part of the wall begins. In cells further away, in outer subdomains, you set a large value of y+ for each cell which is unlikely to contribute significantly to turbulence. It all sounds too ad hoc for me, but that is his two cents.

June 26, 2008, 18:41

And my friend's other suggestion was to switch to a k-omega turbulence model which does not require you to compute a distance normal to the wall. Sorry for the multiple posts.
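As a small illustration of why this sidesteps the problem: in the standard (Wilcox-type) k-omega closure the eddy viscosity is mu_t = rho * k / omega, which uses only quantities stored at the vertex itself. The function name below is hypothetical, and note that some variants, such as Menter's SST model, do bring the wall distance back in through their blending functions.

```python
def eddy_viscosity_k_omega(rho, k, omega):
    """mu_t = rho * k / omega: purely local, no wall distance needed."""
    return rho * k / omega


# Illustrative values only.
mu_t = eddy_viscosity_k_omega(rho=1.2, k=0.5, omega=100.0)
```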

June 27, 2008, 00:54

Hi Ananda,

Thanks a lot for your answers. I will examine your suggestion in detail. To me it sounds like a valid proposal.

Yes, I use MPI for interprocess communication. It all runs on a Linux Beowulf cluster that I have set up: cheap off-the-shelf PCs, but quite potent in a cluster.

Last night I was actually thinking about switching to a k-omega model instead to get rid of this vexing problem. I'll need to study k-w models in detail though, as I'm not terribly familiar with them.

Again, thanks for your time,

Martin

June 27, 2008, 14:31

You are welcome, Martin.

Posts in this thread (date, title, post number, author):
June 26, 2008, 03:38	Parallel implementation of k-e model	#1	Niels
June 26, 2008, 16:02	Re: Parallel implementation of k-e model	#2	Ananda Himansu
June 26, 2008, 16:04	Re: Parallel implementation of k-e model	#3	Ananda Himansu
June 26, 2008, 18:40	Re: Parallel implementation of k-e model	#4	Ananda Himansu
June 26, 2008, 18:41	Re: Parallel implementation of k-e model	#5	Ananda Himansu
June 27, 2008, 00:54	Re: Parallel implementation of k-e model	#6	Niels
June 27, 2008, 14:31	Re: Parallel implementation of k-e model	#7	Ananda Himansu
