Potentially redundant set of computations for G object within turbulence models

HPE · August 16, 2020, 08:53

Hi,

Some of the turbulence models compute G (i.e. the turbulent kinetic energy production rate due to the anisotropic part of the stress tensor). For example, in kEpsilon:

Code:

    volScalarField::Internal G
    (
        this->GName(),
        nut.v()*(dev(twoSymm(tgradU().v())) && tgradU().v())
    );
    tgradU.clear();

Here, we compute the double-inner product of a deviatoric-symmetric tensor, dev(twoSymm(tgradU().v())), with a full tensor, tgradU().v().

Any tensor can be split into its symmetric and anti-symmetric parts, and the double-inner product of a symmetric tensor with an anti-symmetric tensor is (as far as I know) always zero.

Therefore, the above double-inner product can be reduced to one between two symmetric tensors without any loss of accuracy in the final outcome.
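The argument can be written out as a short derivation. This is only a sketch: the symbols A (standing for dev(twoSymm(gradU))), S and W (the symmetric and anti-symmetric parts of the velocity gradient) are notation assumed here for illustration, not names from the code, and the double-inner product is taken as A : B = A_ij B_ij.

```latex
\nabla U = S + W, \qquad
S = \tfrac{1}{2}\left(\nabla U + (\nabla U)^{T}\right), \qquad
W = \tfrac{1}{2}\left(\nabla U - (\nabla U)^{T}\right)

A : W = A_{ij} W_{ij} = A_{ji} W_{ji} = A_{ij}\,(-W_{ij}) = -\,A : W
\;\Longrightarrow\; A : W = 0 \quad \text{for any symmetric } A

A : \nabla U = A : S + A : W = A : S
```

The second line only relabels the summation indices and then uses A_ji = A_ij and W_ji = -W_ij.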

Question: Is there any reason, to your knowledge, why such a reduction is not, or should not be, performed?

Such reduction will help to considerably reduce the computational costs.

Wenyuan · August 16, 2020, 15:22

Hi,

Have you made tests to support the following statement?

Quote:

Such reduction will help to considerably reduce the computational costs.

The double dot operation should be quite fast since it only involves a few multiplications and additions. Moreover, you don't get the symmetric tensor for free. So I don't expect your approach to be "considerably" faster.

Also, the native implementation is consistent with the definition.

HPE · August 17, 2020, 04:02

Quote:

The double dot operation should be quite fast since it only involves a few multiplications and additions. Moreover, you don't get the symmetric tensor for free. So I don't expect your approach to be "considerably" faster.

I respectfully disagree:

- The cost reduction comes from needing only a single symmetric tensor instead of a symmetric tensor and a full tensor. If such a reduction is possible (and I think it is, without loss of generality), there will not be two different tensors to double-product, but only a single symmetric one. Instead of storing and computing 18 floating-point numbers, there will be only 6. So the saving would be 12 floating-point numbers per cell to compute and to store (9 from the tensor, 3 from the symmetric tensor).
- Also, please note that the gradient operation is one of the most expensive standalone operations in OpenFOAM (though still not sure, the symmetrized gradient can be computed without computing the full gradient in OpenFOAM).
- In fact, the same reduction is carried out in [Pope, Turbulent flows, p. 126]. That is what encouraged me to look into the possibility.

Quote:

Also, the native implementation is consistent with the definition.

Does this mean that the aforementioned (possible) complexity reduction would be inconsistent with the definition?

I respectfully don't think so; I see no contradiction.

Many thanks for your answers.

Wenyuan · August 17, 2020, 05:42

Hi,

Quote:

- The cost reduction comes from needing only a single symmetric tensor instead of a symmetric tensor and a full tensor. If such a reduction is possible (and I think it is, without loss of generality), there will not be two different tensors to double-product, but only a single symmetric one. Instead of storing and computing 18 floating-point numbers, there will be only 6. So the saving would be 12 floating-point numbers per cell to compute and to store (9 from the tensor, 3 from the symmetric tensor).

I am afraid there are issues in your calculation.
The double-inner-product operator for a symmetric tensor and a full tensor is defined as

Code:

    //- Double-inner-product of a SymmTensor and a Tensor
    template<class Cmpt>
    inline Cmpt
    operator&&(const SymmTensor<Cmpt>& st1, const Tensor<Cmpt>& t2)
    {
        return
        (
            st1.xx()*t2.xx() + st1.xy()*t2.xy() + st1.xz()*t2.xz() +
            st1.xy()*t2.yx() + st1.yy()*t2.yy() + st1.yz()*t2.yz() +
            st1.xz()*t2.zx() + st1.yz()*t2.zy() + st1.zz()*t2.zz()
        );
    }

So we have 15, instead of 18, floating-point numbers in total. There are 9 multiplications and 8 additions, so 17 operations in total.

The same operator for two symmetric tensors is defined as

Code:

    //- Double-dot-product between a symmetric tensor and a symmetric tensor
    template<class Cmpt>
    inline Cmpt
    operator&&(const SymmTensor<Cmpt>& st1, const SymmTensor<Cmpt>& st2)
    {
        return
        (
            st1.xx()*st2.xx() + 2*st1.xy()*st2.xy() + 2*st1.xz()*st2.xz()
                              +   st1.yy()*st2.yy() + 2*st1.yz()*st2.yz()
                                                    +   st1.zz()*st2.zz()
        );
    }

There are 9 multiplications and 5 additions, so 14 operations in total. As you can see, the saving in the number of operations is only 3. Also, the code does not store such tensors; the corresponding memory is freed once the calculation is done.
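To see the two operators side by side, here is a minimal, self-contained sketch that does not use OpenFOAM at all. The types Tensor9 and SymmTensor6 and the helpers symmPart, devTwoSymm, ddot and reductionDifference are hypothetical stand-ins written for this illustration (assuming C++14 or later); they only mimic the component formulas quoted above.

```cpp
// Hypothetical stand-ins for OpenFOAM's Tensor<scalar> and SymmTensor<scalar>.
// These are NOT the library classes; they only carry the components.
struct Tensor9 { double xx, xy, xz, yx, yy, yz, zx, zy, zz; };
struct SymmTensor6 { double xx, xy, xz, yy, yz, zz; };

// Symmetric part S = (T + T^T)/2.
constexpr SymmTensor6 symmPart(const Tensor9& t)
{
    return {t.xx, 0.5*(t.xy + t.yx), 0.5*(t.xz + t.zx),
            t.yy, 0.5*(t.yz + t.zy), t.zz};
}

// Mimics dev(twoSymm(T)) = 2S - (2/3) tr(S) I.
constexpr SymmTensor6 devTwoSymm(const Tensor9& t)
{
    const SymmTensor6 s = symmPart(t);
    const double k = (2.0/3.0)*(s.xx + s.yy + s.zz);
    return {2*s.xx - k, 2*s.xy, 2*s.xz, 2*s.yy - k, 2*s.yz, 2*s.zz - k};
}

// Symmetric && full tensor: 9 multiplications, 8 additions.
constexpr double ddot(const SymmTensor6& a, const Tensor9& b)
{
    return a.xx*b.xx + a.xy*b.xy + a.xz*b.xz
         + a.xy*b.yx + a.yy*b.yy + a.yz*b.yz
         + a.xz*b.zx + a.yz*b.zy + a.zz*b.zz;
}

// Symmetric && symmetric tensor: 9 multiplications, 5 additions.
constexpr double ddot(const SymmTensor6& a, const SymmTensor6& b)
{
    return a.xx*b.xx + 2*a.xy*b.xy + 2*a.xz*b.xz
                     +   a.yy*b.yy + 2*a.yz*b.yz
                                   +   a.zz*b.zz;
}

// Difference between the generic form and the reduced form; it should
// vanish up to round-off for any velocity gradient.
constexpr double reductionDifference(const Tensor9& gradU)
{
    const SymmTensor6 d = devTwoSymm(gradU);
    return ddot(d, gradU) - ddot(d, symmPart(gradU));
}
```

Calling reductionDifference on an arbitrary non-symmetric gradient should return zero up to round-off, which is the reduction discussed in this thread. Note that the sketch still builds the symmetric part from the full gradient, so it says nothing about the cost of the gradient itself.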

Quote:

- Also, please note that the gradient operation is one of the most expensive standalone operations in OpenFOAM (though still not sure, the symmetrized gradient can be computed without computing the full gradient in OpenFOAM).

Could you please direct me to the code where such symmetrized gradients are calculated?

Also, please note that this piece of code is only a tiny part of the turbulence model, and only involves the most basic operations, e.g. + and *. Usually, the pressure Poisson equation takes much more time to solve than the turbulence model does. So I would say the reduction would be negligible in terms of overall simulation time.

Quote:

Does this mean that the aforementioned (possible) complexity reduction would be inconsistent with the definition?

I meant to say that it is the form you get directly from mathematical derivations, where only the velocity gradient, not its symmetric part, is present. Such a consistency makes it easier to understand the physical meaning of the implemented term.

HPE · August 17, 2020, 06:44

Hi,

I am afraid that you have also misunderstood some of my remarks, even though I did make a summation mistake; i.e. I never talked about the number of floating-point operations, but about the floating-point numbers themselves.

- Tensor<Cmpt> stores 9 elements, i.e. grad(U).
- SymmTensor<Cmpt> stores 6 elements, i.e. symm(grad(U)).
- These are kept in memory no matter how many of them are used in the double-inner product of a SymmTensor and a Tensor.
- So, in total we have 15 elements per cell in memory.
- The potential reduction would require storing only a single SymmTensor<Cmpt>, i.e. symm(grad(U)), which has 6 elements.
- The saving in memory storage, allocation and deallocation would then be 9 elements per cell per iteration step. grad(U) might or might not be kept in memory if it is not cached, but the allocation and deallocation have their own cost, not to mention the peak memory usage and the data transfers to the CPU cache.
- The double-inner product of a SymmTensor and a Tensor has 9 multiplications + 8 summations.
- The double-inner product of a (different) SymmTensor and a SymmTensor has 6 multiplications + 5 summations, since the compile-time constant "2" will definitely be optimised away.
- Further the double-inner product of the same SymmTensor can be coded to reduce the above (magSqr?).
- It seems that I forgot to add "whether" in the sentence: "(though still not sure whether the symmetrized gradient can be computed without computing the full gradient in OpenFOAM)." So there is no such symmetric-gradient function (yet), but if we had one, we could reduce the cost even further.
- The Poisson equation is solved by a set of standalone operations, whereas the gradient computation is a single standalone operation, as I said, and I argue that it is one of the most expensive standalone functions. I did not compare the cost of the two, since one is an apple and the other an orange.
- It seems that the reduction is doable (I have found that that exact reduction had been carried out, and hardcoded in realizableKE), but not preferable in light of the arguments you have provided. Fair enough.
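The storage count in the list above can be turned into a rough per-mesh estimate. The sketch below takes that count at face value, i.e. it assumes that a 9-component and a 6-component double-precision temporary are alive at the same time; whether that actually happens depends on how the temporaries are managed. The names tensorScalars, symmTensorScalars, fieldBytes and bytesSaved are hypothetical and are not OpenFOAM functions.

```cpp
#include <cstddef>

// Scalars stored per cell, as counted in the post above.
constexpr std::size_t tensorScalars = 9;      // full gradient (Tensor<Cmpt>)
constexpr std::size_t symmTensorScalars = 6;  // symmetric tensor (SymmTensor<Cmpt>)

// Bytes occupied by a temporary field that holds scalarsPerCell doubles
// in each of nCells cells (assumption: scalars are stored as double).
constexpr std::size_t fieldBytes(std::size_t nCells, std::size_t scalarsPerCell)
{
    return nCells*scalarsPerCell*sizeof(double);
}

// Bytes saved if only the symmetric field had to exist, compared with
// holding the full gradient and the symmetric tensor side by side.
constexpr std::size_t bytesSaved(std::size_t nCells)
{
    return fieldBytes(nCells, tensorScalars + symmTensorScalars)
         - fieldBytes(nCells, symmTensorScalars);
}
```

On a platform where a double takes 8 bytes, bytesSaved(1000000) evaluates to 72,000,000 bytes, i.e. the 9 scalars per cell mentioned above for a mesh of one million cells.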

Many thanks for your remarks, and contribution. Highly appreciated.

Wenyuan · August 17, 2020, 07:50

Hi,

Quote:

I have found that that exact reduction had been carried out, and hardcoded in realizableKE

I see your point. So you are working with incompressible turbulence models which allow you to further simplify the deviatoric tensor by using the divergence-free condition. And your observation should only be true for quite old implementations (< OpenFOAM-v3.0) where there were separate implementations for incompressible turbulence models.

Quote:

- Further the double-inner product of the same SymmTensor can be coded to reduce the above (magSqr?).

magSqr will do the work. However, the total number of operations is the same if you check the source code.

Currently, most turbulence models are written in a manner that makes maintenance easier by avoiding code duplication. The side effect is that this might not be the most efficient form for some flows, especially single-phase, strictly incompressible flows.
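The incompressible shortcut mentioned above can be illustrated with a small standalone sketch. It assumes that, for a divergence-free velocity field, the generic expression dev(twoSymm(gradU)) && gradU collapses to twice the squared magnitude of the symmetric part. The types and the functions symmPart, magSqrSymm, productionGeneric and productionIncompressible are hypothetical names for this illustration (C++14 or later); magSqrSymm only mimics the component formula of a symmetric double-dot product and is not the OpenFOAM function.

```cpp
// Hypothetical stand-ins for a full and a symmetric 3x3 tensor.
struct Tensor9 { double xx, xy, xz, yx, yy, yz, zx, zy, zz; };
struct SymmTensor6 { double xx, xy, xz, yy, yz, zz; };

// Symmetric part S = (T + T^T)/2.
constexpr SymmTensor6 symmPart(const Tensor9& t)
{
    return {t.xx, 0.5*(t.xy + t.yx), 0.5*(t.xz + t.zx),
            t.yy, 0.5*(t.yz + t.zy), t.zz};
}

// S && S for a symmetric tensor; off-diagonal terms count twice.
constexpr double magSqrSymm(const SymmTensor6& s)
{
    return s.xx*s.xx + 2*s.xy*s.xy + 2*s.xz*s.xz
                     +   s.yy*s.yy + 2*s.yz*s.yz
                                   +   s.zz*s.zz;
}

// Generic form: (2S - (2/3) tr(S) I) && gradU, i.e. G divided by nut.
constexpr double productionGeneric(const Tensor9& g)
{
    const SymmTensor6 s = symmPart(g);
    const double k = (2.0/3.0)*(s.xx + s.yy + s.zz);
    const double dxx = 2*s.xx - k, dyy = 2*s.yy - k, dzz = 2*s.zz - k;
    const double dxy = 2*s.xy, dxz = 2*s.xz, dyz = 2*s.yz;
    return dxx*g.xx + dxy*g.xy + dxz*g.xz
         + dxy*g.yx + dyy*g.yy + dyz*g.yz
         + dxz*g.zx + dyz*g.zy + dzz*g.zz;
}

// Shortcut form 2*(S && S); equal to the generic form only when
// the trace of gradU (the velocity divergence) is zero.
constexpr double productionIncompressible(const Tensor9& g)
{
    return 2.0*magSqrSymm(symmPart(g));
}
```

For a trace-free gradient the two functions agree up to round-off; for a gradient with non-zero trace they differ by (2/3) times the squared trace, which is why the shortcut is tied to strictly incompressible flow.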

HPE · August 17, 2020, 12:17

Sure, thank you.

Just out of curiosity now (let's forget about the cost/code maintenance issue):

I think this reduction can be applied to compressible flows as well?

Am I missing a point in this concluding remark?

Wenyuan · August 17, 2020, 14:59

I agree with you. In the compressible flow case, one needs to play with the diagonal of the symmetric matrix to calculate the extra term.
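The "extra term" can be made explicit with a short derivation. This is a sketch in notation assumed here (S is the symmetric part of the velocity gradient, I the identity tensor), using the definition dev(A) = A - (1/3) tr(A) I and the fact that the anti-symmetric part drops out of the double-inner product:

```latex
\operatorname{tr}(S) = \nabla \cdot U, \qquad
\operatorname{dev}(2S) = 2S - \tfrac{2}{3}\,(\nabla \cdot U)\, I

\operatorname{dev}(2S) : \nabla U
  = \operatorname{dev}(2S) : S
  = 2\, S : S - \tfrac{2}{3}\,(\nabla \cdot U)^{2}
```

For a divergence-free velocity field the last term vanishes and only 2 S : S remains.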

Anyway, this topic is interesting. I would appreciate it if you could share your findings in the future.
