CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   Fluent UDF and Scheme Programming (https://www.cfd-online.com/Forums/fluent-udf/)
-   -   fluent UDF parallel problem (https://www.cfd-online.com/Forums/fluent-udf/182382-fluent-udf-parallel-problem.html)

tanpeilai January 8, 2017 20:24

fluent UDF parallel problem
 
This is my UDF. I want to calculate the mass flow at the pressure outlet and return it to the mass flow inlet.
It works well in serial, but when I use it in parallel, the mass flow at the inlet is zero.
(It's a transient model; the mass is returned at the next timestep.)
I read the UDF manual, looked for the same question, and modified the UDF for weeks; I tried F_UDMI too, but it works wrong.
Could someone help me? Thank you very much!!

#include "udf.h"
#define P_outlet_ID 2

real flow = 0.;

DEFINE_PROFILE(MP_mass_pri_l, t, i)
{
    face_t f;

    begin_f_loop(f, t)
    {
        F_PROFILE(f, t, i) = flow;
    }
    end_f_loop(f, t)
}

DEFINE_EXECUTE_AT_END(MP_measure_mass_flow)
{
    Domain *d;
    Thread *th_p;
    face_t f;

    d = Get_Domain(1);
    th_p = Lookup_Thread(d, P_outlet_ID);

    flow = 0.0;
    begin_f_loop(f, th_p)
    {
        flow += F_FLUX(f, th_p);
    }
    end_f_loop(f, th_p)
}

tanpeilai January 8, 2017 22:10

I used Message to print the variable's value; only the last node returns the right value, the others are zero, and the monitor of the mass flow inlet is zero.

KevinZ09 January 9, 2017 05:24

Parallel UDFs work quite a bit differently than serial ones. The problem you're encountering probably comes from the outlet being on a different partition than the inlet. When you calculate the mass flow through the outlet, the partition that contains those faces updates its value of "flow". However, the partition responsible for the inlet still has "flow = 0.", so when the inlet mass flow is set, it remains zero. You're going to need a "node_to_host_real_1" call and a "host_to_node_real_1" call to make sure the updated value of flow is known by all the nodes.

F_UDMI doesn't work for a similar reason: the partition that contains the data is different from the partition that needs the data.

There's quite a good parallel UDF example in the UDF manual. It's in section 7.8.

See if it helps you. If not, let me know what you don't get or what's not working.
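The two communication macros mentioned above can be sketched like this (an untested fragment; "flow" stands for the global variable from the UDFs above, and the behavior described is per the Fluent UDF manual):

```c
/* Inside a DEFINE_EXECUTE_AT_END, after the node processes have computed
   their contribution to the global variable "flow": */
node_to_host_real_1(flow);  /* nodes -> host: the host receives the value (from node 0) */
host_to_node_real_1(flow);  /* host -> nodes: every node receives the value */
/* Both macros compile to nothing in the serial version, so the same source
   builds for the serial, host, and node executables. */
```

Since node_to_host_real_1 takes its value from node 0, the nodes must first agree on the total, e.g. via a global reduction such as PRF_GRSUM1.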

tanpeilai January 10, 2017 00:36

Thank you for your answer. I read the manual and modified my UDF, and I also found some macros that might help, but got no results. The UDFs I tried are as follows:
#include "udf.h"
#define P_outlet_ID 2

real flow;

DEFINE_PROFILE(mass_pri_l, t, i)
{
    face_t f;

    begin_f_loop(f, t)
    {
        F_PROFILE(f, t, i) = flow;
    }
    end_f_loop(f, t)
}

DEFINE_EXECUTE_AT_END(measure_mass_flow)
{
    Domain *d;
    Thread *th_p;
    face_t f;

    d = Get_Domain(1);
    th_p = Lookup_Thread(d, P_outlet_ID);

#if !RP_NODE
    flow = 0.0;
    begin_f_loop(f, th_p)
    {
        flow += F_FLUX(f, th_p);
    }
    end_f_loop(f, th_p)
#endif

    host_to_node_real_1(flow);
}

and another UDF uses flow = PRF_GRSUM1(flow):

#include "udf.h"
#define P_outlet_ID 2

DEFINE_PROFILE(mass_pri_l, t, i)
{
    Domain *d;
    Thread *th_p;
    face_t f1, f2;
    real flow;

    d = Get_Domain(1);
    th_p = Lookup_Thread(d, P_outlet_ID);

    flow = 0.0;
    begin_f_loop(f1, th_p)
    {
        flow += F_FLUX(f1, th_p);
    }
    end_f_loop(f1, th_p)

    flow = PRF_GRSUM1(flow);

    begin_f_loop(f2, t)
    {
        F_PROFILE(f2, t, i) = flow;
    }
    end_f_loop(f2, t)
}

Thank you!

KevinZ09 January 10, 2017 11:28

I'm short on time now, but will probably have more time tomorrow. My suggestion would be to use DEFINE_ADJUST instead of DEFINE_EXECUTE_AT_END: the former updates the value at the start of the timestep/iteration, and the DEFINE_PROFILE then adjusts your boundary value accordingly. Then hook the DEFINE_ADJUST macro.

tanpeilai January 11, 2017 01:44

Thank you for your kindness; I'm trying it now. I had used DEFINE_ADJUST too, but I think I didn't use it correctly. If I get it working, I'll let you know right away.

KevinZ09 January 11, 2017 05:23

Here's a UDF I think should work, though I haven't tried it. Either way, give it a shot, or compare it with yours, and see if it works. If not, or if yours doesn't, let me know.

Code:


#include "udf.h"
#define P_outlet_ID 2

  real flow;  /* defined outside because it will be used in multiple DEFINE macros */

DEFINE_ADJUST(adjust, domain)
{

  /* "Parallelized" Sections */
  #if !RP_HOST  /* Compile this section for computing processes only (serial
        and node) since these variables are not available on the host */
    Thread *thread;
    face_t f;
    thread = Lookup_Thread(domain, P_outlet_ID);

    flow = 0.0;

    begin_f_loop(f, thread) /* loop over all faces in thread "thread" */
    {
        /* If this is the node to which face "officially" belongs,*/
        if (PRINCIPAL_FACE_P(f,thread)) /* Always TRUE in serial version */
        {
          flow +=F_FLUX(f,thread);
        }
    }
    end_f_loop(f, thread)

    #if RP_NODE
        /* Perform node synchronized actions here. Does nothing in Serial */
        flow = PRF_GRSUM1(flow);
    #endif /* RP_NODE */

  #endif /* !RP_HOST */

}


DEFINE_PROFILE(mass_pri, thread, position)
{
  /* "Parallelized" Sections */
  #if !RP_HOST  /* Compile this section for computing processes only (serial
        and node) since these variables are not available on the host */
    face_t f;
    begin_f_loop(f, thread)
    {
        F_PROFILE(f, thread, position) = flow;
    }
    end_f_loop(f, thread)
 #endif /* !RP_HOST */
}


tanpeilai January 12, 2017 03:17

Thank you again, it works very well. When I change adjust to EXECUTE_AT_END, it works great too.

But there is a difference: with adjust, the inlet mass flow is from the previous iteration, so the two monitors (pressure outlet and mass flow inlet) aren't the same. With EXECUTE_AT_END they aren't the same either, but the inlet matches the previous timestep and doesn't change throughout the timestep, which I think may be better for mass conservation. Thank you.
step   flow-time    surf-mon-1   surf-mon-2
239    8.6180e+01   -4.8995e-02  0.0000e+00
240    8.6280e+01   -4.8406e-02  4.8995e-02
241    8.6380e+01   -4.7777e-02  4.8406e-02

#include "udf.h"
#define P_outlet_ID 2

real flow;  /* defined outside because it will be used in multiple DEFINE macros */

DEFINE_EXECUTE_AT_END(measure_mass_flow)
{
  /* "Parallelized" sections */
  #if !RP_HOST  /* Compile this section for computing processes only (serial
        and node) since these variables are not available on the host */
    Domain *domain;
    Thread *thread;
    face_t f;

    domain = Get_Domain(1);
    thread = Lookup_Thread(domain, P_outlet_ID);

    flow = 0.0;

    begin_f_loop(f, thread) /* loop over all faces in thread "thread" */
    {
        /* If this is the node to which the face "officially" belongs */
        if (PRINCIPAL_FACE_P(f, thread)) /* Always TRUE in the serial version */
        {
            flow += F_FLUX(f, thread);
        }
    }
    end_f_loop(f, thread)

    #if RP_NODE
        /* Perform node-synchronized actions here. Does nothing in serial */
        flow = PRF_GRSUM1(flow);
    #endif /* RP_NODE */

  #endif /* !RP_HOST */
}


DEFINE_PROFILE(mass_pri, thread, position)
{
  /* "Parallelized" sections */
  #if !RP_HOST  /* Compile this section for computing processes only (serial
        and node) since these variables are not available on the host */
    face_t f;
    begin_f_loop(f, thread)
    {
        F_PROFILE(f, thread, position) = flow;
    }
    end_f_loop(f, thread)
  #endif /* !RP_HOST */
}

It's almost the same as yours. Thank you very much.

KevinZ09 January 12, 2017 03:58

In steady-state runs, there isn't much of a difference between the two, except for when they are called and executed. DEFINE_EXECUTE_AT_END is indeed executed at the end of the iteration, but its outcome isn't used yet: you calculate the mass flow rate, but it isn't applied until the next iteration starts, since the macro runs at the end of the iteration. DEFINE_ADJUST is called at the start of the iteration, before Fluent updates anything else. So the value is applied at essentially the same point in either case; the only difference is when you, as the user, can access the value of flow. Either way, it isn't used until the next iteration and won't influence the solution until then. So, to my understanding, it won't affect mass conservation either, as the flow equations, residuals, and convergence checks have already been updated before DEFINE_EXECUTE_AT_END is called.

In transient runs it's different, though. DEFINE_EXECUTE_AT_END is called only at the end of a timestep, while DEFINE_ADJUST is called at the start of every iteration. So the latter is called more frequently if you've got multiple iterations per timestep. It depends on what you want as well.
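If the extra per-iteration calls are unwanted in a transient run, one possible workaround (an untested sketch; N_TIME is Fluent's timestep counter from the UDF manual) is to guard the DEFINE_ADJUST body so it only does its work once per timestep:

```c
DEFINE_ADJUST(adjust_once_per_step, domain)
{
    static int last_step = -1;   /* each process keeps its own copy */

    if (N_TIME == last_step)
        return;                  /* already executed during this timestep */
    last_step = N_TIME;

    /* ... mass-flow summation as in the UDF posted above ... */
}
```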

razi.me05 March 9, 2017 15:19

Hi, since this thread is recent and the context is similar, I thought I might ask my question here: I have written the following UDF for calculating power on a 2D wall. It runs totally fine in serial, but when I run in parallel it crashes with a SIGSEGV error. I put in some message flags to identify where it gets stuck, and I saw that it stops before
Code:

node_to_host_real_1(power);
and it does not execute any of the

Quote:

#if !RP_HOST
.
.
#endif
my udf is the following:

Code:

DEFINE_EXECUTE_AT_END(POWER_CALC_500)
{
       
       
        real power = 0.0;
       
        #if !RP_HOST
        Domain *dom;
        Thread *thl, *ths, *ct;
        Node *v;
        int n;
        cell_t c;
        face_t f;
        real x[2],y[2];
        real A[ND_ND];
        real dl;
        real tx, ty;
        real powl = 0.0, pows = 0.0;
        #endif
       
        #if !RP_NODE
        FILE *fp;
        fp = fopen("power_500.dat","a");
        #endif
       
       
        #if !RP_HOST
        dom = Get_Domain(1);
        thl = Lookup_Thread(dom,11);
        ths = Lookup_Thread(dom,14);
       
        begin_f_loop_int (f, thl)
        {
                if (PRINCIPAL_FACE_P(f, thl))
                {
                f_node_loop (f, thl, n)
                {
                        v = F_NODE (f, thl, n);
                        x[n] = NODE_X (v);
                        y[n] = NODE_Y (v);
                }

                dl = sqrt(pow(x[1]-x[0],2)+pow(y[1]-y[0],2));       
                c = F_C0(f, thl);
                ct = THREAD_T1(thl);
                tx = -C_P(c,ct)+mu*(2*C_DUDX(c,ct)+C_DVDX(c,ct)+C_DUDY(c,ct));
                ty = -C_P(c,ct)+mu*(2*C_DVDY(c,ct)+C_DVDX(c,ct)+C_DUDY(c,ct));
                powl += (tx*C_U(c,ct)+ty*C_V(c,ct))*dl;
                }
        }
        end_f_loop_int (f, thl);

       
        begin_f_loop_int (f, ths)
        {
                if (PRINCIPAL_FACE_P(f, ths))
                {
                f_node_loop (f, ths, n)
                {
                        v = F_NODE (f, ths, n);
                        x[n] = NODE_X (v);
                        y[n] = NODE_Y (v);
                }

                dl = sqrt(pow(x[1]-x[0],2)+pow(y[1]-y[0],2));       
                c = F_C0(f, ths);
                ct = THREAD_T0(ths);
                tx = -C_P(c,ct)+mu*(2*C_DUDX(c,ct)+C_DVDX(c,ct)+C_DUDY(c,ct));
                ty = -C_P(c,ct)+mu*(2*C_DVDY(c,ct)+C_DVDX(c,ct)+C_DUDY(c,ct));
                pows += (tx*C_U(c,ct)+ty*C_V(c,ct))*dl;
                }
        }
        end_f_loop_int (f, ths);


       
        power = powl-pows;
       
        #if RP_NODE
        power = PRF_GRSUM1(power);
        #endif
        #endif
       
       
       
        node_to_host_real_1(power);
               
        #if !RP_NODE
        fprintf(fp, "%1.6e %1.6e \n", CURRENT_TIME, power);
          fclose(fp);
        #endif
       
}

I would really appreciate if someone can help.

razi.me05 March 9, 2017 18:04

Never mind, I finally figured it out: THREAD_T1 does not even exist in my case.

YNREDDY December 9, 2019 00:51

begin_f_loop in parallel processing
 
Dear all,

I have a doubt about how to make sure that begin_f_loop executes on all compute nodes in parallel processing.

For instance, I have 200 faces on a boundary and I want to write a UDF that loops over those 200 faces. If I add a temporary variable to count how many times the loop is executed, I get 100 in parallel processing but 200 in serial processing.

What should be done in order to loop over all the faces on all compute nodes?

Please suggest a few examples.

AlexanderZ December 9, 2019 02:20

begin_f_loop loops over all the faces of the thread.

How did you get the value 100?
Quote:

In order to loop over all the faces in all compute nodes, what should be done?
What does that mean? Do you want each node to loop over every face?
Do you understand the principle of parallel computing? In parallel, the domain is split into several regions (one per compute node), and each node performs computations on its own part of the domain.

YNREDDY December 9, 2019 03:49

I have a UDF with loop like this -

begin_f_loop(face, thread_name)
{
    if (PRINCIPAL_FACE_P(face, thread_name))
        nf += 1;
    Th = F_T(face, thread_name);
}
end_f_loop(face, thread_name)

Here 'nf' is the number of times the loop executes, and since there are 200 faces on that boundary thread, I expect the loop to run 200 times. That is what I get with the serial solver, but not in parallel.
To find the number of times the loop executes, I did the following in the UDF:

DEFINE_SOURCE(s1, c, t, dS, eqn)
{
    real source;
    real w = 0.002;          /* thickness in m */
    real tf_area = 0.0016;   /* total area of interface in m2 */
    real vol = tf_area * w;

    source = nf / vol;
    dS[eqn] = 0.;            /* source is independent of the solved variable */
    return source;
}

DEFINE_SOURCE applies to all the cells in the volume. 'vol' is the total volume of the zone where I want to apply the source, so the source nf/vol integrated over the zone volume should give me 200 as the net imbalance in the total heat transfer rate report.
Note: I wrote this source UDF only to count the number of times my loop executes; I don't want the count of the total number of faces on all compute nodes.
By the way, the begin_f_loop is written inside the DEFINE_SOURCE macro.

This works in serial. In the parallel solver, I know that the domain is split among the compute nodes (8 nodes including node 0 in my case), and the heat transfer report shows 100. There are two things I can think of:
1. The loop runs 100 times, meaning there might be only 100 of the 200 faces of that boundary on some compute node.
2. The reports do not account for all compute nodes, even though begin_f_loop loops on all compute nodes.

So again, given that begin_f_loop runs on all compute nodes, how do I verify that in Fluent?

AlexanderZ December 10, 2019 00:52

If you need information about parallelizing your serial UDF, go to the ANSYS Fluent Customization Manual -> Parallel Considerations; you can find all the information there.

To count faces in parallel you need a global summation:

Code:

begin_f_loop(face, thread_name)
{
    if (PRINCIPAL_FACE_P(face, thread_name))
        nf += 1;
    Th = F_T(face, thread_name);
}
end_f_loop(face, thread_name)
.....
#if RP_NODE
num_face = PRF_GISUM1(nf); /* sum over all nodes */
#endif
Message0("num_face = %d\n", num_face);


