CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   CFX (https://www.cfd-online.com/Forums/cfx/)
-   -   User fortran to input/output file in a parallel run (https://www.cfd-online.com/Forums/cfx/185633-user-fortran-input-output-file-parallel-run.html)

doublestrong March 30, 2017 05:18

User fortran to input/output file in a parallel run
 
I managed to use User Fortran to read and write files in a parallel run, but I am confused about the role of the master process.

I am using User Fortran to assign the pressure value at the outlet of my computational domain. The User Fortran code runs a one-dimensional unsteady simulation whose time step is the same as that of the three-dimensional simulation. So I need to save the transient results of the User Fortran code and, sometimes, read them back as the initial solution for it.

My simulation runs in parallel. The help documentation already covers file input/output in a parallel run. Here it is:

The CFX-Solver is designed so that all of the numerically intensive tasks can be performed in parallel. Administrative tasks, such as simulation control and user interaction, as well as the input/output phases of a parallel run, are performed serially by the master process. This approach guarantees good parallel performance and scalability of the parallel code, as well as ensuring that the input/output files are like those of a sequential run.


Therefore I only let the master process write the file in my User Fortran code. Here is the output code block:

! PAR_MODE .EQ. 1 identifies the master process and RUN_MODE .EQ. 0 a
! serial run (both values come from GET_PARALLEL_INFO; see the input
! block further down), so only the master or a serial run writes the file
if ((PAR_MODE .EQ. 1) .OR. (RUN_MODE .EQ. 0)) then
  ! build the file name from the current time step number
  write(ctstep,*) currenttimestep
  open(12, FILE=('pipeSolutionFile'//trim(adjustl(ctstep))))
  ! <xnodes> is a variable-format compiler extension (Intel/DEC):
  ! one record of xnodes F12.4 fields per array
  write(12,'(<xnodes>F12.4)') u
  write(12,'(<xnodes>F12.4)') a
  write(12,'(<xnodes>F12.4)') aA
  write(12,'(<xnodes>F12.4)') lamda
  write(12,'(<xnodes>F12.4)') beta
  close(12)
endif


However, when my code reads the file at the very beginning to get the initial solution, the solver crashed in a parallel run (but worked well in a serial run) if I let only the master process read. After I removed the code restricting the read to the master process, the input phase worked well in a parallel run. Here is the input code block:

! restricting the read to the master process (the lines commented out
! below) caused the parallel run to crash:
! call GET_PARALLEL_INFO ('PARMOD',PAR_MODE,CNAME,CRESLT)
! call GET_PARALLEL_INFO ('RUNMOD',RUN_MODE,CNAME,CRESLT)
! if ((PAR_MODE .EQ. 1).OR.(RUN_MODE .EQ. 0)) then
! every process now reads the file for itself
open(11, FILE='../pipeSolutionFile')
read(11,'(<xnodes>F12.4)') u
read(11,'(<xnodes>F12.4)') a
read(11,'(<xnodes>F12.4)') aA
read(11,'(<xnodes>F12.4)') lamda
read(11,'(<xnodes>F12.4)') beta
close(11)
! endif


All of u, a, aA, lamda, and beta are declared with the SAVE attribute, which I think would help share data between the master and slave processes. A minimal sketch of what I mean by that is below.
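To be concrete (names invented for illustration, not my actual CFX routine), this is how I rely on SAVE to keep data between calls:

subroutine count_calls()
  implicit none
  integer, save :: ncalls = 0   ! SAVE: the value persists from one call to the next
  ncalls = ncalls + 1
  print *, 'this routine has now been called', ncalls, 'times'
end subroutine count_calls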

I have two questions:
1. When a simulation starts, is the master process the first one to call the user routine? If it is not, then the slave process that is called first may need to do the input phase in my case.
2. Are slave processes able to execute READ but not able to execute WRITE? If only the master can safely touch the file, the pattern I have in mind is sketched below.
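In plain MPI it would look roughly like this (illustrative only; I do not know whether direct MPI calls like this are allowed from CFX User Fortran, and the array length n is assumed):

program master_read_bcast
  use mpi
  implicit none
  integer, parameter :: n = 100          ! assumed array length
  double precision :: u(n)
  integer :: rank, ierr
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  if (rank == 0) then
     ! only the master touches the file
     open(11, file='pipeSolutionFile')
     read(11, *) u
     close(11)
  end if
  ! every process receives the master's copy of u
  call MPI_Bcast(u, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
  call MPI_Finalize(ierr)
end program master_read_bcast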

Comments are welcome,
Thanks.

juliom March 30, 2017 08:33

Are you using a distributed or a shared memory approach? I also use parallel computing for my codes, but I always stay away from parallel writing and reading. There are two reasons: race conditions and overhead. I'm sure there are libraries and good practices for reading and writing in parallel computing, but I'd rather pay a price at the last stage of my computations when writing, or when reading.

juliom March 30, 2017 08:43

I forgot to answer your questions. The first one depends on whether you are using distributed or shared memory. For the second: a slave can do both. The real problem is: are you reading and writing in the proper manner? Is your final file contiguous, as you expected?

doublestrong March 30, 2017 21:41

Hi Julio, I have no idea about the distributed and shared memory approaches. I only know that I select Distributed Parallel, Local Parallel, or Serial while defining the run. Based on your answer, I think I may be using the distributed memory approach. The output file is what I expected, and the data read from the input file is also good.

Could you tell me a little bit more about the distributed memory approach, the shared memory approach, race conditions, and overhead? It would also be great to know where to find them in the help documentation or other materials.

doublestrong March 30, 2017 21:49

By the way, I have a flag variable in my User Fortran code. The flag monitors whether the 3D simulation has reached a new time step. If it has, the main body of the User Fortran code runs and, in the case of the first time step, the code reads the input file to get an initial solution; if it is not a new time step, the main body is skipped and the results computed and stored during the call at the previous time step are returned directly. I do this to avoid the main body being executed frequently and unnecessarily by all master and slave processes, since I want it to run only once per time step of the 3D simulation. Although the routine is called many times, the time cost is reduced. A rough sketch of the pattern is below.
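Roughly (routine and variable names invented for illustration):

subroutine outlet_pressure(ctstep, p_out)
  implicit none
  integer, intent(in) :: ctstep          ! current 3D time step number
  double precision, intent(out) :: p_out
  integer, save :: last_step = -1        ! time step seen at the previous call
  double precision, save :: p_saved      ! result cached from that step
  if (ctstep /= last_step) then
     ! new 3D time step: run the 1D main body once and cache the result
     p_saved = dble(ctstep)              ! placeholder for the real 1D solve
     last_step = ctstep
  end if
  p_out = p_saved                        ! later calls in the same step reuse the cache
end subroutine outlet_pressure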

juliom March 31, 2017 08:15

Are you using a commercial code? What you call a flag is, I think, an instruction from the main code. Nevertheless, if someone else wrote the code for you, you are very lucky, because writing parallel code is not easy and not a straightforward task, since you need to consider what I said before.
Shared memory is basically OpenMP: in this approach you spread the work among the threads (the cores of a multi-core processor). Of course it is more complex than this, but put this way you can easily get the gist of it. In this approach all the variables are shared in memory. Imagine that your threads access the same variable in RAM and can overwrite what another thread did; you need to structure your code in such a way that this problem does not happen. Usually you declare private variables in this approach, as in the sketch below.
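A tiny OpenMP sketch of the private-variable idea (not related to your CFX case, just to illustrate):

program omp_private_demo
  implicit none
  integer :: i, tmp
  double precision :: x(1000)
  ! tmp is PRIVATE: each thread gets its own copy, so one thread cannot
  ! overwrite another thread's intermediate value (no race condition)
  !$omp parallel do private(tmp)
  do i = 1, 1000
     tmp = 2*i                ! intermediate work, thread-local
     x(i) = dble(tmp)         ! each iteration writes a distinct element
  end do
  !$omp end parallel do
  print *, 'x(1000) =', x(1000)
end program omp_private_demo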

Distributed memory is MPI (Message Passing Interface). This approach is more complex because now each task has its own private section of memory; you no longer talk about threads but about tasks.

You can visualize OpenMP as multithreaded computation: all the computations are performed on the same node (motherboard). With MPI you send information over a network (usually a very fast one; InfiniBand is one such specification) to different nodes or computers. A matching sketch is below.
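In MPI every task owns a private copy of each variable, and data moves between tasks only through explicit messages. A minimal sketch:

program mpi_private_demo
  use mpi
  implicit none
  integer :: rank, ierr, val
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  val = 10*rank              ! each task sets its own, independent copy
  print *, 'task', rank, 'holds val =', val
  call MPI_Finalize(ierr)
end program mpi_private_demo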

In both approaches you need to deal with race conditions (one task finishing before another), overhead (the price you pay for communication), and complexity (this is not a serial approach). Writing and reading in a parallel environment is very complex and not efficient, because you need to make sure that you are reading the information in the same way it is laid out in memory or in the file. Usually this information is contiguous in memory, i.e. 1, 2, 3, 4, 5; if you use different threads or tasks, there is no guarantee that your output will be contiguous too. The usual workaround is the one your solver already uses: gather everything onto one task and do a single, ordered write, as sketched below.
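A plain-MPI sketch of gather-then-write (assuming equal chunk sizes on every task; names invented):

program gather_then_write
  use mpi
  implicit none
  integer, parameter :: nloc = 4            ! elements per task (assumed equal)
  double precision :: mine(nloc)
  double precision, allocatable :: all(:)
  integer :: rank, nprocs, ierr, i
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  mine = (/ (dble(rank*nloc + i), i = 1, nloc) /)   ! this task's chunk
  allocate(all(nloc*nprocs))
  ! collect the chunks onto task 0, in rank order, so they are contiguous
  call MPI_Gather(mine, nloc, MPI_DOUBLE_PRECISION, &
                  all,  nloc, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
  if (rank == 0) then
     open(12, file='solution.dat')
     write(12, *) all            ! one ordered write from one task
     close(12)
  end if
  call MPI_Finalize(ierr)
end program gather_then_write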

This is a VERY quick explanation, and maybe a computer scientist will disagree with me, but as an engineer who uses supercomputers to get my work done, this is the easiest way I have understood it.

