CFX parallel set up hangs after calculations finished |
|
July 23, 2018, 10:43 |
CFX parallel set up hangs after calculations finished
|
#1 |
Member
James Gross
Join Date: Nov 2017
Posts: 77
Rep Power: 8 |
Hi everyone!
I have set up a CFX case of a centrifugal pump. I am currently running a laminar case to try to troubleshoot some issues (probably with mesh quality) which lead to crashes in the case of turbulent simulations. The simulation runs perfectly fine in serial, however when I attempt to run it in parallel, using Intel MPI local parallel set up, the CFX workbench hangs after it finishes the calculations, never writing any output files. On the other hand, checking CPU usage shows that all of the designated processors are being used by solver-mpi.exe. Furthermore, because it never finishes, there is not any error messages. I have attached my runInput.ccl file as well as the out file, however I can't seem to find much information about what the issue could be in either of them. I'm having difficulty pinpointing what it could be. Does anybody have any advice on how I can troubleshoot this issue? If there is anything else I can send to make it easier to diagnose the problem, please ask. Regards, James |
|
July 23, 2018, 16:41 |
|
#2 |
Member
Join Date: Jan 2015
Posts: 62
Rep Power: 11 |
If you're running CFX standalone, you can turn verbose mode on by adding -v to the start command in startmethods.ccl, under the Intel MPI Local Parallel start method.

What versions of ANSYS and Intel MPI are you using?
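For reference, a sketch of what the Intel MPI Local Parallel start command in startmethods.ccl ends up as once -v is added (this is the 18.2 form quoted elsewhere in this thread; the exact template and flags are release-dependent):

```
Start Command = "%{impidir}/bin/mpirun" -v -np %{partitions} -bootstrap fork %{executable} %{arguments} < /dev/null
```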
|
July 24, 2018, 06:34 |
|
#3 |
Member
James Gross
Join Date: Nov 2017
Posts: 77
Rep Power: 8 |
Hi Christophe,
Thank you for taking the time to assist me. I am using ANSYS 18.2 and Intel MPI 5.1.3.223. I was actually using the Workbench before, but I have now run it from the command line using:

Code:
cfx5solve -def Pump.def -par-local -partition 6

I also enabled verbose mode in startmethods.ccl as you suggested:

Code:
Start Command = "%{impidir}/bin/mpirun" -v -np %{partitions} -bootstrap fork %{executable} %{arguments} < /dev/null

The last thing it prints to the out file is:

Code:
+--------------------------------------------------------------------+
|                    Variable Range Information                      |
+--------------------------------------------------------------------+

Domain Name : Rotating
+--------------------------------------------------------------------+
| Variable Name                               |    min    |    max   |
+--------------------------------------------------------------------+
| Density                                     |  9.97E+02 | 9.97E+02 |
| Specific Heat Capacity at Constant Pressure |  4.18E+03 | 4.18E+03 |
| Dynamic Viscosity                           |  8.90E-04 | 8.90E-04 |
| Thermal Conductivity                        |  6.07E-01 | 6.07E-01 |
| Static Entropy                              |  0.00E+00 | 0.00E+00 |
| Velocity u                                  | -1.40E+01 | 1.41E+01 |
| Velocity v                                  | -1.11E+01 | 1.12E+01 |
| Velocity w                                  | -7.41E+00 | 7.25E+00 |
| Pressure                                    |  1.11E+04 | 1.20E+05 |
| Temperature                                 |  2.98E+02 | 2.98E+02 |
+--------------------------------------------------------------------+

Domain Name : Stationary
+--------------------------------------------------------------------+
| Variable Name                               |    min    |    max   |
+--------------------------------------------------------------------+
| Density                                     |  9.97E+02 | 9.97E+02 |
| Specific Heat Capacity at Constant Pressure |  4.18E+03 | 4.18E+03 |
| Dynamic Viscosity                           |  8.90E-04 | 8.90E-04 |
| Thermal Conductivity                        |  6.07E-01 | 6.07E-01 |
| Static Entropy                              |  0.00E+00 | 0.00E+00 |
| Velocity u                                  | -5.49E+00 | 4.87E+00 |
| Velocity v                                  | -5.35E+00 | 6.22E+00 |
| Velocity w                                  | -3.87E+00 | 3.05E+00 |
| Pressure                                    |  7.12E+04 | 1.09E+05 |
| Temperature                                 |  2.98E+02 | 2.98E+02 |
+--------------------------------------------------------------------+

From then on it is unresponsive, although all 6 processors are being used 100% by the solver. In the end I have to kill the process to free them up. Any suggestions as to what I can do to fix this?

Regards,
James
|
July 24, 2018, 11:00 |
|
#4 |
Member
Join Date: Jan 2015
Posts: 62
Rep Power: 11 |
Did you uninstall the old version of Intel MPI when you upgraded to 18.2? It may still be referencing the old MPI in the registry.
|
|
July 24, 2018, 12:14 |
|
#5 |
Member
James Gross
Join Date: Nov 2017
Posts: 77
Rep Power: 8 |
Hi Christophe,
This is my first and only install of ANSYS, and hence of Intel MPI, so unless the call to mpirun is interfering with a previous install of mpirun, that shouldn't be it.

As a little extra information: checking the version of the mpirun on my PATH shows it is only version 1.10, meaning it is the built-in mpirun that came with my OS. However, the long text file I sent earlier indicates that for the parallel run, mpirun is sourced from the <Ansys_Root>/commonfiles directory. I would rather not uninstall my original mpirun, as I need it for other software. Furthermore, it seems to me that if the wrong MPI were sourced, the program would not run at all, rather than just having difficulty writing results.

Any other ideas about what could be causing this?

Regards,
James
|
July 24, 2018, 14:05 |
|
#6 |
Member
Join Date: Jan 2015
Posts: 62
Rep Power: 11 |
We had problems like this when we upgraded to 17.1; they seem to have taken the bad commands out of startmethods.ccl in 18.2.

A few other things I have seen cause this kind of hang:
- An antivirus program (McAfee) trying to scan a file after the solve was done (that was with FEA, however).
- Problems that only affected certain Intel processors and not others, mostly with Platform MPI and its -affwidth and -affcycle options.
- The .def.lck and .res.lck files in the solving directory getting locked, so ANSYS would just freeze at that point. I had to manually delete the .res.lck file in the solver directory to move forward.

Try checking this last one.
|
July 25, 2018, 06:57 |
|
#7 |
Member
James Gross
Join Date: Nov 2017
Posts: 77
Rep Power: 8 |
Hi Christophe,
Thank you for the advice. I have attempted to delete any lock files present in the working directory. I noticed there were two. One was a file called sm.<userName>.<PID>.lock, which appeared at the start of the simulation and which I deleted before the solver finished; I guess it was there to ensure the processors were not used for other purposes while the solver was running. After the solver finished, another lock file was generated, zFas3A4b.lock, which locks a very long binary file with indecipherable contents. Possibly it contains the solution information, but I am not sure. I deleted this lock file as well, as soon as I could, but there was no change in the behaviour of the solver. It still hangs once it outputs the Variable Range Information, just as before.

I have been looking in the working directory for hints as to why it is behaving this way. After deleting the previous lock files, it contains the following files:

Code:
ccl  cfx5.mms  cfx5.tt  def  gui-err.txt  mms  mms.setup  mms.setup.attrb
mon  mpd.hosts  out  par  pids  zFas3A4b

I really do not understand why it is behaving this way. I have not been able to test whether it is problem-specific because, as a student, I do not seem to have access to any benchmarks to test against. If anyone has any suggestions about what could be causing this, or anything else I can do to determine the problem, any help would be greatly appreciated.

Regards,
James
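For anyone hitting the same symptom, here is a small sketch of the cleanup step described above: removing leftover lock files from a run directory after a hung solve has been killed. The file patterns (*.lck, *.lock) are assumptions based on the names mentioned in this thread, not an official CFX cleanup procedure.

```shell
#!/bin/sh
# Remove stale CFX lock sentinels from a run directory.
# Only call this after the solver processes have been killed;
# deleting locks under a live run could corrupt it.
clean_cfx_locks() {
    rundir="$1"
    # Only touch lock files, never the .def/.res/.out files themselves
    find "$rundir" -maxdepth 1 \( -name '*.lck' -o -name '*.lock' \) -print -delete
}
```

Usage would be something like `clean_cfx_locks Pump_001.dir` from the directory containing the run folder.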
|
December 25, 2018, 08:42 |
|
#8 |
New Member
Join Date: Nov 2018
Posts: 1
Rep Power: 0 |
Hi James!
I seem to have a similar problem with ANSYS CFX 19.2. How did you solve it? Thank you in advance!
|
January 10, 2020, 06:38 |
Same Problem Here
|
#9 |
New Member
RoWoM
Join Date: Jul 2018
Posts: 5
Rep Power: 7 |
Hi,
We had (and still have) similar problems. Sometimes we can jog it into closing by opening and closing the .dir file a few times, which made us think it was a read/write permissions problem. Would love to see this get solved one day.
|