CFX parallel set up hangs after calculations finished

Old   July 23, 2018, 10:43
Default CFX parallel set up hangs after calculations finished
  #1
Member
 
James Gross
Join Date: Nov 2017
Posts: 77
Hi everyone!


I have set up a CFX case of a centrifugal pump. I am currently running a laminar case to troubleshoot some issues (probably mesh-quality related) that cause crashes in the turbulent simulations. The simulation runs perfectly fine in serial; however, when I attempt to run it in parallel using the Intel MPI local parallel set-up, the CFX Workbench hangs after the calculations finish and never writes any output files.


Meanwhile, checking CPU usage shows that all of the designated processors are still being used by solver-mpi.exe. Because the run never finishes, there are no error messages. I have attached my runInput.ccl file as well as the out file, but I can't find much in either of them about what the issue could be.

I'm having difficulty pinpointing what it could be. Does anybody have any advice on how I can troubleshoot this issue?


If there is anything else I can send to make it easier to diagnose the problem, please ask.


Regards,
James
Attached Files
File Type: zip Case.zip (17.8 KB, 2 views)

Old   July 23, 2018, 16:41
Default
  #2
Member
 
Join Date: Jan 2015
Posts: 62
If you're running CFX standalone, you can turn verbose mode on by adding -v to the command line in start-methods.ccl, under the Intel MPI local parallel start method.
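For example, something like this (a sketch only; <CFXROOT> is your CFX install directory, and the Start Command shown is the stock one for that method):
Code:
# find the Intel MPI local parallel entry in the start methods file
grep -n "Intel MPI" <CFXROOT>/etc/start-methods.ccl
# then append -v to that method's Start Command line, i.e. something like:
#   Start Command = "%{impidir}/bin/mpirun" -v -np %{partitions} ...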

What versions of ANSYS and intel MPI are you using?

Old   July 24, 2018, 06:34
Default
  #3
Member
 
James Gross
Join Date: Nov 2017
Posts: 77
Hi Christophe,

Thank you for taking the time to assist me. I appreciate your help.

I am using Ansys V18.2 and Intel MPI 5.1.3.223.

I was actually using Workbench before, but I have now run it from the command line using
Code:
cfx5solve -def Pump.def -par-local -partition 6
First, I modified start-methods.ccl in the <CFXROOT>/etc/ directory to turn on verbosity for Intel MPI local parallel.
Code:
Start Command = "%{impidir}/bin/mpirun" -v -np %{partitions} -bootstrap fork %{executable} %{arguments} < /dev/null
I got a significant amount of output from the verbose option, but I don't think much of it is particularly useful. I have included it as a text file in case it helps. More interesting is that there is no verbose output once the calculations begin, and the same issue persists: the calculations finish, but the solver hangs before writing the results file.

The last thing it prints to the out file is

Code:
 +--------------------------------------------------------------------+
 |                     Variable Range Information                     |
 +--------------------------------------------------------------------+

 Domain Name : Rotating
 +--------------------------------------------------------------------+
 |      Variable Name                         |    min    |    max    |
 +--------------------------------------------------------------------+
 | Density                                    |  9.97E+02 |  9.97E+02 |
 | Specific Heat Capacity at Constant Pressure|  4.18E+03 |  4.18E+03 |
 | Dynamic Viscosity                          |  8.90E-04 |  8.90E-04 |
 | Thermal Conductivity                       |  6.07E-01 |  6.07E-01 |
 | Static Entropy                             |  0.00E+00 |  0.00E+00 |
 | Velocity u                                 | -1.40E+01 |  1.41E+01 |
 | Velocity v                                 | -1.11E+01 |  1.12E+01 |
 | Velocity w                                 | -7.41E+00 |  7.25E+00 |
 | Pressure                                   |  1.11E+04 |  1.20E+05 |
 | Temperature                                |  2.98E+02 |  2.98E+02 |
 +--------------------------------------------------------------------+

 Domain Name : Stationary
 +--------------------------------------------------------------------+
 |      Variable Name                         |    min    |    max    |
 +--------------------------------------------------------------------+
 | Density                                    |  9.97E+02 |  9.97E+02 |
 | Specific Heat Capacity at Constant Pressure|  4.18E+03 |  4.18E+03 |
 | Dynamic Viscosity                          |  8.90E-04 |  8.90E-04 |
 | Thermal Conductivity                       |  6.07E-01 |  6.07E-01 |
 | Static Entropy                             |  0.00E+00 |  0.00E+00 |
 | Velocity u                                 | -5.49E+00 |  4.87E+00 |
 | Velocity v                                 | -5.35E+00 |  6.22E+00 |
 | Velocity w                                 | -3.87E+00 |  3.05E+00 |
 | Pressure                                   |  7.12E+04 |  1.09E+05 |
 | Temperature                                |  2.98E+02 |  2.98E+02 |
 +--------------------------------------------------------------------+

From then on it is unresponsive, although all 6 processors are still being used at 100% by the solver. In the end I have to kill the processes to free up the processors.
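One generic way to see where the hung processes are stuck (a standard Linux diagnostic, not CFX-specific; requires strace) is to attach to one of the solver PIDs and watch which system call it is blocked in:
Code:
# list the solver processes, then attach to one of them
pgrep -f solver-mpi
strace -p <PID>
# an endless stream of futex/poll calls would suggest the ranks are
# spinning or deadlocked in MPI rather than blocked on file I/O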


Any suggestions as to what I can do to fix this issue?


Regards,
James
Attached Files
File Type: txt IntelMPIVerbose.txt (32.0 KB, 10 views)

Old   July 24, 2018, 11:00
Default
  #4
Member
 
Join Date: Jan 2015
Posts: 62
Did you uninstall the old version of Intel MPI when you upgraded to 18.2? It may still be referencing the old MPI in the registry.

Old   July 24, 2018, 12:14
Default
  #5
Member
 
James Gross
Join Date: Nov 2017
Posts: 77
Hi Christophe,

This is my first and only install of ANSYS, and hence of Intel MPI, so unless the call to mpirun is interfering with a previous install of mpirun, that shouldn't be it. As a little extra information: checking the version of the mpirun on my PATH shows it is only version 1.10, meaning it is the built-in mpirun that came with my OS. However, the long text file I sent earlier indicates that for the parallel run, mpirun is sourced from the <Ansys_Root>/commonfiles directory.
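For reference, that check was just the generic shell lookup:
Code:
which mpirun        # shows the OS copy that is first on PATH
mpirun --version    # reports 1.10 here, i.e. the OS-supplied MPI
# cfx5solve launches its own launcher by absolute path
# ("%{impidir}/bin/mpirun" in start-methods.ccl), so the PATH copy
# should not be the one actually being used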

I would rather not uninstall my original version of mpirun, as I will need it for other software. Furthermore, it seems to me that if the wrong MPI were sourced, the program would not run at all, rather than running to completion and only failing to write results.

Any other ideas about what could be causing this?

Regards,
James

Old   July 24, 2018, 14:05
Default
  #6
Member
 
Join Date: Jan 2015
Posts: 62
We had problems like this when we upgraded to 17.1. They seem to have taken the bad commands out of start-methods.ccl in 18.2.

I've also had issues where an antivirus program (McAfee) was trying to scan a file after solving had finished (although that was with FEA).

I've noticed some problems only appeared to affect certain Intel processors and not others, mostly with Platform MPI and its -affwidth and -affcycle options.

I've also seen the .def.lck and .res.lck files in the solver directory get locked so that ANSYS would just freeze at that point. I had to manually delete the .res.lck file in the solver directory to move forward. Try checking this.
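A quick way to check for leftover locks (a sketch; run it from the case's working directory, and only delete anything once all solver processes are gone):
Code:
# list any lock files in the run directory and its subfolders
find . -name '*.lck' -o -name '*.lock'
# if the solver is definitely no longer running, they are stale:
# find . \( -name '*.lck' -o -name '*.lock' \) -delete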

Old   July 25, 2018, 06:57
Default
  #7
Member
 
James Gross
Join Date: Nov 2017
Posts: 77
Hi Christophe,

Thank you for the advice.

I have attempted to delete any lock files present in the working directory. I noticed there were two such lock files. One was a file called sm.<userName>.<PID>.lock, which appeared at the start of the simulation; I deleted it before the solver finished. I guess it was there to ensure those processors were not used for other purposes while the solver was running.

Then, after the calculations finished, another lock file was generated, zFas3A4b.lock, which locks a very long binary file with indecipherable contents. Possibly it contains the solution information, but I am not sure. I deleted this lock file as well, as soon as I could, but there was no change in the solver's behaviour. It still hangs once it outputs the Variable Range Information, just as before.
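To see which process is actually holding that binary file (and hence who owns the lock), one generic check, assuming lsof is available:
Code:
lsof zFas3A4b    # list the PIDs that have the binary file open
lsof +D .        # or scan everything open under the working directory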

I have been looking in the working directory to see if there are any hints as to why it is behaving this way. After deleting the previous lock files, the working directory contains the following files:
Code:
ccl  cfx5.mms  cfx5.tt  def  gui-err.txt  mms  mms.setup  mms.setup.attrb  mon  mpd.hosts  out  par  pids  zFas3A4b
I have looked at each individual file for any hints as to why it is behaving like this, but I could not see anything that explains the behaviour. However, I have included some of these files as zip files in case anyone can spot something I'm missing. I'd be happy to supply the others in future posts if that helps.



I really do not understand why it is behaving this way. I have not been able to test whether the issue is specific to this case because, as a student, I do not seem to have access to any benchmark cases to test against. If anyone has suggestions about what could be causing this, or anything else I can do to narrow down the problem, any help would be greatly appreciated.
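For a case-independent test, CFX installations normally ship small example definition files; assuming the standard examples directory is present in your install, something like this would exercise the same parallel start-up and results-writing path:
Code:
ls <CFXROOT>/examples
# e.g. StaticMixer.def in a default install:
cfx5solve -def <CFXROOT>/examples/StaticMixer.def -par-local -partition 6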


Regards,
James
Attached Files
File Type: zip ccl.zip (2.6 KB, 1 views)
File Type: zip cfx.mms.zip (89.8 KB, 0 views)
File Type: zip cfx5.tt.zip (45.0 KB, 0 views)
File Type: zip gui-err.txt.zip (418 Bytes, 0 views)
File Type: zip mms.zip (89.7 KB, 0 views)

Old   December 25, 2018, 08:42
Default
  #8
New Member
 
Join Date: Nov 2018
Posts: 1
Hi James!
I seem to have a similar problem with ANSYS CFX 19.2.
How did you solve it?

Thank you in advance!

Old   January 10, 2020, 06:38
Default Same Problem Here
  #9
New Member
 
RoWoM
Join Date: Jul 2018
Posts: 5
Hi,
We had (and still have) similar problems. Sometimes we can jog the run into closing by opening and closing the .dir folder a few times, which made us think it was a read/write permissions problem.
Would love this to get solved one day.
