www.cfd-online.com
Home > Forums > CFX

Running Parallel in Batch mode

June 11, 2014, 16:37
Running Parallel in Batch mode
  #1
New Member
 
Luiz Eduardo
Join Date: May 2010
Posts: 19
Hi all,

I am trying to run CFX 5 in parallel on a Linux cluster.

I've read that CFX needs the master node to be specified in order to run; however, I am not allowed to use it, as is normally the case at universities. I've seen in a thread on this forum that if I log into one of the nodes to be used, it should work, but it is not working for me. Here is my command file:

Code:
INPUT=./CFXRunBolhaL200m.def
HOSTLIST=host-list

ssh n12 cfx5solve -par-dist $HOSTLIST -def $INPUT -double -chdir /home/luizadler/CFX -start-method \"Platform MPI Distributed Parallel\" -batch
The file host-list contains "n12*4,n13*4".


The error:

Quote:
Warning!

Host name lookup failed for host host-list

An error has occurred in cfx5solve:

Unable to find the master host n12.xxxxxxxxxx
(n12.xxxxxxxxxx) in the host list: at least one partition must be
assigned to the master host.
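The first warning suggests that `-par-dist` takes the host list itself rather than the name of a file: with `HOSTLIST=host-list`, the literal string `host-list` was passed and looked up as a host name, so no partition ended up on the master host. A minimal sketch of a workaround, assuming the file `host-list` holds a single line such as `n12*4,n13*4`: read the file's contents into the variable, and check that the node you `ssh` into is in the list before launching. The helper names `read_host_list` and `host_in_list` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers; assumes the host-list file holds one line like n12*4,n13*4

# Print the contents of a host-list file as one comma-separated string,
# dropping spaces, tabs and line breaks.
read_host_list() {
  tr -d ' \t\r\n' < "$1"
}

# Succeed if host $1 appears in the -par-dist string $2 (with or without *N).
host_in_list() {
  local entry entries
  # Split on commas without glob-expanding the "*" characters
  IFS=',' read -r -a entries <<< "$2"
  for entry in "${entries[@]}"; do
    # Strip an optional "*N" partition count before comparing
    [ "${entry%%\**}" = "$1" ] && return 0
  done
  return 1
}
```

With these, the list can be read with `HOSTLIST=$(read_host_list host-list)` and passed to the remote command wrapped in single quotes, e.g. `-par-dist "'$HOSTLIST'"` (or escaped double quotes, as already done for `-start-method`), so the remote shell does not try to expand the `*` characters.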
If I write the command without the host-list file:

Code:
ssh n12 cfx5solve -par-dist \"n12*4,n13*4\" -def $INPUT -double -chdir /home/luizadler/CFX -start-method \"Platform MPI Distributed Parallel\" -batch
I get this error:

Quote:
Warning!

Host name lookup failed for host n13

An error has occurred in cfx5solve:

Remote connection to n13 exited with return code 127. It gave the
following output:

sh: rsh: command not found

Check that you have typed the hostname correctly, that you have an account
"xxxxxxx" on the specified host with permission to rsh from this host,
and that (particularly for Windows hosts) it is running an rsh daemon. You
can use the following command to check the connection to a UNIX machine:

rsh n13 uname

or the following command if it is a Windows machine:

rsh n13 cmd /c echo working

An error has occurred in cfx5solve:

The architecture string for host n13 could not be determined.
Any ideas?



Another question is how can I follow the solution once it is running correctly? No need to see the monitors (unless it is easy), but it would be useful to see the residuals.
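One low-tech way to follow the residuals of a batch run is to filter the solver's text output (`.out`) file. A minimal sketch, assuming steady-state output in which each `OUTER LOOP ITERATION` header is followed by a `|`-separated table whose columns are equation name, rate, RMS residual and max residual, as in the comment below; transient runs label their loops differently, and the layout should be checked against your own output file. The function name `rms_residuals` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "<iteration> <RMS residual>" for one equation
# from a CFX solver output file (or stdin). Assumed layout:
#  OUTER LOOP ITERATION =    1 ( 1)       CPU SECONDS = 3.4E+00 (3.4E+00)
#  | U-Mom                | 0.00 | 1.0E-02 | 1.7E-01 |      6.7E-01  OK|
# Usage: rms_residuals P-Mass run.out   or   tail -f run.out | rms_residuals P-Mass
rms_residuals() {
  local equation=$1 file=${2:--}   # "-" makes awk read standard input
  awk -F'|' -v eq="$equation" '
    /OUTER LOOP ITERATION/ {
      split($0, parts, "=")        # parts[2] starts with the iteration number
      split(parts[2], words, " ")
      iter = words[1]
    }
    {
      name = $2
      gsub(/ /, "", name)          # trim padding around the equation name
      if (name == eq) {
        rms = $4                   # third column: RMS residual
        gsub(/ /, "", rms)
        print iter, rms
        fflush()                   # show lines promptly when fed from tail -f
      }
    }
  ' "$file"
}
```

To watch a running job, something like `tail -n +1 -f <run>.out | rms_residuals P-Mass` could be used, where `<run>.out` stands for the output file of your run.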

I've done this kind of thing with Fluent, but now I need to do it with CFX, which is quite different...

Thanks for the help!!

July 6, 2014, 10:46
  #2
New Member
 
Luiz Eduardo
Join Date: May 2010
Posts: 19
Is this an installation problem on the cluster, or are the commands wrong? We are currently changing the person responsible for the cluster, so I can't check whether it is an installation problem, but it would be helpful to know if this might be the case.

July 6, 2014, 18:25
  #3
Super Moderator
 
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 10,824
Yes, you have not installed distributed parallel for CFX properly. CFX has a separate distributed parallel setup procedure - see the installation guide for details.

July 7, 2014, 08:15
  #4
New Member
 
Luiz Eduardo
Join Date: May 2010
Posts: 19
Thank you!

August 8, 2014, 07:56
  #5
Senior Member
 
Bruno
Join Date: Mar 2009
Location: Brazil
Posts: 236
Hi Luiz,

There are a few different problems here.
  1. You should ensure that you can log into any of the compute nodes without being asked for a password. Google 'ssh without password' and follow the instructions. Be sure you can do something like 'ssh compute-node-name ls' without needing a password before proceeding.

    After this alone, you should be able to run CFX in serial on another node:
    Code:
    ssh n13 /ansys_inc/vXXX/CFX/bin/cfx5solve -def /pathtodef/deffile.def -chdir /pathtodef/ -batch
  2. You have to tell CFX to use ssh for remote connections. The documentation mentions a few ways to do that. One of them is setting the environment variable 'CFX5RSH=ssh'. Do that on all nodes of your cluster. (How you do this will depend on which Linux distribution you use, but Google can help you with this too. In my case, with SUSE, I added the line 'export CFX5RSH=ssh' to '~/.bashrc'.)

  3. The file '/ansys_inc/vXXX/CFX/config/hostinfo.ccl' must be properly configured on ALL compute nodes. The documentation tells you how to do that. This file will contain the name of each of the compute nodes and the installation path to CFX on each of them. Something like this:


    Code:
    SIMULATION CONTROL:
      EXECUTION CONTROL:
        PARALLEL HOST LIBRARY:
          HOST DEFINITION: n12
            Installation Root = /ansys_inc/v%v/CFX
            Host Architecture String = linux-amd64
          END # HOST DEFINITION n12
          HOST DEFINITION: n13
            Installation Root = /ansys_inc/v%v/CFX
            Host Architecture String = linux-amd64
          END # HOST DEFINITION n13
        END # PARALLEL HOST LIBRARY
      END # EXECUTION CONTROL
    END # SIMULATION CONTROL
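Steps 1 and 2 can be checked from the node you log into before launching the solver. A minimal sketch, using the hypothetical helper name `check_nodes`: for each node it runs a command over ssh with `-o BatchMode=yes`, which makes ssh fail instead of prompting for a password, and reports whether `CFX5RSH` is set to `ssh` in that node's non-interactive environment.

```shell
#!/bin/bash
# Hypothetical pre-flight check for passwordless ssh and CFX5RSH on each node.
# Usage: check_nodes n12 n13
check_nodes() {
  local node rsh status=0
  for node in "$@"; do
    # BatchMode=yes makes ssh fail instead of asking for a password;
    # the single quotes make the remote shell expand $CFX5RSH
    if ! rsh=$(ssh -o BatchMode=yes "$node" 'echo "$CFX5RSH"'); then
      echo "$node: passwordless ssh FAILED" >&2
      status=1
    elif [ "$rsh" != "ssh" ]; then
      echo "$node: CFX5RSH is '${rsh}', expected 'ssh'" >&2
      status=1
    else
      echo "$node: ok"
    fi
  done
  return "$status"
}
```

If the variable shows up in an interactive login but not in this check, the `export` line may sit below an early `return` for non-interactive shells in `~/.bashrc`; moving it above that point is one option.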

After you get this going we can check for other errors.
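For more than a couple of nodes, the hostinfo.ccl block from step 3 can be generated rather than typed by hand. A minimal sketch; the function name `hostinfo_ccl` is hypothetical, and the installation root and architecture string are copied from the example above and may differ on your cluster, so compare the output with what the documentation describes before replacing any file.

```shell
#!/bin/bash
# Hypothetical generator for the PARALLEL HOST LIBRARY block of hostinfo.ccl.
# Usage: hostinfo_ccl n12 n13 > hostinfo.ccl.new
# Installation root and architecture string are taken from the example above.
hostinfo_ccl() {
  local root='/ansys_inc/v%v/CFX' arch='linux-amd64' node
  echo 'SIMULATION CONTROL:'
  echo '  EXECUTION CONTROL:'
  echo '    PARALLEL HOST LIBRARY:'
  for node in "$@"; do
    echo "      HOST DEFINITION: $node"
    echo "        Installation Root = $root"
    echo "        Host Architecture String = $arch"
    echo "      END # HOST DEFINITION $node"
  done
  echo '    END # PARALLEL HOST LIBRARY'
  echo '  END # EXECUTION CONTROL'
  echo 'END # SIMULATION CONTROL'
}
```

In bash, a numbered range of nodes can be passed with brace expansion, e.g. `hostinfo_ccl n{12..13}`.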

Cheers

August 12, 2014, 08:35
  #6
New Member
 
Luiz Eduardo
Join Date: May 2010
Posts: 19
Hi Bruno,

Thank you for your help!
I've done what you wrote; however, I still get the following licensing error:

Code:
+--------------------------------------------------------------------+
 | ERROR #001100247 has occurred in subroutine .                      |
 | Message:                                                           |
 |                                                                    |
 | The solver is unable to continue because of licensing problems.    |
 |                                                                    |
 | A license for the following capability level could not be checked  |
 | out:                                                               |
 |                                                                    |
 | ANSYS CFX Solver (Max 128K Nodes)                                  |
 |                                                                    |
 | Please carefully examine the error message output above and check  |
 | that:                                                              |
 |                                                                    |
 | 1) The license server is specified correctly and is running.       |
 |                                                                    |
 | 2) An appropriate license is available for checking out.           |
 |                                                                    |
 | These problems can be checked using the ANSYS Client ANSLIC_ADMIN  |
 | utility.  For further troubleshooting information please consult   |
 | ANSYS, Inc. Licensing Guide.                                       |
 +--------------------------------------------------------------------+

 +--------------------------------------------------------------------+
 |                An error has occurred in cfx5solve:                 |
 |                                                                    |
 | The ANSYS CFX solver exited with return code 1.   No results file  |
 | has been created.                                                  |
 +--------------------------------------------------------------------+
When I run in serial on the master node, there is no problem at all. When I try to run on a different node, I get this error. I changed hostinfo.ccl to include node n05, for instance, and tried to run in serial there with the command you showed: the licensing error occurred. The same happened when I switched to node n06, which was not included in hostinfo.ccl. All of these were serial runs, so I believe the hostinfo.ccl file is not the issue. Maybe something in the environment setup? (Unfortunately, I am not a Linux expert.)

Any thoughts?

Thanks again,

Duda

August 12, 2014, 10:57
  #7
Senior Member
 
Bruno
Join Date: Mar 2009
Location: Brazil
Posts: 236
Hi Luiz,

Both the cause of the error and the answer are right there in the error message:

Error cause:
| The solver is unable to continue because of licensing problems. |

What you should check:
| 1) The license server is specified correctly and is running. |

The compute nodes are probably not properly configured to check out licenses from the license server. Apply the same license configuration procedure you used on the master node to the compute nodes. The documentation also explains how to do that.
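One way to act on this is to compare the license settings seen by the master node with those seen by each compute node. A minimal sketch, assuming your site points ANSYS at the license server through the `ANSYSLMD_LICENSE_FILE` and `ANSYSLI_SERVERS` environment variables (values of the form `port@server`); if your installation uses the `ansyslmd.ini` file instead, compare that file across nodes. `compare_license_env` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical check: do the compute nodes see the same license settings
# as this node? Assumes the settings live in environment variables.
# Usage: compare_license_env n12 n13
compare_license_env() {
  local vars='ANSYSLMD_LICENSE_FILE ANSYSLI_SERVERS'
  local node var here there status=0
  for node in "$@"; do
    for var in $vars; do
      here=${!var}   # value on this node (bash indirect expansion)
      # Ask the remote shell for the same variable; BatchMode avoids password prompts
      there=$(ssh -o BatchMode=yes "$node" "echo \"\$$var\"") || {
        echo "$node: ssh failed" >&2
        status=1
        continue 2   # skip the remaining variables for this node
      }
      if [ "$here" != "$there" ]; then
        echo "$node: $var='$there' (here: '$here')" >&2
        status=1
      fi
    done
  done
  return "$status"
}
```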

Cheers

August 15, 2014, 07:54
  #8
New Member
 
Luiz Eduardo
Join Date: May 2010
Posts: 19
Thanks again, Bruno. I am trying to sort it out, and I will post here again as soon as I have something solved.

Cheers!

All times are GMT -4.