CFD Online (www.cfd-online.com), CFX forum

Remote cluster parallel solve without master, Ansys CFX 14.5

#1   May 6, 2013, 17:19
Andres M. Aguirre Mesa (aaguirre)

Hi.

I'm currently trying to configure Ansys CFX 14.5 to run on a Linux cluster (Rocks 6.1). I've completed the whole installation process, including setting the environment variables. Communication via ssh is working, and I'm using Platform MPI. I configured the hostinfo.ccl file. I'm even able to run in distributed parallel mode using this syntax:

cfx5solve -def input.def -start-method "Platform MPI Distributed Parallel" -par-dist "master,node01*2,node02*2"

The problem is that I'm not allowed to run on the master node because the cluster belongs to the university I work for.

I've tried to trick CFX using, for example, "master*0", or by removing master from the list, but the program fails with the following message:

Unable to find the master host cluster.domain.edu in the host list: at least one partition must be assigned to the master host.
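
The message above suggests that cfx5solve expects the host it is launched from to appear in the -par-dist list. A minimal pre-flight sketch in POSIX shell follows, assuming the names in the list are the same names CFX uses for the launching host (this may not hold if hostinfo.ccl maps short aliases to full host names). `host_in_par_dist` is a hypothetical helper, not part of CFX:

```shell
#!/bin/sh
# Hypothetical helper (not part of CFX): succeed if HOST appears in a
# -par-dist style list such as "node01*2,node02*2".
# Each entry is assumed to be either "host" or "host*N".
host_in_par_dist() {
    case ",$2," in
        *",$1,"* | *",$1*"*) return 0 ;;
        *) return 1 ;;
    esac
}

par_dist='node01*2,node02*2'
if host_in_par_dist master "$par_dist"; then
    echo "master is in the host list"
else
    echo "master is not in the host list; launch from a listed node instead"
fi
```

If the check fails for the machine you are logged in to, the run would most likely have to be started from one of the listed nodes.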

I've also tried launching the run from node01, but I got something like this:

MPI Application rank 0 exited before MPI_Finalize() with status 2
An error has occurred in cfx5remote on compute-2-0.local:

/share/apps/ansys_inc/v145/CFX/bin/linux-amd64/solver-pcmpi.exe was
interrupted by signal TERM (15)

An error has occurred in cfx5remote on compute-2-0.local:

/share/apps/ansys_inc/v145/CFX/bin/linux-amd64/solver-pcmpi.exe was
interrupted by signal TERM (15)

An error has occurred in cfx5remote on compute-2-0.local:

/share/apps/ansys_inc/v145/CFX/bin/linux-amd64/solver-pcmpi.exe was
interrupted by signal TERM (15)

An error has occurred in cfx5remote on compute-2-1.local:

/share/apps/ansys_inc/v145/CFX/bin/linux-amd64/solver-pcmpi.exe was
interrupted by signal TERM (15)

An error has occurred in cfx5remote on compute-2-1.local:

/share/apps/ansys_inc/v145/CFX/bin/linux-amd64/solver-pcmpi.exe was
interrupted by signal TERM (15)

An error has occurred in cfx5solve:

The ANSYS CFX solver could not be started, or exited with return code 255.
No results file has been created.


Running on at least one processor of the master is our last resort. Users are allowed to log in to the master node and launch programs from it, but are not allowed to use its processors for computation.

I also configured APDL and I'm able to do something similar to the above, using this syntax:

ansys145 -dis -b -machines compute-2-0:2:compute-2-1:2 < input.dat > output.out

Is it possible to do something similar with CFX?

Regards,


A. Aguirre.

#2   August 27, 2013, 07:34
flomer

Hello!

I just installed Rocks 6.1 on a small cluster to run Ansys CFX and I am having the same problem: how to set up parallel runs without the head node.

Did you ever find a solution to this?

Best regards,

John

#3   August 28, 2013, 13:55
Bruno (brunoc), Brazil

CFX requires that the computer you're logged in to be part of the simulation. There is a way to do what you want, called indirect start, but it involves editing some of the files from the CFX setup ('CFX/etc/start-methods.ccl') plus writing some scripts. It can be done, but it's a hassle, so skip it.

Instead, just send the solver command through SSH to one of the nodes that belongs to the simulation. You'll need an additional option (-chdir) directing CFX to run the solver in a specified path, though, or else it'll run in your home directory. Your command line will be something like this:

Code:
ssh node01 cfx5solve -def input.def -chdir /path/to/deffile -start-method \"Platform MPI Distributed Parallel\" -par-dist \"node01*2,node02*2\" -batch <other_options>
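Notice the '\' in front of each quotation mark: it keeps your local shell from stripping the quotes, so they reach the shell on node01 and the multi-word start method stays together as one argument.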

That works fine (I use it here), as long as you've got SSH configured not to ask for passwords (which you probably already do).
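
The command above can also be assembled by a small script. This is only a sketch under stated assumptions: `build_par_dist` and `build_remote_cmd` are hypothetical helper names (not part of CFX), host names and paths are assumed to contain no spaces, and every host gets the same number of partitions. Instead of backslash-escaping each quote, it passes the whole remote command to ssh as one quoted string, which likewise lets the inner quotes reach the remote shell:

```shell
#!/bin/sh
# Hypothetical helpers (not part of CFX) that assemble the command line
# from the post above. Assumes host names and paths contain no spaces.

# build_par_dist N host... -> "host1*N,host2*N,..."
build_par_dist() {
    per_host=$1
    shift
    list=""
    for h in "$@"; do
        list="${list:+$list,}${h}*${per_host}"
    done
    printf '%s\n' "$list"
}

# build_remote_cmd DEF_FILE RUN_DIR PAR_DIST -> command text for the remote shell
build_remote_cmd() {
    printf 'cfx5solve -def %s -chdir %s -start-method "Platform MPI Distributed Parallel" -par-dist "%s" -batch\n' "$1" "$2" "$3"
}

par_dist=$(build_par_dist 2 node01 node02)
remote_cmd=$(build_remote_cmd input.def /path/to/deffile "$par_dist")

# Dry run: print what would be sent. To actually launch, replace echo with
#   ssh node01 "$remote_cmd"
echo "ssh node01 $remote_cmd"
```

The ssh target should be one of the hosts in the list, since the launching host has to take part in the run.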

Cheers

#4   August 29, 2013, 02:34   Thanks, that worked!
flomer


Hello, Bruno!

The method you suggested works very well. Thank you!

On our setup the head node is the only node that can see the license server. The above method still manages to get licenses and then runs the parallel computation on the compute nodes only. This is exactly what I was looking for. Great!

We want to avoid using the head node in the calculations because this node is presumably slower than the compute nodes (head has one E5-2620 @ 2.0 GHz versus dual E5-2643 @ 3.3 GHz on the nodes). We have not been able to do much testing yet, but we suspect that using cores from the head node in the calculations might slow things down. We have set the Relative Speed settings in hostinfo.ccl, but still think it is wise to avoid using the slower cores.

Do you know if this actually makes a difference?

Anyway, not using any cores on the head node leaves it more resources to keep Ganglia running smoothly and the disk system happy.

Speaking of Ganglia, do you know a way to get it to keep the head node out of the reporting and resource use displays? The head node is primarily there to help us with interfacing and utilizing the cluster, so I think it is a bit odd that by default it includes its own CPU cores in the statistics and reporting. Not a huge problem, but...

Now I just need to get the InfiniBand network up to speed...

Best regards,

John

Tags
cfxsolver, clusters, distributed parallel, without master
