Running CFX distributed parallel under Linux with the LoadLeveler queuing system
December 21, 2014, 04:33 |
#1 |
New Member
ahmad bakri
Join Date: May 2010
Posts: 20
Rep Power: 16 |
Hi all,
I need to run a CFX simulation on a Linux system that uses the LoadLeveler queuing system. I have to run in batch mode: write a run file (file.cmd) and submit it with llsubmit. Any idea what the batch (run) file should look like? I have been trying scripts like the ones below, with no luck.

I am able to run local parallel like this:

    #@output = $(jobid).out
    #@ error = $(jobid).err
    #@ job_type = parallel
    #@ node = 1
    ##@ tasks_per_node = 2
    #@ total_tasks=4
    ##@ node_usage = shared
    #@ class = medium
    ##@ cpu_limit=900:00:00, 900:00:00
    #@ queue
    NP=$(cat $LOADL_HOSTFILE | wc -l)
    cfx5solve -part $NP -start-method "PVM Local Parallel" -def "NormalMeshCase1.def"

However, distributed parallel with the following script does not work:

    #@output = $(jobid).out
    #@ error = $(jobid).err
    #@ job_type = parallel
    #@ node = 2
    ##@ tasks_per_node = 2
    #@ total_tasks=4
    ##@ node_usage = shared
    #@ class = medium
    ##@ cpu_limit=900:00:00, 900:00:00
    #@ queue
    NP=$(cat $LOADL_PROCESSOR_LIST | wc -l)
    cfx5solve -part $NP -start-method "PVM Distributed Parallel" -par-dist $LOADL_PROCESSOR_LIST -def "NormalMeshCase1.def"

I also tried to run directly through the solver, but got this error message:

    +--------------------------------------------------------------------+
    | An error has occurred in cfx5solve:                                |
    |                                                                    |
    | Remote connection to kuauhccn06 (ku-auh-ccn06) exited with return  |
    | code 1. It gave the following output:                              |
    |                                                                    |
    | connect to address 172.30.102.6 port 544: Connection refused       |
    | connect to address 172.30.102.6 port 544: Connection refused       |
    | trying normal rsh (/usr/bin/rsh)                                   |
    | ku-auh-ccn06.kustar.ac.ae: Connection refused                      |
    |                                                                    |
    | Check that you have typed the hostname correctly, that you have an |
    | account "ahmad.albakri" on the specified host with permission to   |
    | rsh from this host, and that (particularly for Windows hosts) it   |
    | is running an rsh daemon. You can use the following command to     |
    | check the connection to a UNIX machine:                            |
    |                                                                    |
    | rsh ku-auh-ccn06 uname                                             |
    |                                                                    |
    | or the following command if it is a Windows machine:               |
    |                                                                    |
    | rsh ku-auh-ccn06 cmd /c echo working                               |
    +--------------------------------------------------------------------+
    +--------------------------------------------------------------------+
    | An error has occurred in cfx5solve:                                |
    |                                                                    |
    | The architecture string for host kuauhccn06 (ku-auh-ccn06) could   |
    | not be determined.                                                 |
    +--------------------------------------------------------------------+

My second question: if I am not able to run directly through the solver (interactive method), do I still have a chance of running in batch mode?

Any help please.
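Editorial note on the distributed script above: as far as I understand LoadLeveler, LOADL_PROCESSOR_LIST is an environment variable holding a space-separated list of hostnames (one entry per task), not the name of a file, so `cat $LOADL_PROCESSOR_LIST` would not count anything. cfx5solve's -par-dist option, as I understand it, expects a comma-separated list such as `hostA*2,hostB*2`. The sketch below converts one form into the other. The function name build_par_dist is hypothetical, both format assumptions should be checked against your LoadLeveler and CFX documentation, and this does not address the rsh "Connection refused" error.

```shell
#!/bin/bash
# Hypothetical helper: turn a list of hostnames (one per task, repeats
# allowed) into a "host*count,host*count" string, keeping the order in
# which hosts first appear.
build_par_dist() {
    # Refuse an empty host list instead of emitting a bogus entry.
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" | awk '
        { if (!($1 in n)) order[++k] = $1; n[$1]++ }
        END {
            for (i = 1; i <= k; i++)
                printf "%s%s*%d", (i > 1 ? "," : ""), order[i], n[order[i]]
            print ""
        }'
}

# Possible use inside the job command file (untested assumption):
#   set -- $LOADL_PROCESSOR_LIST        # split on whitespace
#   NP=$#                               # one partition per task
#   cfx5solve -part $NP -start-method "PVM Distributed Parallel" \
#       -par-dist "$(build_par_dist "$@")" -def "NormalMeshCase1.def"
```

Keeping first-appearance order means the host that LoadLeveler lists first stays first in the generated string.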
|
December 21, 2014, 05:19 |
|
#2 |
Super Moderator
Glenn Horrocks
Join Date: Mar 2009
Location: Sydney, Australia
Posts: 17,862
Rep Power: 144 |
You might need to talk to somebody more familiar with LoadLeveler to fix this.
But one thing worth looking at is that LoadLeveler might be running the job under a different user account, and that account might not have permission to use rsh. I would check which account LoadLeveler uses when it starts a job.
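A minimal sketch of that check, meant to be run from inside a submitted job: it prints the account and node the job actually runs as, then tries the same `uname` test that the cfx5solve error message suggests on each allocated host. The function name check_hosts is hypothetical, and passing $LOADL_PROCESSOR_LIST assumes that variable holds a space-separated host list on your system.

```shell
#!/bin/bash
# Hypothetical helper: check_hosts REMOTE_SHELL host...
# Reports which user the job runs as, then tests a remote "uname"
# on every distinct host using the given remote shell command.
check_hosts() {
    local remote_shell=$1
    shift
    local host failed=0
    echo "job runs as: $(id -un) on $(uname -n)"
    for host in $(printf '%s\n' "$@" | sort -u); do
        # stdin is redirected so the remote shell cannot block on input
        if "$remote_shell" "$host" uname </dev/null >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Possible use inside the job command file (assumption, adjust to your site):
#   check_hosts rsh $LOADL_PROCESSOR_LIST
```

If the account printed in the job output differs from your login account, or any host reports FAIL, that points to an account or rsh permission issue rather than a CFX one.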