CFD Online Discussion Forums


schnesd2 December 19, 2019 12:03

Fluent Jobs failed to start on Linux Cluster with PBS
 
Hey there,


I'm currently writing my master's thesis using Fluent and I've come across the following problem:


I would like to run my Fluent simulation from a Windows client via RSM on a remote Linux cluster. The client connects to the cluster through a VPN tunnel. Client and cluster use two different license servers: the client has access to a local license server even without the VPN connection, and the same applies to the cluster. The cluster uses PBS Pro as its job management system. Setting up RSM on the head node (Athena) and on the client was successful (test successful, data transfer successful, ...).


When I start a simulation from Workbench (Solution Settings: Submit to Remote Solve Manager, RSM Queue selected, ...), the job is copied to the staging directory and added to the PBS queue (a job ID is assigned and the status is Running). However, when I log in to the assigned execution nodes and look at the CPU load and processes, no Fluent process has been started...


Unfortunately, checking the logs did not get me any further. The only entries with the job ID that I found in the PBS log directory were these:
Path: /var/spool/pbs/sched_logs/20191219
[user@athena sched_logs]# cat 20191219 | grep 61192
12/19/2019 14:12:47;0080;pbs_sched;Job;61192.athena;Considering job to run
12/19/2019 14:12:47;0040;pbs_sched;Job;61192.athena;Job run
The files in the staging directory did not contain any useful information either.
Do any of you have any idea why the Fluent process is not running? Where can I find more log data that could help me?


Greetings
Steffen
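
The two scheduler entries quoted in the post above only show that the scheduler considered the job and ran it. As a rough sketch, entries for one job ID could be pulled out of a PBS log file with a few lines of Python, assuming every line uses the semicolon-separated layout seen in those two entries. The field names (`timestamp`, `event_code`, `daemon`, `object_type`, `object_id`, `message`) are labels chosen here for illustration, not official PBS terminology. Unlike a plain `grep 61192`, this compares the job ID only against the ID field, so a number that happens to appear elsewhere in a line is not matched.

```python
from typing import Iterable, List, NamedTuple, Optional


class PbsLogEntry(NamedTuple):
    # Field names are illustrative labels for the six semicolon-separated parts.
    timestamp: str
    event_code: str
    daemon: str
    object_type: str
    object_id: str
    message: str


def parse_pbs_log_line(line: str) -> Optional[PbsLogEntry]:
    """Split one log line into six fields; return None if it does not fit."""
    # Limit the split so that semicolons inside the message stay in the message.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return PbsLogEntry(*parts)


def entries_for_job(lines: Iterable[str], job_id: str) -> List[PbsLogEntry]:
    """Return the entries whose ID field is job_id or starts with 'job_id.'."""
    prefix = job_id + "."
    found = []
    for line in lines:
        entry = parse_pbs_log_line(line)
        if entry is None:
            continue
        if entry.object_id == job_id or entry.object_id.startswith(prefix):
            found.append(entry)
    return found


# Usage with a log file (the path is the one from the post above):
#
#     with open("/var/spool/pbs/sched_logs/20191219") as log:
#         for entry in entries_for_job(log, "61192"):
#             print(entry.timestamp, entry.daemon, entry.message)
```

The same filter could be pointed at other PBS log files, for example logs written on the execution nodes, provided they follow the same line layout.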

Светлана December 19, 2019 17:02

I haven't done this kind of installation before, but here are a few things to check:
- Is RSM running on the execution node?
- If it is, does restarting it help?
- If it is running and restarting it does not help, can you telnet to the execution node on port 9192? Does the connection succeed? If not, what error do you get?
- If you have ANSYS CFX installed, does it work when you run CFX instead of Fluent? (You do not have to create a CFX case for this: just start it with a non-existing file name and it should complain that the file does not exist.)
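
If telnet is not installed, the port check from the list above could be approximated with a short Python sketch. It only tests whether a TCP connection to the given host and port can be opened and reports the error otherwise; it says nothing about whether the service behind the port is healthy. The port number 9192 comes from the suggestion above, while the host name `node01` in the usage comment is a made-up placeholder for one of the execution nodes.

```python
import socket
from typing import Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, str]:
    """Try to open a TCP connection; return (success, short description)."""
    try:
        # create_connection resolves the host name and connects within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution failures.
        return False, f"{type(exc).__name__}: {exc}"


# Usage ("node01" is a hypothetical execution node name):
#
#     ok, detail = can_connect("node01", 9192)
#     print("port reachable" if ok else f"connection failed ({detail})")
```

A refused connection would suggest that nothing is listening on that port on the node, whereas a timeout points more towards a firewall or routing problem.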

schnesd2 December 21, 2019 08:26

Thanks for your response,

At first I assumed that RSM was of course running on the execution nodes, but then I checked the service again. It seems the RSM service only runs on the head node, not on the execution nodes...
I really think that this might be the problem.

Unfortunately the network administrator is on vacation until the beginning of next year and I cannot start the service myself...

But as soon as I have news, I will let you know!

Светлана January 23, 2020 23:17

Any news here?

