I am currently trying to integrate a FORTRAN CFD code I have been working on into a cluster system. The code runs successfully on a local machine and on a local node on the cluster system. My problems arise when I try to run it through the queueing system. The cluster uses the standard Torque queue and I have been submitting with the qsub command. For now I am only trying to run on a single processor, so I haven't tried any fancy MPI stuff.
The main errors are a tty error (which I believe is to be expected) and an "Event not found" error. The code creates its output files in its working directory, which I suspect is causing the problems, although I am not sure.
Does anyone here have much experience running Fortran on a cluster? I can give more details if needed; I know I am being a bit vague!
First of all: the code you are running is not multithreaded, right? It works on your machine single-threaded, correct?
2) Is your code available to the node? I assume it is located in a shared directory served over NFS. (To check, run a job that lists the contents of your directory and see if it is there.)
Or, if your sysadmin allows it, ssh directly into a node.
3) Is your code compiled to run on the node? Sometimes the front end (FE) and the nodes have different architectures.
4) Can you confirm that your cluster filesystem is POSIX compliant? Lustre comes to mind, and I've had issues trying to use it.
5) Although I'm not familiar with Torque (only SGE experience), I assume you have set any environment variables correctly. Will a simple hello world job work?
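To make checks 2, 3 and 5 concrete, here is a rough sketch of a throwaway Torque job script. Treat it as a starting point rather than something tuned for your cluster: the job name, the walltime and the executable name `cfd_solver` are placeholders I made up, so substitute your own. The `#PBS -S /bin/bash` line pins the shell that interprets the script. I am guessing here, but "Event not found" is the message csh/tcsh prints when history expansion trips over a `!`, so the shell the job runs under may be worth a look.

```shell
#!/bin/bash
#PBS -N hello_check          # job name (placeholder)
#PBS -l nodes=1:ppn=1        # one core on one node
#PBS -l walltime=00:05:00    # short limit, this is only a test
#PBS -j oe                   # merge stderr into stdout
#PBS -S /bin/bash            # run the script under bash

# Hypothetical executable name; replace with your own binary.
EXE=./cfd_solver

# Print where the job landed and whether the binary is visible there.
node_report() {
    echo "hello world from $(uname -n) ($(uname -m))"
    echo "working directory: $(pwd)"
    if [ -x "$EXE" ]; then
        echo "executable found: $EXE"
    else
        echo "executable NOT found or not executable: $EXE"
    fi
}

# Torque sets PBS_O_WORKDIR to the directory qsub was run from;
# fall back to the current directory so the script also runs by hand.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

node_report
ls -l
```

Save it under any name you like (say `check_node.sh`), submit it with `qsub check_node.sh`, and look for the output file `hello_check.o<jobid>` in the directory you submitted from. Comparing the `uname -m` line with what you get on the front end answers the architecture question, and the `ls -l` listing shows whether the node sees the same files you do.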
OK, enough questions for now. But try to attach the error message; maybe there is something enlightening hidden in there!