big difference between clockTime and executionTime |
July 31, 2013, 18:21 |
|
#1 |
Member
Luca
Join Date: Mar 2013
Posts: 59
Rep Power: 13 |
Dear all,
I'm trying to run a DES simulation using a model with 30 million cells, on a cluster with 6 processors of 12 cores each. I have noticed that there is a big difference between clockTime and executionTime:
ExecutionTime = 164.45 s ClockTime = 1000 s
I used the scotch method to decompose the domain. The system uses InfiniBand, so I don't think it's a connection problem.
Can anyone help me?
Thank you,
best regards
Luca |
|
August 1, 2013, 08:09 |
|
#2 |
Member
Luca
Join Date: Mar 2013
Posts: 59
Rep Power: 13 |
bump....any help??
|
|
August 1, 2013, 10:33 |
|
#4 | |
Member
Luca
Join Date: Mar 2013
Posts: 59
Rep Power: 13 |
Quote:
Furthermore, I have another strange problem: after a few steps, the program tried to use more than 12 CPUs on one of the nodes.
=>> PBS: job killed: ncpus 12.92 exceeded limit 12 (sum)
Do you know how to fix this problem as well?
Thank you,
best regards
Luca |
||
August 16, 2013, 08:36 |
|
#5 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Greetings to all!
@Luca: What Laurence means by IO is hardware-related Input-Output, namely the time spent reading and writing files, as well as exchanging data between machines.
Now, according to the PDF file you attached, the case was decomposed into 72 processors, not just 12!
As for the big time discrepancy, there is also the possibility that you are over-scheduling the machines, i.e. there are more processes running than there are cores available. For example, there might be other applications running on the cluster alongside your own run.
In addition, if we do the math: 1000/164 ~= 6.1. This suggests that there are probably 6 times more processes running than there are cores available... which makes some sense, since 6*12 = 72. So my guess is that the job is being incorrectly scheduled on the cluster.
Best regards, Bruno
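The ratio from the arithmetic above can be reproduced on any solver log with a few lines of shell. This is only a minimal sketch: it assumes the log contains lines of the form `ExecutionTime = ... s ClockTime = ... s`, as quoted in the first post, and the function name `clock_ratio` is made up for illustration.

```shell
# Hypothetical helper: print ClockTime / ExecutionTime for the last step in a log.
# Assumes lines shaped like: "ExecutionTime = 164.45 s  ClockTime = 1000 s"
clock_ratio() {
    awk '/ExecutionTime =/ && /ClockTime =/ {
             for (i = 1; i <= NF; i++) {
                 if ($i == "ExecutionTime") exec_t  = $(i + 2)
                 if ($i == "ClockTime")     clock_t = $(i + 2)
             }
         }
         END { if (exec_t > 0) printf "%.2f\n", clock_t / exec_t }' "$1"
}
```

A ratio close to 1 suggests the processes have the cores to themselves. A ratio near 6, as here, is consistent with roughly six processes sharing each core, although heavy file I/O or network waits could inflate it as well.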
__________________
|
|
August 16, 2013, 08:50 |
|
#6 | |
Member
Luca
Join Date: Mar 2013
Posts: 59
Rep Power: 13 |
Quote:
Thanks for the reply. The system that I'm using has 12 processors per node. If I use only one node, everything is fine. I would like to use 6 nodes (72 processors), so I decomposed the domain into 72 subdomains. I'm sure that it is actually using 72 processors, as in the output I have something like this:
nProcs : 16
Slaves : 15
(
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16833"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16834"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16835"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16836"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16837"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16838"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16839"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16840"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16841"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16842"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16843"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16844"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16845"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16846"
"cx1-9-2-2.cx1.hpc.ic.ac.uk.16847"
)
This output is related to 1 node with 16 processors. When I try to use 6 nodes with 12 processors each, I get the same kind of output with a list of 72 slaves, so I think that the system is actually using 72 processors. What happens is:
1) clockTime is much greater than executionTime
2) after a few steps, the program tried to use more than 12 CPUs on one of the nodes:
=>> PBS: job killed: ncpus 12.92 exceeded limit 12 (sum)
The person responsible for the HPC told me that, in his view, the second problem is an OpenFOAM bug and that I scheduled the job correctly. I hope that I have explained the situation more clearly.
Best regards, Luca |
||
August 16, 2013, 08:59 |
|
#7 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Luca,
Well, your description continues to indicate that the processes are all being launched on the same machine. To confirm this, what's the output when you use 72 processors, namely regarding the "nProcs" and "Slaves" output? (And please use the [CODE] marker, as explained in the second link in my signature, for posting the more than 72 lines of output.)
In addition, how exactly was OpenFOAM installed on the cluster and with which MPI toolbox? In other words, is the installed OpenFOAM using the cluster's MPI, or OpenFOAM's own Open-MPI version?
Best regards, Bruno
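One way to answer the question of where the processes are running, without reading the list by eye, is to count the entries of the "Slaves" list per host. This is only a sketch: it assumes the entries have the form "hostname.PID" in double quotes, as in the output posted earlier in the thread, and the function name `procs_per_host` is hypothetical. The master process is reported on the separate "Host" line, so it is not included in the counts.

```shell
# Hypothetical helper: count "Slaves" entries per host in an OpenFOAM log.
# Assumes entries look like "cx1-9-2-2.cx1.hpc.ic.ac.uk.16833" (host, dot, PID).
procs_per_host() {
    grep -o '"[A-Za-z0-9.-]*\.[0-9][0-9]*"' "$1" |   # quoted host.PID entries only
        sed -e 's/^"//' -e 's/\.[0-9]*"$//' |        # strip the quotes and the PID
        sort | uniq -c |
        awk '{ print $2, $1 }'                       # print "host count"
}
```

If all slaves show up under a single host name for a multi-node job, the processes are not being spread across the nodes.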
__________________
|
|
August 16, 2013, 09:20 |
|
#8 | |
Member
Luca
Join Date: Mar 2013
Posts: 59
Rep Power: 13 |
Quote:
I have attached the output and the run script that I used to launch the simulation. I am using the HPC servers at Imperial College, which have a list of modules available. As you can see from the run script, I have loaded the modules "open foam/2.1.1" and the openmpi libraries.
Best regards, Luca
P.S.: thank you also for the other replies! |
||
August 16, 2013, 09:48 |
|
#9 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Hi Luca,
Well, the 72 processes are assigned only to the machine "cx1-2-2-1", according to the output. OK, I did some research online and, since there are several PBS schedulers, I found the following:
Code:
#PBS -l select=6:ncpus=12:mpiprocs=12:icib=true:mem=45000mb
Best regards, Bruno
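For reference, the arithmetic behind that resource request can be sketched as follows. This assumes PBS Professional style select syntax with a single chunk type, where `select` is the number of chunks and `mpiprocs` the number of MPI processes per chunk; the function name `expected_ranks` is made up for illustration.

```shell
# Hypothetical helper: MPI ranks implied by a one-chunk-type select statement.
expected_ranks() {
    chunks=$(printf '%s\n' "$1" | sed -n 's/.*select=\([0-9][0-9]*\).*/\1/p')
    per_chunk=$(printf '%s\n' "$1" | sed -n 's/.*mpiprocs=\([0-9][0-9]*\).*/\1/p')
    # Missing fields count as 0 so the arithmetic never sees an empty string.
    echo $(( ${chunks:-0} * ${per_chunk:-0} ))
}

expected_ranks 'select=6:ncpus=12:mpiprocs=12:icib=true:mem=45000mb'   # prints 72
```

The result should match both the `-np` value given to mpirun and the number of subdomains the case was decomposed into.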
__________________
|
|
August 16, 2013, 09:59 |
|
#10 | |
Member
Luca
Join Date: Mar 2013
Posts: 59
Rep Power: 13 |
Quote:
thank you a lot, you are very helpful. I will try to change the run script and I'll let you know. best regards Luca |
||
August 16, 2013, 11:40 |
|
#11 | |
Member
Luca
Join Date: Mar 2013
Posts: 59
Rep Power: 13 |
Quote:
I have tried to use mpiprocs=12 but it gives the same error:
=>> PBS: job killed: ncpus 13.36 exceeded limit 12 (sum)
-bash: line 1: 27581 Terminated /var/spool/PBS/mom_priv/jobs/5074103.cx1b.SC
mpirun: abort is already in progress...hit ctrl-c again to forcibly terminate
--------------------------------------------------------------------------
mpirun noticed that process rank 54 with PID 27877 on node cx1-2-3-1.cx1.hpc.ic.ac.uk exited on signal 15 (Terminated).
Best regards, Luca |
||
August 16, 2013, 11:50 |
|
#12 |
Senior Member
Laurence R. McGlashan
Join Date: Mar 2009
Posts: 370
Rep Power: 23 |
Do you not need to provide a machinefile to mpirun? Normally this is an environment variable filled by the cluster.
Is there not a sysadmin at Imperial that you can ask for help?
__________________
Laurence R. McGlashan :: Website |
|
August 16, 2013, 12:07 |
|
#13 | |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
@Laurence - quoting Luca:
Quote:
It refers to the explicit usage of:
Code:
-hostfile $PBS_NODEFILE
Code:
mpirun -np 72 -hostfile $PBS_NODEFILE renumberMesh -overwrite -parallel
Code:
export > /full/path/to/your/work/folder/snooping_around.txt
__________________
|
||
August 16, 2013, 12:09 |
|
#14 | |
Member
Luca
Join Date: Mar 2013
Posts: 59
Rep Power: 13 |
Quote:
I am sorry, but I don't know what a machinefile for mpirun is.
Best regards, Luca |
||
August 16, 2013, 12:22 |
|
#15 | |
Member
Luca
Join Date: Mar 2013
Posts: 59
Rep Power: 13 |
Quote:
I mean, I have to write
Code:
mpirun -np 72 -hostfile $PBS_NODEFILE renumberMesh -overwrite -parallel
Code:
mpirun -np 72 -hostfile $PBS_NODEFILE pisoFoam -parallel >/work/sb3712/Luca/32mm_DES_bo/simulation_bo.log
Luca |
||
August 16, 2013, 12:30 |
|
#16 | |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Quote:
And in case it doesn't work, call that other export command before using the first mpirun:
Code:
export > /work/sb3712/Luca/32mm_DES_bo/snooping_around.txt
mpirun -np 72 -hostfile $PBS_NODEFILE renumberMesh -overwrite -parallel
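To keep the `-np` value and the host file from drifting apart, the rank count can be read from the host file itself. A minimal sketch, assuming the scheduler writes one line per MPI process into `$PBS_NODEFILE` (the helper name `ranks_in_hostfile` is hypothetical):

```shell
# Hypothetical helper: number of MPI ranks implied by a host file.
ranks_in_hostfile() {
    # Count non-empty lines; each one is a slot for one MPI process.
    grep -c . "$1"
}

# Only meaningful inside a PBS job, where the scheduler sets PBS_NODEFILE.
if [ -n "${PBS_NODEFILE:-}" ]; then
    echo "ranks available: $(ranks_in_hostfile "$PBS_NODEFILE")"
    sort "$PBS_NODEFILE" | uniq -c    # processes requested per node
fi
```

Inside the job script this could then be used as `mpirun -np "$(ranks_in_hostfile "$PBS_NODEFILE")" -hostfile "$PBS_NODEFILE" renumberMesh -overwrite -parallel`, provided the case was decomposed into the same number of subdomains.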
__________________
|
||
January 10, 2014, 15:52 |
|
#17 |
Member
Join Date: Dec 2009
Posts: 49
Rep Power: 16 |
Hi guys,
I had pretty much the same problem as Luca (btw, I'm also from Imperial). I can't seem to get more than 1 node running on the cluster (1 node = 16 cores). I tried running the dam break case in parallel on the cluster using 2 nodes. After meshing and decomposing the domain, running,
Code:
mpirun -np 32 -hostfile $PBS_NODEFILE renumberMesh -overwrite -parallel > log.renumberMesh 2>&1
Code:
/*---------------------------------------------------------------------------*\ | ========= | | | \\ / F ield | OpenFOAM: The Open Source CFD Toolbox | | \\ / O peration | Version: 2.2.x | | \\ / A nd | Web: www.OpenFOAM.org | | \\/ M anipulation | | \*---------------------------------------------------------------------------*/ Build : 2.2.x-0ee7dc546f1b Exec : renumberMesh -overwrite -parallel Date : Jan 10 2014 Time : 19:27:02 Host : "cx1-11-2-1.cx1.hpc.ic.ac.uk" PID : 20276 Case : /tmp/pbs.6213702.cx1b/damBreak32 nProcs : 32 Slaves : 31 ( "cx1-11-2-1.cx1.hpc.ic.ac.uk.20277" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20278" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20279" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20280" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20281" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20282" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20283" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20284" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20285" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20286" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20287" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20288" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20289" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20290" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20291" "cx1-11-2-4.cx1.hpc.ic.ac.uk.24996" "cx1-11-2-4.cx1.hpc.ic.ac.uk.24997" "cx1-11-2-4.cx1.hpc.ic.ac.uk.24998" "cx1-11-2-4.cx1.hpc.ic.ac.uk.24999" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25000" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25001" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25002" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25003" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25004" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25005" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25006" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25007" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25008" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25009" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25010" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25011" ) Pstream initialized with: floatTransfer : 0 nProcsSimpleSum : 0 commsType : nonBlocking polling iterations : 0 sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE). 
fileModificationChecking : Monitoring run-time modified files using timeStampMaster allowSystemOperations : Disallowing user-supplied system call operations // * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * // Create time Create mesh for time = 0 [25] #0 Foam::error::printStack(Foam::Ostream&)[19] #0 Foam::error::printStack(Foam::Ostream&)[16] #0 [21] #0 Foam::error::printStack(Foam::Ostream&)[22] #0 Foam::error::printStack(Foam::Ostream&)[30] #0 Foam::error::printStack(Foam::Ostream&)[26] #Foam::error::printStack(Foam::Ostream&)[17] #0 0 Foam::error::printStack(Foam::Ostream&)Foam::error::printStack(Foam::Ostream&)[20] #0[24] #0 Foam::error::printStack(Foam::Ostream&)[28] [31] [18] #0 Foam::error::printStack(Foam::Ostream&)#0 [23] #0 [27] #0 [29] #0 Foam::error::printStack(Foam::Ostream&)#0 Foam::error::printStack(Foam::Ostream&)Foam::error::printStack(Foam::Ostream&)Foam::error::printStack(Foam::Ostream&)Foam::error::printStack(Foam::Ostream&)Foam::error::printStack(Foam::Ostream&) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [25] #1 Foam::sigSegv::sigHandler(int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [16] #1 Foam::sigSegv::sigHandler(int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/lin in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [22] #1 Foam::sigSegv::sigHandler(int)ux64GccDPOpt/lib/libOpenFOAM.so" [17] #1 Foam::sigSegv::sigHandler(int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [21] #1 in "/home/ehk112/OpenFOAFoam::sigSegv::sigHandler(int)M/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [29] #1 Foam::sigSegv::sigHandler(int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2GccDPOpt/lib/libOpenFOAM.so" [24] #1 .x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [31] #1 
Foam::sigSegv::sigHandler(int)Foam::sigSegv::sigHandler(int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [26] #1 Foam::sigSegv::sigHandler(int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [30] #1 Foam::sigSegv::sigHandler(int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [28] #1 Foam::sigSegv::sigHandler(int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [23] #1 Foam::sigSegv::sigHandler(int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [19] #1 Foam::sigSegv::sigHandler(int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [20] #1 Foam::sigSegv::sigHandler(int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [18] #1 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linuxFoam::sigSegv::sigHandler(int)64GccDPOpt/lib/libOpenFOAM.so" [27] #1 Foam::sigSegv::sigHandler(int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [25] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [16] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [22] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [26] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [17] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [21] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [23] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [29] #2 in "/lib64/libc.so.6" [25] #3 Foam::Time::setTime(Foam::instant const&, int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [24] #2 
in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [28] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [31] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [19] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [30] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [18] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [27] #2 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [20] #2 in "/lib64/libc.so.6" [22] #3 Foam::Time::setTime(Foam::instant const&, int) in "/lib64/libc.so.6" [16] #3 Foam::Time::setTime(Foam::instant const&, int) in "/lib64/libc.so.6" [26] #3 Foam::Time::setTime(Foam::instant const&, int) in "/lib64/libc.so.6" [17] #3 Foam::Time::setTime(Foam::instant const&, int) in "/lib64/libc.so.6" [21] #3 Foam::Time::setTime(Foam::instant const&, int) in "/lib64/libc.so.6" [23] #3 Foam::Time::setTime(Foam::instant const&, int) in "/lib64/libc.so.6" [24] #3 Foam::Time::setTime(Foam::instant const&, int) in "/lib64/libc.so.6" [28] #3 Foam::Time::setTime(Foam::instant const&, int) in "/lib64/libc.so.6" [29] #3 Foam::Time::setTime(Foam::instant const&, int) in "/lib64/libc.so.6" [31] #3 Foam::Time::setTime(Foam::instant const&, int) in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [25] #4 in "/lib64/libc.so.6" [30] #3 in "/lib64/libc.so.6" [19] #3 Foam::Time::setTime(Foam::instant const&, int)Foam::Time::setTime(Foam::instant const&, int) in "/lib64/libc.so.6" [27] #3 Foam::Time::setTime(Foam::instant const&, int) in "/lib64/libc.so.6" [18] #3 Foam::Time::setTime(Foam::instant const&, int) in "/lib64/libc.so.6" [20] #3 Foam::Time::setTime(Foam::instant const&, int) in 
"/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [22] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [16] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [26] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [17] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [23] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [21] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [28] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [29] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [24] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [31] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [27] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [30] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [20] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [19] #4 in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so" [18] #4 [25] in "/home/ehk112/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/bin/renumberMesh" [25] #5 __libc_start_main While running, Code:
mpirun -np 32 -hostfile $PBS_NODEFILE interFoam -parallel > log.interFoam 2>&1
Code:
/*---------------------------------------------------------------------------*\ | ========= | | | \\ / F ield | OpenFOAM: The Open Source CFD Toolbox | | \\ / O peration | Version: 2.2.x | | \\ / A nd | Web: www.OpenFOAM.org | | \\/ M anipulation | | \*---------------------------------------------------------------------------*/ Build : 2.2.x-0ee7dc546f1b Exec : interFoam -parallel Date : Jan 10 2014 Time : 19:28:09 Host : "cx1-11-2-1.cx1.hpc.ic.ac.uk" PID : 20294 Case : /tmp/pbs.6213702.cx1b/damBreak32 nProcs : 32 Slaves : 31 ( "cx1-11-2-1.cx1.hpc.ic.ac.uk.20295" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20296" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20297" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20298" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20299" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20300" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20301" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20302" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20303" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20304" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20305" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20306" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20307" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20308" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20309" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25236" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25237" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25238" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25239" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25240" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25241" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25242" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25243" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25244" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25245" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25246" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25247" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25248" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25249" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25250" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25251" ) Pstream initialized with: floatTransfer : 0 nProcsSimpleSum : 0 commsType : nonBlocking polling iterations : 0 sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE). 
fileModificationChecking : Monitoring run-time modified files using timeStampMaster allowSystemOperations : Disallowing user-supplied system call operations // * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * // Create time Create mesh for time = 0 [16] [16] [16] --> FOAM FATAL ERROR: [16] Cannot find file "points" in directory "polyMesh" in times 0 down to constant [25] [26] [26] [26] --> FOAM FATAL ERROR: [28] [28] [28] --> FOAM FATAL ERROR: [28] Cannot find file "points" in directory "polyMesh" in times 0 down to constant [28] [29] [29] [29] --> FOAM FATAL ERROR: [29] [30] [30] [30] --> FOAM FATAL ERROR: [30] Cannot find file "points" in directory "polyMesh" in times 0 down to constant [30] [30] From function Time::findInstance(const fileName&, const word&, const IOobject::readOption, const word&) [30] in file db/Time/findInstance.C at line [31] [31] [31] --> FOAM FATAL ERROR: [31] Cannot find file "points" in directory "polyMesh" in times 0 down to constant [31] [31] From function Time::findInstance(const fileName&, const word&, const IOobject::readOption, const word&) [31] [16] [16] From function Time::findInstance(const fileName&, const word&, const IOobject::readOption, const word&) [16] in file db/Time/findInstance.C at line 203. [16] FOAM parallel run exiting Code:
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-1
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
cx1-11-2-4
I would really appreciate it if someone could help me out.
Kind regards, katakgoreng
Last edited by katakgoreng; January 10, 2014 at 17:15. |
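Reading the listings in this post together: the ranks that report the missing "points" file are all 16 or higher, and the host file puts its 17th to 32nd entries on cx1-11-2-4. A small sketch for looking up the host of a given rank, assuming ranks are assigned in host-file order (which is what the "Slaves" lists show for this run); `host_of_rank` is a hypothetical helper name:

```shell
# Hypothetical helper: host name for an MPI rank, assuming ranks follow host-file order.
host_of_rank() {
    # $1 = host file, $2 = MPI rank (0-based); sed line numbers are 1-based.
    sed -n "$(( $2 + 1 ))p" "$1"
}
```

If every failing rank maps to the second node, one thing worth checking is whether the case directory (here under /tmp) exists on that node as well. This is a guess from the logs, not a confirmed diagnosis.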
|
January 10, 2014, 17:18 |
|
#18 |
Member
Join Date: Dec 2009
Posts: 49
Rep Power: 16 |
I also ran the Test-parallel application:
Code:
mpirun -np 32 -hostfile $PBS_NODEFILE Test-parallel -parallel > log.Testparallel 2>&1
Code:
/*---------------------------------------------------------------------------*\ | ========= | | | \\ / F ield | OpenFOAM: The Open Source CFD Toolbox | | \\ / O peration | Version: 2.2.x | | \\ / A nd | Web: www.OpenFOAM.org | | \\/ M anipulation | | \*---------------------------------------------------------------------------*/ Build : 2.2.x-0ee7dc546f1b Exec : Test-parallel -parallel Date : Jan 10 2014 Time : 21:11:30 Host : "cx1-11-2-1.cx1.hpc.ic.ac.uk" PID : 20659 Case : /tmp/pbs.6213702.cx1b/damBreak32 nProcs : 32 Slaves : 31 ( "cx1-11-2-1.cx1.hpc.ic.ac.uk.20660" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20661" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20662" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20663" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20664" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20665" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20666" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20667" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20668" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20669" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20670" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20671" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20672" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20673" "cx1-11-2-1.cx1.hpc.ic.ac.uk.20674" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25480" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25481" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25482" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25483" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25484" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25485" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25486" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25487" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25488" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25489" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25490" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25491" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25492" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25493" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25494" "cx1-11-2-4.cx1.hpc.ic.ac.uk.25495" ) Pstream initialized with: floatTransfer : 0 nProcsSimpleSum : 0 commsType : nonBlocking polling iterations : 0 sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE). 
fileModificationChecking : Monitoring run-time modified files using timeStampMaster allowSystemOperations : Disallowing user-supplied system call operations // * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * // Create time [2] [3] Starting transfers [3] [3] slave sending to master 0 [5] Starting transfers [5] [5] slave sending to master 0 [6] Starting transfers [6] [6] slave sending to master 0 [6] slave receiving from master 0 [7] Starting transfers [7] [7] slave sending to master 0 [7] slave receiving from master 0 [8] Starting transfers [8] [8] slave sending to master 0 [8] slave receiving from master 0 [9] Starting transfers [9] [9] slave sending to master 0 [9] slave receiving from master 0 [12] Starting transfers [12] [12] slave sending to master 0 [12] slave receiving from master 0 [14] Starting transfers [14] [14] slave sending to master 0 [14] slave receiving from master 0 [1] Starting transfers [1] [1] slave sending to master 0 [1] slave receiving from master 0 [4] Starting transfers [4] [4] slave sending to master 0 [4] slave receiving from master 0 [3] slave receiving from master 0 [5] slave receiving from master 0 [10] Starting transfers [10] [10] slave sending to master 0 [10] slave receiving from master 0 [11] Starting transfers [11] [11] slave sending to master 0 [11] slave receiving from master 0 [13] Starting transfers [13] [13] slave sending to master 0 [13] slave receiving from master 0 [15] Starting transfers [15] [15] slave sending to master 0 [15] slave receiving from master 0 [0] Starting transfers [0] [0] master receiving from slave 1 [0] (0 1 2) [0] master receiving from slave 2 Starting transfers [28] Starting transfers [28] [28] slave sending to master 0 [28] slave receiving from master 0 [18] Starting transfers [18] [18] slave sending to master 0 [18] slave receiving from master 0 [20] Starting transfers [20] [20] slave sending to master 0 [20] slave receiving from master 0 [21] Starting transfers [21] [21] 
slave sending to master 0 [21] slave receiving from master 0 [25] Starting transfers [25] [25] slave sending to master 0 [25] slave receiving from master 0 [29] Starting transfers [29] [29] slave sending to master 0 [29] slave receiving from master 0 [30] Starting transfers [30] [30] slave sending to master 0 [30] slave receiving from master 0 [31] Starting transfers [31] [31] slave sending to master 0 [31] slave receiving from master 0 [16] Starting transfers [16] [16] slave sending to master 0 [16] slave receiving from master 0 [17] Starting transfers [17] [17] slave sending to master 0 [17] slave receiving from master 0 [19] Starting transfers [19] [19] slave sending to master 0 [19] slave receiving from master 0 [22] Starting transfers [22] [22] slave sending to master 0 [22] slave receiving from master 0 [23] Starting transfers [23] [23] slave sending to master 0 [23] slave receiving from master 0 [24] Starting transfers [24] [24] slave sending to master 0 [24] slave receiving from master 0 [26] Starting transfers [26] [26] slave sending to master 0 [26] slave receiving from master 0 [27] Starting transfers [27] [27] slave sending to master 0 [27] slave receiving from master 0 [2] [2] slave sending to master 0 [2] [0] (0 1 2) [0] master receiving from slave 3 [0] (0 1 2) [0] master receiving from slave 4 [0] (0 1 2) [0] master receiving from slave 5 [0] (0 1 2) [0] master receiving from slave 6 [0] (0 1 2) [0] master receiving from slave 7 [0] (0 1 2) [0] master receiving from slave 8 [0] (0 1 2) [0] master receiving from slave 9 [0] (0 1 2) [0] master receiving from slave 10 [0] (0 1 2) [0] master receiving from slave 11 [0] (0 1 2) [0] master receiving from slave 12 [0] (0 1 2) [0] master receiving from slave 13 [0] (0 1 2) [0] master receiving from slave 14 [0] (0 1 2) [0] master receiving from slave 15 [0] (0 1 2) [0] master receiving from slave 16 [0] (0 1 2) [0] master receiving from slave 17 [0] (0 1 2) [0] master receiving from slave 18 [0] (0 1 2) [0] 
master receiving from slave 19 [0] (0 1 2) [0] master receiving from slave 20 [0] (0 1 2) [0] master receiving from slave 21 [0] (0 1 2) [0] master receiving from slave 22 [0] (0 1 2) [0] master receiving from slave 23 [0] (0 1 2) [0] master receiving from slave 24 [0] (0 1 2) [0] master receiving from slave 25 [0] (0 1 2) [0] master receiving from slave 26 [0] (0 1 2) [0] master receiving from slave 27 [0] (0 1 2) [0] master receiving from slave 28 [0] (0 1 2) [0] master receiving from slave 29 [0] (0 1 2) [0] master receiving from slave 30 [0] (0 1 2) [0] master receiving from slave 31 [0] (0 1 2) [0] master sending to slave 1 [0] master sending to slave 2 [0] master sending to slave 3 [0] master sending to slave 4 [0] master sending to slave 5 [1] (0 1 2) [4] (0 [3] (0 1 2) [0] master sending to slave 6 [0] master sending to slave 7 [0] master sending to slave 8 [0] master sending to slave 9 [0] master sending to slave 10 [0] 1 2) [6] (0 1 2) [7] (0 1 2) [8] (0 1 2) [5] (0 1 2) [10] (0 1 2) [9] (0 1 2) master sending to slave 11 [0] master sending to slave 12 [0] master sending to slave 13 [0] master sending to slave 14 [0] master sending to slave 15 [0] master sending to slave 16 [0] master sending to slave 17 [0] master sending to slave 18[15] (0 1 2) [14] (0 1 2) [11] (0 1 2) [12] (0 1 2) [13] (0 1 2) [0] master sending to slave 19 [0] master sending to slave 20 [0] master sending to slave 21 [0] master sending to slave 22 [0] master sending to slave 23 [0] master sending to slave 24 [0] master sending to slave 25 [0] master sending to slave 26 [0] master sending to slave 27 [0] master sending to slave 28 [0] master sending to slave 29 [0] master sending to slave 30 [0] master sending to slave 31 End Finalising parallel run slave receiving from master 0 [2] (0 1 2) [16] (0 1 2) [17] (0 1 2) [18] (0 1 2) [19] (0 1 2) [20] (0 1 2) [22] (0 1 2) [23] (0 1 2) [21] (0 1 2) [24] (0 1 2) [26] (0 1 2) [25] (0 1 2) [29] (0 1 2) [30] (0 1 2) [31] (0 1 2) [27] (0 1 2) 
[28] (0 1 2) |
|
January 26, 2014, 10:43 |
|
#19 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,981
Blog Entries: 45
Rep Power: 128 |
Greetings katakgoreng,
I've finally managed to get around to this thread on my to-do list... A few questions:
Best regards, Bruno
__________________
|
|
February 19, 2014, 05:39 |
|
#20 |
Member
Join Date: Dec 2009
Posts: 49
Rep Power: 16 |
Hi Bruno,
Sorry for the late response. My wife gave birth last month, so I took some time off from work. The following is the PBS job script that I previously used to submit jobs to the cluster.
Code:
#!/bin/bash
#
# --- SET THE PBS DIRECTIVES
#PBS -l walltime=2:00:00
#PBS -l select=2:ncpus=16:mpiprocs=16:mem=4000mb
#PBS -e ehk112_err
#PBS -o ehk112_out
#PBS -m ae
#PBS -V
echo "============================================="
echo "FOLDER LOCATION AND NAME"
echo "============================================="
CASEFOLDER="damBreak32"
CASELOCATION="$WORK/CASEFOLDER/"
echo $CASEFOLDER
echo $CASELOCATION
echo "============================================="
echo "SOURCING SYSTEM BASHRC"
echo "============================================="
. $HOME/.bash_profile
echo "============================================="
echo "SOURCING OPENFOAM 2.2.x BASHRC"
echo "============================================="
. /home/ehk112/OpenFOAM/OpenFOAM-2.2.x/etc/bashrc
echo "============================================="
echo "COPYING CASE FILE INTO TEMP FOLDER"
echo "============================================="
cp -rf $CASELOCATION/$CASEFOLDER $TMPDIR
cd $TMPDIR/$CASEFOLDER
echo "============================================="
echo "RUNNING OPENFOAM BATCH SCRIPT"
echo "============================================="
./Allrun
echo "============================================="
echo "COPY RESULT INTO $WORK/OpenFOAM"
echo "============================================="
cd ..
mv -f $CASEFOLDER $WORK/OpenFOAM
Code:
#!/bin/bash
# ===============================
# PREPARE CASES
# ===============================
rm log.*
cp 0/backup/* 0/
# ===============================
# MESHING
# ===============================
blockMesh > log.blockMesh 2>&1
# ===============================
# SET FIELD
# ===============================
setFields > log.setField 2>&1
# ===============================
# DECOMPOSE DOMAIN
# ===============================
decomposePar > log.decomposePar 2>&1
# ===============================
# RENUMBER MESH
# ===============================
mpirun -np 32 -hostfile $PBS_NODEFILE renumberMesh -overwrite -parallel > log.renumberMesh 2>&1
# ===============================
# RUN APPLICATION
# ===============================
mpirun -np 32 -hostfile $PBS_NODEFILE interFoam -parallel > log.interFoam 2>&1
1. Sourcing the environment variables
2. Copying the case folder to the cluster's temporary folder
3. Executing the OpenFOAM batch script
4. After it finishes, copying the results back from the cluster's temporary folder
I will try the method that you proposed and report back later.
Kind regards, katakgoreng |
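Since the Allrun script hard-codes `-np 32`, a small consistency check before the first mpirun can catch a mismatch early. This is a sketch only: it assumes `system/decomposeParDict` contains a plain `numberOfSubdomains N;` entry and that `$PBS_NODEFILE` holds one line per MPI process; the function name `subdomains` is invented for the example.

```shell
# Hypothetical helper: read numberOfSubdomains from a decomposeParDict file.
subdomains() {
    sed -n 's/^[[:space:]]*numberOfSubdomains[[:space:]][[:space:]]*\([0-9][0-9]*\)[[:space:]]*;.*/\1/p' "$1" |
        head -n 1
}

# Compare against the scheduler's host file when run inside a PBS job.
if [ -n "${PBS_NODEFILE:-}" ] && [ -f system/decomposeParDict ]; then
    want=$(subdomains system/decomposeParDict)
    have=$(grep -c . "$PBS_NODEFILE")
    if [ "$want" != "$have" ]; then
        echo "decomposeParDict asks for $want ranks, scheduler provides $have" >&2
    fi
fi
```

The check only prints a warning; whether the job should stop at that point is left to the script author.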
|