CFD Online Discussion Forums


Alhasan January 24, 2015 20:15

Slow Cases using HPC and PBS job scheduler
 
Hey Everyone,

I used the HPC at a different university for a year. They used the PBS scheduler and I had no issues like this whatsoever!

Now I have moved to a different university, and when I try to use the HPC and PBS job scheduler here I am having this weird problem.

- My OpenFOAM case, just a 2D simpleFoam case, runs normally using 64 processors.
- But suddenly some cases run extremely slowly.
- So to double-check, I submitted the exact same case 5-6 times as separate jobs. One or two run normally and the others run extremely slowly.

What might be happening here? Any tips or help? I have installed OpenFOAM in my own directory; could this be causing the problem?

PS. I am now running the exact same cases on my workstation with no issues, so the problem is not with my case setup.

If you require any further information, please let me know.

Thanks for your time,
Hasan K.J

wyldckat January 25, 2015 10:28

Quick answer: Notes about running OpenFOAM in parallel
Quote:

Diagnosing limitations: Parallel Performance of Large Case post #4

Alhasan January 26, 2015 09:39

Hey Bruno,

Thanks for your reply, and sorry if I put the post in the wrong place; I thought it was a PBS problem, so I posted it there.

Those were very helpful links. I have learnt, and am still learning, a lot of tricks from them, although some posts go over my head.

Coming back to my problem: I haven't come across any similar problem in those links (one where it works sometimes and not at other times).

Let me be clearer this time.

- I just have a simple airfoil case that runs very well on my Xeon workstation with no issues.

- The same case runs fine on the HPC cluster too with no issues.

- But sometimes the same case runs extremely slowly on the cluster. This does not happen suddenly after a couple of hundred time steps: it is either very slow from time step 1 or superfast, as it is supposed to be. (By super slow I mean 140 time steps in 8 hours with 64 processors; by superfast I mean about 4 time steps per second.)

- So, just to check whether something was wrong with my case or with the cluster, I submitted the exact same case with no modifications (just copied, pasted and renamed) as 5 different jobs. Out of the 5 cases, 2 were superfast, which is the normal speed, and the other 3 were extremely slow.

- I asked my HPC administrators about this. They have no answer for me, especially regarding OpenFOAM, and they have not had this issue with any other software.

- I have not installed ParaView on the cluster, just OpenFOAM, in my own personal directory. The only difference between the OpenFOAM installations on my Xeon workstation and on the cluster is that ParaView is installed on the workstation and not on the cluster.

- I have no idea what could be causing this and I don't know how to make the problem stop. The only thing that comes to mind is that I may be getting half of the processors from one node and the other half from another node, but even then it can't be this slow!
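
One way to check the node-sharing suspicion in the last point is to record, inside each job, how the 64 slots were spread over the nodes, and then compare a fast job with a slow one. The sketch below is only an illustration in Python: it assumes the job runs under PBS, which sets the PBS_NODEFILE environment variable to a file with one hostname per allocated slot. The function name count_slots_per_node is made up for this example.

```python
import os
from collections import Counter


def count_slots_per_node(nodefile_text):
    """Count how many allocated slots each host has.

    Assumes the PBS node file format: one hostname per line,
    repeated once for every slot granted on that host.
    """
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


if __name__ == "__main__":
    # PBS_NODEFILE only exists inside a running PBS job.
    path = os.environ.get("PBS_NODEFILE")
    if path is None:
        print("PBS_NODEFILE is not set; run this from inside a PBS job script.")
    else:
        with open(path) as f:
            counts = count_slots_per_node(f.read())
        for host, slots in sorted(counts.items()):
            print(f"{host}: {slots} slot(s)")
```

Calling this at the top of the job script, before the solver starts, leaves a record in the job output. If the slow jobs consistently land on a particular node or on a more scattered allocation, that is something concrete to show the HPC administrators.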

Thanks for your time,
Hasan K.J

wyldckat January 26, 2015 15:55

Hi Hasan K.J,

OK, my guess is that you're tripping over NFS lagging. In other words, there is a delay in communication in some of the situations, because the solvers are stuck waiting for disk feedback.

See subsection "3.2.5 Debug messaging and optimisation switches" of the OpenFOAM User Guide: http://www.openfoam.org/docs/user/co...plications.php - and have a look at the "Optimisation switches" topic therein.

Best regards,
Bruno

Alhasan February 1, 2015 08:53

Hi Bruno,

I did look at 3.2.5 but, to be honest, it is going over my head; I did not understand most of it :(

I did go to the WM_PROJECT_DIR/etc/controlDict file and saw there was a list of things, and everything had a 0 next to it.

I am seriously lost. I looked at it on a couple of different days and could not come to a conclusion on what I was supposed to do. Any other guidance? I have to sort out this problem to run LES on my university HPC :(

Thanks,
Hasan K.J

wyldckat February 1, 2015 11:07

Quick answer:
Quote:

fileModificationSkew
A time in seconds that should be set higher than the maximum delay in NFS updates and clock difference for running OpenFOAM over a NFS.

fileModificationChecking
Method of checking whether files have been modified during a simulation, either reading the timeStamp or using inotify; versions that read only master-node data exist, timeStampMaster, inotifyMaster.
Try the following combinations:
  • Code:

    fileModificationSkew  0;
    fileModificationChecking  timeStampMaster;

  • Code:

    fileModificationSkew  120;
    fileModificationChecking  timeStampMaster;

  • Code:

    fileModificationSkew  0;
    fileModificationChecking    inotifyMaster;

  • Code:

    fileModificationSkew  120;
    fileModificationChecking    inotifyMaster;

Whichever works, works.

Beyond this, try editing the file "system/controlDict" in your case's folder and change the respective parameter to this:
Code:

    runTimeModifiable false;

Beyond that, try disabling the writing of time snapshots (set "writeInterval" to a really big number) in order to diagnose whether the limitation is related to storing data on disk.
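
To compare runs while trying these settings, it can help to pull the timing lines out of the solver logs. The following Python sketch assumes the usual "ExecutionTime = ... s  ClockTime = ... s" line that OpenFOAM solvers print after each time step, where ExecutionTime is CPU time and ClockTime is wall-clock time, both cumulative. The function name timing_summary and the log file name in the usage note are made up for this example.

```python
import re

# Matches the per-step timing line of a solver log, for example:
# "ExecutionTime = 12.5 s  ClockTime = 14 s"
TIMING = re.compile(r"ExecutionTime = ([\d.eE+-]+) s\s+ClockTime = ([\d.eE+-]+) s")


def timing_summary(log_text):
    """Return (steps, cpu_seconds, wall_seconds) found in a solver log.

    Both times are cumulative in the log, so the last match gives the totals.
    """
    matches = TIMING.findall(log_text)
    if not matches:
        return 0, 0.0, 0.0
    cpu, wall = matches[-1]
    return len(matches), float(cpu), float(wall)
```

For example, `timing_summary(open("log.simpleFoam").read())` on a fast job and on a slow job gives the number of completed steps and the two totals for each. A wall-clock time far larger than the CPU time would suggest the solver is mostly waiting (on disk or on communication) rather than computing, although it does not by itself tell which.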

