Dear forum,
I have some jobs running on our Opteron Myrinet cluster. Convergence is fine, but the jobs are damn slow. I see nearly no speedup compared to runs on Ethernet workstations, and I am wondering whether OpenMPI is actually using Myrinet. I have built it with MX support. We average about 5 min per iteration (about 10-15 pressure steps per iteration, which is fine), whereas FLUENT needs about 30 seconds per iteration on the same number of CPUs and the same mesh.
Are you sure the processes were distributed to the cluster nodes and are not all running on the node you launched mpirun from?
If this is a Linux cluster, log into one of the remote nodes you explicitly told the processes to run on and check the running processes using top or ps. If you are using queuing/scheduling software (PBS, SGE, etc.), find where it sent the processes and do the same there.
Hi Chris,
yes, I did that. They run as they should. The only thing I am not doing so far is distributing the data to the local nodes; I am running the case from an NFS share. However, this should only affect the time needed to write backup data.
The difference to FLUENT is OpenMPI vs. HP-MPI.
OpenMPI will not have Myrinet support by default. You will have to recompile OpenMPI with Myrinet support for it to work properly. Or just use HP-MPI, that works too (although you will have to buy a licence).
Eugene,
I know this; I added Myrinet support to OpenMPI. If I run on our workstations (no Myrinet), I get an error about missing Myrinet modules. I do not get this error on our cluster, where Myrinet is present. However, performance is poor and I get no feedback (apart from the missing error message) on whether Myrinet is actually used, but I suppose it is not.
Here are two iterations from the log:
Time = 71
DILUPBiCG: Solving for Ux, Initial residual = 0.000235829, Final residual = 4.2349e-06, No Iterations 2
DILUPBiCG: Solving for Uy, Initial residual = 0.00219142, Final residual = 6.32653e-05, No Iterations 2
DILUPBiCG: Solving for Uz, Initial residual = 0.00128352, Final residual = 1.65462e-05, No Iterations 2
GAMG: Solving for p, Initial residual = 0.00667156, Final residual = 5.49445e-06, No Iterations 9
time step continuity errors : sum local = 3.54209e-06, global = -1.48371e-07, cumulative = -0.00037855
DILUPBiCG: Solving for epsilon, Initial residual = 0.067249, Final residual = 1.9244e-10, No Iterations 1
bounding epsilon, min: -100901 max: 1.417e+09 average: 32636.2
DILUPBiCG: Solving for k, Initial residual = 2.33946e-06, Final residual = 2.33946e-06, No Iterations 0
ExecutionTime = 19028.4 s ClockTime = 19082 s
Time = 72
DILUPBiCG: Solving for Ux, Initial residual = 0.000234464, Final residual = 4.19113e-06, No Iterations 2
DILUPBiCG: Solving for Uy, Initial residual = 0.00216742, Final residual = 6.50005e-05, No Iterations 2
DILUPBiCG: Solving for Uz, Initial residual = 0.00127756, Final residual = 1.62209e-05, No Iterations 2
GAMG: Solving for p, Initial residual = 0.00666254, Final residual = 5.60993e-06, No Iterations 9
time step continuity errors : sum local = 3.61005e-06, global = -1.35679e-07, cumulative = -0.000378685
DILUPBiCG: Solving for epsilon, Initial residual = 0.0692427, Final residual = 2.30982e-10, No Iterations 1
bounding epsilon, min: -46613.7 max: 1.40629e+09 average: 32421.4
DILUPBiCG: Solving for k, Initial residual = 2.45671e-06, Final residual = 2.45671e-06, No Iterations 0
ExecutionTime = 19214.4 s ClockTime = 19268 s
This is more than 3 minutes for one iteration. Is this OK for a case with about 26 million cells running on 32 Opteron 2220 CPUs with Myrinet interconnect? I feel it is much too slow.
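For reference, the per-iteration time can be read off the ExecutionTime and ClockTime lines of the log. The snippet below is only a small sketch of that arithmetic in Python; the function name seconds_per_step and the LOG string are made up for illustration, and LOG contains just the four relevant lines copied from the excerpt above.

```python
import re

# Only the timing lines from the two iterations quoted in the log excerpt.
LOG = """\
Time = 71
ExecutionTime = 19028.4 s ClockTime = 19082 s
Time = 72
ExecutionTime = 19214.4 s ClockTime = 19268 s
"""


def seconds_per_step(log_text):
    """Return (execution, clock) time differences in seconds between consecutive steps."""
    pattern = re.compile(r"ExecutionTime = ([\d.]+) s\s+ClockTime = ([\d.]+) s")
    stamps = [(float(e), float(c)) for e, c in pattern.findall(log_text)]
    # Differences between neighbouring time stamps, rounded to hide float noise.
    return [
        (round(e1 - e0, 3), round(c1 - c0, 3))
        for (e0, c0), (e1, c1) in zip(stamps, stamps[1:])
    ]


deltas = seconds_per_step(LOG)
for exec_s, clock_s in deltas:
    print(f"{exec_s:.1f} s CPU, {clock_s:.1f} s wall clock ({clock_s / 60:.1f} min)")
```

For the two iterations shown, both differences come out at 186 s, i.e. roughly 3.1 minutes per iteration, and CPU time and wall-clock time advance by the same amount.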
Ok, this problem was caused by insufficient solver settings.
BastiL, would you mind sharing what you had to change in the solver settings?
Yes, I had nIterFinestLevel for the preconditioner set to too high a value.