MPI Application in HPC
I am running our engine simulation on an HPC machine with 4 processors (SGI Origin 200). The problem is that after a certain number of time steps (usually more than 1200), the calculation terminates because the MPICH mpirun process is killed. The error output is as follows:
FINISH TIME STEP NO. 1215
CPU TIME IS 14531.15 ELAPSED TIME IS 20020.70 MEMORY (DYNAMIC) USED IS 41 MBYTES
/opt/STAR-CD/v3.22.001/MPICH/1.2.4/irix64_6.5-mips4-cc_7.3-dso/ch_shmem/bin/mpirun: 962853 Killed
PNP: Received [2005-04-14-01:06:06] SIGTERM from HOST "cngdi", MVMESH process -ID "967826".
PNP: Received [2005-04-14-01:06:07] SIGKILL from HOST "cngdi" TRACKER process -ID "971806".
PNP: Saving STAR merged output file "star.run".
PNP: Saving STAR merged output file "star.info".
PNP: Saving STAR merged output file "star.pst".
PNP: Saving STAR merged output file "star.pstt".
PNP: Saving STAR master output file "star.rsi".
PNP: Saving STAR master output file "star.spd01".
PNP: Saving STAR master output file "star.spd02".
PNP: Saving STAR master output file "star.spd03".
PNP: Saving External moving mesh code output file "mvmesh.log".
PNP: Shutdown [2005-04-14-01:08:11] Execution stopped due to process failure (SIGTERM) after 20169 seconds (TOTAL ELAPSED TIME).
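To make the order of the signals in the log above easier to read, here is a minimal sketch that pulls the "PNP: Received" lines out of the output and lists them with their timestamps. The regular expression and the function name `signal_events` are my own assumptions based only on the two lines shown in this post; they are not part of STAR-CD or MPICH, and other log layouts may need a different pattern.

```python
import re
from datetime import datetime

# Assumed layout, taken from the two "PNP: Received" lines in this post:
# PNP: Received [2005-04-14-01:06:06] SIGTERM from HOST "cngdi", MVMESH process -ID "967826".
SIGNAL_RE = re.compile(
    r'PNP: Received \[(?P<stamp>[\d:-]+)\] (?P<signal>SIG[A-Z]+) from HOST '
    r'"(?P<host>[^"]+)",? (?P<name>\w+) process -ID "(?P<pid>\d+)"'
)


def signal_events(log_text):
    """Return (timestamp, signal, host, process name, pid) tuples in log order."""
    events = []
    for match in SIGNAL_RE.finditer(log_text):
        stamp = datetime.strptime(match.group("stamp"), "%Y-%m-%d-%H:%M:%S")
        events.append(
            (
                stamp,
                match.group("signal"),
                match.group("host"),
                match.group("name"),
                int(match.group("pid")),
            )
        )
    return events


# The two signal lines copied from the output above.
SAMPLE = '''PNP: Received [2005-04-14-01:06:06] SIGTERM from HOST "cngdi", MVMESH process -ID "967826".
PNP: Received [2005-04-14-01:06:07] SIGKILL from HOST "cngdi" TRACKER process -ID "971806".
'''

if __name__ == "__main__":
    for stamp, signal, host, name, pid in signal_events(SAMPLE):
        print(stamp.isoformat(), signal, host, name, pid)
```

Running this over the whole output file (read into a string) lists every signal event in the order it was logged, which may help to see which process was signalled first.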
For everyone's information, we use MPICH as the MPI implementation for this calculation, and our HPC has a single node-locked license. We also connect to another server (SGI Fuel) to display the message window (that is, the calculation values, CPU time, etc.) rather than data messages only. So message passing is involved in our case, which is why we use an MPI application for this purpose.
I suspect that this problem is related to the MPI capability, and that it also depends on the network setup and distribution as well as our connection line (cable).
Has anyone had a problem like this?
Please advise and teach me.
Any help will be highly appreciated.
Wendy Hardyono Kurniawan