Error when running in parallel (> 512 cores) using MVAPICH2 |
December 7, 2013, 18:21 |
|
#1 |
Member
Jack
Join Date: Dec 2011
Posts: 94
Rep Power: 14 |
Hi guys,
I am using a cluster to run OF 2.1.1 in parallel with MVAPICH2. I managed to run the job with up to 512 cores, but when I tried to run it with > 512 cores, I got the error below. It is weird, could you please help? Thanks in advance! Code:
WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing without InfiniBand registration cache support.
[0]
[0]
[0] --> FOAM FATAL IO ERROR:
[0] error in IOstream "IOstream" for operation operator>>(Istream&, List<T>&) : reading first token
[0]
[0] file: IOstream at line 0.
[0]
[0] From function IOstream::fatalCheck(const char*) const
[0] in file db/IOstreams/IOstreams/IOstream.C at line 114.
[0] FOAM parallel run exiting
[0] [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 |
|
December 8, 2013, 10:48 |
|
#2 |
New Member
Jerome Vienne
Join Date: Oct 2013
Posts: 2
Rep Power: 0 |
__________________
Jerome Vienne, Ph.D HPC Software Tools Group Texas Advanced Computing Center (TACC) viennej@tacc.utexas.edu | Phone: (512) 475-9322 Office: ROC 1.455B | Fax: (512) 475-9445 |
|
December 10, 2013, 12:05 |
|
#3 | |
Member
Jack
Join Date: Dec 2011
Posts: 94
Rep Power: 14 |
Quote:
Thanks for this link. I took a look at it, but it seems that the poster there simply cannot run any parallel job. In my case, I can run the parallel job with up to 512 cores, but when I try to run the same job with > 512 cores, I get this error. It is weird. The same error comes up when I use the "sample" utility in parallel up to 512 cores. Best regards |
||
December 10, 2013, 16:38 |
|
#4 | |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,974
Blog Entries: 45
Rep Power: 128 |
Greetings to all!
@ripperjack: The problem might not be with MVAPICH2; it might be with the case itself. I know I wrote a post some time ago with a few questions... Ah, here we go: Quote:
Best regards, Bruno
|
||
April 25, 2014, 12:08 |
|
#5 | |
Member
Jack
Join Date: Dec 2011
Posts: 94
Rep Power: 14 |
Quote:
Thanks for your suggestions. I know it is an old thread, but the problem has not been solved, which is frustrating. I just compiled OF 2.2.2 and 2.3 on the cluster and the error is still there.
Actually, I started another post before regarding this issue (see here). At that time, I could run jobs with 16, 32, and 64 cores, but with > 64 cores there was an error (different from the current one). Someone pointed out that this is a known bug in mvapich2-1.9 and that it was fixed in the newer mvapich2-2.0. So I recompiled OF using mvapich2-2.0, and that problem was fixed. However, when I tried to use > 512 cores, I got this error (see my first post). It is really weird; I don't know whether this is about MPI or OF.
Did anyone manage to run a large job (> 1024 cores) using mvapich2? I need to run some big jobs (~100 million cells), so a 512-core parallel run is not enough. It would be greatly appreciated if anyone could help me fix this issue. Tons of thanks in advance. Best regards, Ping |
||
April 25, 2014, 12:36 |
|
#6 |
Retired Super Moderator
Bruno Santos
Join Date: Mar 2009
Location: Lisbon, Portugal
Posts: 10,974
Blog Entries: 45
Rep Power: 128 |
Greetings Ping,
Since I don't have any experience with MVAPICH, here's what I can suggest. I had a look at their website and my first suggestion would be to contact their mailing list: http://mailman.cse.ohio-state.edu/ma...apich-discuss/ I also had a look at their User Guide, and the following environment variables seem suspicious (default values of 256 and 512) and might benefit from larger values:
Beyond that, I would try larger values for any other variables as well. Also, check the chapter "8 Scalability features and Performance Tuning for Large Scale Clusters" in the User Guide. Best regards, Bruno
|
|
April 25, 2014, 13:29 |
|
#7 |
Member
Dan Kokron
Join Date: Dec 2012
Posts: 33
Rep Power: 13 |
RipperJack
The first thing to try is running a simple HelloWorld with 512p. I just confirmed that I can scale this simple code to at least 1280p under my build of mv2-2.0b. http://www.dartmouth.edu/~rc/classes..._world_ex.html Dan |
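For reference, a HelloWorld-style check of the kind suggested above could look roughly like the sketch below. It is not the code from the linked page; it is a minimal Python version that assumes mpi4py is installed and built against the cluster's MVAPICH2 (an assumption, nothing in this thread confirms it). Each rank reports in, and a global sum is compared with the number of ranks so that at least one collective operation is exercised. If mpi4py is missing, the sketch falls back to a single-rank dry run so it can still be smoke-tested on a login node. The names `init_world` and `run_check` are made up for this example.

```python
"""Minimal MPI sanity check: every rank reports in and a global sum is verified."""
import socket


def init_world():
    """Return (rank, size, allreduce_sum) for the current MPI job.

    Assumption: mpi4py is available and linked against the cluster's MPI.
    If it cannot be imported, fall back to a single-rank dry run.
    """
    try:
        from mpi4py import MPI
    except ImportError:
        # No MPI bindings: behave like a 1-rank job.
        return 0, 1, lambda value: value
    comm = MPI.COMM_WORLD
    return (
        comm.Get_rank(),
        comm.Get_size(),
        lambda value: comm.allreduce(value, op=MPI.SUM),
    )


def run_check():
    """Print a greeting from this rank and verify a global sum over all ranks."""
    rank, size, allreduce_sum = init_world()
    # Each rank contributes 1, so the global sum must equal the rank count.
    total = allreduce_sum(1)
    ok = total == size
    print(f"Hello from rank {rank} of {size} on {socket.gethostname()}")
    if rank == 0:
        status = "OK" if ok else "MISMATCH"
        print(f"allreduce total = {total}, expected {size}: {status}")
    return rank, size, ok


if __name__ == "__main__":
    run_check()
```

Saved under a name such as `hello_mpi.py` (hypothetical), it would be started through the cluster's usual launcher, for example `mpirun -np 1024 python hello_mpi.py`; the exact launcher and its options depend on the site. If this scales past 512 ranks while the OpenFOAM job does not, the MPI installation itself is less likely to be the culprit.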
|
June 25, 2015, 03:12 |
|
#8 | |
New Member
Lee Howe
Join Date: Dec 2014
Posts: 2
Rep Power: 0 |
Quote:
|
||
|
|