Home > Forums > General Forums > Main CFD Forum

MPI and parallel computation

April 13, 2004, 05:35  #1
MPI and parallel computation
Wang (Guest)
Hi All,

When I run my code on an Origin 3700/2000 machine, everything is fine as long as the total grid is below 200x200x200 cells. However, at 256x256x256 the code crashes. Analysis with Totalview shows that with non-blocking message passing the crash occurs at MPI_Wait; with blocking message passing it occurs at MPI_Recv. The error output is as follows:

MPI: Program ./lbm, Rank 15, Process 11849507 received signal SIGSEGV(11)

MPI: --------stack traceback------- 11849507(5):

0xaf82b50[MPI_SGI_stacktraceback]

0xaf82f98[first_arriver_handler]

0xaf83228[slave_sig_handler]

0xd9f7ff4[memmove]

0x67[memmove]

FATAL: Protocol version of Server /merged/2.9.1 does not match version of Client /merged/2.1
Your command referenced dbx but env var TOOLROOT is set to /sanopt/dbx/7.3.3
Perhaps try $TOOLROOT/usr/bin/dbx
MPI: dbx version 7.3.2 73509_May21 MR May 21 2001 17:15:31

MPI: -----stack traceback ends-----
MPI: Program ./lbm, Rank 15, Process 11849507: Dumping core on signal SIGSEGV(11) into directory /sanhp/scrijw/lbm/lbm16mar
MPI: MPI_COMM_WORLD rank 15 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 11

I cannot make sense of this output. I have attached the output file for the case. Could you explain it to me if you have had a similar experience?

Thank you very much in advance.

April 13, 2004, 05:44  #2
Re: MPI and parallel computation
Tom (Guest)
Hi,

A very simple question: are you trying to use more memory than is available? The error messages can be quite cryptic in that case.

/Tom

April 13, 2004, 06:08  #3
Re: MPI and parallel computation
Wang (Guest)
Hi Tom,

Thank you very much for your reply. I am confused about that. If it were a memory problem, the code should crash when the matrices are initialised; however, the crash occurs after all of the matrices have been initialised. In addition, I cannot understand what the error message means. Could you help me interpret it?

April 13, 2004, 07:55  #4
Re: MPI and parallel computation
Tom (Guest)
Hi,

The error does not have to be related to MPI. The first message reports a segmentation violation (bad memory access), which can simply be a bug in the program. Try enabling array bounds checking, or use the debugger's stop command (a breakpoint) to find out where the error occurs. You can also use the size command to see whether the executable is too large for the machine.

/Tom

April 14, 2004, 07:34  #5
Re: MPI and parallel computation
Wang (Guest)
Hi Tom,

Thank you very much for your help. I tried debugging with Totalview. Before MPI_Send or MPI_Recv, the array variables look fine, but the code cannot get past the MPI_Send or MPI_Recv; that is to say, it stops during message passing. Do you have any idea how to check the variables in a case like this? Also, how can I see whether the executable is too large for the machine?

April 14, 2004, 11:35  #6
Re: MPI and parallel computation
Guest
Perhaps it would be worth checking whether the looping structures try to send more information than is available. Look at the arguments that go into MPI_SEND and MPI_RECV and check that they are what you expect. The code may well stop when information that does not 'exist' is being sent or received.

April 14, 2004, 12:17  #7
Re: MPI and parallel computation
tippo (Guest)
I would suggest the problem is either a memory issue or, as 'guest' says, a problem with the array size that is being sent/received. The question is how to distinguish between these two problems.

I would make one of your 225x225x225 dimensions smaller, i.e. 225x225x20. This makes the problem much smaller and eliminates memory as an issue.

If you are sending/receiving information separately, one dimension at a time, make the problem 225x225x20; you can then see whether the send and receive work in each direction.

If you are sending information as one 225x225x225 block, you can try sending many 225x225x20 blocks at once, i.e. 12 of them, to check that the buffer size is above your 225x225x225 limit.


April 15, 2004, 12:25  #8
Re: MPI and parallel computation
Wang (Guest)
Hi All,

Thank you all very much for your help. The problem is now solved: it was a machine memory problem. When the array size is smaller, the code runs with no problems.

After I removed some of the arrays and optimised the memory use, the code runs.

