CFD Online Discussion Forums

CFD Online Discussion Forums (https://www.cfd-online.com/Forums/)
-   CFX (https://www.cfd-online.com/Forums/cfx/)
-   -   Distributed parallel error in CFX 5.5.1 (https://www.cfd-online.com/Forums/cfx/19445-distributed-parallel-error-cfx-5-5-1-a.html)

bogesz January 25, 2003 09:21

Distributed parallel error in CFX 5.5.1
 
Dear All, I'm trying to run in distributed parallel mode on two PCs (dual P4 2.4 GHz, 2 GB RAM each, Suse 8.1 Linux), using 8 partitions (4 on each machine). With a quite large model, the solver gives the following error at the second step, time and time again. What should be done? I have already had successful calculations under the same conditions with a bigger model (more mesh elements).

OUTER LOOP ITERATION = 2 CPU SECONDS = 1.44E+03

-----------------------------------------------------
|Equation| Rate | RMS Res | Max Res | Linear Solution
+----------------------+------+---------+---------+
+--------------------------------------------------
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Message: |
| Floating point exception: Type Unknown |
+-------------------------------------------------+
+---------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Message: |
| Stopped in routine c_fpx_handler |
+---------------------------------------+

An error has occurred in cfx5solve:

The CFX-5 solver has terminated without writing a results file.

End of solution stage.

This run of the CFX-5 Solver has finished.

Any help would be appreciated

Mike January 25, 2003 12:35

Re: Distributed parallel error in CFX 5.5.1
 
This doesn't look like it has anything to do with running in parallel; the solver has just overflowed. If you already have a solution on a finer mesh, then I'd suggest interpolating that solution onto your coarser mesh (Tools > Interpolate, I think, from the Solver Manager menu). This will give you a much better initial guess, and it's much less likely that the solver will fail. Mike
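The idea behind Mike's suggestion can be illustrated with a minimal sketch. It is not the Solver Manager's interpolation tool, only a 1D stand-in: a field known on a fine mesh is linearly interpolated onto the nodes of a coarser mesh to serve as the initial guess. The function name `interpolate_field` and the sample meshes are hypothetical.

```python
from bisect import bisect_right


def interpolate_field(src_x, src_values, dst_x):
    """Linearly interpolate values known at sorted src_x onto dst_x.

    Points outside the source range take the nearest end value.
    """
    result = []
    for x in dst_x:
        if x <= src_x[0]:
            result.append(src_values[0])
        elif x >= src_x[-1]:
            result.append(src_values[-1])
        else:
            i = bisect_right(src_x, x)  # src_x[i - 1] <= x < src_x[i]
            x0, x1 = src_x[i - 1], src_x[i]
            w = (x - x0) / (x1 - x0)
            result.append((1 - w) * src_values[i - 1] + w * src_values[i])
    return result


# Hypothetical converged field on a fine mesh (5 nodes).
fine_x = [0.0, 0.25, 0.5, 0.75, 1.0]
fine_u = [0.0, 1.0, 2.0, 1.0, 0.0]

# Coarser mesh (3 nodes) that needs a starting field.
coarse_x = [0.0, 0.4, 1.0]
initial_guess = interpolate_field(fine_x, fine_u, coarse_x)
print(initial_guess)
```

Starting from a field that already resembles the solution keeps the first iterations away from the extreme values that tend to trigger overflow.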

bogesz January 25, 2003 16:26

still Distributed parallel error in CFX 5.5.1
 
Still the same... any idea how to solve this? THX. What I don't understand is that the model is almost the same as the other one, with only minor geometry modifications. With the other one I had no problems at all.

====================================================

OUTER LOOP ITERATION = 2 CPU SECONDS = 1.29E+03

| Equation | Rate | RMS Res | Max Res | Linear Solution |

+----------------------+------+------

Parallel run: Received message from slave

-----------------------------------------

Slave partition : 5

Slave routine : ErrAction

Master location : RCVBUF,MSGTAG=1033

Message label : 001100279

Message follows below - :

+------------------------------------------

| ERROR #001100279 has occurred in subroutine ErrAction. |

| Message: |

| Floating point exception: Type Unknown |

+-----------------------------------------+

Parallel run: Received message from slave

-----------------------------------------

Slave partition : 5

Slave routine : ErrAction

Master location : RCVBUF,MSGTAG=1033

Message label : 001100279

Message follows below - :

+----------------------------+

| ERROR #001100279 has occurred in subroutine ErrAction. |

| Message: |

| Stopped in routine c_fpx_handler |

+-----------------------+

An error has occurred in cfx5solve:

The CFX-5 solver has terminated without writing a results file.

End of solution stage.

This run of the CFX-5 Solver has finished.
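In a parallel run the master output repeats each slave's message, so it can help to tally which partition reported which error. The following is a rough sketch that scans text like the log above; the regular expressions are assumptions based only on this excerpt, not on a documented CFX log format, and `summarize_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpt in this thread (assumed, not official).
PARTITION_RE = re.compile(r"Slave partition\s*:\s*(\d+)")
ERROR_RE = re.compile(r"ERROR #(\d+) has occurred in subroutine")


def summarize_errors(log_text):
    """Count how often each slave partition and each error code appears."""
    partitions = Counter(int(p) for p in PARTITION_RE.findall(log_text))
    codes = Counter(ERROR_RE.findall(log_text))
    return partitions, codes


sample = """\
Parallel run: Received message from slave
Slave partition : 5
Slave routine : ErrAction
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Floating point exception: Type Unknown |
Parallel run: Received message from slave
Slave partition : 5
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Stopped in routine c_fpx_handler |
"""

partitions, codes = summarize_errors(sample)
print(partitions)
print(codes)
```

Here both messages come from partition 5, which suggests looking at the part of the mesh assigned to that partition rather than at the parallel setup itself.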

bogesz January 26, 2003 11:05

Re: still Distributed parallel error in CFX 5.5.1
 
OK, I found it... Tell me who is the stupid one, me or CFX: I had to slightly modify the geometry. I had thin surfaces, and Build generates 2 entries (for "both sides") for thin surfaces, as one can check in Post. But how come, after my modifications (which had nothing to do with the actual thin surfaces), it associates two absolutely different surfaces with the second entry of the original thin surface...????? ... which were actually exterior walls by default

anyway...

Pascale Fonteijn January 26, 2003 15:19

Re: Distributed parallel error in CFX 5.5.1
 
Can you explain to me why you use more partitions (8) than you have processors (4)?

Pascale

Bob January 27, 2003 05:46

Re: Distributed parallel error in CFX 5.5.1
 
Yeah, I didn't follow that part either. I always thought you had one partition per CPU. Is this incorrect thinking?

bogesz January 27, 2003 18:22

Re: Distributed parallel error in CFX 5.5.1
 
Physically 1 CPU is logically 2. I don't understand it either (or does anyone?), but this is our experience on both Suse Linux and WinXP, and we found it a bit faster with 2 partitions per physical processor.
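The arithmetic behind the 4-versus-8 question can be written out as a small sketch. The host names are hypothetical, and the factor of two logical CPUs per physical CPU is taken from bogesz's description rather than measured; whether the extra partitions actually run faster is his observation, not a general rule.

```python
def partitions_for_host(physical_cpus, logical_per_physical=1):
    """Number of partitions if one partition is placed on each (logical) CPU."""
    return physical_cpus * logical_per_physical


# Two dual-CPU machines, as in the original post (host names are made up).
hosts = {"pc1": 2, "pc2": 2}

# One partition per physical CPU: what Pascale and Bob expected.
one_per_physical = sum(partitions_for_host(n) for n in hosts.values())

# One partition per logical CPU, with each physical CPU seen as two.
one_per_logical = sum(partitions_for_host(n, 2) for n in hosts.values())

print(one_per_physical, one_per_logical)
```

On a running machine, Python's `os.cpu_count()` reports the logical count, which is the number the operating system schedules against.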

And a question about that: under Win NT 4, with a P4 1.8 GHz and 2 GB RAM, the solver says "the problem does not fit in memory" when I start a model with 2.8 million mesh elements in SERIAL. Starting in LOCAL PARALLEL with 2 partitions, it runs fine and doesn't exceed the physical memory limit (using 1.8 GB out of 2). Any explanation?

lot of THX

Bog


All times are GMT -4.