Distributed parallel error in CFX 5.5.1
Dear All, I'm trying to run in distributed parallel mode on two PCs (each a dual P4 2.4 GHz with 2 GB RAM, running SuSE 8.1 Linux), using 8 partitions (4 on each machine), with quite a large model. At the second step the solver gives the following error time and time again. What should be done? I have already had successful calculations under the same conditions with a bigger model (more mesh elements).
OUTER LOOP ITERATION = 2 CPU SECONDS = 1.44E+03
-----------------------------------------------------
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+---------+---------+
+--------------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Message: |
| Floating point exception: Type Unknown |
+--------------------------------------------------+
+--------------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Message: |
| Stopped in routine c_fpx_handler |
+--------------------------------------------------+
An error has occurred in cfx5solve:
The CFX-5 solver has terminated without writing a results file.
End of solution stage.
This run of the CFX-5 Solver has finished.
Any help would be appreciated. |
Re: Distributed parallel error in CFX 5.5.1
This doesn't look like it has anything to do with running in parallel; the solver has just overflowed. If you already have a solution on a finer mesh, then I'd suggest interpolating that solution onto your coarser mesh (Tools > Interpolate, I think, from the Solver Manager menu). This will give you a much better initial guess, and the solver is much less likely to fail. Mike
|
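As a rough illustration of why a better initial guess can keep an iterative solve from overflowing, here is a toy sketch in Python. It has nothing to do with CFX internals: it simply applies Newton's method to atan(x) = 0, a textbook case where a starting value near the root converges and one further away grows until the floating-point value is no longer finite. The function name `newton_atan` and the two starting values are made up for this example.

```python
import math

def newton_atan(x0, max_steps=30, tol=1e-12):
    """Newton's method for atan(x) = 0 (toy problem, not a CFD solver).

    Returns a tuple (status, x, steps), where status is one of
    "converged", "overflow" or "not converged".
    """
    x = x0
    for step in range(1, max_steps + 1):
        # Newton update x - f(x)/f'(x), with f = atan and f' = 1/(1 + x^2).
        x = x - math.atan(x) * (1.0 + x * x)
        if not math.isfinite(x):
            # The iterate blew up to inf or nan: the analogue of an overflow.
            return "overflow", x, step
        if abs(x) < tol:
            return "converged", x, step
    return "not converged", x, max_steps

if __name__ == "__main__":
    # Hypothetical starting values chosen only to show both behaviours.
    for guess in (1.0, 1.5):
        status, value, steps = newton_atan(guess)
        print(f"x0 = {guess}: {status} after {steps} steps")
```

The same idea is behind the interpolation suggestion: starting the solver close to a plausible solution makes early divergence less likely, although it will not help if the setup itself is wrong.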
still Distributed parallel error in CFX 5.5.1
Still the same... any idea how to solve this? Thanks. What I don't understand is that the model is almost the same as the other one, with only minor geometry modifications, and I had no problems at all with the other one.
====================================================
OUTER LOOP ITERATION = 2 CPU SECONDS = 1.29E+03
| Equation | Rate | RMS Res | Max Res | Linear Solution |
+----------------------+------+------
Parallel run: Received message from slave
-----------------------------------------
Slave partition : 5
Slave routine : ErrAction
Master location : RCVBUF,MSGTAG=1033
Message label : 001100279
Message follows below - :
+--------------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Message: |
| Floating point exception: Type Unknown |
+--------------------------------------------------+
Parallel run: Received message from slave
-----------------------------------------
Slave partition : 5
Slave routine : ErrAction
Master location : RCVBUF,MSGTAG=1033
Message label : 001100279
Message follows below - :
+--------------------------------------------------+
| ERROR #001100279 has occurred in subroutine ErrAction. |
| Message: |
| Stopped in routine c_fpx_handler |
+--------------------------------------------------+
An error has occurred in cfx5solve:
The CFX-5 solver has terminated without writing a results file.
End of solution stage.
This run of the CFX-5 Solver has finished. |
Re: still Distributed parallel error in CFX 5.5.1
OK, I found it... Tell me who's the stupid one, me or CFX: I had to slightly modify the geometry. I had thin surfaces, and Build generates 2 entries (for "both sides") for each thin surface, as one can check in Post. But how come, after my modifications (which had nothing to do with the actual thin surfaces), it associates two absolutely different surfaces with the second entry of the original thin surface? ... which were actually exterior walls by default
anyway... |
Re: Distributed parallel error in CFX 5.5.1
Can you explain to me why you use more partitions (8) than you have processors (4)?
Pascale |
Re: Distributed parallel error in CFX 5.5.1
Yeah, I didn't follow that part either. I always thought you had one partition per CPU. Is that incorrect thinking?
|
Re: Distributed parallel error in CFX 5.5.1
Physically 1 CPU is logically 2. I don't understand it either (does anyone?), but this is our experience on both SuSE Linux and WinXP, and we found it a bit faster with 2 partitions per physical processor.
And a question about that: under Win NT 4, with a P4 1.8 GHz and 2 GB RAM, the solver says "the problem does not fit in memory" when I start a model with 2.8 million mesh elements in SERIAL. Starting in LOCAL PARALLEL with 2 partitions, it runs fine and does not exceed the physical memory limit (using 1.8 GB out of 2). Any explanation? Many thanks, Bog |
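On the partitions-per-CPU question above, the arithmetic behind "8 partitions on 4 processors" can be sketched as follows. This is only an illustration: it assumes each physical CPU shows up to the operating system as 2 logical CPUs (which is what the post describes, and is presumably Hyper-Threading, though the thread does not say so), and the helper names `logical_cpu_count` and `partitions_per_host` are invented for this example. `os.cpu_count()` is the standard Python call that reports the number of logical CPUs on the machine running the script.

```python
import os

def logical_cpu_count(physical_cpus, threads_per_cpu=2):
    """Logical CPUs the OS would report for a host.

    threads_per_cpu=2 is an assumption taken from the post
    ("physically 1 CPU is logically 2"), not a measured value.
    """
    if physical_cpus < 1 or threads_per_cpu < 1:
        raise ValueError("counts must be positive")
    return physical_cpus * threads_per_cpu

def partitions_per_host(total_partitions, hosts):
    """Split partitions as evenly as possible across hosts."""
    if total_partitions < 1 or hosts < 1:
        raise ValueError("counts must be positive")
    base, extra = divmod(total_partitions, hosts)
    # The first `extra` hosts take one additional partition each.
    return [base + (1 if i < extra else 0) for i in range(hosts)]

if __name__ == "__main__":
    # Setup described in the thread: 2 machines, 2 physical CPUs each.
    print("logical CPUs per host:", logical_cpu_count(physical_cpus=2))
    print("partitions per host:", partitions_per_host(8, 2))
    # Logical CPU count of whatever machine runs this script.
    print("logical CPUs here:", os.cpu_count())
```

Whether one partition per logical CPU is actually faster than one per physical CPU depends on the machine and the case; the post only reports that it was "a bit faster" in the poster's experience.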