CFD Online (www.cfd-online.com), CFX forum

Parallel startup trouble!

#1 John, April 15, 2005, 23:19

Hello all!

I have a definition file made using CFX-5.6 that has run successfully on a Sun SMP machine using the 5.6 solver. Now I am trying to run it on a Linux cluster with the 5.7.1 solver and the default PVM executable. The solver seems to have trouble starting: when I run top on the compute nodes, I can see the CPU usage surging up and down. All of this happens after the mesh has been partitioned without any problems whatsoever.

The compute processes (pvmsolve) die off one by one, and the output file ends before reporting the first iteration results.

I thought that this might be due to a poorly defined mesh or partition, but I can't understand why it would run on one machine and not another. Any clues as to the nature of this problem would be greatly appreciated!!

Thanks,

John

#2 Moyo, April 17, 2005, 15:06

I think your cluster/PVM setup has problems. I had a problem like this before when I meshed with CFX-Build and ran in parallel: it started fine and then each node died off. Try this on each of your Linux nodes: type 'top', which lists all the processes and their memory usage, and see whether pvmsolve is there and how long it lasts before it is removed. I bet it won't be long.
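If you want something less manual than watching top on every node, here is a minimal sketch that parses the text printed by `ps -eo pid,pcpu,comm` (a standard procps invocation on Linux) and picks out the pvmsolve processes. The function name `find_processes` and the sample output are hypothetical, made up for illustration.

```python
def find_processes(ps_output: str, name: str = "pvmsolve") -> list[tuple[int, float]]:
    """Return (pid, cpu_percent) for every process whose command is `name`.

    Assumes the text came from `ps -eo pid,pcpu,comm`: one header line,
    then one process per line as PID, %CPU, command name.
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        pid, cpu, comm = fields
        if comm.strip() == name:
            matches.append((int(pid), float(cpu)))
    return matches


# Made-up ps output for illustration; on a real node you would capture
# the text of `ps -eo pid,pcpu,comm` instead.
SAMPLE = """  PID %CPU COMMAND
 4121 98.7 pvmsolve
 4122  0.3 pvmd3
"""
print(find_processes(SAMPLE))  # [(4121, 98.7)]
```

Running this every few seconds on each node and noting when the list becomes empty gives you the "how long before it is removed" figure Moyo is asking about.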

#3 Glenn Horrocks, April 17, 2005, 18:25

Hi,

Moyo is right, there is something wrong with your machine. If the problem were a poor mesh or an incorrect problem setup, it would either give an error message or diverge.

Glenn Horrocks

#4 John, April 17, 2005, 21:35

I agree with both assertions... it seems unlikely that it would run on one machine and not another if there were a bad mesh or definition file.

On the other hand, other .def files of similar size run without any problems whatsoever: top shows pvmsolve steadily taking up ~99% CPU on all involved nodes. In this case, however, when I run top on a node the CPU usage fluctuates between 1% and 90%. It did not behave this way on the multiprocessor Sun machine.
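The difference John describes (steady ~99% versus swinging between 1% and 90%) can be made a little more objective if you collect a series of %CPU samples for pvmsolve on one node. This is a minimal sketch under assumptions: the function name `classify_cpu` is hypothetical, and the 80% floor and 20-point spread thresholds are arbitrary illustrative values, not CFX or PVM guidance.

```python
from statistics import mean


def classify_cpu(samples, busy_floor=80.0, max_spread=20.0):
    """Label a series of %CPU samples as 'steady', 'surging' or 'low'.

    busy_floor and max_spread are illustrative thresholds (assumptions),
    not values taken from CFX or PVM documentation.
    """
    if not samples:
        raise ValueError("need at least one sample")
    spread = max(samples) - min(samples)
    if spread > max_spread:
        return "surging"  # large swings, like the 1%..90% case
    if mean(samples) < busy_floor:
        return "low"  # steady, but the process is mostly idle
    return "steady"  # steady and busy, like a healthy ~99% run


print(classify_cpu([99.1, 98.7, 99.4, 98.9]))      # steady
print(classify_cpu([1.0, 90.0, 12.0, 85.0, 3.0]))  # surging
```

Separating "low" from "surging" avoids mislabelling a process that is steadily idle as one whose load is fluctuating.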

BTW, this 5.7.1 installation is on the new ROCKS distribution. Are there any known bugs specific to CFX installations on ROCKS? The ROCKS user groups are, of course, filled with reports of network trouble... after all, what do you want for free??

Thanks!!

John

#5 Santhosh, April 18, 2005, 03:11

Hi,

What's the error message that you get when CFX exits?

I experience the same problem, i.e. the "CPU surges up & down", and then CFX exits with return code 255. This happens during the 1st iteration, and most of the time a bad mesh is the cause. I use WinXP.

Santhosh

#6 John, April 18, 2005, 09:54

That's the trouble... I get no error message at all. As in your case, it happens during the 1st iteration.

John

#7 Santhosh, April 18, 2005, 10:31

You might have to wait an awfully long time to get that error message. I usually kill the job when this happens, especially if it is in the 1st iteration, and get back to meshing as soon as I can. Have you tried to improve your mesh?

Santhosh

#8 John, April 18, 2005, 21:13

No, I haven't tried any mesh improvement. Is mesh improvement available in CFX-Build?

John

#9 Julian, April 22, 2005, 11:52

John,

Just a thought. Have you installed CFX-5.7 and the PVM libraries on all of the nodes in the Linux cluster, or are they served from a file server? (i.e. could it be a dodgy NFS share?)

Have you created the machines file (or is that only for MPI?) in your home directory, listing all of the machines that the code can run on? I ask because you say that it does the partitioning (serial) and then fails in iteration 1 (parallel). If PVM is not on every machine, then the processes can't communicate with each other and the run will fail as soon as they try to.
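Julian's question can be checked mechanically once you know which hosts actually have the CFX/PVM installation. This is a sketch under assumptions: the host file layout below (one host per line, optional extra fields after the name, `#` starting a comment) is assumed, not confirmed as what CFX 5.7.1 reads; the set of hosts with a working installation is something you supply; and both function names are hypothetical.

```python
def parse_hostfile(text: str) -> list[str]:
    """Return host names from a host file, in order, without duplicates.

    Assumed layout: one host per line, optional extra fields after the
    name, '#' starts a comment, blank lines are ignored.
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name = line.split()[0]  # the host name is the first field
        if name not in hosts:
            hosts.append(name)
    return hosts


def hosts_missing_install(hostfile_text: str, installed: set[str]) -> list[str]:
    """List hosts named in the host file that are not in `installed`."""
    return [h for h in parse_hostfile(hostfile_text) if h not in installed]


# Example host file using ROCKS-style node names (made up for illustration).
HOSTS = """# cluster nodes
compute-0-0
compute-0-1   # second node
"""
print(hosts_missing_install(HOSTS, {"compute-0-0"}))  # ['compute-0-1']
```

Any host this reports is one where a parallel partition could start but fail as soon as it tries to talk to the others.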

Have you looked in the /tmp or /var/tmp directories of each machine in the cluster for any messages relating to either CFX or PVM?
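Julian's last check can be scripted and run on each node. Here is a minimal sketch that lists files at the top level of the given directories whose names contain "cfx" or "pvm" (case-insensitive), newest first, so you can then read them by hand. PVM 3 typically writes its daemon log to a file such as /tmp/pvml.<uid>, which this would pick up. The function name `find_logs` and the keyword list are assumptions for illustration.

```python
import stat
from pathlib import Path


def find_logs(dirs=("/tmp", "/var/tmp"), keywords=("cfx", "pvm")):
    """Return regular files in `dirs` whose names contain a keyword, newest first.

    Only the top level of each directory is scanned; missing or
    unreadable directories and files are skipped.
    """
    found = []
    for d in dirs:
        try:
            entries = list(Path(d).iterdir())
        except OSError:
            continue  # directory missing or not readable
        for p in entries:
            if not any(k in p.name.lower() for k in keywords):
                continue
            try:
                st = p.stat()
            except OSError:
                continue  # file vanished or is not accessible
            if stat.S_ISREG(st.st_mode):
                found.append((st.st_mtime, p))
    found.sort(key=lambda item: item[0], reverse=True)  # newest first
    return [p for _, p in found]


for path in find_logs():
    print(path)
```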

Julian

#10 John, April 22, 2005, 20:17

Thanks to everyone for their input. I believe that Santhosh has the best answer: it was most likely due to a poorly constructed mesh. Some planar surfaces were not well parameterized, so I added some new edge points and turned these ugly surfaces into more numerous, but better-looking, planar surfaces. I still get some CPU "surging", i.e. the processors do not run steadily at near 100%, but the solution does come out and the forces converge rapidly. Next I'll try more mesh controls and increase the element count in the wake region until the solution appears to be mesh independent.
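Mesh independence is usually judged by watching a monitored quantity (here, the converged force) across successive refinements and stopping once it barely changes. A minimal sketch, assuming you record one (element count, force) pair per mesh, coarsest first; the function name and the 1% relative tolerance are illustrative choices, not a CFX criterion.

```python
def mesh_independent(results, rel_tol=0.01):
    """Check a mesh-refinement series for convergence of one quantity.

    `results` is a list of (element_count, value) pairs ordered from
    coarsest to finest mesh. Returns the element count of the first mesh
    whose value differs from the next finer mesh by less than `rel_tol`
    (relative to the finer value), or None if no such pair exists yet.
    """
    for (n_coarse, f_coarse), (_, f_fine) in zip(results, results[1:]):
        if f_fine == 0:
            continue  # relative change is undefined for a zero value
        if abs(f_fine - f_coarse) / abs(f_fine) < rel_tol:
            return n_coarse
    return None


# Made-up force values for illustration only.
series = [(100_000, 12.40), (200_000, 12.02), (400_000, 11.97)]
print(mesh_independent(series))  # 200000
```

Returning the coarser mesh of the first converged pair reflects the usual practice of running production cases on the cheapest mesh that the next refinement no longer changes meaningfully.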

Thanks again, everyone!

John
All times are GMT -4.