EM64t w/STAR problems

#1 Paulh (Guest), July 29, 2005, 14:09
My company recently opened up its frugal coffers and began leasing 10 Dell blade nodes (EM64T, 2 CPUs each) running Red Hat Enterprise Linux AS 3 Update 5, as an addition/replacement to my CFD cluster. It has been anything but a turnkey solution.

Since the release of STAR v3.2, the solver has become significantly more sensitive to the hardware and operating system. The problem I'm currently struggling with is this: STAR times out on ssh and terminates midway through the simulation. The problem is intermittent and not specific to any one node.

CD-adapco support gave us a band-aid in the form of the -notracker option, which seems to reduce the amount of ssh traffic STAR generates during a simulation. It has allowed some of my analyses to run to completion, but others are now failing at startup.

I don't believe it's necessarily a STAR issue, because we can get ssh and rsh to hang just by typing the command at the command prompt. Just as within the STAR environment, these failures are random.
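Since the hangs can be reproduced outside STAR, one way to quantify them is to run a trivial remote command against every node many times and count how often it completes, fails, or stalls. The following is a minimal sketch under stated assumptions, not a tested diagnostic: the host names are hypothetical, it assumes key-based ssh login already works, and the 10-second timeout is an arbitrary choice.

```python
import subprocess
import time


def probe(cmd, timeout_s=10.0):
    """Run cmd once; return (status, seconds), status in {'ok', 'fail', 'hang'}."""
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        status = "ok" if result.returncode == 0 else "fail"
    except subprocess.TimeoutExpired:
        # The command did not finish within timeout_s: count it as a hang.
        status = "hang"
    return status, time.monotonic() - start


def ssh_command(host):
    # BatchMode=yes makes ssh fail instead of waiting for a password prompt.
    return ["ssh", "-o", "BatchMode=yes", host, "true"]


def survey(hosts, rounds=20, timeout_s=10.0, make_cmd=ssh_command):
    """Probe every host `rounds` times and count the outcomes per host."""
    counts = {h: {"ok": 0, "fail": 0, "hang": 0} for h in hosts}
    for _ in range(rounds):
        for h in hosts:
            status, _elapsed = probe(make_cmd(h), timeout_s)
            counts[h][status] += 1
    return counts


# Example use (host names are hypothetical):
# for host, c in survey(["node01", "node02"]).items():
#     print(host, c)
```

If the hang counts are spread evenly over all nodes, that would point toward something shared (network, name service, file server) rather than a single bad blade.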

Does anyone have any thoughts on how to overcome this problem? Any suggestion would be very much appreciated.

Enthusiastically Dejected Paulh

#2 andy (Guest), July 29, 2005, 16:59
On our cluster, the default Linux configuration (presumably tuned for interactive graphical use) was actually unstable on about half of our headless nodes. This was obvious because half the nodes ran hot when idling. It was fixed by passing a kernel parameter at boot time. The real point here is that the default Linux configuration from most distributors is often inappropriate for a cluster.

What does your blade server use for an interconnect? Our cluster uses gigabit Ethernet, and the default parameters in the Ethernet driver were also spectacularly bad for cluster use, which needs reasonable performance with lots of small messages rather than with streaming large files. A few minutes' experimentation brought large improvements.
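To judge whether a driver change helps with small messages, it is useful to have a number to compare before and after. The sketch below times round trips of small TCP messages between two endpoints. It is only an illustration: the function names, the 64-byte message size, and the message count are assumptions, and it says nothing about which driver parameters to change.

```python
import socket
import threading
import time


def echo_once(server_sock):
    """Accept one connection and echo bytes back until the peer closes."""
    conn, _addr = server_sock.accept()
    with conn:
        # Disable Nagle's algorithm so small replies are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def recv_exact(sock, n):
    """Read exactly n bytes from sock."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf.extend(chunk)
    return bytes(buf)


def round_trip_times(host, port, size=64, count=1000):
    """Send `count` messages of `size` bytes; return each round-trip time in seconds."""
    payload = b"x" * size
    times = []
    with socket.create_connection((host, port)) as s:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            t0 = time.perf_counter()
            s.sendall(payload)
            recv_exact(s, size)
            times.append(time.perf_counter() - t0)
    return times


def start_local_echo():
    """Start an echo server on a free loopback port; return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    threading.Thread(target=echo_once, args=(srv,), daemon=True).start()
    return srv, srv.getsockname()[1]
```

On a real cluster the echo side would run on one node, bound to that node's cluster-facing address instead of loopback, and `round_trip_times` on another. Comparing the median and the worst case before and after a driver change is more informative than the mean alone.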

A few minutes' chat with someone familiar with setting up Linux clusters for numerical simulation might cure your problems. Does your supplier have this knowledge? Be careful, because what is appropriate for numerical simulation does not follow from what is appropriate for data farms and other common uses of clusters.

If you want to post questions about the setup and configuration this is probably a good place:

http://www.beowulf.org/mailman/listinfo/beowulf

#3 steve (Guest), July 30, 2005, 15:49
When we have had rsh hangs, it was often because the user's home directory was not mounted or could not be mounted (i.e. because of NFS or automount problems or, rarely, NIS problems). In other cases the user's .cshrc file was accessing something on a filesystem that couldn't be mounted (again because of NFS or automount).

ssh is even worse, because the keys it needs to pass back and forth are usually stored in the home directory, so if they are not there (because the home directory isn't mounted) it almost always hangs.

You could log into a node in advance as root and see whether the home directory gets mounted as the user tries to rsh/ssh in. You could also strip the .cshrc down to nothing and then build it up again until you see the problem.
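The check described above can be partly automated by touching the user's home directory and login files from a child process with a timeout, so that a stalled mount shows up as a timeout instead of freezing the shell. This is a rough sketch with assumptions: the function names are made up for illustration, `.ssh` is the default OpenSSH per-user directory, and the 5-second limit is arbitrary.

```python
import os
import subprocess
import sys

# Child code: stat the path (which should trigger an automount), and list it
# if it is a directory.
_TOUCH = (
    "import os, sys\n"
    "p = sys.argv[1]\n"
    "os.stat(p)\n"
    "if os.path.isdir(p):\n"
    "    os.listdir(p)\n"
)


def check_path(path, timeout_s=5.0):
    """Return 'ok', 'error', or 'timeout' for an access attempt on `path`."""
    # The access runs in a child process so that a stalled NFS or automount
    # request cannot block this script indefinitely.
    child = subprocess.Popen(
        [sys.executable, "-c", _TOUCH, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        code = child.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        child.kill()  # best effort; a process stuck in the kernel may linger
        return "timeout"
    return "ok" if code == 0 else "error"


def check_user(user, timeout_s=5.0):
    """Check the home directory, .ssh directory, and .cshrc of `user`."""
    home = os.path.expanduser("~" + user)
    targets = [home, os.path.join(home, ".ssh"), os.path.join(home, ".cshrc")]
    return {p: check_path(p, timeout_s) for p in targets}
```

Run as root on a compute node just before a job starts, `check_user("someuser")` (a hypothetical account name) reporting 'timeout' or 'error' for the home directory would be consistent with the mount problem described in this post.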
All times are GMT -4.