CFD Online Discussion Forums

Guus Jacobs April 28, 2000 15:35

hardware
 
Hello,

Our CFD lab is renovating its hardware and planning to buy new hardware efficiently (who isn't?). Given the following, could you provide some advice?

We generally do large calculations on parallel machines (IBM SP) and are thinking of building a parallel machine out of PCs instead of buying a small SGI, because of the economic advantage. Considering the stability of PCs, is this a wise idea? Are there other considerations?

Could somebody name (and quantify where possible) the main advantages of workstations over PCs?

How well does Windows work in a Unix environment, and vice versa?

Guus


Alton Reich, PE April 28, 2000 16:07

Re: hardware
 
Our office is slowly but steadily moving toward Intel PCs. I personally am using a P3-700 running LINUX. Some folks are using WinNT on their PCs. I prefer Linux because I'm often running one simulation and setting up another one. LINUX is MUCH better for multi-tasking than WinNT. Trying to do two things at once on Windows seems to slow both tasks to a crawl.

We are also moving toward using PC clusters for running big jobs. We currently have clusters of DEC Alphas, but the next cluster will be a bunch of fast PCs running LINUX.

The obvious advantage of PCs is that they are cheap. In a year I will feel no shame in walking into my boss' office and telling him that I'm going to throw my P3-700 out a window because I want to buy a new P3-1500 (or whatever is out at that point). Since my machine cost under $2k, it's possible to use it for 18 months or 2 years and toss it. If you buy an $8K workstation, you have to keep it much longer to get your money's worth. That workstation may be a little faster than my PC now, but just wait a year...
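As a rough illustration of the amortization argument, the sketch below spreads each purchase price evenly over its service life, using only the round figures from the post ($2k PC kept 18 months, $8K workstation). It deliberately ignores resale value, maintenance, and the speed difference, so treat it as back-of-the-envelope arithmetic rather than a cost model.

```python
def monthly_cost(price_usd: float, service_months: float) -> float:
    """Straight-line cost per month of service (no resale value, no maintenance)."""
    return price_usd / service_months

def breakeven_months(workstation_usd: float, pc_usd: float, pc_months: float) -> float:
    """How long the workstation must stay in service to cost as little per month as the PC."""
    return workstation_usd / monthly_cost(pc_usd, pc_months)

# Round figures from the post.
print(f"PC over 18 months:          {monthly_cost(2000, 18):.0f} USD/month")
print(f"Workstation over 18 months: {monthly_cost(8000, 18):.0f} USD/month")
print(f"Workstation break-even:     {breakeven_months(8000, 2000, 18):.0f} months")
```

Under these assumptions the $8K workstation has to stay in service about 72 months, i.e. six years, to match the PC's cost per month.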

Alton

John C. Chien April 28, 2000 17:10

Re: hardware
 
(1). It is an ongoing interest to most people. (2). I can only say that if you are writing your own codes, then you should be able to make the program run on a PC. (3). But if you are trying to run third-party codes, then the hardware alone is not enough. (4). You will have to consider the hardware, including the graphics board, the drivers, the operating system, the windowing interface, the specific third-party software, and the programming language support for the users (Fortran, C, C++). (5). Speed and cost alone may not be enough. Before you buy, check it out first.

Guus Jacobs April 28, 2000 17:43

Re: hardware
 
We write our own codes, but depend on other parties' software for pre- and postprocessing.

2) We want to use parallel computing to run the codes. Our main concern about running the code on a cluster of PCs is stability: we do DNS calculations that can take up to three days and use 160 MW. What do you think?

3)+4) We are thinking of using NASA's postprocessing program. This would point to Silicon Graphics and a UNIX operating system. Still, it would be interesting to know how well UNIX or LINUX works on a PC, as this would keep us from buying workstations and save some money in the process.

Guus

Guus Jacobs April 28, 2000 17:46

Re: hardware
 
Do you feel your clustered PC system is competitive with a System/6000 Scalable POWERparallel (SP) machine from IBM?

Guus

Jonas Larsson April 29, 2000 05:53

Re: hardware
 
We've had a cluster of 30 PCs running production CFD simulations 24/7 for more than a year now. Performance, stability and total cost blow away all other alternatives. We run standard RedHat Linux on the cluster.

Before we bought this cluster, we did the majority of our CFD simulations on a parallel HP V-class compute server. No one wants to use the HP V-class now. We still do post-processing on HP J-class workstations, though. I'm not sure whether PC/Linux is ready to replace those desktops yet.

Don't worry about stability. We were also a bit worried about this at first - after all, this is hardware designed to run MS Office a couple of hours a day and software written by a Finnish student. However, the stability of our cluster has been tremendous, better than any other Unix machine we've had! With 30 computers running for more than a year, we've had only one hard-disk crash in the beginning (bad delivery) and one node that hung (we just had to reboot it)... other than that, all nodes have stayed up month after month after month. The cluster is also heavily used: the load average is more than 80%, and it quite often happens that people start two jobs, causing nodes to start swapping, etc.
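To put these numbers next to the 3-day DNS run asked about earlier in the thread, here is a back-of-the-envelope estimate. It assumes independent nodes with a constant failure rate (a Poisson model), treats "more than a year" as exactly one year, counts both incidents as failures, and assumes the job spans all 30 nodes, so it errs on the pessimistic side.

```python
import math

def node_failure_rate(failures: int, nodes: int, hours: float) -> float:
    """Failures per node-hour, assuming independent nodes with a constant failure rate."""
    return failures / (nodes * hours)

def job_survival(rate: float, nodes: int, job_hours: float) -> float:
    """Probability that no node fails during the job (Poisson model)."""
    return math.exp(-rate * nodes * job_hours)

HOURS_PER_YEAR = 24 * 365
# Figures from the post: 30 nodes, roughly one year, two node-level incidents.
rate = node_failure_rate(failures=2, nodes=30, hours=HOURS_PER_YEAR)
# Assumed job: 3 days (72 hours) on all 30 nodes.
p_ok = job_survival(rate, nodes=30, job_hours=72)
print(f"{rate:.2e} failures per node-hour, P(3-day job completes) = {p_ok:.3f}")
```

With these inputs the estimated chance of such a run finishing without a node failure comes out at roughly 98%.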


John C. Chien April 29, 2000 09:14

Re: hardware
 
(1). So, I think the message is that PC/Linux works for number crunching with almost no system failures for a year. (2). The graphics post-processing is still done separately on HP/Unix workstations. Am I right?

andy April 29, 2000 13:54

Re: hardware
 
Comparing an SGI parallel machine (e.g. Origin 2000) and a PC cluster for running large codes developed yourself.

The parallel hardware and software of the SGI are substantially better. You can get sequential codes running in parallel in a few hours, which you cannot do on a PC cluster. However, if your codes are already parallelised for a distributed-memory machine, then this point may not be relevant. For tens of processors, the relatively slow communication of the PC cluster can seriously hurt implicit programs (lots of communication). Largely explicit programs, like most high-speed codes, will not suffer in the same way. It depends on the types of codes you are running and the size of the predictions you need to perform.
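The point about implicit versus explicit codes can be made concrete with the standard latency-bandwidth ("alpha-beta") model, in which every message pays a fixed latency plus its size divided by the link bandwidth. The 100 µs latency below is an assumed, illustrative figure for TCP over fast ethernet, not a measurement.

```python
def transfer_time(n_messages: int, bytes_per_message: int,
                  latency_s: float, bandwidth_bytes_s: float) -> float:
    """Latency-bandwidth (alpha-beta) model: each message pays a fixed latency
    plus its size divided by the link bandwidth."""
    return n_messages * (latency_s + bytes_per_message / bandwidth_bytes_s)

# Assumed, illustrative network parameters (not measurements).
FAST_ETHERNET = dict(latency_s=100e-6, bandwidth_bytes_s=100e6 / 8)  # 100 Mbit/s
PAYLOAD = 1_000_000  # one megabyte of boundary data per exchange

# The same payload sent as 1000 small messages vs one large message.
small = transfer_time(1000, PAYLOAD // 1000, **FAST_ETHERNET)
large = transfer_time(1, PAYLOAD, **FAST_ETHERNET)
print(f"1000 x 1 kB: {small * 1e3:.1f} ms, 1 x 1 MB: {large * 1e3:.1f} ms")
```

Under these assumptions, moving the same megabyte as 1000 small messages takes 180 ms versus about 80 ms as one message, which is why codes that exchange many small messages per iteration suffer most on a high-latency network.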

Workstation hardware bought through an approved source plugs together and works, and has done so for many years - compatibility is a non-issue. PC components bought from the larger manufacturers are also pretty stable in my experience when considered in isolation and running MS operating systems (for which they get tested). However, Unix tends to use more of the functionality of the hardware, giving rise to components that work for MS but not for Unix. A bigger problem is that various PC components that should plug together and work simply do not. Since the components are being replaced very rapidly, it is not an easy problem to keep up with.

My experience of putting together a small test PC cluster has been bad. I took recommendations from a supplier and checked that the components were OK from Linux postings on the web. TCP refused to work, and I lost a ridiculous amount of time trying to find the cause and swapping hardware and software components that all worked individually. The suppliers were most unhelpful because I was running Linux and not MS, with which they were familiar. Eventually the problem was traced to a design defect in the motherboard (a 2-year-old design!), which the manufacturer admitted to and supplied an updated BIOS to "resolve". The result: a different set of less serious problems! I currently lack the strength/will to finally sort out the bloody machine by spending yet more money replacing the motherboards with more expensive ones.

Parallel Linux is not as stable as mythology would have one believe. TCP has performance issues for small packets and SMP (shared memory) has been unstable for a while. I have not had a simulation run for more than a couple of days without crashing with IO problems. I believe both have been widely discussed in the linux community and addressed recently but cannot confirm this from experience.

For the linux PC cluster I can think of only two significant advantages: price/performance and the control to fix all software problems. For the type of work you describe I believe they easily outweigh the disadvantages but would strongly advise treating PC hardware with caution.

I have used Windows emulators occasionally on Suns for 10 years or so. They are usually slow, a bit fiddly about assigning devices like the mouse and ports, have files that keep growing, and have bits missing, like sound support. I needed them initially to run PC-based transputer software. It worked, but the fiddling was irritating. I cannot comment on the Linux equivalents. I am not aware of a Unix emulator for Windows, but why would you need one? Have I misunderstood the question?


John C. Chien April 29, 2000 15:19

Re: hardware
 
(1). A PC Windows emulator on a Sun is a joke on a network; it's slow. It can be very, very, very slow when a large job is running on your workstation. But sometimes people need to use the MS word processor on their workstation. (2). Based on my experience, I think a stand-alone PC running Windows can be fairly reliable (the system hangs once every few months, depending on what I am doing, so it is not 100% reliable). (3). The problem here is parallel computing on a PC cluster. In this case, both the PC hardware and the system software must work together for the parallel application software to work. I am not ready for it yet. (4). I would say that stand-alone PC applications are all right. A Windows emulator on a Sun is no fun. Parallel computing on a PC cluster for CFD applications is reserved for the superman with knowledge at least at the PC motherboard level. (5). I guess no one is designing PC motherboards for parallel computing? Another question is: even if a PC cluster works for parallel computing, can one upgrade it easily?

Jonas Larsson April 29, 2000 17:16

Re: hardware
 
Yep, we do both post- and pre-processing (plotting and meshing) on HP workstations. We haven't looked into Linux/PC as an alternative. The problem is that when you do post- and pre-processing on large models, you need a lot of memory. This used to be difficult to get in a PC, but things will change when Intel introduces the 64-bit Itanium processor. Next time we upgrade our desktops, we will certainly consider Linux/PC.
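A quick way to see why memory, rather than speed, limits pre- and post-processing on a 32-bit PC is to divide the usable address space by the memory needed per cell. Both the bytes-per-cell figures and the usable fraction (2 GiB of a 4 GiB address space, a common default for user processes on 32-bit systems) are assumptions chosen for illustration.

```python
ADDRESS_SPACE_32 = 2**32  # bytes addressable with 32-bit pointers (4 GiB)

def max_cells(bytes_per_cell: int, usable_fraction: float = 0.5) -> int:
    """Largest mesh (in cells) that fits in the part of a 32-bit address space
    available to a single process; usable_fraction is an assumption."""
    usable_bytes = int(ADDRESS_SPACE_32 * usable_fraction)
    return usable_bytes // bytes_per_cell

# Hypothetical memory footprints per cell for a pre/post-processing tool.
for bpc in (250, 500, 1000):
    print(f"{bpc:>5} bytes/cell -> at most {max_cells(bpc):,} cells")
```

A 64-bit address space removes this ceiling, leaving installed memory as the limit.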

Jonas Larsson April 29, 2000 17:31

Re: hardware
 
Just a note - the TCP performance problems affected older kernels (below 2.2.12). It has been fixed for some time.
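Checking whether a node's kernel is past that threshold is easy to script. The sketch below parses the version string returned by Python's `platform.release()` and compares it with 2.2.12 as a tuple; the threshold comes from the post, while the parsing rule (leading `major.minor[.patch]`) is an assumption about how release strings are formatted.

```python
import platform
import re

MIN_KERNEL = (2, 2, 12)  # threshold from the post: older kernels had the TCP problem

def kernel_tuple(release: str) -> tuple:
    """Parse the leading 'major.minor[.patch]' of a Linux release string,
    e.g. '2.2.14-5.0smp' -> (2, 2, 14). The format is an assumption."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release string: {release!r}")
    major, minor, patch = m.groups()
    return (int(major), int(minor), int(patch or 0))

def tcp_fix_present(release: str) -> bool:
    """True if the kernel is at or above the 2.2.12 threshold."""
    return kernel_tuple(release) >= MIN_KERNEL

if platform.system() == "Linux":
    rel = platform.release()
    print(rel, "OK" if tcp_fix_present(rel) else "older than 2.2.12")
```

Tuple comparison orders versions field by field, so 2.2.5 correctly sorts below 2.2.12, which a plain string comparison would get wrong.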

If you need SMP, you should probably be careful with Linux, I agree. All our codes use domain decomposition and distribute different parts to different compute nodes. This works great on Linux. If you need better scaling and lower network latency, you can buy a Myrinet network, which is significantly faster but also significantly more expensive. Most people will not need this - standard fast ethernet is good enough.
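Here is a minimal sketch of the simplest form of domain decomposition, assuming a structured grid cut into slabs along one axis with one layer of ghost planes per neighbour. It only illustrates the idea; it is not how any particular code in this thread partitions its meshes, and the function names are hypothetical.

```python
from typing import List

def block_partition(n_planes: int, n_nodes: int) -> List[range]:
    """Split n_planes grid planes into n_nodes contiguous slabs whose sizes differ by at most one."""
    base, extra = divmod(n_planes, n_nodes)
    slabs, start = [], 0
    for rank in range(n_nodes):
        size = base + (1 if rank < extra else 0)
        slabs.append(range(start, start + size))
        start += size
    return slabs

def halo_planes(rank: int, n_nodes: int) -> int:
    """Ghost planes a slab receives each iteration: one from each neighbouring slab."""
    return (rank > 0) + (rank < n_nodes - 1)

slabs = block_partition(100, 8)
print([len(s) for s in slabs])                 # [13, 13, 13, 13, 12, 12, 12, 12]
print([halo_planes(r, 8) for r in range(8)])   # [1, 2, 2, 2, 2, 2, 2, 1]
```

Each node computes on its own slab and exchanges only the ghost planes with its neighbours, which is why fast ethernet suffices when the slabs are large relative to their boundaries.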

You must have some fundamental problem with your system. I have never had stability problems with Linux. Some time ago I ran a 3D unsteady stator-rotor-stator research case, including cavities below the gas channel. The simulation ran in parallel on our Linux cluster for more than 14 days without any problems.


Jens Bennetsen May 1, 2000 04:07

Re: hardware
 
Hi,

I think this discussion is not just about comparing Unix/Linux vs. NT, but also about the fact that the mass market for PCs is developing much faster than the workstation market, due to lower prices and MUCH larger market segments.

If one wants to use Beowulf systems (parallel clusters of PCs running Linux), there is a growing number of suppliers that sell these kinds of systems at a very good price. Furthermore, they do it WITH support. And more and more CFD software companies produce codes for these systems as well.

Another reason for choosing Linux and a cluster of PCs is that they are available almost anywhere. Most companies/institutes with more than 5-10 people have a network and a server of some kind. The software for building a Beowulf system is free or can be obtained at a low cost. You can buy special compiler/load-balancing software to get higher performance. So trying is cheap.

A nice example is using a parallel CFD code within an institute that works on many different things (only a few people are running/using CFD). The PCs run dynamic load-balancing software that starts using a PC whenever its screensaver starts and unloads it whenever the employee returns and begins to work on the PC again. In this way you get maximum use of the PCs within the department at a low price.
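The bookkeeping behind such screensaver-driven scheduling can be sketched as follows: a desktop becomes available when its screensaver starts and is withdrawn as soon as the user returns. The event names and the `harvested_hours` helper are hypothetical; real load-balancing packages also have to suspend or migrate the running CFD process, which this sketch does not attempt.

```python
from typing import List, Tuple

def harvested_hours(events: List[Tuple[float, str]], day_end: float) -> float:
    """Hours a desktop is available to the cluster, given time-ordered
    (hour, event) pairs where event is 'screensaver_on' (start CFD work)
    or 'user_back' (suspend CFD work)."""
    total, started = 0.0, None
    for t, event in events:
        if event == "screensaver_on" and started is None:
            started = t
        elif event == "user_back" and started is not None:
            total += t - started
            started = None
    if started is not None:  # machine still idle at the end of the period
        total += day_end - started
    return total

# Hypothetical day for one PC: a lunch break plus the night after 18:00,
# counted until 08:00 the next morning (hour 32).
day = [(12.0, "screensaver_on"), (13.0, "user_back"), (18.0, "screensaver_on")]
print(harvested_hours(day, day_end=32.0))  # 15.0
```

Summed over a department's desktops, this idle time is the capacity the load-balancing software can hand to parallel CFD jobs.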

Regards

Jens

Alton Reich, PE May 1, 2000 09:23

Re: hardware
 
That depends on how you define competitive.

If you take an 8 processor, purpose built parallel machine and run performance benchmarks against 8 PCs running in parallel, the purpose built machine will be better every time.

However, I can put together a fairly impressive 8-machine cluster for less than $20K. In a year, I can dismantle that cluster, put the machines on individual engineers' desks, and put together a new cluster for about the same price.

Compare that to the cost of your dedicated parallel machine. How long would you have to keep it before you could justify replacing it? What can you do with it when you decide to replace it?

The advantage of a cluster is not in the performance you get now, but in the performance you would get when the dedicated machine is about half way through its useful life. At that point you'd be using the same machine, but the cluster would be replaced with new hardware.
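This argument can be illustrated with a toy model. Assume, purely for illustration, that commodity PC performance doubles every 18 months, that the cluster is rebuilt every year at the same cost (so it only gains speed at each rebuild), and that the dedicated machine starts out a fixed factor faster and never changes. The sketch finds the first year in which the rebuilt cluster catches up; the doubling period and speed ratios are assumptions, and cost is not modelled.

```python
import math

def cluster_speed(year: float, doubling_years: float = 1.5) -> float:
    """Relative speed of a cluster rebuilt every year at constant cost; it only
    benefits from newer hardware at each annual rebuild."""
    return 2 ** (math.floor(year) / doubling_years)

def crossover_year(dedicated_advantage: float, doubling_years: float = 1.5) -> int:
    """First whole year in which the rebuilt cluster matches the fixed dedicated machine,
    given the dedicated machine's initial speed ratio over the first cluster."""
    year = 0
    while cluster_speed(year, doubling_years) < dedicated_advantage:
        year += 1
    return year

for adv in (1.5, 2.0, 4.0):
    print(adv, crossover_year(adv))
```

Under these assumptions, even a dedicated machine twice as fast as the initial cluster is matched after the second annual rebuild.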

Alton

John C. Chien May 1, 2000 12:01

Re: hardware
 
(1). The reliability of individual PC hardware should be all right. PC servers have been running day and night with no problems. The only mechanical parts are the hard drive and the cooling fan. So, as long as the machine is properly cooled in the environment where it is running, the CPU should be all right. But a UPS is required if the network itself is not stable for any reason. (2). Unix is a general name, so it is important to check out the particular version for a particular machine. Since there is no general standard for graphics, the choice of machine depends on the application software to be used. (3). For universities or non-profit research organizations, I guess it is all right to experiment with all the possible approaches. But in a large company, with a very large network, it is hard to optimize the usage of computers. In this case, one user per large machine, or one user per cluster of computers, would be the most efficient way to get the work done. If one cannot get the job done in time, and if the solution is not accurate, the savings in hardware cost mean nothing at all. (4). My personal experience is that one user per large machine is the most efficient way to get the job done. And if you replace the large machine with a cluster of machines, the same principle applies. (5). But if the time schedule is not critical, then it is a different story.

Tim Franke May 2, 2000 03:02

Re: hardware
 
Hi Jonas,

You said that you are using a cluster of 30 PCs in your department. Do you run simulations in parallel mode on all 30 nodes? We want to install a PC cluster in our office, and preliminary tests have indicated that with more than 16 nodes the performance gain is quite small. Do you use a normal 100 Mbit Ethernet connection with the TCP/IP protocol? How does your cluster (and software) scale if you run a job on the maximum number of nodes?

Thanks for your comments !

Tim

Jonas Larsson May 2, 2000 06:24

Re: hardware
 
We run standard 100 Mbit fast ethernet with dedicated standard switches, using TCP/IP, in our current cluster.

The scaling depends very much on the individual case. If you run a large case with say more than 3 million cells then scaling is usually good up to all nodes. If you run a medium case with say 0.5 million cells then scaling is good up to 10 or 15 nodes. If you run a small case with 0.1 million cells then scaling is only good on a few nodes.

These numbers are of course also very dependent on the geometry - if you have a case with a geometry suitable for domain decomposition then things will scale better.
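The trend described here (large cases scale to all nodes, small ones only to a few) follows from a simple surface-to-volume argument: compute work per node shrinks with the subdomain volume, while halo-exchange traffic shrinks only with its surface. The sketch below assumes cube-shaped subdomains, six neighbours, no overlap of communication and computation, and made-up cost constants, so it reproduces the qualitative trend, not the numbers in the post.

```python
def efficiency(n_cells: float, n_nodes: int,
               sec_per_cell: float = 5e-6,    # assumed compute cost per cell per iteration
               latency_s: float = 100e-6,     # assumed per-message latency
               bandwidth: float = 100e6 / 8,  # 100 Mbit/s link in bytes/s
               bytes_per_face_cell: int = 40, # assumed halo data per boundary cell
               neighbours: int = 6) -> float:
    """Parallel efficiency of one iteration for a cubic subdomain per node:
    compute scales with subdomain volume, halo exchange with its surface."""
    if n_nodes == 1:
        return 1.0
    local = n_cells / n_nodes
    t_comp = local * sec_per_cell
    face_cells = local ** (2.0 / 3.0)
    t_comm = neighbours * (latency_s + face_cells * bytes_per_face_cell / bandwidth)
    return t_comp / (t_comp + t_comm)

for cells in (3e6, 0.5e6, 0.1e6):
    print(f"{cells:>9.0f} cells:", [round(efficiency(cells, p), 2) for p in (2, 8, 16, 30)])
```

Geometry enters through the shape of the subdomains: long, thin or irregular partitions have more boundary per cell than the cubes assumed here, which lowers efficiency further.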

Joern Beilke May 2, 2000 10:48

Re: hardware
 
Most commercial CFD codes are sold on a per-processor basis for parallel runs. So one has to decide whether the code vendor or the hardware vendor will receive the additional money :)
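This point can be put in numbers. With per-processor licensing, the license bill grows with the processor count just like the hardware bill, so cheap hardware saves a smaller fraction of the total. All prices below are hypothetical.

```python
def total_cost(n_procs: int, hw_per_proc: float, license_per_proc: float) -> float:
    """Up-front cost of an n-processor setup when the CFD code is licensed per processor."""
    return n_procs * (hw_per_proc + license_per_proc)

def saving(n_procs: int, cheap_hw: float, dear_hw: float, license_usd: float) -> float:
    """Fractional saving from cheaper hardware once per-processor licenses are included."""
    return 1.0 - total_cost(n_procs, cheap_hw, license_usd) / total_cost(n_procs, dear_hw, license_usd)

# Hypothetical per-processor prices in USD: PC node, workstation CPU, code license.
PC, WORKSTATION, LICENSE = 2_500, 8_000, 10_000
print(f"hardware only: {saving(16, PC, WORKSTATION, 0):.0%} saved")
print(f"with licenses: {saving(16, PC, WORKSTATION, LICENSE):.0%} saved, "
      f"license bill {total_cost(16, 0, LICENSE):,.0f} USD")
```

The processor count cancels out of the fractional saving; what grows with it is the absolute license bill.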

John C. Chien May 2, 2000 11:55

Re: hardware
 
(1). For commercial CFD code users, it has been a problem when someone runs a parallel job on multiple processors: suddenly there is no license available for other engineers to run the code.

andy May 2, 2000 13:06

Re: hardware
 
Can I ask if you are running a largely explicit high speed code? If so, the amount and type of communication (few large chunks rather than lots of small chunks) is probably reasonable for a parallel computer with a poor communication/number crunching ratio. Any implicit smoothing? Any multigrid?

Have you run an incompressible FLUENT prediction on all processors?


Jonas Larsson May 2, 2000 14:23

Re: hardware
 
We run two codes, an in-house explicit code with multi-grid acceleration parallelized with PVM and a mainly implicit commercial code (Fluent) parallelized with MPI. Both codes use domain-decomposition and they have similar scaling, although it is difficult to compare - we don't use them for the same type of applications and it depends on details of the load-balancing we get in the specific case with our in-house code.

About Fluent - I haven't personally run any incompressible cases on all nodes. I have run several compressible cases on all nodes, though, using both the segregated solver and the coupled solver. For large cases (more than 4 million cells or so), scaling is reasonable, sometimes very good. If you run combustion/discrete-phase cases, scaling goes down; general (sliding) interfaces also slow down scaling. Pure aero scales well.

As a side note - our Linux cluster scales as well as or better than our HP V-class compute server, an SMP machine with an internal crossbar switch with almost 1 Gbit capacity, I think. I don't really understand why the HP doesn't scale much better.
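When comparing how two machines scale, it helps to use one consistent metric. A common choice, sketched below, is parallel efficiency relative to the smallest run, E(p) = (p0 * T(p0)) / (p * T(p)), which avoids needing a serial run that may not fit on one node. The timings are hypothetical, not measurements from either machine discussed here.

```python
from typing import Dict

def relative_efficiency(wall_times: Dict[int, float]) -> Dict[int, float]:
    """Parallel efficiency of each run relative to the run on the fewest
    processors: E(p) = (p0 * T(p0)) / (p * T(p))."""
    p0 = min(wall_times)
    ref = p0 * wall_times[p0]
    return {p: ref / (p * t) for p, t in sorted(wall_times.items())}

# Hypothetical wall-clock seconds for the same case on 2, 8 and 16 processors.
times = {2: 400.0, 8: 110.0, 16: 60.0}
print({p: round(e, 2) for p, e in relative_efficiency(times).items()})
```

Running the same case and iteration count on both machines and comparing these curves separates genuine scaling differences from differences in single-processor speed.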


All times are GMT -4.