CFD Online Discussion Forums


Charles Crosby February 22, 2001 01:58

How many CPU's in your cluster
 
I am in the planning stages of putting together a cluster computer (probably using a number of AMD Thunderbird CPUs) for CFD work at our company. To pre-empt John Chien, one of the main objectives is to get enough "GRUNT" available so that we can check that we really are getting mesh-independent results ;-) However, I need to convince management that what I am proposing is in line with practice at other aerospace companies. So the question to those out there using parallel computers for CFD is: How many CPUs of what type are you using?

Thanks in advance.

Charles

Mark Render February 22, 2001 03:38

Re: How many CPU's in your cluster
 
We have a PC cluster consisting of 8 nodes, which gives a speed-up of approximately 6.5. In the near future we'd like to add some PCs to that cluster, but I'm not quite sure if I will gain more performance then. From my point of view there will be a maximum number of nodes beyond which the 100 Mbit Ethernet connection becomes a bottleneck. Has anyone tested this limit? What possibilities do I have to increase my number of nodes and overcome the bottleneck?

Regards,

Mark
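As a rough way to put numbers on the question above, the measured speed-up of 6.5 on 8 nodes can be turned into an estimated serial fraction (the Karp-Flatt metric) and projected to larger node counts with Amdahl's law. This is only a sketch: it assumes the overhead fraction stays constant as nodes are added, which is optimistic on shared 100 Mbit Ethernet, so the projections should be read as upper bounds. The function names are illustrative.

```python
def serial_fraction(speedup: float, nodes: int) -> float:
    """Karp-Flatt estimate of the non-parallel fraction from a measured speed-up."""
    return (1.0 / speedup - 1.0 / nodes) / (1.0 - 1.0 / nodes)


def amdahl_speedup(fraction: float, nodes: int) -> float:
    """Speed-up predicted by Amdahl's law for a fixed serial fraction."""
    return 1.0 / (fraction + (1.0 - fraction) / nodes)


if __name__ == "__main__":
    f = serial_fraction(6.5, 8)  # figures quoted in the post above
    print(f"parallel efficiency on 8 nodes: {6.5 / 8:.1%}")
    print(f"estimated serial fraction: {f:.3f}")
    for n in (8, 16, 32):
        # Assumes the serial fraction does not grow with node count.
        print(f"{n:3d} nodes -> projected speed-up {amdahl_speedup(f, n):.1f}")
```

Under these assumptions, doubling to 16 nodes would give a speed-up of roughly 10.7 at best, not 13.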

Charles Crosby February 22, 2001 04:37

Re: How many CPU's in your cluster
 
Thanks for your input, Mark. I guess it's probably also important to specify which CFD code is being used on the cluster, as I imagine that there are different ways of parallelizing a CFD solver. In our case, we use CFD-Fastran, which is a multi-block structured solver, and I imagine that the speed-up factor will stop increasing once you have more processors than blocks ...
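The block limit mentioned above can be illustrated with a small load-balancing sketch. It assumes each block is assigned whole to one processor, that run time is set by the busiest processor, and that communication cost is ignored. The block sizes are hypothetical and this is not how CFD-Fastran itself distributes work.

```python
import heapq


def block_speedup(block_cells: list[int], processors: int) -> float:
    """Assign blocks to processors largest-first and return the ideal speed-up."""
    # Min-heap of (cells assigned so far, processor index).
    loads = [(0, p) for p in range(processors)]
    heapq.heapify(loads)
    for cells in sorted(block_cells, reverse=True):
        load, p = heapq.heappop(loads)  # least-loaded processor so far
        heapq.heappush(loads, (load + cells, p))
    busiest = max(load for load, _ in loads)
    return sum(block_cells) / busiest


if __name__ == "__main__":
    blocks = [100_000] * 6  # hypothetical mesh: six equal blocks
    for n in (3, 4, 6, 8):
        print(f"{n} processors -> ideal speed-up {block_speedup(blocks, n):.1f}")
```

With six equal blocks, 4 processors do no better than 3, and 8 processors do no better than 6, which is the plateau described in the post.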

andy February 22, 2001 08:38

Re: How many CPU's in your cluster
 
It is not really the way codes are parallelised that counts but the type of code being parallelised. Explicit codes (e.g. most compressible codes, most particle codes) parallelise well because there is relatively little interprocess communication (ditto all those Mandelbrot demos). Implicit codes (e.g. low speed codes) require much more interprocess communication and, just like in the days of vectorising for Crays, modifying the solution algorithms can bring big benefits for large numbers of processors. The NAS parallel CFD benchmarks are a good practical guide to this sort of thing, although the best approach is always to run your particular code first.
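A back-of-the-envelope sketch of the communication argument above: for a mesh split into cubic subdomains, the data exchanged with neighbours scales with the subdomain surface while the work scales with its volume. The code assumes, as a simplification, one halo exchange per explicit time step and one per linear-solver iteration in an implicit step. The iteration count of 50 is hypothetical, chosen only to show the scaling.

```python
def halo_fraction(total_cells: int, processors: int) -> float:
    """Ratio of face (halo) cells to interior cells for one cubic subdomain."""
    side = (total_cells / processors) ** (1.0 / 3.0)  # cells along one edge
    return 6.0 * side**2 / side**3  # simplifies to 6 / side


def exchanged_per_step(total_cells: int, processors: int, exchanges: int) -> float:
    """Halo cells sent by one subdomain per time step, given exchanges per step."""
    side = (total_cells / processors) ** (1.0 / 3.0)
    return exchanges * 6.0 * side**2


if __name__ == "__main__":
    cells = 1_000_000
    for p in (8, 64):
        print(f"{p:2d} processors: halo/interior = {halo_fraction(cells, p):.2f}")
    explicit = exchanged_per_step(cells, 8, exchanges=1)
    implicit = exchanged_per_step(cells, 8, exchanges=50)  # hypothetical count
    print(f"explicit: {explicit:.0f} halo cells per step, implicit: {implicit:.0f}")
```

The ratio doubles from about 0.12 to 0.24 when going from 8 to 64 subdomains, and the implicit step multiplies the traffic by the number of solver iterations, before any global reductions are counted.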

Concerning PC clusters using Ethernet: small numbers of processors (e.g. 8) seem to work reasonably well for most applications, particularly if you can avoid switches and use channel bonding. They are also about the right size for most real engineering jobs. Very few predictions that require 100 processors are used to solve immediate engineering problems in industry. Most computers of this size spend most of their time running smaller jobs anyway (but have the ability to run big jobs occasionally). The key questions here would seem to be:

(a) Do I have a real requirement for running 1 job on 100 processors?

(b) Are the codes for the big jobs in the very efficiently parallel class (very few commercial CFD packages are)?

If the answer is yes + yes you may get away with a PC cluster and switches but I would suggest this is rare for CFD applications.

If the answer is yes + no then you will have to spend more money on the communications infrastructure than on the processors and memory to produce a suitable machine. You may be better off purchasing a "proper" parallel machine.

If the answer is no + yes/no then small PC clusters are probably the way to go if you can solve the junk/hype PC problem (applies mainly to those of us coming from a long stay in the world of "proper" computers who do not understand the current rules).


George Bergantz February 22, 2001 11:27

Re: How many CPU's in your cluster
 
I started a discussion of some of these issues in the PHOENICS discussion forum on Sept. 16, 2000 that you may find useful, especially Steven Beale's comments near the end of that thread. He provides a plot of speed-up and discusses various issues related to communications hardware.

My experiences are much like his, about 7 times speed-up on 8 processors using LAM-MPI. Bigger jobs see better speed-up. It is true that it allows me to do grid checks that I couldn't do before.

But these things are rarely a turn-key affair. Getting all the communications issues sorted out can be vexing indeed.

John C. Chien February 22, 2001 14:15

Re: How many CPU's in your cluster
 
(1). A custom-made code can use 12 to 17 times less RAM than a commercial CFD code.

(2). For a mesh size of one million, a commercial code normally requires one gigabyte of RAM, while a custom-made code can reduce that all the way to 60 to 80 megabytes.

(3). So, a great deal of the RAM requirement comes from the use of commercial codes. In other words, 12 to 17 times more RAM is required to do the same task if you are using a commercial code instead of a custom-made code.

(4). So, my suggestion is: if you are interested in technology, then invest in the custom-made code for efficiency. If you are interested in the final solution and don't want to invest in the technology, then just get the solution from the vendor or the consulting company, without worrying about the software licenses and the cost of setting up the PC cluster hardware.

(5). If the purpose is "do it yourself", then I guess a PC cluster setup is a good fun project to do.
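The memory figures quoted above can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 gigabyte = 10^9 bytes) and 8-byte double-precision values. The post does not say whether "mesh size" counts cells or nodes, so the result is simply bytes per mesh point.

```python
MESH_POINTS = 1_000_000
BYTES_PER_DOUBLE = 8  # assumption: double-precision storage


def bytes_per_point(total_bytes: float, points: int = MESH_POINTS) -> float:
    """Average memory footprint per mesh point."""
    return total_bytes / points


if __name__ == "__main__":
    commercial = bytes_per_point(1e9)    # one gigabyte
    custom_low = bytes_per_point(60e6)   # 60 megabytes
    custom_high = bytes_per_point(80e6)  # 80 megabytes
    print(f"commercial: {commercial:.0f} bytes per point "
          f"({commercial / BYTES_PER_DOUBLE:.0f} doubles)")
    print(f"custom: {custom_low:.0f} to {custom_high:.0f} bytes per point")
    print(f"ratio: {commercial / custom_high:.1f} to {commercial / custom_low:.1f}")
```

The ratio works out to roughly 12.5 to 16.7, consistent with the "12 to 17 times" in the post. At 60 to 80 bytes per point there is room for only about 8 to 10 double-precision values.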

Charles Crosby February 22, 2001 17:13

Why we do CFD work despite John's "Don't Bother"
 
John,

I have no problem with the first part of your reply, where you recommend the use of custom-written code. I have written CFD codes, I have also used somebody else's custom-written research code (FUN2D, for those who may be familiar with it), and yes, these codes are very efficient. However, they are also very limited. The question also arises how one obtains a custom-written code. If I were working for the US government, or similarly had access to the US government sponsored codes, the problem would tend to go away. Writing a custom code yourself in a small CFD department is not really a very good option. Man-hours are expensive, and need to be used productively.

Also, farming the work out to consultants is not a particularly productive approach in a development environment. Much of why we do CFD is to gain understanding of complex flows, and to identify promising ideas to pursue further in the wind tunnel. It is very hard to gain this kind of understanding when you farm out all the CFD work, or spend most of your time writing custom codes. Generating accurate, mesh-independent CFD polars at the rate of one data point per several hours, when the wind tunnel will generate SEVERAL polars per hour, is also somewhat meaningless.

The purpose is to gain maximum insight, and the fact is that commercial codes can be very effectively used for this purpose. An affordable but more powerful computer is also very effective for allowing one to investigate more promising design solutions.


George Bergantz February 22, 2001 17:28

Re: How many CPU's in your cluster
 
John:

I appreciate your points, and often one does write a custom code if one works on a narrow range of problems, such as creeping flows or porous media or only hi-Re, etc. But even with these caveats the requirements for a comprehensive simulation can be extreme. I say this with full awareness of the limitations and shortcomings of CFD in design and decision.

I might add that almost all innovations in CFD, including new ways to represent complex constitutive relations, improvement in algorithms, etc., originate in under-funded academic and government institutions. One does not have the resources to hire the professional expertise outside, and one quickly learns that this professional help is often not as useful as their hourly rate might suggest; in fact almost never. Yet it is not always appropriate to start building a custom code from scratch for every nuance of application, especially when alternating between classroom-based approaches and a full research-level need.

The issue is not just RAM, but also processor speed and the insightful distribution of load across processors that leads to speed-up. Size of code has little to do with this.

PC clusters are not fun to do. They are a pain in the keister. This is not some hobby that we use as a substitute for a video game. It is a contrived and imperfect attempt to respond to many of the issues, grid independence for one, that you often preach to this forum (and which I endorse). It was with a heavy heart that I went the cluster route, but I am glad that I did. It may not be right for most people though. But writing their own code from scratch will not likely provide a reasonable, timely solution to the issues of resolution. But in some cases it might.


John C. Chien February 22, 2001 19:35

here are some links you can check out
 
(1). http://beowulf.org/

(2). http://beowulf-underground.org/

(3). http://beowulf.gsfc.nasa.gov/

(4). http://www.extreme-machines.com/x-links.html

(5). http://www.extremelinux.org/

(6). http://www.dnaco.net/~kragen/beowulf-faq.txt

(7). http://www.epm.ornl.gov/pvm

(8). http://www-unix.mcs.anl.gov/mpi/mpich/index.html

(9). http://www.mpi.nd.edu/lam/

(10). http://www.cacr.caltech.edu/research...orial/beosoft/

(11). "How to Build a Beowulf", MIT Press

(12). http://www.scyld.com/clustering_overview.html

(13). http://www.mathcs.emory.edu/ccc97/tutorials.html

(14). From these places, you can find more information about parallel computing and cluster computing. It was reported that in the summer of 1994, Thomas Sterling and Don Becker built a cluster computer consisting of 16 DX4 processors connected by channel-bonded Ethernet. They called their machine Beowulf. Most of the information is out there. I don't think we will be able to add anything here, except to say that it is not a plug-in-and-run situation.

(15). A forum by definition is a place where you don't have to listen to any message posted. So, there is no warranty issue. Even for commercial software you purchased, there is no warranty at all. It's free.

John C. Chien February 22, 2001 20:04

Re: How many CPU's in your cluster
 
(1). I am sure that you understand my position: I don't treat the questions and answers here as a matter of personal interest. So, any comment is a good comment.

(2). I am writing a program in VC++, and I am looking at the issue of illegal copying. It seems to me that making my code very large will give it an advantage over a small and compact program. This is because it will take longer to copy, download, or even to run a large case. (This is not a CFD program I am writing right now.)

(3). So, from the code developer's point of view (same as the vendor's viewpoint), the program (the code) should be bigger. (That is, it should require more memory to run, more steps to converge, etc.) And ideally, it should be on several CDs, say at least 6. In this way, it will promote the sales of RAM and HD or CD-ROM/R-W.

(4). By the way, I am only using the Internet right now, but my system is already using some 80 megabytes of RAM. I don't think it is very efficient. (I am using Windows 98.)

(5). So, I think we are in a new era, and the bigger the better.

John C. Chien February 22, 2001 20:12

error correction,(1). http://www.beowulf.org
 
(1). Should be: http://www.beowulf.org/

Greg Perkins February 22, 2001 22:25

Re: Why we do CFD work despite John's "Don't Bothe
 
I think Charles makes a fundamental point - well done!

John C. Chien February 22, 2001 23:50

Re: Why we do CFD work despite John's "Don't Bothe
 
(1). In that case, we will have to make sure that this forum is always available in order to support his software problems and hardware questions.

(2). The more serious problem will be Internet reliability in the future. That will add uncertainty to his system.

(3). My free Internet time is already being cut down due to the .com business in general. When free Internet time is no longer available, I guess you are not going to see my messages.

Bart Prast February 23, 2001 04:42

Re: How many CPU's in your cluster
 
There might be another problem with commercial codes. We use CFX 5 (coupled solver). This should work great on clusters (as it does not require much communication between partitions). However, if you apply for a multi-processor license, then the cost you save on hardware (PCs instead of servers like IBM's or Sun's) comes back doubled in your license fees. The latter go up per processor. Therefore I could imagine that a cheap Linux cluster of, say, 10 PCs requires a license fee of over 200,000 dollars each year. Normally the hardware is the cheapest factor when using commercial codes.

John C. Chien February 23, 2001 14:19

Re: How many CPU's in your cluster
 
(1). The number is about right for Fluent; it was based on the number of processors used. Each counts as a license. So, 8 processors would cost 160,000 US dollars a year in license fees alone.

(2). In addition to that, it is hard for other engineers to get on the computer when multiple processors are being used. So, the management of the computer becomes a routine headache for both types of users. When the multi-processor user is not running his job, the computer is idling. When he is using the computer, others are having a hard time getting on the computer. This has been a real problem.

(3). Ideally, you would like to have a dedicated multi-processor system to solve your problem, but in reality, no company can afford that. (People would also complain about using a color printer to print a document with just a few lines of color text.)

(4). As a company moves to a Load Sharing Facility to optimize the usage of computers, finding a large number of processors to run a CFD job at any time is not very practical.
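The license figures in this part of the thread are consistent with each other, as a quick calculation shows. The sketch assumes the fee scales linearly with processor count, which is how the post above describes it ("each counts as a license"). It does not reflect any vendor's actual price list.

```python
def annual_license_cost(processors: int, fee_per_processor: float) -> float:
    """Yearly license cost when every processor counts as one license."""
    return processors * fee_per_processor


# Per-processor fee implied by 160,000 US dollars a year for 8 processors.
FEE_PER_PROCESSOR = 160_000 / 8

if __name__ == "__main__":
    print(f"implied fee per processor: {FEE_PER_PROCESSOR:,.0f} USD/year")
    for n in (8, 10, 16):
        cost = annual_license_cost(n, FEE_PER_PROCESSOR)
        print(f"{n:2d} processors -> {cost:,.0f} USD/year")
```

At 20,000 dollars per processor per year, a 10-PC cluster comes to 200,000 dollars a year, matching the estimate given earlier for a cheap Linux cluster.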

Greg Perkins February 23, 2001 21:05

Re: How many CPU's in your cluster
 
The costs per processor do seem rather large, don't they? These days most interesting CFD problems require some sort of multi-processor capability.

While commercial codes seem to be pretty good, when you factor in these costs per processor they go up pretty quickly, and they are on an annual basis.

I suppose it's just another commercial decision.

Fortunately I'm attached to a university, and so we have access to 40 Fluent licenses and some high performance computers: a 64-processor SGI, a 68-processor IBM, etc. For small jobs though, you can sometimes wait longer in the batch queues than it takes to solve the problem on a smaller workstation! Go figure!

You can never get enough, can you!

Greg

clifford bradford March 25, 2001 00:11

Re: How many CPU's in your cluster
 
When I was at Penn State University we had one in our department with 50 CPUs and a couple of others with around 32, mostly running research codes. Very often the codes were taken off SMP machines like Origins, and what was found was that the codes were slow because the original writers expected SMP-speed communication. I don't know whether the commercial code folks are writing good cluster software, as cluster code has to be different from SMP (Origins, SP2s etc.) code. I'd avoid any pressure-based code like the plague on a cluster, just because of all the matrix work. I know other people have used huge (>100 CPU) clusters as well.

George Bergantz March 25, 2001 15:08

news not too bad
 
My experience with the usual FVM approach on an 8-node 800 MHz Beowulf cluster (message passing, e.g. LAM MPI, not SMP) has been better than I had hoped. This has been discussed in previous threads both here and on vendor discussion groups. It seems that things scale very well up to 8 nodes and decently up to 16 nodes, and fall off after that point, where expensive network hardware such as Myrinet starts to become necessary.

It all comes down to a balance of network latency and processor speed. I have been surprised how well these balance to give almost linear scaling up to 8 nodes.
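The balance described above can be sketched with a toy model in which compute time per step shrinks as 1/p while communication overhead grows linearly with the number of nodes, as it might on a shared network. The overhead coefficient `alpha` is purely hypothetical and not fitted to any measurement in this thread. It is chosen only to show how near-linear scaling at 8 nodes can coexist with a fall-off beyond 16.

```python
def model_speedup(nodes: int, alpha: float) -> float:
    """Speed-up when each extra node adds overhead alpha (fraction of serial time)."""
    return 1.0 / (1.0 / nodes + alpha * (nodes - 1))


def best_node_count(alpha: float, max_nodes: int) -> int:
    """Node count with the highest modelled speed-up, searched up to max_nodes."""
    return max(range(1, max_nodes + 1), key=lambda n: model_speedup(n, alpha))


if __name__ == "__main__":
    alpha = 0.002  # hypothetical overhead per extra node
    for n in (8, 16, 32, 64):
        print(f"{n:2d} nodes -> modelled speed-up {model_speedup(n, alpha):.1f}")
    print(f"best node count in this model: {best_node_count(alpha, 64)}")
```

A faster interconnect corresponds to a smaller `alpha`, which pushes the peak to a higher node count. That is the role the post assigns to hardware such as Myrinet.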

It seems that SMP is indeed limited and not obviously the best way to go. One now sees mixed SMP and message passing, that is, multiprocessor boards networked together. I don't personally have any experience with that, and would like to hear from others whether it gives roughly the same run times as a pure cluster.

What we really need are smart compilers so that embedding message passing commands becomes transparent to the end user so that codes can be truly portable. The PGI compilers are okay, but not great.

Joern Beilke March 26, 2001 10:07

Re: How many CPU's in your cluster
 
The SP2 is not an SMP machine at all. It is "shared nothing". Codes like Fluent or StarCD do the parallelisation via domain decomposition and MPI. So it does not matter if they run on a cluster, SP2 or Origin ... but only in terms of computational speed :)

It is much more comfortable to use an Origin if you also think about all the file handling ...



All times are GMT -4.