|
September 24, 2022, 04:07 |
A newbie has plans and questions
|
#1 |
New Member
Maik
Join Date: Sep 2022
Posts: 12
Rep Power: 4 |
Hello people!
My name is Maik and I am a typical pen-and-paper mathematician; my field is the theory of partial differential equations and analysis in general. I am also interested in technology and simulation, although I don't usually work in that area. Recently a friend acquired an old HP DL1000 G6 server, and that caught my attention. Since he had no tasks for the machine he gave it to me, and I suddenly discovered that Linux is actually cool and servers are fascinating things. A couple of years ago I had been promoting OpenFOAM as a good simulation package for fluid dynamics, and now I thought "maybe you can use the server for that software...". Anyway, I bought another two servers and a nice switch, installed Ubuntu 20.04 on all machines, set up NFS, etc., and installed OpenFOAM on my laptop (yeah, I know...), which currently acts as the head node. My configuration is now:
- HP DL385 G6 (2x AMD Opteron 2435; 48 GB DDR2)
- HP DL580 G5 (4x Xeon 7340; 64 GB DDR2)
- HP DL1000 G6 (8x Xeon 5520; 96 GB DDR3)
- HP 2824 Gb switch
- Lenovo T440s (i7-4600; 12 GB DDR3)
- total number of CPUs: 14
- total number of physical cores: 60
- total number of logical cores: 92
I guess HPC is the correct term for what I want to build. I know it is not much, but it was also very cheap and it is a lot of fun. Still, I would like to hear your criticism and your opinions on this setup. Furthermore, since I have no practical experience with OpenFOAM, I would like to know what I can do with this hardware (what is possible and what isn't) and how this whole thing essentially "works".
Best regards from Germany
Maik |
|
September 24, 2022, 06:32 |
|
#2 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Do you want the brutally honest opinion, or should I try to sugar-coat it as much as possible?
If you enjoy tinkering with ancient server hardware, rediscovering its quirks and how to work around them, then by all means go have fun. Just a word of warning from a fellow German: watch your electricity bill. |
|
September 24, 2022, 07:34 |
|
#3 |
New Member
Maik
Join Date: Sep 2022
Posts: 12
Rep Power: 4 |
Hi there and thanks for your reply!
Yes, just hit me with your brutal honesty, I don't mind. I giggled when you said "ancient", and yes, I know that these dinosaurs eat power plants for breakfast. That's why they aren't running day and night: mostly on weekends, and only for as long as I need them (a few hours in total, I guess). Oh, and about the applications for the whole thing: I would like to do some geophysical simulations if that is possible, because in my spare time I work in a small group on climate change topics. Hope that helps, and now back to your brutal honesty |
|
September 24, 2022, 12:18 |
|
#4 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
All right, you have been warned. Again, if poking around in old servers is your thing, don't let me tell you otherwise
I would try to get rid of these servers as soon as possible, because whoever gave/sold them to you did just that: got rid of them without having to pay for recycling.
There is merit in used server hardware for scientific computing. It usually offers a better price/performance ratio than new equipment. But there is a cutoff point where old hardware is simply too slow and inefficient to offer any value, and failure rates for components like PSUs and motherboards go through the roof. In my opinion, that cutoff is currently at Intel's Haswell generation (Xeon E5-26xx v3), maybe one generation earlier if you have cheap electricity. This stuff is already fairly cheap; any further savings on the initial cost do not justify the lower performance and efficiency of even older hardware.
Your cluster right now is severely heterogeneous. Not only do the cores have different performance, they also have different architectures. This will cause headaches for code compilation, and you need to account for the different speeds with weighted load balancing, otherwise the slowest cores will hold back the rest. Low performance per core will be a limiting factor, e.g. for mesh generation in OpenFOAM, or anything else that isn't perfectly parallelized. And the Opteron CPUs are a whole can of worms by themselves. If you really want to keep the systems, at least ditch the Opterons.
Running these servers in Germany will quickly rack up electricity costs that could have paid for a more modern system, one with similar peak performance that is much easier to leverage. |
|
September 24, 2022, 14:05 |
|
#5 |
New Member
Maik
Join Date: Sep 2022
Posts: 12
Rep Power: 4 |
Hi Alex!
See, that wasn't so hard, was it? Your advice has been duly noted, and I will keep it in mind when it comes to buying hardware again. If I do, I would start by replacing the Opteron server. Until then, I have to live with this system. But I see other people on YouTube doing CFD with equal or worse hardware, and they are... let's say, as satisfied with the results as one can be with ancient hardware. You mentioned load balancing: what software do I use for that? I know I could just "duckduckgo" it, but maybe you have an educated hint for me? Thanks again, and I appreciate your criticism! Maik |
|
September 24, 2022, 14:29 |
|
#6 | |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
For CFD solvers like OpenFOAM, load balancing can be done during domain decomposition.
In case you aren't familiar with that: let's say you have a mesh that consists of one million cells, and you want to run your simulation on 20 cores. The default setting for domain decomposition is to split the mesh into 20 chunks with approximately equal numbers of cells, 50000 in this case. Each core then computes the solution on its own chunk of 50000 cells, and MPI handles communication for data that has to be transferred between adjacent cells on different cores. This works well when all cores compute at similar speed.
Now what happens when one of those 20 cores is much slower than the others, say taking twice as long for the same workload? The faster cores have to sit idle waiting for the slowest core, because the computation cannot proceed with the next iteration until all cores arrive at the synchronization barrier. That effectively cuts performance in half, despite only one of the 20 cores being slower.
Load balancing through domain decomposition assigns smaller sub-domains to the slower cores. In our example, the slower core would get a sub-domain with ~25.6k cells, and the faster cores would get ~51.3k cells each. For this to happen properly, you first need to characterize your compute nodes to get their actual relative performance. These numbers can then be fed into the domain decomposition.
How this is actually done for OpenFOAM... that would be a great question for the folks over in the OpenFOAM section of our forum. I don't use OpenFOAM, and I avoid unequal load balancing whenever possible. Quote:
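For OpenFOAM specifically, my best guess (I don't run it in production, so treat the exact keyword as an assumption and verify it against your version's documentation or with the OpenFOAM folks here) is that the weighting goes into the decomposeParDict of the case, roughly like this for the 20-core example above:

numberOfSubdomains 20;
method            scotch;
scotchCoeffs
{
    // Relative weights, one entry per sub-domain. A 2:1 ratio means the
    // last (slow) core should receive roughly half as many cells as the others.
    processorWeights
    (
        2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
    );
}

The decomposer then aims for cell counts proportional to the weights, which reproduces the ~51.3k / ~25.6k split from the example (1,000,000 cells / 39 weight units ≈ 25.6k cells per weight unit).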
|
||
September 24, 2022, 17:51 |
|
#7 |
New Member
Maik
Join Date: Sep 2022
Posts: 12
Rep Power: 4 |
Hi Alex,
thanks for the explanations, that helps - believe me! Oh, you don't use OpenFOAM? I thought you did. So what are you using for your tasks?
About the dampener on my enthusiasm: really, don't worry, this is purely a fun project. I understand you completely, and I take the same attitude in my more serious undertakings (i.e. my activities in a small research group on climate change, etc.). That doesn't mean I regard the CFD thing as unimportant. But I am a "Quereinsteiger" (career changer) and completely new to the subject; I neither need any of this for my profession, nor do I have big plans, money-wise, for it as a hobby. None of that means I don't care or that I don't want to see results. I am in direct contact with someone who runs his own private server for NAS and other things, and the money he has put into it dwarfs my investment by far. That's okay, and I understand where he's coming from. If possibilities open up for me I would upgrade my hardware without hesitation, but I feel I have already been blessed with nice starter equipment for very little money, and that's fine by me. I think my wife is right when she says "if you make it work on this level, then that would be a start" (she is the IT guy in our home )
I wrote a lot, but I am still willing to learn from people like you, and there are 92 logical cores waiting to work for me How is this load balancing done in general? What software is there? Which CFD programs are you using, and why? Best regards and have a nice evening! Maik |
|
September 24, 2022, 18:37 |
|
#8 | |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
Well I occasionally run OpenFOAM, but only for the benchmarking thread we have going here: OpenFOAM benchmarks on various hardware
I don't produce CFD results with it. In my day job I mostly use CCM+ for CFD these days; it's the standard for my department. On the side, I am involved in the development and application of a CFD solver based on the Lattice Boltzmann Method, primarily on the pre- and postprocessing side of things, and in anything involving parallelization of the solver. Quote:
One of the optimization goals for graph partitioning methods (which is what domain decomposition boils down to) can be imbalance, i.e. how evenly the graph vertices (= cells) are distributed among the partitions. This is what produces balanced partitions, and it is also how you can force an imbalance to match different execution speeds on different cores. But don't let that overwhelm you for now. Getting it to run on all machines simultaneously is the more important step; performance optimizations come later. |
||
September 25, 2022, 06:07 |
|
#9 |
New Member
Maik
Join Date: Sep 2022
Posts: 12
Rep Power: 4 |
Thank you again, Alex, very helpful!
Okay, so I will first focus on getting all the hardware working in parallel at all. Do you have any further tips for a beginner like me? What should I not do? Best regards, Maik |
|
September 25, 2022, 20:00 |
|
#10 | ||
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Quote:
For CFD, your DL1000 G6 is the most interesting machine. Its total memory bandwidth is roughly 200 GB/s, which is nice and high. This is an important quantity for CFD, because the solvers stream long vectors through memory. This machine alone should outperform the high-end laptop of your friend; I think you might finish well below 150 seconds on all 32 cores.
It looks like this is a chassis with four dual-E5520 Xeon nodes, so in effect it is a four-machine cluster. If I were you, I would install Linux and OpenFOAM on that machine and run the OpenFOAM benchmark on that machine alone. This is a fun and manageable task. Ubuntu has precompiled versions of OpenFOAM that you can install with sudo apt install openfoam... You can also go to openfoam.com or openfoam.org and follow the instructions there. The benchmark files are here: Quote:
For good performance, each CPU should have all three memory channels active (this means an RDIMM in each channel, preferably 1066 MHz). The manual for this machine should describe the optimal memory configurations. Your current configuration could already be optimal, with three 4 GB or six 2 GB RDIMMs installed per processor. Check that the RDIMMs are installed in a regular pattern, the same pattern for each CPU. Just as for the cluster as a whole, the performance of CPU and memory should be equal across the machine; otherwise the fast CPUs must wait for the slow one. In Linux, you can use dmidecode to read which RDIMMs you have installed and their properties. Make sure the speeds are all 1066 MHz.
For the benchmark, you should run 1 core, then 2, 4, 8, etc., out to 32. There will be a result for meshing and another for the flow calculation. You should publish your results here: OpenFOAM benchmarks on various hardware, to impress us all. |
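If you want to check all of that without opening the cases, something along these lines works on most Linux systems (the exact wording of the output differs between BIOS vendors):

sudo dmidecode --type memory | grep -E "Locator|Size|Speed"   # prints locator, size and configured speed lines for every DIMM slot
lscpu | grep -E "Model name|Socket|Core|Thread"               # sockets, cores and threads visible on this node

Run it on each of the four nodes; the reported DIMM sizes and speeds should be identical everywhere.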
|||
September 26, 2022, 03:27 |
|
#11 |
New Member
Maik
Join Date: Sep 2022
Posts: 12
Rep Power: 4 |
Hi Will,
and thanks for your reply! Okay, so there is hope for my server situation after all? That would be awesome. Now I have some "dumb" questions:
1. If I want to run OpenFOAM (or other CFD codes, for that matter) on a cluster, do I have to install it on every node, or just on the head node?
2. You are correct: the DL1000 holds four nodes (DL170 G6) with two CPUs each. They are connected via the switch. That leads back to question 1.
3. You did not say anything about the DL580 G5... What about its performance? Is it any good? I mean, this was the most expensive server in my setup back when it was new, and it is from the same year as the DL1000 (the production date of my machine is June 2009). So what about it?
4. Alex said the DL385 is "its own can of worms"... What about its performance?
Thanks in advance and have a nice day! Maik |
|
September 26, 2022, 05:45 |
|
#12 |
Super Moderator
Alex
Join Date: Jun 2012
Location: Germany
Posts: 3,427
Rep Power: 49 |
I feel the need to clarify my previous statements.
This was never about the theoretical peak performance of these systems. That is there, to some extent. But it is hidden behind two hurdles:
1) We already talked about efficiency. The amount of electrical energy these systems need to produce a CFD result is huge compared to more modern hardware.
2) The time investment needed for a beginner to get these systems running CFD jobs in parallel.
As a beginner who wants to produce CFD results with OpenFOAM, you have two immediate tasks at hand: learning the basics of CFD, and learning how to run OpenFOAM. What this cluster does is put another hurdle in front of you: how to set up a cluster for OpenFOAM. Which is far from trivial. To put it very bluntly: how much do you value your time? Sure, the computers were cheap, but they are also a time sink that slows down the process of learning what you actually set out to learn. |
|
September 26, 2022, 06:36 |
|
#13 |
New Member
Maik
Join Date: Sep 2022
Posts: 12
Rep Power: 4 |
Hi Alex,
you don't need to worry about the electricity consumption, that is entirely my problem. All I want to know is what performance I can get and how to do the things you mentioned: how to deal with OpenFOAM on a cluster and how to get the system working. Yes, I am a beginner, but I am also used to dealing with hurdles and therefore very patient when it comes to solving problems. Time is not the issue here, since I don't have a schedule to keep. So, in addition to the questions in my previous post, I would like to know how to set up a cluster for OpenFOAM. Cheers, Maik
PS: For me, "the way is the goal"! |
|
September 26, 2022, 16:58 |
|
#14 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Quote:
Answers:
Q1: You install the same operating system, file structure and OpenFOAM on each cluster node. The exception is the filesystem for running the case, which must be shared so that all nodes have access to the same input and result files.
Q2: Your 1 Gb/s switch should be fine for a small cluster.
Q3: You have three dissimilar machines. I picked the DL1000 because it has the most CPUs and DDR3 memory instead of DDR2. For these reasons, the DL1000 should outperform the others by a large margin. I did not compare a single node of the DL1000 against one of the others; however, DDR2 memory indicates an older CPU generation to me.
Q4: The DL385 G6 may have performance comparable to a single node of your DL1000. Probably a bit less for CFD, actually. |
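In case the shared filesystem is not set up yet, the usual NFS recipe looks roughly like this (the path, username and network range are only placeholders, adjust them to your setup):

On the head node:
sudo apt install nfs-kernel-server
echo "/home/user/OpenFOAM 192.168.1.0/24(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -ra      # re-export everything listed in /etc/exports

On every compute node:
sudo apt install nfs-common
sudo mkdir -p /home/user/OpenFOAM
sudo mount headnode:/home/user/OpenFOAM /home/user/OpenFOAM

Put the mount into /etc/fstab on the nodes if you want it to survive reboots.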
||
September 26, 2022, 23:47 |
|
#15 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Some results for my r710:
Quote:
With your four nodes, you can probably get a solution in 70-80 seconds, which is decent. |
||
September 27, 2022, 04:29 |
|
#16 | |
New Member
Maik
Join Date: Sep 2022
Posts: 12
Rep Power: 4 |
Quote:
Hi Will, thanks, that helps a lot!
Ad Q1: I installed OpenFOAM 10 on all six nodes (so I can still run only the four nodes of the DL1000, but I at least have the option of running the program on all six if I want to). I would also like to compare the results of the four nodes with the results of all six nodes working together. About the file system: I had already created an NFS share on the laptop (the head node) before you told me to install OpenFOAM on all machines. Do I have to make adjustments to this NFS setup now? If so, what do I have to do?
Ad Q2: Okay, I am glad!
Ad Q3: Yes, it is an older CPU generation, yet there are four of them on one motherboard. Does that change anything?
Ad Q4: Hmm, I thought the CPU-to-RAM ratio on this machine is better than on the others, which might make a difference? Still, it is DDR2 memory, because those are AMD processors... Any opinion?
And I'll add a new question: how do I run this benchmark test of yours? What do I have to do? Thanks in advance for your help, I really appreciate it! Regards from Germany, Maik |
||
September 27, 2022, 13:45 |
|
#17 |
Senior Member
Join Date: Oct 2011
Posts: 242
Rep Power: 17 |
Hello,
If you have a working NFS folder at hand, that is a good starting point. Then you will need to set up passwordless ssh, if that is not already done, so that MPI can communicate between the nodes and the head. You also need to install MPI on all nodes; this can be either MPICH or OpenMPI, depending on which one OpenFOAM recommends. Then, without going deeper into a proper scheduler, MPI can be run with a simple "machine file" in which you describe the distribution across nodes. Let's say you want to run a simulation on 30 cores, with node 1 providing 10 and node 2 providing 20 cores. You create a text file like this simple one:
node1: 10
node2: 20
and run your application this way:
mpirun -n 30 -machinefile machinefilename
There are some resources online; search for "mpi beowulf cluster linux". This link looks pretty decent, for example: https://www-users.cs.york.ac.uk/mjf5...f_cluster.html |
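For the passwordless ssh part, the standard recipe is something like this, run from the head node (the username and node names are only examples, and the same user should exist on every node):

ssh-keygen -t ed25519        # accept the defaults, empty passphrase
ssh-copy-id user@node1       # repeat for node2, node3, ...
ssh user@node1 hostname      # must return without asking for a password

Before trying openfoam, a quick sanity check that mpi really reaches all nodes is to run a trivial command through it:

mpirun -n 30 -machinefile machinefilename hostname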
|
September 27, 2022, 17:43 |
|
#18 |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
For the benchmark files see my post above. Search for:
The benchmark files are here: Quote:
and click on the link to go to that prior post. That post has an attachment that contains the files. |
|
September 27, 2022, 18:05 |
|
#19 | |
Senior Member
Will Kernkamp
Join Date: Jun 2014
Posts: 372
Rep Power: 14 |
Quote:
Ad Q1: You need only one shared file system. If that is on the laptop, that is fine.
Ad Q3: I could not find much info on that CPU, so it is hard to predict its performance. Let me know how the DL580 does. My recommendation is to get the DL1000 operational first.
Ad Q4: The processor has just two memory channels, versus three each for the 5500/5600-series Xeons. The can of worms with the Opterons is that they have a shared floating point unit per pair of cores. The CPU-to-RAM ratio is not a meaningful performance measure.
All DIMM slots filled is typically best for maximum memory performance. However, on the DL170h units your best configuration would be two DIMMs per channel, leaving the third slot empty for the two channels that have that extra slot. The reason is that you want all channels to have identical memory. With two DIMMs per channel and a 5600-series CPU, you should be able to run the memory at 1333 MHz. |
||
September 28, 2022, 12:54 |
|
#20 |
New Member
Maik
Join Date: Sep 2022
Posts: 12
Rep Power: 4 |
Hi guys!
@naffrancois: Thanks for the info, man! I'll look into that! BTW: OpenMPI already seems to come with Ubuntu 20.04, because when I wanted to install it, Ubuntu told me it was already installed.
@Will: I think your quotes didn't work, I can't see them in your posting.
Cheers, Maik |
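Before wiring everything together I will also double-check that every node reports the same OpenMPI version; a quick loop over the nodes should be enough (node names are placeholders for mine):

for h in node1 node2 node3; do ssh $h mpirun --version | head -n 1; done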
|
Tags |
hpc, openfoam, server, setup |
|
|