Would Apache Hadoop be useful for CFD?

#1 | March 14, 2014, 05:11
Peetak Mitra (peter.pan), New Member
Hi,

I stumbled across an open-source project called Apache Hadoop and wanted to know whether any member here has experience with it. Apparently it is a 'software for reliable, scalable, distributed computing'.

Is it worth giving it a try?

Thanks,

#2 | March 16, 2014, 09:04
Bruno Santos (wyldckat), Retired Super Moderator
Greetings Peetak,

AFAIK, Hadoop was developed for a very different kind of distributed computing than CFD. It was designed for storing and processing the very large data sets behind web-based platforms, such as social websites, financial stock dealing platforms and other systems with highly complex, inter-related metadata.

For CFD, it would likely only be useful as a highly distributed job scheduling system (http://en.wikipedia.org/wiki/Job_scheduler): one that interconnects millions of clusters around the world, solves an independent problem on each cluster, catalogues each simulation performed and then gathers the inputs, outputs and post-processing into one gigantic library of results, including the relations between those simulations. Sort of like a very big fingerprint database.
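To make the "independent problems, then one catalogue" idea a bit more concrete, here is a minimal local sketch in Python. It is not Hadoop code: `run_case`, the case names and the convergence rule are all made up for illustration, and a thread pool stands in for the clusters. It only shows the shape of the workflow, namely a map step that runs each case independently and a gathering step that files the results into a catalogue.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_case(case):
    """Hypothetical stand-in for launching one independent CFD run.

    A real version would submit a job to a cluster and wait for it; here an
    invented convergence rule is used so that the sketch runs anywhere.
    """
    name, reynolds, mesh_cells = case
    converged = mesh_cells >= reynolds * 10  # invented rule, illustration only
    return {
        "case": name,
        "Re": reynolds,
        "cells": mesh_cells,
        "status": "converged" if converged else "failed",
    }


def build_catalogue(cases, workers=4):
    """Run every case independently, then group the case names by status."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input cases.
        results = list(pool.map(run_case, cases))
    catalogue = defaultdict(list)
    for record in results:
        catalogue[record["status"]].append(record["case"])
    return dict(catalogue)


# Made-up cases: (name, Reynolds number, number of mesh cells).
cases = [
    ("pipe_coarse", 1000, 5000),
    ("pipe_fine", 1000, 20000),
    ("cavity", 400, 4000),
]
catalogue = build_catalogue(cases)
print(catalogue)
```

The catalogue is what would let such a platform tell a user that a similar case has already failed before. In a real setup each `run_case` call would be a full simulation on a separate machine or cluster.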

If you only have access to one or two clusters, using Hadoop is really massive overkill, unless you want to build a platform for a university or some other teaching facility, where the platform can warn students that a particular simulation will not work because students in previous years already attempted it and failed.

Best regards,
Bruno

#3 | March 18, 2014, 06:37
Peetak Mitra (peter.pan), New Member
Dear Bruno,

Thanks for your reply.

If I understand correctly, does this mean I can actually use it to access remote HPC clusters? My access rights to some of the HPC clusters I previously used have expired, so I am finding it hard to run simulations in parallel.

So if Hadoop could actually let me access and use multiple clusters, or even one cluster, that would be immensely beneficial for my research, I guess.

Thanks,
Peetak

#4 | April 5, 2015, 17:39 | Hadoop
Clive DaSilva (clived), New Member, Toronto, Canada
I came across this forum quite by chance as a result of your Hadoop questions. I am a Hadoop newbie and am looking for a Hadoop-related forum here.

Any suggestions would be appreciated.

Clive

#5 | April 11, 2015, 12:51
Bruno Santos (wyldckat), Retired Super Moderator
Greetings to all!

So I found out recently, thanks to Lorena Barba re-tweeting about this, that MPI is apparently getting too old and that Hadoop/Spark is just one of a few technologies that are likely to replace MPI sometime in the future:
In addition, those blog posts refer to Chapel, which is a programming language that has already found its way onto this forum: http://www.cfd-online.com/Forums/mai...languages.html

Therefore, I'm coming back to the original post on this thread:
Quote:
Originally Posted by peter.pan
Apparently it is a 'software for reliable, scalable, distributed computing'.

Is it worth giving it a try?
I still stand by my post #2, namely that Hadoop would only be worthwhile for helping to manage the execution of simulations that use already existing applications. But with this new information, I can write a bit more on this topic.

Essentially, Hadoop/Spark is pretty much a platform in its own right. Hadoop is mostly written in Java (and Spark mostly in Scala, which also runs on the Java Virtual Machine), a language that (AFAIK) is rarely used for programming CFD software, simply because Java runs on a virtual machine, with its bytecode compiled just-in-time, and is generally expected to be less run-time efficient than C/C++/FORTRAN. But as the blog post argues, with today's CPUs and how things have evolved, this language overhead might no longer be what holds us back; what does is how long things take to code. In fact, these languages already embed optimization strategies that we are unlikely to be able to reproduce in C/C++/FORTRAN without considerable effort (or at least without searching for the right library).
Then there is the other detail: at least in theory, to make the most of the Hadoop platform, it's best to write the CFD software directly in Java and link it directly against Hadoop's libraries, which in most cases implies having to rewrite the whole code.
Using C++ and other languages to connect to Hadoop is also possible, but after a quick search, it seems to require some investigation into which base library should really be used for making the connection; MapReduce-MPI, Hadoop Pipes and MR4C (Google's implementation) are just a few of the dozens that already exist.
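As a rough illustration of what connecting from another language can look like, here is a minimal sketch in Python. It follows the convention used by Hadoop Streaming (not mentioned above), where the mapper and reducer are ordinary programs that exchange tab-separated key/value lines and the reducer receives its input sorted by key. The record layout `case_id,iteration,residual` and the case names are made up for illustration. In a real streaming job the two functions would read from standard input and write to standard output; here the shuffle/sort step is imitated locally with `sorted`.

```python
from itertools import groupby


def mapper(lines):
    """Turn 'case_id,iteration,residual' records into 'key<TAB>value' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        case_id, _iteration, residual = line.split(",")
        yield f"{case_id}\t{residual}"


def reducer(sorted_lines):
    """Reduce key-sorted 'key<TAB>value' lines to the smallest residual per case."""
    pairs = (line.split("\t") for line in sorted_lines)
    for case_id, group in groupby(pairs, key=lambda kv: kv[0]):
        best = min(float(value) for _key, value in group)
        yield f"{case_id}\t{best:g}"


# Made-up residual history for two hypothetical cases.
records = [
    "cavity,1,0.5",
    "pipe,1,0.8",
    "cavity,2,0.01",
    "pipe,2,0.2",
]

# sorted() is a local stand-in for the shuffle/sort step that the framework
# performs between the map and reduce phases.
summary = list(reducer(sorted(mapper(records))))
print(summary)
```

Note that this only covers post-processing of results that already exist. It says nothing about running the solver itself on Hadoop, which is the harder problem described above.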

Then there are also complete alternatives to any of the above, such as:


All of this is just to say that Hadoop as a building block for creating CFD applications is something that might perhaps happen 3-5 years from now, or it might be used in the back office of cloud services that provide CFD software as an online service, without us even knowing about it.

Best regards,
Bruno
