CFD Online Discussion Forums


kyle August 19, 2013 14:47

The next best CFD processor might be a laptop CPU
 
Intel is releasing a chip that has 128 MB of RAM on the CPU, ostensibly to give its integrated graphics some fast memory. Fortunately for us, this 128 MB isn't actually graphics memory. It's just a gigantic L4 cache that can be used for anything. With this much cache and an efficiently stored mesh, you could probably eliminate 80%+ of traffic through the system memory bus.

Unfortunately for us, Intel is only releasing this as a laptop processor. Worse yet, they may be limiting this CPU to "Ultrabooks," so even if some manufacturer wanted to slap this thing on an ITX/mATX motherboard with PCI-e, Intel wouldn't let them. This is probably some scheme to avoid cannibalizing their high-margin Xeon business.

So there is a 47W quad core processor out there that trounces everything other than the six-core 130W CPUs, but Intel won't sell it to you.

Tech Report did the only CFD benchmark I can find. It's at the bottom of the page here. The laptop chip is the i7-4950HQ. Notice it barely loses to the $1000 six core i7.

evcelica August 21, 2013 22:07

That's interesting. The L4 cache seems to help a great deal with memory-bandwidth-intensive applications. Check this Intel ARK page, where it lists the maximum memory bandwidth at 76.8 GB/s, even though it's only dual-channel. That is exactly 1.5 times the SB-E Xeons' 51.2 GB/s with four memory channels. That must have something to do with the L4 cache, since the math doesn't add up otherwise. I wonder if Haswell-E will feature an L4 cache? Doubtful?

http://ark.intel.com/products/76085/...up-to-3_60-GHz
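
For reference, the theoretical peak figures being compared here can be reproduced with a few lines of Python. This is only a sketch: it assumes DDR3-1600 and 64-bit memory channels (neither is stated in the post), and the function name is made up for illustration.

```python
# Theoretical peak DRAM bandwidth = channels * transfers/s * bytes per transfer.
# ASSUMPTIONS (not from the post): DDR3-1600 memory and 64-bit wide channels.

def peak_bandwidth_gbs(channels, transfer_rate_mts, bus_width_bits=64):
    """Return the theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9

LISTED_GBS = 76.8  # figure quoted in the post for the dual-channel laptop chip

quad = peak_bandwidth_gbs(4, 1600)  # four-channel SB-E Xeon
dual = peak_bandwidth_gbs(2, 1600)  # two-channel laptop chip, DRAM only

print(f"four channels: {quad:.1f} GB/s")
print(f"two channels:  {dual:.1f} GB/s")
print(f"listed / four-channel ratio: {LISTED_GBS / quad:.2f}")
print(f"listed minus two-channel DRAM: {LISTED_GBS - dual:.1f} GB/s")
```

Under these assumptions two DRAM channels account for only 25.6 GB/s, so the rest of the listed 76.8 GB/s cannot come from the memory channels alone, which is consistent with the guess that the L4 cache is being counted.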

kyle August 21, 2013 23:13

If I know Intel, outside of laptops this feature will be exclusive to multi-thousand dollar Xeon CPUs.

wyldckat August 22, 2013 06:01

Perhaps Apple will jump on this for their next generation of MacMinis.

Because no matter which hardware they launch this CPU on, I'm guessing it'll be the next craze, like the PS3 clusters we saw some years ago. Or even low-budget consoles... I wonder if Valve will pick these up for their own Steam-based console :D

But still, 128MB of RAM should require some special attention on the CFD software side, for properly scheduling memory accesses.

rmh26 August 24, 2013 14:21

http://www.anandtech.com/show/6993/i...950hq-tested/3

The first page has some nice latency and bandwidth numbers. The bandwidth for large data sizes falls in line with the other data. It is only over a small range of data sizes (larger than the L3 cache but smaller than 128 MB) that this shows an improvement. It would be interesting to see some real number-crunching data, though. Most of the article is focused on gaming, and that seems to be why they introduced this extra cache anyway. This is an OEM-only part, though, and it seems like it will be a while before they try introducing this on desktop/server parts, if they decide to do that at all. It also seems like DDR4 won't show up until after Broadwell comes out, which is the die shrink of Haswell.

What I would like to see most is an increase in memory channels. One channel per core would be nice on the larger Xeons. The new 12-core Xeon E5s only have four channels. While it probably isn't necessary for most of their market, the HPC market needs more bandwidth. A larger cache isn't going to fix this problem when you're running programs with multi-GB footprints.

kyle August 25, 2013 14:19

Just because the cache isn't multiple GB doesn't mean it isn't a huge benefit for cases that are that large. Most of the data travelling over the memory bus is being accessed multiple times, and a cache this large could almost eliminate redundant memory accesses. The first time you look up some variable for a cell it would be limited to the memory speed, but as long as your mesh is well organized all subsequent lookups would be at the cache speed.
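
The effect described above can be illustrated with a toy cache model. The following is a rough sketch, not a model of the real chip: it assumes a hypothetical 64 x 64 structured grid, a 5-point stencil, and an LRU cache that holds whole cells (256 of them here) rather than 64-byte lines. All names and sizes are invented for the example.

```python
import random
from collections import OrderedDict

def hit_rate(visit_order, n, capacity):
    """Sweep an n x n grid in visit_order, touching each cell and its four
    neighbours, and return the hit rate of an LRU cache holding `capacity` cells."""
    cache = OrderedDict()
    hits = accesses = 0
    for cell in visit_order:
        i, j = divmod(cell, n)
        for a, b in ((i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if not (0 <= a < n and 0 <= b < n):
                continue  # neighbour lies outside the grid
            key = a * n + b
            accesses += 1
            if key in cache:
                hits += 1
                cache.move_to_end(key)  # mark as most recently used
            else:
                cache[key] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses

N = 64              # hypothetical grid size (4096 cells)
CAPACITY = 4 * N    # cache holds about four grid rows, far less than the mesh

ordered_visits = list(range(N * N))          # row-major, "well organized"
shuffled_visits = ordered_visits[:]
random.Random(0).shuffle(shuffled_visits)    # poorly ordered cells

ordered = hit_rate(ordered_visits, N, CAPACITY)
shuffled = hit_rate(shuffled_visits, N, CAPACITY)
print(f"well-ordered sweep: {ordered:.0%} hits")
print(f"shuffled sweep:     {shuffled:.0%} hits")
```

In the well-ordered sweep each cell misses only on its first lookup even though the cache is far smaller than the mesh, while the shuffled sweep loses most of that reuse. Real caches add line granularity, associativity, and prefetching, so the actual numbers would differ.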

JuPa December 17, 2013 05:02

I'm sorry to bump this thread. This CPU interests me.

I want a new laptop for medium - light CFD work (using Ansys CFX). I don't care if it's an ultrabook or what-ever book.

Would there be any issues running a laptop for 3 to 5 days continuously with this chip?

Thank you :)

wyldckat December 28, 2013 16:33

Quote:

Originally Posted by RicochetJ (Post 466777)
Would there be any issues running a laptop for 3 to 5 days continuously with this chip?

In theory, any laptop or desktop computer can run for 3 to 5 (or more) years straight, 24/7, without any breakdowns, if these steps are taken into account and not ignored:
  1. Proper cooling is provided to the laptop. For example, one of those laptop stands with a 120mm or 140mm fan underneath it.
  2. Dust is not allowed to gather around or inside the laptop.
  3. There is no major factory build error or manufacturing flaw.
  4. The laptop is not subjected to steep temperature changes or humid environments.
  5. You don't install the operating system known as "Windows". :D
    • Well, Windows Server perhaps... ;)
    • Even so, best use a stable operating system, not some experimental BSD or Linux distribution.
  6. You don't install crap-ware.

kyle January 16, 2014 13:40

There actually is a Steambox coming out with a similar chip, the i7 4770R. There are conflicting reports on whether this has 64 MB or 128 MB of cache.

Actually, it just showed up on Newegg... http://www.newegg.com/Product/Produc...56164012&nm_mc

So who is going to buy one and benchmark it? Unfortunately no PCI-e slot, so no Infiniband.

wyldckat January 16, 2014 14:51

Quote:

Originally Posted by kyle (Post 470394)
Unfortunately no PCI-e slot, so no Infiniband.

Well... if USB3 and SATA 6Gbps or iSATA are available, I'd say that some sort of hack could be done, to at least handle MPI in a different way...
Wait, wait... mini-PCIe?
Quote:

1 x Half-size mini-PCIe slot occupied by the WiFi+BT card
Goodbye WiFi+BT! Hello IB :D

kyle January 17, 2014 05:41

Yeah I saw that. Mini PCIe is just 1x, or 4 gigabit max. Most Infiniband cards use 8x. You might be able to use an adapter and get it to work at a reduced speed, which would still be better than ethernet.
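
As a rough check on those numbers, the usable rate of a PCIe link follows from the per-lane transfer rate and the line encoding. The sketch below assumes the mini-PCIe slot is PCIe 2.0 (the post does not say which generation) and compares against gigabit Ethernet, which is also an assumption. The function name is illustrative, and protocol overhead beyond the line encoding is ignored.

```python
# Per-lane raw rate in GT/s and line-encoding efficiency for each PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}

GIGABIT_ETHERNET = 1.0  # Gbit/s, ASSUMED baseline for "ethernet"

def pcie_gbit_per_s(generation, lanes):
    """Usable link rate in Gbit/s after line encoding, per direction."""
    rate_gt, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt * efficiency * lanes

x1 = pcie_gbit_per_s(2, 1)  # mini-PCIe slot, ASSUMED to be PCIe 2.0
x8 = pcie_gbit_per_s(2, 8)  # typical x8 slot of the same generation

print(f"PCIe 2.0 x1: {x1:.1f} Gbit/s")
print(f"PCIe 2.0 x8: {x8:.1f} Gbit/s")
print(f"x1 versus gigabit Ethernet: {x1 / GIGABIT_ETHERNET:.1f}x")
```

If the slot turned out to be PCIe 1.x, the same calculation would give 2 Gbit/s for a single lane.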

And I don't think anyone has ever got RDMA working over USB or SATA.

wyldckat January 17, 2014 16:01

Quote:

Originally Posted by kyle (Post 470484)
Yeah I saw that. Mini PCIe is just 1x, or 4 gigabit max. Most Infiniband cards use 8x. You might be able to use an adapter and get it to work at a reduced speed, which would still be better than ethernet.

Mmm... maybe some really old IB cards?
A quick search on eBay indicates that PCIe 4x IB cards are available... older than that, it's PCI-X, a server-oriented extension of the old PCI bus (from before PCIe was invented).

Quote:

Originally Posted by kyle (Post 470484)
And I don't think anyone has ever got RDMA working over USB or SATA.

It would have to be an old school MPI system, namely a virtualized file based system.

siefdi January 29, 2014 03:44

Quote:

Originally Posted by RicochetJ (Post 466777)
I'm sorry to bump this thread. This CPU interests me.

I want a new laptop for medium - light CFD work (using Ansys CFX). I don't care if it's an ultrabook or what-ever book.

AFAIK, currently only this laptop has the i7-4750HQ. Maybe more will come, but I don't know.



Quote:

Originally Posted by kyle (Post 470394)
There actually is a Steambox coming out with a similar chip, the i7 4770R. There are conflicting reports on whether this has 64 MB or 128 MB of cache.

Actually, it just showed up on Newegg... http://www.newegg.com/Product/Produc...56164012&nm_mc

So who is going to buy one and benchmark it? Unfortunately no PCI-e slot, so no Infiniband.

Looking at that Gigabyte Steambox, I can't help but think about the Apple Mac Mini. If their Haswell version comes to the surface (soon?) with this type of processor, maybe one can make use of the speed of Thunderbolt 2 for connectivity(?)

manyu882 April 2, 2014 01:57

Anyone crunched some numbers on the 4770R yet??

kyle April 2, 2014 12:25

I actually bought one but I've been so swamped I haven't had time to properly test it. So far it seems like maybe a 10%-20% speedup over a regular 4770.

manyu882 April 3, 2014 05:24

Quote:

Originally Posted by kyle (Post 483486)
I actually bought one but I've been so swamped I haven't had time to properly test it. So far it seems like maybe a 10%-20% speedup over a regular 4770.


Cool. Did you use a pair of 1333 or 1600 MHz RAM sticks? A 10-20% speedup is definitely worth looking into.

kyle April 3, 2014 10:14

I actually got 1866 MHz.

Shawn_A July 17, 2014 23:56

Any word yet on the speedup for the 4770R? For what type of simulation, and how big is the mesh? I suppose any other relevant information would be good too (memory speed, latency, etc.)

Thanks :)

kyle July 18, 2014 11:08

I didn't get around to doing any rigorous testing. I did a quick comparison to an i7 2600k with the OpenFOAM Motorbike example at various cell counts, and it seemed to be around 15%-20% faster than that. That isn't very promising, seeing as the 2600k is two generations older and runs at about the same frequency.

It may be necessary to toy around with the cell ordering to really make use of this L4 cache. This could be something as simple as just decomposing the mesh into chunks that will fit on the cache and then merging them back together.
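
One way to read that idea is: pick a chunk size that fits in the cache, then renumber the cells so that each chunk is contiguous in memory. The sketch below shows only that bookkeeping. The memory per cell (1000 bytes), the fraction of the cache a chunk may occupy (half), the assumption that the cache is the 128 MB variant, and both function names are hypothetical. Assigning cells to chunks would be the job of a mesh decomposition tool and is not shown.

```python
import math

def chunks_for_cache(n_cells, bytes_per_cell, cache_bytes=128 * 1024**2,
                     fill_fraction=0.5):
    """Return (number of chunks, cells per chunk) so that one chunk occupies
    at most fill_fraction of the cache. All sizes are illustrative."""
    cells_per_chunk = int(cache_bytes * fill_fraction // bytes_per_cell)
    if cells_per_chunk < 1:
        raise ValueError("a single cell does not fit in the cache budget")
    return math.ceil(n_cells / cells_per_chunk), cells_per_chunk

def renumber_by_chunk(chunk_of_cell):
    """Return old cell indices in their new order, grouped chunk by chunk.
    The sort is stable, so the original order is kept inside each chunk."""
    return sorted(range(len(chunk_of_cell)), key=chunk_of_cell.__getitem__)

# Hypothetical case: 2 million cells at roughly 1000 bytes of solver data each.
n_chunks, cells_per_chunk = chunks_for_cache(2_000_000, 1000)
print(f"{n_chunks} chunks of up to {cells_per_chunk} cells")

# Tiny example: cells 0 and 2 belong to chunk 1, cells 1 and 3 to chunk 0.
print(renumber_by_chunk([1, 0, 1, 0]))
```

Whether this helps in practice depends on how the solver traverses the mesh, so it would need to be benchmarked rather than assumed.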

Even if there is some way to optimize it, this almost certainly doesn't make sense as it is today. You can only get this chip in an iMac or the Gigabyte barebone system, both of which are a lot more expensive than a comparable system with an i7 4770k. Plus you're limited to two channels of 1866 MHz DDR3 laptop memory. We are weeks from having access to systems with 4-channel DDR4 at 3000 MHz+, which would blow any current system with Iris Pro out of the water.

Shawn_A July 18, 2014 11:15

Hi Kyle,

Thanks for the info. 10%-20% over a 2600k doesn't seem like a huge bump. I suppose if the job isn't sufficiently large you could get a decent performance improvement (like the E3D case run at TR mentioned above) but once the jobs get bigger there's no substitute for more cores and bandwidth.

Thanks :)


All times are GMT -4.