Name: LINPACK: Intel's Nehalem versus AMD Shanghai
Author: Johan De Gelas

LINPACK: Intel's Nehalem versus AMD Shanghai

by Johan De Gelas on 11/28/2008 12:00 AM EST

Posted in
IT Computing general

60 Comments

ZootyGray - Tuesday, December 2, 2008 - link
First you bollox the test with a BIOS flop. Convenient.
Then you can't afford enough DDR3.
Nobody seems to know if bozo turbo is off.
You don't know that a real IT would not use turbo.
Your readers all hate you for being so smug.
You have reinforced why I don't come here anymore.
You are conning and lying about AMD.
You use pro-Intel conditions and pretend you are fair.
You discredit a very detailed test posted with real facts and also it bears the names of the testers - and their source. That test, in the 'comments' is far better than anything on this website.
You insult people who refuse to consume the manure you spread.
You treat us all like mushrooms.
You will refute all of the above.
In short, you suck so bad.

ANYONE SEEKING REAL INFO RE AMD SERVERS OR DESKTOPS GOTO AMDZONE.

and if I wanted to know about intel I would not come here for that either.

Do you actually expect any better treatment than this?

I am sure also that you will tell us all that
the i7 TLB BUG IS JUST OK TOO.

You will probably del this post. Like I care. Ship of fools.
ZootyGray - Tuesday, December 2, 2008 - link
Oh, I forgot:
running the AMD with old, slow RAM - that's rich.

Thx for revealing yourself. You wanna call me an amd fanboy? It's really true.

Your forum kicked me for being pro-AMD. Or was it my bad attitude?

- spit
.

I find it really hard to even be here. I love truth. out.
s1ugh34d - Saturday, November 29, 2008 - link
Tom's Hardware has gotten better, but 25-page reviews that would fit on two printed pages are still pushing it, especially given the ads. AnandTech is better on my phone (which I am posting from) anyhow, and has a heck of a lot fewer pages and ads.

I see the issue everyone has with Intel's recent king-of-the-hill performance. I hate to be the one putting it in layman's terms, but AMD is not as good as Intel. Multi-socket systems aside, every part of Intel's consumer lineup outdoes AMD's.

Server tests also take a lot more time, and more documentation, before they can be put into a chart for our eyes. I think this was mainly focused on the HT differences in the Intel chip, and the AMD scores were posted as reference points.

The DDR3 itself being unregistered (as noted) should show the readers that this is not an apples-to-apples comparison. I'm just lost on why i7 chips are almost strictly consumer right now, so why not just wait for the server chips and server memory?

I can't believe that the new Phenom II X4 hasn't even got a simple blog post about its ridiculous OC claims on the net. It's a week in and nothing even small shows up here.
wwswimming - Saturday, November 29, 2008 - link

i like websites that have 1 or 2 ads per page.

TH has one of the most ad-bombarded visual presentations i've
ever seen on a website. ads between posts in the forums.

if this was TH, there would be ads here in the "post comments
to blog" section !
BLaber - Friday, November 28, 2008 - link
Please let us all know whether the Turbo Boost feature was in use when you were performing the above tests. Thanks.
thebeastie - Friday, November 28, 2008 - link
Despite the countless millions of dollars companies like Intel have injected into the Linux kernel over the years a simple Linpack test shows its flaws as it can't deal with the extra HT logical cores.

It shows too much money is injected into one spot and should be more evenly spread into other OS projects like FreeBSD. Just because one project has a nicer installer shouldn't be its justification.
Griswold - Saturday, November 29, 2008 - link
Do you have any evidence to back your claims up or do you (foolishly) expect everyone to take your word for granted?
thebeastie - Monday, December 1, 2008 - link
LOL, there is no doubt it's a flaw in the Linux kernel; that's not to say they won't fix it.
I would be pretty sure it will be fixed soon; it's just that MS had it all working fine right off the bat.

I can't believe you would even think it's anything other than the Linux kernel dealing poorly with HT. What would you think it is? The power of God?
narlzac85 - Friday, November 28, 2008 - link
Did you disable the Nehalem Turbo Boost, or whatever it is called? If you are only using 1 core (not a realistic scenario, I assume), then Nehalem will clock that core 1 or 2 multipliers higher. However, that is unlikely to happen in a heavily loaded server, right?
JohanAnandtech - Saturday, November 29, 2008 - link
LINPACK is extremely well threaded, which means that all cores are used to their full potential. But to be sure, we disabled Turbo Boost, as we speculate it will probably not be used on the server products. That is why I labeled the Core i7 by its clock speed (and also because I absolutely hate Intel's and AMD's numbering systems).
befair - Monday, December 8, 2008 - link
Well threaded!!?? Ever heard of MPI? MPI processes are *not* threads!
befair - Friday, November 28, 2008 - link
ok .. getting tired of this! Intel loving Anandtech employs very unfair & unreasonable tactics to show AMD processors in bad light every single time. And most readers have no clue about the jargon Anandtech uses every time.

1 - HPL needs to be compiled with appropriate flags to optimize the code for the processor. Anandtech always uses code that is optimized for Intel processors to measure performance on AMD processors. As much as AMD and Intel are binary compatible, when measuring performance even a college grad who studies HPC knows the code has to be recompiled with the appropriate flags.

2 - Clever words: sometimes even 4 GFLOPS is described as significant performance difference

3 - "The Math Kernel Libraries are so well optimized that the effect of memory speed is minimized." - So ... MKL use is justified because Intel processors need optimized libraries for good performance. However, they don't want to use ACML for AMD processors. Instead they want to use MKL, optimized for Intel, on AMD processors. What's more ... Intel code optimizes only for Intel processors and disables everything for every other processor. They have corrected it now, but who knows!! Read here: http://techreport.com/discussions.x/8547

I am not saying anything bad about either processor but an independent site that claims to be fair and objective in bringing facts to the readers is anything but fair and just!!! what a load!
JohanAnandtech - Saturday, November 29, 2008 - link
It is not that black and white.

Please read the article that I linked ( http://it.anandtech.com/IT/showdoc.aspx?i=3162&... ) and you will see that the AMD performs better with the Intel MKLs if you use relatively small matrix sizes. It is only at large matrix sizes that the ACML libraries give the K10 architecture a real advantage.
BlueBlazer - Saturday, November 29, 2008 - link
Are you using Linux 64-bit for this test? What about differences with Linux 32-bit?
JohanAnandtech - Monday, December 1, 2008 - link
64 bit... I don't see why we would use 32 bit? Linux 64 bit is the best platform for any of these kinds of tests.
BlueBlazer - Saturday, November 29, 2008 - link
No matter where you turn or which review website you visit, you will see Intel outperforming your precious in many tests. If you are tired of looking at them, or of watching one disappointment after another with your precious, why not shut down your PC and go out and enjoy life?

On another note, ACML may perform worse than MKL:
http://ixbtlabs.com/articles3/cpu/phenom-x4-matlab...

And it so happens that Intel still has the best compilers around. Try using GCC to compare; you'll find that even the Intel-compiled version works better on non-Intel processors.
LawJikal - Friday, November 28, 2008 - link
There are many documented scenarios in which HyperThreading is a detriment (single-threaded scenarios):
http://hothardware.com/articleimages/Item1232/3dv....
http://techgage.com/reviews/intel/core_i7_launch/c...
http://hothardware.com/articleimages/Item1232/lame...
http://hothardware.com/articleimages/Item1232/et.p...

There are also instances in which a 3.0 GHz QX9650 offers greater performance (the few instances in which operating code benefits more from 12MB of split L2 cache than 8MB of shared L3)
hyc - Friday, November 28, 2008 - link
Can you repeat these tests using ACML?

http://developer.amd.com/cpu/Libraries/acml/Pages/...

LINPACK really isn't a great code base for testing these types of systems anyway...

http://www.netlib.org/lapack/
JohanAnandtech - Monday, December 1, 2008 - link
Your wish is my command :-)

http://it.anandtech.com/weblog/showpost.aspx?i=529
BlueBlazer - Saturday, November 29, 2008 - link
ACML may perform worse than MKL
http://ixbtlabs.com/articles3/cpu/phenom-x4-matlab...

Of course, it's interesting to see ACML in this test.
befair - Friday, November 28, 2008 - link
No no no .. they don't want to do anything that can show that AMD performs better than Intel. Hey .. where do you think you are? This is Intel land, dude!
BlueBlazer - Saturday, November 29, 2008 - link
Then go back to your fantasy land. Your irrational and useless comments are not required here; they are more or less like everyday spam.
psychobriggsy - Friday, November 28, 2008 - link
Any timeframe for Nehalem 4 socket (16 core, 32 threads) comparisons against Shanghai 4S or 8S?

A major advantage of Shanghai is that it is a drop-in replacement for a couple of years' worth of S1207 server infrastructure, which is quite useful in these credit restricted times in my opinion.
Clauzii - Sunday, November 30, 2008 - link
Also, the power usage is a positive factor. I think AMD has played its cards right by maybe not making the fastest CPU out there overall, but hitting a sweet spot with a moderate speed bump and less power needed at a rather good price.

It seems like it will be really sweet for those already running Barcelonas.
TruePath - Friday, November 28, 2008 - link
Well yes, HT seems to slow LINPACK down but given your comments about memory it's not clear that it would still slow things down if more memory was present.

I mean LINPACK is heavily optimized and no doubt behaves differently depending on the number of cores it sees. Given that HT appears to add a core it will likely behave as if it has access to two cores and thus run into the memory barrier you mentioned above. I'd be curious to see how HT performed with more memory.

Also, I suspect that *highly* optimized SMP code will fairly frequently show reduced performance with HT. In particular if the code reacts to the number of processors and chooses the implementation/algorithm accordingly it shouldn't be uncommon to select an implementation for 2 processors that's actually more wasteful of FLOPS/IOPS than the single processor code. After all if you use 1.8 times the operations required for 1 CPU to compute the result evenly divided between 2 processors you've still finished faster. However, HT doesn't really give you twice the computational power so it will very likely fool this kind of optimization.
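
The trade-off described in this paragraph can be put into numbers. The following is a minimal sketch, not a model of how LINPACK or MKL actually selects its code path: `relative_time` is a hypothetical helper, and the 1.2x throughput assumed for one core with HT enabled is an illustrative guess, not a measured value.

```python
def relative_time(work_factor, effective_processors):
    """Run time relative to the single-processor algorithm (1.0 = same time).

    work_factor: total operations of the parallel algorithm, as a multiple
                 of the operations the single-processor algorithm needs.
    effective_processors: throughput actually available, in units of one
                          full processor.
    """
    return work_factor / effective_processors

# Two real processors: 1.8x the work, spread over 2x the throughput.
two_real_cpus = relative_time(1.8, 2.0)

# One core with HT: same 1.8x work, but only an ASSUMED 1.2x throughput.
one_core_with_ht = relative_time(1.8, 1.2)

print(f"2 real CPUs : {two_real_cpus:.2f}x the single-CPU run time")
print(f"1 core + HT : {one_core_with_ht:.2f}x the single-CPU run time")
```

The break-even point is where the effective throughput equals the work factor: the more wasteful algorithm only pays off if the extra logical processors deliver more than 1.8x in this example.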

Is there a way that heavily optimized code like LINPACK could recognize that it's not really two processors but actually HT and react accordingly?
BlueBlazer - Saturday, November 29, 2008 - link
I would also guess that since LINPACK is so optimized, it keeps the FPU/SSE unit busy full time. With HT, that single FPU/SSE unit is shared between 2 threads, so queueing occurs. This would cause some delay if one thread had to rely on the other thread to finish a calculation before continuing (dependency). And if both of those threads are on the same core, that would add some delay. In other applications, like 3D renderers, each thread is highly independent (no dependency on other threads), and that would make a difference.

My 2 cents anyway.
JohanAnandtech - Saturday, November 29, 2008 - link
Ah! It has been staring me right in the face, thanks for pointing that out. Linpack is known to have ultra high IPC (I believe it is probably between 2 and 3, will give it a shot). There is simply "no room" anymore for a second thread: the other thread is fully using the FP ports.
Pelle1948 - Saturday, November 29, 2008 - link
They have several tests showing where HT off is better:

http://techreport.com/articles.x/15818/12
Mathos - Friday, November 28, 2008 - link
On the other hand, once AMD releases the new platform for their server chip it should shore up the performance gap. Especially if it's the additional bandwidth from DDR3 that's causing it, or the difference between HT2.0 and QPI speed.
duploxxx - Friday, November 28, 2008 - link
Actually, I was going to say the same. Knowing that i7 performance here is a bit high due to the memory configuration, I think Shanghai actually does well and will only close the gap further when HT3 + DDR3 is released after the 2P Nehalem launch.

This won't be the only test where we see a negative HT influence. Big questions will arise with virtualization; the old Pentium-based Xeon architecture was also way better off with HT off there.
Darkness Flame - Friday, November 28, 2008 - link
Wait a sec, I thought the only Xeon Nehalem cores that are supposed to use Fully Buffered RAM are the 8-core Beckton processors. Weren't the Gainestown processors supposed to use triple-channel DDR3 and support 2-socket systems (hence the 2 QPI links)? I would figure only the Beckton cores would scale to 4-socket systems, as they have 4 QPI links.

Regardless, though; I would definitely like to see more comparisons between Nehalem and Shanghai; especially in the database benchmarks.

Also, like duploxxx said, I don't think we'll see a real comparison between the two, in bandwidth at least, until AMD moves to HT3 and DDR3.
BlueBlazer - Friday, November 28, 2008 - link
Can you tell us how many processors or cores are used on the Opteron system?

Is that "8384" a typo? Or should it be "2384"?
JohanAnandtech - Friday, November 28, 2008 - link
No, that is an 8384 CPU. But we use only one.
BlueBlazer - Saturday, November 29, 2008 - link
Thanks. What speed is the DDR3 on the Core i7 machine?
JohanAnandtech - Saturday, November 29, 2008 - link
1066 MHz DDR-3 7-7-7
BlueBlazer - Saturday, November 29, 2008 - link
Would you retry those tests with DDR3-1333 and DDR3-1600? I'd like to see how memory bandwidth affects these tests.

Thanks.
swhibble - Friday, November 28, 2008 - link
Shanghai has been out for the best part of... what... 2 weeks now? And all you've managed to come up with is some database testing and a one page comparison to Nehalem.

COME ON ANAND!! Nehalem got a full review as soon as it came out, why is it taking so long to do a full review of Shanghai?
Vinvin - Saturday, November 29, 2008 - link
I'd like to see comparisons with Dunnington too (6, 12 and 24 cores ...)
joshuamora - Saturday, November 29, 2008 - link
Hi.

Here you can see some 4-core runs on a 2384 with DDR2-800, using only the 4 cores within 1 socket.

For N=18000 I get 35.49 GFLOPs, an efficiency of 82.1%, which is a bit low but much better than the 32 GFLOPs, at an efficiency of 75%, reported by Anandtech.
For larger N that is a multiple of NB you can achieve better efficiencies:
For N=28224 I get 36.47 GFLOPs, an efficiency of 84.4%.

8-core runs on 2 sockets are within the same levels of efficiency (~84.5%).
For all these runs I used ACML 4.2 (single threaded), the PGI 7.2-4 compiler and hpmpi2.2.7, binding MPI processes only to cores of the first socket.
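
The efficiency figures in this comment can be reproduced with a short calculation. This is a sketch under stated assumptions: the Opteron 2384 is taken to run at 2.7 GHz and each core is assumed to retire 4 double-precision flops per cycle. Neither number appears in the comment itself, and the function names are made up for illustration.

```python
# ASSUMPTIONS (not stated in the comment): 2.7 GHz clock for the Opteron 2384
# and 4 double-precision flops per cycle per core.
CLOCK_GHZ = 2.7
FLOPS_PER_CYCLE = 4


def peak_gflops(cores, clock_ghz=CLOCK_GHZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in GFLOPS for the given number of cores."""
    return cores * clock_ghz * flops_per_cycle


def efficiency(measured_gflops, cores):
    """Measured rate as a fraction of the theoretical peak."""
    return measured_gflops / peak_gflops(cores)


def is_multiple_of_nb(n, nb=168):
    """True if the problem size N divides evenly into NB-sized blocks."""
    return n % nb == 0


# Measured values taken from the comment and its logs.
runs = [
    ("N=18000, 4 cores", 18000, 35.49, 4),
    ("N=28224, 4 cores", 28224, 36.47, 4),
    ("N=43008, 8 cores", 43008, 72.88, 8),
]
for label, n, gflops, cores in runs:
    print(f"{label}: {efficiency(gflops, cores):.1%} of peak, "
          f"N multiple of NB: {is_multiple_of_nb(n)}")
```

Under these assumptions the 4-core peak is 43.2 GFLOPS, so 35.49 GFLOPS comes out at about 82% and 36.47 GFLOPS at about 84%, in line with the figures quoted. N=18000 is not a multiple of NB=168, while 28224 and 43008 are.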

I don't see the reason for comparing a 1-socket system against a 2-socket system.
I don't see the reason for using DDR2-533 on Shanghai.

Bottom line: the AMD runs reported by Anandtech are low in terms of efficiency because they did not use the appropriate library and blocking factor. I do not understand the comparison of these two very different systems, offering very different features at very different prices. It also would not make sense to use 2 of the Intel systems to compete against 1 AMD system, provided they had similar performance, because of the big difference in pricing.

Below I provide the logs of the runs.

Best regards,
Joshua Mora.

/opt/Benchmarks/hpl-2.0/bin/AMD_ACML_HPMPI # more 4core.log
================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N : 18000 21504 28224
NB : 168
PMAP : Row-major process mapping
P : 2
Q : 2
PFACT : Left Crout Right
NBMIN : 8
NDIV : 2
RFACT : Left Crout Right
BCAST : 1ring
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : no-transposed form
U : no-transposed form
EQUIL : yes
ALIGN : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L8 18000 168 2 2 109.55 3.549e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0038256 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2C8 18000 168 2 2 110.06 3.533e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0045786 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2R8 18000 168 2 2 110.11 3.531e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0045956 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10C2L8 18000 168 2 2 109.67 3.546e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0049196 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10C2C8 18000 168 2 2 110.03 3.534e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0044894 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10C2R8 18000 168 2 2 110.09 3.532e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0043481 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2L8 18000 168 2 2 110.08 3.532e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0042594 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2C8 18000 168 2 2 110.12 3.531e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0043521 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2R8 18000 168 2 2 109.54 3.550e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0045002 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L8 21504 168 2 2 186.98 3.546e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0038828 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2C8 21504 168 2 2 187.03 3.545e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0047606 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2R8 21504 168 2 2 187.09 3.544e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0037397 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10C2L8 21504 168 2 2 187.03 3.545e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0038828 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10C2C8 21504 168 2 2 186.95 3.546e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0047606 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10C2R8 21504 168 2 2 187.07 3.544e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0036661 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2L8 21504 168 2 2 187.05 3.545e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0038828 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2C8 21504 168 2 2 187.07 3.544e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0037164 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2R8 21504 168 2 2 186.93 3.547e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0036661 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L8 28224 168 2 2 411.27 3.645e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0032718 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2C8 28224 168 2 2 411.02 3.647e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0032735 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2R8 28224 168 2 2 411.16 3.646e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0031464 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10C2L8 28224 168 2 2 411.05 3.647e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0032718 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10C2C8 28224 168 2 2 411.09 3.646e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0034905 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10C2R8 28224 168 2 2 411.06 3.647e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0031464 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2L8 28224 168 2 2 411.06 3.647e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0032718 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2C8 28224 168 2 2 411.16 3.646e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0034905 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2R8 28224 168 2 2 410.97 3.647e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0031464 ...... PASSED
================================================================================

Finished 27 tests with the following results:
27 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
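
The Gflops column in the log above can be cross-checked from N and the wall time. The sketch below assumes the operation count conventionally used for HPL, 2/3*N^3 + 3/2*N^2, and 8 bytes per double-precision matrix element; the function names are hypothetical.

```python
def hpl_flops(n):
    """Floating-point operations credited for solving an N x N system
    (assumed HPL convention: 2/3*N^3 + 3/2*N^2)."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_gflops(n, seconds):
    """Rate in GFLOPS for a run of order n that took `seconds` of wall time."""
    return hpl_flops(n) / seconds / 1e9


def matrix_gib(n, bytes_per_element=8):
    """Memory needed for the N x N coefficient matrix alone, in GiB."""
    return n * n * bytes_per_element / 2**30


# (N, Time) pairs copied from the WR10L2L8 lines of the logs.
for n, seconds in [(18000, 109.55), (21504, 186.98), (28224, 411.27), (43008, 727.71)]:
    print(f"N={n}: {hpl_gflops(n, seconds):.2f} GFLOPS, "
          f"matrix needs about {matrix_gib(n):.1f} GiB")
```

The computed rates should agree with the logged values to the precision printed, which is a quick way to confirm that a pasted log is internally consistent. The memory figure shows why larger N, which tends to improve efficiency, is bounded by the RAM installed in the test system.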

================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N : 43008
NB : 168
PMAP : Row-major process mapping
P : 2
Q : 4
PFACT : Left Crout Right
NBMIN : 8
NDIV : 2
RFACT : Left Crout Right
BCAST : 1ring
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : no-transposed form
U : no-transposed form
EQUIL : yes
ALIGN : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L8 43008 168 2 4 727.71 7.288e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0029853 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2C8 43008 168 2 4 727.58 7.290e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0029481 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2R8 43008 168 2 4 727.62 7.289e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0026779 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10C2L8 43008 168 2 4 727.19 7.293e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0033299 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10C2C8 43008 168 2 4 727.92 7.286e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0030322 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10C2R8 43008 168 2 4 727.63 7.289e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0030579 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2L8 43008 168 2 4 727.90 7.286e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0030296 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2C8 43008 168 2 4 727.53 7.290e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0031404 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2R8 43008 168 2 4 727.24 7.293e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0031886 ...... PASSED
================================================================================

Finished 9 tests with the following results:
9 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
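As a sanity check on the table above, the Gflops column can be reproduced from N and the run time. This is only a sketch: the operation count used here, (2/3)·N^3 + (3/2)·N^2, is the usual count for an LU solve and is assumed to be what the benchmark reports; the function name `hpl_gflops` is made up for illustration.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Estimate Gflops for an LU solve of an n x n system.

    Assumes the conventional operation count (2/3)*n^3 + (3/2)*n^2.
    """
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# N and Time taken from the first and fourth result rows of the output above.
for label, seconds in (("WR10L2L8", 727.71), ("WR10C2L8", 727.19)):
    print(f"{label}: {hpl_gflops(43008, seconds):.2f} Gflops")
```

With N = 43008 and times around 727 seconds, this lands on roughly 72.9 Gflops, in line with the reported 7.288e+01 to 7.293e+01.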
BlueBlazer - Saturday, November 29, 2008 - link
How much RAM did you use on those systems?

The reason for the size is due to "We had to test with a matrix size of 18000 (2.5 GB of RAM necessary), as we only had 3 GB of DDR-3 on the Core i7 platform."
joshuamora - Saturday, November 29, 2008 - link
You can do the math yourself.
Take N*N*8 and that will be the number of bytes.
Divide by 1024^3 and you'll get the GBytes.
N=43008 for 8 cores is about 85% of 16GB, i.e. 2 GB per core, which is reasonable for a 2-socket quad-core system.
I understand the test was done with a small amount of memory: 18K / 4 cores is ~600MB. For HPL you want to have as much memory as possible, but I am not doing 4GB per core, just something reasonable.
3GB for a system with 4 cores is a bit low.
Now all depends what you do with the system.
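The arithmetic above can be written out as a short sketch. The helper name `hpl_matrix_gib` is hypothetical, and the 16 GB and 3 GB totals are simply the figures quoted in this thread; only the N*N*8 / 1024^3 rule comes from the comment itself.

```python
def hpl_matrix_gib(n: int) -> float:
    """Memory for an n x n matrix of 8-byte doubles, in GBytes (1024^3 bytes)."""
    return n * n * 8 / 1024**3


# (N, cores, total RAM in GB) for the two runs discussed in this thread.
for n, cores, ram_gb in ((43008, 8, 16), (18000, 4, 3)):
    size = hpl_matrix_gib(n)
    print(
        f"N={n}: {size:.2f} GB total, "
        f"{size / cores * 1024:.0f} MB per core, "
        f"{size / ram_gb:.0%} of {ram_gb} GB"
    )
```

For N=43008 this gives about 13.78 GB, roughly 86% of 16 GB; for N=18000 it gives about 2.41 GB, or a little over 600 MB per core on 4 cores.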

Best regards,
Joshua
BlueBlazer - Saturday, November 29, 2008 - link
Another thing, which operating system was your test run on?
joshuamora - Wednesday, December 3, 2008 - link
Sorry for the late reply.
The runs were done on SLES10sp2. No special configuration whatsoever, just default packages, init 5.

Joshua
Trisagion - Friday, November 28, 2008 - link
I'll say. I don't want to say that there's a considerable amount of Intel bias here. You've had a full Nehalem review BEFORE the chip launched and three further reviews discussing everything from the L2 cache to QPI, but not a single Shanghai review.

What gives?

Don't tell me you haven't completed it yet or you have to get motherboard or BIOS updates...
befair - Friday, November 28, 2008 - link
hey guys ... it's clear and crisp! Anandtech has been continually favoring Intel products. They should rename it AnIntelTech.com

Wow .. product after product after product ... Intel is the king, Intel wow, Intel rocks ... wow! can't believe a site that built itself on being an objective reporter of products now wants to be goody goody with Intel across the board.

AMD comes up with a product, not even a mention. Intel says "hmm .maybe after 10yrs, we will release a new cpu" and the site goes "wow! this is salvation to the human race as we know it!"
MamiyaOtaru - Saturday, November 29, 2008 - link
You know why? Because Intel products are better now. End of story. If you have some sort of weird loyalty to a company that doesn't know you exist and doesn't care, feel free to buy inferior products. But you're really delusional if you expect everyone to share your misplaced loyalty.
Griswold - Saturday, November 29, 2008 - link
Not in the IT segment. They haven't been hands down better in *every* aspect for the past 5 years, and now with the arrival of Shanghai, you'll have to look even closer to justify buying a Xeon based system in many more situations (Shanghai's efficiency blows many of Intel's Xeon systems out of the water) - but surely not in all of them. It will be a while before Intel rectifies the situation with Nehalem based systems and even longer in the multi-socket arena - unless Intel changes their plans drastically due to Shanghai's qualities to take back the marketshare AMD lost recently.

Yes, we're not talking about gaming systems here, in case you missed it.

With that said, it's truly high time AT lives up to their standards and presents an exhaustive review.
BlueBlazer - Saturday, November 29, 2008 - link
Tool, you must be referring to AMD's own published Xeon results compared to Shanghai's. However that wasn't even the top score for the Xeon system.

Others take it with a huge grain of salt...
http://www.formortals.com/Home/tabid/36/EntryID/13...
formulav8 - Sunday, November 30, 2008 - link
Are you anandtech's personal body guard? It seems that at negative or even slightly negative posts you start calling people Tools or Retards like a 12 year old. There isn't anything wrong with their posts either. I'm sure anandtech can take care of themselves and don't need a fanny publicist.
BlueBlazer - Monday, December 1, 2008 - link
Here's another review, http://techreport.com/articles.x/15905/6, which proves that Shanghai didn't "blow all Xeon systems out of the water".
BlueBlazer - Saturday, November 29, 2008 - link
Go spam somewhere else.
JohanAnandtech - Friday, November 28, 2008 - link
Before you go out and lynch Anand, know that this is *it*.anandtech.com.
Anand has never received the Shanghai systems, we have. (That is Jason/Ross and me, Johan). And what is so bad about Anand continuously publishing reviews about Nehalem? The more info the better I say.

So different people, different benchmarks and generally we are slower. My Dunnington review was a little bit later than planned too.

Anyway, I got the Quad Socket system. So the original plan was to run ESX etc. on it, like we have done on Dunnington. However, the BIOS is beta, and is not able to get ESX installed and power measurements are also not accurate. So I had to leave all the benchmarking we have done so far and go to plan B.
mkruer - Friday, November 28, 2008 - link
To quote Dean Yeager: Your theories are the worst kind of popular tripe, your methods are sloppy, and your conclusions are highly questionable. You are a poor scientist, Dr. Venkman!

But seriously. Anandtech has become the next Toms Hardware. (The irony is that Toms Hardware has gotten better.) Don't believe me? Look at the last ATI vs Nvidia review. ATI had a card that was 85-90% as fast as Nvidia and cost half as much. The conclusion... Nvidia was better.
BlueBlazer - Saturday, November 29, 2008 - link
If you have nothing useful to inquire, comment or contribute here then STFU! Please keep your rants to yourself and with your other retarded buddies in your own asylum.
formulav8 - Sunday, November 30, 2008 - link
Grow-up, He posted nothing improper like you did.
JohanAnandtech - Monday, December 1, 2008 - link
No, he only accused us of bias without giving any proof. Why is it so hard for some people to distinguish between a blog post - which is meant to give first impressions - and a full blown review?
MonkeyPaw - Monday, December 1, 2008 - link
Don't worry, I've been reading anandtech for a while, and I don't see this big bias theory. You have products to compare, which means you have to compare products to rivals and then pick a winner. That's not always easy or popular. Regardless, true tech enthusiasts want as much (accurate) info as they can get as soon as possible, that way we can make our own conclusions and figure out what product best meets our needs. A good tech site worries more about the details and less about the winner. From what I can see, anandtech uses much more space talking details, and very little space saying "and the winner is..."
Griswold - Saturday, November 29, 2008 - link
I hope you'll join them there - it's definitely where you belong.
BlueBlazer - Saturday, November 29, 2008 - link
MKruer, a moderator of AMDZone bragging about his "exploits" here...
http://www.amdzone.com/phpbb3/viewtopic.php?f=52&a...

Well I guess that proves how childish those braindeads are... that's why they need an asylum like AMDZone.
ruiner5000 - Monday, December 1, 2008 - link
Ahh, 11 years of fun. Of course we will be around for longer than Anandtech whether you get it or not.

I for one miss the Ace's Hardware Johan, but I still love Johan.

Screw the haters Johan. These guys have no idea what goes into doing a review. Ignorance is bliss.

Old bluebawlz must have a Core i7 TLB bug jacket.

