Core 2 Duo and the future of Intel

Robert Hallock (Thrax) According to Intel marketing, the new Core 2 Duo architecture is the wave of the future, and we have no doubt that this capable CPU design is here to stay. We take a closer look at what it offers and what's to come.

November 5, 2006 10:17 PM ET in Articles

Out with the old, in with the new… as fast as humanly possible.
That’s Intel’s take on the age-old adage as they quickly accelerate the deployment
of the fastest x86 chip on planet earth. According to Intel marketing, the
new Core 2 Duo architecture is the wave of the future, and we have no doubt
that this capable CPU design is here to stay. Released on July 27th, 2006,
the Merom, Allendale, Conroe, Conroe XE and Woodcrest series
of CPUs jump-started an ailing Intel Corporation and sparked Intel’s return
to the performance race in a big way; claiming a 40% power reduction and speed
improvement over Netburst-based Pentium 4 derivatives, studies concluded
that not only did Intel trump themselves, they handed AMD a heavy blow.

The
power, thermal and speed enhancements that helped the “Core Microarchitecture”
rocket past its predecessors also helped chips based on the Core
silicon leave their AMD counterparts in the dust. In each market segment,
be it mobile, desktop, workstation or server, Core-based components
are faster than the AMD parts which once held the speed crown, and there seems
no sign of slowing down.

Make no mistake, when AMD released their K8 in late September
of 2003, Intel was caught off-guard. Coming off an oscillating battle between
the Pentium 4 and the Athlon XP series of CPUs, Intel’s heavy investment in
the gigahertz race left them unprepared for the savage beating that AMD would
deal to the gentlemen in Santa Clara, CA with the Hammer and its
successors.

Unlike Intel, which opted to produce exceptionally warm-running
CPUs with very error-prone pipelines of absurd lengths, AMD took the route
of efficiency, and produced CPUs which could do more per megahertz of clockspeed
than Intel had dreamed of. Despite attempts at fixing the Netburst design, regarded as broken in its conception, with large amounts of cache,
hyperthreading and even longer pipelines, the sole hope that Intel
could compete with AMD in the desktop and server arenas came in the form of
a little mobile CPU known as the Banias.

The Banias CPU was explicitly designed from the ground
up with mobile applications in mind, and was released in March of 2003 under
the Centrino name. The Centrino, which has since become the title
for the package of technologies that comprise Intel’s flagship mobile offerings,
has thus far featured a series of four generational CPUs with extremely low
voltage requirements, but abnormally high levels of productivity and floating-point
strength. When the original Banias was released in March of ‘03,
enthusiast websites began to notice that the CPU was a real workhorse. As
the K8 could surpass a Netburst-based CPU even with an 800-1000MHz
clock deficit, Banias-based Pentium M CPUs were quickly recognized
as being able to blow past K8s even with a 400-600MHz clock deficit. As time
progressed, the clamor for porting the Banias to the desktop grew
stronger, and came to a head when the Banias bowed out to the newly-fabbed
90nm Dothan. It was with the Dothan in May of 2004 that a company called
FPU debuted an ATX motherboard that brought desktop amenities, like dual channel
memory and faster frontside bus speeds; apples-to-apples benchmarks concluded
that the Dothan was a screamer, handing numerous defeats to the vaunted
K8 architecture.

There was, however, a problem with the conclusion drawn by enthusiast
websites at the end of 2005’s summer months: Intel was sternly entrenched
in that gruesome gigahertz race, dumping billions of dollars in R&D into
a performance-hemorrhaging line of desktop CPUs. Since 2001, Intel had hammered
the idea of high clockspeeds into the minds of John Q. Public. Bigger numbers
were better, more cache was better, virtual cores were better. More, more,
more! More of everything but efficiency. Unfortunately for Intel,
the Opteron kept running circles around any and all flavours of Xeons. It completely
shamed the Itanium name by offering 64bit computing backed by the power of
x86 knowledge and compatibility. But perhaps the most damning example of the
Pentium 4’s utter failure to perform was in a 2005 study
that concluded it would take a whopping 5.2GHz Pentium 4 based on
the Prescott core just to rival the Athlon 64 in enthusiast applications.
We estimate that around that time, Intel hit the breaking point; recognizing
that they had perhaps lost it all with the abysmal performance of Netburst-based
Pentium 4 and Xeon chips, Intel called its engineers to the table and quietly
acknowledged AMD’s engineering principles as the right engineering
principles. By July of 2006, Intel would unify the architectures of their
desktop and laptop offerings for the first time since the Coppermine
Pentium III.

To bring about this unification, Intel had to reverse gears
and set their sights on an ultra-efficient CPU that ran at low temperatures
and required very low voltages. With this in mind, the first test of Intel’s
new product goals came in the form of Yonah, the first product designed
on the Core series of architecture which would evolve into the Core 2
design we use today. Replacing the Dothan under the Centrino
label, the Yonah-class CPU came in single and dual-core variants
with a 65nm fabrication technique, and would firmly make its mark on the mobile
workspace. The chip was everything laptops needed: Fast, low-voltage and cold.
Witnessing their master plan unfolding with the wild success of the Core Solo
and Core Duo chips, led solely by the Yonah, Intel continued to beaver
away at bringing the powerhouse technology to the desktop. Perhaps by the
time of the Core architecture’s release with the Yonah chip
in January of ‘06, Intel had finally envisioned its successor by the name
of Core 2, the umbrella name for a whole host of CPUs designed
for every market segment.

Taping out and entering initial fabrication in the spring of
2006, the Core 2 processors were unleashed to the masses in July
of this year. The Core 2 technology itself was designed explicitly
to supplant the Pentium 4, bidding a farewell to the Pentium as Intel’s
primary brand name since 1993. Owing its heritage to the successes in the
Yonah, and tracing its roots back to codename P6 — the
Pentium Pro, the Core 2 series of processors is the culmination of
nearly three years in mobile engineering to produce Intel’s broadside volley
at AMD’s domination of the performance charts.

What makes the Core 2 Duo so Fast?

Boosting Efficiency

The Core 2 series of chips, in representing a significant
departure from the Netburst-based chips of old, are designed to maximize
Instructions Per Cycle (IPC), or the number of instructions the CPU can complete
per cycle of the clock (a 1MHz clock runs one million cycles per second).
The Core 2 chips are estimated at four IPC,
to the K8’s three, to the Pentium 4’s two. While IPC is not a precise measurement
of a processor’s speed, it has a significant impact, and we can give rough
theoretical numbers: Going by those IPC estimates, the Core 2 series of CPUs are approximately
100% faster than Pentium 4s at the same speed (four versus two), and 33% faster than
Athlon 64s at the same speed (four versus three). Practical measurement pegs these synthetic measurements
at about 10-15% too high, and this is due to the various optimizations the
respective chips have received.
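
Taking the estimated IPC figures above at face value, the theoretical per-clock advantage is just the ratio of the two numbers. A minimal sketch of that arithmetic follows; the function and dictionary names are ours, for illustration only, and the IPC values are the rough estimates quoted in the text rather than measured results.

```python
def relative_speedup(ipc_a: float, ipc_b: float) -> float:
    """Percent by which chip A outruns chip B at the same clock speed."""
    return (ipc_a / ipc_b - 1.0) * 100.0


# Rough IPC estimates quoted in the article, not measurements.
ESTIMATED_IPC = {"Core 2": 4, "K8": 3, "Pentium 4": 2}

for rival in ("K8", "Pentium 4"):
    gain = relative_speedup(ESTIMATED_IPC["Core 2"], ESTIMATED_IPC[rival])
    print(f"Core 2 vs {rival}: {gain:.0f}% faster per clock (theoretical)")
```

These are ceiling figures; as noted, practical results land lower once each chip's other optimizations are accounted for.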

As far as architectures are concerned, let’s quickly sum up
the differences between the latest generations (All dual-core) of Netburst,
K8 and Core 2 from a sky-high perspective of general design:

| | Intel Core 2 | AMD K8 | Intel Netburst |
|---|---|---|---|
| Fabrication Technique (nm) | 65 / Conroe | 90 / Windsor | 65 / Presler |
| Socket | LGA 775 (“Socket T”) | Socket AM2 (940 Pins) | LGA 775 (“Socket T”) |
| L1 Cache | 64k Exclusive Per Core | 128k Exclusive Per Core | 24-32k Exclusive Per Core |
| L2 Cache | 4MB Shared | 512k/1MB Exclusive Per Core | 2MB Exclusive Per Core |
| Bus Speed | 1066MHz – PC2-6400 | 800MHz – PC2-6400 | 800MHz – PC2-6400 |
| Pipeline Length | 14 Stages | 12 Stages | 31 Stages |
| SSE Engine Width (In Bits) | 128 | 64 | 64 |
| Max Memory Bandwidth to CPU | 10.6GB/s | 6.4GB/s | 8.5GB/s |
| L2 Cache Addressing Width | 256 bits | 128 bits | 256 bits |
| L1+L2 Cache Latency | ~11-14 Cycles (L1 = 3 Cycles) | 12 Cycles (L1 = 3 Cycles) | ~16 Cycles (L1 = 4 Cycles) |

Going from the table, the Core 2 clearly has massive
bandwidth between the processing cores and the cache in the form of a 256
bit cache width with a median access time of thirteen cycles. What this means
is that the Core 2 can access twice as much cache information at
an average of one cycle slower when pitted against the newest Windsor-class
Athlon 64 X2 chips. In the real world, the Core 2 delivers two times
as much information to the CPU cores as the Windsor, while the L2
cache itself is about two and a half times faster thanks to the Core 2’s
ingenious cache design. When placed against the Presler, the
Core 2 can access the same amount of data but do it up to 25% faster.
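
To put the cache-width comparison in concrete terms, here is a small sketch that converts the L2 widths from the table into bytes per transfer and a peak bandwidth figure. It assumes one full-width transfer per core clock cycle and uses an illustrative 2.4GHz clock for every chip; both are our simplifying assumptions, and the function names are hypothetical.

```python
def bytes_per_cycle(bus_width_bits: int) -> int:
    """Bytes the L2 cache can hand over in one full-width transfer."""
    return bus_width_bits // 8


def peak_l2_bandwidth_gb_s(bus_width_bits: int, clock_hz: float) -> float:
    # Assumption: one full-width transfer per core clock cycle.
    return bytes_per_cycle(bus_width_bits) * clock_hz / 1e9


# L2 widths taken from the comparison table above.
L2_WIDTH_BITS = {"Core 2": 256, "K8": 128, "Netburst": 256}
CLOCK_HZ = 2.4e9  # illustrative clock, applied to every chip for comparison

for chip, width in L2_WIDTH_BITS.items():
    print(f"{chip}: {bytes_per_cycle(width)} bytes/cycle, "
          f"{peak_l2_bandwidth_gb_s(width, CLOCK_HZ):.1f} GB/s peak")
```

At equal clocks the 256 bit path moves exactly twice the data of the 128 bit path, which is the "twice as much cache information" figure quoted above.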

Supercharging Common Tasks

Another advantage of the Core 2 comes in the form of the width of
the SSE engine. Many applications today make use of SSE instructions to do complex mathematical tasks for media encoding, gaming, 3D rendering,
audio, and a whole raft of enthusiast, prosumer, and even enterprise-class
processing tasks. SSE is a set of instructions that apply one operation to several
pieces of data at once, with the data packed into dedicated SSE registers, and it speeds up
multimedia tasks which would otherwise gobble up large chunks of CPU time
to do things we consider very simple, like video encoding. Without delving
deeply into the intricacies of media encoding, it is a purely mathematical
task which analyzes, resizes and compresses thousands of sequential images;
without things like SSE, these tasks would take days, maybe weeks, not hours.
Getting back to the Core 2’s implementation of the SSE engine, we
see that it is a 128 bit width. This width is significant because SSE registers
are 128 bits in length. Prior to the advent of the Core 2, each 128 bit SSE
operation was split into two 64 bit halves and spread across two clock cycles, meaning that
the maximum number of full-width SSE operations for a CPU was one half of the CPU’s
given clock speed. In the Core 2’s case, it can gobble up SSE operations
twice as fast as its competitors, able to process a full register for every cycle
the chip has backing it. In the real world, this means that applications that
heavily rely on SSE1/2/3/4 could be accelerated by as much as 50%, on top of the boost in speed granted by cache speed and bandwidth.
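
The effect of engine width on peak SSE throughput can be sketched in a few lines. This is a back-of-the-envelope model, assuming the engine processes one engine-width slice per cycle and nothing else limits throughput; the function names and the 2.4GHz clock are ours, for illustration.

```python
import math

SSE_REGISTER_BITS = 128  # SSE registers are 128 bits wide


def cycles_per_sse_op(engine_bits: int) -> int:
    """Cycles needed to push one full 128-bit register through the engine."""
    return math.ceil(SSE_REGISTER_BITS / engine_bits)


def peak_sse_ops_per_second(clock_hz: int, engine_bits: int) -> float:
    # Assumption: the engine handles one engine-width slice every cycle.
    return clock_hz / cycles_per_sse_op(engine_bits)


CLOCK_HZ = 2_400_000_000  # illustrative 2.4GHz clock

for name, width in (("64-bit engine", 64), ("128-bit engine", 128)):
    ops = peak_sse_ops_per_second(CLOCK_HZ, width)
    print(f"{name}: {cycles_per_sse_op(width)} cycle(s) per op, "
          f"{ops / 1e9:.1f} billion ops/s peak")
```

Real applications see less than the doubling this model predicts, since only part of their run time is spent in SSE code.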

But it only gets better, as while the K8’s Hypertransport
architecture previously provided unprecedented levels of memory bandwidth
to the CPU, even AMD’s most recent 1GHz implementation of Hypertransport is
not enough to stave off Intel’s 40% better memory bandwidth. Additionally,
while AMD has long-dominated the CPU <-> Memory latency to the tune
of 47 nanoseconds (The Pentium 4 is 100% slower), we are beginning to understand
that this lead has shrunk to only about 17% thanks to the Core 2’s design.
This is not the only design improvement Intel has done with CPU-to-RAM communication,
and the second comes in the Core 2 Duo’s intense bus design.

Making a Better Pipeline

Next on the list is a technique that Intel calls Macro-Ops Fusion.
While this is a very fancy name, it allows Intel to do something
very remarkable with their CPU, and that’s combining complex processing tasks
(Each processing task is known as an instruction) so a single processor cycle
can compute them. To elaborate, let’s say a user calls a function
in a program, like opening a picture; beneath the goal of opening the picture
is a programming language that drives the task, and the programming language
itself is ultimately decoded by the CPU. In this case, the CPU receives the
user’s request to open a picture via the underlying code of the program, and
the CPU must then compute the proper instructions to make that picture happen.
Sometimes the code powering the task requires multiple instructions to be
run, for example one instruction sequence to decode the image, another instruction
sequence to process the menu’s style and function. All of these instructions
enter an instruction queue, and it is the job of a piece of the CPU
called an x86 decoder to understand what queued instructions are being called
by the programs, and to translate those into strings of instructions that
can be processed more efficiently — these efficient strings are called micro-ops.

Not only does the Core 2 have the most x86 decoders in the history
of desktop computers at four (Compare this to the Presler’s one and K8’s three), it has the capability to fuse certain pairs of x86 instructions, which
Intel calls macro-ops, into a single micro-op. Traditionally speaking, there can be one
instruction per decoder per cycle, but Intel has given the Core 2 the ability to recognise adjacent instructions that can be fused together (a compare followed by a conditional jump, for example) as a single
output of an x86 decoder, and to go ahead and combine them for processing
in the pipeline: your picture opening. While this may seem insignificant,
it’s one of the most crucial keys to the Core 2’s design prowess.
Fusing two macro-ops into a single micro-op gives the CPU the effect of a fifth
x86 decoder 10% of the time, according to Intel. This too may seem insignificant,
but consider that it’s a 10% throughput advantage that other CPUs just
don’t have, on top of a fourfold improvement in decode width just
over its immediate predecessor.
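
A toy model makes the fusion idea easier to see. The sketch below walks an instruction stream and merges each compare or test that is immediately followed by a conditional jump into one decoder output. The mnemonic sets and the pairing rule are illustrative assumptions on our part, not Intel's exact fusion rules.

```python
# Toy model only: the real hardware has stricter rules about which pairs fuse.
COMPARES = {"cmp", "test"}
CONDITIONAL_JUMPS = {"je", "jne", "jl", "jle", "jg", "jge", "ja", "jb"}


def fuse_macro_ops(instructions):
    """Return decoder outputs, fusing each compare+jump pair into one entry."""
    fused = []
    i = 0
    while i < len(instructions):
        op = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if op in COMPARES and nxt in CONDITIONAL_JUMPS:
            fused.append(f"{op}+{nxt}")  # two instructions, one decoder output
            i += 2
        else:
            fused.append(op)
            i += 1
    return fused


stream = ["mov", "add", "cmp", "jne", "mov", "test", "je", "add"]
ops = fuse_macro_ops(stream)
print(ops)
print(len(stream), "instructions ->", len(ops), "decoder outputs")
```

In this made-up stream, eight instructions leave the decoders as six outputs, so everything downstream has two fewer items to track.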

When a fused op enters the pipeline, it has a two-fold benefit
for processing time: The first is that an entity known as the Out of Order Buffer,
or a section of the CPU that corrects mistakes in the order in
which micro-ops are entering the pipeline, has one less micro-op to reorder
if necessary. The second advantage is lower overhead in a section of an x86
CPU called the backend, or the part of the pipeline that determines
precisely when an op is entering the pipeline, shoving it into the pipeline,
getting it processed, and moving it out of the line. Like the reduction in
the OoO Buffer’s overhead, the backend’s scheduler suddenly has one less instruction
to keep track of because it’s been combined with a brother. The time savings
are enormous.

One of the last big features, aside from some generic improvements
we’ll touch on in a moment, is something called micro-ops fusion. To
put it much more simply than macro-ops fusion, micro-ops fusion allows very
long and complex instructions to be shuffled to other parts of the CPU and
processed in one micro-op. The effect is simple: Tasks which would require
two micro-ops have been designed as one micro-op since the days of the Banias;
the Core 2’s heritage reveals itself! These micro-ops give
the backend and the OoO less of a headache and increase CPU efficiency. This
is a jovian accomplishment, as such a task would previously destroy a CPU’s
potential upper headroom. And while the description we have given here is
a gross simplification of the real effect, it serves its purpose to illustrate
the point.

In the K8, on the other hand, it’s a bit of a tug-of-war.
As we mentioned, the K8 has three x86 decoders known as complex decoders
compared to the Core 2’s three simple and one complex. A complex
decoder handles x86 instructions that produce multiple micro-ops, and a simple
decoder handles x86 instructions that produce a single micro-op. The advantage
of the K8’s three complex decoders is that at any one time, it can
handle three times as many complex x86 instructions as the Core 2
chip, but each complex-decoded instruction must be passed to a sequencer which
leads to computational overhead and delayed processing time. So, in effect,
the K8 can handle more of the complex tasks, but at the
expense of speed. The result is that the K8 is faster in the presence
of extreme amounts of complex instructions, but when the complex instruction
queue is shallow, the Core 2 blows past it by chunking simple instructions
through simple decoders without overhead. Unfortunately, however, Intel’s
implementation of macro-op fusion doesn’t exist in current AMD chips, and
Intel’s implementation of micro-op fusion is faster than AMD’s.
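
The trade-off between the two decoder layouts can be illustrated with a deliberately simple model. The sketch below assumes instructions are handed to decoders strictly in order, that a complex decoder accepts either kind of instruction while a simple decoder accepts only simple ones, and it ignores sequencer overhead and fusion entirely. All of that is our simplification, so treat the cycle counts as illustrative rather than as real chip behaviour.

```python
def decode_cycles(stream, slots):
    """Count cycles to decode `stream` in order with the given decoder slots.

    stream: list of "simple" / "complex" instruction kinds.
    slots:  decoders available each cycle, each "simple" or "complex".
            The first slot must be "complex" so every cycle makes progress.
    """
    cycles = 0
    i = 0
    while i < len(stream):
        cycles += 1
        for slot in slots:
            if i >= len(stream):
                break
            if stream[i] == "complex" and slot == "simple":
                break  # in-order decode: wait for next cycle's complex decoder
            i += 1
    return cycles


CORE2_LIKE = ["complex", "simple", "simple", "simple"]  # one complex, three simple
K8_LIKE = ["complex", "complex", "complex"]             # three complex

for label, stream in (("all simple", ["simple"] * 12),
                      ("all complex", ["complex"] * 12)):
    print(f"{label}: Core 2-like {decode_cycles(stream, CORE2_LIKE)} cycles, "
          f"K8-like {decode_cycles(stream, K8_LIKE)} cycles")
```

Under these assumptions the four-wide layout wins on simple code and the three-complex layout wins on a stream made entirely of complex instructions, which is the tug-of-war described above.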

Last, and certainly least, the Presler comes in dead
last with only one complex decoder, coupled to sequencer overhead, which dumps
into a backend that can’t possibly hope to fill the 31 stage pipeline. The
unfortunate result of this boneheaded CPU design is a CPU that only benefits
from consistent and predictable input, like media-encoding, which is a constant
stream of the same functions over and over for hours. In the grander scheme
of things, the Presler is just choking from a lack of data to keep
itself going. Attach this abysmal bottleneck to a pipeline which is so long
that it often has to abort decoded ops due to computational errors or mispredictions
in what the user was going to do next, and you have a dud of a chip that finally
won in the war against heat more than a year too late for it to matter. Furthermore,
released less than a year prior to the Core 2 microarchitecture,
the Presler will be unceremoniously relegated to extreme budget
applications starting with the early quarters of 2007.
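
The cost of a long pipeline shows up most clearly when a branch is mispredicted and the work in flight has to be thrown away. A rough sketch of that effect follows. It assumes a misprediction costs about one full pipeline's worth of cycles, and the base CPI, branch fraction and misprediction rate are illustrative guesses on our part, not measurements; only the stage counts come from the comparison table earlier in the article.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, pipeline_stages):
    """Average cycles per instruction once misprediction flushes are included."""
    # Assumption: each mispredicted branch wastes roughly `pipeline_stages` cycles.
    return base_cpi + branch_fraction * mispredict_rate * pipeline_stages


# Stage counts from the comparison table; the other inputs are guesses.
PIPELINE_STAGES = {"Core 2": 14, "K8": 12, "Netburst": 31}
BASE_CPI = 1.0
BRANCH_FRACTION = 0.2    # assumed share of instructions that are branches
MISPREDICT_RATE = 0.05   # assumed share of branches guessed wrong

for chip, stages in PIPELINE_STAGES.items():
    cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, stages)
    print(f"{chip}: {stages} stages -> effective CPI {cpi:.2f}")
```

With identical inputs, the 31 stage design pays more than twice the flush penalty of the 14 stage design, which is why predictable workloads like media encoding are where it holds up best.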

As far as the generic improvements are concerned, the Core 2
line brings faster ALUs and FPUs, both of which increase the speed
at which the Core 2-based CPU lines can crunch numbers. Overall,
it was the goal of the engineers at Intel to make each standard component
of an x86 CPU faster than any previous processor. At every turn, cache is
faster, pipelines are faster, decoders are faster, SSE is faster, registers
are faster, instructions are faster. Intel has succeeded in going for more,
more, more without being a laughing-stock as they were with the Pentium
4 line.

Intel of Yesterday and Tomorrow

The Workstation and Enterprise Market

With today’s Core 2 architecture bringing the first
significant jump in x86 power since the K8 came to dethrone Netburst,
Intel has a full-featured product range that spans notebooks, mainstream,
enthusiast, workstation and server ranges. At last we’re going to touch on
what Intel has to offer, and take a look at what segments the products fill,
and where Intel is headed with their products. First up at the top of Intel’s
range is the Xeon CPU, which has existed as a name since the days
of the Pentium II. Often boasting larger cache sizes than its desktop brothers,
the Xeon is positioned to fulfill the needs of SMP systems, be they rackmount,
workstation, or clusters. The Xeon has reasonably shadowed the evolution of
Intel’s desktop line, and today there are four different flavours of Xeons
floating around: Two from the Netburst era and two from the Core 2
era. On the horizon, there are a series of three separate classes of
Xeons designed to fulfill Intel’s stratification goals within the Xeon line,
so let’s take a look:

| Codename | Release Date | Market Name(s) | Cores / SMP | Fabrication Process (nm) | Socket | Frequencies | Voltages | TDP |
|---|---|---|---|---|---|---|---|---|
| Dempsey (Netburst) | May 23, 2006 | Xeon 5030-5080 | 2 / Yes (2 CPU) | 65 | LGA 771 | 2667-3733MHz | | 95/130w |
| Woodcrest (Core 2) | June 26, 2006 | Xeon 5110-5160 | 2 / Yes (2 CPU) | 65 | LGA 771 | 1600-3000MHz | | 65/80w |
| Conroe (Core 2) | June 26, 2006 | Xeon 3040-3070 | 2 / No | 65 | LGA 775 | 1866-2667MHz | | 65w |
| Tulsa (Netburst) | August 27, 2006 | Xeon 7110N/M-7140N/M | 2 / Yes (2-8 CPU) | 65 | Socket 604 | 2600-3500MHz | | 95/150w |
| Kentsfield | Est 4Q06/1Q07 | Xeon X32xx | 4 / No | 65 | LGA 775 | | | 135w (Est.) |
| Clovertown | Est 2H07 | Xeon E/X53xx | 4 / No | 65 | LGA 771 | | | 80w (Est.) |
| Tigerton | Est. 2H07 | Xeon ???? | 4 / Yes (2-8 CPU) | 65 | LGA 771 | | | 80-130w (est.) |
| Harpertown | Est. 2008 | Xeon ??? | 8? / ??? | 45 | | | | |

It is important to know, as we mentioned above, that Intel stratifies
their Xeon line into three separate segments: Workstation, two CPU and multi-CPU.
The workstation line is frequently a clone of the highest-performing desktop
CPU at the time, and we see this is the case with Intel’s Xeon 32xx series.
The 32xx series currently features Conroes with a Xeon name, and
will eventually feature a rebadged version of Kentsfield, Intel’s
upcoming desktop quad-core chip of two Conroe CPUs in one processor
package. The second line of chips that Intel offers is the Xeon series starting
at 51xx, which unlike the Xeon 32xx line, features more cache and higher FSBs
than desktop counterparts, but more importantly, 2P support. Lastly comes
Intel’s grand offering of the Xeon MP 71xx line (Noted with the -N
suffix if it has a 667MHz FSB, -M with an 800MHz FSB), which are Xeon CPUs
capable of working in SMP configurations.

Intel has something of a problem with the Tulsa (Xeon
71xx) and Woodcrest (Xeon 51xx) being on the market at the same time,
and the issue actually lies in what the Woodcrest can’t
do: It’s only dual-processor capable. While the Woodcrest is profoundly
faster than Tulsa, Intel only has the Tulsa to compete with
the Opteron in the very important 4P+ processor segment.
That means until Tigerton ships in 2007, Intel will have no answer
to the dominance of the Opteron in the highly-lucrative four-way or eight-way
CPU market. This situation is also compounded by socket disparity in the Tulsa
and Woodcrest. Companies looking to make an initial investment
in the Tulsa for a 4P or 8P system are stuck with an outdated Socket
604 platform while Intel hustles LGA771 to the Woodcrest-and-beyond
crowd. It’s not as though you can buy Tulsa now and jump to Tigerton
in 2007, which is something you could do if you bought a Woodcrest-powered
2P system, but that wouldn’t fit your CPU requirement. Intel can’t
get Tigerton out fast enough, and it knows it, which is why it’s
up and down the trade-shows with 4P/16C Tigerton boxes blazing away.

Beyond Tigerton, the situation is very hard to determine:
Harpertown is mentioned as the server version of Yorkfield
which we will discuss below, but the information available on Harpertown
suggests that it is just Yorkfield with more cache and a better FSB
as has been the case with virtually every Xeon in the last five years. We’ll
touch more on the disparity in the desktop section. Names also floated in
the last twelve months include Dunnington as a successor to Tigerton
and Gainestown which we were unable to find concrete information
for.

In the desktop end of things, the situation is significantly
more clear, as Intel is vastly less mum about what they intend to do with
upcoming CPUs. The flagship force of Intel’s desktop line comes in the form
of Conroe, to be joined by the quad-core Kentsfield very
soon. The Kentsfield is comprised of two Conroe chips wedged
into one CPU package. From now until approximately 2009, the Conroe and
its closely-related successors will be Intel’s mainstay for most of us. In
2009, however, Intel is expected to produce the first post-Core 2
architecture in the form of Nehalem for their Centrino line, and
the desktop segment will closely shadow the release of Nehalem with
the Westmere, a desktop version.

Mainstream and Enthusiasts

In the desktop, the socket choice is much simpler: LGA 775 until
2009. Gone are the days of Intel shuffling sockets every time a new Pentium
4 revision hit the shelves. LGA 775 is a forward-thinking interface backed
by the power of some of the best chipset engineers in the business, and until
post-Core 2 chips are floating about, LGA775 is a socket that’s here
to stay. So, with that said, let’s take a look at what’s being offered for
the desktop:

| Codename | Release Date | Market Name(s) | Cores | Fabrication Process (nm) | Socket | Frequencies | Voltages | TDP |
|---|---|---|---|---|---|---|---|---|
| Allendale | July 26, 2006 | Core 2 Duo E6300 / E6400 | 2 | 65 | LGA 775 | 1860 / 2133MHz | | 95/130w |
| Conroe | July 26, 2006 | Core 2 Duo E6600 – X6800 | 2 | 65 | LGA 775 | 2400-2930MHz | | 65w |
| Kentsfield | Est 4Q06/1Q07 | Core 2 Quad Extreme QX6700/Q6600 | 4 | 65 | LGA 775 | 2667/2400MHz | | 135w (Est.) |
| Wolfdale | Est. 2H07 | Core 2 Duo ???? | 2 | 45 | LGA 775 | | | |
| Yorkfield | Est. 2H07 | | 4 | 45 | LGA 775 | | | |
| Westmere | Est 2H08/2009 | | | 32 | | | | |

Now, we originally talked about the Yorkfield in the
Xeon section, and we’d like to come back to it at last. The information on
the Yorkfield is very contradictory, and the information we have
placed in the table is what’s considered the safest expectation of the core.
Here is what we do know about Yorkfield: It’s going to be
the successor to Kentsfield, it will be a 45nm design, it will be LGA775 and
it will have at least 4 cores. Where the conflict comes into play is with Yorkfield’s
core-count, which has been said to be a minimum of four, possibly
eight, or a maximum of thirty-two. It is the desktop version of the
Xeon’s Harpertown chip, however we don’t expect either the Harpertown
or the Yorkfield to be more than four cores; we suspect these chips
will be positioned as die-shrunk derivatives of their predecessors.

Perhaps a more interesting processor is the Westmere,
the first desktop processor that will feature architecture not explicitly
based on existing Core 2 silicon. For over two years, Intel has had
the Nehalem on the roadmap for mid-2009 in the mobile segment. The
Nehalem is a 45nm mobile part which we’ll discuss later, but suffice
it to say what was known for years as the Nehalem-C for the desktop
version is now known as Westmere. Westmere is expected to
be introduced as the first desktop chip at 32nm as a die-shrunk derivative
of the Nehalem, possibly featuring eight cores in an unknown socket.
The Westmere is also one of the first desktop chips expected to use
Intel’s CSI, or Common System Interface, a technology that
will first be debuted in the Tukwila-class Itaniums in mid-2008 and
trickle down from there. The CSI is Intel’s replacement for the Front Side
Bus, allowing cores to communicate with one another directly without moving
communication out to the northbridge. Furthermore, CSI may also include the
introduction of an on-die memory controller which will seal Intel’s transition
to a Hypertransport-like bus architecture.

Our prediction is that Yorkfield will either be a 45nm
die shrink of the Kentsfield (More likely), or it’ll rear
its head as a “Native” quad-core component to battle with AMD’s
K8L platform to be released at roughly the same time. In the approximately
eighteen months between Yorkfield and Westmere, we can envision
an unroadmapped octal-core design featuring two Yorkfields in a single
package similar to the Kentsfield harboring two Conroes
today. This seems to follow Intel’s track record of condensing dies in a package,
such as the Tulsa being supplanted by the Woodcrest. What
happens after Westmere is anyone’s guess.

Continuing to float the main-stream in the form of dual-core
throughout 2007, Intel will die-shrink Conroe and call it Wolfdale.
Allendale, we presume, will be phased out in favour of the old 65nm
Conroes, while Wolfdale will continue on as the flagship
CPU, with Yorkfield succeeding Wolfdale by just a month
or two for the more-money-than-sense enthusiast crowd. Topping out the range
of mainstream CPUs, the Wolfdale-class dual core chips will be joined
by Kentsfield CPUs not bearing the Extreme Edition flag,
thus bringing the price down. Bringing up the rear, CPUs we have not included
on the roadmap include Wolfdale-L and Conroe-L, with the
“L” denoting low-cost. Intel plans to push single-core 65nm Conroes
under the Pentium and Celeron names as budget options, and when the die is
shrunk, the same will continue to occur with Wolfdale-L.

Out and About: The Mobile Market

Lastly, we have the mobile segment, which has played a curious
role in the development of the Core 2 series. Unlike AMD which produces
its mobile lines as an afterthought to the desktop and enterprise segments,
Intel spends copious amounts of time specifically-engineering mobile CPUs
that aren’t cut from the same cloth as its bigger brothers. As we mentioned
in the start of the article, the Banias mobile CPU was a complete
180 from the direction Intel was headed with their Pentium 4 line, but the
little CPU that could became the inspiration for today’s Core 2 components,
and represented the idea that Intel wasn’t just a stale CPU company clinging
to doping the uneducated masses. We see further elements of mobile-creates-desktop
in the Penryn, itself a die-shrunk 45nm mobile-specific version of
the Conroe. Clearly illustrating that Intel has a special care for
their mobile line, it will test 45nm on Penryn and then use that
practice to make the Nehalem.

The Nehalem itself will be the first post-Core
series architecture, drawing only the 45nm fabrication technique from its
predecessors. Not much is actually known about the Nehalem, but it
has been mentioned in association with Gainestown, Bloomfield, Gilo and
Beckton. Furthermore, the Nehalem will first be seen on
the Centrino platform and then be ported to the desktop as a 32nm product
as the Westmere in 2009. Lastly, to round out Intel’s offerings, a brand new
architecture out of Intel’s 32nm fabs in 2010 will be known as the Gesher.
Nothing is known about it other than its size.

Respecting Moore’s Law: Intel’s New Engineering Cycle

With the release of the Core 2 Duo and its counterparts,
Intel has adopted what they call the “Tick-Tock” approach to CPU
development. What this means is that every two years, Intel will have a die-shrink
for an existing processor design within the first year, and a new generation
of architecture by the end of the second. We can see this in the diagram below
as the Intel Core microarchitecture in the form of Yonah, as
a shrink from the 90nm Dothan. By 2006, at the end of the cycle starting
in 2004, Intel debuted the new Core 2 architecture. By the end of
2007, Nehalem is born for laptops, and shrunk to 32nm for the desktop
version a year later. In this way, we can begin to see that Intel ticks
with the mobile platform, and tocks with a shrunk desktop version.

[Figure: Intel’s tick-tock method]

Lastly, we have a very rough roadmap gleaned from the copious
amounts of research conducted for this guide:

[Figure: Intel roadmap]

As you can see, there are obvious gaps in what is to succeed
the Kentsfield and Conroe chips in the low end workstation
space; the gap indicates that Yorkfield may be moved
up, which is the most likely option. We also see what can either be seen as
a dual-core gap after Wolfdale in the middle of 2008, or the final
elimination of dual-core CPUs. Judging by the year, the latter of the two
options may be more likely. The last sore spot in the roadmap comes in the
form of Clovertown’s exceptionally long life. We have a hard time
believing that something won’t come to usurp its role in the Xeon DP space
after roughly the middle of 2008. We feel as though a server version of Westmere
may be very likely.

All in all, Intel has done nothing short of a remarkable job
with the Core 2 line, and we appreciate Intel’s acknowledgement that
the Netburst era was a dark and sour one for the chip giant. From
this point forward, Intel demonstrates a steady course of die shrinks and
innovation with their tick-tock engineering, and we can similarly
appreciate the environment of cross-segment inspiration that Intel’s brass
has established. While Intel may be hard-pressed to get the enterprise situation
back under control, particularly with the utter lack of SMP configurations
under the Core 2 design to combat the Opteron, they can be lauded
for their return to the spotlight for the desktop and mobile segments. We’ll
keep a close eye on the market for the next few years and update you on changes
to the roadmap, including delays, fulfilled expectations, and surprise cores.
