0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
a 1010
b 1011
c 1100
d 1101
e 1110
f 1111
address of 1st element of SP = FFFEh
element size = W = 2B
indefinite filling of Stack = Stack overflow (overwritten memory locations of program, but not of the OS)
Heap area = Dynamic Memory = optional zone, where during runtime, the programmer, through instructions,
temporarily allocates some memory for variables whose dimension can only be verified
during execution (eg the size of an input string). Its size is not predetermined and
can also be allocated and deallocated several times during runtime.
The management of the area of heap is obtained through code area instructions.
Stack area = memory zone handled automatically by Compilers. By programmer instructions
the compilers manage the area of stack transparently to the programmer,
allocating and deallocating the local variables and parameters passed to the procedures.
Only in the Programming in Assembly it's possible directly handle the stack area
with appropriate instructions.
MC includes (CPU=MP) + ((cache level 2 (greater and faster than cache level 1 included in MP))
HW <--( BIOS = SW normally contained in ROM or other non-volatile memory) --> SW
Clock - clock frequency - is the number of switches 0 and 1 that circuits in MP, normally
an instruction needs of more clock cycles.
Quartz Oscillator which is inside the cpu=mp and can be controlled via BIOS
CMOS complementary metal-oxide semiconductor, microchips that maintains hardware and
configuration settings by the power of a buffer battery
cache level 1 (I-cache = instruction cache + D-cahe = data cache) is inside MP
cache level 2 , greater and faster than L1, is in MC but extern and near at MP
BIU bus interface unit , input of info in processor, duplicates infos and send to
cache L1 (I-cache and D-cache) and to Cache L2
Fetch decode unit , fetches instructions from I-Cache ,
BTB branch target Buffer, compares every instruction with a record of another buffer
to verify if this instruction already is already used
AX BX CX DX to store. Arithmetical registers
SP BP DI SI to access to memory
to address memory spaces
memory = Segments
address = Segment:Offset
the segment = 16 bit + 0000 at right that multiply by 16)
eg : [CS] = 123Ah , [IP] = 341Bh
SEG:OFF = 123A0:341B = 123a0+341b=157BBh = physical address
from the top to bottom:
effective address
segment address
physical address
hexadecimal addresses, from the top to bottom:
first segment
2nd segment
3rd segment
FLAGS are indicator bits (grouped into a status log named PSW register)
normally read by conditioned jump instructions
OF: overflow indicator. It is setted at 1 when the result of one
addition or subtraction (with sign ) causes overflow
SF: sign indicator. It is setted at 1 when the result of a logic-arithmetic
operation is a negative number (= MSB of the result)
ZF: zero indicator. It is setted at 1 when the result of a logic-arithmetic
operation is = zero
CF: setted at 1 when a logic-arithmetic operation gives a rest
(indicates overflow in case of numbers
without sign)
PIC
INTEL
16-bit Processors and Segmentation (1978)
The IA-32 architecture family was preceded by 16-bit processors, the 8086 and 8088.
The 8086 has 16-bit registers
and a 16-bit external data bus, with 20-bit addressing giving a 1-MByte address space.
The 8088 is similar to
the 8086 except it has an 8-bit external data bus.
The 8086/8088 introduced segmentation to the IA-32 architecture. With segmentation,
a 16-bit segment register
contains a pointer to a memory segment of up to 64 KBytes. Using four segment
registers at a time, 8086/8088
processors are able to address up to 256 KBytes without switching between segments.
The 20-bit addresses that
can be formed using a segment register and an additional 16-bit pointer provide
a total address range of 1 MByte.
The Intel 286 Processor (1982)
The Intel 286 processor introduced protected mode operation into the IA-32
architecture. Protected mode uses the
segment register content as selectors or pointers into descriptor tables.
Descriptors provide 24-bit base addresses
with a physical memory size of up to 16 MBytes, support for virtual memory
management on a segment swapping
basis, and a number of protection mechanisms. These mechanisms include:
• Segment limit checking
• Read-only and execute-only segment options
• Four privilege levels
The Intel386 Processor (1985)
The Intel386 processor was the first 32-bit processor in the IA-32 architecture family.
It introduced 32-bit registers
for use both to hold operands and for addressing. The lower half of each 32-bit Intel386
register retains the properties
of the 16-bit registers of earlier generations, permitting backward compatibility.
The processor also provides
a virtual-8086 mode that allows for even greater efficiency when executing programs
created for 8086/8088
processors.
In addition, the Intel386 processor has support for:
• A 32-bit address bus that supports up to 4-GBytes of physical memory
• A segmented-memory model and a flat memory model
• Paging, with a fixed 4-KByte page size providing a method for virtual memory management
• Support for parallel stages
The Intel486 Processor (1989)
The Intel486 processor added more parallel execution capability by expanding the
Intel386 processor’s instruction
decode and execution units into five pipelined stages. Each stage operates in parallel
with the others on up to
five instructions in different stages of execution.
In addition, the processor added:
• An 8-KByte on-chip first-level cache that increased the percent of instructions
that could execute at the scalar
rate of one per clock
• An integrated x87 FPU
• Power saving and system management capabilities
The Intel Pentium Processor (1993)
The introduction of the Intel Pentium processor added a second execution pipeline to achieve superscalar performance
(two pipelines, known as u and v, together can execute two instructions per clock). The on-chip first-level
cache doubled, with 8 KBytes devoted to code and another 8 KBytes devoted to data. The data cache uses the MESI
protocol to support more efficient write-back cache in addition to the write-through cache previously used by the
Intel486 processor. Branch prediction with an on-chip branch table was added to increase performance in looping
constructs.
In addition, the processor added:
• Extensions to make the virtual-8086 mode more efficient and allow for 4-MByte as well as 4-KByte pages
• Internal data paths of 128 and 256 bits add speed to internal data transfers
• Burstable external data bus was increased to 64 bits
• An APIC to support systems with multiple processors
• A dual processor mode to support glueless two processor systems
A subsequent stepping of the Pentium family introduced Intel MMX technology (the Pentium Processor with MMX
technology). Intel MMX technology uses the single-instruction, multiple-data (SIMD) execution model to perform
parallel computations on packed integer data contained in 64-bit registers.
See Section 2.2.7, “SIMD Instructions.”
2.1.6 The P6 Family of Processors (1995-1999)
The P6 family of processors was based on a superscalar microarchitecture that set new performance standards; see
also Section 2.2.1, “P6 Family Microarchitecture.” One of the goals in the design of the P6 family microarchitecture
was to exceed the performance of the Pentium processor significantly while using the same 0.6-micrometer, fourlayer,
metal BICMOS manufacturing process. Members of this family include the following:
• The Intel Pentium Pro processor is three-way superscalar. Using parallel processing techniques, the
processor is able on average to decode, dispatch, and complete execution of (retire) three instructions per
clock cycle. The Pentium Pro introduced the dynamic execution (micro-data flow analysis, out-of-order
execution, superior branch prediction, and speculative execution) in a superscalar implementation. The
processor was further enhanced by its caches. It has the same two on-chip 8-KByte 1st-Level caches as the
Pentium processor and an additional 256-KByte Level 2 cache in the same package as the processor.
• The Intel Pentium II processor added Intel MMX technology to the P6 family processors along with new
packaging and several hardware enhancements. The processor core is packaged in the single edge contact
cartridge (SECC). The Level l data and instruction caches were enlarged to 16 KBytes each, and Level 2 cache
sizes of 256 KBytes, 512 KBytes, and 1 MByte are supported. A half-frequency backside bus connects the Level
2 cache to the processor. Multiple low-power states such as AutoHALT, Stop-Grant, Sleep, and Deep Sleep are
supported to conserve power when idling.
• The Pentium II Xeon processor combined the premium characteristics of previous generations of Intel
processors. This includes: 4-way, 8-way (and up) scalability and a 2 MByte 2nd-Level cache running on a fullfrequency
backside bus.
INTEL 64 AND IA-32 ARCHITECTURES
• The Intel Celeron processor family focused on the value PC market segment. Its introduction offers an
integrated 128 KBytes of Level 2 cache and a plastic pin grid array (P.P.G.A.) form factor to lower system design
cost.
• The Intel Pentium III processor introduced the Streaming SIMD Extensions (SSE) to the IA-32 architecture.
SSE extensions expand the SIMD execution model introduced with the Intel MMX technology by providing a
new set of 128-bit registers and the ability to perform SIMD operations on packed single-precision floatingpoint
values. See Section 2.2.7, “SIMD Instructions.”
• The Pentium III Xeon processor extended the performance levels of the IA-32 processors with the
enhancement of a full-speed, on-die, and Advanced Transfer Cache.
The Intel Pentium 4 Processor Family (2000-2006)
The Intel Pentium 4 processor family is based on Intel NetBurst microarchitecture;
The Intel Pentium 4 processor introduced Streaming SIMD Extensions 2 (SSE2);
The Intel Pentium 4 processor 3.40 GHz, supporting Hyper-Threading Technology introduced Streaming
SIMD Extensions 3 (SSE3); see Section 2.2.7, “SIMD Instructions.”
Intel 64 architecture was introduced in the Intel Pentium 4 Processor Extreme Edition supporting Hyper-Threading
Technology and in the Intel Pentium 4 Processor 6xx and 5xx sequences.
Intel Virtualization Technology (Intel® VT) was introduced in the Intel Pentium 4 processor 672 and 662.
2.1.8 The Intel® Xeon® Processor (2001- 2007)
Intel Xeon processors (with exception for dual-core Intel Xeon processor LV, Intel Xeon processor 5100 series) are
based on the Intel NetBurst microarchitecture; see Section 2.2.2, “Intel NetBurst® Microarchitecture.” As a family,
this group of IA-32 processors (more recently Intel 64 processors) is designed for use in multi-processor server
systems and high-performance workstations.
The Intel Xeon processor MP introduced support for Intel® Hyper-Threading Technology; see Section 2.2.8, “Intel®
Hyper-Threading Technology.”
The 64-bit Intel Xeon processor 3.60 GHz (with an 800 MHz System Bus) was used to introduce Intel 64 architecture.
The Dual-Core Intel Xeon processor includes dual core technology. The Intel Xeon processor 70xx series
includes Intel Virtualization Technology.
The Intel Xeon processor 5100 series introduces power-efficient, high performance Intel Core microarchitecture.
This processor is based on Intel 64 architecture; it includes Intel Virtualization Technology and dual-core technology.
The Intel Xeon processor 3000 series are also based on Intel Core microarchitecture. The Intel Xeon
processor 5300 series introduces four processor cores in a physical package, they are also based on Intel Core
microarchitecture.
The Intel Pentium M Processor (2003-2006)
The Intel Pentium M processor family is a high performance, low power mobile processor family with microarchitectural
enhancements over previous generations of IA-32 Intel mobile processors. This family is designed for
extending battery life and seamless integration with platform innovations that enable new usage models (such as
extended mobility, ultra thin form-factors, and integrated wireless networking).
Its enhanced microarchitecture includes:
• Support for Intel Architecture with Dynamic Execution
• A high performance, low-power core manufactured using Intel’s advanced process technology with copper
interconnect
• On-die, primary 32-KByte instruction cache and 32-KByte write-back data cache
• On-die, second-level cache (up to 2 MByte) with Advanced Transfer Cache Architecture
INTEL 64 AND IA-32 ARCHITECTURES
• Advanced Branch Prediction and Data Prefetch Logic
• Support for MMX technology, Streaming SIMD instructions, and the SSE2 instruction set
• A 400 or 533 MHz, Source-Synchronous Processor System Bus
• Advanced power management using Enhanced Intel SpeedStep technology
2.1.10 The Intel Pentium Processor Extreme Edition (2005)
The Intel Pentium processor Extreme Edition introduced dual-core technology. This technology provides advanced
hardware multi-threading support. The processor is based on Intel NetBurst microarchitecture and supports SSE,
SSE2, SSE3, Hyper-Threading Technology, and Intel 64 architecture.
The Intel Core Duo and Intel Core Solo Processors (2006-2007)
The Intel Core Duo processor offers power-efficient, dual-core performance with a low-power design that extends
battery life. This family and the single-core Intel Core Solo processor offer microarchitectural enhancements over
Pentium M processor family.
Its enhanced microarchitecture includes:
• Intel® Smart Cache which allows for efficient data sharing between two processor cores
• Improved decoding and SIMD execution
• Intel® Dynamic Power Coordination and Enhanced Intel® Deeper Sleep to reduce power consumption
• Intel® Advanced Thermal Manager which features digital thermal sensor interfaces
• Support for power-optimized 667 MHz bus
The dual-core Intel Xeon processor LV is based on the same microarchitecture as Intel Core Duo processor, and
supports IA-32 architecture.
The Intel Xeon Processor 5100, 5300 Series and Intel Core 2 Processor Family
(2006)
The Intel Xeon processor 3000, 3200, 5100, 5300, and 7300 series, Intel Pentium Dual-Core, Intel Core 2 Extreme,
Intel Core 2 Quad processors, and Intel Core 2 Duo processor family support Intel 64 architecture; they are based
on the high-performance, power-efficient Intel® Core microarchitecture built on 65 nm process technology. The
Intel Core microarchitecture includes the following innovative features:
• Intel Wide Dynamic Execution to increase performance and execution throughput
• Intel Intelligent Power Capability to reduce power consumption
• Intel Advanced Smart Cache which allows for efficient data sharing between two processor cores
• Intel Smart Memory Access to increase data bandwidth and hide latency of memory accesses
• Intel Advanced Digital Media Boost which improves application performance using multiple generations of
Streaming SIMD extensions
The Intel Xeon processor 5300 series, Intel Core 2 Extreme processor QX6800 series, and Intel Core 2 Quad
processors support Intel quad-core technology.
INTEL 64 AND IA-32 ARCHITECTURES
The Intel Xeon Processor 5200, 5400, 7400 Series and Intel® Core™2 Processor
Family (2007)
The Intel Xeon processor 5200, 5400, and 7400 series, Intel Core 2 Quad processor Q9000 Series, Intel Core 2 Duo
processor E8000 series support Intel 64 architecture; they are based on the Enhanced Intel® Core microarchitecture
using 45 nm process technology. The Enhanced Intel Core microarchitecture provides the following improved
features:
• A radix-16 divider, faster OS primitives further increases the performance of Intel® Wide Dynamic Execution.
• Improves Intel® Advanced Smart Cache with Up to 50% larger level-two cache and up to 50% increase in wayset
associativity.
• A 128-bit shuffler engine significantly improves the performance of Intel® Advanced Digital Media Boost and
SSE4.
Intel Xeon processor 5400 series and Intel Core 2 Quad processor Q9000 Series support Intel quad-core technology.
Intel Xeon processor 7400 series offers up to six processor cores and an L3 cache up to 16 MBytes.
2.1.14 The Intel® Atom™ Processor Family (2008)
The first generation of Intel® AtomTM processors are built on 45 nm process technology. They are based on a new
microarchitecture, Intel® AtomTM microarchitecture, which is optimized for ultra low power devices. The Intel®
AtomTM microarchitecture features two in-order execution pipelines that minimize power consumption, increase
battery life, and enable ultra-small form factors. The initial Intel Atom Processor family and subsequent generations including
Intel Atom processor D2000, N2000, E2000, Z2000, C1000 series provide the following features:
• Enhanced Intel® SpeedStep® Technology
• Intel® Hyper-Threading Technology
• Deep Power Down Technology with Dynamic Cache Sizing
• Support for instruction set extensions up to and including Supplemental Streaming SIMD Extensions 3
(SSSE3).
• Support for Intel® Virtualization Technology
• Support for Intel® 64 Architecture (excluding Intel Atom processor Z5xx Series)
2.1.15 The Intel® Atom™ Processor Family Based on Silvermont Microarchitecture (2013)
Intel Atom Processor C2xxx, E3xxx, S1xxx series are based on the Silvermont microarchitecture. Processors based on the Silvermont
microarchitecture supports instruction set extensions up to and including SSE4.2, AESNI, and PCLMULQDQ.
2.1.16 The Intel® Core™i7 Processor Family (2008)
The Intel Core i7 processor 900 series support Intel 64 architecture; they are based on Intel® microarchitecture
code name Nehalem using 45 nm process technology. The Intel Core i7 processor and Intel Xeon processor 5500
series include the following innovative features:
• Intel® Turbo Boost Technology converts thermal headroom into higher performance.
• Intel® HyperThreading Technology in conjunction with Quadcore to provide four cores and eight threads.
• Dedicated power control unit to reduce active and idle power consumption.
• Integrated memory controller on the processor supporting three channel of DDR3 memory.
• 8 MB inclusive Intel® Smart Cache.
• Intel® QuickPath interconnect (QPI) providing point-to-point link to chipset.
• Support for SSE4.2 and SSE4.1 instruction sets.
• Second generation Intel Virtualization Technology.
INTEL 64 AND IA-32 ARCHITECTURES
2.1.17 The Intel Xeon Processor 7500 Series (2010)
The Intel Xeon processor 7500 and 6500 series are based on Intel microarchitecture code name Nehalem using 45
nm process technology. They support the same features described in Section 2.1.16, plus the following innovative
features:
• Up to eight cores per physical processor package.
• Up to 24 MB inclusive Intel® Smart Cache.
• Provides Intel® Scalable Memory Interconnect (Intel® SMI) channels with Intel® 7500 Scalable Memory Buffer
to connect to system memory.
• Advanced RAS supporting software recoverable machine check architecture.
2.1.18 2010 Intel® Core™ Processor Family (2010)
2010 Intel Core processor family spans Intel Core i7, i5 and i3 processors. They are based on Intel® microarchitecture
code name Westmere using 32 nm process technology. The innovative features can include:
• Deliver smart performance using Intel Hyper-Threading Technology plus Intel Turbo Boost Technology.
• Enhanced Intel Smart Cache and integrated memory controller.
• Intelligent power gating.
• Repartitioned platform with on-die integration of 45 nm integrated graphics.
• Range of instruction set support up to AESNI, PCLMULQDQ, SSE4.2 and SSE4.1.
2.1.19 The Intel® Xeon® Processor 5600 Series (2010)
The Intel Xeon processor 5600 series are based on Intel microarchitecture code name Westmere using 32 nm
process technology. They support the same features described in Section 2.1.16, plus the following innovative
features:
• Up to six cores per physical processor package.
• Up to 12 MB enhanced Intel® Smart Cache.
• Support for AESNI, PCLMULQDQ, SSE4.2 and SSE4.1 instruction sets.
• Flexible Intel Virtualization Technologies across processor and I/O.
2.1.20 The Second Generation Intel® Core™ Processor Family (2011)
The Second Generation Intel Core processor family spans Intel Core i7, i5 and i3 processors based on the Sandy
Bridge microarchitecture. They are built from 32 nm process technology and have innovative features including:
• Intel Turbo Boost Technology for Intel Core i5 and i7 processors
• Intel Hyper-Threading Technology.
• Enhanced Intel Smart Cache and integrated memory controller.
• Processor graphics and built-in visual features like Intel® Quick Sync Video, Intel® InsiderTM etc.
• Range of instruction set support up to AVX, AESNI, PCLMULQDQ, SSE4.2 and SSE4.1.
Intel Xeon processor E3-1200 product family is also based on the Sandy Bridge microarchitecture.
Intel Xeon processor E5-2400/1400 product families are based on the Sandy Bridge-EP microarchitecture.
Intel Xeon processor E5-4600/2600/1600 product families are based on the Sandy Bridge-EP microarchitecture
and provide support for multiple sockets.
INTEL 64 AND IA-32 ARCHITECTURES
The Third Generation Intel Core Processor Family (2012)
The Third Generation Intel Core processor family spans Intel Core i7, i5 and i3 processors based on the Ivy Bridge
microarchitecture. The Intel Xeon processor E7-8800/4800/2800 v2 product families and Intel Xeon processor E3-
1200 v2 product family are also based on the Ivy Bridge microarchitecture.
The Intel Xeon processor E5-2400/1400 v2 product families are based on the Ivy Bridge-EP microarchitecture.
The Intel Xeon processor E5-4600/2600/1600 v2 product families are based on the Ivy Bridge-EP microarchitecture
and provide support for multiple sockets.
The Fourth Generation Intel Core Processor Family (2013)
The Fourth Generation Intel Core processor family spans Intel Core i7, i5 and i3 processors based on the Haswell
microarchitecture. Intel Xeon processor E3-1200 v3 product family is also based on the Haswell microarchitecture.
MORE ON SPECIFIC ADVANCES
The following sections provide more information on major innovations.
P6 Family Microarchitecture
The Pentium Pro processor introduced a new microarchitecture commonly referred to as P6 processor microarchitecture.
The P6 processor microarchitecture was later enhanced with an on-die, Level 2 cache, called Advanced
Transfer Cache.
The microarchitecture is a three-way superscalar, pipelined architecture. Three-way superscalar means that by
using parallel processing techniques, the processor is able on average to decode, dispatch, and complete execution
of (retire) three instructions per clock cycle. To handle this level of instruction throughput, the P6 processor family
uses a decoupled, 12-stage superpipeline that supports out-of-order instruction execution.
next figure shows a conceptual view of the P6 processor microarchitecture pipeline with the Advanced Transfer
Cache enhancement.
To ensure a steady supply of instructions and data for the instruction execution pipeline, the P6 processor microarchitecture
incorporates two cache levels. The Level 1 cache provides an 8-KByte instruction cache and an 8-KByte
data cache, both closely coupled to the pipeline. The Level 2 cache provides 256-KByte, 512-KByte, or 1-MByte
static RAM that is coupled to the core processor through a full clock-speed 64-bit cache bus.
The centerpiece of the P6 processor microarchitecture is an out-of-order execution mechanism called dynamic
execution. Dynamic execution incorporates three data-processing concepts:
• Deep branch prediction allows the processor to decode instructions beyond branches to keep the instruction
pipeline full. The P6 processor family implements highly optimized branch prediction algorithms to predict the
direction of the instruction.
• Dynamic data flow analysis requires real-time analysis of the flow of data through the processor to
determine dependencies and to detect opportunities for out-of-order instruction execution. The out-of-order
execution core can monitor many instructions and execute these instructions in the order that best optimizes
the use of the processor’s multiple execution units, while maintaining the data integrity.
• Speculative execution refers to the processor’s ability to execute instructions that lie beyond a conditional
branch that has not yet been resolved, and ultimately to commit the results in the order of the original
instruction stream. To make speculative execution possible, the P6 processor microarchitecture decouples the
dispatch and execution of instructions from the commitment of results. The processor’s out-of-order execution
core uses data-flow analysis to execute all available instructions in the instruction pool and store the results in
temporary registers. The retirement unit then linearly searches the instruction pool for completed instructions
that no longer have data dependencies with other instructions or unresolved branch predictions. When
completed instructions are found, the retirement unit commits the results of these instructions to memory
and/or the IA-32 registers (the processor’s eight general-purpose registers and eight x87 FPU data registers)
in the order they were originally issued and retires the instructions from the instruction pool.
Home Page