READ THIS TO FIND OUT ABOUT:
- Multi-core processors
- High-performance embedded design
- High-speed cache-memory architectures
While multi-core processors from Intel and AMD are found in
embedded applications, these devices are optimised for desktop or
server computing and in many cases consume too much power for
industrial embedded designs. Richard Parker, FAE, Future Electronics
(UK), explains how Freescale’s approach differs.
Intel and AMD continue to evolve the x86 processor architecture to
support multi-core computing applications. Freescale Semiconductor, on
the other hand, is pursuing a different architectural track, seeking to
optimise multi-core processing for embedded systems.
From the most fundamental design decisions onwards, Freescale’s focus
on multi-core processor development has been driven by the embedded
world’s requirement for low power and high data throughput.
Before the introduction of multi-core processors, Freescale’s most
powerful devices were its single-core PowerPC host processors. Their
core, the e600, was then used in the dual-core MPC8641D processors.
A number of devices in the company’s multi-core family, however, are
to be based on the e500 core used in the PowerQUICC™ III range of
processors. Although a single e500 core offers lower performance than
the e600 for a number of reasons, e500-based devices offer specific
advantages: they can operate at low power while delivering the higher
levels of data throughput that multi-core devices make possible.
These advantages stem from three aspects of the processors’
development: first, the nature of the e500mc core on which Freescale is
building its multi-core processors; second, the rich set of peripherals
the processors support; and third, hardware assistance for functions
commonly required in high-end embedded applications.

Fig. 1: Functional block diagram of Freescale’s multi-core platform.
Architecture of the e500mc core
Freescale’s multi-core processors are based on the e500 cores used in the
PowerQUICC III family. In the dual-core MPC8572 family, these cores run at
up to 1.5GHz.
In multi-core processors with four or more cores, these core
architectures have been re-configured to create the new e500mc
platform, where mc denotes multi-core. Two important characteristics
differentiate the e500mc from the e500 core: process technology and
cache topology (see Figure 1).
Freescale has collaborated with IBM to create its e500mc devices using
an advanced 45nm process geometry, enabling improvements in
performance, power consumption and integration. In addition, a modified
cache topology delivers further performance gains. The e500mc has a
multi-level cache-coherent hierarchy, which allows each of the cores to
have its own L1 and L2 cache, while also providing a large, shared L3
cache. Also, attaching the L2 cache to the cores in a back-side cache
implementation dramatically improves performance in certain processing
modes.
At this point, it is worth clarifying the two types of cache memory.
A front-side cache is placed between the main external memory bus
and the core. This means that it is easily filled and copied back to
main memory, but it runs only at the speed of the main memory. It
also means that peripherals attached to the bus can access this
memory directly. A back-side cache, by contrast, is placed on a
separate bus to which the core has direct access. This results in
better code efficiency and higher speed, since the back-side cache can
run at the same speed as the core rather than being limited to the
main memory’s clock speed.
Operating with a back-side cache topology, Freescale’s multi-core
devices allow the cache to match the speed of the CPU. This typically
results in latency improvements of more than 50% compared to
shared-bus or shared-cache architectures.
This cache topology offers further advantages. The caches are
programmable, so the partitioning of instructions and data can be
optimised for each application. The back-side implementation also
reduces traffic on the main bus, which in turn reduces latency and
improves performance.
Of course, multi-core architectures that adopt a front-side cache do
so for sound reasons. One of these is that a cache shared on the bus
supports shared data structures and techniques such as data stashing,
in which data from an on-board peripheral, such as an Ethernet port,
is written directly into the cache before the core has actually
requested it.
If each core in the Freescale architecture only had its own L1 and L2
cache, this would not be possible. But the new, shared L3 cache in the
e500mc architecture enables data stashing. The Freescale architecture
therefore offers the best of both worlds: reduced latency, because each
core has its own cache, and more efficient use of processing resources
through shared data structures in the L3 cache.

Fig. 2: Block diagram of the QorIQ™ P4080.
A Freescale innovation: CoreNet
Freescale’s new multi-core architecture delivers a further important
enhancement. In a complex embedded-processor architecture, it is
essential that different elements of the device communicate effectively
with each other and with the outside world. The traditional approach
has been to use a shared bus, to which all internal devices are attached.
The problem with this approach is the risk of contention on the bus.
Such contention slows down the effective operation of the processor.
Freescale has developed a coherent fabric approach, called
CoreNet™, to solve this problem (see Figure 2). CoreNet can easily
accommodate up to 32 cores as well as heterogeneous core
implementations.
It can be described as a highly concurrent, fully cache-coherent,
multi-port fabric. Point-to-point connectivity between the blocks,
combined with a flexible protocol, provides a pipelined interconnection
between CPUs, platform caches, memory controllers and I/O accelerators.
This lowers average memory latency, because CoreNet eliminates the
address retries normally triggered when a CPU snoops a shared bus. In
addition, the switching topology provides multiple address paths,
giving the high address bandwidth that is a key requirement of coherent
multi-core processor systems.
System-on-chip or processor?
Freescale, then, has introduced architectural innovations that support
high data throughput while keeping a lid on power consumption. But
embedded developers also need highly integrated devices.
Processors designed for the desktop computer need to accommodate only
a few peripherals, of a limited range of types. This is not true in the
embedded world, where it makes more sense to consider the processor as a
System-on-Chip (SoC). In general these devices will include a memory
controller, serial ports and bus interfaces. With more peripheral and
inter-process communication options supported on-chip, the design team
can increase integration, shorten the design process and reduce
board real-estate.
Freescale has a history of developing feature-rich, SoC-type
processors optimised for the embedded world. A good example is the
MPC8572 dual-core processor, which uses e500 cores. Its on-chip
features include:
- DDR memory controller
- Four Gigabit Ethernet controllers
- x4 Serial RapidIO interface
- PCI Express interface
- Dual integrated DMA controllers
- Dual I2C interfaces and dual UARTs
- General-purpose I/O
Users should expect to see this level of integration extended in
Freescale’s multi-core devices, as the 45nm process used for the
e500mc frees die area for additional functions.
On-demand applications support embedded requirements
The optimisation of Freescale’s multi-core devices for embedded designs
is also evident in additional hardware functions that assist with tasks
commonly required in embedded applications. The most important of these
are in security and networking.
For instance, Freescale’s multi-core devices will share with the
PowerQUICC III family a Security Engine Controller (SEC) block. This block
off-loads encryption functions such as AES, DES and 3DES. This means
that developers can deploy complex and very secure encryption
algorithms without placing a large processing overhead on the cores.
Other hardware-assist functions supported by Freescale’s multi-core
devices will include:
- Table Look Up (TLU) – a hash-table-based lookup function that can be
used to match IP addresses. This can accelerate networking performance
and reduce loading on the processor cores.
- Pattern Matching Engine (PME) – used for deep packet inspection
and full content processing. It is capable of matching 16,000 patterns
of up to 128 bytes in length, including patterns that break across
packets. An additional feature of the PME is a deflate function, which
can be used to off-load the decompression of compressed files.
- Data-path acceleration – this function manages packet routing,
security, quality of service and deep packet inspection, freeing the
cores to focus on value-added services and application processing.
In addition, the multi-core platform provides hardware support for
allocating memory buffers to resources.
Finally, robust hardware-partitioning mechanisms minimise
instances of resource contention, aiding code migration and
verification. The multi-core platform provides hardware mechanisms
to ensure the cores can only access the resources they are authorised
to access. An I/O MMU is used to enforce access controls on bus-mastering
peripherals.
Conclusion
The evaluation that an embedded designer will make of competing
processors will focus on specific, detailed attributes such as data
throughput, power consumption and cost.
But an understanding of the high-level differences between the
Freescale multi-core devices and their x86 rivals will help to confirm
the conclusions that such an evaluation will produce. In short,
Freescale has optimised its multi-core architecture for the embedded
world in a way that is difficult for manufacturers of desktop computer
processors to do: its devices integrate more of the peripheral
functions that embedded developers care about; their low-power cores
can meet tighter power budgets; and their optimised cache and
inter-process communication techniques enable high throughput rates.
Intel and AMD will continue to make waves in embedded
applications because of their devices’ support for Windows-based
software. But for the many embedded developers who run highly
specialised or proprietary software, Freescale’s multi-core
processors will offer compelling advantages.