
 




READ THIS TO FIND OUT ABOUT:
  • Multi-core processors
  • High-performance embedded design
  • High-speed cache-memory architectures


While multi-core processors from Intel and AMD are found in embedded applications, in many cases these devices, which are optimised for desktop or server computing, consume too much power for industrial embedded designs. Richard Parker, FAE, Future Electronics (UK) explains.

Intel and AMD continue to evolve the x86 processor architecture to support multi-core computing applications. Freescale Semiconductor, on the other hand, is pursuing a different architectural track, seeking to optimise multi-core processing for embedded systems.

From the most fundamental design decisions onwards, Freescale’s focus on multi-core processor development has been driven by the embedded world’s requirement for low power and high data throughput.

For instance, before the introduction of multi-core processors, Freescale’s most powerful devices were its single-core PowerPC host processors. The core of these devices was used in the dual-core MPC8641D processors, which provide the e600 platform. A number of devices in the company’s multi-core family are, however, to be based on the e500 core used in the PowerQUICC™ III range of processors. In their single-core guise, e500-based devices offer lower performance than the e600. They do, however, offer specific advantages that support low-power operation while delivering the higher data throughput that multi-core devices provide.

These advantages stem from three aspects of the processors’ development: First, the nature of the e500mc core on which Freescale is building its multi-core processors; second, the rich peripheral set which the processors support; and third, hardware assistance for general functions which are commonly required in high-end embedded applications.

 


Fig. 1: Functional block diagram of Freescale’s multi-core platform.

 

Architecture of the e500mc core

Freescale’s multi-core processors are based on the e500 cores used in the PowerQUICC III family. In the dual-core MPC8572 family, these cores run at up to 1.5GHz.

In multi-core processors with four or more cores, these core architectures have been re-configured to create the new e500mc platform, where mc denotes multi-core. Two important characteristics differentiate the e500mc from the e500 core: process technology and cache topology (see Figure 1).

Freescale has collaborated with IBM to create its e500mc devices using an advanced 45nm process geometry, enabling improvements in performance, power consumption and integration. In addition, a modified cache topology delivers further performance gains. The e500mc has a multi-level cache-coherent hierarchy, which allows each of the cores to have its own L1 and L2 cache, while also providing a large, shared L3 cache. Also, attaching the L2 cache to the cores in a back-side cache implementation dramatically improves performance in certain processing modes.

At this point, it is worth clarifying the definitions of the two types of cache memory. A front-side cache is placed between the main external memory bus and the core. This means that it is easily filled and copied back to main memory, but runs only at the speed of the main memory. It also means that peripherals attached to the bus can access this memory directly.

A back-side cache, by contrast, is placed on a separate bus to which the core has direct access. This results in better code efficiency and enhanced speed, since the back-side cache can run at the same speed as the core rather than being limited to the main memory’s clock speed.

Operating with a back-side cache topology, Freescale’s multi-core devices allow the cache to match the speed of the CPU. This typically results in latency improvements of more than 50% compared to shared-bus or shared-cache architectures.
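The effect of the cache's clock domain can be illustrated with a rough average-access-time calculation. The Python sketch below is illustrative only: apart from the 1.5GHz core clock quoted above for the MPC8572, every figure (bus clock, hit rate, cycle count, miss penalty) is an assumption chosen for the example and not a Freescale specification, so the result should not be read as a reproduction of the latency improvement quoted above.

```python
def average_access_ns(hit_rate, cache_cycles, cache_clock_mhz, miss_penalty_ns):
    """Average access time = cache hit time + miss rate * miss penalty."""
    hit_time_ns = cache_cycles * 1000.0 / cache_clock_mhz
    return hit_time_ns + (1.0 - hit_rate) * miss_penalty_ns


# Assumed figures, for illustration only.
CORE_CLOCK_MHZ = 1500.0   # 1.5GHz core clock, as quoted for the MPC8572
BUS_CLOCK_MHZ = 500.0     # hypothetical main-memory bus clock
HIT_RATE = 0.95           # hypothetical cache hit rate
CACHE_CYCLES = 10         # hypothetical cache access time, in cache clocks
MISS_PENALTY_NS = 60.0    # hypothetical cost of going to main memory

# A front-side cache is clocked by the memory bus, a back-side cache by the core.
front_side = average_access_ns(HIT_RATE, CACHE_CYCLES, BUS_CLOCK_MHZ, MISS_PENALTY_NS)
back_side = average_access_ns(HIT_RATE, CACHE_CYCLES, CORE_CLOCK_MHZ, MISS_PENALTY_NS)

print(f"front-side cache: {front_side:.1f} ns per access")
print(f"back-side cache:  {back_side:.1f} ns per access")
print(f"reduction:        {100.0 * (1.0 - back_side / front_side):.0f}%")
```

With the same hit rate and the same number of cache cycles, the only thing that changes between the two cases is the clock that those cycles are counted in, which is the point the comparison above makes.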

This cache topology offers further advantages: The caches are programmable, which allows partitioning of instructions and data to be optimised for each application. The back-side cache implementation also reduces traffic loading, which in turn reduces latency and improves the performance.

Of course, multi-core architectures that adopt a front-side cache do so for sound reasons. One of these reasons is the availability of shared data structures, such as data stashing. This is where data from an onboard peripheral, such as an Ethernet port, is stored directly to the cache memory area before the core has actually requested it.

If each core in the Freescale architecture only had its own L1 and L2 cache, this would not be possible. But the new, shared L3 cache in the e500mc architecture enables data stashing. This means that the Freescale architecture offers the best of both worlds: reduction in latency because each core has its own cache, as well as more efficient use of processing resources through use of a shared data structure.
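A toy model makes the benefit of stashing easier to see. The Python sketch below is not a description of the e500mc hardware: the class `ToyPlatform`, its method names and the dictionary-based caches are all hypothetical, and cache coherence, capacity and eviction are deliberately ignored. It only shows that data placed in a shared cache by a peripheral can be read by any core without a trip to main memory.

```python
class ToyPlatform:
    """Hypothetical model: private per-core caches plus one shared L3."""

    def __init__(self, num_cores):
        # One dict per core stands in for that core's private L1/L2 caches.
        self.private = [dict() for _ in range(num_cores)]
        self.shared_l3 = {}
        self.dram = {}
        self.dram_reads = 0  # counts how often a core had to go to main memory

    def peripheral_write(self, addr, data, stash=False):
        """A peripheral (e.g. an Ethernet port) delivers data to memory.

        With stash=True the data is also placed in the shared L3 before
        any core has asked for it.
        """
        self.dram[addr] = data
        if stash:
            self.shared_l3[addr] = data

    def core_read(self, core, addr):
        """Read through the private cache, then the shared L3, then DRAM."""
        cache = self.private[core]
        if addr in cache:
            return cache[addr]
        if addr in self.shared_l3:
            data = self.shared_l3[addr]
        else:
            self.dram_reads += 1
            data = self.dram[addr]
            self.shared_l3[addr] = data
        cache[addr] = data
        return data


platform = ToyPlatform(num_cores=2)
platform.peripheral_write(0x100, b"stashed packet", stash=True)
platform.peripheral_write(0x200, b"plain packet")

platform.core_read(0, 0x100)   # served from the shared L3
platform.core_read(1, 0x200)   # has to be fetched from main memory
print("main-memory reads:", platform.dram_reads)
```

Because the private caches in this model are filled only by the core that owns them, a peripheral has no way to pre-load them; the shared L3 is the one place where stashed data is visible to every core.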

 


Fig. 2: Block diagram of the QorIQ™ P4080.

 

A Freescale innovation: CoreNet

Freescale’s new multi-core architecture delivers a further important enhancement. In a complex embedded-processor architecture, it is essential that different elements of the device communicate effectively with each other and with the outside world. The traditional approach has been to use a shared bus, to which all internal devices are attached. The problem with this approach is the risk of contention on the bus. Such contention slows down the effective operation of the processor.

Freescale has developed a coherent fabric approach, called CoreNet™, to solve this problem (see Figure 2). CoreNet can easily accommodate up to 32 cores as well as heterogeneous core implementations.

It can be described as a highly concurrent, fully cache-coherent, multi-port fabric. Point-to-point connectivity between the blocks, combined with a flexible protocol, enables pipelined interconnection between CPUs, platform caches, memory controllers and I/O accelerators. This lowers average memory latency, because CoreNet eliminates the address retries normally triggered when a CPU tries to snoop a shared bus. In addition, the switching topology provides multiple high-bandwidth address paths. The resulting high address bandwidth is a key requirement for coherent multi-core processor systems.
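The difference between a shared bus and a switched fabric can be sketched with a simple cycle count. The Python model below is an idealised illustration and not a model of CoreNet itself: it assumes every transfer takes one cycle, that a shared bus carries one transfer per cycle, and that a fabric lets each source and each target take part in one transfer per cycle. The function names and the port names in the example are hypothetical.

```python
from collections import Counter


def shared_bus_cycles(transfers):
    """On a shared bus every transfer is serialised, whatever its endpoints."""
    return len(transfers)


def fabric_cycles(transfers):
    """Idealised point-to-point fabric.

    Transfers between disjoint source/target pairs proceed in parallel, so
    the busiest single port sets the number of cycles needed.
    """
    if not transfers:
        return 0
    per_source = Counter(src for src, _ in transfers)
    per_target = Counter(dst for _, dst in transfers)
    return max(max(per_source.values()), max(per_target.values()))


# Hypothetical traffic pattern: (source, target) pairs.
transfers = [
    ("core0", "ddr0"),
    ("core1", "ddr1"),
    ("core2", "l3"),
    ("core3", "pcie"),
    ("core0", "l3"),
]

print("shared bus:", shared_bus_cycles(transfers), "cycles")
print("fabric:    ", fabric_cycles(transfers), "cycles")
```

In this model contention on the fabric only arises when two transfers need the same port, whereas on the bus every transfer contends with every other one.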

 

System-on-chip or processor?

Freescale, then, has introduced architectural innovations that support high data throughput while keeping a lid on power consumption. But embedded developers also need highly integrated devices.

Processors designed for the desktop computer do not need to accommodate many peripherals, and the types of peripherals to be supported are not diverse. This is not true in the embedded world, where it makes more sense to consider the processor as a System-on-Chip (SoC). In general these devices will include a memory controller, serial ports and bussing options. With more peripheral and inter-process communication options supported on-chip, the design team can increase integration, shorten the design process and reduce board real-estate.

Freescale has a history of developing feature-rich, SoC-type processors optimised for the embedded world. A good example is the MPC8572 dual-core processor, which uses e500 cores. Its on-chip features include:


  • DDR memory controller
  • Four Gigabit Ethernet controllers
  • x4 Serial RapidIO interface
  • PCI Express interface
  • Dual integrated DMA controller
  • Dual I2C interface and dual UARTs
  • General-purpose I/O

Users should expect to see this level of integration extended in Freescale’s multi-core devices, as the 45nm process used in the e500mc releases die area for additional functions.

 

On-demand applications support embedded requirements

The optimisation of Freescale’s multi-core devices for embedded designs is also evident in additional hardware functions that assist certain general-purpose applications. The most important of these are in security and networking.

For instance, Freescale’s multi-core devices will share with the PowerQUICC III family a Security Engine Controller (SEC) block. This block off-loads encryption functions such as AES, DES and 3DES. This means that developers can deploy complex and very secure encryption algorithms without placing a large processing overhead on the cores.

Other hardware-assist functions supported by Freescale’s multi-core devices will include:


  • Table Look Up (TLU) – a type of hash table that can be used to match IP addresses. This can accelerate networking performance and reduce loading on the processor cores.
  • Pattern Matching Engine (PME) – used for deep packet inspection and full content processing. It can match 16,000 patterns of up to 128 bytes in length, including patterns that break across packet boundaries. An additional feature of the PME is a deflate function, which can be used to off-load decompression of a compressed file.
  • Data-path acceleration – this function manages packet routing, security, quality of service and deep packet inspection, freeing the cores to focus on value-added services and application processing.
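Matching patterns that break across packets is worth a short illustration, since it is what distinguishes stream inspection from per-packet inspection. The Python sketch below is a software analogy only and says nothing about how the PME is implemented: the class `StreamMatcher` is hypothetical, it uses a plain substring search, and it assumes a non-empty list of non-empty patterns. It carries the tail of each packet forward so that a pattern split between two packets is still found.

```python
class StreamMatcher:
    """Hypothetical matcher that finds patterns spanning packet boundaries."""

    def __init__(self, patterns):
        self.patterns = [bytes(p) for p in patterns]
        # A match can start at most (longest pattern - 1) bytes before the
        # current packet, so that is all the history that must be kept.
        self.keep = max(len(p) for p in self.patterns) - 1
        self.tail = b""
        self.consumed = 0  # stream bytes seen before the current packet

    def feed(self, packet):
        """Return sorted (stream_offset, pattern) hits that end in this packet."""
        window = self.tail + packet
        base = self.consumed - len(self.tail)  # stream offset of window[0]
        hits = []
        for pattern in self.patterns:
            start = window.find(pattern)
            while start != -1:
                # Skip matches lying wholly in the carried-over tail: they
                # were already reported when the previous packet was fed.
                if start + len(pattern) > len(self.tail):
                    hits.append((base + start, pattern))
                start = window.find(pattern, start + 1)
        self.consumed += len(packet)
        self.tail = window[-self.keep:] if self.keep else b""
        return sorted(hits)


matcher = StreamMatcher([b"attack", b"ab"])
first = matcher.feed(b"xxatt")    # "attack" is only half present so far
second = matcher.feed(b"ackab")   # completes "attack" and contains "ab"
print(first, second)
```

A per-packet search would miss the first pattern here, because neither packet contains it in full.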

In addition, the multi-core platform provides hardware support for memory-buffer allocation by resources.

Finally, robust hardware-partitioning mechanisms minimise instances of resource contention, aiding code migration and verification. The multi-core platform provides hardware mechanisms to ensure the cores can only access the resources they are authorised to access. An I/O MMU is used to enforce access controls on bus-mastering peripherals.
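The access-control idea behind an I/O MMU can be sketched as a table of permitted address windows per bus master. The Python model below is a conceptual illustration only: the class `ToyIoMmu`, the master names and the addresses are hypothetical, and a real I/O MMU also translates addresses, which this sketch leaves out.

```python
class ToyIoMmu:
    """Hypothetical access checker: each bus master gets a list of windows."""

    def __init__(self):
        # master name -> list of (start, end) windows, end exclusive
        self.windows = {}

    def grant(self, master, start, length):
        """Authorise a master to access [start, start + length)."""
        self.windows.setdefault(master, []).append((start, start + length))

    def check(self, master, addr, length):
        """Allow an access only if it lies wholly inside one granted window."""
        end = addr + length
        return any(
            lo <= addr and end <= hi
            for lo, hi in self.windows.get(master, [])
        )


iommu = ToyIoMmu()
iommu.grant("eth0_dma", start=0x1000_0000, length=0x10_0000)

print(iommu.check("eth0_dma", 0x1000_0100, 1500))   # inside its window
print(iommu.check("eth0_dma", 0x2000_0000, 1500))   # outside its window
print(iommu.check("pcie_dma", 0x1000_0100, 1500))   # master with no grants
```

Checking every bus-master access against such a table is what keeps a DMA engine assigned to one partition from reading or overwriting memory that belongs to another.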

 

Conclusion

The evaluation that an embedded designer will make of competing processors will focus on specific, detailed attributes such as data throughput, power consumption and cost.

But an understanding of the high-level differences between the Freescale multi-core devices and their x86 rivals will help to confirm the conclusions that such an evaluation will produce. In short, Freescale has optimised its multi-core architecture for the embedded world in a way that is difficult for manufacturers of desktop computer processors to match. Its devices integrate more of the peripheral functions that embedded developers care about; their low-power cores can meet tighter power budgets; and, by implementing optimised cache and inter-process communication techniques, they are able to deliver high throughput rates.

Intel and AMD will continue to make waves in embedded applications because of their devices’ support for Windows-based software. But for the many embedded developers who run highly specialised or proprietary software, Freescale’s multi-core processors will offer compelling advantages.

 


 

 

 

© 2012 Future Electronics. All rights reserved.
