ARM7 32-bit MCUs are well known and widely used in embedded applications.
So what future is there for the 16-bit MCU?
READ THIS TO FIND OUT ABOUT:
- The operational differences between Harvard and von Neumann architectures.
- How to evaluate MCUs based on application requirements.
Specifying a high-end microcontroller can look simple: the price of 32-bit microcontrollers has fallen very close to the price of high-end 8-bit microcontrollers. The 32-bit devices offer higher clock speeds, support larger memories and provide more I/O than the 8-bit devices. So surely it is obvious that the designer looking for more performance than their 8-bit device can offer should migrate straight to a 32-bit micro? Gidi Mizrahi, B.Eng, Field Applications Engineer of Future Electronics (Israel) explains.
In fact, the tried and tested 16-bit microcontroller has certain advantages over
a 32-bit device. A 16-bit MCU will occupy a similar price point to many 32-bit
MCUs – devices using the ARM7 core are very popular. But embedded designs
typically require deterministic execution of code, small footprint, and ease of
software design. In some cases, a 16-bit device is better able to offer these
characteristics. So how to choose? The designer’s evaluation should start with
an examination of the rival architectures.
The most useful comparison to make is between the Harvard architecture
most often used by 16-bit MCUs, and the von Neumann architecture used by
ARM7 devices.
How the different architectures execute instructions
The von Neumann architecture used by the ARM7 core, named after the
mathematician and early computer scientist John von Neumann, was
originally developed for use in computers. Its distinctive feature is that it uses a
single storage structure to hold both programme memory and data memory
(see Figure 1).
The 16-bit Harvard architecture has a crucial difference from the von
Neumann architecture, in that it has separate memory spaces for programme
memory and data memory (see Figure 2). A 16-bit RISC CPU core can often
have a wider program memory bus, and one or two 16-bit data buses.
Even a cursory comparison of Figures 1 and 2 reveals one obvious
advantage of the Harvard architecture: the separate data and programme
buses allow simultaneous access of both programme memory and data
memory. Since an instruction fetch never has to wait for a data access to
release the bus, faster and more deterministic execution is often possible. This is
particularly valuable in applications that are rich in single-cycle, single-word
instructions.
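As a rough illustration of this point, the sketch below counts bus cycles in a deliberately idealised model. It assumes that every instruction needs one fetch, that some instructions also need one data access, and that each access takes one cycle. The function names and the 40% data-access ratio are hypothetical, and real devices add pipelines, wait states and buffers that this model ignores.

```python
# Toy cycle-count model (an assumption-laden sketch, not a benchmark).
# Each instruction needs one instruction fetch; some also need one data access.

def cycles_shared_bus(n_instructions, n_data_accesses):
    """Single bus: instruction fetches and data accesses must take turns."""
    return n_instructions + n_data_accesses

def cycles_separate_buses(n_instructions, n_data_accesses):
    """Separate buses: a data access overlaps with an instruction fetch."""
    return max(n_instructions, n_data_accesses)

n_instr = 1000
n_data = 400  # hypothetical: 40% of the instructions touch data memory

shared = cycles_shared_bus(n_instr, n_data)
separate = cycles_separate_buses(n_instr, n_data)
print(f"shared bus: {shared} cycles, separate buses: {separate} cycles")
```

In this model the separate-bus cycle count depends only on the number of instructions, which is one way to see why execution time is easier to predict.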
So, for instance, some 16-bit MCUs operate at full speed from on-board
Flash (at up to 40MHz). This high operating frequency, efficiently used by the
internal circuitry of the devices, provides the deterministic performance
expected by control engineers.
By contrast, in the ARM7 core the single path shared by instructions and data
between the CPU and memory can lead to a situation known as the ‘von Neumann bottleneck’. Under some
circumstances, when the CPU is required to perform minimal processing on
large amounts of data, this gives rise to a serious limitation in effective
processing speed. This is because the CPU is constantly waiting for vital data to
be transferred to or from memory. Interestingly, the bottleneck has the
potential to become tighter the higher the CPU operating frequency rises and
the bigger the memory grows.
Suppliers of ARM7 devices have worked hard to mitigate this inherent
weakness. NXP, for instance, in its LPC2000 family provides a Memory
Accelerator Module (MAM). This works with a CMOS Flash memory that is 128 bits wide,
so one fetch reads four 32-bit words at a time. The devices can also implement a
complex fetching sequence that uses multiple buffers to speculatively prefetch
and store one batch of data or instructions while the CPU is still
executing a previous batch.
The main purpose of this complex scheme is to prevent a branch or data
access from stalling the processor, especially during real-time operations.
Nevertheless, this effort by NXP to work around the von Neumann architecture
has a cost: branches break up the sequence of code execution and
require the prefetch buffers to be flushed and refilled constantly. This
consumes clock cycles and slows down code execution.
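The effect of branches on a wide-fetch scheme can be sketched with a much simpler model than the real MAM. The code below assumes a single line buffer holding four 32-bit words (one 128-bit Flash line) and counts how often the buffer must be refilled. The function name and both address traces are hypothetical, and the multiple buffers and speculative prefetch described above are not modelled.

```python
WORDS_PER_LINE = 4  # one 128-bit Flash line holds four 32-bit words

def count_line_fetches(word_addresses):
    """Count Flash line reads for a trace of instruction word addresses,
    assuming one line buffer that is refilled whenever the next
    instruction lies outside the buffered line."""
    fetches = 0
    buffered_line = None
    for addr in word_addresses:
        line = addr // WORDS_PER_LINE
        if line != buffered_line:
            fetches += 1          # buffer flushed and refilled
            buffered_line = line
    return fetches

# Hypothetical traces of 16 instructions each.
sequential = list(range(16))  # straight-line code
branchy = [0, 40, 4, 44, 8, 48, 12, 52, 16, 56, 20, 60, 24, 64, 28, 68]

seq_fetches = count_line_fetches(sequential)
branchy_fetches = count_line_fetches(branchy)
print(f"sequential: {seq_fetches} line fetches, branchy: {branchy_fetches}")
```

Straight-line code gets four instructions from every Flash read in this model, while the worst-case branching trace needs one Flash read per instruction.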
At this stage, then, it could seem as though the 16-bit device is a clear
winner. But it is not so simple. For a start, ARM7 devices can mitigate their
congested architecture by driving traffic through at higher frequencies. While
typical 16-bit MCUs operate at 40MIPS CPU speed, NXP’s LPC2100 ARM7 family
is quoted as offering CPU speeds up to 72MHz. To use an analogy, the Harvard
device is like a wide road that accommodates more traffic; the ARM device
might have a narrower road, but each vehicle is moving a whole lot faster than
in the Harvard device. Indeed, suppliers of ARM7 devices can always find
performance tests in which their device executes code faster than a
comparable Harvard-architecture 16-bit device, and vice versa.
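The road analogy can be put into numbers with the usual relation between clock rate and instruction throughput. The sketch below assumes the 16-bit device completes one instruction per clock at 40MHz, which is consistent with the 40MIPS figure, and asks how many cycles per instruction (CPI) the 72MHz device could afford before it falls behind. The CPI values tried are hypothetical, not measurements, and the comparison ignores the fact that a 32-bit instruction may do more work than a 16-bit one.

```python
def mips(clock_mhz, cycles_per_instruction):
    """Millions of instructions per second for a given clock and average CPI."""
    return clock_mhz / cycles_per_instruction

# Assumption: the 16-bit Harvard device runs one instruction per clock at 40MHz.
harvard_mips = mips(40.0, 1.0)

# Average CPI at which a 72MHz device delivers the same throughput.
break_even_cpi = 72.0 / harvard_mips

for cpi in (1.2, 1.8, 2.5):  # hypothetical average CPI values
    print(f"72MHz at CPI {cpi}: {mips(72.0, cpi):.1f} MIPS "
          f"(16-bit device: {harvard_mips:.1f} MIPS)")
```

Under these assumptions the faster clock wins only while stalls keep the average below 1.8 cycles per instruction, which is why the outcome depends so heavily on the code being run.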
Furthermore, the Harvard architecture in Figure 2 uses two different buses,
one for data and one for programme memory. This architecture is far from easy
to implement in silicon. The difficulty of both designing and manufacturing
Harvard-architecture devices is reflected in higher-priced silicon or a less
abundant feature set.
The lack of competitiveness of 16-bit devices in terms of price and features is
not their only drawback. They are also constrained in the amount of memory
that they can access. At best, a 16-bit device today can address 256kB of Flash.
The roadmaps of some 16-bit manufacturers envisage devices offering 512kB
of Flash, but that is yet to be delivered in working silicon.
There is one other important area in which to draw a comparison between
16-bit Harvard devices and ARM7-based 32-bit MCUs: code efficiency. The
size of the code compiled for a device is highly dependent on
many variables, not least the quality of the compiler. In addition, ARM7 devices
can be operated in Thumb mode, in which instructions are compressed to
16 bits wide to save on memory footprint.
Nevertheless, in almost all cases the compiled code for a 16-bit MCU will be
slightly smaller than the equivalent code compiled for an ARM7 device.
It is worth mentioning that the competitive strength of ARM-based devices
could be set to grow in the near future with the release of MCUs based on the
company’s Cortex core. In June 2007 STMicroelectronics was the first large
silicon vendor to announce an ARM Cortex MCU.
Interesting claims are already being made for the Cortex core, which breaks
from the ARM7 mould by adopting a Harvard architecture. It is said to be fast
and to offer considerably lower power consumption than the ARM7. It also
implements the new Thumb-2 instruction set, which mixes 16-bit and 32-bit
instructions and is claimed to produce smaller compiled code than the ARM7,
even when the ARM7 is used in Thumb mode.
But with ST’s Cortex device only recently introduced to the market, it is early
to be making definitive judgements about the comparative benefits of ARM
Cortex versus either traditional 16-bit devices or the ARM7 core.
So at least in the coming months, the main battle will continue to be
between traditional 16-bit devices and the ARM7 core. As the two contrasting
products described below demonstrate, the designer’s choice will generally be
determined by the needs of the application.
Home alarm control panel illustration
The first application to illustrate the comparison is a control panel for a home
alarm, in the form of a touch-sensitive colour LCD panel. This design requires a
large memory (to save the graphics data for the LCD). It also needs a robust
communication interface to the host controller.
An ARM7 MCU, such as a device from NXP’s LPC24xx family, would be ideal
for this application. First, it has many important features integrated into the
chip, including Ethernet, USB host, CAN bus, four UARTs and an LCD driver that
can drive LCDs up to 1024 x 768 pixels. Such a rich feature set will not be found
on any 16-bit device due to the difficulty of designing and manufacturing
such a device in the Harvard architecture. In office buildings, an Ethernet
infrastructure will already exist, so this will provide a route back to the host
controller. For residential installations, the device provides an RS-485 link using
one of the UART or CAN bus interfaces. The LPC24xx family also offers 512kB
of on-board Flash and 98kB of on-board SRAM – enough to support the large
memory requirements of an LCD driver.
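A rough way to check a given panel against the on-chip SRAM is to compute the size of one frame buffer from the resolution and colour depth. In the sketch below, only the 98kB SRAM figure and the 1024 x 768 maximum resolution come from the text. The function name, the other panel sizes and the colour depths are hypothetical, and the check assumes the whole SRAM is available to the display.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one full frame at the given colour depth."""
    return width * height * bits_per_pixel // 8

SRAM_BYTES = 98 * 1024  # 98kB of on-chip SRAM, taken from the text

# Hypothetical panel configurations: (width, height, bits per pixel).
panels = {
    "320x240 at 8bpp": (320, 240, 8),
    "320x240 at 16bpp": (320, 240, 16),
    "1024x768 at 16bpp": (1024, 768, 16),
}

for name, (w, h, bpp) in panels.items():
    size = framebuffer_bytes(w, h, bpp)
    verdict = "fits on-chip" if size <= SRAM_BYTES else "needs external memory"
    print(f"{name}: {size} bytes, {verdict}")
```

The same calculation can be repeated for whatever panel the design actually uses before committing to a device.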
Wireless smart detector
The second illustration offers a contrast: here, a powerful MCU will be required
to perform intensive calculations, including Fast Fourier Transforms (FFT), to
process algorithms quickly and communicate back to the main system
controller before going into Sleep mode. The smart detector’s battery should
be capable of lasting five years.
ARM7 MCUs typically have higher power consumption than that offered by
comparable 16-bit processors – a fact which makes the 16-bit processors more
appropriate for battery-powered operation. In particular, their peripherals can
be driven at variable voltages and at variable clock speeds. This means the
design engineer can use the same MCU to perform interval measurements, to
wake up from Sleep mode within 10µs, and go into high-speed operational
mode to execute algorithms fast if it detects a change from a sensor. This
whole routine of waking up, executing code and going back to sleep can be
accomplished in as little as 150µs. This is more than six times faster than an
ARM7 device, which needs more than 1ms for the wake-up alone.
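The link between active time and battery life can be sketched with a simple duty-cycle calculation. Only the 150µs and 1ms active times come from the text. The 10mA active current, 2µA sleep current, 100ms wake-up interval and 1000mAh battery are hypothetical values chosen for illustration, and the model ignores battery self-discharge. It also assumes the same active current for both devices, which flatters the ARM7 given its typically higher consumption.

```python
def average_current_ua(active_ua, sleep_ua, active_us, period_us):
    """Time-weighted average current over one wake/sleep period, in µA."""
    sleep_us = period_us - active_us
    return (active_ua * active_us + sleep_ua * sleep_us) / period_us

def battery_life_years(capacity_mah, avg_current_ua):
    """Ideal battery life, ignoring self-discharge and ageing."""
    hours = capacity_mah * 1000.0 / avg_current_ua
    return hours / (24 * 365)

# Hypothetical operating point (assumptions, not datasheet values).
ACTIVE_UA = 10_000.0    # 10mA while executing
SLEEP_UA = 2.0          # 2µA in Sleep mode
PERIOD_US = 100_000.0   # wake up every 100ms
CAPACITY_MAH = 1000.0   # 1000mAh battery

life_16bit = battery_life_years(
    CAPACITY_MAH, average_current_ua(ACTIVE_UA, SLEEP_UA, 150.0, PERIOD_US))
life_arm7 = battery_life_years(
    CAPACITY_MAH, average_current_ua(ACTIVE_UA, SLEEP_UA, 1000.0, PERIOD_US))

print(f"150µs active per period: {life_16bit:.1f} years")
print(f"1ms active per period:   {life_arm7:.1f} years")
```

With these assumed figures only the shorter active time clears the five-year target, because the active phase dominates the average current once the sleep current is in the microamp range.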
Crucially, with a 16-bit Harvard MCU the software execution time is
deterministic and predictable. This means that the software engineer can lower
the clock speed after code execution to save battery power, send the data to
the system controller and be confident that the data will emerge from the
MCU at the right speed and in the right format. The ST ARM Cortex 32-bit
MCU, however, might soon prove disruptive in this space: its move
to a Harvard architecture, together with its low power consumption and fast
start-up from Sleep, might rewrite the current rulebook.