
Squeezing more from an 8051 architecture*
A performance comparison between a standard 8051 MCU and Ramtron's VersaMix
VMX51C1020 and the Versa VRS51L2xxx
For many designers, the ability to squeeze greater performance and
flexibility out of their existing 8051-based systems simply by upgrading the
microcontroller and implementing minor design changes, rather than making
costly investments in new architectures, code and development environments,
might be something of a godsend. This was precisely the development objective
behind Ramtron's most recent additions to its Versa family of high-performance
8051-based microcontrollers, previewed at last month's Embedded Systems
Conference.
Both the VMX51C1020 and VRS51L2xxx
microcontrollers are high performance 8051-based devices with a high level of
integration.
The VMX51C1020 is a single-chip, mixed-signal
microcontroller solution for a diverse range of signal conditioning, data
acquisition, processing and control applications in the industrial, medical,
consumer, instrumentation and automotive markets. Its broad set of digital and
analogue peripherals provides a competitive advantage by minimising board size
and assembly costs.
The VRS51L2xxx is the first member of a new family of
advanced Versa 8051 devices that bring integration and performance to even
higher levels. Based on a state-of-the-art CMOS process, the
VRS51L2xxx can operate at 40MHz and can achieve up to 40 MIPS
of processing power. The device performs up to 12 times faster than standard
8051s, muscling into 16-bit MCU territory. Because it is not based on a
pipelined architecture, it avoids the latency that pipelined processors
typically incur when they jump to different sections of code. In addition, its
comprehensive set of highly configurable digital peripherals eases the load on
the processor.
Unlike many 8051 devices that require 12, 6 or in some cases 4 oscillator
cycles per system clock cycle, the clock system of the
VMX51C1020 and the VRS51L2xxx is directly
connected to the device's oscillator, ensuring that one oscillator cycle
translates to one system clock cycle.
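The effect of this clocking scheme can be illustrated with a small calculation. The following is a minimal sketch, not taken from Ramtron's material: the function name system_clock_hz and its parameters are hypothetical, and it only divides the oscillator frequency by the number of oscillator cycles needed per system clock cycle.

```c
#include <stdint.h>

/* Hypothetical helper (not from any Ramtron header): system clock rate
 * obtained from an oscillator when the core needs `osc_per_sys` oscillator
 * cycles for each system clock cycle. Returns 0 for an invalid divider. */
uint32_t system_clock_hz(uint32_t osc_hz, uint32_t osc_per_sys)
{
    if (osc_per_sys == 0u) {
        return 0u;
    }
    return osc_hz / osc_per_sys; /* integer division, rounds down */
}
```

With a 40MHz oscillator, system_clock_hz(40000000u, 12u) gives about 3.3 million system clock cycles per second, whereas system_clock_hz(40000000u, 1u) gives the full 40 million, which matches the 12:1 ratio quoted above.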
Performance comparisons
To demonstrate the performance advantages achieved by the
VMX51C1020 and VRS51L2xxx over the standard
8051 architecture, a simple comparison was undertaken. Of course, there are many
ways to compare processor performance and every method is likely to provide
different results. For the purpose of this article, a 16-tap FIR filter
computation loop, including the data shifting operation, was used as the basis
for comparison. This is a demanding operation that is normally reserved for
advanced processors and DSPs.
The VMX51C1020 and VRS51L2xxx
microcontrollers include an enhanced hardware arithmetic unit allowing
tremendous performance gain for DSP operations, such as dynamic FIR filtering,
typically used in applications that require noise reduction and digital
filtering.
Three sets of test programs were developed for these performance comparison
tests. They were written using the freely available SDCC C compiler, which
delivers fairly good output code density.
Because of architecture and SFR register structure differences that exist
between the standard 8051, the VMX51C1020 and the
VRS51L2xxx, different versions of the test programs were
written to accommodate the three devices. Care was taken to keep the same code
structure for each device in order to have a valid basis for comparison.
Absolute processing power comparison
The first test program, implementing the 16-tap FIR computation using C
instructions only, was chosen to compare the raw processing power of the
VMX51C1020 and VRS51L2xxx to a standard 8051
MCU. An I/O port was used to monitor the duration of the FIR loop and data
shifting computation. The FIR loop calculation was performed on 12-bit data
inputs held in an integer data type (16-bit), and the output is based on a long
(32-bit) variable. In order to speed up the FIR loop computation, the
coefficients were copied from the Flash memory to the internal RAM. This copy
of the FIR filter coefficients was not included in the FIR processing loop as
it was only done once.
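For readers unfamiliar with this kind of loop, the following is a minimal sketch of a 16-tap FIR step with data shifting in portable C. It is not the test program used for the comparison: the names fir_load_coeffs, fir_step, fir_delay and fir_coeff are hypothetical, and the I/O port toggling used for timing is left out. It only assumes what the text states, namely 16-bit sample storage, a 32-bit result and coefficients copied once into RAM.

```c
#include <stdint.h>

#define FIR_TAPS 16

/* Hypothetical state for this sketch: delay line and RAM copy of the
 * coefficients. The most recent sample lives at index 0. */
static int16_t fir_delay[FIR_TAPS];
static int16_t fir_coeff[FIR_TAPS];

/* One-time setup: copy the coefficients (for example from a const table
 * held in Flash) into RAM and clear the delay line. */
void fir_load_coeffs(const int16_t *src)
{
    uint8_t i;
    for (i = 0; i < FIR_TAPS; i++) {
        fir_coeff[i] = src[i];
        fir_delay[i] = 0;
    }
}

/* Shift the delay line by one position, insert the new sample and
 * return the 32-bit sum of products. */
int32_t fir_step(int16_t sample)
{
    int32_t acc = 0;
    uint8_t i;

    /* Data shifting: the oldest sample drops off the end. */
    for (i = FIR_TAPS - 1; i > 0; i--) {
        fir_delay[i] = fir_delay[i - 1];
    }
    fir_delay[0] = sample;

    /* FIR computation: the cast forces a 32-bit multiply, which matters
     * on compilers where int is only 16 bits wide. */
    for (i = 0; i < FIR_TAPS; i++) {
        acc += (int32_t)fir_delay[i] * fir_coeff[i];
    }
    return acc;
}
```

Feeding a single non-zero sample followed by zeros returns that sample scaled by each coefficient in turn, which is a quick way to check that the shifting and the multiply-accumulate are wired up correctly.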
Figure 1 compares the processing power of the
VMX51C1020 and VRS51L2xxx with that of a standard 8051 when all the devices
operate at their maximum speed. With the standard 8051 as the basis for
comparison, a factor of 1 has been assigned to it.
In Figure 1, it is important to note that the operating frequency of the
VMX51C1020 is set to 14.75MHz while the operating frequencies
of both the Standard 8051 and the VRS51L2xxx have been set to
40MHz. Also note that the maximum operating frequency of many standard 8051
devices is limited to 24MHz or 33MHz instead of 40MHz when operating in X1
mode. Furthermore, many 8051 derivatives on the market provide an X2 mode where
6 oscillator cycles are required per system cycle instead of the 12 required in
X1 mode. However, for some of these devices, the maximum operating frequency is
significantly reduced compared to X1 mode. The Versa MCU processors are
Ramtron's drop-in replacements for standard 8051 devices; most of them can
operate at up to 40MHz and do not have an X2 mode.
Figure 2 provides a relative processing power comparison for an operating
frequency of 14.75MHz, which corresponds to the maximum operating speed of the
VMX51C1020 device. Again, with the standard 8051 as the
basis for comparison, a factor of 1 has been assigned to it.
For a given oscillator frequency, the single cycle operation of the
VMX51C1020 and the VRS51L2xxx processor core
makes them 7 to 8 times more powerful than a standard 8051.
Impact of the processing power on FIR loop computation frequency
In many applications, the ability to perform digital filtering on acquired
data is an advantage: because the filtering is done in software, the filter
characteristic can be adapted to the system conditions without requiring any
hardware changes. It can also help to simplify the application's PCB and,
therefore, lower costs.
Figure 3 gives a comparison of the maximum acquisition frequency a system
based on a standard 8051, a VMX51C1020 or a
VRS51L2xxx could sustain while performing a 16-tap FIR filter
operation.
As demonstrated, standard 8051 devices barely have enough processing power to
perform operations such as FIR filtering. The histogram figures provided do not
take into account the data acquisition process, so actual numbers are likely to
be lower, especially for the standard 8051 if a serial A/D converter is used.
The VMX51C1020 integrates a 7-channel on-chip ADC and an
acquisition module that takes care of the entire acquisition process. Also, both
the VMX51C1020 and the VRS51L2xxx provide an
enhanced SPI interface that greatly reduces the load on the processor if an
external serial A/D converter is used to perform the data acquisition.
Based on raw processing power alone, both the VMX51C1020 and
the VRS51L2xxx can sustain data acquisition and digital
filtering in the kilohertz range. For sensor applications this facilitates
oversampling of the data and simplification of the ADC's analogue filter front
end.
The Enhanced Hardware Arithmetic Unit
The VMX51C1020 and the VRS51L2xxx devices
integrate an Enhanced Hardware Arithmetic Unit which is able to perform 16-bit
multiplications and 32-bit additions, and includes a 32-bit accumulator as well
as a 32-bit Barrel Shifter. All of these operate within one system clock cycle.
Moreover, the VRS51L2xxx Arithmetic Unit can perform 16-bit divisions in 5
system clock cycles.
The hardware-based Enhanced Arithmetic Unit integrated into the
VMX51C1020 and the VRS51L2xxx provides a tremendous performance gain,
making it possible to perform DSP operations that would normally require a DSP
processor.
To demonstrate the benefit of using the Enhanced Arithmetic Unit of the
VMX51C1020 and the VRS51L2xxx, the 16-tap FIR
filter program was adapted to take advantage of the Arithmetic Unit.
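To make the role of the Arithmetic Unit concrete, the following is a software model, in portable C, of the operations the text attributes to it: a 16-bit multiplication, a 32-bit addition into an accumulator and a 32-bit shift. It is only an illustrative sketch. The names au_model_t, au_clear, au_mac, au_shift_right_u32 and au_fir16 are hypothetical and do not correspond to the actual SFRs or register interface of either device, the shifter is assumed here to be a logical shift, and accumulator overflow is not modelled.

```c
#include <stdint.h>

/* Software model only: names are hypothetical and do not reflect the
 * real register interface of the VMX51C1020 or VRS51L2xxx. */
typedef struct {
    int32_t acc; /* models the accumulator */
} au_model_t;

void au_clear(au_model_t *au)
{
    au->acc = 0;
}

/* One multiply-accumulate step: 16-bit x 16-bit multiplication followed
 * by a 32-bit addition into the accumulator. Overflow is not handled. */
void au_mac(au_model_t *au, int16_t a, int16_t b)
{
    au->acc += (int32_t)a * (int32_t)b;
}

/* Barrel shifter modelled as a logical right shift of a 32-bit value
 * (assumption). Shift counts above 31 are clamped to 31. */
uint32_t au_shift_right_u32(uint32_t value, uint8_t bits)
{
    if (bits > 31u) {
        bits = 31u;
    }
    return value >> bits;
}

/* 16-tap sum of products expressed as 16 multiply-accumulate steps. */
int32_t au_fir16(const int16_t *samples, const int16_t *coeffs)
{
    au_model_t au;
    uint8_t i;

    au_clear(&au);
    for (i = 0; i < 16; i++) {
        au_mac(&au, samples[i], coeffs[i]);
    }
    return au.acc;
}
```

In this model each au_mac call stands in for work that the hardware unit is described as completing within one system clock cycle, whereas a plain 8051 core would need a multi-instruction software routine for the same 16-bit multiply and 32-bit add.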
For the VRS51L2xxx, sections of the program were also written in assembler to
give a more optimised version that takes full advantage of the Enhanced
Arithmetic Unit. The extremely high performance gain provided by the Arithmetic
Unit is clearly demonstrated in Figure 4, where the operating frequency of the
standard 8051 and the VRS51L2xxx is 40MHz and the operating frequency of the
VMX51C1020 device is set to 14.75MHz.
The blue columns show the maximum frequency at which a 16-tap FIR loop could
be executed when relying only on the device's processor. The red columns show
the performance achieved when using the Arithmetic Unit on the
VMX51C1020 and the VRS51L2xxx but without
using in-line assembler instructions in the FIR computation and data shifting
loops. Finally, the green column shows the performance achieved on the
VRS51L2xxx when in-line assembler instructions are integrated
into the FIR computation and data shifting loops. Even better performance could
be achieved by coding the data processing function entirely in assembler. The
ability to sustain a FIR computation rate of about 34kHz makes it possible to
perform audio processing on the VRS51L2xxx.
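The quoted rate can be turned into a rough cycle budget. The sketch below is an illustration rather than a measurement: the function name cycles_per_sample is hypothetical, and it assumes the figures given in the text (a 40MHz clock, one oscillator cycle per system clock cycle and a FIR rate of about 34kHz).

```c
#include <stdint.h>

/* Hypothetical helper: system clock cycles available for each FIR
 * computation, rounded down. Returns 0 if the rate is 0. */
uint32_t cycles_per_sample(uint32_t sys_clock_hz, uint32_t fir_rate_hz)
{
    if (fir_rate_hz == 0u) {
        return 0u;
    }
    return sys_clock_hz / fir_rate_hz;
}
```

For example, cycles_per_sample(40000000u, 34000u) evaluates to 1176, so under these assumptions the complete 16-tap computation, including data shifting, fits in roughly 1,176 system clock cycles, or about 73 cycles per tap.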
* By Francois Turgeon, Application Engineer, Ramtron International
Corporation
For further information don't hesitate to contact us:
Ramtron@msc-ge.com