Introduction
Lately I have been working on a series of analytical instruments that are roughly "hybridized" Lock-In Amplifiers [1]. These designs are hybrids because the purely analog Lock-In Amplifier block diagram has been augmented with Digital Signal Processing functions: the waveforms are digitized and DSP techniques (including FFTs) are applied for noise reduction, reference signal phase comparisons, and maintaining the quadrature phase lock of the synchronous detector function, which is itself a digital processing block. In addition, sophisticated noise filtering can be done entirely in software at very respectable update rates.
This is all made possible, and simple, by the latest crop of single-chip microprocessors that run at 200 MHz core frequencies and have specialized DSP features such as DMA and DSP processing instructions, which let them beat even most dedicated DSP processors at their own game. The latest Microchip PIC32MZ processors even include free and quite capable DSP libraries, so it really is one-stop shopping now. All this power costs less than a good lunch, as these chips are less than $10 each.
PIC32MZ FFT Performance
The MIPS-core-based PIC32MZ processors running at 200 MHz have the potential to do a really fast and big FFT. To find out just how well they do, I fired up my PIC32MZ EF / Connectivity Evaluation board from Microchip Technology (DM320007) to answer a few questions.
The first question is always: “How fast is the FFT?” There is, after all, no point in asking any other questions if a 1024-point FFT takes all day. If the answer to the first question is reasonable, the next question should be: “How big an FFT can I do?”, followed by: “What kind of dynamic range can I expect?”, and finally, more on the implementation side: “How is the FFT scaled, so I can get a calibrated response?”
For the FFT implementation I used the DSP library included with version 1.07.01 (March 2016) of Microchip's Harmony software suite. This library works with the well-known Q15 and Q31 fixed-point formats. If you are unfamiliar with Q15 and Q31, just think of them as signed 16- and 32-bit integers that represent fractions from -1 up to just under +1 [2].
The Harmony libraries are easy to use and well documented. Only two functions need to be called to perform an FFT. An initialization routine must be called first (and only once) to load the twiddle factors for the specific FFT length. Then the FFT function itself is called to actually perform the FFT. For more information on the libraries, open the Harmony Help and search for “DSP”.
The libraries contain both a Q15 and a Q31 version, so I benchmarked both. About the only guidance Microchip gives in the documentation is that the Q15 version will probably be faster due to optimization, and my testing proved that to be true.
I used my PC and a quick C# program I wrote that uses a USB-based serial connection to load the test waveform into the PIC's RAM and then read the results back to the PC for final processing and display.
The test waveform was a sine wave properly scaled to either 16 or 32 bits. Since the sine wave was calculated with .NET doubles, it is more accurate than the final cast to integer. This means that sine wave distortion was not the limiting factor in dynamic range in any test case.
For timing I used the MIPS Core Timer. This counter runs at ½ the core frequency, in this case 100 MHz. The Core Timer was read at the start of the routine and again after the routine finished, and the two counts were subtracted to get the number of counts between calling the routine and its return. This count was then converted to the equivalent time in microseconds.
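The tick-to-time arithmetic is simple enough to show. This sketch assumes the raw counts were already read from the Core Timer on the PIC (for example with XC32's `_CP0_GET_COUNT()` macro; check your compiler headers); the helper name `core_ticks_to_us` is hypothetical:

```c
#include <stdint.h>

/* The Core Timer ticks at half the system clock. Unsigned 32-bit
   subtraction gives the correct tick count even if the counter
   wrapped once between the two reads. */
double core_ticks_to_us(uint32_t start, uint32_t stop, uint32_t sysclk_hz)
{
    uint32_t ticks = stop - start;          /* modulo 2^32 */
    double tick_hz = sysclk_hz / 2.0;       /* 100 MHz at a 200 MHz SYSCLK */
    return ticks * 1.0e6 / tick_hz;
}
```

At 200 MHz each tick is 10 ns, so 100 ticks is 1 microsecond.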
No interrupts were enabled, and the PIC program ran in a big loop where the FFT functions were called in a blocking fashion. I allocated the data arrays at compile time and didn't use anything like malloc() while the program was running. I also used Microchip's free version of the XC32 compiler with the Harmony Framework and no optimizations turned on [3].
The evaluation board is fitted with the largest PIC32MZ available, a PIC32MZ2048EFH144. This device has a whopping 2 MB of program flash and 512 kB of SRAM on board, and this large amount of SRAM allows some rather large FFTs to be run. The whole program, even with the command interpreter I built, took less than 1% of the program flash memory.
Now - On with the show
The first test is naturally to find the FFT speed versus FFT size for both the Q15 and Q31 formats.
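As a rough guide to what a speed-versus-size curve should look like, a textbook radix-2 FFT needs (N/2) * log2(N) butterflies, so doubling N slightly more than doubles the work. This is a generic count with a hypothetical helper name, not a claim about how the Harmony routine is structured internally:

```c
#include <stdint.h>

/* Butterfly count of a textbook radix-2 FFT: (N/2) * log2(N).
   n must be a power of two, at most 2^31. */
uint64_t radix2_butterflies(uint32_t n)
{
    unsigned log2n = 0;
    while ((1u << log2n) < n)   /* find log2(n) for a power of two */
        log2n++;
    return (uint64_t)(n / 2) * log2n;
}
```

For example, going from 1024 to 2048 points raises the count from 5,120 to 11,264 butterflies, a factor of 2.2.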
Table 1 – The FFT speed in microseconds, measured for each FFT size for both the Q15 and Q31 formats. The compiler could only allocate enough memory for a 16k Q15 FFT and an 8k Q31 FFT.
The Harmony documentation states that the FFT initialization function is written in C and is therefore slower. This is OK, as the initialization function only needs to be called once, or again if the FFT size changes.
Table 2 – The Q15 and Q31 initialization functions take longer than the FFT itself, but they only need to be called once, or again if the FFT size changes. The table values are in microseconds.
Figure 1 – The Q15 and Q31 FFT times plotted for a graphical comparison.
Determining the FFT speed for various FFT sizes actually answered two questions, speed and maximum size: as I increased the FFT size, I eventually reached a point where the program crashed when I tried to run the FFT. I was able to coax a 16k Q15 and an 8k Q31 FFT out of the processor before it would not run anymore.
If your application needs to apply a window to the time data, that is another vector array that must be kept in RAM. Vector averaging and display buffering also require vector arrays, and these can quickly eat up all the available RAM as the FFT size increases, so even these FFT sizes may not be achievable in a full-featured application.
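A quick way to budget SRAM is to count full-length complex buffers. This sketch assumes every buffer holds N complex samples (which overestimates a twiddle table of N/2 entries), and the buffer count is something you fill in from your own application, not a figure from the Harmony documentation; the helper name is hypothetical:

```c
#include <stddef.h>

/* Rough RAM estimate for an N-point complex FFT.
   bytes_per_component: 2 for Q15, 4 for Q31 (two components per sample).
   n_buffers: your count of full-length arrays (input, output, scratch,
   twiddles, window, averaging, display...). */
size_t fft_ram_bytes(size_t n, size_t bytes_per_component, size_t n_buffers)
{
    return n * 2 * bytes_per_component * n_buffers;
}
```

With four such buffers, for example, a 16k Q15 FFT and an 8k Q31 FFT each come to 256 kB, and one size larger would need all 512 kB of SRAM by itself.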
Dynamic Range – Or how many bits is that?
Dynamic range in the FFT processing is an important factor in determining whether a Q15 or Q31 format FFT is needed.
For instance, a 12-bit ADC can have around 72 dB of dynamic range with no averaging or processing gain applied, and a 24-bit ADC can have a basic 144 dB of dynamic range.
It wouldn't do you much good to use a 24-bit ADC with a signal processing chain that has only 12 bits of dynamic range. You'd just be in the numerical noise with no way of averaging your way out of it.
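The dB figures above come from 20 * log10(2^N), roughly 6.02 dB per bit. A small C sketch (hypothetical helper names) converts between bits and ideal dynamic range:

```c
#include <math.h>

/* Ideal dynamic range of an N-bit quantity: 20*log10(2^N) dB,
   about 6.02 dB per bit, with no averaging or processing gain. */
double bits_to_db(unsigned bits)
{
    return 20.0 * log10(2.0) * (double)bits;
}

/* Inverse: the number of bits a given dynamic range corresponds to. */
double db_to_bits(double db)
{
    return db / (20.0 * log10(2.0));
}
```

For example, 140 dB corresponds to about 23.3 bits.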
To determine the FFT's dynamic range, I generated a perfect sine wave on my PC and scaled it to either Q15 or Q31 full-scale format. Then I sent this waveform to the PIC and had the PIC perform an FFT on the data. I returned the complex FFT result to the PC, where I converted it to magnitude in dB for display.
Below are the results for a large FFT with a full-scale signal for both the Q15 and Q31 format FFTs.
Figure 2 – The Q15 FFT has about 74 dB of dynamic range from full scale to the numeric noise floor.
Figure 3 – The Q31 format has a more 'spurious' look to its noise, but the full-scale range is right around 140 dB. That is good enough for 16, 18 and probably 24 bits, depending on the application's exact needs.
Note that the 'spurious-looking' noise floor is real and not some internal limiting problem. I backed off the input signal amplitude to make sure that no calculation was saturating, and it had no effect on the amplitude of the noise spurs.
Anybody have a ruler?
As I mentioned at the start, only if the answers to the speed, size and dynamic range questions are satisfactory should you start worrying about the “How is the FFT scaled?” question.
Actually, the Harmony FFT implementation is quite easy to work with. Most FFT implementations that you will find scale the amplitude with the size of the FFT, or 'N'. This means that you have to apply a scale factor proportional to 1/N to the result to get a constant amplitude regardless of the FFT size. Not here: the 1/N scaling has already been applied, so you get the same amplitude output for any size FFT. Points scored for whoever wrote this FFT!
The amplitude that you get back depends on the format you used in the first place. With a Q15 format FFT, a full-scale peak-to-peak sine wave input produces a peak magnitude of (2^15)/2 = 16,384. Similarly, the output of a Q31 format FFT will be (2^31)/2 = 1,073,741,824.
The ½ factor comes from the fact that the output of an FFT of a real signal mirrors positive and negative frequencies: a sine wave splits into two equal components, one at the positive frequency and one at the negative frequency, each with half the amplitude and half of the total power. Since we are normally only concerned with the positive-frequency side, we observe only one of those half-amplitude components.
You can easily apply a constant multiplier to the FFT result to get whatever scale factor you need. Remember that the input signal, if it is a sine wave, will probably be measured as a peak-to-peak value, whereas an FFT magnitude display is usually calibrated to show the RMS value of the input signal at each frequency bin.
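Assuming the 1/N-scaled output described above, where a sine of peak amplitude A reads A/2 in its bin, the constant multipliers for the common amplitude conventions are easy to write down (hypothetical helper names):

```c
#include <math.h>

/* Convert a 1/N-scaled FFT bin magnitude for a sine wave back into the
   sine's amplitude, in the same units as the input samples.
   The bin holds A/2, where A is the sine's peak amplitude. */
double bin_to_peak(double mag) { return 2.0 * mag; }               /* A       */
double bin_to_rms(double mag)  { return 2.0 * mag / sqrt(2.0); }   /* A/sqrt2 */
double bin_to_pp(double mag)   { return 4.0 * mag; }               /* 2A      */
```

To get volts, multiply by your ADC's volts per count; that figure depends on your hardware.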
Also remember that any windowing or zero padding applied to the time series will affect the amplitude of the FFT output, and this must be accounted for with proper scaling. You can check out some previous articles for more about how to do that [4] [5].
Summary
It is simply unbelievable what we can now do with a single-chip microcontroller that costs less than lunch and runs at 200 MHz. It wasn't too long ago that I had a 33 MHz 386 desktop PC, and this little 32-bit PIC can do an FFT faster than that PC could!
Article References
[1] Lock-In Amplifiers, Wikipedia
[2] Q / Fixed Point Notation Formats, Wikipedia
[3] Most of the DSP library code that really needs speed is already provided in object file format or coded in assembly, so using the 'Pro' level compiler will probably not increase the speeds reported here.
[4] Hageman, Steve, EDN Online, June 19, 2012, “Understanding DFT & FFT Implementations”
[5] Hageman, Steve, EDN Online, August 6, 2015, “Real Spectrum Analysis with Octave & MATLAB”, http://www.edn.com/design/test-and-measurement/4440068/Real-Spectrum-Analysis-with-Octave-and-MATLAB
Article By: Steve Hageman www.AnalogHome.com
We design custom analog, RF and embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help on your next project.
This Blog does not use cookies (other than the edible ones).