Lately I have been working on a series of Analytical Instruments that are roughly “Hybridized” Lock In Amplifiers. These designs are hybrids because the purely analog Lock In Amplifier block diagram has been augmented with Digital Signal Processing. The waveforms are digitized, and then DSP techniques (including FFTs) are applied for noise reduction, for reference signal phase comparisons, and for maintaining the quadrature phase lock of the synchronous detector, which is itself a digital processing block. In addition, sophisticated noise filtering can be done entirely in software at very respectable update rates.
All of this is made possible, and simple, by the latest crop of single-chip microprocessors. They run at 200 MHz core frequencies and have specialized features such as DMA and DSP instructions, which let them beat even most dedicated DSP processors at their own game. The latest Microchip PIC32MZ processors even include free and quite capable DSP libraries, so it really is one-stop shopping now. All this power costs less than a good lunch, because these chips are under $10 each.
PIC32MZ FFT Performance
The MIPS-core-based PIC32MZ processors running at 200 MHz have the potential to perform really fast and big FFTs. To find out just how well they do, I fired up my PIC32MZ EF / Connectivity Evaluation board from Microchip Technology (DM320007) to answer a few questions.
The first question is always: “How fast is the FFT?” There is, after all, no point in asking any other questions if a 1024 point FFT takes all day.
If the answer to the first question was reasonable, the next question should be: “How big of an FFT can I do?”, followed by: “What type of dynamic range can I expect?” and finally, perhaps more towards the implementation side is the question: “How is the FFT scaled, so I can get a calibrated response?”.
For the FFT implementation I used the DSP library included with version 1.07.01 (March 2016) of Microchip's Harmony software suite. This library works with the well-known Q15 and Q31 fixed-point formats. If you are unfamiliar with Q15 and Q31, just think of them as signed 16- and 32-bit integers that represent fractions from -1 up to just under +1.
The Harmony libraries are easy to use and well documented. Only two functions need to be called to perform an FFT. An initialization routine must be called first (and only once) to load the twiddle factors for the specific FFT length. Then the FFT function itself can be called to actually perform the FFT. For more information on the libraries, open the Harmony Help and search for “DSP”.
The libraries contain both a Q15 and a Q31 version, so I benchmarked both. About the only guidance Microchip gives in the documentation is that the Q15 version will probably be faster due to optimization, and my testing proved that to be true.
I wrote a quick C# program on my PC that uses a USB-based serial connection to load the test waveform into the PIC's RAM and then read the results back to the PC for final processing and display. The test waveform was a sine wave properly scaled to either 16 or 32 bits. Because the sine wave was calculated with .NET doubles, it was more accurate than the final cast to integer. So sine wave distortion was not the factor limiting dynamic range in any test case.
For timing I used the MIPS core timer. This counter runs at ½ the core frequency, which in this case is 100 MHz. I read the core timer just before calling the routine and again after it returned. Subtracting the two readings gave the number of counts the routine took, and I converted that count to the equivalent time in microseconds.
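The count-to-time conversion is easy to get subtly wrong when the 32-bit counter wraps, so here is a minimal C sketch. The helper name is hypothetical. On the target, XC32 provides the _CP0_GET_COUNT() macro for reading the core timer. That macro is shown only in a comment here, so check your compiler headers before relying on it.

```c
#include <stdint.h>

/* Core timer rate: half of the 200 MHz core clock. */
#define CORE_TIMER_HZ 100000000.0

/* Convert two raw core-timer reads to elapsed microseconds.
   On the target the reads would typically look like:
       uint32_t t0 = _CP0_GET_COUNT();
       ... call the FFT ...
       uint32_t t1 = _CP0_GET_COUNT();
   Unsigned 32-bit subtraction stays correct even if the counter
   wrapped once between the two reads. */
static double core_ticks_to_us(uint32_t start, uint32_t stop)
{
    uint32_t ticks = stop - start;          /* modulo 2^32 */
    return ticks * 1.0e6 / CORE_TIMER_HZ;
}
```

At 100 MHz the counter wraps about every 43 seconds, far longer than any FFT here, so a single wrap is the worst case to handle.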
No interrupts were enabled, and the PIC program ran in one big loop that called the FFT functions in a blocking fashion. I allocated the data arrays at compile time and didn't use anything like malloc() while the program was running. I also used Microchip's free version of the XC32 compiler with the Harmony Framework, with no optimizations turned on.
The evaluation board is fitted with the largest PIC32MZ available, a PIC32MZ2048EFH144. This device has a whopping 2 MB of program Flash and 512 kB of SRAM on board, which allows some rather large FFTs to be run.
The whole program, even with the command interpreter I built, took less than 1% of the program Flash memory.
Now - On with the show
The first test is naturally to find the FFT speed versus size of the FFT for both the Q15 and Q31 formats.
Table 1 – FFT execution time in microseconds for each FFT size, for both the Q15 and Q31 formats. The compiler could only allocate enough memory for a 16k Q15 FFT and an 8k Q31 FFT.
The Harmony documentation states that the FFT initialization function is written in C and is therefore slower. That is acceptable, because the initialization function only needs to be called once, or again if the FFT size changes.
Table 2 – The Q15 and Q31 initialization functions take longer than the FFT itself, but they only need to be called once, or again if the FFT size changes. The table values are in microseconds.
Figure 1 – The Q15 and Q31 FFT times are plotted for a graphical comparison.
Measuring the FFT speed for various FFT sizes actually answered two questions. As I increased the FFT size, I eventually reached a point where the program crashed when I tried to run the FFT. I was able to coax a 16k Q15 FFT and an 8k Q31 FFT out of the processor before it stopped running.
If your application needs to apply a window to the time data, that is another vector array that must be kept in RAM. Vector averaging and display buffering also require vector arrays. Each of these grows with the FFT size, so together they can quickly eat up the available RAM, and even these FFT sizes may not be achievable in a full-featured application.
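A rough RAM budget shows how quickly this adds up. The C sketch below is based on an assumption, not on Harmony's documented buffer layout. It assumes the FFT uses four N-point complex buffers (input, output, twiddles and scratch), with each element holding a real and an imaginary word. Extra N-point real arrays, such as a window, are counted separately. fft_ram_bytes() is a hypothetical helper.

```c
#include <stddef.h>

/* Rough SRAM estimate for one FFT, in bytes.
   Assumption: four N-point complex buffers (input, output, twiddles,
   scratch), each element a real plus an imaginary word of word_bytes
   (2 for Q15, 4 for Q31), plus extra_real_arrays N-point real arrays
   such as a window or an averaging buffer. */
static size_t fft_ram_bytes(size_t n, size_t word_bytes, size_t extra_real_arrays)
{
    size_t complex_buf = n * 2 * word_bytes;
    size_t real_buf = n * word_bytes;
    return 4 * complex_buf + extra_real_arrays * real_buf;
}
```

Under that assumption, a 16k Q15 FFT and an 8k Q31 FFT each need 256 kB. A 32k Q15 FFT would need 512 kB, the device's entire SRAM, before counting the stack or anything else. Adding one Q15 window array to the 16k case costs another 32 kB.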
Dynamic Range – Or how many bits is that?
Dynamic range in the FFT processing is an important factor in determining whether a Q15 or Q31 format FFT is needed.
For instance, a 12-bit ADC can have around 72 dB of dynamic range with no averaging or processing gain applied, and a 24-bit ADC can have a basic 144 dB of dynamic range.
It wouldn't do you much good to use a 24-bit ADC with a signal processing chain that has only 12 bits of dynamic range. You'd just be in the numerical noise with no way of averaging your way out of it.
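The 72 dB and 144 dB figures follow from the usual rule of about 6 dB per bit, that is, 20·log10(2^N) for N bits. Here is a small C sketch of the conversion in both directions. The helper names are hypothetical.

```c
#include <math.h>

/* Ideal dynamic range of an N-bit word: full scale over one LSB, in dB.
   Each bit is worth 20*log10(2), about 6.02 dB. */
static double bits_to_db(int bits)
{
    return bits * 20.0 * log10(2.0);
}

/* The inverse: how many bits a given dynamic range corresponds to. */
static double db_to_bits(double db)
{
    return db / (20.0 * log10(2.0));
}
```

bits_to_db(12) is about 72.2 dB and bits_to_db(24) about 144.5 dB. Going the other way, the roughly 140 dB seen for the Q31 FFT in Figure 3 works out to about 23.3 bits.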
To determine the FFT's dynamic range, I generated a perfect sine wave with my PC and then scaled it to either Q15 or Q31 full-scale format. Then I sent this waveform to the PIC and had the PIC perform an FFT on the data. I returned the complex FFT result to the PC, where I converted it to magnitude in dB for display.
Below are the results for a large FFT with a full-scale signal, for both the Q15 and Q31 format FFTs.
Figure 2 – The Q15 FFT has about a 74 dB full scale to numeric noise dynamic range.
Figure 3 – The Q31 format has a more 'spurious' look to its noise floor, but the full-scale range is just around 140 dB. That is good enough for 16, 18 and probably 24 bits, depending on the application's exact needs. Note that the spurious-looking noise floor is real and not some internal limiting problem. I backed off the input signal amplitude to make sure that no calculation was saturating, and it had no effect on the amplitude of the noise spurs.
Anybody have a ruler?
As I mentioned at the start – only if an affirmative answer is given to the speed, size and dynamic range questions should you start worrying about the “How is the FFT Scaled?” question.
Actually, the Harmony FFT implementation is quite easy to work with. Most FFT implementations you will find scale the amplitude with the size of the FFT, 'N'. This means you have to apply a scale factor proportional to 1/N to the result to get a constant amplitude regardless of the FFT size. Not here: the 1/N scaling has already been applied, so you get the same amplitude output for any size FFT. Points scored for whoever wrote this FFT!
The amplitude you get back depends on the format you used in the first place. If you use a Q15 format FFT with a sine wave input that spans full scale peak to peak, you will get a peak magnitude of (2^15)/2, or 16,384. Similarly, the output of a Q31 format FFT will be (2^31)/2 = 1,073,741,824.
The ½ factor arises because the output of an FFT mirrors positive and negative frequencies. A real sine wave's amplitude is split equally between its positive and negative frequency bins, with ½ the total power in each. Since we are normally concerned only with the positive frequency side, the bin magnitude we observe is ½ of the sine wave's peak amplitude.
You can easily apply a constant multiplier to the FFT result to get any scale factor you need. Remember that a sine wave input signal will probably be measured as a peak-to-peak value, whereas a calibrated FFT magnitude display shows the RMS value of the input signal at each frequency bin.
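As one example of such a constant multiplier, here is a hypothetical C helper that turns a raw Q15 bin magnitude into volts RMS. It assumes the 1/N scaling described above, so that a full-scale sine gives 16,384. It also assumes that Q15 full scale corresponds to a known peak input voltage, vfs_peak, at the ADC. Both the helper name and that calibration constant are assumptions for illustration.

```c
#include <math.h>

/* Raw Q15 bin magnitude -> sine amplitude in volts RMS.
   vfs_peak: peak input voltage that maps to Q15 full scale. */
static double q15_bin_to_vrms(double bin_mag, double vfs_peak)
{
    /* Full scale (2^15) appears as 2^15 / 2 = 16384 in the bin, so this
       undoes both the Q15 scaling and the 1/2 positive/negative split. */
    double vpeak = bin_mag / 16384.0 * vfs_peak;
    return vpeak / sqrt(2.0);            /* sine: RMS = peak / sqrt(2) */
}
```

A peak-to-peak reading converts the same way, using Vrms = Vpp / (2·sqrt(2)).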
Also remember that any windowing or zero padding applied to the time series will affect the amplitude of the FFT output, and this must be accounted for by proper scaling. You can check out some previous articles for more about how to do that.
It is simply unbelievable what we can now do with a single-chip microcontroller that costs less than lunch and runs at 200 MHz. It wasn't too long ago that I had a 33 MHz 386 desktop PC, and this little 32-bit PIC can do an FFT faster than that PC could!
Lock-In-Amplifiers, Wikipedia
Q / Fixed Point Notation formats, Wikipedia
Most of the DSP library code that really needs speed is already provided in object file format or coded in assembly, so using the 'Pro' level compiler will probably not increase the speeds reported here.
Hageman, Steve, EDN Online, June 19, 2012, “Understanding DFT & FFT Implementations”
Hageman, Steve, EDN Online, August 6, 2015, “Real Spectrum Analysis with Octave & MATLAB”
Article By: Steve Hageman www.AnalogHome.com
We design custom: Analog, RF and Embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help on your next project.