Tuesday, March 14, 2017

Auto Generate a 'C' Function Prototype Header File

There is a lot of discussion about the proper way to use header files in embedded C projects. I don't want to wade into that debate; rather, I want to present a tool that is useful for the way I work. If you don't agree with my use model, that's OK; the tool is very adaptable to almost any situation! [1]

I use header files for three things,

  1. One big one that has all the #includes in it so that every module has the proper references to things like: <stdio.h> and <math.h>.
  2. One that has the global program data in it.
  3. One that has the various public function prototypes (signatures) in it.
The first two are relatively easy to make and maintain. They also settle down quickly in a project, since what they contain is defined early on; even with refactoring, they rarely change much.

#3, however, is constantly changing; even in a small project it may grow to many, many function prototypes, and refactoring forces constant updating.

It is tedious to have to change the data in two places when refactoring or when adding functionality through public functions in an embedded C program.

Auto Generation to the Rescue

I knew that someone somewhere must have written a utility that reads a directory, looks through all the '.c' files and auto-generates a header with all the public function prototypes in it.

Sure enough, I found some C code on the InterWebs, written in 1993 by Richard Hipp, that does just that [2].

It's a small program, all in one file and only a few pages long. I brought it into Pelles C [3] to see if I could compile it under Windows 7, and it compiled with only a few simple changes. For code written in 1993, running on Windows with almost no changes seems remarkable to me!

What the program does is read every '.c' (and/or '.h') file in a directory, extract the public functions, and write them as function prototypes into individual '.h' files or one big '.h' file that can be added to the project.

Functions are marked as local (i.e. not for exporting into the '.h' file) by either using the keyword: 'static' or by the use of a 'LOCAL' define (see the program documentation [2]).

The program is wonderfully written, easy to follow and worked straight up out of the box, with one exception.

To read all the '.c' files at once and make a single output file, the program depends on the UNIX shell's ability to do wildcard expansion on the command line. MSDOS does not have that capability, so I had to wrap makeheaders.exe in an MSDOS batch file to make it work the way I wanted to use it.

MakeHeaders is so complete that it can also read a list of file names from a file named on the command line and then make a single big '.h' file. Mr. Hipp thought of everything. The batch file that I used is below.

REM Make one big Header File
dir /B *.c > mkhdr_input.txt
makeheaders -h -f mkhdr_input.txt >AutoGeneratedPrototypes.h
del mkhdr_input.txt
pause

 

Listing 1 – An MSDOS batch file that makes the MakeHeaders program work the way I wanted it to. When run, it makes a temporary file containing the names of all the '.c' files in the current directory, then feeds this list to makeheaders.exe. The program parses all the files, picking out the public function prototypes, and writes them into one big '.h' file.

The batch file of Listing 1 first makes a file called “mkhdr_input.txt” that contains just the file names; the “/B” switch tells dir to use the bare format, which lists the names only.

makeheaders.exe is then fed the mkhdr_input.txt file as input. The '-h' switch causes makeheaders to write one big '.h' file to standard output, which I redirect to the file AutoGeneratedPrototypes.h with the “>” redirect operator.

The '-f' switch tells makeheaders to get its input from the file specified, in this case “mkhdr_input.txt”.

Finally, I clean up by deleting the “mkhdr_input.txt” file.

There are many other options and ways to run the MakeHeaders program, so be sure to check out the documentation [2]. For example, the same list input can be used to make an individual '.h' file corresponding to every '.c' file in the directory. Many people prefer that arrangement, and in very large programs it may be the better choice.

I just add AutoGeneratedPrototypes.h to my master 'include' header file and the rest is automatic.

Perfect Function Prototypes Every Time

Now, any time I refactor or add public functions to any source file, I can just double-click the MakeHeaders.bat file and a new AutoGeneratedPrototypes.h file is made, all ready for compiling.

Extra Bonus

If you want MakeHeaders to create an individual '.h' file for every '.c' file, just use the batch file below.

REM Make A Separate Header File for Each *.C file
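REM %%G is the loop variable; it expands to each '.c' file name in turn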
for %%G IN (*.c) DO MakeHeaders %%G
pause


References:

[1] To find out more about the various ways to use header files, do a Google Search like,
https://www.google.com/search?q=proper+use+of+c+header+files

Then pick a strategy that makes sense for you.

[2] Make Headers program. As of March 2017, the source code and documentation can be found at,
    http://www.hwaci.com/sw/mkhdr/

[3] Pelles C – A very good freeware C Compiler for Windows,
    http://www.pellesc.de/



Article By: Steve Hageman www.AnalogHome.com     

We design custom: Analog, RF and Embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help on your next project.  

This Blog does not use cookies (other than the edible ones). 

Monday, March 6, 2017

Simple Circuits Add to Versatility of the AD9834 Direct Digital Synthesizer IC


The little AD9834 Direct Digital Synthesizer (DDS) made by Analog Devices is a powerhouse of a design that is found in all sorts of products, from radios to power supplies. It is the “NE555” of DDS chips in its popularity [1].

The AD9834 consists of a 28-bit programmable DDS core and a 10-bit current-output DAC. The nominal DAC output current is 0 to 3 mA. This current can conveniently be driven directly into a 200 Ohm load to generate a 0 to 600 mV output voltage (see Figure 1).


Figure 1 – The standard output circuit configuration for the AD9834 with an FSADJ resistor of 6.81k provides a fixed 0 to 600 mV output.

Note: The following circuits are simplified and do not show power supply connections or proper bypassing. Please refer to the parts specific data sheets for complete usage information.

While 0 to 600 mV may be useful in many applications, it is not particularly useful if the output voltage needs to be user adjustable or bipolar. This is especially true when the DDS is used like a function generator, where the end user needs an adjustable amplitude.

Adjustable DAC Full Scale Current

The first approach to adding an adjustable output to the AD9834 is to attack the DAC current-setting resistor. This resistor is nominally 6.81k for a DAC current of 0 to 3 mA. The voltage at pin 1 (FSADJ) is nominally 1.15 Volts, which generates a current of 0.1689 mA in the 6.81k resistor (1.15/6810). This current is scaled by 18 times internally in the AD9834 to arrive at the final 3 mA DAC full-scale current.
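Written out as a single equation, using only the values just quoted, the DAC full-scale current is:

$$ I_{OUT(FS)} = 18 \times \frac{V_{FSADJ}}{R_{SET}} = 18 \times \frac{1.15\ \text{V}}{6.81\ \text{k}\Omega} \approx 3\ \text{mA} $$

so anything that changes the effective current in the FSADJ leg scales the DAC full-scale output current directly, which is what the circuit of Figure 2 takes advantage of.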

The internal design of this circuit lends itself to controlling the DAC full-scale current over a reasonable range, and this can be a useful and inexpensive way to get an analog or digital adjustment of the DDS output voltage.

As shown in Figure 2, a simple precision OPAMP from Linear Technology [2] used in a scaling circuit has been added to the DDS to control the DDS output voltage over a 4:1 range. Maximum DDS output voltage for this circuit is achieved for a 0 Volt control input.

A potentiometer or any 0 to 5V DAC output can be used as the 0 to 5V input to allow complete digital control of the DDS output voltage.



Figure 2 – A simple OPAMP circuit added to the AD9834 gives it a 4:1 output adjustment range for a 0 to 5 Volt input signal. The input signal could come from a potentiometer or from a 0 to 5V DAC.

 
The limiting factor in the maximum achievable adjustment range of the circuit in Figure 2 is the AC performance of the DDS DAC. While the output can be adjusted down from its maximum value, the feedthrough glitches from the DAC switches remain the same, and the linearity of the DAC suffers at lower output levels. Note also that any excess noise on the 0 to 5V control voltage will cause AM modulation on the DDS output, so add filtering as required by your application.

I have used the circuit of Figure 2 for 4:1 adjustability with decent results. Your mileage may vary, so be sure to check the AC parameters that are important in your specific application.

Multiplying DAC on the DDS Output

For the ultimate in digital adjustability, a Multiplying DAC (MDAC) can be used at the output of the DDS to get a 2^14 (16384:1) adjustment range, which works out to better than 80 dB (20*log10(16384) ≈ 84 dB).

The AD5453 family from Analog Devices is a very high bandwidth Multiplying DAC [3] that comes in 8, 10, 12 and 14 bit resolutions. It takes in an AC reference voltage and outputs a current that is scaled by a digital control word.

Most MDACs have a very low -3 dB bandwidth, on the order of 20 kHz; the high-speed AD5453, when used with a suitable output OPAMP, has a -3 dB bandwidth of 10 MHz or better. The maximum attenuation (or how low you can control the output) is flat to 300 kHz at 14 bits, rising to 1 MHz at 12 bits, 3 MHz at 10 bits, and finally 10 MHz at 8 bits.



Figure 3 – Combining a high-speed AD5453 MDAC and an LT1087 dual OPAMP allows very complete control of the AD9834 DDS output.


Note: The 10 uF capacitor sets the low-frequency roll-off. With the 10 uF value shown, the low-frequency -3 dB point is below 2 Hz (the input resistance of the AD5453 VREF pin is 7k Ohms minimum).

Note: The 1.5pF capacitors should be adjusted in the final circuit for maximum output flatness over frequency.

The circuit of Figure 3 provides a +/-5 Volt output with low distortion to 1 MHz and more than 80 dB of output voltage adjustment range. Additionally, the output can be offset from -5V to +5V by driving the Offset Adjust input from a low-cost DAC or pot.

DDS Output Control At Even Higher Frequencies

If you need to operate the AD9834 at even higher frequencies, closer to the maximum specified fundamental output of 37.5 MHz, or even in “Super Nyquist” mode [4], you should look at a 50 Ohm CMOS RF attenuator like those manufactured by Peregrine Semiconductor [5]. The PE43711 covers a frequency range down to 9 kHz and provides 31 dB of control in 0.25 dB steps all the way to 6 GHz. At higher frequencies you will probably be designing around 50 Ohm circuit impedances anyway, so this should not be much of an issue. Multiple PE43711's can be connected in series to get more attenuation in 31 dB chunks.

Note: At lower RF frequencies, less than about 50 MHz, CMOS, SiGe and Silicon based ICs are preferred over GaAs ICs. This is because GaAs ICs typically have worse harmonic distortion (especially 2nd harmonic distortion) at low RF frequencies.

References:

[1] The NE555 timer is arguably the most popular linear IC of all time.

[2] Linear Technology LT1677 Precision and LT1087 High Speed OPAMPS are manufactured by Linear Technology Inc. Now a part of Analog Devices.

[3] Analog Devices Application Note: “Multiplying DACs Excel at Handling AC Signals”.

[4] Super Nyquist Mode, See: Analog Devices Application Note AN-939

[5] Peregrine Semiconductor, Inc




Thursday, January 12, 2017

Numeric optimizations in C# for a faster DFT

Kirk: "More speed Scotty !!!"

Like Capt. Kirk, it seems we engineers are never satisfied and always want more speed!

How to get more speed out of DFT calculations in C#

Currently the open source project DSPLib [1] calculates Discrete Fourier Transforms (DFTs) in double-precision math, so the question can be asked: could using some other data type produce faster DFT calculations? And by faster I mean fast enough to be worth the time spent, say 50% faster; a 10% gain would likely not be worth the effort.

After reading many articles and blogs on how the .NET framework, the CLR (Common Language Runtime) and Intel Processors handle math, one will find seemingly convincing evidence and very simple benchmarks that Double and Float math calculation takes about the same time on a modern processor. There is also some circumstantial evidence that Floating Point Math is as fast as integer math because the Floating Point Unit is so fast in modern CPU's. Likewise Int64's should theoretically be about the same speed as Int32's because people say that Intel processors calculate everything as Int64's and truncate for shorter data types anyhow.

But... How would all this apply to an actual DFT algorithm? A DFT moves around a fair amount of data in arrays. Can a different numeric format speed up the DFT calculations that I specifically want to make?

I didn't know, but I remembered my Mark Twain. He popularized the saying:  

     "There are three kinds of lies: lies, darned lies, and statistics."

Mr. Twain was a humorist, but if he were a programmer he might have changed the saying to,
     "There are three kinds of lies: lies, darned lies, and benchmarks." 

This applies aptly to what I was faced with: one can find many very simple “benchmarks” that people have written to show that one numeric type is faster than another, or that the numeric format makes no difference. The trouble is that these don't store intermediate results or access arrays over and over the way a DFT does. Are they really applicable? I set out to find out…


My DFT Benchmark

I roughed out a basic DFT (all the same loops, arrays and multiplies). I used lookup tables instead of calculating the Sin/Cos terms in the loop, as that is the way I would use them anyway, and I did not do any further optimization or try to help or hinder the C# compiler. I also did not invoke the Task Parallel extensions, which have been shown to immediately provide a 3x improvement in performance even with a dual-core processor [1][2].
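For reference, the Sin/Cos lookup tables that the benchmark below indexes (Int32Sin[] and Int32Cos[]) could be filled in once at program start with something like the following sketch. The table length and the fixed-point scale factor here are illustrative assumptions, not my exact values:

// Sketch only: pre-computed fixed-point Sin/Cos lookup tables used by the benchmark DFTs.
// The 32767 scale factor is an illustrative assumption; any fixed-point scaling works
// for timing purposes, since the benchmark below is not a working DFT.
static Int32[] Int32Sin;
static Int32[] Int32Cos;

static void InitLookupTables(int n)
{
    Int32Sin = new Int32[n];
    Int32Cos = new Int32[n];
    for (int t = 0; t < n; t++)
    {
        double angle = 2.0 * Math.PI * t / n;
        Int32Sin[t] = (Int32)(Math.Sin(angle) * 32767.0);
        Int32Cos[t] = (Int32)(Math.Cos(angle) * 32767.0);
    }
}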


// Note: This is not a working DFT - For simplified timings only.
// It does have the same data structures and data movements as a real DFT.
public Int64[] DftInt64(Int64[] data)
{
    int n = data.Length;
    int m = n;
    Int64[] real = new Int64[n];
    Int64[] imag = new Int64[n];
    Int64[] result = new Int64[m];
    for (int w = 0; w < m; w++)
    {
       for (int t = 0; t < n; t++)
       {
          real[w] += data[t] * Int32Sin[t];
          imag[w] -= data[t] * Int32Cos[t];
       }
       result[w] = real[w] + imag[w];
    }
    return result;
}

Figure 1: The basic code that I used to benchmark my DFT. I wrote one of these for every numeric data type by changing the variables to the type tested. I initialized the input data and the Sin/Cos arrays (Int32Sin[] and Int32Cos[] above) with real Sin and Cos data once at the start of the program. I then called the DFTs over and over again until the elapsed time settled out, which I believe is an indication that the entire program was running in the processor's cache, or at least a stable portion of memory. This procedure was repeated for each DFT length.
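To make that timing procedure concrete, the measurement loop looked roughly like the sketch below. The repeat count and names are illustrative, not my exact test harness; the point is simply to call the same DFT repeatedly under a Stopwatch until the reading settles:

// Sketch of the timing loop described in the Figure 1 caption (illustrative, not the exact harness).
Int64[] testData = new Int64[2000];                 // filled with test data once at program start
var sw = new System.Diagnostics.Stopwatch();
for (int pass = 0; pass < 10; pass++)               // repeat until the elapsed time settles out
{
    sw.Restart();
    Int64[] result = DftInt64(testData);
    sw.Stop();
    Console.WriteLine("Pass {0}: {1} ms", pass, sw.ElapsedMilliseconds);
}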

I wrote DFT routines to calculate in: Double, Float, Int64, Int32 and Int16 data types and compiled release code in VS2015 for x86 and x64 architectures. The results of these tests are presented in figure 2 below.

Figure 2A – The raw results of my real-world benchmark for DFTs. All the times are in milliseconds and were recorded with .NET's Stopwatch class. Two program builds were recorded: one for the x86 and one for the x64 compilation type in Visual Studio 2015.

Figure 2B – Raw timing numbers are a little hard to compare and comprehend. To improve on this, the view above normalizes each row to the Double data type time for that size DFT, which makes it easier to compare any speed gains in the results. For instance: in the x64 Release version, for a DFT size of 2000, the Int16 data type took 0.28:1 the time of the Double data type (10 ms / 36 ms = 0.28).


Int16 Compared to Int32 
The Int16 data type timing is comparable to that of the Int32 for both the x86 and x64 compiled programs. An interesting note is that the Int16, compiled as an x86 program, actually starts to take longer than the Int32 as the DFT size gets really big. This is probably due to the compiled code having to continually cast and manipulate the Int16 values to keep them properly at 16 bits long.

The bottom line is: Int16's are no faster, and in some cases slower, than Int32's, so there is really no point in considering them further in this discussion of speeding up a DFT calculation.

Using an Int16 would also severely limit the dynamic range to less than 96 dB. Many digitizers have better dynamic range than this now. I would consider this a big limiting factor.

Int32 Compared to Double 
The Int32 is much faster than a Double for small DFTs with either compilation (x86 or x64). However, as the DFT size gets really big, the speed difference disappears. This is especially true for the x86 compilation, where a large DFT is actually slower than the equivalent Double DFT. With x64 compilation the Int32 is still faster even at large DFTs, but it too shows this non-linear behavior as the DFT size increases. This non-linear behavior is probably due to the data arrays not fitting in the processor's fastest cache as the arrays get larger.


Interesting Point #1: Benchmark timings can be data array size dependent and in this particular case are non-linear.


Int32 compared to Int64 
For the x86 compilation the Int64 is definitely a non-starter: in all cases tested, the Int64 is actually slower than the equivalent Double calculation, for every DFT size. This makes sense, as all the 64-bit calculations would need to be stitched together from 32-bit registers.

With the x64 compilation the Int32 and Int64 data types are comparable across all DFT sizes.

It probably does not make any sense to favor the Int64 over an Int32, even for the increased dynamic range. An Int32 calculation can yield a 192 dB dynamic range (20*log10(2^32) ≈ 192 dB), which is way more than most applications can use or will ever require.

Interesting Point #2: Somewhat unsurprisingly, when using the x86 compilation, the use of Int64's is very time consuming, especially when compared to the Int32. More surprising is that the Int64 actually takes longer than the equivalent Double.



Float compared to Double 
In both the x86 and x64 compilations the Float is quicker than the Double data type: twice as fast at smaller DFT sizes, becoming comparable at very large DFT sizes. That is not the common result you can glean from the internet; most people's benchmarks and discussions suggest the Double and Float data types will have the same execution time.


Interesting Point #3: Despite what the Internet says, the Float data type is faster than Double in this benchmark, especially for small DFT sizes.



Conclusion 
It is clear that for the fastest DFT calculations the x64 compilation should be used no matter what. This would not seem to be a hardship for anyone as 64 bit Windows 7 (and Win 10?) pretty much rules the world right now and everything from here on out will be at least 64 bits.

Even though the Float data type is usually faster than Double, the clear winner is to use the x64 compilation and the Int32 data type.

At the smaller DFT sizes the Int32 is nearly 3 times faster and even at really large DFT sizes the Int32 is still 25% faster than the Double data type.

Using an Int32 is not a resolution hardship in most real-world cases. Most digitizers output their data as an integer data type anyway, and an Int32 data type can handle any realistic 8 to 24 bit digitizer in use today.

The Int32 data type can provide a dynamic range of 192 dB, which seems to be plenty for a while at any rate. As a comparison, the Float data type provides about 150 dB of dynamic range.


Next Steps 
Now I have some real and verified benchmarks that can lead me to the best bet on improving the real time performance of a .NET based DFT.

The next obvious steps to optimize this further for the 'improved' DSPLib library are:


1) Apply Task Parallel techniques to speed up the Int32 DFT for nearly no extra work, as the DSPLib library already does [1].

2) Having separate Real[] and Imaginary[] intermediate arrays probably prevents the C# array bounds check elimination from being most effective. Flattening these into one array of double the length will probably yield a good speed increase. This, however, needs to be verified (again: benchmark). References [3] and [4] provide some information on how to proceed here; a rough sketch of the idea is shown below.
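As a concrete illustration of step 2, a flattened version of the Figure 1 benchmark might look like the sketch below. Whether this actually helps the bounds-check elimination is exactly what would have to be benchmarked; the layout chosen here (even indices real, odd indices imaginary) is just one reasonable assumption:

// Sketch of the flattened-array idea from step 2 (untested, for illustration only).
// The separate real[] and imag[] arrays become one array of length 2*n, so the
// inner loop touches a single array with simple index arithmetic.
public Int64[] DftInt64Flat(Int64[] data)
{
    int n = data.Length;
    Int64[] realImag = new Int64[2 * n];   // even index = real, odd index = imaginary
    Int64[] result = new Int64[n];
    for (int w = 0; w < n; w++)
    {
        for (int t = 0; t < n; t++)
        {
            realImag[2 * w] += data[t] * Int32Sin[t];
            realImag[2 * w + 1] -= data[t] * Int32Cos[t];
        }
        result[w] = realImag[2 * w] + realImag[2 * w + 1];
    }
    return result;
}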


At least I have now separated the simplistic Internet benchmarks from the real facts as they apply to my specific DFT calculation needs.

The takeaway from all this is: To really know how to optimize even a relatively simple calculation, some time needs to be spent benchmarking actual code and hardware that reflects the actual calculations and data movement required.



References:

[1] DSPLib: An open source FFT / DFT Fourier Transform Library for .NET 4

https://www.codeproject.com/articles/1107480/dsplib-fft-dft-fourier-transform-library-for-net
DSPLib also applies modern multi-threading techniques to get the highest speed possible for any processor as explained in the article above. 

[2] The computer that I ran these benchmarks on is a lower end dual core i7-5500U processor with 8 GB of RAM running 64 Bit, Windows 7 Pro. Higher end processors generally have more on board Cache and will generally give faster results especially at larger DFT sizes.

[3] Dave Detlefs, “Array Bounds Check Elimination in the CLR”


[4] C# Flatten Array

https://www.dotnetperls.com/flatten-array

NOTE: The C# compiler is always being worked on for optimization improvements. C# V7 is just around the corner in January 2017 and it has some substantial changes; these changes may also improve or alter the current optimization behavior. If in doubt, always check what the current compiler does.


Sunday, June 19, 2016

DSPLib – An Open Source FFT Library for .NET 4

There are many Open Source and Commercial implementations around the web and in textbooks for computing Fourier Transforms. Unfortunately most are flawed in a number of ways,
  1. They produce an un-calibrated result that changes depending on the number of points transformed.
  2. They include no built-in methods to scale for windowing of the input data.
  3. They have no proper way to measure noise accurately.
  4. They don't size the returned spectrum to contain just the real (positive frequency) half of the spectrum.
  5. They implement their own Complex number type, ignoring .NET 4's built-in Complex data type.
  6. They aren't complete; you have to add a bunch of helper routines every time.
  7. They have restrictive Open Source licenses.

All of these things take hours of tweaking to get a usable FFT or DFT running from even the best of the currently available libraries.

I decided to solve this problem for myself and my clients by making a pretty complete Fourier Transform Library that implements, 
  1. Properly Scaled Fast Fourier Transforms.
  2. Properly Scaled Discrete Fourier Transforms. 
  3. Properly Scaled Data Windowing.
  4. Proper functions to scale for noise and signals in the correct manner.
  5. Signal Generation for Testing.
  6. Useful Array Math routines.

This work is the culmination of about 5 years' worth of using, revising and tweaking other libraries and implementations, both open source and commercial, before I wrote my own.

DSPLib is the first Fourier Transform Library that can take any time domain signal input, like from an ADC, apply one of the 27 built in window types and produce a correctly scaled Spectrum Output for either signal or noise analysis with no code tweaking required at all.


The library is released under the very non-restrictive MIT License and is essentially royalty free for any use, even commercial.

The complete write up is at codeproject.com – take a look and enjoy never needing to spend hours tweaking Fourier Transform code again.





Monday, April 25, 2016

FFT's meet 200 MHz PIC32MZ Microprocessors

Introduction

Lately I have been working on a series of analytical instruments that are roughly “hybridized” Lock-In Amplifiers [1]. These designs are hybrids because the purely analog Lock-In Amplifier block diagram has been augmented to incorporate Digital Signal Processing functions: the waveforms are digitized, then DSP techniques (including FFT's) are applied for noise reduction, reference-signal phase comparisons, and maintaining the quadrature phase lock of the synchronous detector function, which is also a digital processing block. In addition, sophisticated noise filtering can be accomplished entirely in software at very respectable update rates.

This is all made possible, and simple, by the latest crop of single-chip microprocessors that are capable of running at 200 MHz core frequencies and have specialized DSP functions like DMA and DSP processing instructions, which allow them to beat even most specialized DSP processors at their own game. The latest Microchip PIC32MZ processors even include free and quite capable DSP libraries, so it really is one-stop shopping now. All this power for less than the price of a good lunch, as these chips cost less than $10 each.


PIC32MZ FFT Performance

The MIPS core based PIC32MZ processors running at 200 MHz have the potential to do a really fast and big FFT. To find out just how well they do, I fired up my PIC32MZ EF / Connectivity Evaluation board from Microchip Technology (DM320007) to answer a few questions.

The first question is always: “How fast is the FFT?” There is, after all, no point in asking any other questions if a 1024 point FFT takes all day.

If the answer to the first question was reasonable, the next question should be: “How big of an FFT can I do?”, followed by: “What type of dynamic range can I expect?” and finally, perhaps more towards the implementation side is the question: “How is the FFT scaled, so I can get a calibrated response?”.

For the FFT implementation I used the DSP library included with version 1.07.01 (March 2016) of Microchip's Harmony software suite. This library works on the well-known Q15 and Q31 fixed-point formats. If you are unfamiliar with Q15 and Q31, just think of them as signed 16 and 32 bit integers [2].

The Harmony libraries are easy to use and well documented. Only two functions need to be called to accomplish an FFT: an initialization routine must be called first (and only once) to load up the twiddle factors for the specific length of FFT, and then the FFT function itself can be called to actually perform the FFT. To get more information on the libraries, open the Harmony Help and search for “DSP”.

The libraries contain both a Q15 and a Q31 version, so I benchmarked both. About the only guidance that Microchip gives in the documentation is that the Q15 version will probably be faster due to optimization, and my testing proved that to be true.

I used my PC and a quick C# program I wrote that uses a USB-based serial connection to load the test waveform down to the PIC's RAM and then read the results back to the PC for final processing and display. The test waveform was a sine wave that was properly scaled to either 16 or 32 bits. Since the sine wave was calculated with .NET doubles, its accuracy is better than the final cast, which means that sine wave distortion was not the limiting factor in dynamic range in any test case.
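For illustration, generating the Q15 version of that test waveform on the PC side might look like the sketch below. The cycle count and the use of 32767 as positive full scale are my illustrative assumptions; the USB serial transfer to the PIC is omitted:

// Sketch: build a full-scale Q15 test sine wave from .NET doubles on the PC side.
static short[] MakeQ15Sine(int points, double cycles)
{
    short[] wave = new short[points];
    for (int i = 0; i < points; i++)
    {
        double s = Math.Sin(2.0 * Math.PI * cycles * i / points);
        wave[i] = (short)Math.Round(s * 32767.0);   // the cast happens last, so distortion stays low
    }
    return wave;
}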

For timing I used the MIPS core counter. This counter runs at ½ the core frequency, or in this case 100 MHz. The core timer was read at the start of the routine and again after the routine finished, and the two counts were subtracted to get the count between calling the routine and its return. This count was then converted to the equivalent time in microseconds (at 100 MHz, microseconds = counts / 100).

No interrupts were enabled in the program, and the PIC program ran in a big loop where the FFT functions were called in a blocking fashion. I allocated the data arrays at compile time and didn't use anything like malloc() while the program was running. I also used Microchip's free version of the XC32 compiler with the Harmony framework and no optimizations turned on [3].

The evaluation board is fitted with the largest PIC32MZ available, a PIC32MZ2048EFH144. This device has a whopping 2 MB of program Flash and 512 kB of SRAM on board, and this large amount of SRAM allows some rather large FFT's to be run.

The whole program, even with the command interpreter I built, took less than 1% of the program Flash memory.


Now - On with the show

The first test is naturally to find the FFT speed versus size of the FFT for both the Q15 and Q31 formats.


Table 1 – The FFT speed in micro-seconds was measured for each FFT size for both the Q15 and Q31 formats. The Compiler could only allocate enough memory for a 16k Q15 FFT and 8k Q31 FFT.

The Harmony documentation states that the FFT initialization function is written in C code and, as such, is slower. This is OK, as the initialization function only needs to be called once, or when the FFT size changes.


Table 2 – The Q15 and Q31 initialization functions do take longer than the FFT itself, but they only need to be called once, or when the FFT size changes. The table values are in micro-seconds.


Figure 1 – The Q15 and Q31 FFT times are plotted for a graphical comparison.


Determining the FFT speed for various FFT sizes actually answered two questions: eventually, as I increased the FFT size, I reached a point where the program crashed when I tried to run the FFT. I was able to coax a 16k Q15 and an 8k Q31 FFT out of the processor before it would not run anymore.

If your application needs to apply a window to the time data, that is another vector array that must be maintained in RAM. Vector averaging and display buffering also require vector arrays, and these can quickly eat up all the available RAM, so even these FFT sizes may not be achievable in a full-featured application; RAM gets gobbled up very quickly as the FFT sizes increase.


Dynamic Range – Or how many bits is that?

Dynamic range in the FFT processing is an important factor in determining whether a Q15 or Q31 format FFT is needed.

For instance, a 12 bit ADC can have around 72 dB of dynamic range with no averaging or processing gain applied, and a 24 bit ADC can have a basic 144 dB of dynamic range.

It wouldn't do you very much good to attempt to use a 24 Bit ADC with a 12 Bit dynamic range signal processing chain. You'd just be in the numerical noise with no way of averaging your way out of it.

To determine the FFT's dynamic range I generated a perfect sine wave with my PC and then scaled it to either Q15 or Q31 full-scale format. Then I sent this waveform to the PIC and had the PIC do an FFT on the data. I returned the complex FFT result to the PC, where I converted it to magnitude dB format for display.

Below are the results for a large FFT with a full-scale signal, for both the Q15 and Q31 format FFT's.


Figure 2 – The Q15 FFT has about a 74 dB full scale to numeric noise dynamic range.



Figure 3 – The Q31 format has a more 'spurious' noise look to it, but the full-scale range is just around 140 dB. That is good enough for 16, 18 and probably 24 bits, depending on the application's exact needs. Note that the 'spurious looking' noise floor is real and not some internal limiting problem: I backed off the input signal amplitude to make sure that some calculation was not saturating, and it had no effect on the amplitude of the noise spurs.


Anybody have a ruler?

As I mentioned at the start, only if an affirmative answer is given to the speed, size and dynamic range questions should you start worrying about the “How is the FFT scaled?” question.

Actually, the Harmony FFT implementation is quite easy to work with. Most FFT implementations that you will find scale the amplitude with the size of the FFT, or 'N'. This means that you have to apply a scale factor proportional to 1/N to the result to get a constant amplitude regardless of the FFT size. Not here: the 1/N scaling has already been applied, meaning that you will get the same amplitude output for any size FFT. Points scored for whoever wrote this FFT!

The amplitude that you get back depends on the format that you used in the first place. If you use a Q15 format FFT, for a full scale peak to peak sine wave input, you will get out a peak magnitude of (2^15)/2 or 16,384. Similarly the output of a Q31 format FFT will be (2^31)/2 = 1,073,741,824.

The ½ effect is due to the fact that the actual output of an FFT is a mirror of positive and negative frequencies, with ½ the total power in each half. Since we are normally only concerned with the positive frequency side, we observe that the power is effectively reduced by ½.

You can easily apply a constant multiplier to the FFT result to get any scale factor that you need. Remember that the input signal, if it is a sine wave, will probably be measured as a peak-to-peak value, whereas the FFT magnitude display shows the RMS value of the input signal at each specific frequency bin. A sketch of this kind of PC-side scaling is shown below.
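As an illustration of the PC-side post-processing just described, the following sketch converts a returned complex Q15 FFT result into magnitude in dB relative to full scale, using the 16,384 full-scale value from above. The interleaved (real, imaginary) array layout and the function name are illustrative assumptions about my PC program, which is not listed here:

// Sketch: convert a complex Q15 FFT result (interleaved re, im pairs) to dB
// relative to the Q15 full-scale magnitude of 16,384 described above.
static double[] ToDbFullScale(short[] reIm)
{
    int bins = reIm.Length / 2;
    double[] dbfs = new double[bins];
    for (int k = 0; k < bins; k++)
    {
        double re = reIm[2 * k];
        double im = reIm[2 * k + 1];
        double mag = Math.Sqrt(re * re + im * im);
        dbfs[k] = 20.0 * Math.Log10((mag + 1e-12) / 16384.0);   // small offset avoids log(0)
    }
    return dbfs;
}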

Also remember that any windowing or zero padding applied to the time series will affect the amplitude of the resulting FFT output, and this must be accounted for by proper scaling. You can check out some previous articles for more about how to do that [4][5].


Summary

It is simply unbelievable what we can do now with a single-chip micro-controller that costs less than lunch and runs at 200 MHz. It wasn't too long ago that I had a 33 MHz 386 desktop PC, and this little 32 bit PIC can do an FFT faster than that PC could!


Article References

[1] Lock-In-Amplifiers, Wikipedia

[2] Q / Fixed Point Notation formats, Wikipedia

[3] Most of the DSP library that really needs speed is provided in object file format or coded in assembly already. Using the 'Pro' level compiler will probably not increase the speeds reported here.
 
[4] Hageman, Steve, EDN Online, June 19, 2012, “Understanding DFT & FFT Implementations”

[5] Hageman, Steve, EDN Online, August 6, 2015, “Real Spectrum Analysis with Octave & MATLAB”


