Thursday, January 12, 2017

Numeric optimizations in C# for a faster DFT

Kirk: "More speed Scotty !!!"

Like Capt. Kirk, we engineers, it seems, are never satisfied and always want more speed!

How to get more speed out of DFT calculations in C#

Currently the open source project "DSPLib" [1] calculates Discrete Fourier Transforms (DFTs) in double precision math, so the question can be asked: could using some other data type produce faster DFT calculations? And by faster I mean fast enough to be worth spending the time on – like 50% faster; 10% faster would likely not be worth the effort.

After reading many articles and blogs on how the .NET framework, the CLR (Common Language Runtime) and Intel processors handle math, one will find seemingly convincing evidence, backed by very simple benchmarks, that Double and Float math calculations take about the same time on a modern processor. There is also some circumstantial evidence that floating point math is as fast as integer math because the floating point unit is so fast in modern CPU's. Likewise, Int64's should theoretically be about the same speed as Int32's, because Intel processors reportedly calculate everything as Int64's and truncate for shorter data types anyhow.

But... How would all this apply to an actual DFT algorithm? A DFT moves around a fair amount of data in arrays. Can a different numeric format speed up the DFT calculations that I specifically want to make?

I didn't know, but I remembered my Mark Twain. He popularized the saying:  

     "There are three kinds of lies: lies, darned lies, and statistics."

Mr. Twain was a humorist, but if he was a programmer he might have changed the saying to,
     "There are three kinds of lies: lies, darned lies, and benchmarks." 

This applies aptly to what I was faced with. One can find many very simple "benchmarks" that people have written to show that one numeric type is faster than another, or that the numeric format makes no difference. The trouble is that these don't store intermediate results or access arrays over and over the way a DFT does. Are they really applicable? I set out to find out…

My DFT Benchmark

I roughed out a basic DFT (all the same loops, arrays and multiplies). I used look up tables instead of calculating the Sin/Cos terms in the loop, as that is the way I would use them anyway, and I did not do any further optimization or try to help or hinder the C# compiler from optimizing anything. I also did not invoke the Task Parallel extensions, which have been shown to immediately provide a 3x improvement in performance even with a dual core processor [1][2].

// Note: This is not a working DFT - For simplified timings only.
// It does have the same data structures and data movements as a real DFT.
public Int64[] DftInt64(Int64[] data)
{
    int n = data.Length;
    int m = n;
    Int64[] real = new Int64[n];
    Int64[] imag = new Int64[n];
    Int64[] result = new Int64[m];
    for (int w = 0; w < m; w++)
    {
        for (int t = 0; t < n; t++)
        {
            real[w] += data[t] * Int32Sin[t];
            imag[w] -= data[t] * Int32Cos[t];
        }
        result[w] = real[w] + imag[w];
    }
    return result;
}

Figure 1: The basic code that I used to benchmark my DFT. I wrote one of these for every numeric data type by changing the variables to the type tested. I initialized the input data and the Sin / Cos arrays (Int32Sin[] and Int32Cos[] above) with real Sin and Cos data once at the start of the program. I then called the DFT's over and over again until the elapsed time settled out, which I believe is an indication that the entire program was running in the processor's cache or at least a stable portion of memory. This procedure was repeated for each DFT length.
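For reference, the surrounding timing loop was nothing exotic. A minimal sketch of the harness, with a stub standing in for the Figure 1 routines (the names here are my own, not DSPLib code), looks like this:

```csharp
using System;
using System.Diagnostics;

class DftTimingHarness
{
    // Stand-in for one of the Figure 1 DFT routines; the real benchmark
    // called the actual DFT body shown above.
    public static Int64[] DftStub(Int64[] data)
    {
        Int64[] result = new Int64[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = data[i] * 2;
        return result;
    }

    static void Main()
    {
        Int64[] data = new Int64[2000];
        for (int i = 0; i < data.Length; i++)
            data[i] = i % 100; // any repeatable test pattern

        var sw = new Stopwatch();
        // Call the routine repeatedly until the elapsed time settles out.
        for (int pass = 0; pass < 5; pass++)
        {
            sw.Restart();
            DftStub(data);
            sw.Stop();
            Console.WriteLine($"Pass {pass}: {sw.Elapsed.TotalMilliseconds:F3} mS");
        }
    }
}
```

Each routine was timed this way at every DFT length, taking the settled reading once the numbers stopped changing from pass to pass.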

I wrote DFT routines to calculate in: Double, Float, Int64, Int32 and Int16 data types and compiled release code in VS2015 for x86 and x64 architectures. The results of these tests are presented in figure 2 below.

Figure 2A – The raw results of my real world benchmark for DFT's. All the times are in Milliseconds and were recorded with .NET's StopWatch() functionality. Two program builds were recorded. One for the x86 and one for the x64 code compilation types in Visual Studio 2015.

Figure 2B – Raw timing numbers are a little hard to compare and comprehend. To improve on this, the view above normalizes each row to the Double data type time for that size DFT, which makes it easier to compare any speed gains in the results. For instance: in the x64 release version, for a DFT size of 2000, the Int16 data type took 0.28 times as long as the Double data type (10 mS / 36 mS ≈ 0.28).

Int16 Compared to Int32 
The Int16 data type timing is comparable to that of the Int32 for both the x86 and x64 compiled programs. An interesting note is that the Int16, compiled as an x86 program, actually starts to take longer than the Int32 as the DFT size gets really big. This is probably due to the compiled code having to continually cast and manipulate the Int16 values to keep them properly at 16 bits.

The bottom line is: Int16's are no faster, and in some cases slower, than Int32's, so there is really no point in considering them further in this discussion of speeding up a DFT calculation.

Using an Int16 would also severely limit the dynamic range to less than 96 dB. Many digitizers have better dynamic range than this now. I would consider this a big limiting factor.

Int32 Compared to Double 
The Int32 is much faster than a Double for small DFT's in both compilations (x86 and x64). However, as the DFT size gets really big the speed difference disappears. This is especially true for the x86 compilation, where a large Int32 DFT is actually slower than the equivalent Double DFT. With x64 compilation the Int32 is still faster even at large DFT's, but it too shows this non-linear behavior as the DFT size increases. This non-linear behavior is probably due to the data arrays no longer fitting in the processor's fastest cache as they get larger.

Interesting Point #1: Benchmark timings can be data array size dependent and in this particular case are non-linear.

Int32 compared to Int64 
For the x86 compilation the Int64 is definitely a non-starter: in every case tested it is slower than the equivalent Double calculation – for all DFT sizes! This makes sense, as all the 64 bit calculations would have to be stitched together from 32 bit registers.

With the x64 compilation the Int32 and Int64 data types are comparable across all DFT sizes.

It probably does not make any sense to favor the Int64 over an Int32, even for the increased dynamic range. An Int32 calculation can yield a 192 dB dynamic range, which is far more than most applications can use or will ever require.

Interesting Point #2: Somewhat unsurprisingly, with the x86 compilation the use of Int64's is very time consuming, especially compared to the Int32. More surprising is that the Int64 actually takes longer than the equivalent Double.

Float compared to Double 
In both the x86 and x64 compilations the Float is quicker than the Double data type: about twice as fast at smaller DFT sizes, becoming comparable at very large sizes. That is not the common result that you can glean from the internet, where most benchmarks and discussions suggest the Double and Float data types will have the same execution time.

Interesting Point #3: Despite what the Internet says, The Float data type is faster than Double especially for small DFT sizes in this benchmark.

It is clear that for the fastest DFT calculations the x64 compilation should be used no matter what. This would not seem to be a hardship for anyone as 64 bit Windows 7 (and Win 10?) pretty much rules the world right now and everything from here on out will be at least 64 bits.

Even though the Float data type is usually faster than Double, the clear winner is to use the x64 compilation and the Int32 data type.

At the smaller DFT sizes the Int32 is nearly 3 times faster and even at really large DFT sizes the Int32 is still 25% faster than the Double data type.

Using an Int32 is not a resolution hardship in most real world cases. Most digitizers output their data as an integer data type anyway, and an Int32 can handle any realistic 8 to 24 bit digitizer in use today.

The Int32 data type can provide a dynamic range of 192 dB, which seems to be plenty for a while at any rate. As a comparison, the Float data type provides about 150 dB of dynamic range.
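Those dynamic range figures follow directly from the bit widths, at roughly 6.02 dB per bit. A quick sketch of the arithmetic:

```csharp
using System;

class DynamicRange
{
    // Ideal dynamic range of an N bit integer, using the familiar
    // 20 * log10(2^N), i.e. about 6.02 dB per bit.
    public static double IntegerDynamicRangeDb(int bits) =>
        20.0 * bits * Math.Log10(2.0);

    static void Main()
    {
        Console.WriteLine($"Int16: {IntegerDynamicRangeDb(16):F1} dB"); // ~96 dB
        Console.WriteLine($"Int32: {IntegerDynamicRangeDb(32):F1} dB"); // ~193 dB
        // A Float's 24 bit significand gives roughly 6.02 * 24 ≈ 144 dB of
        // mantissa resolution, in the same ballpark as the ~150 dB above.
    }
}
```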

Next Steps 
Now I have some real and verified benchmarks that can lead me to the best bet on improving the real time performance of a .NET based DFT.

The next obvious steps to optimize this further for the 'Improved' DSPLib library are:

1) Apply Task Parallel techniques to speed up the Int32 DFT, for nearly no extra work, as the DSPLib library already does [1].

2) Having separate Real[] and Imaginary[] intermediate arrays probably prevents the C# compiler's array bounds check elimination from being fully effective. Flattening these into one array of double the single-dimension size will probably yield a good speed increase. This however needs to be verified (again: benchmark). References 3 and 4 provide some information on how to proceed here.
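As a sketch of where those two steps lead (like Figure 1, this is simplified timing code and not a working DFT – a real implementation would index the sin/cos tables by (w * t) mod n):

```csharp
using System;
using System.Threading.Tasks;

class DftSketch
{
    // Same data movement as Figure 1, but with one flattened accumulator
    // array ([0..n-1] = real, [n..2n-1] = imaginary) and the outer
    // frequency loop run with Parallel.For.
    public static Int64[] DftInt32Parallel(Int32[] data, Int32[] sinTab, Int32[] cosTab)
    {
        int n = data.Length;
        Int64[] acc = new Int64[2 * n];
        Parallel.For(0, n, w =>
        {
            Int64 re = 0, im = 0;
            for (int t = 0; t < n; t++)
            {
                re += (Int64)data[t] * sinTab[t];
                im -= (Int64)data[t] * cosTab[t];
            }
            // Safe without locks: each index w is written by exactly one task.
            acc[w] = re;
            acc[n + w] = im;
        });
        return acc;
    }
}
```

Whether the flattened array actually helps the bounds checker is exactly the kind of claim that needs its own benchmark before being believed.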

At least I have separated the simplistic Internet benchmarks from the real facts as it applies to my specific DFT calculation needs.

The takeaway from all this is: To really know how to optimize even a relatively simple calculation, some time needs to be spent benchmarking actual code and hardware that reflects the actual calculations and data movement required.


[1] DSPLib: An open source FFT / DFT Fourier Transform Library for .NET 4
DSPLib also applies modern multi-threading techniques to get the highest speed possible for any processor as explained in the article above. 

[2] The computer that I ran these benchmarks on is a lower end dual core i7-5500U processor with 8 GB of RAM running 64 Bit, Windows 7 Pro. Higher end processors generally have more on board Cache and will generally give faster results especially at larger DFT sizes.

[3] Dave Detlefs, “Array Bounds Check Elimination in the CLR”

[4] C# Flatten Array

NOTE: The C# Compiler is always being worked on for optimization improvements. C# V7 is just around the corner in January 2017 and it has some substantial changes, these changes may also improve or change the current optimization schemes. If in doubt - always check what the current compiler does.

Article By: Steve Hageman 
We design custom: Analog, RF and Embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help on your next project.

This Blog does not use cookies (other than the edible ones). 

Sunday, June 19, 2016

DSPLib – An Open Source FFT Library for .NET 4

There are many Open Source and Commercial implementations around the web and in textbooks for computing Fourier Transforms. Unfortunately most are flawed in a number of ways,
  1. They produce an un-calibrated result that changes depending on the number of points transformed.
  2. They include no built in methods to scale for Windowing of the input data.
  3. They have no proper way to measure noise accurately.
  4. They don't size the returned spectrum to have just the real part of the spectrum.
  5. They implement their own Complex number type, ignoring .NET 4's built in Complex data type.
  6. They aren't complete - you have to add a bunch of helper routines every time.
  7. They have restrictive Open Source Licenses.

All of these things take hours of tweaking to get a usable FFT or DFT running from even the best of the currently available libraries.

I decided to solve this problem for myself and my clients by making a pretty complete Fourier Transform Library that implements:
  1. Properly Scaled Fast Fourier Transforms.
  2. Properly Scaled Discrete Fourier Transforms. 
  3. Properly Scaled Data Windowing.
  4. Proper functions to scale for noise and signals in the correct manner.
  5. Signal Generation for Testing.
  6. Useful Array Math routines.

This work is the culmination of about 5 years worth of work using, revising and tweaking other libraries and implementations, both open source and commercial before I wrote my own.

DSPLib is the first Fourier Transform Library that can take any time domain signal input, like from an ADC, apply one of the 27 built in window types and produce a correctly scaled Spectrum Output for either signal or noise analysis with no code tweaking required at all.

The library is released under the very non restrictive MIT License and is essentially royalty free for any use, even commercial.

The complete write up is at – take a look and enjoy never needing to spend hours tweaking Fourier Transform code again.


Article By: Steve Hageman 


Monday, April 25, 2016

FFT's meet 200MHz, PIC32MZ Microprocessors


Lately I have been working on a series of analytical instruments that are roughly "hybridized" Lock In Amplifiers [1]. These designs are hybrids because the purely analog Lock In Amplifier block diagram has been augmented with digital signal processing: the waveforms are digitized, and DSP techniques (including FFT's) are applied for noise reduction, reference signal phase comparison, and maintaining the quadrature phase lock of the synchronous detector, which is also a digital processing block. In addition, sophisticated noise filtering can be accomplished entirely in software at very respectable update rates.

This is all made possible and simple because of the latest crop of single chip microprocessors that are capable of running at 200 MHz core frequencies and have specialized DSP functions like: DMA and DSP Processing instructions that allow them to beat even most specialized DSP Processors at their own game. The latest Microchip PIC32MZ processors even include free and quite capable DSP libraries so it really is one stop shopping now. All this power for less than the price of a good lunch, as these chips cost less than $10 each.

PIC32MZ FFT Performance

The MIPS core based PIC32MZ processors running at 200 MHz have the potential to do a really fast and big FFT. To find out just how well they do, I fired up my PIC32MZ EF / Connectivity Evaluation board from Microchip Technology (DM320007) to answer a few questions.

The first question is always: “How fast is the FFT?” There is, after all, no point in asking any other questions if a 1024 point FFT takes all day.

If the answer to the first question was reasonable, the next question should be: “How big of an FFT can I do?”, followed by: “What type of dynamic range can I expect?” and finally, perhaps more towards the implementation side is the question: “How is the FFT scaled, so I can get a calibrated response?”.

For the FFT implementation I used the DSP library included with version 1.07.01 (March 2016) of Microchip's Harmony Software suite. This library works on the well known Q15 and Q31 fixed point schemes. If you are unfamiliar with Q15 and Q31, just think of them as signed 16 and 32 bit integers [2].
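If Q notation is unfamiliar, a Q15 value is just a signed 16 bit integer interpreted as a fraction in the range [-1, +1). A minimal C# illustration of the idea (my own sketch, not Harmony code):

```csharp
using System;

class Q15Demo
{
    // Convert a double in [-1.0, +1.0) to Q15: scale by 2^15 and clamp.
    public static Int16 DoubleToQ15(double x)
    {
        double scaled = Math.Round(x * 32768.0);
        if (scaled > 32767.0) scaled = 32767.0;   // +1.0 itself is not representable
        if (scaled < -32768.0) scaled = -32768.0;
        return (Int16)scaled;
    }

    // And back again.
    public static double Q15ToDouble(Int16 q) => q / 32768.0;

    static void Main()
    {
        Console.WriteLine(DoubleToQ15(0.5));   // 16384
        Console.WriteLine(DoubleToQ15(1.0));   // clamps to 32767
    }
}
```

Q31 is the same idea with a signed 32 bit integer and a scale factor of 2^31.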

The Harmony libraries are easy to use and well documented. Only two functions need to be called to accomplish an FFT. An initialization routine must be called first (and only once) to load up the twiddle factors for the specific FFT length. Then the FFT function itself can be called to actually perform the transform. To get more information on the libraries, open the Harmony Help and search for "DSP".

The libraries contain both Q15 and Q31 versions, so I benchmarked both. About the only guidance Microchip gives in the documentation is that the Q15 version will probably be faster due to optimization, and my testing proved that to be true.

I used my PC and a quick C# program I wrote that uses a USB based serial connection to load the test waveform down to the PIC's RAM and then to read the results back to the PC for final processing and display. The test waveform was a sine wave properly scaled to either 16 or 32 bits. Since the sine wave was calculated with .NET doubles before being cast, its accuracy is better than the final fixed point format can represent, which means the sine wave distortion was not the limiting factor in dynamic range in any test case.
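The PC side waveform generation amounts to something like the following sketch (the serial transfer code is omitted, and the method name is my own):

```csharp
using System;

class TestWaveform
{
    // Generate an n point sine wave scaled to Q15 full scale.
    // cycles should be an integer so the FFT sees a coherent signal.
    public static Int16[] MakeQ15Sine(int n, int cycles)
    {
        Int16[] wave = new Int16[n];
        for (int i = 0; i < n; i++)
        {
            // Compute in double precision first and cast last, so the
            // cast - not the math - sets the accuracy floor.
            double s = Math.Sin(2.0 * Math.PI * cycles * i / n);
            wave[i] = (Int16)Math.Round(s * 32767.0);
        }
        return wave;
    }

    static void Main()
    {
        Int16[] w = MakeQ15Sine(1024, 10);
        Console.WriteLine(w.Length); // 1024
    }
}
```

The Q31 version is identical except for the Int32 array and a 2^31 scale factor.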

For timing I used the MIPS core counter, which runs at ½ the core frequency, in this case 100 MHz. The core timer was read at the start of the routine and again after the routine finished; subtracting the two readings gives the counts between calling the routine and its return, which was then converted to the equivalent time in microseconds.

No interrupts were enabled in the program, and the PIC program was running in a big loop, where the FFT functions were called in a blocking fashion. I allocated the data arrays at compile time and didn't use anything like malloc() while the program was running. I also used Microchip's free XC32 compiler with the Harmony Framework and no optimizations turned on [3].

The evaluation board is fitted with the largest PIC32MZ available, a PIC32MZ2048EFH144. This device has a whopping 2 MB of program Flash and 512 kB of SRAM on board; this large amount of SRAM allows some rather large FFT's to be run.

The whole program even with the command interpreter I built took less than 1% of the program flash memory.

Now - On with the show

The first test is naturally to find the FFT speed versus size of the FFT for both the Q15 and Q31 formats.

Table 1 – The FFT speed in micro-seconds was measured for each FFT size for both the Q15 and Q31 formats. The Compiler could only allocate enough memory for a 16k Q15 FFT and 8k Q31 FFT.

The Harmony documentation states that the FFT initialization function is written in C code and, as such, is slower. This is OK, as the initialization function only needs to be called once, or when the FFT size changes.

Table 2 – The Q15 and Q31 Initialization function does take longer than the FFT itself, but this function only needs to be called once or if the FFT size changes. The table values are in micro-seconds.

Figure 1 – The Q15 and Q31 FFT times are plotted for a graphical comparison.

Determining the FFT speed for various FFT sizes actually answered two questions: as I increased the FFT size, I eventually hit a point where program execution crashed when the FFT was run. I was able to coax a 16k Q15 and an 8k Q31 FFT out of the processor before it would not run anymore.

If your application needs to apply a window to the time data, that is another vector array that needs to be maintained in RAM. Vector averaging and display buffering also require vector arrays, and these can quickly eat up all available RAM, so even these FFT sizes may not be achievable in a full featured application; RAM gets gobbled up very quickly as the FFT sizes increase.

Dynamic Range – Or how many bits is that?

Dynamic range in the FFT processing is an important factor in determining whether a Q15 or Q31 format FFT is needed.

For instance a 12 bit ADC can have around 72 dB of dynamic range with no averaging or processing gain applied and a 24 Bit ADC can have a basic 144 dB of dynamic range.

It wouldn't do you very much good to attempt to use a 24 Bit ADC with a 12 Bit dynamic range signal processing chain. You'd just be in the numerical noise with no way of averaging your way out of it.

To determine the FFT's dynamic range I generated a perfect sine wave with my PC and then scaled it to either Q15 or Q31 full scale format. Then I sent this waveform to the PIC and had the PIC do a FFT on the data. I returned the complex FFT result to the PC where I converted it to magnitude dB format for display.

Below are the results for a large FFT with a full scale signal, for both the Q15 and Q31 format FFT's.

Figure 2 – The Q15 FFT has about a 74 dB full scale to numeric noise dynamic range.

Figure 3 – The Q31 format has a more 'spurious' noise look to it, but the full scale range is just around 140 dB. Good enough for 16, 18 and probably 24 bits depending on the application's exact needs. Note that the 'spurious looking' noise floor is real and not some internal limiting problem. I backed off the input signal amplitude to make sure that some calculation was not saturating, and it had no effect on the noise spurs' amplitude.

Anybody have a ruler?

As I mentioned at the start – only if an affirmative answer is given to the speed, size and dynamic range questions should you start worrying about the “How is the FFT Scaled?” question.

Actually the Harmony FFT implementation is quite easy to work with. Most FFT implementations that you will find scale the amplitude with the size of the FFT, or 'N', which means you have to apply a scale factor proportional to 1/N to the result to get a constant amplitude regardless of the FFT size. Not here: the 1/N scaling has already been applied, meaning that you will get the same amplitude output for any size FFT. Points scored for whoever wrote this FFT!

The amplitude that you get back depends on the format that you used in the first place. If you use a Q15 format FFT, for a full scale peak to peak sine wave input, you will get out a peak magnitude of (2^15)/2 or 16,384. Similarly the output of a Q31 format FFT will be (2^31)/2 = 1,073,741,824.

The ½ effect is due to the fact that the actual output of a FFT is a mirror of positive and negative frequencies with ½ the total power in each spectrum. Since we are only normally concerned with the positive frequency side, we observe that the power is effectively reduced by ½.

You can easily apply a constant multiplier to get any proper scale factor that you need to the FFT result. Remember that the input signal, if it is a sine wave will probably be measured as a peak to peak value, whereas the FFT Magnitude display shows the RMS value of the input signal at each specific frequency bin.
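As a concrete example of that constant multiplier idea (C# for illustration; the numbers follow from the Q15 full scale discussion above):

```csharp
using System;

class FftScaling
{
    // Peak bin magnitude that a full scale p-p sine produces in a Q15 FFT.
    const double Q15FullScaleMag = 32768.0 / 2.0;  // (2^15)/2 = 16384

    // One bin's magnitude expressed in dB relative to full scale.
    public static double BinToDbfs(double binMagnitude) =>
        20.0 * Math.Log10(binMagnitude / Q15FullScaleMag);

    static void Main()
    {
        Console.WriteLine(BinToDbfs(16384.0)); // ≈ 0 dB: full scale signal
        Console.WriteLine(BinToDbfs(163.84));  // ≈ -40 dB: signal 40 dB down
    }
}
```

The same approach works for Q31 with a constant of (2^31)/2, and any window or RMS-vs-peak correction folds into the same multiplier.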

Also remember that any windowing applied to the time series, or any zero padding, will also affect the amplitude of the resulting FFT output, and this must be accounted for by proper scaling. You can check out some previous articles for more about how to do that [4][5].


It is simply unbelievable what we can do now with a single chip micro-controller when they cost less than lunch and run at 200 MHz. It wasn't too long ago that I had a 33 MHz, 386 Desktop PC and this little 32 Bit PIC can do a FFT faster than that PC could!

Article References

[1] Lock-In-Amplifiers, Wikipedia

[2] Q / Fixed Point Notation formats, Wikipedia

[3] Most of the DSP library that really needs speed is provided in object file format or coded in assembly already. Using the 'Pro' level compiler will probably not increase the speeds reported here.
[4] Hageman, Steve, EDN Online, June 19, 2012, “Understanding DFT & FFT Implementations”

[5] Hageman, Steve, EDN Online, August 6, 2015, “Real Spectrum Analysis with Octave & MATLAB”

Article By: Steve Hageman

Friday, February 5, 2016

Decoupling RF Circuits - Part 1

Several weeks ago I saw an RF circuit that scared me from the "measurement repeatability" standpoint. The circuit in question had RF at one end and DC at the other; simple enough, and any number of modern devices contain such circuits, whether it be a Bluetooth module, a PLL synthesizer IC, a simple RF switch or, as in this case, one of those very useful little RF power to DC converter IC's.

The design intent here was to make a decent RF power measurement at the input to the circuit and then digitize the output and pass that result on to a controller. The design was straightforward enough. The board was laid out with a reasonable SMA edge launch connector and passable RF trace design to the IC. What caught my eye however was the DC portion of the circuit layout. This should have been the easy part, but as it turns out the physical implementation of the DC circuit had a very large impact on the circuit's RF performance.

Figure 1 – A simplified representation of the physical layout. Straightforward enough, but as it turns out the DC portion of the layout had a very large impact on the RF performance of this particular circuit.
The DC output trace looked exactly like the RF input trace. Now, the RF input trace would be properly terminated by whatever is connected to the SMA input on one end and by the IC plus matching network on the other end. If you remember your transmission line theory, a matched transmission line will have a low VSWR (also known as low reflection) and exhibit very little amplitude ripple due to mismatch, and hence will have low measurement uncertainty.

An improperly matched transmission line will act like a resonant structure with the potential for an amazingly large Q. The DC output trace was clearly the same width as the RF input trace, and it was also longer (about 0.6” long). Since the operating frequency range of the circuit was upwards of 10 GHz, the supposedly DC output trace will likely resonate in the band of operation. Schematically, the equivalent circuit can be modeled as in Figure 2.

Figure 2 – Above is a schematic representation of the circuit and below is the simplified transmission line equivalent. Estimating 0.1 to 0.2 pF of coupling across a small IC die like this seems entirely likely, and at high frequencies this is certainly a significant amount of coupling. The 3 mm QFN also adds effective length to the circuit, so it has to be taken into account if one wants to simulate the resonance peaks at the proper frequencies.

I hooked the circuit board up to my network analyzer, measured the S11 and found the expected result: the input match was not well controlled and did indeed show all the characteristics of coupling to a high Q structure (Figure 3).

Figure 3 – The actual measured S11 (Input Match) to the circuit showed the expected and undesired coupling. The actual measured S11 was less than -5dB in a couple of spots. That poor of a match (under 3 dB!) can add quite a bit of amplitude uncertainty to any measurement.

To make sure that the coupling was indeed caused by the DC output trace and not some other problem, I removed the QFN and replaced it with two 100 ohm 0402 resistors connected in parallel to the ground plane on either side of the RF input trace, directly at the QFN pads. This gives a very low inductance and capacitance 50 ohm load for analyzing just the RF input trace; I also modified the input matching to account for this. The results of measuring just the RF input trace are shown in Figure 4.

Figure 4 – Repeating the S11 measurement with the IC replaced by two 100 Ohm 0402 resistors in parallel directly at the QFN input pins shows that the RF portion of the circuit was quite well behaved and actually had an outstanding match all the way up to 10 GHz.

Repeating the measurement a third time with just a 0.2 pF capacitor bridged from the RF trace to the DC output showed essentially the same results as Figure 3, thus confirming that the cause of the bad match was the DC output trace resonating. A simulation of the transmission line equivalent circuit of Figure 2B also showed about the same response as the actual measurements, as shown in Figure 5.

Figure 5 – Simulating the equivalent circuit of Figure 2B yields essentially the same response as was measured; the first peak is at about 4 GHz, as measured. The continued sharp resonances in the simulation are because this simple linear simulation does not include real PCB and circuit losses, hence the simulated Q's are higher than actual.

Decoupling Is The Solution

The basic solution to this problem is to decouple the DC output trace properly.

Step #1: When designing the PCB in the first place, narrow the trace to a more reasonable width for a DC signal. Narrowing this trace from 20 mils to 6 mils changes its impedance from 50 ohms to around 95 ohms. There was simply no reason to make the DC trace the width of a 50 ohm line other than that it "looked pretty". A narrower trace has more loss at high frequencies and, being a higher impedance, is easier to decouple.
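To put rough numbers on that, the familiar IPC style microstrip approximation can be used. This is a sketch only - the stackup values (10 mil dielectric, 1 oz copper, FR-4 with εr ≈ 4.3) are my assumptions, and the formula is only an approximation, so treat the output as showing the trend (narrower trace = higher impedance) rather than exact values:

```csharp
using System;

class MicrostripZ0
{
    // IPC-2141 style microstrip impedance estimate. Dimensions in mils:
    // w = trace width, h = dielectric height, t = copper thickness.
    public static double Z0(double w, double h, double t, double er) =>
        (87.0 / Math.Sqrt(er + 1.41)) * Math.Log(5.98 * h / (0.8 * w + t));

    static void Main()
    {
        // Assumed stackup: 10 mil dielectric, 1.4 mil (1 oz) copper, FR-4.
        Console.WriteLine($"20 mil trace: ~{Z0(20, 10, 1.4, 4.3):F0} ohms");
        Console.WriteLine($" 6 mil trace: ~{Z0(6, 10, 1.4, 4.3):F0} ohms");
    }
}
```

The exact numbers shift with the real stackup, but the roughly 2:1 impedance jump from narrowing the trace is what makes the decoupling job easier.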

Step #2: Keep the RF off the trace as best we can in the first place. This can be accomplished by decoupling the DC output from the trace, and this solution can also be hacked onto an already built PCB as a band-aid.

Decoupling essentially puts some sort of low pass filter between the IC and the output trace to keep the RF off the trace in the first place. This can be accomplished many ways; here I will present one of my favorites: simple RC filtering.
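Working out the corner frequency of such a filter is simple arithmetic. A sketch for one 511 ohm / 1000 pF section (the two cascaded sections in Figure 6 push the overall corner somewhat lower):

```csharp
using System;

class RcFilter
{
    // -3 dB corner of a single RC low pass section: f = 1 / (2 * pi * R * C).
    public static double CornerHz(double ohms, double farads) =>
        1.0 / (2.0 * Math.PI * ohms * farads);

    static void Main()
    {
        double f = CornerHz(511.0, 1000e-12);
        Console.WriteLine($"{f / 1000.0:F0} kHz per section"); // ~311 kHz
    }
}
```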

By adding a simple and low cost RC low pass filter as close as possible to the IC (Figure 6), RF can be effectively kept off the output trace, greatly reducing the coupling.

Figure 6 – Modifying the original circuit by adding R2, R3, C1 & C2 to decouple the RF energy at the IC from getting on the long output trace and causing resonances can be accomplished by one of my favorite techniques shown above. I size the resistors and capacitors based on the technology that the majority of the rest of the design is using, normally this means either 0603 or 0402 sized parts.

Figure 7 shows the measurement results of this simple addition when it was hacked on to the test PCB.

In circuits like the one in Figure 6, I usually start with 200 to 511 ohm resistors, as they are relatively flat over a large frequency range such as this, and the largest C0G capacitor I can find in whatever size the design is using - in this case a 1000 pF / 0603 sized part. C0G temperature coefficient parts typically have better RF performance because of lower losses.

The circuit of Figure 6 also adds a voltage divider that must be taken into account in the overall final design, in this case the A/D's gain was adjusted to compensate for the extra drop across the two 511 ohm resistors. Simulation also showed that the resistors can be made about 200 ohms each with nearly the same decoupling result.

Of note: normally we would like to add a capacitor directly at every DC pin on the IC, but this particular pin is an OPAMP output and we don't want to make the OPAMP unstable by adding a capacitor directly at its output.

Finally, to test the decoupling circuit itself and make sure that no unruly resonances are occurring there, I connected SMA cables to both sides of the DC output trace and did an S21 through line measurement. The results, shown in Figure 8, show a nicely damped response with no funny resonances.

Variations on a Theme

There are naturally as many possible variations on making a suitable decoupling network as there are designers, and there are many different circuit requirements too. For instance, the RC decoupling circuit of Figure 6 would not work very well for a DC bias line, as it would produce a lot of voltage drop when the current gets into the milliamp range. Likewise, the 3 dB bandwidth of Figure 6 is just a little over 100 kHz, so using this decoupling network on an I2C or SPI line won't work very well either unless the clock frequency is very low. So there are many requirements to consider before selecting the actual circuit values.

Figure 7 – Adding Figure 6's decoupling to the original circuit tamed the resonance on that long, wide DC trace to a very reasonable level. There is still coupling, but the input match is much better behaved (better than 17 dB everywhere). All this without even touching the actual input trace or match at all.

Figure 8 – S21 measurement of the through response from one side of the long DC trace to the other shows a nicely damped response with no troublesome resonances anywhere. The reason the response is rising instead of falling is that at high frequencies the 1000 pF decoupling capacitors are actually inductive because of their package parasitics. 0402 sized parts have less package inductance and may be more appropriate for higher frequency circuits.
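The frequency where a capacitor turns inductive is its self-resonant frequency (SRF), set by the capacitance and the package's equivalent series inductance (ESL). The ESL values below are assumed typical figures for these package sizes, not measurements from this board.

```python
import math

def srf_hz(c_farads, esl_henries):
    """Series self-resonant frequency of a capacitor with package ESL.
    Above this frequency the part looks inductive, as seen in Figure 8."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_henries * c_farads))

# ESL values are ASSUMED typical package inductances, not measured:
for size, esl in (("0603", 0.9e-9), ("0402", 0.5e-9)):
    print(f"{size}: 1000 pF SRF ~ {srf_hz(1000e-12, esl) / 1e6:.0f} MHz")
```

With roughly a nanohenry of package inductance, a 1000 pF part resonates well below 1 GHz, which is why the measured S21 rises across the band; the smaller 0402 package pushes the SRF usefully higher.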

Appendix: Sometimes finding the culprit resonance is not so easy

Sometimes it is not clear on a complex PCB what is causing the problem resonances; in these cases a bit of detective work is required. Remember that a poorly terminated transmission line will resonate when the frequency reaches a quarter wavelength, and at every quarter wavelength after that it will resonate again with the opposite impedance.

Start off by measuring the resonant peak frequency (or, if there are multiple peaks, the delta frequency between peaks), then calculate the transmission line length that would produce these peaks and start looking for trace lengths on the PCB that match the calculation; pretty rapidly you will find the culprit. Sometimes you can also use the "move your finger around touching things" method.
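The length calculation above can be sketched as follows. Since the resonance pattern repeats every half wavelength, the peak spacing satisfies delta_f = v / (2 * L). The effective permittivity of 3.4 for FR4 microstrip is an assumed typical value.

```python
import math

C0_INCHES = 11.803e9  # speed of light, inches per second

def culprit_length_inches(delta_f_hz, eps_eff=3.4):
    """Estimate the resonator length from the spacing between repeating
    resonance peaks: peaks repeat every half wavelength, so
    delta_f = v / (2 * L)  ->  L = v / (2 * delta_f).
    eps_eff ~ 3.4 is an ASSUMED effective permittivity for FR4 microstrip."""
    v = C0_INCHES / math.sqrt(eps_eff)  # propagation velocity, inches/s
    return v / (2.0 * delta_f_hz)

# Hypothetical example: peaks spaced every 850 MHz
print(f"Culprit is roughly {culprit_length_inches(850e6):.1f} inches long")
```

Armed with that length estimate, a ruler and the PCB layout usually point to the offending trace or cable quickly.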

If you sweep up from a low frequency and your circuit is fine until suddenly a resonance appears, you are probably seeing a first quarter-wavelength effect (also see Reference [1]).

On the other hand, if you are sweeping across some very high frequency band and you see a repeating ripple pattern, then you are probably seeing a half-wavelength effect. At every half wavelength, the impedance of whatever is resonating circles around to the same impedance again, hence giving rise to a repeating pattern.

In practice I have found problems ranging from a 7 inch cable that ran off the PCB and then back on again, causing a resonance, down to traces like this example PCB's that were only a few tenths of an inch long.

Some common approximate frequencies and quarter wavelengths for microstrip on FR4 are given in the table below,

The table can be written as a formula,

where the microstrip quarter wavelength on FR4 is in inches and F is the frequency in GHz.
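A quick sketch of that calculation is below. The effective permittivity of 3.4 is an assumed typical value for FR4 microstrip; the real value depends on the trace geometry and the exact FR4 blend, so treat the results as ballpark numbers.

```python
import math

C0_INCHES = 11.803e9  # speed of light, inches per second

def quarter_wave_inches(f_ghz, eps_eff=3.4):
    """Approximate quarter wavelength of FR4 microstrip, in inches.
    eps_eff ~ 3.4 is an ASSUMED effective permittivity."""
    v = C0_INCHES / math.sqrt(eps_eff)  # propagation velocity, inches/s
    return v / (4.0 * f_ghz * 1e9)

for f in (0.5, 1.0, 2.0, 5.0):
    print(f"{f:4.1f} GHz -> {quarter_wave_inches(f):.2f} inches")
```

At 1 GHz this works out to roughly 1.6 inches, consistent with the usual rule of thumb that the quarter wavelength in inches is about 1.6 divided by the frequency in GHz.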

Figure A.1 – An example of the kind of repeating pattern that can be seen on a transmission line. Here an open-circuited line was simulated. The Y axis is impedance magnitude in ohms and the X axis is frequency.

The blue arrows on the bottom mark off ¼ wavelengths. As can be seen, an open-circuited transmission line resonates every ¼ wavelength, but the impedance at each ¼ wavelength alternates from high impedance to low impedance and back again.

The red arrows at the top mark off ½ wavelength segments. On this open circuited line the ½ wavelength repeating pattern is at a very high impedance and there is also a repeating ½ wavelength pattern at low impedances (the peaks on the bottom of the graph, not marked).

As a note, a short-circuited line has exactly the same pattern, just reversed in impedance. That is, instead of starting at a high impedance at low frequencies, it starts at a low impedance, and the first ¼ wave resonant peak is at a high impedance, not the low impedance shown here. From there the pattern repeats the same as in this simulation.
The simulated response shown in Figure A.1 is for some randomly selected line length. Note that a longer line would produce peaks closer together while a shorter line would produce peaks that are farther apart. 
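The Figure A.1 behavior can be reproduced with the standard input-impedance formula for an open-circuited line, Zin = Z0 / tanh(gamma * l). The line parameters below are arbitrary illustrative values (not from the article's simulation); a small loss term keeps the resonant peaks finite, as in any real line.

```python
import cmath
import math

# Arbitrary illustrative line parameters (ASSUMED, not from the article):
Z0 = 50.0      # ohms, characteristic impedance
LENGTH = 0.1   # meters, physical line length
V = 2.0e8      # m/s, assumed propagation velocity
ALPHA = 0.05   # Np/m, small loss so resonant peaks stay finite

def zin_open(f_hz):
    """Input impedance of an open-circuited lossy line: Zin = Z0/tanh(gamma*l)."""
    beta = 2.0 * math.pi * f_hz / V          # phase constant, rad/m
    gamma = complex(ALPHA, beta)             # propagation constant
    return Z0 / cmath.tanh(gamma * LENGTH)

# Quarter wave at v/(4*l) = 500 MHz: impedance dips to a deep minimum.
# Half wave at 1 GHz: impedance swings back up to a high peak.
print(abs(zin_open(500e6)), abs(zin_open(1e9)))
```

Sweeping zin_open over frequency reproduces the alternating low/high ¼ wavelength pattern of Figure A.1; flipping the formula to Z0 * tanh(gamma * l) gives the short-circuited case described above, with the pattern reversed in impedance.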


[1] A resonance effect in an RF circuit can also be caused by any shielding or box that the circuit is put into. You can read more about that here,

By: Steve Hageman
We design custom analog, RF and embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help on your next project.
This Blog does not use cookies (other than the edible ones).