
Monday, September 30, 2019

Make Your Code More Assertive!




Most, if not all, 32-bit processors have a software instruction to fire a ‘Breakpoint’, and the MIPS core used by Microchip PIC32 processors is no different.

This can be used to make a “Debug Only Breakpoint” that serves as a useful alternative to the standard C ‘assert()’.

Assert is a method of adding check code to your program that verifies assumptions about the state of variables or program status, flagging problems or errors.

Using assertions can dramatically reduce programming errors, especially the errors that occur when libraries are being used [1][2].

Asserts in a PC environment are pretty easy to use, as there is normally a console window available, or a disk file to log any asserted problems to.

In a small embedded system neither of these things is typically available. However, when developing and testing code, a programmer / debugger is normally attached to the system, and this can be used as the window into the system's operation.

What is needed is an assert macro that fires a software breakpoint and halts the program when the system is being debugged, and ideally generates no code in a production build. When using MPLAB-X and XC32, Microchip's preferred development IDE and compiler, this is easy to do.

When building a Debug image, MPLAB-X via XC32 defines a name: “__DEBUG” (two leading underscores). This defined name can be used to make the assert macro work two ways: when debugging, the macro builds in the check, and when not debugging, the macro generates no code, as shown below.
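A minimal sketch of the idea (my reconstruction, not necessarily the exact code in the figure; it assumes XC32's `__builtin_software_breakpoint()` builtin, and the `abort()` fallback exists only so the sketch also compiles on a host compiler):

```c
#include <stdlib.h>

#ifdef __DEBUG
    /* Debug build: a failed check halts the program on the MIPS
       software breakpoint, so the debugger stops right at the fault. */
    #ifdef __XC32
        #define assert_dbg(exp) \
            do { if (!(exp)) __builtin_software_breakpoint(); } while (0)
    #else
        /* Host fallback so this sketch compiles off-target too. */
        #define assert_dbg(exp) \
            do { if (!(exp)) abort(); } while (0)
    #endif
#else
    /* Production build: the macro generates no code at all. */
    #define assert_dbg(exp) ((void)0)
#endif
```

Note that, just like the standard assert(), the checked expression disappears entirely in a production build, so it must not contain side effects your program depends on.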




A simple macro that mimics the standard C assert() call for any PIC32 based project. The macro generates a check that calls the MIPS software breakpoint if the project was built with ‘__DEBUG’ defined. This name is automatically defined by MPLAB-X when a debug image is built.

You can put code similar to the above into a C header file. Our new macro then works just like any standard C assert() macro,

   - assert_dbg(exp) where exp is non-zero does nothing (a non-assertion).

   - assert_dbg(exp) where exp evaluates to zero, stops the program on a software
     breakpoint if debugging mode is active in MPALB-X.

I chose to name the macro “assert_dbg()” so that the association with a standard assert would be easy to remember (it works the same way), and I added the ‘_dbg’ suffix as a reminder to the programmer that this is: “A standard assert, but slightly different”.

Naturally you can use any name you like.

Bonus Macro

While reading an article by Niall Murphy on the Barr Group website [3], I ran across a comment by ‘ronkinoz’ that shared another clever / useful debugging macro - I will call it ‘assert_compile()’ here.

What assert_compile() does is check things that can be verified at compile time; if the assert fails, the compiler halts with an error.

The macro works by trying to define a typedef array with a size of one char if the assert passes; if the assert fails, the array is sized ‘-1’, which GCC halts on as a compile error.

This can be very useful in checking the size of known instances against known system constraints for instance.

One thing that often causes problems is making an EEPROM structure, like a Cal Constant table, larger than its allocated space. This often happens when a project is revised repeatedly to add new features, or as a product is developed and it is discovered that the calibration routines need to change.

You can catch things like a table growing too large and possibly overwriting other EEPROM allocations, because the size of fixed objects is known to the compiler at compile time.

My first use is to peg my code to a compiler version. That way in a year or so if I forget to read my notes and try to compile the code under a different version, I will get a rude warning to take a careful look before continuing.

The macro is presented here,
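A sketch of the macro (my reconstruction of the idea described above; the two-level `__LINE__` paste is an embellishment I added so the macro can be used more than once per file):

```c
/* Two-level paste so __LINE__ expands before concatenation. */
#define ASSERT_CONCAT_(a, b) a##b
#define ASSERT_CONCAT(a, b)  ASSERT_CONCAT_(a, b)

/* If 'exp' is true the typedef is a harmless one-char array type;
   if it is false the array size is -1 and GCC stops with an error. */
#define assert_compile(exp) \
    typedef char ASSERT_CONCAT(assert_compile_, __LINE__)[(exp) ? 1 : -1]
```

For example, `assert_compile(sizeof(CalTable_t) <= EEPROM_CAL_SPACE);` (both names hypothetical) stops the build the moment a revision pushes the table past its EEPROM allocation, and something like `assert_compile(__GNUC__ == 4 && __GNUC_MINOR__ == 8);` pegs the code to a compiler version.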



References:

[1] https://www.microsoft.com/en-us/research/publication/assessing-the-relationship-between-software-assertions-and-code-qualityan-empirical-investigation/

[2] McConnell, Steve, “Code Complete, A Practical Handbook of Software Construction”, 2nd edition, page 189: “Assertions”. Microsoft Press, 2004.

[3]  https://barrgroup.com/comment/7#comment-7



Article By: Steve Hageman / www.AnalogHome.com

We design custom: Analog, RF and Embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help on your next project.

Note: This Blog does not use cookies (other than the edible ones).

Saturday, August 10, 2019

Microchip XC32 Compiler Optimization


It is often quipped that the humorist Mark Twain once said,

    “There are lies, darn lies and statistics”

If Mr. Twain was a programmer he might have said something to this effect,

    “There are lies, darn lies and benchmarks”

Benchmarking has a long history of being bent into something that produces: “a good number, that will have nothing to do with your actual situation”. This is especially true now that we have embedded processors with caches and all sorts of other performance enhancing and limiting factors.

In the 1980s there was one benchmark test that really took hold. It was called the “Sieve of Eratosthenes”, and it was a numerical / prime number test used quite a bit to see which compilers produced the 'fastest' code [1].

This benchmark really just looked at one aspect of performance: calculating prime numbers. It didn’t test graphics, IO speed or a host of other factors that are also important in real-world applications. So it was very narrow in its scope.

It did gain such prominence that the compiler writers of the era actually wrote optimizers that would detect this code sequence and apply special techniques to get the fastest possible performance on this specific benchmark test. There was nothing wrong in doing this – it is kind of like producing cars with a fast 0-60 MPH acceleration figure – they can detect ‘pedal to the metal’ and change the shift points to get the fastest 0-60 time, which is great for bragging rights, but this is a rarely used sequence in real-life driving. Much the same is true, in my opinion, of most modern microprocessor benchmarks.

What's the point then?

This ‘Optimization’ project started for two reasons,

1) I was working on a battery powered image processing application and I wanted to see what the effects of the various XC32 optimization levels were, to find out if switching to a higher optimization level would let me lower the CPU clock speed and thus save a significant amount of power.

2) To just see, generally, what the various Pro XC32 optimization levels do compared to the free version. It has been noted online that some people feel they are being ‘cheated’ out of ‘significant’ performance improvements with the free version of the XC32 compiler. So, we’ll take a look at that.

Note: You may know that Microchip provides a free compiler for its PIC32 processor series called XC32. It is currently based on GCC 4.8, and the free version provides the -O0 and -O1 optimization levels. The ‘paid’ version includes support and the other GCC optimization levels: -O2, -Os and -O3.

The XC32 optimization levels are not exactly the same as the standard GCC levels but they roughly follow. The XC32 2.0x manual states,


Some Notes

I used XC32 versions 2.05 and 2.10 for these tests (these versions perform the same in my tests; the differences only add some new devices and fix some corner-case defects, as can be seen in the release notes).

When debugging your program logic it is useful to set the optimization level to -O0, as this produces pretty much 1:1 code with your C source, making it easy to follow the logic and look at variables. Inlining of functions is also disabled, so functions appear as you wrote them.

-O1 is the standard optimization level for the free version of XC32, and even this level inlines functions and aggressively removes variables that can be kept on the stack or in a register, making your code run much faster and be smaller, but also making debugging very hard to follow. This is the optimization level you most likely want for your ‘Release’ code; after all, who doesn’t want faster, smaller code?

Many standard libraries, like the MIPS PIC32 DSP libraries, are written in hand-optimized MIPS assembly language and hence bypass the C compiler completely, so you get the fastest possible performance even with the free version of the compiler.
 

The -O2, -Os and -O3 optimizations are only available with the paid version of XC32, which is quite inexpensive at less than $30.00 per month and a must for professional developers, if only for the included, expedited support that the license also provides.

[2020 Update] XC32 Version 2.40 and above now includes the -O1 and -O2 optimizations in the free version. (See the release notes).

My hardware test setup for all these tests is a PIC32MZ2048EFH processor running at 200 MHz clock speed.

On to the Benchmarking

One of the standard benchmarks used with advanced 32-bit processors is the ‘CoreMark’ [2]. This is a multifaceted benchmark that tries to simulate many different aspects of an actual application, yet in the end produce a single performance number. When I looked at this I found that the code was quite small and didn’t move a lot of variables around, so in execution it probably spends most if not all of its time in any modern 32-bit processor's cache.

Another application that I used in my benchmarking was a custom image processing application that I wrote. It is not big in the sense of requiring a large program-memory footprint, but it does use upwards of 57,000 bytes of data as it processes a 32 x 24 image up to 320 x 240 for display, and along the way scales, normalizes, applies automatic gain control, maps the resulting image to a 255-step color map, etc. So there is quite a bit of data manipulation along the way, and all the data cannot fit in the PIC32MZ’s 16k data cache at once.

The last application I used was a relatively large application provided by Microchip as a demo program that represents a complete application with extensive graphics, etc. This application was compiled just to see the relative code sizes that the XC32 compiler produces, as I thought the CoreMark and my application were really too small for a realistic analysis of program size.

CoreMark Program Insights

The CoreMark site [2] provides results along with the compiler “Command Line Parameters” that were used to compile the program. This information was very interesting, as you will soon see.

First, let’s take a look at the CoreMark execution speed versus various XC32 compiler optimization levels. The CoreMark program when compiled at -O1 is only 32 kBytes long and uses only 344 bytes of data memory, so it is quite small and probably runs completely in the PIC32MZ processor's cache; speed of execution is all we can really look at here.

  
Figure 1 – This is the execution speed for the CoreMark with various optimization levels. All results were normalized to the -O1 level as this is the highest optimization for the free version of the XC32 Compiler. See the text for a discussion of each optimization.

I included the -O0 optimization level in Figure 1 just as a comparison, to show what a huge difference even the free -O1 optimization level makes on code performance. The difference between -O0 and -O1 in the CoreMark (and nearly every other application I have ever compiled) is nearly 1.5:1; no other optimization makes that big of a jump. In fact, all the other optimizations and ‘tweaks’ only produce marginal gains over the -O1 optimization. As noted, optimization level -O0 is really only useful when debugging code; no one would ever build a final application at this optimization level unless they really just don’t like their customers.

-Os is ‘optimize for size’ and, as expected, it results in a nearly 10% performance hit here. The CoreMark program has a very small memory footprint, so this option would be a waste of time in any small application like this one; it is included only as a reference.

-O1 is the default optimization for the XC32 compiler free version. All the other results are normalized to this result (100%).

-O2 and -O3 produce only marginal gains above the -O1 level. -O2 is 13% faster, -O3 is 17% faster.

-O2++ is the optimization that Microchip used for the benchmark results posted on the CoreMark site and, as might be expected, it includes some undocumented command line options and some specific tweaks to push the performance up as much as possible. Again, there is nothing that says anyone can’t optimize specifically for the benchmark, only that others should be able to duplicate the results, which I was able to do. Here are the command line options for -O2++ as I found them,

-g -O2 -G4096 -funroll-all-loops -funroll-loops -fgcse-sm -fgcse-las -fgcse -finline -finline-functions -finline-limit=550 -fsel-sched-pipelining -fselective-scheduling -mtune=34kc -falign-jumps=128 -mjals

The really interesting option here is “-mtune=34kc”. I could not find this option documented anywhere, and I really did not have time to search through megabytes of source code to try to find it. But “34Kc” is a designation that MIPS uses to describe the core that the PIC32MZ is based on, so it is some sort of optimization for this specific core.

The bottom line is that these optimizations produce the fastest result – 30% faster than -O1 alone – but only some 12% faster than the base -O3 optimization, so the gain over -O3 alone is marginal.

-O3++ is where I applied these same -O2++ command line options to the base -O3 optimization, just for fun. This produced a marginally faster result than -O3 alone, but it was still slower than -O2++.

Interesting to see the results, but again: while CoreMark is more comprehensive than the 1980s 'Sieve' benchmark, it probably does not apply to your application at all; it’s small and uses an incredibly small amount of data memory.

Image Processing Program Insights

This was an actual application that I wrote that takes raw data from a 32 x 24 pixel image sensor, converts the raw data into useful values, up-interpolates the pixels to 320 x 240, and then scales and limits the values for display. The data formats were the int32, int16, int8 and float data types. A large amount of data was processed, some 57,000 bytes for each image.

Reading the sensor and writing to the display run as fast as the hardware interfaces allow, so nothing can be done about that. The experiment was to see if the central processing algorithms could be sped up enough to allow me to slow down the CPU clock and save battery power. Running some benchmarks was a first step in that determination.

    while (1) {
        GetSensorData();             // Fixed by how fast the sensor can be read.
        ConvertDataArray();
        BiLinInterpolate();
        GetImageAdjustments();       // User / GUI Interaction – get settings.
        ApplyExpansion();
        ApplyContrastCurve();
        ApplyBrightnessAdjustment();
        DisplayImage();              // Fixed by how fast the display can be written to.
    }


Figure 2 – The central image processing ‘loop’ consists of routines like this. At each step the data arrays are processed in deterministic loops. Some of the processing is redundant because multiple loops are used, each step consisting of one specific data operation. These loops could be combined if need be, but without some profiling first the effort might be in vain (see the conclusion); guessing almost never pays off in optimizing.
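As a sketch of how such timing can be measured on this part (my assumption, not code from the article): XC32 exposes the MIPS core timer via `_CP0_GET_COUNT()`, which increments at half the CPU clock; the `clock()` fallback is only so the sketch runs on a host.

```c
#include <stdint.h>

#ifdef __XC32
    #include <xc.h>
    /* Core timer ticks at CPU clock / 2, i.e. 100 MHz on this 200 MHz part. */
    #define TICKS() _CP0_GET_COUNT()
#else
    #include <time.h>
    /* Host fallback so the sketch runs off-target (the units differ). */
    #define TICKS() ((uint32_t)clock())
#endif

/* Time one pass of a processing routine, excluding the fixed IO steps. */
static uint32_t time_section(void (*fn)(void))
{
    uint32_t start = TICKS();
    fn();
    return (uint32_t)(TICKS() - start);   /* elapsed ticks */
}
```

Calling something like `time_section(BiLinInterpolate)` for each routine in the loop gives the per-step profile needed before deciding what, if anything, to hand optimize.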


Figure 3 – The simplest optimization is to use the compiler's built-in ‘smarts’ to make the code faster. Here my simple but data-intensive image processing program was optimized using various compiler settings and the speed of execution was measured. Optimization level -O1 was normalized to 100%.

Figure 3 is pretty straightforward. I timed only the image processing portion of the code, excluding the hardware IO, as that is fixed by hardware constraints. Optimization level -O0 would never be used for released code; it is included here only as a comparison to show how aggressive the compiler gets even with the free -O1 optimization. Interestingly, option -O2 produced a slower result than option -O1 in this example; there is probably some data inefficiency going on with this option. As expected, however, option -O3 produced the fastest ‘standard’ result, but really only marginally faster than -O1, at around 10%.

XC32 also has some extra ‘switches’ that can be tweaked from the GUI. I set all of these for the -O1++, -O2++ and -O3++ level tests. These switches are shown in Figure 4.


 
Figure 4 – XC32 allows you to set these switches to turn on more aggressive compiler options, as can be seen in the '-Ox++' results in Figure 3. The result is faster, but not by a lot – typically less than 5%.

The bottom line here: the XC32 free version's -O1 optimization was only slightly slower – by less than 15% – than the paid version's maximum -O3++ optimizations. OK, but nothing really to get too excited about.

Large Application Insights

This large application is based on Microchip's “Real Time FFT” application example program [3]. This is a fairly large application: it uses a lot of data and has an LCD and an extensive user interface, along with ADC and DAC drivers and FFT calculations. I don’t have the hardware to run this code, so I looked only at the generated code size and data size. The data size held steady at 200 kBytes independent of the optimization level, which is expected for a large graphics-oriented program like this. The compiled program size was 279 kBytes when compiled at optimization level -O1.


Figure 5 – The Microchip application example “Real Time FFT” was compiled at various optimization levels and the resulting program SIZE is plotted here. This is a rather big application at some 279 kBytes when compiled at -O1. As can be expected, when optimizing for absolute maximum speed (-O3++, see Figure 4) the program gets faster, but at a huge cost in size.

As can be seen in Figure 5, the -Os optimization gave only a marginal size decrease of around 7% over the default -O1 optimization. -O3++, however, grew very large, probably mostly due to Figure 4's “Unroll Loops” switch. This switch forces the unrolling of all loops, even non-deterministic ones.

This result is to be expected, as any compiler's ‘Money Spec’ is execution speed, not program size, which for the majority of real-world applications is the proper trade-off. As my image processing application shows, the performance gains from -O1 to -O3 would be expected to be minimal, and the trade-off in program size might be excessive, especially if you are running out of space and can’t go to a larger memory device for some reason.

In a very large application code size may be a real issue, because you want to save money by using a smaller memory chip. The -Os option probably isn’t the silver bullet you are looking for, as the less-than-10% code size savings are nearly insignificant. Microchip has an application note [4] that provides some interesting information on how -Os works and some more optimization tricks for code size. The tricks, when applied, result in less than a 2% improvement over the -Os optimization alone, so while the note is interesting reading, it provides little further improvement.

Conclusion

The only good benchmark is the one of your actual running application code. That being said, one item that can be gleaned from the experiments here is that the free XC32 compiler running at its maximum optimization level of -O1 produces code that is within 15-20% or so of the maximum optimizations possible with the XC32 paid compiler. The other item is that by spending hours and hours you might be able to coax 20% or perhaps even 30% better performance by hand-tweaking the compiler optimizations for the particular task at hand, but you can probably do as well or better right in your code by improving your base algorithms.

It is always good to remember the first and second rules of code optimization, however,

     #1 - Write plain and understandable code first.
     #2 - Don’t hand optimize anything until you have profiled the code and proven that the
             optimization will help.

Guessing at what to optimize is almost never right, and it ends up wasting a lot of time for a very marginal performance increase and a highly likely increase in your code's maintenance costs.

I have found that there are all sorts of interesting articles out there on optimization with GCC, and I have also found that unless they show actual results, they are mostly someone's 'opinion' on how things work, or how things worked 15 years ago but don’t work that way today. For instance, I have found articles that state emphatically that 'switch' statements produce far faster code than 'if' statements, and I have found articles that say emphatically just the opposite. Naturally, neither kind of article shows any actual results, so I have learned to beware and check for myself. Which takes me back to rule #2 of optimizing: don’t do it until you have proven that you need to. And yes, I fight this temptation myself all the time!

And finally, yes, finally… if you don’t have the XC32 paid version, don’t fret too much – in most applications the paid version will provide only a marginal gain in performance and size over the free version's default -O1 optimization. But, as always: "Your mileage may vary".

Appendix A –  General Optimization Settings – Best Practices with XC32

Best for Step by step Debugging

Set Optimization level to -O0 and set all other defaults to ‘reset’ condition. This will compile the code almost exactly as you have written it which will allow for easy line by line program trace during debugging. This is very helpful in tracing and verifying the programs logic.

   

Best settings for Speed or Size

Use these settings for maximum ‘easy’ speed with the free XC32 compiler; they will get you to around 80 to 90% of the maximum speed possible. Any improvement beyond this will require a very careful review of all the compiler options and their effects, actual timing profiles of the code, and possibly optimizations of specific modules.

Project Properties → xc32-gcc→ General



Project Properties → xc32-gcc→ Optimization
 
 

To Optimize a specific source file Individually

Right click on the source file and select ‘Properties’
 
  



Then select: “Overwrite Build Options” and from there you can set the optimization level of the specific file separate from the main application settings.

  

Then select the xc32-gcc → Optimizations as above and have at it as detailed above.
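At an even finer grain (a sketch of my own, not from the screenshots above): since XC32 is GCC-based, a single hot function can be compiled at a different level with GCC's `optimize` function attribute, leaving the rest of the file at the project-wide setting. Note that on the free license the compiler may clamp levels above what the license allows.

```c
#include <stdint.h>

/* Compile just this hot routine at -O3 while the rest of the file
   keeps the project-wide optimization setting. */
__attribute__((optimize("O3")))
uint32_t sum_pixels(const uint8_t *buf, int n)
{
    uint32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```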

Appendix B – Lists of the optimizations as applied by XC32 Version 2.05/2.10

Optimizations vs. -Ox level, All other settings at default levels.

Note: Use “-S -fverbose-asm” to list every silently applied option (including optimization ones) in assembler output.

# -g -O0 -ffunction-sections -ftoplevel-reorder -fverbose-asm
 # options enabled:  -faggressive-loop-optimizations -fauto-inc-dec
 # -fbranch-count-reg -fcommon -fdebug-types-section
 # -fdelete-null-pointer-checks -fearly-inlining
 # -feliminate-unused-debug-types -ffunction-cse -ffunction-sections
 # -fgcse-lm -fgnu-runtime -fident -finline-atomics -fira-hoist-pressure
 # -fira-share-save-slots -fira-share-spill-slots -fivopts
 # -fkeep-static-consts -fleading-underscore -fmath-errno
 # -fmerge-debug-strings -fmove-loop-invariants -fpcc-struct-return
 # -fpeephole -fprefetch-loop-arrays -fsched-critical-path-heuristic
 # -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock
 # -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec
 # -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fshow-column
 # -fsigned-zeros -fsplit-ivs-in-unroller -fstrict-volatile-bitfields
 # -fsync-libcalls -ftoplevel-reorder -ftrapping-math -ftree-coalesce-vars
 # -ftree-cselim -ftree-forwprop -ftree-loop-if-convert -ftree-loop-im
 # -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops=
 # -ftree-phiprop -ftree-pta -ftree-reassoc -ftree-scev-cprop
 # -ftree-slp-vectorize -ftree-vect-loop-version -funit-at-a-time
 # -fverbose-asm -fzero-initialized-in-bss -mbranch-likely
 # -mcheck-zero-division -mdivide-traps -mdouble-float -mdsp -mdspr2 -mel
 # -membedded-data -mexplicit-relocs -mextern-sdata -mfp64 -mfused-madd
 # -mgp32 -mgpopt -mhard-float -mimadd -mlocal-sdata -mlong32 -mno-mdmx
 # -mno-mips16 -mno-mips3d -mshared -msplit-addresses


# -g -O1 -ffunction-sections -ftoplevel-reorder -fverbose-asm
 # options enabled:  -faggressive-loop-optimizations -fauto-inc-dec
 # -fbranch-count-reg -fcombine-stack-adjustments -fcommon -fcompare-elim
 # -fcprop-registers -fdebug-types-section -fdefer-pop -fdelayed-branch
 # -fdelete-null-pointer-checks -fearly-inlining
 # -feliminate-unused-debug-types -fforward-propagate -ffunction-cse
 # -ffunction-sections -fgcse-lm -fgnu-runtime -fguess-branch-probability
 # -fident -fif-conversion -fif-conversion2 -finline -finline-atomics
 # -finline-functions-called-once -fipa-profile -fipa-pure-const
 # -fipa-reference -fira-hoist-pressure -fira-share-save-slots
 # -fira-share-spill-slots -fivopts -fkeep-static-consts
 # -fleading-underscore -fmath-errno -fmerge-constants
 # -fmerge-debug-strings -fmove-loop-invariants -fomit-frame-pointer
 # -fpcc-struct-return -fpeephole -fprefetch-loop-arrays
 # -fsched-critical-path-heuristic -fsched-dep-count-heuristic
 # -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
 # -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
 # -fsched-stalled-insns-dep -fshow-column -fshrink-wrap -fsigned-zeros
 # -fsplit-ivs-in-unroller -fsplit-wide-types -fstrict-volatile-bitfields
 # -fsync-libcalls -ftoplevel-reorder -ftrapping-math -ftree-bit-ccp
 # -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop
 # -ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts
 # -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert
 # -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize
 # -ftree-parallelize-loops= -ftree-phiprop -ftree-pta -ftree-reassoc
 # -ftree-scev-cprop -ftree-sink -ftree-slp-vectorize -ftree-slsr
 # -ftree-sra -ftree-ter -ftree-vect-loop-version -funit-at-a-time
 # -fvar-tracking -fvar-tracking-assignments -fverbose-asm
 # -fzero-initialized-in-bss -mbranch-likely -mcheck-zero-division
 # -mdivide-traps -mdouble-float -mdsp -mdspr2 -mel -membedded-data
 # -mexplicit-relocs -mextern-sdata -mfp64 -mfused-madd -mgp32 -mgpopt
 # -mhard-float -mimadd -mlocal-sdata -mlong32 -mno-mdmx -mno-mips16
 # -mno-mips3d -mshared -msplit-addresses


# -g -Os -ffunction-sections -ftoplevel-reorder -fverbose-asm
 # options enabled:  -faggressive-loop-optimizations -fauto-inc-dec
 # -fbranch-count-reg -fcaller-saves -fcombine-stack-adjustments -fcommon
 # -fcompare-elim -fcprop-registers -fcrossjumping -fcse-follow-jumps
 # -fdebug-types-section -fdefer-pop -fdelayed-branch
 # -fdelete-null-pointer-checks -fdevirtualize -fearly-inlining
 # -feliminate-unused-debug-types -fexpensive-optimizations
 # -fforward-propagate -ffunction-cse -ffunction-sections -fgcse -fgcse-lm
 # -fgnu-runtime -fguess-branch-probability -fhoist-adjacent-loads -fident
 # -fif-conversion -fif-conversion2 -findirect-inlining -finline
 # -finline-atomics -finline-functions -finline-functions-called-once
 # -finline-small-functions -fipa-cp -fipa-profile -fipa-pure-const
 # -fipa-reference -fipa-sra -fira-hoist-pressure -fira-share-save-slots
 # -fira-share-spill-slots -fivopts -fkeep-static-consts
 # -fleading-underscore -fmath-errno -fmerge-constants
 # -fmerge-debug-strings -fmove-loop-invariants -fomit-frame-pointer
 # -foptimize-register-move -foptimize-sibling-calls -fpartial-inlining
 # -fpcc-struct-return -fpeephole -fpeephole2 -fprefetch-loop-arrays
 # -fregmove -freorder-blocks -freorder-functions -frerun-cse-after-loop
 # -fsched-critical-path-heuristic -fsched-dep-count-heuristic
 # -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
 # -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
 # -fsched-stalled-insns-dep -fschedule-insns2 -fshow-column -fshrink-wrap
 # -fsigned-zeros -fsplit-ivs-in-unroller -fsplit-wide-types
 # -fstrict-aliasing -fstrict-overflow -fstrict-volatile-bitfields
 # -fsync-libcalls -fthread-jumps -ftoplevel-reorder -ftrapping-math
 # -ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -ftree-ch
 # -ftree-coalesce-vars -ftree-copy-prop -ftree-copyrename -ftree-cselim
 # -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre
 # -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon
 # -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop -ftree-pre
 # -ftree-pta -ftree-reassoc -ftree-scev-cprop -ftree-sink
 # -ftree-slp-vectorize -ftree-slsr -ftree-sra -ftree-switch-conversion
 # -ftree-tail-merge -ftree-ter -ftree-vect-loop-version -ftree-vrp
 # -funit-at-a-time -fuse-caller-save -fvar-tracking
 # -fvar-tracking-assignments -fverbose-asm -fzero-initialized-in-bss
 # -mbranch-likely -mcheck-zero-division -mdivide-traps -mdouble-float
 # -mdsp -mdspr2 -mel -membedded-data -mexplicit-relocs -mextern-sdata
 # -mfp64 -mfused-madd -mgp32 -mgpopt -mhard-float -mimadd -mlocal-sdata
 # -mlong32 -mmemcpy -mno-mdmx -mno-mips16 -mno-mips3d -mshared
 # -msplit-addresses


# -g -O2 -ffunction-sections -ftoplevel-reorder -fverbose-asm
 # options enabled:  -faggressive-loop-optimizations -fauto-inc-dec
 # -fbranch-count-reg -fcaller-saves -fcombine-stack-adjustments -fcommon
 # -fcompare-elim -fcprop-registers -fcrossjumping -fcse-follow-jumps
 # -fdebug-types-section -fdefer-pop -fdelayed-branch
 # -fdelete-null-pointer-checks -fdevirtualize -fearly-inlining
 # -feliminate-unused-debug-types -fexpensive-optimizations
 # -fforward-propagate -ffunction-cse -ffunction-sections -fgcse -fgcse-lm
 # -fgnu-runtime -fguess-branch-probability -fhoist-adjacent-loads -fident
 # -fif-conversion -fif-conversion2 -findirect-inlining -finline
 # -finline-atomics -finline-functions-called-once -finline-small-functions
 # -fipa-cp -fipa-profile -fipa-pure-const -fipa-reference -fipa-sra
 # -fira-hoist-pressure -fira-share-save-slots -fira-share-spill-slots
 # -fivopts -fkeep-static-consts -fleading-underscore -fmath-errno
 # -fmerge-constants -fmerge-debug-strings -fmove-loop-invariants
 # -fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls
 # -foptimize-strlen -fpartial-inlining -fpcc-struct-return -fpeephole
 # -fpeephole2 -fprefetch-loop-arrays -fregmove -freorder-blocks
 # -freorder-functions -frerun-cse-after-loop
 # -fsched-critical-path-heuristic -fsched-dep-count-heuristic
 # -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
 # -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
 # -fsched-stalled-insns-dep -fschedule-insns -fschedule-insns2
 # -fshow-column -fshrink-wrap -fsigned-zeros -fsplit-ivs-in-unroller
 # -fsplit-wide-types -fstrict-aliasing -fstrict-overflow
 # -fstrict-volatile-bitfields -fsync-libcalls -fthread-jumps
 # -ftoplevel-reorder -ftrapping-math -ftree-bit-ccp
 # -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-vars
 # -ftree-copy-prop -ftree-copyrename -ftree-cselim -ftree-dce
 # -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre
 # -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon
 # -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop -ftree-pre
 # -ftree-pta -ftree-reassoc -ftree-scev-cprop -ftree-sink
 # -ftree-slp-vectorize -ftree-slsr -ftree-sra -ftree-switch-conversion
 # -ftree-tail-merge -ftree-ter -ftree-vect-loop-version -ftree-vrp
 # -funit-at-a-time -fuse-caller-save -fvar-tracking
 # -fvar-tracking-assignments -fverbose-asm -fzero-initialized-in-bss
 # -mbranch-likely -mcheck-zero-division -mdivide-traps -mdouble-float
 # -mdsp -mdspr2 -mel -membedded-data -mexplicit-relocs -mextern-sdata
 # -mfp64 -mfused-madd -mgp32 -mgpopt -mhard-float -mimadd -mlocal-sdata
 # -mlong32 -mno-mdmx -mno-mips16 -mno-mips3d -mshared -msplit-addresses


# -g -O3 -ffunction-sections -ftoplevel-reorder -fverbose-asm
 # options enabled:  -faggressive-loop-optimizations -fauto-inc-dec
 # -fbranch-count-reg -fcaller-saves -fcombine-stack-adjustments -fcommon
 # -fcompare-elim -fcprop-registers -fcrossjumping -fcse-follow-jumps
 # -fdebug-types-section -fdefer-pop -fdelayed-branch
 # -fdelete-null-pointer-checks -fdevirtualize -fearly-inlining
 # -feliminate-unused-debug-types -fexpensive-optimizations
 # -fforward-propagate -ffunction-cse -ffunction-sections -fgcse
 # -fgcse-after-reload -fgcse-lm -fgnu-runtime -fguess-branch-probability
 # -fhoist-adjacent-loads -fident -fif-conversion -fif-conversion2
 # -findirect-inlining -finline -finline-atomics -finline-functions
 # -finline-functions-called-once -finline-small-functions -fipa-cp
 # -fipa-cp-clone -fipa-profile -fipa-pure-const -fipa-reference -fipa-sra
 # -fira-hoist-pressure -fira-share-save-slots -fira-share-spill-slots
 # -fivopts -fkeep-static-consts -fleading-underscore -fmath-errno
 # -fmerge-constants -fmerge-debug-strings -fmove-loop-invariants
 # -fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls
 # -foptimize-strlen -fpartial-inlining -fpcc-struct-return -fpeephole
 # -fpeephole2 -fpredictive-commoning -fprefetch-loop-arrays -fregmove
 # -freorder-blocks -freorder-functions -frerun-cse-after-loop
 # -fsched-critical-path-heuristic -fsched-dep-count-heuristic
 # -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
 # -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
 # -fsched-stalled-insns-dep -fschedule-insns -fschedule-insns2
 # -fshow-column -fshrink-wrap -fsigned-zeros -fsplit-ivs-in-unroller
 # -fsplit-wide-types -fstrict-aliasing -fstrict-overflow
 # -fstrict-volatile-bitfields -fsync-libcalls -fthread-jumps
 # -ftoplevel-reorder -ftrapping-math -ftree-bit-ccp
 # -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-vars
 # -ftree-copy-prop -ftree-copyrename -ftree-cselim -ftree-dce
 # -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre
 # -ftree-loop-distribute-patterns -ftree-loop-if-convert -ftree-loop-im
 # -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops=
 # -ftree-partial-pre -ftree-phiprop -ftree-pre -ftree-pta -ftree-reassoc
 # -ftree-scev-cprop -ftree-sink -ftree-slp-vectorize -ftree-slsr
 # -ftree-sra -ftree-switch-conversion -ftree-tail-merge -ftree-ter
 # -ftree-vect-loop-version -ftree-vectorize -ftree-vrp -funit-at-a-time
 # -funswitch-loops -fuse-caller-save -fvar-tracking
 # -fvar-tracking-assignments -fvect-cost-model -fverbose-asm
 # -fzero-initialized-in-bss -mbranch-likely -mcheck-zero-division
 # -mdivide-traps -mdouble-float -mdsp -mdspr2 -mel -membedded-data
 # -mexplicit-relocs -mextern-sdata -mfp64 -mfused-madd -mgp32 -mgpopt
 # -mhard-float -mimadd -mlocal-sdata -mlong32 -mno-mdmx -mno-mips16
 # -mno-mips3d -mshared -msplit-addresses

References

[1] Some of the original BYTE magazine articles dealing with the Sieve of Eratosthenes:

“A High-Level Language Benchmark” by Jim Gilbreath,
BYTE, Sep 1981, p. 180
https://archive.org/details/byte-magazine-1981-09/page/n181

“Eratosthenes Revisited: Once More through the Sieve” by Jim Gilbreath and Gary Gilbreath,
BYTE, Jan 1983, p. 283
https://archive.org/details/byte-magazine-1983-01/page/n291

“Benchmarking UNIX Systems” by David Hinnant,
BYTE, Aug 1984, p. 132
https://archive.org/details/byte-magazine-1984-08/page/n137

[2] Coremark Benchmark
www.eembc.org/coremark/

[3] Microchip Technology, example application. Located in the Harmony install directory at,
    .../apps/audio/real_time_fft

[4] Microchip Technology, “How to get the least out of your PIC32 C compiler”,
https://www.microchip.com/mymicrochip/filehandler.aspx?ddocname=en557154


Article By: Steve Hageman / www.AnalogHome.com

We design custom: Analog, RF and Embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help on your next project.

Note: This Blog does not use cookies (other than the edible ones).

Monday, July 1, 2019

Microchip PIC32 Harmony Bare Metal Programming

All microprocessor manufacturers now provide some sort of factory-sponsored C language tool set, along with high-level abstraction libraries (HALs) to help users configure and use their parts ‘easily’, without having to spend days wading through the register maps in 500-page datasheets.

When Microchip first released their PIC32MX parts they also provided a HAL library, which they called the “PLIB” or “Peripherals Library”, to help users along.

For instance, to set up a PIC32MX SPI clock using the original PLIB implementation one would need to call this function,

     SpiChnSetBrg(chn, brg);

This is still better than having to dig into the data sheet to determine all the register names and addresses directly, but it isn't all that friendly either: you still need intimate knowledge of the hardware to compute the ‘brg’ register value that yields the proper SPI clock rate.

After this initial PLIB library was released, Microchip correctly realized that the 32-bit microprocessor market is a commodity market and that they would have to differentiate themselves in some way. Since the hardware is a commodity, they chose to differentiate themselves with the software layer.

They named this new software “Harmony” and discontinued the original PLIB concept going forward.

In Microchip’s marketing literature they touted Harmony as the “do all, end all” framework, as it encompasses not only configuring the chip’s peripheral devices but also a lot of higher-level system functions, such as file systems, graphics, Bluetooth and RTOS support to name a few, all under one roof.

Confusion Reigns, sowing Dis-Harmony

All this has caused some confusion: many people now think that using Harmony means bringing in a lot of ‘framework baggage’. This is simply not the case, and since Microchip has not sought to clarify the point, the rants about Harmony being just a lot of ‘framework baggage’ continue unabated.

What Microchip needed to do was write a simple application note describing how to use Harmony as a simple chip configurator, that is, how to use Harmony in ‘Bare Metal’ mode. Harmony is a decent chip configurator, about the same as everyone else’s, and it is far better than resorting to the ‘assembly language’ approach of writing all the peripheral libraries yourself from register maps.

So I will take it upon myself to write a brief tutorial on what Microchip should have done some time ago to clarify the situation.

Note: This tutorial is based on Harmony 1.4x and 2.0x. Harmony 3 is still being fleshed out, but I would expect it to follow the same basic principles as Harmony 1 and 2. Currently (July 2019) the first release of Harmony 3.0 just adds support for the Atmel ARM chips.

The Missing Application Note: “Using Harmony as a Chip Configurator in Bare Metal Mode”

Using Microchip Harmony as a chip configurator allows speedy configuration of nearly all of the peripherals in a PIC32 microprocessor and also stubs out an application that will, at the very minimum, get the chip up and running. Harmony also sets up the proper header paths so that the current high-level PLIB commands correctly target your selected chip. Another advantage of using Harmony comes when maintaining and updating the application later, since all the links in the libraries will be correctly kept up to date as the chip configuration, or even the chip itself, changes.

Prerequisites: You should be familiar with both of these Microchip provided tutorials,

     “MPLAB Harmony Tutorial – Creating Your First Project”
     “Creating a “Hello World” application Using the MPLAB Harmony Configurator”


These tutorials will help you figure out the steps required to operate the Harmony Configurator IDE. As with every manufacturer’s configuration solution, there is at least a day’s worth of learning curve to overcome before you can be proficient and get off on your own. The tutorials above will help you through that first day of learning [1].

In this note we will only discuss the proper setup of the peripherals, using the SPI as an example, and how to get the generated application into “Bare Metal Mode”.

Setting Up the Clock and IO Pins

Follow the procedure in the tutorials listed above to set the Clock and IO pins. There is no difference between the “Bare Metal Mode” described here and the procedure as described in the tutorials.

Setting Up and Using Peripherals

Microchip provides two types of drivers in Harmony: “Static” and “Dynamic”. Refer to Reference [2] for a discussion of how each driver type works. We will be focused on “Static” drivers here as they provide the expected “Bare Metal” performance with no additional software overhead required.

SPI Example

A basic “Bare Metal” setup would be a plain old SPI configuration: just “point and shoot”, with no interrupts, FIFOs, external buffers, etc. As you gain experience you can add those later if you wish, either by yourself or with Harmony’s help. Configuring most peripherals in Harmony is done in three easy steps.

Step 1 - Expand the “Drivers” section as shown below



Step 2 - Expand the SPI section as below,


The settings shown above will be the simplest possible. For instance,

Standard Buffer: Not all PIC32s have an enhanced (or FIFO) buffer, so using the standard buffer (i.e. FIFO disabled) is safest.

8/16/32 bit Mode: Set this to what your SPI Devices expect as far as bit width. This parameter is easily changed on the fly using other PLIB_SPI commands.

Number of SPI Instances: Set this to the number of SPI peripherals you will be using. Here we will just be using one.

Step 3 - Now you have to configure one or more of the actual SPI hardware peripherals of the chip as shown below,

Here “Instance 0” is set to the physical hardware SPI peripheral “SPI_ID_1”. Most of the other information is as described in the data sheet. Of note: it is nice to be able to set the SPI clock rate and not have to worry about the frequency-setting registers directly. Also of note: just because I set 1 MHz in this example does not mean that the SPI clock will end up being exactly 1 MHz; the configurator will simply pick the register settings that get closest to 1 MHz. If you want to know the exact clock rate, you can find the clock-rate equation in the part’s data sheet and calculate it from that.

For the “Clock Mode” setting there are four options. Most people are familiar with the usual SPI terminology of “SPI Mode 0” through “SPI Mode 3”, or the slightly more descriptive “CPOL” and “CPHA” for clock polarity and clock phase [3]. The figures below should help with the SPI setup.



Harmony configurator setting for SPI Mode is selected from one of the 4 options from the dialog as shown above.


Table 1 - The table above will help in translating from the standard SPI Mode number to the Harmony descriptive text.

We have now set up a basic instance of SPI peripheral #1. At this point we should also have configured the clocks and IO pins per the earlier tutorials, including mapping SPI peripheral #1 to the proper IO pins on the chip. Save and generate the code as per the tutorials, then compile to make sure everything is OK.

What Harmony Builds

Before we can get into using our “Bare Metal” SPI we need to decide what portions of the generated Harmony Code we will use.

As can be seen below, Harmony builds for us main.c, app.c, and its corresponding app.h file.


Figure 6 - The most important Harmony built files that we will explore here as shown in the MPLAB-X IDE.

When the program first starts it loads and runs some C startup code; execution then begins at main(), which is located in the file “main.c” as shown,
 
 
Figure 7 - The main() as built by Harmony. You can’t get much simpler than this.

Harmony has built a completely working project at this point that will,

1) Configure the clocks and hardware as setup by the user via the function SYS_Initialize().
2) Enter an infinite loop that calls a function called SYS_Tasks().

Taking a look at SYS_Initialize() we see another easy to understand function,



Figure 8 - The SYS_Initialize() function as built by the Harmony Configurator. This function is where all the clocks and peripherals get initialized.

As can be seen, the SYS_Initialize() function above is straightforward. This is where the chip and any peripherals get initialized: first the clocks, then the ports, and then, as you can see, the SPI instance. Note: this SPI initialization function contains all the commands to initialize the SPI as we set it up on the “SPI Driver Instance 0” Harmony configuration page above. It can be very informative to navigate to this function and look at it if you ever need to change anything on the fly later in your program, as all the configuration is done via PLIB_SPI functions.

The interrupts (if any) are next. At the very end we see that a call is made to a function APP_Initialize().

APP_Initialize() is located in app.c. It is a great place for any one-time initialization your application may need, such as setting initial conditions or flashing a splash screen; anything that needs to run or be set once at the beginning of your application belongs here.

Figure 9 - As initially built, APP_Initialize() has some comments and sets the default state of the ‘roughed out’ state machine that Harmony builds at first. If you are not going to use the state machine approach, you can delete all of the contents of APP_Initialize() and put in here anything that your application needs to run once at the launch of your application.

Now that we are done with APP_Initialize() and SYS_Initialize() we are back in main().

The next part of main() (see main.c above) is a forever while loop that repeatedly calls SYS_Tasks(). Drilling into SYS_Tasks(), which is located in a generated file named system_tasks.c, we find the following code,

 
Figure 10 - SYS_Tasks() Code as generated by Harmony.

As can be seen in the function SYS_Tasks(), there is a call to “maintain device drivers”, but since we have set the driver to ‘Static’ and will be using the driver directly, this function really does nothing more than check a few flags and then return.

The next call is to APP_Tasks() which is located in the file app.c.

APP_Tasks() stubs out a basic State Machine that we can either use or not as shown below,
  

Figure 11 - App_Tasks() as the default build generated by Harmony.

Many (most?) people don’t like to use a state machine, preferring instead a ‘Big Loop’ implementation. No worries; this is easy to change, as shown below,
  

Figure 12 - APP_Tasks() modified into a ‘Big Loop’ as many people prefer.

If you are familiar with how the Arduino stubs out a sketch (program), then all this should look familiar to you. Think of APP_Initialize() as the Arduino function setup() and APP_Tasks() as the Arduino function loop(). As with the Arduino, when you return from loop() the hidden Arduino main() calls it again after doing any necessary housekeeping; similarly, if you return from APP_Tasks(), the Harmony-supplied main() will service any needed tasks and then call APP_Tasks() again, endlessly. Using APP_Tasks() this way allows any required internal tasks to be serviced in main() as required (see also Figure 10 above, SYS_Tasks()).

Note to all you ‘super control freaks’ out there: many people seem not to like ceding even this much structure to Harmony. Alternatively, you can simply delete all references to APP_Initialize() and APP_Tasks() from the project and start your program directly in main().

Adding the “Bare Metal” SPI commands

Harmony has a big advantage over many competing HALs in that it actually has good, up-to-date documentation, both printed and in Compiled HTML Help (CHM) format. The CHM help file is located at,

    C:\microchip\harmony\v2_05_01\doc\help_harmony.chm

Naturally, you will have to change the path segment “v2_05_01” to match your installed Harmony version.

I prefer to use the CHM help file format as it is searchable and example code is easily clipped from it.

Searching the CHM file for SPI leads to a document titled “Library Interface”, which describes all the SPI API calls available to you in the Harmony HAL.



Figure 13 - The “help_harmony.chm” compiled help file is very easily searched to find the driver API calls and example code. Here we easily locate the SPI library information.

As an example, let’s find the call to change the SPI clock speed, to contrast how much easier the Harmony PLIB HAL is compared with the original PLIB library call presented earlier.




Figure 14 - A small portion of the help document used to set an SPI peripheral’s clock speed (or, as they call it, baud rate).

To use this you only need to know the peripheral bus clock speed that you set in the Harmony Clock Configurator; in my case this was 100 MHz, as I am using a PIC32MZ chip running on a 200 MHz system clock. You then also supply the desired SPI clock speed and the function will do the rest.

We also need the proper module_id index; if you click the blue hyperlink shown in the figure above, the help file will show you the definition of the SPI_MODULE_ID enum as follows,
  
 
Figure 15 - Clicking the SPI_MODULE_ID enum hyperlink in the CHM file brings up the definition of that enum.

We know that we set up the SPI module as physical peripheral #1, so we want to use “SPI_ID_1” in all our code when working with this peripheral.

Let’s add this SPI clock rate set to our APP_Initialize() code as follows,


Figure 16 - Making a PLIB_SPI function call to set the SPI_1 clock speed to 10 MHz. The 100 MHz is the SPI Peripheral bus speed.

You will notice that I did not have to add any extra header files for this PLIB_SPI function to be recognized. Harmony did that setup for us the moment we told it to use the SPI driver library, placing the proper header includes in “app.h”.

As you explore Harmony you will also find that you can get the clock speeds through API calls. In this instance you can get the SPI default clock source PBCLK2 value with the call,

    SYS_CLK_PeripheralFrequencyGet(CLK_BUS_PERIPHERAL_2)

Then the code example can be made more robust to future clock changes by making the call like this,


Figure 17 - Changing the SPI Clock Speed using not only the PLIB_SPI function, but also getting the peripheral bus clock frequency at runtime.

Notice that you do not need to add any header files for the “SYS_CLK” library either, as Harmony added those linkages for us in “app.h” when we added the specific driver in the Harmony Configurator.

Now let’s add a SPI write command to the ‘Big Loop’ that we wrote in APP_Tasks(), as shown below,


Figure 18 - The function APP_Tasks() above writes byte 0x55 to the SPI port once every second, forever.

At this point some have asked: “Isn’t using the registers directly faster and smaller than calling these PLIB functions?” The answer is: “Probably not.” The simple reason is that most of these PLIB functions are declared ‘inline’, which tells the compiler to prefer inlining the generated code instead of making a function call. As long as the XC32 compiler optimization level is set to ‘O1’ or greater, inlining of functions is enabled. Note: you will find that optimization level ‘O1’ is quite aggressive and really minimizes the generated code [4].


Figure 19 - Almost all of the Harmony PLIB statements are marked to the compiler as “Inline”

PLIB_SPI_BufferWrite() is declared “inline” by this “PLIB_INLINE_API” statement.

PLIB_SPI_BufferWrite() then calls another abstracted function, and that function is also declared ‘inline’, so the chances of these PLIB function calls really ending up as inline code are pretty high.

Other Useful SPI Functions

Browse around the help_harmony SPI “Library Interface” for other useful SPI functionality. Basically, if you need to do something with the SPI, it will be there, including a function to check whether the SPI is done transmitting the last character,

    PLIB_SPI_IsBusy(SPI_ID_1);

This is useful for checking that the SPI hardware has finished sending the last character before you send another.

Other Useful PLIB Libraries

Other common PLIB libraries are available for nearly all the PIC32 series peripherals. To find the overview page of all of them, simply search for: “Peripheral Libraries Help” in the help_harmony.chm help file as shown below,



 
Figure 20 - This simple search will get you to a list of all the PLIB API’s in Harmony. 


Conclusion

I have shown how to use the Harmony Configurator and the Harmony static peripheral libraries as a means of doing “Bare Metal” programming on a PIC32 device. Step by step, the code startup process was described and followed through the Harmony-generated application files: main.c, app.h and app.c. Also shown was how to find and use the other “Bare Metal” peripheral drivers, or as Microchip Harmony calls them, the “Static” drivers.

In future projects you can simply set everything up in Harmony, generate the code, then focus only on APP_Initialize() and APP_Tasks(); you don’t have to worry about anything else that Harmony generates or calls first. This is pretty much what every microprocessor supplier’s code configurator does, and the advantage of using Harmony is that it sets up a consistent HAL layer of software that is portable between projects and even between most chips in the PIC32 series.

Harmony used this way does not overwhelm the chip with a large or burdensome ‘framework’, as many people seem to think and fear; in fact, the Harmony ‘footprint’ on a PIC32 is quite small. And the biggest point of all: the Harmony PLIBs get your project running quickly, with very little detailed knowledge of the hardware registers required.

References:

[1] Microchip also has a YouTube channel where you can find some useful Harmony tutorials.
https://www.youtube.com/watch?v=wyUPpTfxRDE&list=PL9B4edd-p2agL42JeLytmJrRNh6CsyGiS


[2] Harmony static and dynamic driver explanation,
https://microchipdeveloper.com/harmony:intro-flexibility#Static_Dynamic


[3] Wikipedia article on SPI,
https://en.wikipedia.org/wiki/Serial_Peripheral_Interface

[4] https://analoghome.blogspot.com/2019/08/microchip-xc32-compiler-optimization.html

[Edit]
12May20 - Table 1 was incorrect. Changed the text around the 'Big Loop'.


Monday, June 24, 2019

Do Your Customers A Favor – Make your Data Plots Useful

Make your data sheets more useful by providing graphs that actually help your customers. How so? Read on...

Example 1 -

Jim Williams and Guy Hoover designed a low-distortion oscillator some time back [1]. This application note made use of an LED-controlled photoresistor to adjust the gain of the oscillator. That part has since gone obsolete, and the replacement part provides a curve of LED current vs. resistance that looks like this,


Technically the above curve is correct, but the interesting part of the curve is in the range of 0 to 5 mA; that is where all the useful resistance change happens. If the curve is plotted on a log axis it becomes much more customer-friendly and useful, as shown below,



Same data, simply plotted with the X axis set to logarithmic. Now the curve is useful to the end customer, who incidentally is a circuit designer who needs to know how the resistance changes over the useful control range of LED current. For extra credit: also plot curves of the response over temperature, as this information is needed for a complete circuit design.


Example 2 -

A certain well-respected manufacturer of ICs sells a low-noise OPAMP and provides a curve like this one to document the voltage noise vs. frequency. As in the last example, the interesting part of the curve to any designer is the very first data point of this chart, not the 99.9% shown.


This curve is better than most manufacturers’ in that it seems to be a real instrument trace instead of the ‘artist’s conception’ we normally see on data sheets, but the useful noise detail in the low-frequency 1/f region is obscured. This could have been easily solved by switching the instrument to ‘log frequency’ mode; I have yet to see a low-frequency FFT analyzer that does not include this setting.



Usually the voltage noise versus frequency of an OPAMP is presented more usefully, as in the plot above that I made for an LT6910-1 at a gain of 100. The interesting portion of the curve is in the 1/f region; again, log scaling for frequency fixes the problem.
  

Example 3 -

It is important to have some ‘domain knowledge’ of your customers’ needs and of what you are measuring. Here is a plot for a wide-band receiving antenna, similar to one I was once sent by another engineer. He measured this antenna and sent me the plot as a Smith Chart, because that’s what he figured RF engineers wanted to see – a Smith Chart.



The above Smith Chart is a valid plot, but about all you can determine from it is that the antenna seems to have several minor resonances and that at one point, low in frequency, it is actually pretty close to 50 Ohms (where the marker is). Anything else really is impossible to determine from this presentation.

It is better to plot the antenna’s SWR [2] over the same frequency range. Now you can see that the antenna is resonant at about 300 MHz and has a decent SWR (< 5:1) for receiving applications in the frequency range of 800 – 1200 MHz. Further, I can determine the degree of mismatch at any frequency from the common SWR formula,

    SWR = (1 + |Γ|) / (1 − |Γ|),  where  Γ = (ZL − Zo) / (ZL + Zo)

Where Zo is the characteristic system impedance, which is usually 50 Ohms for RF work, and ZL is the load impedance – in this case the antenna impedance at a given frequency [2].



Conclusion:

It is really easy to actually help your customers, even if you don’t have the specific domain knowledge that they have. You can (and should) always look at competitors’ data sheets and use the best available format for each curve or table. As a sanity check, take your data sheet to a few customers and ask them what they think; this will provide immediate and valuable feedback.

As a final note: it is difficult for any manufacturer to keep everything up to date, and silly errors invariably creep in, hence feedback is essential. Most data sheets that I now see have a link to a help/feedback support site or an email address printed right on them, usually in the footer. There is no better way to get feedback than to ask for it, and really no better feedback than from your customers right when they are reading your data sheet and trying to use it.

Hall Of Shame:

Well, we have all done it: the ink isn’t even dry on the page and we find a glaring error. Or how about when you look at something you did years before and see an obvious fault you never saw before? Yes, so much to do, so little time for error checking! Here I present the ‘Hall of Shame’ of funny or wrong data sheet data....

[Edit 12Apr20]

Remember the first time your college professor reminded you to “think about your numbers” and use digits responsibly? I still remember it. He told us to stop writing down 1%-accurate numbers to 10 decimal places. Check out the actual data sheet curve above. Can I return the part if I only measure 40.163155 MHz @ -3 dB???
Now with software we do this in “secret”, by making all our PC software floating-point calculations use 64-bit doubles by default, whether we need to or not! (And I almost spelled ‘Whether’ as ‘Weather’. How embarrassing would that have been?)




Everything is “Guaranteed by design” when you don’t spec anything! The row for “Charge Injection” of an analog switch had no numbers in it, but it did reference this handy note. I think I will always refer to this condition as “Note 9” from now on.



The chart on the LEFT is from a FET bus-switch data sheet. It shows absolutely nothing of value and does not even specify the test (load) conditions; it’s an analog switch - naturally it will have a 1:1 slope with no load! Much more useful would have been to show the channel resistance versus input voltage, as shown on the RIGHT, which another manufacturer had on their data sheet. So again: LEFT, not useful at all; RIGHT, very useful!

--- More to come ---

References:

[1] Linear Technology Application Note 132, “Fidelity Testing for A to D Converters”

[2] Wikipedia article on SWR.
      https://en.wikipedia.org/wiki/Standing_wave_ratio




Saturday, June 1, 2019

Tame that Ringing in Switching Power Supplies


I recently read an application note by one of the big three semiconductor companies about a low-EMI switching power supply. Normally, low EMI means that the switching power supply IC has controlled slew rates on the power switch, lowering the rate of change of the switching voltage and hence the high-frequency content of the switching output waveform. This works very well and is a sure step in the right direction for an EMI-quiet design.

This application note showed some actual waveforms, and I immediately spotted another source of EMI that was not addressed: the ringing that can occur across the power switch or catch diode, which in transformer designs can be exacerbated by the transformer’s leakage inductance, as shown in Figure 1.

 

Figure 1 – Switching regulator noise is not just confined to the power switch rise and fall times. EMI is also produced by ringing waveforms, as shown in this figure clipped from a semiconductor company’s application note. This damped ring looks to have about a 1 to 2 µs period, which would correspond to a 500 kHz to 1 MHz peak in the EMI signature produced by the design.

The ringing waveform of Figure 1 may or may not be an issue in the final design, but I have seen many instances of otherwise well-designed switching power supplies being overwhelmed by this type of noise, especially when used in a design that has an intermediate signal-processing frequency band somewhere around the ringing frequency. For instance, a ring like this would easily be seen in the passband noise floor of my DSP-based lock-in amplifier [1].

Causes of Spurious Ringing

The ringing is caused by parasitic elements in the power switch, catch diode, and output inductor all acting together to produce a parasitic resonance. Ringing in a switching power supply is more frequent when the output inductor runs ‘dry’, that is, operates in discontinuous conduction mode (DCM). The reason is that when the inductor runs dry, the catch diode and main power switch both stop conducting and go to a high-impedance state, which allows the high-frequency parasitic ringing to take place. The ringing is normally not problematic by itself and usually never causes over-voltage damage, because its amplitude is almost always below the switching voltages anyway, so it is often ignored.

The instant the power switch turns on again, the circuit impedance is immediately lowered and the ringing stops. In transformer-based designs the ringing is usually greatly influenced by the leakage inductance of the transformer.

Spurious Ringing Mitigation

To show the ringing, I made a simple model of a buck converter power stage using Tina-TI [2]. I did nothing special, just threw together some appropriately sized components as shown in figure 2. Upon running the simulation, sure enough, the ringing appeared right on cue, as shown in figure 3.

 

Figure 2 – A simple Tina-TI SPICE Simulation model of a Buck converter was thrown together to show the ringing.
 
 

Figure 3 – A transient simulation was run on the circuit of figure 2. The ringing appeared right on cue!


The solution to the ringing is as old as switching electronics itself. In old-time 'relay' or 'contactor' switching circuits, the parasitic inductance and capacitance of a contactor could cause a damaging spark when the contacts opened. To quench it, a circuit called a 'snubber' was included across the terminals: it adds some controlled impedance that counteracts or swamps out the parasitic elements, while at the same time acting as a DC block so that no steady-state DC power is consumed [3].

In this instance we just need to add enough damping impedance that the high-Q circuit formed by the parasitic elements is damped much sooner than it would be naturally. We can't always get rid of the ring entirely, but we can keep it to one or two cycles at most, substantially reducing the EMI footprint.
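A common textbook starting point (the article itself tunes the values empirically, and note [4] explains why analysis only gets you so far) is to set the snubber resistor near the tank's characteristic impedance and the snubber capacitor to a few times the parasitic capacitance. A sketch, again using assumed, illustrative parasitic values:

```python
import math

def snubber_starting_point(l_henry, c_farad, cap_ratio=3.0):
    """Rule-of-thumb RC snubber values for a parasitic LC tank:
    R at the characteristic impedance sqrt(L/C) gives near-critical
    damping; C at a few times the parasitic capacitance keeps the
    resistor effective at the ring frequency."""
    r_snub = math.sqrt(l_henry / c_farad)
    c_snub = cap_ratio * c_farad
    return r_snub, c_snub

# Assumed parasitics: 32 uH ringing against ~800 pF of node capacitance.
r, c = snubber_starting_point(32e-6, 800e-12)
# r = 200 ohms, c = 2.4 nF -- a plausible first guess to tune from
```

These are only starting values; the final resistor (and sometimes the capacitor) still gets tweaked on the bench or in simulation.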

The basic circuit was modified by adding two resistance/capacitance snubbers, one across the catch diode and one across the main power inductor, as shown in figure 4. Simulating the circuit of figure 4 shows that the snubbers have done their job: the ringing is substantially reduced, to around a single cycle and at a lower frequency, as shown in figure 5.



Figure 4 – Adding two RC 'snubbers' to the buck regulator circuit can really control the ringing. Sometimes a single snubber will work and sometimes several are required; it all depends on exactly how your circuit is designed, used, and built, and even on the layout.
 



Figure 5 – Running a simulation on the circuit of figure 4 shows a vast improvement in the ringing, which is now damped quickly, in less than two cycles. This will obviously reduce the EMI footprint of the design.

I knew from experience what values to choose for the capacitors, and I iterated in on the resistor values to minimize the ring. This is actually easier with real hardware than in simulation. On an actual hardware circuit I normally take a small, single-turn potentiometer of 100 or 200 ohms and solder a reasonably sized capacitor to it (here I empirically found that 2.5 nF was the optimum size), then tack-solder this across the circuit nodes, all with very short leads. In ten seconds I can tweak the potentiometer and see the effect. If need be, I can make another snubber using the same setup and a different capacitor value. In just a few minutes I can have the circuit 'empirically' optimized [4].

These values work well for a 100 kHz switching regulator: the capacitor will be in the range of 1 to 3 nF, and the resistor values will always be less than 300 ohms, normally in the range of 25 to 150 ohms. For a 1 MHz switching regulator, divide the capacitor value by 10; the resistor values will still be in the same range.
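That scaling rule can be captured in a tiny helper (this just encodes the rule of thumb stated above; the exact values still need empirical tuning):

```python
def suggested_snubber_cap_range(f_switch_hz):
    """Scale the 100 kHz rule of thumb (1 to 3 nF) inversely with
    switching frequency; the resistor stays in the same roughly
    25 to 300 ohm range regardless."""
    scale = 100e3 / f_switch_hz
    return 1e-9 * scale, 3e-9 * scale  # (low, high) in farads

lo, hi = suggested_snubber_cap_range(1e6)  # a 1 MHz regulator
# -> 100 pF to 300 pF, i.e. the 100 kHz values divided by 10
```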

Similar snubber implementations can also be used to mitigate the leakage-inductance-induced overshoot on the main power switch in transformer-based designs, and across rectifier diodes to reduce the 'snap off' caused by their reverse-recovery effects, all of which can add to the EMI issues of a power supply.

Downsides

We have seen how the EMI footprint can be improved with the application of a snubber. What are the downsides? The snubbers add AC impedance to the switching nodes, and as such they add AC power loss to the switching waveforms. If a few milliwatts are not a big issue, then there is no harm. If power is everything to you, then you have to carefully trade off the EMI footprint against your tolerable power loss, which will require more time to optimize everything.
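To first order that loss is independent of the snubber resistor: the capacitor is charged and discharged through R once per switching cycle, so the dissipation is set by C, the voltage swing, and the switching frequency. A quick estimate (example values assumed for illustration):

```python
def snubber_loss_watts(c_snub, v_swing, f_switch):
    """First-order RC snubber dissipation: roughly C * V^2 of energy
    is burned in the resistor per switching cycle (half on charge,
    half on discharge), independent of the resistor value."""
    return c_snub * v_swing ** 2 * f_switch

# Assumed example: a 1 nF snubber across a 5 V switch node at 100 kHz.
p = snubber_loss_watts(1e-9, 5.0, 100e3)
print(f"{p * 1e3:.1f} mW")  # 2.5 mW
```

At higher bus voltages or switching frequencies this grows quickly (it scales with V squared and with f), which is exactly the trade-off described above.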

Taking it to the limit

In some kilowatt-class switchers I have seen snubbers that consumed tens of watts. This is a lot of power to just waste as heat, and the solution there was to add some small DC/DC converters to take that snubbed power and convert it back to the input or output circuits to aid the overall efficiency. The snubber power-recovery DC/DC converters in that instance were bigger than the main power supplies most of us use! Wild!



References:

[1] A DSP Based Lock-In Amplifier,  Hageman, Steve,
www.analoghome.com/articles/a_modern_dsp_lockin_amplifier.pdf

[2] Tina-TI, SPICE Simulator, Texas Instruments, Inc.
http://www.ti.com/tool/TINA-TI

[3] The old company Sprague Electric used to make a marvelous array of 'snubbers', one for every common use, some even with built-in spark gaps, turning 'snubbing' into a true art form. The term 'snubber' is also used by fluid-flow engineers to denote methods to alleviate fluctuations and 'ringing' in fluid transport pipes.

[4] Yes, you can find articles that go into great detail on how to use circuit analysis to get to at least a nominal starting point for the values of the RC snubbers, but since a large portion of the ringing is based on poorly specified or even unknown component parasitics, including the PCB layout and even the PCB stackup, I find it easier to just get the job done empirically. Your "mileage may vary", as they say, so approach the problem any way you want; either path will get to a solution eventually.


Article By: Steve Hageman / www.AnalogHome.com

We design custom: Analog, RF and Embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help on your next project.

Note: This Blog does not use cookies (other than the edible ones).