Thursday, March 19, 2020

How accurate is that GPS anyway?

There is always a lot of uncertainty when dealing with a new GPS system that goes well beyond the immediate needs of powering it up, interfacing to it, and communicating with the hardware, such as:

     * How much ‘Positional Noise’ is there in the readings?
     * How accurate is the Altitude reading?
     * Can I correlate the HDOP (Horizontal dilution of precision) value to positional inaccuracy?
     * Can I correlate the Number of Satellites seen to positional inaccuracy?

Well, the Internet has many ‘opinions’ but few real answers or data… A good overview of the GPS signal accuracy as transmitted can be found in reference [1], but this does not easily relate to the actual user experience on the receiving end. I will present my basic measurements here to see if we can answer the above questions.

I recently started a small GPS tracking project and the GPS selected was the MediaTek MTK3339 [2], because this all-in-one GPS is fully integrated with a built-in antenna, has good ‘Data Sheet’ performance, and Adafruit (as always) provides some really useful breakout boards which made early prototyping and testing easy [3].

After cobbling together some prototype software to read, store and display positions and calculate distances, I acquired data over several long periods to answer the above questions.


I sat the GPS prototype in a location inside my Lab, which is a wood frame building that is pretty transparent to GPS signals – the GPS never sees fewer than 5 satellites at any time – and recorded around 9 hours of data over several days. I plotted this data on a map using the GPS Visualizer [4] to see where the positional ‘Tracks’ were, as shown in Figure 1.


Figure 1 – Using the Web Based Plotting Page "GPS Visualizer" I plotted 9 Hours of positional data with the GPS absolutely stationary. As you can see there is some positional ‘noise’ or ‘meandering’ in the readings. At the very bottom of this figure is a distance scale.

While the true and accurate position on the Earth down to a foot is difficult to ascertain unless you are a surveyor, I averaged all the data in Figure 1 and came up with an average position. On subsequent data runs I compared the data to the previous average positions and the results are almost exactly the same; that is, the data overlays itself to within 0.5 feet. This suggests that if there are any systematic measurement offsets or errors, they are static day to day. The data certainly looks to be within a foot to the detail that I can see on Google Maps (assuming that Google Maps is accurate to a foot).

For my intended application a 50 foot accuracy is more than enough. The real worry is that since my tracking application also records total distance traveled, I need to know what the GPS positional noise is so that I can set a limit on how much movement is required before I log it. This prevents me from recording or integrating the noise into a false distance.

With the foregoing in mind, let's look at 9 hours of data recorded with the GPS receiver stationary in my Lab. The ‘Delta Distance’ is the instantaneous distance calculated from any reading to the average value of all 9 hours of data acquired. If we assume that the long term mean is correct, which it looks to be, then this is a measure of the instantaneous noise. The GPS receiver is operated at a 1 sample per second rate and every fifteenth sample is recorded.

Figure 2a – Using the average of all the data as a reference point, the delta distance of each reading is plotted. The peak distance from the average can be quite large as can be seen. The Red Line is a one hour moving average which smooths the peak error by almost 3:1.

Figure 2b – Same setup as Figure 2a, but with the data captured on a different day.

More statistics for figures 2a and 2b. The peak error was 54.1 feet for figure 2a and 68.1 feet for figure 2b; the RMS error of all points was 16.3 and 16.5 feet respectively. This 16 feet RMS error compares well with what reference [1] states as a typical expected user accuracy.

Now the question becomes: can I use some other measure to get an idea of the probable error in any one particular measurement? All GPS units that I have seen also supply an HDOP calculation [5], and you can also get the number of satellites that the GPS is currently basing its calculations on.

Figure 3 – A Scatter Plot to see if there is any correlation between delta distance and the number of Satellites used in the calculations as recorded by the GPS. As can be seen, there is no correlation here at all.

Figure 4 - A Scatter Plot to see if there is any correlation between delta distance and the HDOP as recorded by the GPS. As can be seen, there is no correlation here at all either.


Errors happen in every measuring instrument or system. With this GPS you can be reasonably certain that your maximum positional error will be less than 70 feet, as I verified over several 9 hour stints of data recording. Testing for long periods of time and on several different days gave plenty of time for the satellites to be in all sorts of positions, both good and bad for measurements.

As for using other data supplied by the GPS to get an ‘instant’ sense of how much any one particular reading is in error: at least for this particular receiver, neither the number of Satellites nor the HDOP correlates to any instantaneous positional error that I can see. Bottom line: you can’t be sure of the errors until you acquire enough data (at least an hour’s worth) in any stationary spot to tell what the errors likely are [6].

By way of actual data, this hopefully gives people a better basis for making informed decisions about what kind of ‘typical’ performance they can expect from a low cost GPS receiver in actual use.

Bonus Data:

No analysis would be complete without showing the histogram of the data. Beyond a plot of the data by itself, the histogram provides a view into the nature of the data distribution that is not always clear from looking at the data directly. With that in mind, Figure 5 plots the histogram of Figure 2a's data.

Figure 5 - A histogram of Figure 2a's data shows a familiar kind of statistical distribution that resembles a Weibull distribution. At the very least we can see that the smoothed data (Red Line) follows some sort of distribution and is not uniformly random.

As can be seen above, there is a nice 'classic' distribution that resembles the Weibull distribution. Other statistical data for the data in Figure 2a is (all units in feet),
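The binning behind a histogram like Figure 5 can be sketched as follows; the 5 foot bin width and 14 bin range are illustrative choices of mine, not taken from the article's plot:

```c
/* Bin a set of delta-distance readings into fixed-width histogram
 * bins, the view plotted in Figure 5. Bin width and range are
 * illustrative assumptions. */

#define NUM_BINS     14
#define BIN_WIDTH_FT 5.0   /* 14 bins x 5 ft covers 0-70 feet */

void histogram(const double *delta_ft, int n, int counts[NUM_BINS])
{
    for (int i = 0; i < NUM_BINS; i++)
        counts[i] = 0;

    for (int i = 0; i < n; i++) {
        int bin = (int)(delta_ft[i] / BIN_WIDTH_FT);
        if (bin >= NUM_BINS)
            bin = NUM_BINS - 1;    /* clamp rare outliers into the last bin */
        counts[bin]++;
    }
}
```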

Extra Bonus Data:

It is well known that altitude is even less accurate than horizontal position in GPS data; this has to do with the geometry of the satellites and the calculations involved [1]. From topographic maps I believe my Lab's true elevation to be approximately 141 Feet. The GPS data for 9 hours' worth of data is shown in Figure 6.

Figure 6 – Altitude data for 9 hours worth of GPS Data. The test location was at approximately 141 Feet (from Topographic Maps). The average of this data was 136.4 Feet, suggesting a 5 foot offset in long term data, however this could be within the uncertainty of the topographic data that I used. The peak to peak deviation of the data over 9 hours was: 84.63 Feet. Just interesting to see some real, actual data.



[2] MediaTek Labs model: MTK3339

[3] Adafruit Ultimate GPS breakout board. PRODUCT ID: 746 . 

[4] GPS Visualizer web page for plotting GPS Data.

[5] HDOP discussion:

[6] There are offline programs that can look at the satellite geometry that you are currently seeing and that will give you a better idea of the maximum error you might see at that very moment, but this is beyond what the simple consumer GPS modules typically provide. These programs are used by GPS surveyors to time, or at least try to time, their measurements to coincide with the maximum positional accuracy.

Article By: Steve Hageman /

We design custom: Analog, RF and Embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help on your next project.

Note: This Blog does not use cookies (other than the edible ones).

Wednesday, January 8, 2020

The ground isn’t flat, so our PCB designs shouldn't be either…


In the 'bad old days' we used to spend a lot of time measuring and eyeballing PCB designs to see if everything would fit as intended. Naturally this led to many errors and iterations the first time anything new was tried.

Since Altium led the way with native 3D design capability in their PCB design software some 10 years ago, it has become indispensable to how modern PCBs are designed.


Figure 1 – The classic PCB view – It is great for routing traces, but it’s all flat; you really can’t tell if your footprints or parts placements are going to collide until the first hardware gets built, and by then it’s too late.

No more endless hours making detailed measurements and transferring from one tool to another, guessing if things will fit together. Now, just press the ‘3’ key in Altium to look at the parts and PCB in 3D mode and see instantly how it all fits, then press ‘2’ to get back to the flat PCB view to route traces. This is especially helpful with connectors, which have been a real source of confusion and errors for decades. Actually starting with a 3D model of the connector, placing it on a PCB, and then placing the mating connector on it will catch most reversed connector errors before the first hardware gets made wrong.

Figure 2 – A 3D view of a small GPS Logger prototype. Now that’s instant parts placement verification. Press the ‘3’ key in Altium to get an instant 3D view of your PCB design.

Figure 3 – The prototype GPS logger – it all fit together perfectly thanks to Altium’s native 3D capability.

About the project: A client needed a quick GPS Logger prototype to gain valuable system experience with and to start writing code for. Using a few off the shelf modules from Adafruit (their Ultimate GPS Breakout and FeatherWing Display breakout), plus a small custom processor and memory board, a quick prototype was put together in a week that allowed the system to be tested and firmware to be developed all before the design was finalized. This rapid prototyping with instant feedback really speeds up the completion of projects and leads to fewer errors and problems down the road.

Just say no to PCB tools that don’t give you instant / native 3D capability!


Monday, September 30, 2019

Make Your Code More Assertive!

Most if not all 32 bit processors have a software instruction to fire a ‘Breakpoint’, and the MIPS core used by Microchip PIC32 processors is no different.

This can be used to make a “Debug Only Breakpoint” that is useful as a standard C ‘assert()’ alternative.

Assert is a method of adding check code into your program that can be used to check assumptions about the state of variables or program status to flag problems or errors.

Using assertions can dramatically reduce programming errors, especially the errors that occur when libraries are being used [1][2].

Asserts in a PC environment are pretty easy to use, as there is normally a console window or a disk file available to log any asserted problems to.

In a small embedded system neither of these things is typically available. However, when developing and testing code a programmer / debugger is normally attached to the system and this can be used as the window into the system's operation.

What is needed is an assert macro that fires a software breakpoint and halts the program if the system is in a debugging state, and that ideally generates no code if the system is built in a production state. When using MPLAB-X and XC32, Microchip's preferred development IDE and compiler, this is easy to do.

When building a Debug image, MPLAB-X via XC32 defines the name “__DEBUG” (two leading underscores). This defined name can be used to build the assert macro in two ways: when debugging, the macro is built; when not debugging, the macro generates no code, as shown below.

A simple macro that mimics the standard C assert() call for any PIC32 based project. The macro generates a check that calls the MIPS software breakpoint if the project was built with ‘__DEBUG’ defined. This name is automatically defined by MPLAB-X when a debugging image is built.
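A minimal sketch of such a macro is below. The breakpoint builtin shown, __builtin_software_breakpoint(), is XC32's builtin that executes the MIPS SDBBP instruction; treat this as an illustration of the technique, not the article's exact listing:

```c
/* Debug-only assert sketch for XC32 / PIC32. __DEBUG is defined
 * automatically by MPLAB-X for debug builds; in a production build
 * the macro compiles away to nothing. */
#ifdef __DEBUG
    #define assert_dbg(exp) \
        do { if (!(exp)) __builtin_software_breakpoint(); } while (0)
#else
    #define assert_dbg(exp) ((void)0)   /* production: generates no code */
#endif
```

The do-while wrapper lets the macro be used safely inside if/else statements like any single statement.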

You can put code similar to the above into a C header file. Our new macro then works just like any standard C assert() macro,

   - assert_dbg(exp) where exp is non-zero does nothing (a non-assertion).

   - assert_dbg(exp) where exp evaluates to zero, stops the program on a software
     breakpoint if debugging mode is active in MPLAB-X.

I chose to name the macro: “assert_dbg()” so that the association to a standard assert would be easy to remember (it works the same way) and I added the ‘_dbg’ suffix to show that something is slightly different here as a reminder to the programmer that this is like: “A standard assert, but slightly different”.

Naturally you can use any name you like.

Bonus Macro

While reading an article by Niall Murphy on the Barr Group website [3], I ran across a comment by ‘ronkinoz’ that shared another clever / useful debugging macro - I will call it ‘assert_compile()’ here.

What assert_compile() does is check things that can be verified at compile time; if the assert fails, the compiler halts on an error.

The macro works by trying to define a typedef array with a size of one char if the assert passes; if the assert fails, the array will be sized as ‘-1’, which GCC will halt on as a compile error.

This can be very useful in checking the size of known instances against known system constraints for instance.

One thing that often causes problems is making an EEPROM structure, like a Cal Constant table, larger than the designed allocated space. This often happens when a project is repeatedly revised to add new features, or as a product is developed and it is discovered that the calibration routines need to change.

You can catch things like a table growing too large and possibly overwriting other EEPROM allocations, because the size of fixed objects is known to the compiler at compile time.

My first use is to peg my code to a compiler version. That way in a year or so if I forget to read my notes and try to compile the code under a different version, I will get a rude warning to take a careful look before continuing.

The macro is presented here,
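Along the lines described above, here is a sketch of such a macro; the concatenation helpers and the EEPROM example structure are illustrative assumptions of mine:

```c
/* Compile-time assert sketch: a failing expression sizes the
 * typedef'ed array as -1, which GCC rejects, halting the build.
 * The CONCAT helpers splice in __LINE__ so each use gets a unique
 * typedef name. */
#define ASSERT_CONCAT_(a, b) a##b
#define ASSERT_CONCAT(a, b)  ASSERT_CONCAT_(a, b)
#define assert_compile(exp) \
    typedef char ASSERT_CONCAT(assert_compile_, __LINE__)[(exp) ? 1 : -1]

/* Example: guard an EEPROM table against outgrowing its allocation.
 * The structure and the 256 byte limit are illustrative only. */
struct cal_table { float gain[16]; float offset[16]; };
assert_compile(sizeof(struct cal_table) <= 256);
```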



[2] McConnell, Steve, “Code Complete, A Practical Handbook of Software Construction”, 2nd edition, page 189: “Assertions”. Microsoft Press, 2004.



Saturday, August 10, 2019

Microchip XC32 Compiler Optimization

It is often quipped that the humorist Mark Twain once said,

    “There are lies, darn lies and statistics”

If Mr. Twain was a programmer he might have said something to this effect,

    “There are lies, darn lies and benchmarks”

Benchmarking has a long history of being bent into something that produces: “a good number, that will have nothing to do with your actual situation”. This is especially true now that we have embedded processors with caches and all sorts of other performance enhancing and limiting factors.

In the 1980’s there was one benchmark test that really took hold. It was called the “Sieve of Eratosthenes” and it was a numerical / prime number test that was used quite a bit to see what compilers produced the 'fastest' code [1].

This benchmark really just looked at one aspect of performance: calculating prime numbers. It didn’t test graphics, IO speed or a host of other factors that are also important in real world applications, so it was very narrow in its scope.

It did gain such prominence that the compiler writers of the era actually wrote optimizers that would detect this code sequence and apply special techniques to get the fastest possible performance for this specific benchmark test. There was nothing wrong in doing this – it is kind of like producing cars with a fast 0-60 MPH acceleration figure that can detect ‘pedal to the metal’ and change the shift points to get the fastest 0-60 time. Great for bragging rights, but a rarely used sequence in real life driving. This is much the same, in my opinion, with most modern microprocessor benchmarks.

What’s the point then?

This ‘Optimization’ project started for two reasons,

1) I was working on a battery powered image processing application and I wanted to see what the effects of the various XC32 optimization levels were, to determine if a higher optimization level would let me lower the CPU clock speed and thus save a significant amount of power.

2) To see, generally, what the various paid XC32 optimization levels do compared to the free version. It has been noted online that some people feel they are being ‘cheated’ out of ‘significant’ performance improvements with the free version of the XC32 compiler. So, we’ll take a look at that.

Note: You may know that Microchip provides a free compiler for its PIC32 processor series called XC32. It is currently based on GCC 4.8, and the free version provides the -O0 and -O1 optimization levels. The ‘paid’ version includes support and the other GCC optimization levels: -O2, -Os and -O3.

The XC32 optimization levels are not exactly the same as the standard GCC levels, but they roughly follow them. The XC32 2.0x manual states,

Some Notes

I used XC32 versions 2.05 and 2.10 for these tests (these versions perform the same in my tests; the differences only add some new devices and fix some corner case defects, as can be seen in the release notes).

When debugging your program logic it is useful to set the optimization level to -O0, as this produces pretty much 1:1 code with your C source; this makes following the logic and looking at variables easy. Also, inlining of functions is disabled, so functions appear as you wrote them.

-O1 is the standard optimization level for the free version of XC32, and even this level inlines functions and aggressively removes variables that can be kept on the stack or in a register, making your code run much faster and be smaller, but also making debugging very hard to follow. This is the optimization that you most likely want to use for your ‘Release’ code; after all, who doesn’t want faster / smaller code?

Many Standard libraries like the MIPS PIC32 DSP libraries are written in hand optimized MIPS assembly language and hence bypass the C compiler completely, so you get the fastest possible performance even with the free version of the compiler.

-O2, -Os and -O3 optimizations are only available with the paid version of XC32, which is quite inexpensive at < $30.00 per month and a must for professional developers, if only for the expedited support that the license includes.

My hardware test setup for all these tests is a PIC32MZ2048EFH processor running at 200 MHz clock speed.

On to the Benchmarking

One of the standard benchmarks used with advanced 32 bit processors is the ‘CoreMark’ [2]. This is a multifaceted benchmark that tries to simulate many different aspects of an actual application, yet in the end produces a single performance number. When I looked at this I found that the code was quite small and didn’t move a lot of variables around, so in execution it probably can spend most if not all of its time in any modern 32 bit processor's cache.

Another application that I used in my benchmarking was a custom image processing application that I wrote. It is not big in the sense of requiring a large program memory footprint, but it does use upwards of 57,000 bytes of data as it processes a 32 x 24 image up to 320 x 240 for display and along the way scales, normalizes, applies automatic gain control, maps the resulting image to a 255 step color map, etc. So there is quite a bit of data manipulation along the way, and all the data cannot fit in the PIC32MZ’s 16k data cache at once.

The last application I used was a relatively large demo program provided by Microchip that represents a complete application with extensive graphics, etc. This application was compiled just to see the relative code sizes that the XC32 compiler produced, as I thought the CoreMark and my application were really too small for a realistic analysis of program size.

CoreMark Program Insights

The CoreMark site [2] provides results and provides the compilers “Command Line Parameters” that were used to compile the program. This information was very interesting as you will soon see.

First, let’s take a look at the CoreMark execution speed versus the various XC32 compiler optimization levels. The CoreMark program when compiled at -O1 is only 32k bytes long and uses only 344 bytes of data memory, so it is quite small and probably runs completely in the PIC32MZ processor cache, so speed of execution is all we can really look for here.

Figure 1 – This is the execution speed for the CoreMark with various optimization levels. All results were normalized to the -O1 level as this is the highest optimization for the free version of the XC32 Compiler. See the text for a discussion of each optimization.

I included the -O0 optimization level in Figure 1 just as a comparison, to show what a huge difference even the free -O1 optimization level makes on code performance. The difference between -O0 and -O1 in the CoreMark (and nearly every other application I have ever compiled) is nearly 1.5:1; no other optimization makes that big of a jump. In fact all the other optimizations and ‘tweaks’ only produce marginal gains over the -O1 optimization. As noted, optimization level -O0 is really only useful in debugging code; no one would ever build a final application with this optimization level unless they really just don’t like their customers.

-Os is ‘optimize for size’ and as expected it results in a nearly 10% performance hit here. The CoreMark program is really very small in memory footprint, so this option would be a waste of time in any small application like this one, it is included only as a reference.

-O1 is the default optimization for the XC32 compiler free version. All the other results are normalized to this result (100%).

-O2 and -O3 produce only marginal gains above the -O1 level. -O2 is 13% faster, -O3 is 17% faster.

-O2++ is the optimization that Microchip used for the benchmark results posted on the CoreMark site and, as may be expected, includes some undocumented command line options and some specific tweaks to get the performance up as much as possible. Again, there is nothing that says anyone can’t optimize specifically for the benchmark, only that others should be able to duplicate the results, which I was able to do. Here are the command line options for -O2++ as I found them,

-g -O2 -G4096 -funroll-all-loops -funroll-loops -fgcse-sm -fgcse-las -fgcse -finline -finline-functions -finline-limit=550 -fsel-sched-pipelining -fselective-scheduling -mtune=34kc -falign-jumps=128 -mjals

The really interesting option here is “-mtune=34kc”. I could not find this option documented anywhere and I really did not have time to search through megabytes of source code to try to find it. But “34Kc” is a designation that MIPS uses to describe the core that the PIC32MZ is based on, so it is some sort of optimization for this specific core.

Bottom line is that these optimizations produce the fastest result – 30% faster than -O1 alone, but it is only some 12% faster than the base -O3 optimization, so it is only marginally better than -O3 alone.

-O3++ is where I applied these same -O2++ command line options to the base -O3 optimizations just for fun. This produced a marginally faster result than -O3 alone but it was still slower than -O2++.

Interesting to see the results, but again, while the CoreMark is more comprehensive than the 1980's 'Sieve' benchmark, it probably does not apply to your application at all; it’s small and uses an incredibly small amount of data memory.

Image Processing Program Insights

This was an actual application that I wrote that takes raw data from a 32 x 24 pixel image sensor, converts the raw data into useful values, up-interpolates the pixels to 320 x 240, and then scales and limits the values for display. The data formats were int32, int16, int8 and float data types. A large amount of data was processed, some 57,000 bytes for each image.

Reading the sensor and writing to the display are fixed, running as fast as the hardware interfaces allow, so nothing can be done about that. The experiment was to see if the central processing algorithms could be sped up enough to allow me to slow down the CPU clock and save on battery power. Running some benchmarks was a first step in that determination.

    while(1) {
        GetSensorData();        // Fixed by how fast the sensor can be read.
        GetImageAdjustments();  // User / GUI Interaction – get settings.
        DisplayImage();         // Fixed by how fast the display can be written to.
    }

Figure 2 – The central image processing ‘loop’ consists of routines like this. At each step the data arrays were processed in deterministic loops. Some of the processing is redundant because multiple loops were used, as each step consisted of one specific data operation. These loops could be combined if need be, but without some profiling first the effort may have been in vain (see the conclusion); guessing almost never pays off in optimizing.

Figure 3 – The simplest optimization is to use the compilers built in ‘smarts’ to make the code faster. Here my simple but data intensive image processing program was optimized using various compiler settings and the speed of execution was measured. The optimization level -O1 was normalized to 100%.

Figure 3 is pretty straightforward. I timed only the image processing portion of the code, excluding the hardware IO as that is fixed by hardware constraints. Optimization level -O0 would never be used for released code; it is included here only as a comparison to show how aggressive the compiler gets even with the free -O1 optimization. Interestingly, option -O2 produced a slower result than option -O1 in this example; there is probably some data inefficiency going on with this option. As expected, however, option -O3 produced the fastest ‘standard’ result, but really only marginally faster than -O1 at around 10%.
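The timing bracket itself can be sketched like this on a PC; on the PIC32 the core timer (the CP0 COUNT register) would be read instead of the C library clock(). A sketch of the technique, not the project's code:

```c
#include <time.h>

/* Time a code section by bracketing it with clock() reads and
 * converting the tick delta to milliseconds. On the PIC32 target the
 * same bracketing is done with the core timer instead. */
double time_ms(void (*fn)(void))
{
    clock_t t0 = clock();
    fn();                       /* the section being benchmarked */
    return 1000.0 * (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Timing only the processing call, and not the sensor read or display write, keeps the fixed hardware IO out of the measurement.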

XC32 also has some extra ‘switches’ that can be tweaked from the GUI. I set all of these for the “-O1++”, “-O2++” and “-O3++” level tests. These switches are shown in Figure 4.

Figure 4 – XC32 allows you to set these switches to turn on more aggressive compiler options, as can be seen in figure 3 for the ‘-Ox++’ results. It is faster, but not by a lot, typically less than 5%.

The bottom line here is: the XC32 free version's -O1 optimization was only slightly slower than the paid version's -O3++ maximum optimizations, by less than 15%. OK, but nothing really to get too excited about.

Large Application Insights

This large application is based on Microchip's “Real Time FFT” application example program [3]. This is a fairly large application: it uses a lot of data and has an LCD and an extensive user interface, along with ADC and DAC drivers and FFT calculations. I don’t have the hardware to run this code, so I looked only at generated code size and data size. The data size held steady at 200 kBytes independent of the optimization level, which is expected for a large graphics oriented program like this. The compiled program size was 279 kBytes when compiled with the optimization level of -O1.

Figure 5 – The Microchip application example “Real Time FFT” was compiled at various optimization levels and the resulting program SIZE is plotted here. This is a rather big application at some 279 kBytes when compiled at -O1. As can be expected, when optimizing for absolute maximum speed (-O3++, see figure 4) the speed comes at a huge cost in program size.

As can be seen in Figure 5, the -Os optimization gave only a marginal size decrease of around 7% over the default -O1 optimization. -O3++ however grew very large, probably mostly due to the application of figure 4’s “Unroll Loops” switch. This switch forces the unrolling of all loops, even non-deterministic ones.

This result is to be expected, as any compiler's ‘Money Spec’ is execution speed, not program size – which, for the majority of real world applications, is the proper trade off. As my image processing application shows, the performance gain from -O1 to -O3 would be expected to be minimal, and the trade off in program size might be excessive, especially if you are running out of space and can’t go to a larger memory device for some reason.

In a very large application code size may be a real issue because you want to save money by using a smaller memory chip. The -Os option probably isn’t the silver bullet you will be looking for as the < 10% code size savings are nearly insignificant. Microchip has an application note [4] that provides some interesting information on how -Os works and some more optimization tricks for code size. The tricks when applied result in less than a 2% improvement over the -Os optimization alone, so while this note is interesting reading, it provides no further improvement.


The only good benchmark is the one of your actual running application code. That being said, one item that can be gleaned from the experiments here is that the free XC32 compiler running at its maximum optimization level of -O1 produces code that is within 15-20% or so of the maximum optimizations possible with the paid XC32 compiler. The other item that can be gleaned is that by spending hours and hours you might be able to coax 20% or perhaps even 30% better performance by hand tweaking the compiler optimizations for the particular task at hand, but you can probably do the same or better right in your code by improving your base algorithms.

It is always good to remember the first and second rules of code optimization, however,

     #1 - Write plain and understandable code first.
     #2 - Don’t hand optimize anything until you have profiled the code and proven that the
             optimization will help.

Guessing at what to optimize is almost never right and ends up wasting a lot of time for a very marginal performance increase and a highly likely increase in your code's maintenance costs.

I have found that there are all sorts of interesting articles out there on optimization with GCC, and I have also found that unless they show actual results they are mostly someone's 'opinion' on how things work, or how things worked 15 years ago but don't work that way today. For instance, I have found articles that state emphatically that 'switch' statements produce far faster code than 'if' statements, and I have found articles that say emphatically just the opposite. Naturally neither of these articles shows any actual results, so I have learned to beware and check for myself. Which takes me back to the #1 rule of optimizing: don't do it until you have proven that you need to do it. And yes, I fight this temptation myself all the time!

And finally, yes, finally… if you don’t have the XC32 paid version, don’t fret too much – in most applications the paid version will only provide a marginal gain in performance and size over the default, free XC32 -O1 optimization. But, as always: "Your mileage may vary".

Appendix A –  General Optimization Settings – Best Practices with XC32

Best for Step by step Debugging

Set Optimization level to -O0 and set all other defaults to ‘reset’ condition. This will compile the code almost exactly as you have written it which will allow for easy line by line program trace during debugging. This is very helpful in tracing and verifying the programs logic.


Best settings for Speed or Size

Use these settings for maximum 'easy' speed with the free XC32 compiler – they will get you to around 80 to 90% of the maximum speed possible. Any improvement over this will need to be accompanied by a very careful review of all the compiler options and their effects, actual time profiles of the code, and possibly module-specific optimizations.

Project Properties → xc32-gcc → General

Project Properties → xc32-gcc → Optimization

To Optimize a specific source file Individually

Right click on the source file and select ‘Properties’.

Then select: “Overwrite Build Options” and from there you can set the optimization level of the specific file separate from the main application settings.


Then select xc32-gcc → Optimization and have at it as detailed above.

Appendix B – Lists of the optimizations as applied by XC32 Version 2.05/2.10

Optimizations vs. -Ox level, All other settings at default levels.

Note: Use “-S -fverbose-asm” to list every silently applied option (including optimization ones) in assembler output.

# -g -O0 -ffunction-sections -ftoplevel-reorder -fverbose-asm
 # options enabled:  -faggressive-loop-optimizations -fauto-inc-dec
 # -fbranch-count-reg -fcommon -fdebug-types-section
 # -fdelete-null-pointer-checks -fearly-inlining
 # -feliminate-unused-debug-types -ffunction-cse -ffunction-sections
 # -fgcse-lm -fgnu-runtime -fident -finline-atomics -fira-hoist-pressure
 # -fira-share-save-slots -fira-share-spill-slots -fivopts
 # -fkeep-static-consts -fleading-underscore -fmath-errno
 # -fmerge-debug-strings -fmove-loop-invariants -fpcc-struct-return
 # -fpeephole -fprefetch-loop-arrays -fsched-critical-path-heuristic
 # -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock
 # -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec
 # -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fshow-column
 # -fsigned-zeros -fsplit-ivs-in-unroller -fstrict-volatile-bitfields
 # -fsync-libcalls -ftoplevel-reorder -ftrapping-math -ftree-coalesce-vars
 # -ftree-cselim -ftree-forwprop -ftree-loop-if-convert -ftree-loop-im
 # -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops=
 # -ftree-phiprop -ftree-pta -ftree-reassoc -ftree-scev-cprop
 # -ftree-slp-vectorize -ftree-vect-loop-version -funit-at-a-time
 # -fverbose-asm -fzero-initialized-in-bss -mbranch-likely
 # -mcheck-zero-division -mdivide-traps -mdouble-float -mdsp -mdspr2 -mel
 # -membedded-data -mexplicit-relocs -mextern-sdata -mfp64 -mfused-madd
 # -mgp32 -mgpopt -mhard-float -mimadd -mlocal-sdata -mlong32 -mno-mdmx
 # -mno-mips16 -mno-mips3d -mshared -msplit-addresses

# -g -O1 -ffunction-sections -ftoplevel-reorder -fverbose-asm
 # options enabled:  -faggressive-loop-optimizations -fauto-inc-dec
 # -fbranch-count-reg -fcombine-stack-adjustments -fcommon -fcompare-elim
 # -fcprop-registers -fdebug-types-section -fdefer-pop -fdelayed-branch
 # -fdelete-null-pointer-checks -fearly-inlining
 # -feliminate-unused-debug-types -fforward-propagate -ffunction-cse
 # -ffunction-sections -fgcse-lm -fgnu-runtime -fguess-branch-probability
 # -fident -fif-conversion -fif-conversion2 -finline -finline-atomics
 # -finline-functions-called-once -fipa-profile -fipa-pure-const
 # -fipa-reference -fira-hoist-pressure -fira-share-save-slots
 # -fira-share-spill-slots -fivopts -fkeep-static-consts
 # -fleading-underscore -fmath-errno -fmerge-constants
 # -fmerge-debug-strings -fmove-loop-invariants -fomit-frame-pointer
 # -fpcc-struct-return -fpeephole -fprefetch-loop-arrays
 # -fsched-critical-path-heuristic -fsched-dep-count-heuristic
 # -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
 # -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
 # -fsched-stalled-insns-dep -fshow-column -fshrink-wrap -fsigned-zeros
 # -fsplit-ivs-in-unroller -fsplit-wide-types -fstrict-volatile-bitfields
 # -fsync-libcalls -ftoplevel-reorder -ftrapping-math -ftree-bit-ccp
 # -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop
 # -ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts
 # -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert
 # -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize
 # -ftree-parallelize-loops= -ftree-phiprop -ftree-pta -ftree-reassoc
 # -ftree-scev-cprop -ftree-sink -ftree-slp-vectorize -ftree-slsr
 # -ftree-sra -ftree-ter -ftree-vect-loop-version -funit-at-a-time
 # -fvar-tracking -fvar-tracking-assignments -fverbose-asm
 # -fzero-initialized-in-bss -mbranch-likely -mcheck-zero-division
 # -mdivide-traps -mdouble-float -mdsp -mdspr2 -mel -membedded-data
 # -mexplicit-relocs -mextern-sdata -mfp64 -mfused-madd -mgp32 -mgpopt
 # -mhard-float -mimadd -mlocal-sdata -mlong32 -mno-mdmx -mno-mips16
 # -mno-mips3d -mshared -msplit-addresses

# -g -Os -ffunction-sections -ftoplevel-reorder -fverbose-asm
 # options enabled:  -faggressive-loop-optimizations -fauto-inc-dec
 # -fbranch-count-reg -fcaller-saves -fcombine-stack-adjustments -fcommon
 # -fcompare-elim -fcprop-registers -fcrossjumping -fcse-follow-jumps
 # -fdebug-types-section -fdefer-pop -fdelayed-branch
 # -fdelete-null-pointer-checks -fdevirtualize -fearly-inlining
 # -feliminate-unused-debug-types -fexpensive-optimizations
 # -fforward-propagate -ffunction-cse -ffunction-sections -fgcse -fgcse-lm
 # -fgnu-runtime -fguess-branch-probability -fhoist-adjacent-loads -fident
 # -fif-conversion -fif-conversion2 -findirect-inlining -finline
 # -finline-atomics -finline-functions -finline-functions-called-once
 # -finline-small-functions -fipa-cp -fipa-profile -fipa-pure-const
 # -fipa-reference -fipa-sra -fira-hoist-pressure -fira-share-save-slots
 # -fira-share-spill-slots -fivopts -fkeep-static-consts
 # -fleading-underscore -fmath-errno -fmerge-constants
 # -fmerge-debug-strings -fmove-loop-invariants -fomit-frame-pointer
 # -foptimize-register-move -foptimize-sibling-calls -fpartial-inlining
 # -fpcc-struct-return -fpeephole -fpeephole2 -fprefetch-loop-arrays
 # -fregmove -freorder-blocks -freorder-functions -frerun-cse-after-loop
 # -fsched-critical-path-heuristic -fsched-dep-count-heuristic
 # -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
 # -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
 # -fsched-stalled-insns-dep -fschedule-insns2 -fshow-column -fshrink-wrap
 # -fsigned-zeros -fsplit-ivs-in-unroller -fsplit-wide-types
 # -fstrict-aliasing -fstrict-overflow -fstrict-volatile-bitfields
 # -fsync-libcalls -fthread-jumps -ftoplevel-reorder -ftrapping-math
 # -ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -ftree-ch
 # -ftree-coalesce-vars -ftree-copy-prop -ftree-copyrename -ftree-cselim
 # -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre
 # -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon
 # -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop -ftree-pre
 # -ftree-pta -ftree-reassoc -ftree-scev-cprop -ftree-sink
 # -ftree-slp-vectorize -ftree-slsr -ftree-sra -ftree-switch-conversion
 # -ftree-tail-merge -ftree-ter -ftree-vect-loop-version -ftree-vrp
 # -funit-at-a-time -fuse-caller-save -fvar-tracking
 # -fvar-tracking-assignments -fverbose-asm -fzero-initialized-in-bss
 # -mbranch-likely -mcheck-zero-division -mdivide-traps -mdouble-float
 # -mdsp -mdspr2 -mel -membedded-data -mexplicit-relocs -mextern-sdata
 # -mfp64 -mfused-madd -mgp32 -mgpopt -mhard-float -mimadd -mlocal-sdata
 # -mlong32 -mmemcpy -mno-mdmx -mno-mips16 -mno-mips3d -mshared
 # -msplit-addresses

# -g -O2 -ffunction-sections -ftoplevel-reorder -fverbose-asm
 # options enabled:  -faggressive-loop-optimizations -fauto-inc-dec
 # -fbranch-count-reg -fcaller-saves -fcombine-stack-adjustments -fcommon
 # -fcompare-elim -fcprop-registers -fcrossjumping -fcse-follow-jumps
 # -fdebug-types-section -fdefer-pop -fdelayed-branch
 # -fdelete-null-pointer-checks -fdevirtualize -fearly-inlining
 # -feliminate-unused-debug-types -fexpensive-optimizations
 # -fforward-propagate -ffunction-cse -ffunction-sections -fgcse -fgcse-lm
 # -fgnu-runtime -fguess-branch-probability -fhoist-adjacent-loads -fident
 # -fif-conversion -fif-conversion2 -findirect-inlining -finline
 # -finline-atomics -finline-functions-called-once -finline-small-functions
 # -fipa-cp -fipa-profile -fipa-pure-const -fipa-reference -fipa-sra
 # -fira-hoist-pressure -fira-share-save-slots -fira-share-spill-slots
 # -fivopts -fkeep-static-consts -fleading-underscore -fmath-errno
 # -fmerge-constants -fmerge-debug-strings -fmove-loop-invariants
 # -fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls
 # -foptimize-strlen -fpartial-inlining -fpcc-struct-return -fpeephole
 # -fpeephole2 -fprefetch-loop-arrays -fregmove -freorder-blocks
 # -freorder-functions -frerun-cse-after-loop
 # -fsched-critical-path-heuristic -fsched-dep-count-heuristic
 # -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
 # -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
 # -fsched-stalled-insns-dep -fschedule-insns -fschedule-insns2
 # -fshow-column -fshrink-wrap -fsigned-zeros -fsplit-ivs-in-unroller
 # -fsplit-wide-types -fstrict-aliasing -fstrict-overflow
 # -fstrict-volatile-bitfields -fsync-libcalls -fthread-jumps
 # -ftoplevel-reorder -ftrapping-math -ftree-bit-ccp
 # -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-vars
 # -ftree-copy-prop -ftree-copyrename -ftree-cselim -ftree-dce
 # -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre
 # -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon
 # -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop -ftree-pre
 # -ftree-pta -ftree-reassoc -ftree-scev-cprop -ftree-sink
 # -ftree-slp-vectorize -ftree-slsr -ftree-sra -ftree-switch-conversion
 # -ftree-tail-merge -ftree-ter -ftree-vect-loop-version -ftree-vrp
 # -funit-at-a-time -fuse-caller-save -fvar-tracking
 # -fvar-tracking-assignments -fverbose-asm -fzero-initialized-in-bss
 # -mbranch-likely -mcheck-zero-division -mdivide-traps -mdouble-float
 # -mdsp -mdspr2 -mel -membedded-data -mexplicit-relocs -mextern-sdata
 # -mfp64 -mfused-madd -mgp32 -mgpopt -mhard-float -mimadd -mlocal-sdata
 # -mlong32 -mno-mdmx -mno-mips16 -mno-mips3d -mshared -msplit-addresses

# -g -O3 -ffunction-sections -ftoplevel-reorder -fverbose-asm
 # options enabled:  -faggressive-loop-optimizations -fauto-inc-dec
 # -fbranch-count-reg -fcaller-saves -fcombine-stack-adjustments -fcommon
 # -fcompare-elim -fcprop-registers -fcrossjumping -fcse-follow-jumps
 # -fdebug-types-section -fdefer-pop -fdelayed-branch
 # -fdelete-null-pointer-checks -fdevirtualize -fearly-inlining
 # -feliminate-unused-debug-types -fexpensive-optimizations
 # -fforward-propagate -ffunction-cse -ffunction-sections -fgcse
 # -fgcse-after-reload -fgcse-lm -fgnu-runtime -fguess-branch-probability
 # -fhoist-adjacent-loads -fident -fif-conversion -fif-conversion2
 # -findirect-inlining -finline -finline-atomics -finline-functions
 # -finline-functions-called-once -finline-small-functions -fipa-cp
 # -fipa-cp-clone -fipa-profile -fipa-pure-const -fipa-reference -fipa-sra
 # -fira-hoist-pressure -fira-share-save-slots -fira-share-spill-slots
 # -fivopts -fkeep-static-consts -fleading-underscore -fmath-errno
 # -fmerge-constants -fmerge-debug-strings -fmove-loop-invariants
 # -fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls
 # -foptimize-strlen -fpartial-inlining -fpcc-struct-return -fpeephole
 # -fpeephole2 -fpredictive-commoning -fprefetch-loop-arrays -fregmove
 # -freorder-blocks -freorder-functions -frerun-cse-after-loop
 # -fsched-critical-path-heuristic -fsched-dep-count-heuristic
 # -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
 # -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
 # -fsched-stalled-insns-dep -fschedule-insns -fschedule-insns2
 # -fshow-column -fshrink-wrap -fsigned-zeros -fsplit-ivs-in-unroller
 # -fsplit-wide-types -fstrict-aliasing -fstrict-overflow
 # -fstrict-volatile-bitfields -fsync-libcalls -fthread-jumps
 # -ftoplevel-reorder -ftrapping-math -ftree-bit-ccp
 # -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-coalesce-vars
 # -ftree-copy-prop -ftree-copyrename -ftree-cselim -ftree-dce
 # -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre
 # -ftree-loop-distribute-patterns -ftree-loop-if-convert -ftree-loop-im
 # -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops=
 # -ftree-partial-pre -ftree-phiprop -ftree-pre -ftree-pta -ftree-reassoc
 # -ftree-scev-cprop -ftree-sink -ftree-slp-vectorize -ftree-slsr
 # -ftree-sra -ftree-switch-conversion -ftree-tail-merge -ftree-ter
 # -ftree-vect-loop-version -ftree-vectorize -ftree-vrp -funit-at-a-time
 # -funswitch-loops -fuse-caller-save -fvar-tracking
 # -fvar-tracking-assignments -fvect-cost-model -fverbose-asm
 # -fzero-initialized-in-bss -mbranch-likely -mcheck-zero-division
 # -mdivide-traps -mdouble-float -mdsp -mdspr2 -mel -membedded-data
 # -mexplicit-relocs -mextern-sdata -mfp64 -mfused-madd -mgp32 -mgpopt
 # -mhard-float -mimadd -mlocal-sdata -mlong32 -mno-mdmx -mno-mips16
 # -mno-mips3d -mshared -msplit-addresses



Article By: Steve Hageman /

We design custom: Analog, RF and Embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help on your next project.

Note: This Blog does not use cookies (other than the edible ones).

Monday, July 1, 2019

Microchip PIC32 Harmony Bare Metal Programming

All microprocessor manufacturers now provide some sort of factory sponsored C language tool set, and they also provide Hardware Abstraction Layer (HAL) libraries to help users configure and use their parts ‘easily’ without having to spend days wading through register maps in 500 page datasheets.

When Microchip first released their PIC32MX parts they also provided a HAL library, called the “PLIB” or “Peripherals Library”, to help users along.

For instance, to set up a PIC32MX SPI clock using the original PLIB implementation one would need to call this function,

     SpiChnSetBrg(chn, brg);

This is still better than having to dig into the data sheet to determine all the register names and addresses directly, but it isn't all that friendly either and you still need intimate knowledge of the hardware to set the ‘brg’ register to get the proper SPI clock rate.
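For the PIC32, the data sheet gives the SPI clock as Fsck = Fpb / (2 × (SPIxBRG + 1)), so with the old PLIB you had to work out ‘brg’ yourself. A minimal sketch of that arithmetic (the formula is from the PIC32 family data sheets; the helper name here is mine, for illustration only):

```c
#include <stdint.h>

/* Compute the SPIxBRG register value for a desired SPI clock, given the
 * peripheral bus clock, from Fsck = Fpb / (2 * (brg + 1)).  Integer
 * truncation means the achieved clock may not match the request exactly. */
uint32_t spi_brg_for_clock(uint32_t pb_clock_hz, uint32_t sck_hz)
{
    return pb_clock_hz / (2u * sck_hz) - 1u;
}

/* Example: a 100 MHz peripheral bus and a desired 10 MHz SPI clock
 * give brg = 100000000 / 20000000 - 1 = 4. */
```

With those numbers in hand you would then call SpiChnSetBrg(chn, 4) – and recompute by hand any time the bus clock changed.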

After this initial PLIB library was released, Microchip correctly realized that the 32 bit microprocessor market is a commodity market and that they would have to differentiate themselves in some way. Since the hardware is a commodity, they chose to differentiate themselves with the software layer.

They named this new software “Harmony” and discontinued the original PLIB concept going forward.

In Microchip's marketing literature they touted Harmony as the “do all, end all framework”, as it encompasses not only configuring the chip's peripheral devices but also a lot of higher level system functions like file systems, graphics, Bluetooth and RTOS support, to name a few, all under one roof.

Confusion Reigns, sowing Dis-Harmony

This has all caused some confusion: many people now think that using Harmony means bringing in a lot of ‘Framework Baggage’. This is simply not the case, and since Microchip has not sought to clarify it, the rants about Harmony being just a lot of ‘Framework Baggage’ continue unabated.

What Microchip needed to do was write a simple application note describing how to use Harmony as a simple chip configurator, that is, how to use Harmony in ‘Bare Metal’ mode. Harmony is a decent chip configurator, about the same as everyone else’s, and it is way better than resorting to the ‘assembly language’ approach of writing all the peripheral libraries yourself from register maps.

So I will take it upon myself to write a brief tutorial on what Microchip should have done some time ago to clarify the situation.

Note: This tutorial is based on Harmony 1.4x and 2.0x. Harmony 3 is still being fleshed out, but I would expect it to follow the same basic principles as Harmony 1 and 2. Currently (July 2019) the first release of Harmony 3.0 just adds support for the Atmel ARM chips.

The Missing Application Note: “Using Harmony as a Chip Configurator in Bare Metal Mode”

Using Microchip Harmony as a chip configurator will allow speedy configuration of nearly all of the peripherals in a PIC32 microprocessor, and it will also stub out an application that will, at the very minimum, get the chip up and running. Harmony will also set up the proper header paths so that the current high level PLIB commands correctly target your selected chip. Another advantage of using Harmony shows up when maintaining and updating the application later, since all the links in the libraries will be kept up to date as the chip configuration, or even the chip itself, changes.

Prerequisites: You should be familiar with both of these Microchip provided tutorials,

     “MPLAB Harmony Tutorial – Creating Your First Project”
     “Creating a “Hello World” application Using the MPLAB Harmony Configurator”

These tutorials will help you figure out the steps required to operate the Harmony Configurator IDE. As with every manufacturer's configuration solution, there is at least a day's worth of learning curve to overcome before you can be proficient and get off on your own. The tutorials above will help you through that first day of learning [1].

In this note we will only discuss the proper setup of the peripherals and how to get the generated application to “Bare Metal Mode”, using the SPI as an example.

Setting Up the Clock and IO Pins

Follow the procedure in the tutorials listed above to set the Clock and IO pins. There is no difference between the “Bare Metal Mode” described here and the procedure as described in the tutorials.

Setting Up and Using Peripherals

Microchip provides two types of drivers in Harmony: “Static” and “Dynamic”. Refer to Reference [2] for a discussion of how each driver type works. We will be focused on “Static” drivers here as they provide the expected “Bare Metal” performance with no additional software overhead required.

SPI Example

A basic “Bare Metal” setup would be a plain old SPI configuration: just “point and shoot”, with no interrupts, FIFOs, external buffers, etc. As you gain experience you can add those later if you wish, either by yourself or with Harmony’s help. Configuring most peripherals in Harmony is done in three easy steps.

Step 1 - Expand the “Drivers” section as shown below

Step 2 - Expand the SPI section as below,

The settings shown above will be the simplest possible. For instance,

Standard Buffer: Not all PIC32s have an enhanced (or FIFO) buffer, so using the standard buffer (FIFO disabled) is safest.

8/16/32 bit Mode: Set this to what your SPI Devices expect as far as bit width. This parameter is easily changed on the fly using other PLIB_SPI commands.

Number of SPI Instances: Set this to the number of SPI peripherals you will be using. Here we will just be using one.

Step 3 - Now you have to configure one or more of the actual SPI hardware peripherals of the chip as shown below,

Here “Instance 0” is set to the physical hardware SPI peripheral “SPI_ID_1”. Most of the other information is as described in the data sheet. Of note: it is nice to be able to set the SPI clock rate and not have to worry about the frequency setting registers directly. Also of note: just because I set 1 MHz in this example, it does not mean that the SPI clock will end up being exactly 1 MHz. The configurator will simply pick the register settings that get closest to 1 MHz. If you want to know the exact clock rate, you can find the clock rate equation in the part's data sheet and calculate it from that.
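To see the effect, the achieved clock can be back-calculated from the data-sheet relation Fsck = Fpb / (2 × (BRG + 1)). The helper names below are mine, and I am assuming simple integer truncation when the BRG value is computed; the real configurator may round differently:

```c
#include <stdint.h>

/* BRG register value for a requested SPI clock (simple truncation assumed). */
uint32_t spi_brg(uint32_t pb_hz, uint32_t req_hz)
{
    return pb_hz / (2u * req_hz) - 1u;
}

/* The SPI clock actually produced by that BRG value. */
uint32_t spi_actual_clock(uint32_t pb_hz, uint32_t req_hz)
{
    return pb_hz / (2u * (spi_brg(pb_hz, req_hz) + 1u));
}

/* With a 100 MHz peripheral bus:
 *   request 1 MHz -> BRG = 49, actual = exactly 1 MHz
 *   request 9 MHz -> BRG = 4,  actual = 10 MHz (the register steps get
 *                    coarse at high clock rates, so the error can be large) */
```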

For the “Clock Mode” setting there are 4 options. Most people are familiar with the usual SPI terminology of “SPI Mode 0” through “SPI Mode 3”, or the slightly more descriptive “CPOL” and “CPHA” for clock polarity and clock phase [3]. The figures below should help with the SPI setup.

The Harmony configurator SPI Mode setting is selected from one of the 4 options in the dialog shown above.

Table 1 - The table above will help in translating from the standard SPI Mode number to the Harmony descriptive text.

We have now set up a basic instance of SPI peripheral #1. At this point we should also have configured the clocks and IO pins as per the earlier tutorials, including assigning SPI peripheral #1 to the proper IO pins on the chip. Save and generate the code as per the tutorials, then do a compile to make sure everything is OK.

What Harmony Builds

Before we can get into using our “Bare Metal” SPI we need to decide what portions of the generated Harmony Code we will use.

As can be seen below, Harmony builds for us a main.c, plus an app.c and its corresponding app.h file.

Figure 6 - The most important Harmony built files that we will explore here as shown in the MPLAB-X IDE.

When the program first starts it loads and runs some C startup code, then execution starts at main(), which is located in the file “main.c” as shown,
Figure 7 - The main() as built by Harmony. You can’t get much simpler than this.

Harmony has built a completely working project at this point that will,

1) Configure the clocks and hardware as setup by the user via the function SYS_Initialize().
2) Enter an infinite loop that calls a function called SYS_Tasks().

Taking a look at SYS_Initialize() we see another easy to understand function,

Figure 8 - SYS_Initialize() function as built by the Harmony Configurator. This function is where all the clocks and peripherals get initialized.

As can be seen above, SYS_Initialize() is straightforward. This is where the chip and any peripherals get initialized: first the clocks, then the ports, and then, as you can see, the SPI instance. Note: this SPI initialization function contains all the commands to initialize the SPI as we set it up on the “SPI Driver Instance 0” Harmony configuration page above. It can be very informative to navigate to and look at this function if you need to change anything on the fly later in your program, as all the configuration is done via PLIB_SPI functions.

The interrupts (if any) are next. At the very end we see that a call is made to a function APP_Initialize().

APP_Initialize() is located in app.c. It is a great place to put any one-time initialization that your application may need, such as setting initial conditions or flashing splash screens. It is the place for anything that needs to run, or be set, once at the beginning of your application.

Figure 9 - As initially built, APP_Initialize() has some comments and sets the default state of the ‘roughed out’ state machine that Harmony builds at first. If you are not going to use the state machine approach, you can delete all of the contents of APP_Initialize() and put in here anything that your application needs to run once at the launch of your application.

Now that we are done with APP_Initialize() and SYS_Initialize() we are back in main().

The next part of main() (see main.c above) is a forever while loop that repeatedly calls SYS_Tasks(). Drilling into SYS_Tasks(), which is located in a generated file named system_tasks.c, we find the following code,
Figure 10 - SYS_Tasks() Code as generated by Harmony.

As can be seen in the function SYS_Tasks(), there is a call to “maintain device drivers”, but since we have set the driver to ‘Static’ and will be using the driver directly, this function really does nothing more than check a few flags and then return.

The next call is to APP_Tasks() which is located in the file app.c.

APP_Tasks() stubs out a basic State Machine that we can either use or not as shown below,

Figure 11 - App_Tasks() as the default build generated by Harmony.

Many (most?) people don’t like to use a state machine, preferring instead a ‘Big Loop’ implementation. No worries, this is easy to fix as shown below,

Figure 12 - APP_Tasks() modified into a ‘Big Loop’ as many people prefer.

If you are familiar with how the Arduino stubs out a sketch (program), then all this should look familiar to you. Think of APP_Initialize() as the Arduino function setup() and APP_Tasks() as the Arduino function loop(). As with the Arduino, if you return from loop() the hidden Arduino main() will call it again; similarly, if you return from APP_Tasks() the Harmony supplied main() will call it again. Many people seem not to like ceding even that much control to Harmony; alternatively, you can simply put the while(1) big loop in APP_Tasks() directly as shown above.

Adding the “Bare Metal” SPI commands

Harmony has a big advantage over many competing software HALs in that it actually has good and up to date documentation, both printed and in Compiled Help HTML (CHM) format. The CHM help file is located at,


Naturally, you will have to change the path segment “v2_05_01” to your specific version of Harmony path.

I prefer to use the CHM help file format as it is searchable and example code is easily clipped from it.

Searching the CHM file for SPI leads one to a document titled “Library Interface” which describes all the possible SPI API calls available to you in the Harmony HAL.

Figure 13 - The “help_harmony.chm” compiled help file is very easily searched to find the driver API calls and example code. Here we easily locate the SPI library information.

As an example, let's find the call to change the SPI clock speed, to contrast how much easier the Harmony PLIB HAL is compared with the original PLIB library call presented earlier.

Figure 14 - A small portion of the help document for the function that is used to set an SPI peripheral's clock speed (or, as they call it, baud rate).

To use this you only need to know the peripheral bus clock speed that you set in the Harmony Clock Configurator, which in my case was 100 MHz as I am using a PIC32MZ chip that is running on a 200 MHz system clock. Then you also input the desired SPI clock speed and the function will do the rest.

We also need the proper module_id index. If you click the blue hyperlink shown in the figure above, the help file will show you the definition for the SPI_MODULE_ID enum as follows,
Figure 15 - Clicking the SPI_MODULE_ID enum hyperlink in the CHM file will bring up the definition of that enum.

We know that we set up the SPI module as physical peripheral #1, so we would want to use “SPI_ID_1” in all our code when working with this peripheral.

Let’s add this SPI clock rate set to our APP_Initialize() code as follows,

Figure 16 - Making a PLIB_SPI function call to set the SPI_1 clock speed to 10 MHz. The 100 MHz is the SPI Peripheral bus speed.

You will notice that I did not have to add any extra header files for this PLIB_SPI function to be recognized. Harmony did that setup for us the moment we told it to use the SPI driver library, and placed the proper header files in “app.h” for us.

As you go around exploring in Harmony you will also find that you can get the clock speeds through API calls. In this instance you can get the SPI default clock source PBCLK2 value by using the call,


Then the code example can be made more robust to future clock changes by making the call like this,

Figure 17 - Changing the SPI Clock Speed using not only the PLIB_SPI function, but also getting the peripheral bus clock frequency at runtime.
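Since Figures 16 and 17 are screen shots, here is the same idea in plain text. This is a sketch assuming the Harmony 2.x names PLIB_SPI_BaudRateSet() and SYS_CLK_PeripheralFrequencyGet(); the stubs and enum values below are stand-ins so the fragment compiles off-target (on real hardware these all come in via app.h):

```c
#include <stdint.h>

/* --- Stand-in stubs so this sketch compiles off-target; on the PIC32
 * these come from the Harmony generated headers (app.h). --- */
typedef enum { SPI_ID_1 = 1 } SPI_MODULE_ID;
typedef enum { CLK_BUS_PERIPHERAL_2 = 2 } CLK_BUSES_PERIPHERAL;

uint32_t g_spi_baud_requested;   /* stub records the last requested baud rate */

uint32_t SYS_CLK_PeripheralFrequencyGet(CLK_BUSES_PERIPHERAL bus)
{
    (void)bus;
    return 100000000u;           /* pretend PBCLK2 = 100 MHz for the example */
}

void PLIB_SPI_BaudRateSet(SPI_MODULE_ID id, uint32_t clock_hz, uint32_t baud_hz)
{
    (void)id; (void)clock_hz;
    g_spi_baud_requested = baud_hz;
}

/* --- The actual point: set SPI_1 to 10 MHz, reading the peripheral bus
 * clock at runtime instead of hard-coding 100000000. --- */
void app_set_spi_clock(void)
{
    PLIB_SPI_BaudRateSet(SPI_ID_1,
                         SYS_CLK_PeripheralFrequencyGet(CLK_BUS_PERIPHERAL_2),
                         10000000u);
}
```

Reading the bus clock at runtime means this call keeps working if you later change the clock tree in the configurator.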

Notice that you do not need to add any header files for the “SYS_CLK” library either, as Harmony added those linkages for us in “app.h” when we added the specific driver in the Harmony Configurator.

Now let’s add a SPI write command to the ‘Big Loop’ that we wrote in APP_Tasks(), as shown below,

Figure 18 - The function APP_Tasks() above writes byte 0x55 to the SPI port once every second, forever.
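Since Figure 18 is also a screen shot, the ‘Big Loop’ version can be sketched as below. PLIB_SPI_BufferWrite() is the Harmony call; the delay helper and the stubs are my stand-ins so the fragment compiles off-target:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in stubs; on hardware these come from app.h and your timer code. */
typedef enum { SPI_ID_1 = 1 } SPI_MODULE_ID;
uint8_t g_last_spi_byte;         /* stub records the last byte written */

void PLIB_SPI_BufferWrite(SPI_MODULE_ID id, uint8_t data)
{
    (void)id;
    g_last_spi_byte = data;
}

void delay_ms(uint32_t ms) { (void)ms; /* timer or busy-wait on target */ }

/* One pass of the loop body: write 0x55, then wait one second. */
void app_loop_body(void)
{
    PLIB_SPI_BufferWrite(SPI_ID_1, 0x55u);
    delay_ms(1000u);
}

/* APP_Tasks() rewritten as a 'Big Loop' -- it never returns on hardware. */
void APP_Tasks(void)
{
    while (true)
        app_loop_body();
}
```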

At this point some have asked: “Isn't using the registers directly faster and smaller than calling these PLIB functions?” The answer is: “Probably not.” The simple reason is that the definition of most of these PLIB functions includes an ‘inline’ declaration, which tells the compiler to prefer inlining the generated code instead of making a function call. As long as the XC32 compiler optimization level is set to ‘O1’ or greater, inlining of functions is enabled. Note: you will find that optimization level ‘O1’ is quite aggressive and really minimizes the generated code.

Figure 19 - Almost all of the Harmony PLIB statements are marked to the compiler as “Inline”

PLIB_SPI_BufferWrite() is declared “inline” by this “PLIB_INLINE_API” statement.

PLIB_SPI_BufferWrite() then calls another abstracted function, and that function is also declared ‘inline’, so the chances of these PLIB function calls really ending up as inline code are pretty high.

Other Useful SPI Functions

Browse around the help_harmony SPI “Library Interface” for other useful SPI functionality. Basically, if you need to do something with the SPI it will be there, including a function to check whether the SPI is done transmitting the last character,


This is a useful call for checking that the SPI hardware is done with the last character before sending another one.
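A typical use is a blocking write that spins until the previous character has finished shifting out. PLIB_SPI_IsBusy() and PLIB_SPI_BufferWrite() are the Harmony calls I am assuming here; the stubs below are stand-ins so the sketch compiles off-target:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in stubs; the real functions come in via app.h on hardware. */
typedef enum { SPI_ID_1 = 1 } SPI_MODULE_ID;
int g_busy_polls;                /* how many more polls report 'busy' */
uint8_t g_sent_byte;             /* records the last byte handed to the SPI */

bool PLIB_SPI_IsBusy(SPI_MODULE_ID id)
{
    (void)id;
    return g_busy_polls-- > 0;
}

void PLIB_SPI_BufferWrite(SPI_MODULE_ID id, uint8_t data)
{
    (void)id;
    g_sent_byte = data;
}

/* Wait for the SPI to finish the last character, then send the next one. */
void spi_write_blocking(uint8_t data)
{
    while (PLIB_SPI_IsBusy(SPI_ID_1))
        ;                        /* spin until the hardware is idle */
    PLIB_SPI_BufferWrite(SPI_ID_1, data);
}
```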

Other Useful PLIB Libraries

Other common PLIB libraries are available for nearly all the PIC32 series peripherals. To find the overview page of all of them, simply search for: “Peripheral Libraries Help” in the help_harmony.chm help file as shown below,

Figure 20 - This simple search will get you to a list of all the PLIB API’s in Harmony. 


I have shown how to use the Harmony Configurator and the Harmony static peripheral libraries as a means to do “Bare Metal” programming of a PIC32 device peripheral. Step by step, the code startup process was described and followed via the Harmony generated application files: main.c, app.h and app.c. Also shown was how to find and use the other “Bare Metal”, or as Microchip Harmony calls them, “Static” peripheral drivers.

In future projects you can simply set everything up in Harmony, generate the code, then focus only on APP_Initialize() and APP_Tasks(); you don't have to worry about anything else that Harmony generates or calls first. This is pretty much what every microprocessor supplier's code configurator does, and the advantage of using Harmony is that it sets up a consistent HAL layer of software that is portable between projects, and even between most chips in the PIC32 series.

Harmony, when used this way, does not overwhelm the chip with a large or burdensome ‘Framework’ as many people seem to think and fear. In fact the Harmony ‘footprint’ on a PIC32 is quite small, and the biggest point of all: the Harmony PLIBs get your project running quickly, with very little detailed knowledge of the hardware registers required.


[1] Microchip also has a YouTube channel where you can find some useful Harmony tutorials.

[2] Harmony static and dynamic driver explanation.

[3] Wikipedia article on SPI.

Article By: Steve Hageman /

We design custom: Analog, RF and Embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help on your next project.

Note: This Blog does not use cookies (other than the edible ones).

Monday, June 24, 2019

Do Your Customers A Favor – Make your Data Plots Useful

Make your data sheets more useful by providing graphs that actually help your customers. How so? Read on...

Example 1 -

Jim Williams and Guy Hoover designed a low distortion oscillator some time back [1]. This application note made use of an LED-controlled photoresistor to adjust the gain of the oscillator. That part has since gone obsolete, and the replacement part’s data sheet provides a curve of LED Current vs. Resistance that looks like this,

Technically the above curve is correct, but the interesting part of the curve is in the range of 0 to 5 mA; that is where all the useful resistance change happens. If the curve is plotted on a logarithmic chart it becomes much more customer friendly and useful, as shown below,

Same data, simply plotted with the X-Axis set to logarithmic. Now the curve is useful to the end customer, who incidentally is a circuit designer who needs to know how the resistance changes over the useful control range of LED current. For extra credit: also plot curves of the response over temperature, as this information is needed for a complete circuit design.

Example 2 -

A certain well-respected IC manufacturer sells a low-noise OPAMP and provides a curve like this one to document the Voltage Noise vs. Frequency. As in the last example, the interesting part of the curve to any designer is the very first data point of this chart, not the 99.9% shown.

This curve is better than most manufacturers’ in that it seems to be a ‘real’ instrument trace instead of the ‘artist’s conception’ that we normally see on data sheets, but the useful noise detail in the low frequency 1/f region is obscured. This could have been easily solved by switching the instrument to its ‘Log Frequency’ mode; I have yet to see a low frequency FFT analyzer that does not include this setting.

Voltage Noise versus Frequency of an OPAMP is usually presented more usefully, as in the plot above that I made for a LT6910-1 at a gain of 100. The interesting portion of the curve is in the 1/f region; again, log scaling for frequency fixes the problem.

Example 3 -

It is important to have some ‘domain knowledge’ of your customers’ needs and of what you are measuring. Here is a plot for a wideband receiving antenna, similar to one another engineer once sent me. He measured this antenna and sent me the plot as a Smith Chart, because that’s what he figured RF engineers wanted to see – a Smith Chart.

The above Smith Chart is a valid plot, but about all you can determine from it is that the antenna seems to have several minor resonances going on, and that at one point, low in frequency, it is actually pretty close to 50 Ohms (where the marker is). Anything else is really impossible to determine from this presentation.

Better to plot the antenna’s SWR [2] over the same frequency range. Now you can see that the antenna is resonant at about 300 MHz and has a decent SWR (< 5:1) for receiving applications in the frequency range of 800 – 1200 MHz. Further, the SWR at any frequency can be calculated from the measured impedance by the common SWR formula,

$$\Gamma = \frac{Z_L - Z_o}{Z_L + Z_o}, \qquad \mathrm{SWR} = \frac{1 + |\Gamma|}{1 - |\Gamma|}$$

where Zo is the characteristic system impedance, which is usually 50 Ohms for RF work, ZL is the load impedance – in this case the antenna impedance at a given frequency – and Γ is the reflection coefficient [2].


It is really easy to actually help your customers, even if you don’t have the specific domain knowledge that they have. You can (and should) always look at competitors’ data sheets and use the best available format for each curve or table. As a sanity check, take your data sheet to a few customers and ask them what they think; this will provide immediate and valuable feedback.

As a final note: it is difficult for any manufacturer to keep everything up to date, as silly errors invariably creep in, hence feedback is essential. Most data sheets that I see now have an HTML link to a help/feedback support site or an email address printed right on them, usually in the footer. There is no better way to get feedback than to ask for it, and there is really no better feedback than from your customers right when they are reading your data sheet and trying to use it.


[1] Linear Technology Application Note 132, “Fidelity Testing for A to D Converters”

[2] Wikipedia article on SWR.

Article By: Steve Hageman /
