
Monday, October 11, 2021

Preventing Design and Support 'Oopsies'

    

This is a story of how easy it is to have ‘Oopsies’ in the development of products, new and old. They all trace back to the datasheet, and they are mostly out of your control.

Case 1:

Over a decade ago, new FPGA designs came online, and naturally everyone used one since they were 'big and fast'. In one FPGA project I was associated with, we could not get the FPGA to ‘turn over’, which was perplexing. Everything looked wired OK, yet the FPGA would not start. Since everything appeared to be done according to the datasheet, what was the issue? Everyone started to take an interest, and another engineer downloaded the datasheet ‘freshly’, as it were, directly from the manufacturer's website.

Well, ‘Big Surprise’! This new revision of the datasheet added the requirement that the power supplies be sequenced in a certain way for proper operation, where the previous datasheet said nothing of the sort! Once the power supplies were sequenced as per the latest datasheet, the FPGA started right up.

Now you know why there was suddenly a rash of FPGA power supply sequencing ICs released a decade ago!

Lesson learned: At the start, middle, and end of the current project, download the datasheets for ALL the critical parts and check them for changes (at least check the revision date). IC manufacturers are rushing just as fast as everyone else, and you have no idea when they will find an ‘issue’ and report it.
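One way to make that check routine is to let a script flag which freshly downloaded datasheets differ from the copies you looked at last time. The following is only a minimal sketch: the function names, the folder of PDFs, and the JSON manifest file are all assumptions made for illustration, and a changed hash only tells you to go read the revision history, not what changed.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_changed(datasheet_dir: Path, manifest_path: Path) -> list[str]:
    """Compare the PDFs in datasheet_dir against the hashes recorded last run.

    Returns the names of files that are new or whose contents changed,
    then rewrites the manifest so the next run compares against today.
    """
    # Hypothetical manifest format: {"file name": "sha256 hex digest"}
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {p.name: sha256_of(p) for p in sorted(datasheet_dir.glob("*.pdf"))}
    changed = [name for name, digest in new.items() if old.get(name) != digest]
    manifest_path.write_text(json.dumps(new, indent=2))
    return changed
```

Run it after re-downloading the datasheets at each project milestone. Anything it lists is worth opening at the Revision History page.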

Aside: This is probably the reason why so many FPGA designers are bald. They didn’t start out that way; it just comes from scratching their heads so much.
 

Why FPGA Designers are mostly bald! 

Photo by Ketut Subiyanto from Pexels, used with permission.


Case 2:

Last year I started working on an air-quality data logger: a battery-powered device with an LCD and an SD card for data logging. I started the project and immediately found that the I2C connection to the air-quality sensor was not reliable at the rated 100 kHz clock speed, but it seemed to work at 40 kHz and below. Measuring air quality doesn’t take gigabits of bandwidth, so I ran reliability tests at 40 kHz, sending random commands and making random measurements, and found no connection issues. I have used I2C with this particular hardware setup for years with no issues at all, so I doubted that the issue was on my end. The fact that I could not reach the datasheet speed was still disconcerting, but, you know, it got put off till later.

As things go, this project got put aside for more pressing ones, and about six months later I had time to start it up again. I found the same issue, and I also found that if the air-quality sensor hung, a power cycle was required to clear it. The operating mode was to keep the sensor off for 10 minutes, turn it on for one minute to make a measurement, then turn it off again, so this seemed like a sure-fire way to clear any errors that might happen in real use.

Since it had been six months, I had forgotten what the sensor’s units of measure were, so on a summer afternoon break under a big oak tree, I downloaded the datasheet again on my tablet to refresh my memory... and, well, you know what I saw! Yes, the datasheet had been updated the previous month to show that a non-standard I2C delay needed to be inserted between writing the command byte and the subsequent reading of the data. Normally an I2C-compliant device would use “Clock Stretching” to achieve the required delay, but…

When I got back to the lab, I modified my I2C driver and, voilà! No communication issues, even at 400 kHz!
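The shape of that driver change is simple enough to sketch. This is not the actual driver from the project: the `I2CBus` interface and the `command_then_read` name are assumptions for illustration, and the real delay value has to come from the sensor's current datasheet.

```python
import time
from typing import Callable, Protocol


class I2CBus(Protocol):
    """Hypothetical minimal bus interface; map these onto your own driver."""

    def write(self, addr: int, data: bytes) -> None: ...

    def read(self, addr: int, count: int) -> bytes: ...


def command_then_read(
    bus: I2CBus,
    addr: int,
    command: int,
    count: int,
    delay_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Write one command byte, wait delay_s seconds, then read count bytes.

    The explicit wait stands in for the clock stretching that the device
    does not perform on its own.
    """
    bus.write(addr, bytes([command]))
    sleep(delay_s)  # give the sensor time to prepare its answer
    return bus.read(addr, count)
```

Passing `sleep` in as a parameter is only a convenience so the sequence can be checked on a desk with a fake bus and no hardware attached.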

Lesson learned: Don’t ignore issues like this. At the very least, call the manufacturer, see if they have a user forum, or search the web for forum entries on the part to see if anyone else has found the issue you have. If I hadn’t had the project break, I would not have found this datasheet change and would not have had the confidence to know whether the driver was robust.

Aside 1: This is not the first time I have had a timing issue between sending a command to a device and the time the device needs to process it and return the result. So in the future, if I see this problem again, I will hook up my logic analyzer and vary the delay between the command and the read-back to see if the issue can be resolved, as experience has shown this is the most likely cause.
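Varying the delay by hand gets tedious, so the sweep can be scripted alongside the logic analyzer. Here is a rough sketch, assuming you can wrap one command-and-read transaction in a function (`attempt` below, a hypothetical name) that returns True on success. The trial count of 100 is an arbitrary choice.

```python
from typing import Callable, Iterable, Optional


def find_min_reliable_delay(
    attempt: Callable[[float], bool],
    delays_s: Iterable[float],
    trials: int = 100,
) -> Optional[float]:
    """Return the shortest delay that succeeds on every one of `trials` tries.

    `attempt(delay)` performs one transaction using that delay and reports
    whether it worked. Returns None if no candidate delay passes.
    """
    for delay in sorted(delays_s):
        # all() stops at the first failure, so bad delays are rejected quickly
        if all(attempt(delay) for _ in range(trials)):
            return delay
    return None
```

Passing a hundred trials on the bench is not proof, so treat the result as a starting point and add margin, or better, use the figure from the latest datasheet once you find it.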

Aside 2: The most common issue with I2C devices is Stop/Start versus Repeated Start operation. While in theory most devices, like EEPROMs, work equally well with either a Stop/Start or a Repeated Start sequence, many modern and complex ICs do not. Also, many datasheets are unclear about their particular device's exact requirements. So if you can’t get your device to respond correctly, try exchanging a Stop/Start for a Repeated Start and vice versa.

  

STM32 I2C Register Configuration Bits


Case 3:

I have used one particular microprocessor for several project generations, so there is little risk when updating one generation’s product to the next, and that risk is mostly limited to the parts of the design that change between generations. I was wrapping up the firmware drivers on one of the latest projects when, out of the blue, I decided that I should check the latest errata on the part to see if anything had gotten better. Sometimes, just sometimes, silicon changes for the better between revisions, after all.

As I read the newest errata, which had been updated only the month before, I was aghast to see that the required processor wait states had increased for the newest silicon revision! This doesn’t affect any parts already built, but it certainly does affect any new parts that might be built, and the scary part is that it was by total happenstance that I found the issue to begin with. It was a simple fix for this latest project, but I needed to go ‘touch’ all the previous firmware on every shipping product to roll this fix in. That also entails doing at least some targeted QA on all the projects to look for unintended side effects. This could be a large unplanned cost.

Think about what can happen here: Someone designs a product, it works, then they move on to something else, and as long as it keeps working, they never look back. Yet the part changed its characteristics in the meantime, which can now cause ‘corner case’ failures where there were none before. The issue is that no one is in charge of ‘Maintenance’, so if no one notices this change, you can end up with a lot of iffy parts in the field that fail for baffling reasons under certain conditions.

Lesson learned: Check the datasheet for errata at the beginning and end of every project to make sure that you understand the changes that have happened. Don’t be lulled by ‘Familiarity Bias’. Don’t rely on the manufacturer issuing a PCN (Product Change Notice), because everyone has a different ‘definition’ of what warrants a PCN, and this change did not trigger one from this manufacturer.

Aside: Sometimes errata are buried in the pages of the datasheet. Most microprocessors have datasheets of 500-plus pages, and who can get through that? So look at the end of the datasheet, where the Revision History is kept, and try to discern when and where the changes were made.

   

Microchip dsPIC Silicon Revision Errata

Bottom Line:

A small number of manufacturers, and even some distributors, allow you to sign up for ‘product updates’. Always do this. Sadly, not enough of them offer it, and, also sadly, most manufacturers don’t use PCNs in the spirit in which they were intended: “Any change to form, fit, or function that might cause a customer grief”.

The only thing you can do is keep checking the datasheets for updates, especially when actively working on a project. It is a less than ideal situation, but checking is very important.



Article By: Steve Hageman / www.AnalogHome.com
We design custom analog, RF, and embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help with your next project.

Note: This Blog does not use cookies (other than the edible ones).



