Wednesday, August 9, 2017

FTDI / FT232R USB to Serial Bridge Throughput Optimization



Using a USB / UART Bridge IC like the FTDI FT232R is kind of a funny thing. What with "Latency Timers", "Packets", "Buffers" and whatnot, you soon find that it is not like a traditional RS232 port on a PC. Meaning: at higher BAUD rates you will probably notice that it is slower than your old PC with a dedicated serial port.

The problem is that you can't get a PC with a serial port anymore, so we are more or less forced to use the USB equivalent.

There are ways to speed up the USB / UART bridge, however, so read on (for the remainder of the article I will call the USB / UART bridge simply: the Bridge).

USB Transfer Background

A very popular, reliable and low cost Bridge chip is the FTDI FT232R. These FTDI chips have various settings: Baud Rate, Packet Size, Latency Timer, Data Buffer and Flow Control pins, and these all conspire together to meter the data flowing across the USB link [1].

Baud Rate – This is the rate at which the FT232R UART will communicate with the attached downstream serial port. Normally the downstream side will connect to a processor's UART port. The maximum BAUD rate is dependent on the application. If the end user is going to use a prewritten application, like a terminal program, then there will be constraints that you won't be able to exceed. 115.2k is the minimum BAUD rate that every modern application can be expected to support. Most recent PC applications will support several BAUD rates above this, usually up to 921.6k.

The FTDI driver also has the ability to 'alias' higher rates to lower numbers to fool PC applications into supporting faster BAUD rates. See reference [2] for more information on how to do this.

If you are going to write your own application and plan on using the FTDI DLL interface instead of the Virtual Com Port (VCP), then you can use any BAUD rate that the processor and FT232R will mutually support. The maximum BAUD rate for the FT232R chip is 3M BAUD, which is easily supported by all modern 32-bit processors.

Note that there is no physical UART on the PC side, so the Baud Rate means nothing there. This parameter is passed to the FTDI chip, and that is how it sets its downstream BAUD rate to the attached processor, which does have a physical UART.

Packet Size – In these kinds of USB transfers the basic data packet size is 64 bytes. FTDI reserves 2 bytes for its own use, so the user gets 62 bytes of data in every USB packet transfer. The packet size is fixed by the way USB works and can't be changed.

The Data Buffer – The data buffer is set to 4k bytes by default (in the driver .INF file), and its part in metering data is this: the driver requests a certain amount of data from the device and will continue to receive data until the buffer fills or the latency timer times out; then a USB transfer to the upstream program in the PC takes place. Valid buffer sizes are from 64 to 65536 bytes in steps of 64 bytes. In the FTDI DLL driver [3] the buffer size can be set by the command,

    FT_SetUSBParameters(FT_HANDLE ftHandle, DWORD dwInTransferSize, DWORD dwOutTransferSize)

Note: Only the dwInTransferSize can be set; the dwOutTransferSize parameter is only a placeholder and does nothing with the FT232 [3].

Note: The data buffer is physically on the upstream PC Side and is held in the USB Host Controller Driver.

The Latency Timer – This timer is set to 16 milliseconds (mSec) by default. If there is data in the Bridge when the latency timer times out, then that data is sent. So at worst (with the default settings) there is a 16 mSec delay in getting small amounts of data across the link. Valid latency values are from 2 to 255 mSec. In the FTDI DLL driver [3] the latency can be set by the command,

    FT_SetLatencyTimer(FT_HANDLE ftHandle, UCHAR ucTimer)

Note: The Latency Timer is physically on the upstream PC Side and is implemented in the USB Host Controller Driver.

Flow Control Pins – For the FT232 chip, the control pins have special meanings and functions. If the downstream side changes the state of one of the flow control pins, then the buffer, empty or not, is sent at the next possible moment. The downstream processor can use this to advantage to signal the end of a transmission and get the data sent to the PC ASAP. The PC can also control the downstream flow control lines. For instance, the DTR line can be connected to the DSR line at the FT232R chip; the PC can then change the state of the DTR line, which will cause the DSR line to change state, and the transfer will be initiated immediately. All of the downstream-to-upstream Flow Control pins operate at the same priority, so there is no advantage to using one over another if you are just using them to initiate a data transfer quickly.

Note: If you are using the Virtual Com Port (VCP) interface, the Baud Rate, Latency and Buffer size can be controlled by editing the USB driver ".INF" file, changing the appropriate registry keys, or using the Device Manager. There is no simple programming interface that I am aware of. That is yet another reason to use the FTDI DLL for interfacing instead of the VCP, especially for applications that use custom programming on the PC side.
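Putting the two DLL calls together: below is a minimal Python sketch of how a PC-side program might apply these settings to an already-opened device handle. The FT_SetLatencyTimer and FT_SetUSBParameters names come straight from the D2XX guide [3], but the wrapper function and its range checks are my own illustration, not part of the FTDI API.

```python
import ctypes

FT_OK = 0  # D2XX success status code


def configure_bridge(d2xx, handle, latency_ms=2, in_transfer_size=64):
    """Set the Latency Timer and the In transfer (buffer) size on an
    already-open FTDI handle through the D2XX library object `d2xx`.
    Returns True if both calls report FT_OK."""
    # Valid latency range is 2-255 mSec
    if not 2 <= latency_ms <= 255:
        raise ValueError("latency must be 2-255 mSec")
    # Valid buffer sizes are 64-65536 bytes in steps of 64
    if in_transfer_size % 64 != 0 or not 64 <= in_transfer_size <= 65536:
        raise ValueError("buffer size must be a multiple of 64, 64-65536")
    if d2xx.FT_SetLatencyTimer(handle, ctypes.c_ubyte(latency_ms)) != FT_OK:
        return False
    # dwOutTransferSize is only a placeholder on the FT232R
    status = d2xx.FT_SetUSBParameters(handle,
                                      ctypes.c_ulong(in_transfer_size),
                                      ctypes.c_ulong(0))
    return status == FT_OK
```

On Windows the `d2xx` object would come from something like `ctypes.WinDLL("ftd2xx.dll")` and the handle from FT_Open; those steps are omitted here for brevity.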

Understanding The Problem

Maximum throughput occurs when the maximum number of packets is sent in the minimum time. As a designer you have control over the Baud Rate, Latency Timer, Buffer Size and the Flow Control pins.

So it is obvious that some combination of these four parameters will result in the fastest possible data transfer rate. The question then becomes: What are the optimum settings?

First we should study how my particular downstream processor and the PC communicate.

My application for this example is the control of an external digitizer. The external digitizer instrument has a 32 bit processor that is connected to the FT232R chip through a UART in the processor. My command structure is always a “Command / Response” type of communication.

Case 1: To start a digitizing data capture I can send a trigger command from the PC like "TRIG:IMM"; the command string is terminated with a 'Linefeed' character. This command is decoded in the instrument processor to signify that a TRIGger / IMMediate command should take place, and the downstream processor starts the data acquisition process.

To keep the PC and instrument in sync, the digitizer then sends back an acknowledgment when the command has finished. The acknowledgment chosen is the same ‘Linefeed’ character.

This way the PC and the downstream processor can always stay in sync and they both know when the communication has finished because they both wait for the ‘Linefeed’ character before proceeding on. I typically don’t use any other handshaking (or flow control).

In my simplest case (as above) the command from the PC might be 1 to 20 characters long and the response is always just a single character (the ‘Linefeed’).
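The PC side of this sync scheme is simple to sketch. Below is a hypothetical Python helper of my own construction; the read_byte callable stands in for whatever actually pulls one byte from the serial link (for example, a wrapper around the DLL's read call):

```python
def read_until_linefeed(read_byte, max_chars=10000):
    """Collect bytes from read_byte() until the 'Linefeed' terminator
    arrives, then return the response without the terminator.  The
    max_chars cap keeps a lost terminator from hanging the program."""
    response = bytearray()
    for _ in range(max_chars):
        b = read_byte()        # returns one byte from the serial link
        if b == b"\n":
            return bytes(response)
        response.extend(b)
    raise TimeoutError("no Linefeed terminator seen")
```

For Case 1 the returned response is empty, since the acknowledgment is the bare 'Linefeed' itself; both sides are then free to proceed.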

Case 2: This is when the digitizer has captured all the data and sends it back to the PC for analysis. Again a simple command is sent from the PC to the instrument, like "TRAC:A?", meaning: get the data TRACe for channel A. This is followed by the 'Linefeed' terminator, and then the fun starts. There might be a lot of data captured in my instrument that has to be transferred back to the PC. The standard capture is 1024 values of 16 bit ADC data. The ASCII data values are separated by commas, so a pretty much worst case transfer might be something like,

    “65535,” repeated 1024 times and terminated with a ‘Linefeed’

This is 6 x 1024 + 1 characters, or 6145 characters total. At a setting of 3M BAUD the processor can pump this data out to the FT232 chip in a little over 20 mSec. This was confirmed with a scope; the processor can easily pass this amount of data in this time without interruption or gaps.
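That 20 mSec figure is easy to check with a little arithmetic, assuming the usual 8-N-1 framing (10 bits on the wire per character):

```python
BITS_PER_CHAR = 10                # 8 data bits + 1 start + 1 stop (8-N-1)
BAUD = 3_000_000                  # FT232R maximum BAUD rate

worst_case_chars = 6 * 1024 + 1   # "65535," x 1024 plus the Linefeed
t_worst_ms = worst_case_chars * BITS_PER_CHAR / BAUD * 1000
# worst_case_chars = 6145, t_worst_ms is about 20.5 mSec
```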

The minimum case would be if the ADC Data was all zeros. In this case the transfer would be,

    “0,” repeated 1024 times and terminated with a ‘Linefeed’

This is 2 x 1024 + 1 characters or 2049 characters total.

It can be seen that even with a fixed number of data points to send back to the PC, if leading zeros are suppressed then the data could be anywhere from 2049 to 6145 characters total. Any optimization would have to take this into account.


Optimizing The Parameters

For Case 1: Where the command size is around 10 characters and the return is simply the 'Linefeed' character, the buffer will never fill, and the only way to minimize the transfer time is to set the latency to 2 mSec or use one of the Flow Control lines to force a transfer.

For Case 2: Where the upstream data is large, the proper choice of Buffer and Latency is not so clear.

Naturally, as an engineer, I read the FTDI optimization application note [1], took its suggestions for setting the latency and buffer size, and tested them by measuring and averaging 100 transfers. To my surprise the 'improved' settings gave about the same average transfer speed as the default settings.

So then I started hacking at the settings by changing the parameters 20% either way and looking at the results – still nothing conclusive, and I wasn't able to converge on a higher transfer rate. I started to wonder: if the maximum transfer rate is a sharp function of some combination of the latency and buffer size, how would I find it? I would likely miss it by hacking a few values at random, and this wasn't getting me anywhere anyway.

I turned to my old friend "Monte Carlo", as in the "Monte Carlo Method". This is a trusty way of randomly picking a lot of values, applying them, and then seeing what the result is. Monte Carlo analysis is useful when you don't have a clear understanding of the underlying functions that are controlling something. You are less likely to miss some narrow response if you randomly pick enough values than if you use an orderly method to step through them.

I wrapped a loop around my benchmarking routine and set it out to capture 5000 random Latency and Buffer Size parameter variations. I also ran the test program as an EXE, not in the development environment, to remove that as a source of variation, and I didn't use the test PC for anything else during the run.
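The structure of that test harness looks something like the Python sketch below. The measure_transfer function here is a stand-in: in the real run it applied the two parameters to the open FTDI device, timed an average of 10 transfers, and returned the result. It is faked here so the loop is self-contained and runnable without hardware.

```python
import random


def measure_transfer(latency_ms, buffer_bytes):
    """Placeholder for the real benchmark routine, which would set the
    Latency Timer and buffer size on the open FTDI handle, time 10
    transfers of 6145 bytes, and return the average time in seconds.
    Faked here with a plausible random number."""
    return 0.03 + random.random() * 0.02


random.seed(1)
results = []
for _ in range(5000):
    latency = random.randint(2, 255)            # valid Latency Timer range, mSec
    buffer_size = 64 * random.randint(1, 1024)  # multiples of 64 bytes, up to 64k
    results.append((latency, buffer_size, measure_transfer(latency, buffer_size)))

# Sort by measured time; the head of the list holds the winning settings
results.sort(key=lambda r: r[2])
fastest_24 = results[:24]
```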

Just looking at the raw data, the fastest to slowest transfer times ranged from 0.0306 to 0.348 seconds, an 11X speed difference. The default settings, with a 16 mSec Latency Timer and 4k Buffer, gave 0.054 seconds. Changing the default to the fastest setting could therefore give a 54/30, or 1.8X, speed increase. That's worth pursuing.

Looking at the raw data some more, the fastest 21 transfer times all had buffer sizes of 64 bytes. There is a conclusion right there without even having to plot the data!

Being curious, I did plot the entire 5000 points of data and the results are shown in Figure 1. There are some outliers which can probably be explained by the Windows Operating System going off and doing something else during a data transfer, but the large majority of points fall in a small band of values.





Figure 1 – A random selection of points was made for the Buffer Size and Latency; at each point an average of ten 6145 byte transfers was made and recorded (Vertical Axis). A few features can be seen: a 'rift' is visible along the Buffer Size axis, and generally the minimum transfer time occurs with small Buffer and Latency values (lower front corner of the plot).

The ‘rift’ is an interesting feature of Figure 1. Figure 2 is a zoomed in look at that feature with the plot rotated so that the Transfer Time variation is flattened.





Figure 2 – A zoomed-in and rotated view of Figure 1. Now the effect of buffer size on transfer time (Vertical Axis) can be clearly seen. There is in fact a minimum at around 6145 bytes and at sub-multiples of that. However, a minimum can also be seen at the smallest buffer sizes. Note: since the 3D curve was rotated to flatten the slope out of it, the transfer time (Vertical Axis) is no longer valid as an absolute value; it is only a relative measure: lower on the graph is a faster overall transfer time.

Figure 2 shows the 3D plot flattened on the Buffer Size axis. A few clear trends are present: the overall transfer time is minimized as the transfer buffer is reduced, reaching a minimum at the right end of the scale, 64 bytes. Also, there is a minimum at around 6145 bytes and at sub-multiples of 6145 bytes, as predicted by the FTDI application note [1].


Figure 3 – A zoomed-in view of small buffer sizes and latency values with a 6145 character transfer. Here the minimum can be clearly seen: setting the Buffer Size to 64 bytes and the latency to less than 10 mSec results in a transfer time nearly 10% lower than the other cases. The rightmost curve in the plot tells the story.
Figure 3 shows a zoomed-in portion of Figure 1, for small buffer sizes and small latencies. Here it can be seen that the lowest transfer time occurs when the Latency is set to the minimum. The lower total transfer time group is when the buffer is 64 bytes (the rightmost curve in Figure 3).

To analyze these results fully, I sorted the data and found the 24 fastest transfers. These all had a 64 byte buffer, as Figure 3 predicted. Then I plotted the transfer time versus latency, as shown in Figure 4.


Figure 4 – The 24 fastest transfers of 6145 characters all had a 64 byte buffer. Plotting this set of data versus Latency shows that there is a very small local minimum here, but the difference in transfer time from a 2 to a 7 mSec Latency setting is less than 1 part in 30.

Figure 4 showed that there is indeed a local minimum in transfer time, but the difference is so small that there really isn't any appreciable difference for any Latency value from 2 to 10 mSec.

Figure 5 – As a verification, I also did a summary plot of 2049 character transfers to make sure that the optimization worked for the smallest typical data set too. This plot follows the same trend as Figure 4. As before, any Latency value from 2 to 10 mSec results in a very low transfer time.

Summary

It is really easy, for this example, to minimize the USB transfer time for the two use cases in my project: just set the Buffer Size to 64 bytes and the Latency to 2 mSec.

This is far simpler than what reference [1] would lead you to believe, but it has been proven by actual measurements. Using these settings also eliminates the need to use the control lines to force a transfer, as that won't shave any time off the transfer when the latency is already set to its minimum.

As for PC performance: even on my low end i7 based notebook running Windows 7 / x64, I don't notice any operating system sluggishness or excessive processor loading with these settings. So there don't seem to be any downsides.

If any sluggishness is noticed, the Latency can be set as high as 10 mSec (5X higher) with no appreciable reduction in the large data transfer rate and only an 8 mSec penalty in response time for the single character case (Case 1), which may not be noticed or even matter in the overall scheme of things.

If a higher Latency option is selected, it might be wise to wire up one of the FT232R's flow control lines and have the downstream processor toggle it at the end of every command to maximize the speed of the single character transfer case.

As a final note: the FTDI Latency and Buffer size settings can be changed at any time after an FTDI USB device is opened for communication, and they take effect immediately. The elapsed time to set both parameters is less than 1 mSec, so there is little time penalty in actively managing the settings as a program runs.

This exercise lowered the transfer time in my application by anywhere from a factor of 8X for Case 1 (the small, 1 character upstream transfer) to 1.6X for Case 2 (the 6145 character upstream transfers, which dropped from 0.052 to 0.031 seconds). I can now achieve an overall 1.9 million bits per second transfer rate without changing the hardware at all. That's time well spent tweaking a few software parameters.
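The 1.9 Mbit/s figure follows directly from the Case 2 numbers, again assuming 10 bits per character on the wire:

```python
chars = 6145                  # Case 2 worst case payload
bits_per_char = 10            # 8-N-1 framing: start + 8 data + stop
transfer_time_s = 0.031       # fastest measured 6145 character transfer

rate_bps = chars * bits_per_char / transfer_time_s
# rate_bps comes out to roughly 1.98 million bits per second
```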

Caveat Emptor

It should be noted that the USB bus is a cooperative and possibly multi-device system that has a limited overall bandwidth. My application requirements always specify that, for maximum speed, my devices are the only devices on the bus consuming any bandwidth. This may not always be the case: there may, for instance, be a wireless mouse attached, or a disk drive, etc.

Even though you may think that your 'widget' is the only device in the world, you can never be sure what your customers may try to inter-operate with it. It is therefore wise to test and make sure that your application and its settings can withstand other things going on in the PC at the same time. I personally have written a USB disk drive file mover application that I run on the PC while stress testing my applications. It consumes a large amount of USB bandwidth by copying large files back and forth to a USB disk drive in the background while I run my application in the foreground, looking for transfer issues.

References:

[1] “AN232B-04 Data Throughput, Latency and Handshaking”, Published by: Future Technology Devices International Limited, www.ftdichip.com

[2] “AN120 Aliasing VCP Baud Rates”, Published by: Future Technology Devices International Limited, www.ftdichip.com

[3] “D2XX Programmers Guide”, Published by: Future Technology Devices International Limited, www.ftdichip.com



Article By: Steve Hageman www.AnalogHome.com     
We design custom: Analog, RF and Embedded systems for a wide variety of industrial and commercial clients. Please feel free to contact us if we can help on your next project.  
This Blog does not use cookies (other than the edible ones). 

