Description
I'm pretty sure I found a bug in the firmware for the ATmega16U2 USB-to-serial converter. It appears to be present on the UNO, Mega and Due boards (and possibly on other boards that use the 16U2 as a USB-to-serial converter).
I have only been able to reproduce this at high baud rates (specifically 250 kbaud). Traffic in one direction seems to be fine, but occasional errors occur with bidirectional traffic. It is fairly easy to reproduce: upload the following sketch to an Arduino UNO. It must be a board that has a 16U2, not a clone with an FTDI or other USB-to-serial converter:
```cpp
void setup() { Serial.begin(250000); }  // UART link to the 16U2 at 250 kbaud
void loop() { Serial.write('.'); }      // continuously stream '.' to the host
```
Then open PuTTY and connect to the UNO's USB serial port at 250000 baud. Make sure local echo is disabled. The screen should fill with ".". Now hold down any key in the PuTTY window so that it keeps being sent to the UNO. Every now and then the stream of "." is briefly interrupted by another character. If you manage to pause the PuTTY display before it scrolls off, you will see that it is the key you pressed, followed by two (sometimes just one) other incorrect characters. For example:
I was able to capture what's going on using a logic analyzer. It always happens when a USB OUT request (host sending data) is closely followed by an IN request (data sent to the host) that is 64 bytes long. Here's an overview of such a transaction (the shorter block is the OUT request, the longer one is the IN request):
Here's what the OUT request looks like:
Note that the data is 'X' (0x58) followed by the CRC 0x4541. The CRC is transmitted low byte first, so the bytes on the wire are 0x58, 0x41, 0x45.
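You can double-check the CRC value in software. The following is a minimal bitwise sketch of the USB data-packet CRC16 (CRC-16/USB: polynomial x^16 + x^15 + x^2 + 1, processed bit-reflected, initial value 0xFFFF, result inverted). The helper names `usb_crc16` and `usb_out_payload_with_crc` are hypothetical and are not taken from LUFA or the firmware.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC16 used for USB data packets (CRC-16/USB), computed bit by bit:
 * reflected polynomial 0xA001, init 0xFFFF, final one's complement. */
uint16_t usb_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            else
                crc >>= 1;
        }
    }
    return (uint16_t)~crc;
}

/* Hypothetical helper: the three bytes seen on the wire for a one-byte
 * OUT payload, i.e. the data byte followed by the CRC, low byte first. */
void usb_out_payload_with_crc(uint8_t data, uint8_t out[3])
{
    uint16_t crc = usb_crc16(&data, 1);
    out[0] = data;
    out[1] = (uint8_t)(crc & 0xFFu);
    out[2] = (uint8_t)(crc >> 8);
}
```

For a single 'X' (0x58) this returns 0x4541, so the payload on the wire is 0x58, 0x41, 0x45. Because 0x41 and 0x45 are ASCII 'A' and 'E', pressing X would presumably show up in PuTTY as "XAE". That would fit the two extra characters seen in the terminal.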
Now look at the end of the data packet in response to the IN request:
You can see the sequence 0x58, 0x41, 0x45 embedded into the stream of 0x2E (".").
I reproduced this multiple times with different characters. In every case, the OUT character followed by its two CRC bytes gets embedded into the IN data.
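You can check further captures for the same pattern without inspecting the analyzer trace by eye. A small helper can search an exported IN payload for the OUT byte followed by its two CRC bytes. This is a hypothetical analysis aid (`find_signature` is not part of any tool mentioned here). It assumes the IN payload has already been exported from the logic analyzer as raw bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the offset of the first occurrence of `sig` (sig_len bytes) in
 * the captured IN payload, or -1 if the signature is not present. */
long find_signature(const uint8_t *payload, size_t payload_len,
                    const uint8_t *sig, size_t sig_len)
{
    if (sig_len == 0 || sig_len > payload_len)
        return -1;
    for (size_t off = 0; off + sig_len <= payload_len; off++) {
        if (memcmp(payload + off, sig, sig_len) == 0)
            return (long)off;
    }
    return -1;
}
```

For example, searching a 64-byte capture for {0x58, 0x41, 0x45} returns the offset where the leaked OUT data starts, and returns -1 for a clean packet consisting only of '.' characters.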
The curious part is the CRC bytes. The firmware should not even have access to them, since the CRC is handled transparently by the USB hardware in the 16U2. So how could they end up in the IN data? The only explanations I could come up with were a hardware fault or an invalid hardware configuration. I could reproduce it on several different boards, so (assuming there is no major flaw in the 16U2's USB implementation) a configuration issue is the most likely cause.
Looking through the arduino-usbserial.c code, I couldn't find any obvious issue (no surprise, since this code has been running on Arduinos for more than a decade). However, while reading the ATmega16U2 datasheet, I came across this paragraph in section 20.6 (USB Memory management):
The reservation of an Endpoint can only be made in the increasing order (Endpoint 0 to the last Endpoint). The firmware shall thus configure them in the same order.
A bit further down in the same section, an example shows how incorrect memory management can lead to a memory conflict in which two endpoints (IN/OUT) partially share the same memory region. This fits the observation of OUT data being embedded in an IN data packet. It especially fits the CRC bytes, which are only visible to the USB hardware.
I then looked at the arduino-usbserial code, specifically:

- the CDC_*_EPNUM definitions in Descriptors.h
- the VirtualSerial_CDC_Interface definition in Arduino-usbserial.c
- the CDC_Device_ConfigureEndpoints function in LUFA's CDC.c

Together they show that the endpoints get initialized in the order 3 (IN), 4 (OUT), 2 (notification). This violates the requirement to reserve endpoints in increasing order.
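To see why that order matters, here is a simplified model of the allocation rule as I read it in section 20.6. When endpoint k is allocated, it is inserted after the lower-numbered endpoints. The next higher endpoint slides up, but the ones above it do not move. This is a sketch, not the hardware's actual algorithm. The function names are hypothetical. The sizes are assumptions that were not checked against the firmware:

- an 8-byte control endpoint 0, configured first
- 64-byte single-bank data endpoints
- an 8-byte notification endpoint

```c
#include <stdbool.h>

#define MAX_EP 5  /* endpoints 0..4 are enough for this scenario */

/* One endpoint's buffer inside the 16U2's endpoint DPRAM (model only). */
typedef struct {
    bool allocated;
    int base;  /* start offset in bytes */
    int size;  /* buffer size in bytes */
} ep_mem;

/* Simplified model of the datasheet rule: endpoint k is placed directly
 * after the nearest allocated lower-numbered endpoint, the next allocated
 * higher-numbered endpoint slides up behind it, and endpoints above that
 * one do NOT move. */
void ep_alloc(ep_mem ep[MAX_EP], int k, int size)
{
    int base = 0;
    for (int i = k - 1; i >= 0; i--) {
        if (ep[i].allocated) {
            base = ep[i].base + ep[i].size;
            break;
        }
    }
    ep[k].allocated = true;
    ep[k].base = base;
    ep[k].size = size;

    for (int i = k + 1; i < MAX_EP; i++) {
        if (ep[i].allocated) {
            ep[i].base = base + size;  /* only this neighbour slides */
            break;
        }
    }
}

/* Number of bytes two endpoint buffers share (0 means no conflict). */
int ep_overlap(const ep_mem *a, const ep_mem *b)
{
    int a_end = a->base + a->size;
    int b_end = b->base + b->size;
    int lo = a->base > b->base ? a->base : b->base;
    int hi = a_end < b_end ? a_end : b_end;
    return hi > lo ? hi - lo : 0;
}

/* Reserve endpoint 0, then data IN, data OUT and notification, in the
 * order CDC_Device_ConfigureEndpoints uses. Sizes are assumptions. */
void configure_cdc(ep_mem ep[MAX_EP], int in_ep, int out_ep, int notif_ep)
{
    ep_alloc(ep, 0, 8);         /* control endpoint 0 */
    ep_alloc(ep, in_ep, 64);    /* data IN */
    ep_alloc(ep, out_ep, 64);   /* data OUT */
    ep_alloc(ep, notif_ep, 8);  /* notification */
}
```

The two numberings behave differently in this model, starting from a zero-initialized `ep_mem` array:

- With `configure_cdc(ep, 3, 4, 2)` (the original numbering), allocating endpoint 2 last slides the IN buffer up to bytes 16 to 79, while the OUT buffer stays at bytes 72 to 135. The last 8 bytes of the 64-byte IN buffer therefore alias the first 8 bytes of the OUT buffer. That would match OUT data appearing near the end of a full IN packet.
- With `configure_cdc(ep, 2, 3, 4)`, nothing overlaps.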
As a test, I changed the definitions in Descriptors.h so that CDC_TX_EPNUM (IN) is 2, CDC_RX_EPNUM (OUT) is 3 and CDC_NOTIFICATION_EPNUM is 4. This matches the order in which CDC_Device_ConfigureEndpoints initializes the three endpoints.
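For reference, here is the test change expressed as the three definitions in Descriptors.h. The `_Static_assert` guard is an extra suggestion and was not part of the change I tested. It assumes a compiler that accepts C11 `_Static_assert` (GCC-based toolchains such as avr-gcc do). It only encodes the IN, OUT, notification order that CDC_Device_ConfigureEndpoints uses, so it would need updating if a different LUFA version reserved the endpoints in another order.

```c
/* Endpoint numbers chosen so that they increase in the order in which
 * CDC_Device_ConfigureEndpoints reserves them: data IN, data OUT,
 * notification. */
#define CDC_TX_EPNUM            2  /* data IN  (device to host) */
#define CDC_RX_EPNUM            3  /* data OUT (host to device) */
#define CDC_NOTIFICATION_EPNUM  4  /* notification IN */

/* Compile-time guard so a later edit cannot silently reintroduce
 * out-of-order endpoint reservation. */
_Static_assert(CDC_TX_EPNUM < CDC_RX_EPNUM &&
               CDC_RX_EPNUM < CDC_NOTIFICATION_EPNUM,
               "endpoints must be numbered in the order they are configured");
```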
After that change I could no longer reproduce the issue, and bidirectional traffic works without errors.
I suspect this only shows up at higher baud rates because the memory overlap is towards the end of the IN buffer. At lower baud rates, data does not arrive at the 16U2's serial port fast enough to fill the IN buffer up to the overlap region before the next USB IN request empties it. It also only happens when IN and OUT requests are interleaved. If data flows in only one direction, everything seems fine.
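A rough calculation illustrates the baud-rate dependence. The helper below (hypothetical name `uart_fill_time_us`) assumes 8N1 framing, which is the `Serial.begin()` default. It also assumes a continuous byte stream with no idle time between characters, as the test sketch produces.

```c
#include <stdint.h>

/* Time in microseconds for the UART to deliver `bytes` bytes at `baud`,
 * assuming 8N1 framing (start bit + 8 data bits + stop bit = 10 bits per
 * byte) and back-to-back characters. The result is rounded down. */
uint32_t uart_fill_time_us(uint32_t baud, uint32_t bytes)
{
    return (uint32_t)(((uint64_t)bytes * 10u * 1000000u) / baud);
}
```

Under these assumptions, 64 bytes take:

- about 2.56 ms at 250000 baud
- about 5.6 ms at 115200 baud
- about 67 ms at 9600 baud

I haven't measured how often the firmware hands buffered serial data to the IN endpoint. Still, the much shorter fill time at 250000 baud makes full 64-byte IN packets, and thus writes into the end of the buffer, far more likely.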
Hope this helps! Please let me know if you have any questions.