Information on Wire.h speed (not an issue, but you don't have discussions enabled) #52

Open
SpenceKonde opened this issue Feb 28, 2023 · 1 comment

Comments

@SpenceKonde

SpenceKonde commented Feb 28, 2023

Test results from AVR128DB48, a flagship modern AVR
This went on a bit longer than I intended.

The data

| Wire Clock (Hz) | Pullups | F_CPU | Fill time | Notes |
|---|---|---|---|---|
| 100000 | Omitted | 24 MHz | 370-380 ms | Image doesn't change |
| 200000 | Omitted | 24 MHz | 370-380 ms | Image doesn't change |
| 100000 | usePullups() | 24 MHz | 101 ms | Standard Mode |
| 200000 | usePullups() | 24 MHz | 51.4 ms | |
| 400000 | usePullups() | 24 MHz | 29.1 ms | Fast Mode |
| 400000 | usePullups() | 32 MHz | 27.8 ms | Overclocked Int. Osc |
| 400000 | usePullups() | 16 MHz | 31.6 ms | |
| 400000 | 4.7k | 24 MHz | 29.0 ms | |
| 600000 | 4.7k | 24 MHz | 21.0 ms | |
| 600000 | 4.7k | 24 MHz | 22.0 ms | With scope on SCL |
| 800000 | 4.7k | 24 MHz | 17.0 ms | |
| 1000000 | 4.7k | 24 MHz | 15.0 ms | Fast Mode Plus |
| 1000000 | 2.2k | 24 MHz | 15.0 ms | No improvement |
| 1000000 | 2.2k | 48 MHz | 12.1 ms | Massive Overclock |
| 400000 | 2.2k | 48 MHz | 25.6 ms | Massive Overclock |

NanoDB background
So I wired up a screen to one of my Nano DB+ boards that I had out for testing (it's the Rev. - version, which I rarely get right, but these actually appear to have come out just about flawlessly. Out of a run of 20, only 2 needed any rework at all: a tombstoned capacitor and a single bridge on the 0.4mm pitch QFN. With the same chip in TQFP, I typically have to do minor rework after reflow on almost half of the boards to clear bridges, and the damned things have virtually no self-centering because of the lack of a center pad. Plus maybe a few that needed work on the USB connector, which is always a pain, but that's the price for USB connectors that have pins going through the board, so the board would break in half before the connector got ripped off the board. This new solder I've found is amazing and just works miracles: lead free, composition known, melts under 200C, and verified to be lead free with lead test swabs (the common "S600 Lead free 183C" solder is actually leaded according to the lead test swabs). It beat out the previous lead-free contender, which melted at "158C", tested negative for lead, but wasn't normal horribad SnBiAg (never bother with that solder that's 64-65% Sn, 35% Bi and 0-1% silver. It is absolutely abysmal).

So they've joined the queue of parts waiting for me to write descriptions so I can sell them). It uses an AVR128DB48 with the extended temperature range. It's part of the current AVR flagship DB-series (neither of the next two announced parts is going to change that: EA and EB have a fancy new ADC, but the EA only goes up to 48 pins, 20 MHz and 64k (and the EB has less flash and fewer pins), while the DBs go up to 64 pins, 128k flash, and 24 MHz (rated - they overclock insanely well)). I modified the sketch to print out the time that it recorded before the >> 8 and used that to calculate the numbers so I would get more precise measurements.

Back to the process
Being a bit groggy at the time, I failed to check whether there were pullups on the breakout board, and just wired it up.
I got speeds even more miserable than you recorded for Wire on the t85, and increasing the Wire clock didn't change it. And the image on the screen didn't change either: I think that 370ms time is how long it takes for however many I2C transactions it takes to fill the screen to time out. That pointed to insufficient pullups.

note about I2C master and scl clock
The I2C master actually waits for the clock to go high before pausing for a specified amount of time. That's why setClock() is complicated and the numbers you pass to it aren't the actual clock speed (notice how connecting a scope probe to it slows it down - I measured 606-616 kHz when I had the scope on it). That is NOT a trivial change either - the calculations suggest that 15-16 of those 21 ms that it takes are the part that depends on the Wire clock, so that scope probe slowed the bus by something on the order of 5-7%!

On DxCore and megaTinyCore (which use the same Wire library - actually almost all the libraries that both ship with are identical), the Wire baud calculations behind setClock() as supplied by the Arduino core for the Uno WiFi/Nano Every were wrong, and then were rewritten by me twice (both times incorrectly), then by a user via PR (which was much less wrong), and then again (correctly this time) by our hero, @MX682X, who subsequently pretty much rewrote all of Wire so it supports multi-master on all modern AVRs, and Dual Mode can be used where the hardware supports it. I also added a few hacky mechanisms for a slave to get the information it needs (particularly the number of bytes the master read, so it can update its pointer) to implement an I2C interface that feels like I2C rather than a weird serial port (until then, I never understood why all the I2C slave devices made with Arduino had such a crude interface... but stock Wire.h doesn't give you the tools you would need to make any other kind!). The new Wire does all that with a smaller flash footprint too. Coincidentally, it was the same finicky, non-I2C-compliant part, which was highly SCL clock sensitive, that led to both of those individuals examining setClock.
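The 5-7% figure can be sanity-checked with a couple of lines of arithmetic. This is only a rough sketch in Python: it takes the two 600000 rows from the table (21.0 ms without the probe, 22.0 ms with it) and the 15-16 ms clock-dependent estimate from the text, and assumes the whole extra millisecond comes from the slower bus.

```python
def bus_slowdown_percent(added_ms, clock_dependent_ms):
    """Relative slowdown of the clock-dependent part of the fill time, in percent."""
    return 100.0 * added_ms / clock_dependent_ms


# Fill time with the scope on SCL minus fill time without it (600000 rows of the table).
added_ms = 22.0 - 21.0

# The text estimates that 15-16 ms of the 21 ms depends on the Wire clock.
low = bus_slowdown_percent(added_ms, 16.0)
high = bus_slowdown_percent(added_ms, 15.0)

print(f"probe slowed the bus by roughly {low:.2f}% to {high:.2f}%")
```

Both ends of that range land inside the "5-7%" quoted above.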

On pullups and external pullups
Okay, so we need pullups. On parts with proper TWI peripherals, even classic AVRs, you can use the internal pullups; although they are way weaker than the spec says you need, they're often enough for a small bus without much on it. On classic AVRs, the internal pullups are always turned on by the stock Wire.h library (and all the third party cores use that stock implementation). I specifically made sure not to automatically use the internal pullups, because doing that makes debugging harder by hiding a simple problem: the bus fails only when the number of devices on the bus or the length of the wires is increased, which makes the problem very hard to track down. So I decided to provide usePullups() on mTC/DxC, so someone can rely on the internal pullups, but can't do so unknowingly - they have to effectively declare "I know this isn't above board, and if I2C breaks, the first thing I'll try is adding pullups".
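To put rough numbers on "way weaker than the spec says you need", here is a minimal sketch in Python of the usual RC rise-time estimate for an open-drain bus. The rise-time limits (1000 ns, 300 ns, 120 ns) are the I2C specification's maxima for Standard Mode, Fast Mode and Fast Mode Plus. The 50 pF bus capacitance and the 35k internal pullup value are assumptions picked purely for illustration, not measurements from this setup.

```python
import math

# Maximum SCL/SDA rise time per mode, in ns (I2C specification limits).
MAX_RISE_NS = {"Standard": 1000.0, "Fast": 300.0, "Fast Mode Plus": 120.0}


def rise_time_ns(pullup_ohms, bus_pf):
    """Time for an RC-charged line to rise from 30% to 70% of VDD, in ns."""
    # t = R * C * ln(0.7 / 0.3), which is about 0.8473 * R * C
    return pullup_ohms * bus_pf * 1e-12 * math.log(0.7 / 0.3) * 1e9


def modes_within_spec(pullup_ohms, bus_pf):
    """Names of the modes whose rise-time limit this pullup would meet."""
    t = rise_time_ns(pullup_ohms, bus_pf)
    return [mode for mode, limit in MAX_RISE_NS.items() if t <= limit]


ASSUMED_BUS_PF = 50.0          # hypothetical bus capacitance, not measured
ASSUMED_INTERNAL_OHMS = 35000  # hypothetical internal pullup value, not measured

for ohms in (ASSUMED_INTERNAL_OHMS, 4700, 2200):
    t = rise_time_ns(ohms, ASSUMED_BUS_PF)
    ok = modes_within_spec(ohms, ASSUMED_BUS_PF)
    print(f"{ohms} ohm: {t:.0f} ns rise time, within spec for {ok}")
```

With short wires and a single device, the real capacitance is probably well below the assumed 50 pF, which would fit with weak pullups still being enough on a small bus.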

notes on how much of the time is going to I2C
So, using the pullups made things work: the screen flickered and eventually displayed a number. That got the times down to 101 ms at default speed (which is 100 kHz/Standard Mode). Setting the clock to 200 kHz nearly halved the update time. I then kicked it up to 400 kHz, the spec'ed "Fast Mode" speed, and got 29.1 ms. At that point I played with the F_CPU to try to get a better idea of how much of the time it spent sending and how much of the time was used by the CPU, running it at 16 MHz and overclocking to 32 MHz. My eventual determination was that at 24 MHz, 5.5 ms of the time it takes is CPU time, and the rest is used to update the screen over I2C.
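One way to reproduce that kind of estimate from the table is sketched below in Python. It is not necessarily how the number above was arrived at. It assumes a simple model, fill time = bus time + k / F_CPU, where the bus time does not change with F_CPU (a simplification), and solves for k from each pair of the three 400 kHz usePullups() measurements.

```python
# Fill times (ms) at 400 kHz with usePullups(), taken from the table in this post.
FILL_MS = {16: 31.6, 24: 29.1, 32: 27.8}  # F_CPU in MHz -> fill time in ms


def cpu_cost(f1, f2, times=FILL_MS):
    """Solve T = bus_ms + k / F_CPU for k (in MHz*ms) from two measurements."""
    return (times[f1] - times[f2]) / (1.0 / f1 - 1.0 / f2)


def cpu_ms_at(f_cpu, f1, f2, times=FILL_MS):
    """Estimated CPU-bound part of the fill time at f_cpu MHz."""
    return cpu_cost(f1, f2, times) / f_cpu


pairs = [(16, 24), (24, 32), (16, 32)]
estimates = {pair: cpu_ms_at(24, *pair) for pair in pairs}

for (f1, f2), ms in estimates.items():
    print(f"{f1} vs {f2} MHz: about {ms:.1f} ms of CPU time at 24 MHz")
```

Under that model the three pairs give roughly 5 ms each, in the same ballpark as the 5.5 ms figure; the gap is well within what measurement noise and the simplified model would explain.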

And back to our main story
It was time to stop fucking around and put on some real pullups. I assembled a 1x4 stacking pin header. (The good kind of pin header comes in 1x40 and 2x40 sizes. To make smaller ones, I pull the plastic retainer off and remove the pins (and discard the plastic parts), then similarly depin a male and female normal header, discarding the metal parts, then combine them to make the desired size and color of stacking header that doesn't suck. The 1x4 stacking header you see for sale - and indeed almost all stacking pin header on the market - is Garbage, with a capital G.) I soldered it to one of my 4x4 mini protoboards (I sell them for like $2 for 8 or something like that, among many other shapes and sizes of mini-protoboard in my tindie store), added the resistors, took out usePullups(), and got the same speed in Fast Mode as I did with usePullups() - not super surprising, since the wires are short and there's only the screen on the bus.

Okay, let's see what this bad boy can do...
600 kHz? Yup! 21 ms. I put a scope on SCL to see how fast it actually was and calculated that it was like 606-616 kHz - but the scope probe slowed it down by a non-trivial amount. It was likely more like 630 kHz without the probe (makes sense, because the setClock() function would underestimate the bus capacitance because of the short wires and single device).

800 kHz? Yup! 17 ms

Okay, let's try full on Fast Mode Plus - 1 MHz! Hell yeah! It works like a charm! Although we're getting diminishing returns. I wondered if we might be getting thrown off by a weak pullup - 4.7k is normal for Fast Mode, but lower values are more typical for FM+, so let's make a 2.2k set... Nope, no difference at all!
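One way to read the diminishing returns is sketched below in Python, using the 4.7k rows of the table. It assumes that the 5.5 ms CPU-time estimate from earlier in this post is a fixed cost, and that the bus runs at the nominal setClock() value (it doesn't exactly, as noted above).

```python
CPU_MS = 5.5  # CPU-bound time at 24 MHz, the estimate given earlier in this post

# Nominal setClock() value in kHz -> measured fill time in ms (4.7k pullups, 24 MHz).
FILL_MS = {400: 29.0, 600: 21.0, 800: 17.0, 1000: 15.0}


def bus_ms(fill_ms, cpu_ms=CPU_MS):
    """Part of the fill time left over once the fixed CPU cost is removed."""
    return fill_ms - cpu_ms


def scl_cycles(khz, fill_ms, cpu_ms=CPU_MS):
    """Nominal SCL cycles that fit into the bus-bound time (kHz * ms = cycles)."""
    return khz * bus_ms(fill_ms, cpu_ms)


def cpu_share_percent(fill_ms, cpu_ms=CPU_MS):
    """Share of the total fill time taken by the fixed CPU cost, in percent."""
    return 100.0 * cpu_ms / fill_ms


for khz, ms in FILL_MS.items():
    print(f"{khz} kHz: {scl_cycles(khz, ms):.0f} SCL cycles, "
          f"CPU share {cpu_share_percent(ms):.0f}%")
```

Under those assumptions the cycle count per fill stays in a narrow band (about 9200 to 9500) at all four speeds, while the fixed CPU cost grows from under a fifth of the fill time at 400 kHz to over a third at 1 MHz. That would explain why each further clock increase buys less.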

For the hell of it
And just for fun, I ran the test on a chip with a 48 MHz crystal. Yes, this specimen runs fine at TWICE THE RATED SPEED at room temperature (I'm sure it wouldn't do so at the 125C maximum temperature - though I don't think there is voltage dependence above 2.1V or so on the Dx. I won't be able to test that until I get my next batch of serial adapters *). Not all parts run at 48, but it's not rare for E-spec parts to work at 48. I-spec parts don't seem to be quite as good for overclocking, which makes sense - at room temperature, which chip would you expect to run faster, the one rated for 24 MHz at 105C, or the one rated for 24 MHz at 125C?

[Image: 48MHzI2CSpeedTest - 48 MHz and Fast Mode Plus]

Anyway - these numbers give an idea of the speeds achievable on modern AVRs. Note that not all parts and not all portmux options support Fast Mode Plus. (Standard and Fast Mode have limited drive strength per the I2C spec that us mere mortals aren't allowed to read - Philips makes you pay just to read the I2C spec. FM+ uses a higher (~20mA) output drive to get faster fall-times.)

Concern about the numbers you present for speed, and some information about how Wire.h is implemented on different parts on ATTinyCore, which seems to be what you use
I am suspicious that in your tests where you got those awful numbers with Wire, you may not have had external pullups installed. You say that you were using a tiny85. Parts like the t85 that use the Useless Serial Interface (USI - I think the first word is supposed to be "universal", but "useless" is a better description of it) do not provide a way to enable the pullups when using the USI for I2C. I am also not sure how much of setClock is implemented for ATTinyCore; it's much less flexible where it is implemented. If it works on the USI, it only works at a few fixed speeds, because the USI is not a very helpful peripheral - all it does is clock bits in or out of the shift register on the appropriate edge of the clock. You want it to generate the clock too? Fat chance. Not unless you sacrifice a timer to it. We can't afford that on... well, certainly not on any part with a USI. So the clock is generated in software by writing to a USI clock strobe bit, with cycle-counting delays to get the timing right-ish.

There are three tinyAVR parts that have neither a USI nor a real TWI (like the lucky 88/48 do). These unfortunate parts have a slave-only TWI and nothing for TWI master: the 841, 441, and the 828. (The first of those is probably the best of the classic tinyAVRs, except for the lack of any hardware assistance for I2C, and is the only tinyAVR with three timers, which would make using one to clock a USI viable, if it had one. And the 828 is just a tragedy in general - this is just another way in which it exemplifies poor execution of a great vision.) On those parts, the Wire library uses a fully software/bitbanged implementation of I2C. (The 1634 also got the stupid slave TWI, and the busted pin - though at least its busted pin is one that doesn't matter - but it managed to hang on to the USI, so ATtC uses that.)

There are AFAICT no parts with enough timers that one could be dedicated to that task without causing pain to the user such that you would not want Wire to do that. All the classic AVRs with more than 2 timers, with the exception of the tiny841/441, are ATmegas, not ATtinys, and I don't think any ATmegas ever shipped with the USI - it was always a way to cheap out on the peripherals that the tinies got, by replacing TWI and SPI with something that did both of them substantially worse. Modern AVRs are less timer-deprived, but they also all have at least one TWI, and it's a substantially (though not overwhelmingly) better TWI. (Well, okay, the 0-series modern tinyAVRs are timer-deprived, and the 2/4/8k 1-series kinda are too, since the TCD is not an easy peripheral to work with - but no sane person would choose to use them. They're strictly worse versions of the 1-series...)

Footnote about that voltage dependence and overclocking mentioned above
* To test voltage dependence of overclocking, see, I need a serial chip with a VIO pin. Those are frequently present on chips, but rarely broken out, so I have to design one. I have a very bad record with serial adapters. I designed and rev'ed one version, then abandoned it as unfit years ago, then later designed a batch of 3 serial adapters based on 3 different chips. One of them was rev'ed once before I found all the undocumented silicon bugs in Holtek's garbage serial adapter (which did have VIO - but the drivers crashed constantly, it couldn't do anything approximating arbitrary baud rates, just a limited number, the modem control inputs were backwards, and it sometimes failed to program for no reason). I built only 2 of each rev, and only one of each still "works". Another design was abandoned, and the third I went on to rev 4 times before deciding it was not going to make a good product - the Rev. D had an atrocious assembly yield of 0/24 with no rework, and maybe 75-80% with rework on all boards. Oh, and we screwed up and reversed 2 parts on 24 of them, causing the voltage to always be 5V instead of switchable. And built half of them with the wrong part in another location. I also recently made the mistake of trying to use FTDI's 4-port chip. Too many external components, and it put a whole pile of things that need to be as close to the chip as possible... on adjacent pins. The first version failed to connect one of the power pins, so I rev'ed that once before renewing my vow to never build anything with an FTDI product again. I tried again with another serial chip and that worked, but I decided it wasn't marketable (and didn't have VIO). I've got 3 more designs going out to the board house. Maybe the 14th time will be the charm, or the 15th, or the 16th.
There's a deluxe single port adapter, plus two very similar dual adapters - one version is just 2 serial ports straight up, and the other has one of them wired as a UPDI programmer for programming modern AVRs, because even though all the mod requires is one diode, problems seem to be fairly common (and I've replaced 4 diodes on modded boards that had cracked somehow). All three of these have a VIO pin, so soon I'll be able to maintain communication with a chip while varying its voltage and "make these parts do the limbo!" and see how low I can bring the voltage while still overclocking the hell out of them. (I do have one project underway which I don't think will meet its design goals at 24 MHz - but I used E-spec parts and I don't plan to sell it at all, much less to the uninitiated, so 32, 40, or maybe even 48 is an option, and at those speeds, I'd be home free. Especially if my background WS2812 sending scheme works...)

@datacute
Owner

datacute commented Apr 2, 2023

Thanks for raising this, and thanks for your testing. Those results are impressive.
I've updated the readme (and enabled discussions).
