Random thoughts


Steady as she goes

24 July 2017

The bit-babbler 0.7 software release is now officially tagged and uploaded. For most existing users there is no strongly compelling urgency to update to this one; once again it's mostly portability improvements for new platforms and for new releases of previously supported platforms.

This one brings confirmed support for MacOS (tested on El Capitan and Sierra), OpenBSD (tested on 6.1), and for the FreeBSD 11 release. It also fixes a corner case seen by RHEL/CentOS 6 users, which occurs if they use a later version of libusb than the one shipped by default together with the default kernel (which contains partial backports of later usbdevfs API functionality). And there are workarounds for a systemd update which hit the shuffle button on the filesystem locations of system utilities, and for a quirk in udev rules where != doesn't always really mean not equal.

Existing FreeBSD 10 users will probably notice the most significant changes if/when they update to FreeBSD 11. That release added USB hotplug support, so we now support it on that platform too, via FreeBSD's own implementation of the libusb interfaces. USB hotplug is not perfect there yet: there are notable delays (of around 4 seconds) before it sends device notifications, and again when we try to close our connection for receiving them, and unplugging any device which is actively in use can result in a deadlock in FreeBSD's own (apparently not threadsafe) implementation of libusb_bulk_transfer(). We do have workarounds in our code which should keep the effects of that from annoying users too much until these problems get fixed in FreeBSD itself.

We also had to disable a few of the optimisations which are normally enabled by -O2 on FreeBSD 11, since they appear to make its version of gcc miscompile some seemingly harmless constructs in a way that breaks proper stack unwinding and exception handling. [If there are any FreeBSD developers reading this, we can gladly provide all the gory details of these things to anyone interested in fixing bugs in future releases of that OS. We've not seen this with the default toolchains on any other OS release. For anyone casually interested, there are more details about all this in the package changelog too.]

The OpenBSD 6.1 release likewise had a few system-specific bugs and quirks we needed to work around. Most were just normal platform variability, or related to its limited support for locales other than C or en_US.UTF8, but we did discover a significant bug in its vfprintf() implementation, which also turns out not to be threadsafe in practice. POSIX says that this function may be a thread cancellation point, and on OpenBSD 6.1 it is implemented as one. However, if a thread does get cancelled there, it can leave the internal _thread_flockfile mutex locked, which means any future call to vfprintf() (or to any other system function requiring that lock) will simply deadlock.

Since we can't really test for the presence of that bug in any way useful to us, the current workaround for it is to simply always disable thread cancellation explicitly before calls to that function and instead test for cancellation requests ourselves outside of it. That at least is a complete, and otherwise future-safe, mitigation for this one. But it's still a bug in OpenBSD that other code could also hit. [So if any of its developers are reading this and need more information than that to find it, we'll be here for you too!]
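The workaround described above can be sketched in C as a small wrapper. This is a minimal illustration assuming POSIX threads, not the code bit-babbler actually ships, and the name nocancel_fprintf is hypothetical. Cancellation is disabled for the duration of the vfprintf() call, the caller's previous setting is restored, and pthread_testcancel() then gives any pending request a chance to take effect at a point where vfprintf() can no longer be holding its internal lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: call vfprintf() with thread cancellation disabled,
 * then honour any cancellation request once vfprintf() has returned. */
static int nocancel_fprintf(FILE *fp, const char *fmt, ...)
{
    int oldstate, ignored, ret;
    va_list ap;

    assert(fp != NULL && fmt != NULL);

    /* A cancellation request cannot take effect inside vfprintf() now. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);

    va_start(ap, fmt);
    ret = vfprintf(fp, fmt, ap);
    va_end(ap);

    /* Restore the caller's setting. This is not a cancellation point... */
    pthread_setcancelstate(oldstate, &ignored);

    /* ...so explicitly act on any request that arrived in the meantime. */
    pthread_testcancel();

    return ret;
}
```

It returns whatever vfprintf() returned, so it can stand in for fprintf() in any thread that might be cancelled.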

Many thanks again to all the people who made requests for their preferred platforms and diligently tested the release candidates on them and reported any issues they saw. And for your patience while we shook out all the issues we could find on them before actually pushing this one out more widely as a formal and public release.

Pick a number between 1 and …

19 June 2017

"Can I use the BitBabbler to generate random numbers between 1 and N, for some value of N?" is a question we've been asked often enough now that it should probably get an entry in the FAQ.

But many people who see that question would immediately think "Of course it can, it's a random number generator", and not actually read the answer. The correct answer is a little more detailed than that, and historically a lot of people really have done this poorly, sometimes with quite significant consequences. So it seemed worth putting up something more detailed about it, which people who do need to do this, and are thinking about it possibly for the first time, might be able to find, and thus dodge an easy trap to fall into.

The BitBabbler itself just generates a potentially infinite stream of random bits. Since that equates to a potentially infinitely large number, then clearly it can generate random numbers in any arbitrary range that you like. Where it gets a bit more tricky is if you want all of those numbers to have an equal probability of being selected.

If the range of numbers you want is a perfect power of 2 in size, then this really is trivial: you just take the number of bits you need directly from the BitBabbler's output. If you want a number between 0 and 7, or between 1 and 8, or between 3 and 10, you can just grab 3 bits from the BitBabbler (giving a value from 0 to 7), add the starting value of your range to that, and you're done. Every number in that range will have an equal probability of being selected.
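For the power-of-2 case, a minimal C sketch could look like this. It assumes raw already holds uniformly random bits read from the device (how they are read is out of scope here), and uniform_pow2 is a hypothetical helper name, not part of the bit-babbler software. Masking off all but the low k bits keeps every one of the 2^k possible outputs equally likely.

```c
#include <assert.h>
#include <stdint.h>

/* Return a uniformly distributed integer in [lo, lo + 2^k - 1].
 * 'raw' is assumed to contain uniformly random bits (e.g. read from a
 * BitBabbler); only its low k bits are used. */
static int64_t uniform_pow2(uint32_t raw, int64_t lo, unsigned k)
{
    assert(k >= 1 && k <= 32);

    /* Keep exactly k bits: each of the 2^k bit patterns is equally likely. */
    uint32_t mask = (k == 32) ? UINT32_MAX : ((UINT32_C(1) << k) - 1);

    return lo + (int64_t)(raw & mask);
}
```

For the 3 to 10 example, uniform_pow2(raw, 3, 3) yields each of the eight values with probability 1/8.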

By far the more common case when people ask this question though, is that they want some range of numbers which isn't a perfect power of 2. And a statistically significant number of the people who try to solve that problem for the first time will immediately reach for the obvious, easy, and wrong, solution of using the modulus function.

That's almost equally trivial to do: you just take the random numbers that you have, in whatever range they might be (so long as it is larger than the range you want), and reduce them into the range that you do want with the modulus. And at first glance that works perfectly, except for the bit about them all being equally probable. The problem there should be quickly obvious if we take a more careful look at what that really does.

Say you want numbers in the range 0 to 4 (or 1 to 5, it's the same problem here). You'll need at least 3 bits from the BitBabbler to obtain at least that many values, and if you reduce that range with a modulus of 5, then what you get looks like this:

Some numerals are more equal than others.

Random input   Output   Probability
0 or 5         0        0.25
1 or 6         1        0.25
2 or 7         2        0.25
3              3        0.125
4              4        0.125

You correctly get output only in the range desired, but the values 3 and 4 have only a 1 in 8 chance of being selected, while the other values instead all have a 2 in 8 probability. None of them have the 1 in 5 chance (probability 0.2) which would normally be expected. So it's clearly not the ideal distribution of outcomes for most purposes where random numbers are needed.
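The table can be checked mechanically. The helper below (modulo_histogram, a name chosen here for illustration) enumerates every k-bit input exactly once, which is the distribution a uniform source gives each input, and counts where x % n sends each one.

```c
#include <assert.h>
#include <stdint.h>

/* For every k-bit input x (each equally likely from a uniform source),
 * count how often x % n produces each output. 'counts' must have room
 * for n entries; counts[r] / 2^k is then the probability of output r. */
static void modulo_histogram(unsigned k, uint32_t n, unsigned counts[])
{
    assert(k >= 1 && k < 32 && n >= 1);

    for (uint32_t r = 0; r < n; ++r)
        counts[r] = 0;

    for (uint32_t x = 0; x < (UINT32_C(1) << k); ++x)
        counts[x % n]++;
}
```

With k = 3 and n = 5 the counts come out as 2, 2, 2, 1, 1, which are the 0.25 and 0.125 probabilities in the table. Taking more bits shrinks the bias, but it never disappears unless n divides 2^k exactly, which can't happen for a range size that isn't itself a power of 2.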

There are a few ways to avoid this problem when you need uniformly distributed random numbers in an arbitrary range, each with its own set of pros and cons. In the case at hand though, where you have a plentiful supply of entropy available, possibly the simplest option, which is easy to implement correctly and has a trivial proof of its correctness, is to do the following:

  1. Obtain enough bits from the BitBabbler to represent at least as many distinct values as the desired range contains (3 bits, giving 0 to 7, for the 5 values of the example above).
  2. If that number is too large to correspond to a value in the range (5, 6 or 7 in the example), discard it and go back to step 1.
  3. Add the starting value of your range to it and return the result.

Using the same example as above, this would give you a 3 in 8 chance of needing to retry each attempt, but every value returned will have the same 1 in 5 chance of being any given number in the desired range.
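Those three steps can be sketched in C as follows. The raw_source type and next_word() are hypothetical stand-ins for reading from the device (here they just walk a fixed array, so the behaviour is repeatable); they are not bit-babbler APIs. For simplicity each attempt takes a fresh 32-bit word and keeps only the k bits it needs, which wastes more raw entropy than strictly necessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device: successive 32-bit words of raw
 * random data. Here it just walks a fixed array, so results are repeatable. */
typedef struct {
    const uint32_t *words;
    size_t count;
    size_t next;
} raw_source;

static uint32_t next_word(raw_source *src)
{
    assert(src->next < src->count);  /* a real device would not run dry */
    return src->words[src->next++];
}

/* Return a uniformly distributed integer in [lo, lo + n - 1] by rejection
 * sampling. */
static int64_t uniform_range(raw_source *src, int64_t lo, uint32_t n)
{
    assert(n >= 1);

    /* Step 1: the smallest k with 2^k >= n gives enough distinct values. */
    unsigned k = 0;
    while (((uint64_t)1 << k) < n)
        ++k;
    const uint64_t mask = ((uint64_t)1 << k) - 1;

    for (;;) {
        uint64_t x = next_word(src) & mask;

        /* Step 2: values >= n have no slot in the range, so retry. */
        if (x < n)
            return lo + (int64_t)x;   /* Step 3 */
    }
}
```

Fed the words 7, 6, 5, 2 with a range of 1 to 5, the first three draws are rejected (they are outside 0 to 4) and the fourth returns 1 + 2 = 3.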

There are other ways to do this which waste less of the raw entropy to make each selection, but if that's really a major concern for some particular use case, then you almost surely have other considerations which must be taken into special account too. So we'll leave those as an exercise for the reader to research. This one is only slightly less intuitive than using a modulus, while being comparably simple to implement without programming error, and roughly as fast to execute.

And for the benefit of people who just want a simple program to run which will do exactly this for them, we've added a new example to our software which does it in a general way for any selected range of numbers, and which includes a self-test mode to both test the actual implementation there and to reassure anyone that the results really do look like they would expect them to for the range they want. You'll find that in doc/examples/random_int.pl of our next release.

McSecurity

6 December 2016

Supporting any platform where people needed entropy has always been an important part of this for us. The software was written with easy portability in mind – but as always for anything non-trivial, especially where interfacing to hardware is involved, we know this still means that some tweaking can be needed, both for new platforms and as existing ones gain new functionality (like USB hotplug support which still isn't as widely available on every OS yet as you might reasonably have expected it would be). We've mostly been doing that on demand, as people have requested support for their preferred system(s). In some cases though, we aren't always able to test those ourselves, and we do need those people to help by reporting what they see if things don't already Just Work out of the box for them.

MacOS has been one of the latter variety. We'd had a few queries about whether it was known to work there – but the last Apple device I'd owned was an Apple][ (which with some irony was what set me on the path of All Things Open Source from an early age – it being such an awesome enabler to have the computer's full schematics and the source to its ROM available in the back of its manual), and nobody else here currently owns a Mac either. It wasn't until recently that a few of those people did in fact step up to report on some real testing with it.

So thanks to their help, we can now confirm this really is known to work under MacOSX. The needed changes to the software were mostly minor: we already had it working on FreeBSD, and the differences from that weren't major, just some small things missing or done slightly differently.

One thing MacOS did have which FreeBSD 10 didn't was a documented interface for feeding entropy to the kernel. Apple's random(4) manual page described this and the SecurityServer daemon which used it. Adding support for that was simple enough, but what was missing from it was a documented way to know when its kernel actually wanted more entropy. Fortunately, the source for the kernel is freely available, so the next step was to grab that and see if there was in fact an undocumented way to do it, since it seemed probable that any sensible implementation of Apple's own SecurityServer would also want to have something like that available to it …

But what we instead found in there was a very different kind of revelation which caught all of us by surprise. It turned out that things didn't actually work in the way which Apple's documentation had indicated they would at all. And the more that we looked, the wider that gap quite evidently was.

If you examine the kernel source for the 10.12 Sierra release (xnu-3789.1.32), you'll find that in the file bsd/dev/random/randomdev.c there is indeed code to handle a write to the /dev/random device. The random_write() function takes chunks of up to 256 bytes at a time from what we give it and then passes them off to the function write_random(). That function is found in the file osfmk/prng/random.c where we see it has the following implementation:

```c
int write_random(void* buffer, u_int numbytes)
{
#if 0
	int retval = 0;
	prngContextp pp;

	lck_mtx_lock(gPRNGMutex);
	pp = current_prng_context();

	if (ccdrbg_reseed(prng_infop(pp), pp->statep, bytesToInput, rdBuffer, 0, NULL) != 0)
		retval = EIO;

	lck_mtx_unlock(gPRNGMutex);
	return retval;
#else
#pragma unused(buffer, numbytes)
	return 0;
#endif
}
```
Nothing to see (or do) here.
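To make the effect concrete, here is a userspace model of that write path in C. It is not xnu code: model_random_write() and model_write_random() are hypothetical names, the 256-byte chunking simply follows the description of random_write() above, and the pooled counter exists only in this model, to show how many bytes actually reach the pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_MAX 256  /* random_write() passes on at most 256 bytes at a time */

/* Model of the compiled-in (#else) branch of write_random(): the data is
 * ignored and success is reported. 'pooled' is a model-only counter of
 * bytes actually mixed into the pool, and it is never incremented. */
static int model_write_random(const void *buffer, unsigned numbytes,
                              size_t *pooled)
{
    (void)buffer;
    (void)numbytes;
    (void)pooled;
    return 0;
}

/* Model of random_write(): split the caller's data into chunks of at most
 * CHUNK_MAX bytes and hand each to write_random(). Returns the number of
 * chunks passed on, or -1 if any of them reported an error. */
static int model_random_write(const uint8_t *data, size_t len, size_t *pooled)
{
    int chunks = 0;

    assert(data != NULL || len == 0);
    while (len > 0) {
        unsigned n = len > CHUNK_MAX ? CHUNK_MAX : (unsigned)len;

        if (model_write_random(data, n, pooled) != 0)
            return -1;
        data += n;
        len -= n;
        ++chunks;
    }
    return chunks;
}
```

Writing 1000 bytes through it is passed on as 4 chunks (three of 256 bytes and one of 232), every call succeeds, and the pooled count stays at 0: nothing in the return values tells the writer that its data was discarded.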

Oops. We'd been noting since the beginning that there was little point in feeding bits to the Windows CryptoAPI, since with it being a black box there was no way to know what if anything it would do with them, and that (only half-jokingly) it was quite possible it would simply throw them away and do nothing at all with them. But it was still a lot more stunning to really find ourselves looking at an auditable implementation which was in fact doing exactly that. And almost equally surprising that nobody else had already been pointing at this anywhere else that we could see.

So we got curious for some kind of explanation as to how and when this came to be the apparently Forgotten Work In Progress that we were now staring at in dissipating disbelief.

An archaeological dig into the publicly available record would seem to show that random_write was added in xnu-201 (MacOS 10.1), along with using Yarrow to replace the MINSTD LCG which in 10.0 was the only source of kernel random numbers. Things then remained that way, with writes to /dev/random being the only source of kernel entropy (aside from initial seeding using the system clock at boot), until xnu-2782.1.97 (MacOS 10.10) when the kernel Yarrow implementation was moved to osfmk/prng and refactored to be used as a pluggable PRNG factory option.

In that release, the initial random seed is now taken from a device-tree property set up by the boot loader, mixed with an initial clock timestamp again. This was the first release of the MacOS kernel to harvest entropy directly from system interrupts, and it does so by mixing in the lower 32 bits of the TSC cycle counter, reading it at the time they are handled. It does this indiscriminately for all interrupts that are raised on the master CPU. It presumably then also got rid of the SecurityServer daemon, since this release also added the (non-)implementation of write_random that we see now.

From there, things seem to be essentially unchanged right up to the present time (the 10.12 Sierra release) in the publicly available source.

So basically, unless you patch your kernel, there really isn't any point to feeding bits directly to it on MacOS either. Like the users on Windows systems, if you need strong entropy, you'll want to take it directly from the BitBabbler device itself rather than filtering that through the OS kernel's own pool.

But we can at least now definitely confirm that the BitBabbler devices and our supporting software have been tested and verified on the El Capitan and Sierra releases of MacOS, and that all other modes of operation are indeed working reliably and exactly as they are expected to.


DIY renovations

The code to feed a MacOS kernel will still be included in our software, for the benefit of anyone who is keen to experiment with patching their kernel to make use of it.

It's probably worth noting that if you do wish to do that, then it isn't enough to just uncomment the disabled code that already exists in write_random. If that wasn't already obvious, it would be quickly enough when it failed to compile. What is currently in there looks a lot like a quick cut'n'paste sketch of what it ought to do, which was then commented out to get it all to compile, and ostensibly forgotten about again when a release deadline loomed near.

From an untested, by-eye analysis though, it should be enough to just replace bytesToInput with numbytes, and rdBuffer with buffer. For future versions of the kernel it may be desirable to wrap the call to ccdrbg_reseed with the PRNG_CCDRBG macro, as is done in the Reseed() function, but so long as this is still using the (deprecated) Yarrow PRNG, that macro is a noop anyway. Similarly, checking the return value from ccdrbg_reseed isn't strictly needed there, since calls to the yarrow_reseed() function will always return CCDRBG_STATUS_OK, but some future implementation could possibly fail to perform a reseed request and return a real error indication.

It does seem fairly clear from what we've seen that the MacOS kernel would benefit from having a more reliable source of good entropy than just the CPU clock count when interrupts occur. Especially since, in some cases at least, the number of relatively predictable periodic interrupts could easily dominate any more randomly occurring ones.

But for people who really do want or need the strongest guarantees, this may still not be enough. While trying to confirm that write_random could reasonably be patched, we started to notice a few more things in the MacOS kernel which didn't look quite right at second glance either …


Achillea millefolium or Conium maculatum?

Two things are certain: Death and Taxonomy errors.

It is commonly claimed, and seemingly accepted as gospel truth, that MacOS uses the Yarrow PRNG algorithm. And indeed, if you take a casual look at its kernel source, you will find things that are referred to there as Yarrow. On a closer inspection though, it quickly becomes obvious that almost all of the significant components that are described as essential parts of the design in the Yarrow paper appear to be missing from it.

What is actually implemented instead is more like this:

So how did this happen? How did something claiming to implement Yarrow manage to look almost nothing at all like the published algorithm aside from using SHA1 in its entropy accumulator stage? To answer that, we needed to dig deeper into the origins of Yarrow itself.

On the Yarrow history page, we find not only the Yarrow paper, but also a link to some source code that is noted to implement an older version of Yarrow, not the one specified in the paper. And a quick look at that makes all the pieces of this puzzle start to rapidly fall into place.

Although MacOS switched from an LCG to Yarrow well after the formal paper defining it was published, they instead built their implementation of it on a copy of this experimental prerelease source from 1998, which was being used to test and refine the ideas behind what would ultimately become the Yarrow CSPRNG.

Even today, the core functions of what they are shipping are still mostly verbatim from what was in the Counterpane 0.8.71 source, but there are a few notable differences, beyond simply relicensing that public domain software under their own terms. Changes which mutate this into something significantly different again from the design that Counterpane had been initially experimenting with …

While the original was designed to take entropy input conservatively from multiple sources, the MacOS version only provides a single entropy source. The MacOS version then also disables the entropy estimation functionality (since it couldn't run the embedded zlib code in kernel space). It disables the slow poll sourcing of entropy. It disables the checking of whether there is sufficient entropy in the pool to safely perform a reseed (and instead always forces them to occur arbitrarily). And it seemingly ignores the documented warning noted in the original:

The biggest concern in the current design is the frequency with which reseed will be possible. For the suggested threshold value of 100 bits, only 12 bytes of output are guaranteed to be absolutely secure under this system. If this much entropy can not be acquired quickly enough (remembering that we are using a very conservative estimate of our entropy), the outputted keys,hashes,etc. could possibly be attacked more efficiently by brute-force cracking the generator state.

The current assumption (read: hope) is that those who are demanding values from the PRNG at a high rate are also producing entropy at a similar rate, or will be willing to wait longer for their values and allow a slow poll. This will need to be examined in light of the results of the testing of the quality of our current entropy sources, which is still underway (more details upon request).

Given the changes made to strengthen the design of Yarrow that are in the published paper (having separate fast and slow pools, using a strong block cipher for the output generator, emphasising the importance of not optimistically estimating entropy) it would seem that at least some of their remaining concerns could not be confidently dismissed.

Either way, we are still well short of demonstrating a trivial starvation attack (or any other) on the MacOS /dev/random device at this point, but there's certainly plenty of low hanging fruit for anyone who did want to pursue a more detailed analysis of it. I'd certainly be interested in seeing if XORing the TSC with itself as the only source of entropy amplifies any real correlation which may occur with that in practice under some conditions. Analysing a raw dump of that could be an easy and interesting thing to explore. But what we do know without doubt is that there's now decades of new research to draw from since any of this was even anywhere near being anything like best practice.

And I'd certainly be newly cautious of the advice commonly given, that unlike Linux, it's safe to just read as much entropy as you want from this device. An easy fix here is simple and obvious though. Fortuna was published in 2003 as a replacement for Yarrow that eliminated even more of the concerns its authors had with reliably doing this securely. And that was before SHA1 itself fell under a more strongly proven cloud of suspicion too.

I can't say for certain why Apple have not yet cared to give more of their attention to this, but it's possible that a shouty comment found in their source could be a clue:

WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! THIS FILE IS NEEDED TO PASS FIPS ACCEPTANCE FOR THE RANDOM NUMBER GENERATOR. IF YOU ALTER IT IN ANY WAY, WE WILL NEED TO GO THOUGH FIPS ACCEPTANCE AGAIN, AN OPERATION THAT IS VERY EXPENSIVE AND TIME CONSUMING. IN OTHER WORDS, DON'T MESS WITH THIS FILE. WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING!
Enough said.

Update: 6 June 2017

We've since learned that the random(4) manpage was actually patched in Sierra, removing references to the SecurityServer and to the ability to write to /dev/random to contribute more entropy to the system. It also dropped the warning about the quality of its output being dependent on a sufficient supply of good entropy being available (a condition which hasn't actually changed).

And it adds a curious new recommendation … that using the arc4random(3) function should be preferred instead. Which can only be described as an interesting suggestion to make in late 2016 given that RFC 7465 was published in early 2015, and that known weaknesses in RC4 date back to at least 1995.

The obvious initial hope upon reading this was that, like OpenBSD 5.5, they had in fact replaced the actual implementation of it with something a bit stronger than RC4 (OpenBSD switched theirs to ChaCha20) – however if you believe Wikipedia and Apple's online manual pages again, then as of February 2015 at least, they were indeed still using RC4.

So I'm going to stop looking now. I think I am sufficiently convinced by exploring this that disaster fatigue is definitely a Real Thing. If anyone has some better news for Mac users about any of this, I'd certainly be glad to share that here as a further update on it in the future.

More portability

22 November 2016

The bit-babbler 0.6 software release is now available. This one mainly contains portability fixes for more platforms and for systems still using older versions of udev. For existing users where the previous releases have been working fine, the only possibly interesting changes in this one are a fix for the normalisation of QA statistics when processing large amounts of entropy on 32-bit systems, and a fix to the framing sanity check that is needed if the device is plugged into a USB 1.x port, for anyone who happens to still actually have one of those.

If you're using this on a 32-bit system and are likely to pull more than 2GB out of the device between restarts of the software, then we do recommend you update to this release, but otherwise you're unlikely to notice any real difference.

Many thanks to all the people who tested this on different systems and gave us good feedback on their needs and experiences. It's nice to have so many people share our interest in checking this all over as diligently as possible.

Virtual realities

18 January 2016

We've grown a lot of love for using virtual machines over the last few years. The number of things that they make easier and better is far too long to list here. But dealing directly with hardware is not yet one of them. And gathering good entropy in them has notable issues too.

Part of the trouble with obtaining real entropy in a virtualised environment is that a VM usually is deliberately isolated from the hardware on the host. Which means that most of the physical sources of unpredictable events which the kernel will normally try to collect entropy from are all now mediated by separately scheduled software, posing a big question about just how unpredictable they really still might be. And that's before we wonder what sort of correlations might occur with other VM guests that are running on the same host. The generally accepted solution is that one way or another we need a defined mechanism to import unique entropy from outside the software running the VM, and there are a few ways we can do that.

Things have surely improved since we last looked at it (the Linux kernel commit logs certainly say they have), but our initial attempts to use virtio-rng as a way to import entropy from the host machine into guests left a fair bit to be desired. Like, not crashing the system would have been nice. As would not greedily draining the host completely dry even when the guest was ostensibly idle. But lots of things related to this are a work in progress, and I don't have much of a right to grumble there, since we didn't dig deeply into debugging it further, or report it beyond commenting on it in IRC and not seeing much interest in more details. Which isn't ideal, but fixes to the kernel or QEMU would take time to become widely available, and we needed an answer which could work with what we had there and then. It's an unfortunate truth that we won't live long enough to chase every bug we see in someone else's code, and we can't just tell all of our users you need to be on a bleeding edge kernel, so we need to pick our pursuits wisely, and be diligent at doing our bit to keep our own house clean of them on all of the systems that our users really do need to support.

But the free software model works if everyone has the right measure of patience with others and scratching their own itches, so we needed a different plan to tide us over until that was ready for more general use. And we had a related itch we'd already started scratching at.

Since we'd already started experiments on what would become the BitBabbler hardware, the best answer there for us also already seemed clear. We just needed to be able to use them directly inside guest machines too. Which of course then made the problem of VMs dealing directly with hardware very much our problem too (though we already had an existing interest in that for our telephony hardware and other things as well).

I'm not going to go into too much detail on that here either, mainly because it's a Long Story, and if you really want to hear it (or even if you don't!) you'll find it in the documentation of how to set this up in the software package. And because this time I do plan to find the time to open another discussion about how we can improve this with the libvirt developers, so I don't want to get sick of repeating it before that has fully run its course.

The short version is, we now have a pretty close approximation to full USB hotplug functionality in libvirt managed KVM/QEMU virtual machines. It's not exactly what I'd call pretty on the inside, but it is easy to use and administer, and more robust and reliable than the previous set of hacks which we were using for this, and it makes BitBabbler devices assigned to guest machines behave just like you'd expect them to when using them from the host. No matter when you plug them in or remove them, or when you start or stop the guest.

The next step now is to try and get the missing functionality that we need supported more directly by libvirt, and we at least have a clear demonstration of why it's needed there and what sort of awful things people need to do if it isn't, which hopefully will help with that. Or at the very least we have something people can point at when explaining how silly we've been to miss the obvious easy answer that we should have been using instead. Then we can fix that and everyone wins from refining an example of current best practice.

But in the meantime, if this is something you need too, install the bit-babbler 0.5 release, and have a look at the bbvirt(1) man page and/or the virtual_machines document (in the doc directory of the source, or /usr/share/doc/bit-babbler directory that the Debian binary packages install). You'll find the longer story there, and a quick-start guide to getting it up and running as painlessly as possible.

It's not that hard being green

6 January 2016

Evolution. It's life's unrolling game where either you grow into your environment, or it grows all over you. Where even the rocks end up different to how they started – regardless of whether they'd ever gathered any moss or not.

Our starting point was wanting a cryptographically secure, high-quality entropy source that wouldn't starve and stall the system under heavy demand. So that naturally shaped our initial assumptions in both the supporting software and the hardware. The early focus in the software was largely on keeping throughput up, latency down, and having a regular supply of fresh entropy still being drawn from the hardware, analysed for anomalies, and mixed into the pools, even when it wasn't all being consumed from them (since otherwise, it would just be going to waste).

But there's also another species of important uses here too, where the primary interest is avoiding a different kind of waste. Wasted power. And I don't mean in the What have the Romans ever done for us? sense.

It's not like we actually draw a lot of it, even under peak usage, but in commercial data centers small numbers can have large multipliers, and there's also a growing interest in very low power home servers and in optimising them to be as efficient as they possibly can be. Where even if the current drawn by the device itself is low, waking the CPU to read from it when the system would otherwise be completely idle is still a cost that some people would, quite reasonably, like to avoid.

The good news is, doing a good job of catering for that type of use too really isn't a very big stretch from where we already were. The BitBabbler hardware itself has support for being idled into a very low power consumption mode (on the scale of microamps). Kernel support for suspending USB devices and controllers at runtime is a thing. And the frequency at which we opportunistically refresh the entropy pools when they aren't being drained was already being set by internally configurable options. So mostly we just needed to expose some more knobs to let people select the desired behaviour that most suits their own use case.

And this is exactly what the first set of changes in the bit-babbler 0.5 software release add. If you install it using the Debian packages, there are new udev rules which will enable the kernel autosuspend mode for BitBabbler devices, and if you pass the --low-power option to the daemon it will be much more conservative about reading from the devices when there isn't demand for entropy, and release them when they are idle so that the OS can suspend them (along with any controllers or hubs they are connected to).

If you want more direct control, the individual options that --low-power is an alias for can all be configured separately too. There are still a few caveats to using it – some USB controllers don't handle being suspended as well as they probably should, and if you're doing this with the devices connected to an XHCI (USB3) port, then using a recent kernel is advisable. But it's working well enough to push out for broader testing.
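On Linux, the kernel's runtime autosuspend comes down to a small per-device sysfs setting. As a rough sketch of what enabling it by hand might look like (not the rules the Debian package actually ships), the Python below writes "auto" to power/control for every USB device whose idVendor and idProduct match. The function name enable_usb_autosuspend is hypothetical, the vendor and product IDs are left as parameters rather than assumed, and writing to the real /sys needs root.

```python
from pathlib import Path

def enable_usb_autosuspend(vendor_id, product_id,
                           sysfs_root="/sys/bus/usb/devices", delay_ms=None):
    """Allow runtime autosuspend for every USB device matching vendor_id:product_id.

    vendor_id and product_id are 4-digit hex strings, as shown by lsusb.
    Returns the sysfs device directories that were updated.
    """
    updated = []
    for dev in sorted(Path(sysfs_root).iterdir()):
        try:
            vid = (dev / "idVendor").read_text().strip()
            pid = (dev / "idProduct").read_text().strip()
        except OSError:
            continue  # interface entries such as 1-2:1.0 have no ID files
        if (vid.lower(), pid.lower()) != (vendor_id.lower(), product_id.lower()):
            continue
        power = dev / "power"
        if delay_ms is not None:
            # How long the device must be idle before the kernel suspends it.
            (power / "autosuspend_delay_ms").write_text(str(delay_ms))
        # "auto" lets the kernel suspend the device when idle; "on" keeps it awake.
        (power / "control").write_text("auto")
        updated.append(dev)
    return updated
```

In a udev rule the same setting is usually expressed as an ATTR{power/control}="auto" assignment, which has the advantage of being reapplied every time the device is plugged in.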

Reduce · Reuse · Reconsider

22 December 2015

The bit-babbler 0.4 software release is now tagged and uploaded. If you're using the packages which ship with Debian, it should be available from the mirrors for Sid by the time you read this, and should migrate to Stretch in about 5 days' time if nobody finds something silly we missed.

This one was originally planned to be just a few minor tweaks to get it building for the BSDs (and built for the Debian kFreeBSD port), but the best laid plans and all that … Getting it to build was easy; getting it to work proved to be a somewhat more involved task. The kFreeBSD port had packages for libftdi built with libusb-0.1, but if you'd assumed, like we did, that this implied they actually worked there, then you'd have been about as surprised as we were when they simply didn't at all. Everything builds and runs, it just can't see any USB devices – which isn't much use to any of us.

It turns out that FreeBSD is one of the few targets that the libusb source used by most platforms hasn't been directly ported to. Mostly because the FreeBSD developers have their own USB library (which is not-confusingly-at-all also named libusb), but which fortunately does also provide an API that is compatible with the libusb in use elsewhere.

So after some gnashing of teeth, a quick trip through all the stages of grief, and a hasty rescheduling of all the other things I had planned for that week, we decided to bite the bullet and switch to using the libusb-1.0 API directly, and the platform native implementation of it.

With the benefit of hindsight now, there seems to be no doubt that this was time well spent. By taking direct control over the device ourselves rather than going through the libftdi abstraction, we've been able to simplify things considerably, improve the error reporting and handling if things go wrong, be more efficient with getting data out of the device so CPU usage is reduced and maximum throughput is increased by notable margins, and we've further minimised the barriers to porting this to new platforms now too.

By using libusb-1.0 we can now more widely support some features that were previously only available when built with libudev, like being able to identify devices by their physical address on the USB bus, and having hotplug support – and we get better support and lots of bugfixes for platforms other than Linux. So this unanticipated cake turned out to have plenty of delicious icing on it.
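Identifying a device by where it is plugged in, rather than by the device number the kernel assigns it (which changes every time it is replugged), relies on the chain of hub port numbers leading to it. libusb-1.0 exposes this through libusb_get_bus_number() and libusb_get_port_numbers(), and Linux sysfs names devices in the same shape, for example 1-2.3 for bus 1, port 2, then port 3 on the hub below it. The Python sketch below only parses and formats that sysfs-style notation; the helper names are hypothetical, and this isn't a claim about the exact syntax bit-babbler's own device selection options accept.

```python
import re

# sysfs device names: bus number, '-', then hub port numbers joined by '.'
_USB_PATH = re.compile(r"^(\d+)-(\d+(?:\.\d+)*)$")

def parse_usb_path(name):
    """Split a sysfs-style USB device name such as '1-2.3' into (bus, ports).

    Returns None for names that are not device paths, such as root hubs
    ('usb1') or interface entries ('1-2.3:1.0').
    """
    m = _USB_PATH.match(name)
    if m is None:
        return None
    bus = int(m.group(1))
    ports = tuple(int(p) for p in m.group(2).split("."))
    return bus, ports

def format_usb_path(bus, ports):
    """Inverse of parse_usb_path: (1, (2, 3)) -> '1-2.3'."""
    return f"{bus}-{'.'.join(str(p) for p in ports)}"
```

Because the port chain only changes when a device is physically moved, it makes a stable name for configuration, where the kernel's device number would not.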

Of course a major refactoring like this isn't entirely without risk, and this new code hasn't yet had as much long term testing as the previous releases did. But it's been running on all the servers here for a few weeks now without obvious trouble, it makes building this for Windows users much easier, it makes using it on BSD possible, and it fixes a few minor issues on some of the more obscure architectures that the Debian buildds shook out. So we think it's ready for broader testing by more people, and on more of the platforms that they want to use.

Hip to Chi-square

18 December 2015

The idea of there being a test, which when run just once doesn't actually give you a right or wrong, pass or fail result; where any single result that it outputs could be an indication of a good or bad outcome; and where the only way to know which is which is to run the test many times, and then run it again on its own results … isn't something that's necessarily intuitive to people who haven't seen that sort of thing before and had time to think a bit about why that's how it works.

So as we've had more people taking an interest in digging deeper into the details of this, it's become apparent that this was something we'd probably touched on a bit too briefly for anyone who isn't already familiar with the nature of statistical testing methods. And since it seems like a fundamentally important detail which warrants more than just a footnote on the FAQ page, we've instead added a better introduction to it in the description of the tests from the ENT suite, though it isn't specific to only the Chi-square test included there.

If you already know how goodness of fit and significance testing works, then there's probably not much we've said there that will be news to you. We have tried to keep it as simple and accessible as possible – though if you'd like to proofread it and point out anything that we could say more clearly / better / less wrongly, that would be a welcome contribution too! It's tricky to explain this briefly in a way that's both still readable for people who are new to it, and formally correct for people skilled in the art, so there might be some loose language there that we could still tighten up or improve on. But as a starting point it should at least give people some extra clues to run with and search for if they do want to learn more about it.
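To make the two-level idea concrete, here is a minimal Python sketch, assuming ENT-style byte counting (256 bins, so 255 degrees of freedom). Each first-level run turns a chi-square statistic into a p-value. A single p-value tells you little, but for a good source many of them should be spread uniformly between 0 and 1, so the second level bins those p-values and runs another chi-square on the counts. The p-values come from the Wilson-Hilferty normal approximation rather than the exact chi-square distribution, Python's seeded PRNG stands in for real device output, and all the function names are illustrative.

```python
import random
from statistics import NormalDist

def chi_square(counts, expected):
    """Pearson's chi-square statistic, with the same expected count in every bin."""
    return sum((c - expected) ** 2 / expected for c in counts)

def byte_chi_square(data):
    """Chi-square over byte frequencies: 256 bins, 255 degrees of freedom."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return chi_square(counts, len(data) / 256)

def upper_tail_p(x, df):
    """Approximate P(X >= x) for a chi-square variable with df degrees of freedom.

    Wilson-Hilferty approximation: (X/df)**(1/3) is close to normal with
    mean 1 - 2/(9 df) and variance 2/(9 df). Good for large df, not exact.
    """
    v = 2 / (9 * df)
    z = ((x / df) ** (1 / 3) - (1 - v)) / v ** 0.5
    return 1 - NormalDist().cdf(z)

def second_level(p_values, bins=10):
    """Chi-square test of whether a batch of p-values looks uniform on [0, 1]."""
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1
    stat = chi_square(counts, len(p_values) / bins)
    return stat, upper_tail_p(stat, bins - 1)

# First level: many independent runs, each yielding one p-value.
# A seeded PRNG stands in for real device output here.
rng = random.Random(2015)
p_values = [upper_tail_p(byte_chi_square(rng.randbytes(4096)), 255)
            for _ in range(200)]

# Second level: test the p-values themselves.
stat, p = second_level(p_values)
print(f"second-level chi-square = {stat:.2f}, p = {p:.3f}")
```

A very small second-level p-value would suggest the first-level results aren't behaving as chance predicts; any one unusual first-level result, on its own, would not.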

Stretch goals

25 November 2015

Adrenaline. It's such a simple molecule, but it puts caffeine to shame in the effect it can have on our minds and bodies, and sometimes you don't even need to get up out of your seat to make it.

And apparently it doesn't matter how much of your life you've spent overdosing on it by putting your body into places and situations that people without a taste for it would rather avoid, and teaching your mind to deal with that. It can still consume you with its effects almost completely whenever your mind says, quietly or otherwise, Hey, this might be dangerous and what you're about to do next could get you into some trouble that you don't want to have …

Are you sure you really want to do this? There's still time to just turn around and back away quietly. If you think that would be better … Is that what you really think?

The nagging voice of self-doubt. It can be either good or bad for you, depending on when you listen to it. And adrenaline amplifies it from quiet nagging to insistent urging that won't be ignored, however hard you try.

For some strange reason, publishing new software still often does that to me. Not always, but if it's something critical, where a mistake could cause real data loss, or where it's used in Failure Is Not An Option situations or other deployments with real consequences for not achieving that goal, or even just something that's completely new, then definitely more often than not. And we have plenty of those sort of users for our telephony gear, and for some of the other software I've authored or maintain, so it's not like I don't get enough practice at doing this.

It's strange because you can put me in the open door of an aircraft at 15,000 feet, about to step out of it, alone or with a group of some of the most completely crazy (and fun!) people that you're ever likely to meet, and it's easy to be totally calm and controlled about what I expect to happen next and what I need to do to make that all happen according to plan.

But put my finger over the button that is about to upload a piece of software, that I'm responsible for making, with the potential for ill consequences to other people – and I might as well be locked in a cage with a sleeping tiger. I know what I need to do is get out of there alive, but is what I'm going to do next going to wake it up, grumpy, startled, or hungry, before I do?

The difference between those two situations probably is almost that simple. In one case I (think) I know everything that is going to happen next, and whether it does or not will be determined largely by what I do.

In the other there is a far more pure uncertainty.

No matter how careful I am, no matter how careful I've been, what's going to happen next is a function of what happens in other people's minds. Not my own. And nothing in the world is really as scary as that. No matter how many times I put out some new piece of software, and no matter how many times it's either well received or simply goes almost entirely unnoticed, and nothing terrible actually happens, it's still an act of stepping, irrevocably, into a great new unknown. A world of fresh surprises and problems to learn how to avoid.

And I still love the rush that comes with that, whatever it is that brings it on. It never gets old.

Which is all perhaps just a long-winded way of saying bit-babbler 0.3 is now in the incoming queue for Debian Stretch. But I'm writing under the influence of adrenaline and I know it well. My heart is racing. My mind is flitting wildly through every possible thing we might have forgotten. And we now await your judgement. Whenever and however it comes. At the time and mood of your choosing.

We've done all the preparation for this that we can reasonably think of doing. And now we've stepped out of the door with it into the airflow. There's no going back. All that remains is to see if we really can land it safely, and not hurt anyone else.

None shall pass(through)

16 November 2015

Just a quick heads-up for the people using USB passthrough to make their devices available inside libvirt managed virtual machines. If you're doing this on a system with cgroups enabled, or have updated your host machine to one (like if systemd is now your init system), then you'll need to make sure you've also added the devices you are passing through to the cgroup_device_acl array which is defined in /etc/libvirt/qemu.conf (or wherever that is done on your system).
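The device nodes that need to go into cgroup_device_acl are the usbfs nodes under /dev/bus/usb, named by zero-padded bus and device number. As a sketch, the Python helpers below render such a setting. The base list of default entries you pass in is up to you: copy the commented-out default from your own qemu.conf, since it differs between libvirt versions. The helper names are hypothetical.

```python
def usb_device_node(bus, devnum):
    """usbfs device node for a USB device on Linux, e.g. /dev/bus/usb/001/005."""
    return f"/dev/bus/usb/{bus:03d}/{devnum:03d}"

def cgroup_acl_line(base_acl, usb_devices):
    """Render a qemu.conf cgroup_device_acl setting with extra USB nodes appended.

    base_acl: the default entries from your own qemu.conf.
    usb_devices: iterable of (bus, device number) pairs to pass through.
    """
    entries = list(base_acl) + [usb_device_node(b, d) for b, d in usb_devices]
    body = ",\n".join(f'    "{e}"' for e in entries)
    return f"cgroup_device_acl = [\n{body}\n]"

print(cgroup_acl_line(["/dev/null", "/dev/full", "/dev/zero",
                       "/dev/random", "/dev/urandom", "/dev/ptmx", "/dev/kvm"],
                      [(1, 5)]))
```

Note that the device number changes whenever a device is unplugged and replugged, so a hand-maintained list like this goes stale easily, which is one reason to prefer having the ACL managed for you.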

There's more detail about all of that in the documentation for configuring virtual machines in the software package – but if you're wondering why passthrough suddenly stopped working after you updated your host machine, this is probably the reason. The USB devices aren't in the default set that the VMs are granted access to when cgroup access control is enabled. At least not until we get the extra support discussed here included in libvirt (at which point it should manage the needed cgroup ACL itself).

Update: If you're using the bit-babbler 0.5 release or later, and bbvirt to manage the passthrough, then you don't need to do this anymore. The needed permission will be managed automatically by libvirt.

Release early, release often

5 November 2015

It's a lovely aphorism, for things that aren't mission-critical and for developers who prefer to outsource testing to their end users rather than spend their own precious time on boring things like that – but when it comes to hardware it's really more of a euphemism for wasting a lot of time and money on product recall and replacement.

So we've been rather publicity-shy, until we'd convinced ourselves that we'd checked, and checked the checks, and stuck our own fingers into all the places where something might lurk that would bite them – because the prize for being the first to have a massive embarrassing recall has long ago been won, and there really is nothing more tedious than having to rework large numbers of units because you'd made a stupid mistake that would have been easy to find and avoid if you'd actually tested it. We've been there in years gone by with other hardware, and it's not a place we ever want to go back to, even for a quick visit.

But we're well into testing the new production sized run of White devices that we did recently, and the results we're seeing for those are so far nicely consistent with what we saw from the prototype run. We've had some really good feedback from an excellent and diverse group of early adopters who already found us and wanted devices for their own, and so we're starting to feel a bit more comfortable that if there's something we've still missed, it's not going to be an instant show-stopper that will be a major pain to remedy.

The software is performing well, and we're not getting any requests that hint at it needing some sort of major redesign to really be useful for a wide range of people and applications. The sort of wish list things that are currently on the horizon should all fit into it quite well without disruption to anyone who has already deployed it if they update.

The biggest complaint we're now getting is along the lines of why isn't this actually in Debian yet? Which is a good sign that it really is time to fix that very shortly now. It's not that hard to build your own packages of it, but it's still a lot more convenient to not have to. Thanks to everyone who has been patient with us over that and has given us good feedback on the version 0.2 snapshots. When we get through the last of what's still pending for that (which isn't much now), we'll tag version 0.3 and push that one out for inclusion in the Debian Stretch release.

She cannae take much more Capt'n

15 October 2015

It looks like we're going to need external power for the USB hubs if we want to run more than about 60 devices in most of the machines that we presently have set up for testing them – which wasn't an entirely unexpected limit with all things considered. The good news is, the hubs we bought have a socket for external power. The bad news is, after looking inside them, that socket is connected directly to the VBUS rail coming from the host motherboard, with no isolation for either it or the data line pull-up when they are running self-powered.

And where by running, I mean for the brief instant between when you plug it in and when things probably go badly downhill from there. Who lets these people design and build things for others to use … "Don't cross the streams" isn't a hard design rule to grok.

The happy news is that we can run 60 BitBabbler White devices, all streaming random bits out at the default maximum rate, in the same machine as we have four Octal ISDN cards running high-load callgen testing. That's about 5.5 million phone calls a day on 960 telephony channels, 1.2 terabytes a day of audio processed, and 30 gigabytes an hour of raw entropy, all happily purring away together on a fairly cheap consumer-grade motherboard that we bought off the shelf a few years ago from the local computer store.
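Those figures hang together on the back of an envelope. The sketch below assumes each Octal card carries eight E1 spans of 30 B-channels (4 × 8 × 30 = 960), 64 kbit/s of audio per channel counted in both directions, and binary terabytes for the audio total. Under those assumptions the audio comes out at about 1.21 TiB a day, which is consistent with the 1.2 terabytes quoted, though how that number was originally counted isn't spelled out here.

```python
# Back-of-envelope check of the stress-test figures, under stated assumptions.
CARDS = 4
SPANS_PER_CARD = 8        # "Octal" = eight ports per card
CHANNELS_PER_SPAN = 30    # assumption: E1 spans, 30 B-channels each
BYTES_PER_SEC = 8000      # 64 kbit/s per B-channel
DIRECTIONS = 2            # assumption: audio counted in both directions
SECONDS_PER_DAY = 86_400

channels = CARDS * SPANS_PER_CARD * CHANNELS_PER_SPAN
audio_bytes_per_day = channels * BYTES_PER_SEC * DIRECTIONS * SECONDS_PER_DAY
audio_tib_per_day = audio_bytes_per_day / 2**40

calls_per_day = 5.5e6
seconds_per_call = SECONDS_PER_DAY * channels / calls_per_day

entropy_bytes_per_hour = 30e9   # assumption: decimal gigabytes
devices = 60
bits_per_sec_per_device = entropy_bytes_per_hour * 8 / 3600 / devices

print(channels, round(audio_tib_per_day, 2), round(seconds_per_call, 1),
      round(bits_per_sec_per_device))
# prints: 960 1.21 15.1 1111111
```

The same numbers say each channel was averaging a new call roughly every 15 seconds, and each of the 60 devices was delivering a little over 1.1 Mbit/s.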

We like to use low-end hardware for routine stress testing, because if it all works peachy there, then scaling things up further, to be Carrier-grade and Web-scale like all the cool kids are, is just a matter of throwing as many dollars at the problem as it takes to feel like you are. We know our system won't sweat it.

Please sir, can I have a little more

28 September 2015

So the long term testing of the first hundred units we made has still been looking really good. Beyond what I'd even dared to hope for in fact. When doing the initial tuning to see just how fast we could clock bits out of these, we found some devices could be pushed notably harder than others before the quality of their output would start to degrade. But even with the fairly conservative defaults that we settled on, I'd been expecting that a few of the devices at the opposite end of that spectrum might eventually show some sign of weakness in their output if we just let them run for long enough and accumulated enough trillions of bits for that to finally become statistically significant in one or more of the QA tests.
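The "trillions of bits" point can be put in numbers with the simplest test there is, counting ones, used here purely as a stand-in rather than as one of the actual QA tests. For n fair bits the count of ones has a standard deviation of √n/2, while a bias of ε (so P(1) = 0.5 + ε) moves the expected count by nε, giving an expected score of z = 2ε√n. Solving for n shows how quickly the required sample grows as the bias shrinks. The function names are illustrative.

```python
import math

def bits_to_detect_bias(bias, z=3.0):
    """Bits needed before a bias P(1) = 0.5 + bias gives an expected z-sigma count of ones.

    With n fair bits the count of ones has mean n/2 and standard deviation
    sqrt(n)/2; a bias shifts the mean by n*bias, so z = 2*bias*sqrt(n).
    """
    return (z / (2 * bias)) ** 2

def detectable_bias(n_bits, z=3.0):
    """Smallest bias whose expected score reaches z sigma after n_bits."""
    return z / (2 * math.sqrt(n_bits))

for bias in (1e-3, 1e-4, 1e-5, 1e-6):
    print(f"bias {bias:g}: about {bits_to_detect_bias(bias):.3g} bits for 3 sigma")
```

A bias of one part in a million needs around 2.25 × 10^12 bits before it reaches three standard deviations, which is why a weakness that small could only ever show up after long term accumulation.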

But that hasn't been the case. We've been monitoring them continuously, graphing statistics on the QA tests with munin, and so far we've had exactly zero devices fail the long term testing. Which would normally be a really surprising result for just about any hardware project – until I'm reminded exactly how many prototypes we did actually build before doing this run, and the hell we put them all through before settling on a design to sample in larger quantities. So yeah, maybe not quite so surprising, but still a very pleasing result for the first batch.

This of course leaves us with only one sensible course of action. Build and test more of them! We've had lots of interest in the White devices, so we've ordered the parts to do another run of 500, and ordered a stack of extra USB hubs to fill with them once they come back from our fab. The next big test will be to see just how many of them we can cram into each machine in the rack before smoke starts pouring out of something or circuit breakers start tripping.

Anything that isn't tested is broken

15 September 2015

And if we had any lingering doubt whatsoever in the eternal truth of that, it would have been quickly dispelled once more Windows users came along who actually did want to use the native build there!

Having not owned or had to write software for Windows now for … well, let's not think about how many years ago that was now; and since wine support for USB devices is still basically non-existent, it wasn't really much of a shock that there were a few teething problems still to sort out with that. But it was pleasantly close, and with the help of a very patient user who relayed details of what did and didn't work on their system we got this actually tested and confirmed working as expected there.

I'd almost forgotten just how many amusing idiosyncrasies it has with respect to otherwise standard functions, and either I really have forgotten or it appears to have grown even more of them since I wrote code for it last, but so far it appears to be working well and we haven't had any new reports of trouble there yet.

Socket to me

28 July 2015

Well that didn't take long. Sorry BSD people, but the Windows users asked us to support their platform before you did. Which surprised me a little, but the squeaky wheel gets the grease, and so the first round of portability tweaks goes to them.

The response we've had from people so far has actually been rather awesome, thanks to all of you for the kind words of appreciation about the effort we've put into this and the suggestions for things it would also be useful to support. It caught us a bit off guard really. We'd barely had the website up for a week, and hadn't really told anybody about it except for our bank and shipping company (who wanted to see it before they'd talk to us about using the BitBabbler name with our accounts), when the first few people started emailing us asking if we still had any we could sell. So getting the website completed, and posting updates here, has sort of played second fiddle to improving the software further in response to plenty of new user feedback.

The first major change was adding the ability to obtain entropy directly from the device output pool via a UDP socket too. Having only options to send it to stdout or to the kernel was fine for our own needs, but neither of those was going to be much use for anyone wanting to use this on Windows. This is also useful for more than just those people though, since it means you now don't have to choose between using a device to feed entropy to the kernel or reading raw bits from it directly; you can just timeshare it to do both simultaneously if you ever need that.
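The UDP socket makes the device pool reachable from any language or OS that can send a datagram. Below is a minimal Python client sketch. The wire format it uses (a 2-byte, network byte order count of wanted bytes, answered by one datagram of that size) is an assumption made for illustration, so check the daemon's documentation for the real protocol, address and port before relying on it. request_entropy and the fake_server stand-in are hypothetical names, and the demo runs entirely on the loopback interface.

```python
import os
import socket
import struct
import threading

def request_entropy(addr, nbytes, timeout=2.0):
    """Ask a UDP entropy server at addr (host, port) for nbytes of random data.

    ASSUMED wire format, for illustration only: the request is a 16-bit
    unsigned byte count in network byte order, and the reply is a single
    datagram carrying exactly that many bytes.
    """
    if not 0 < nbytes <= 65507:  # largest UDP payload over IPv4
        raise ValueError("nbytes must fit in one IPv4 UDP datagram")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)  # UDP is unreliable, so never wait forever
        s.sendto(struct.pack("!H", nbytes), addr)
        data, _ = s.recvfrom(65535)
    if len(data) != nbytes:
        raise RuntimeError(f"short reply: wanted {nbytes}, got {len(data)}")
    return data

def fake_server(sock):
    """Hypothetical stand-in for the daemon: answers one request using os.urandom."""
    req, peer = sock.recvfrom(2)
    (n,) = struct.unpack("!H", req)
    sock.sendto(os.urandom(n), peer)

# Local demo against the stand-in server on a free loopback port.
srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))
worker = threading.Thread(target=fake_server, args=(srv,))
worker.start()
sample = request_entropy(srv.getsockname(), 32)
worker.join()
srv.close()
print(len(sample), "bytes received")
```

Pointing addr at a machine like the small ARM board described below, rather than at the stand-in server, is the kind of setup where a client like this would be used.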

It also means you don't actually have to run it on Windows to use the entropy from it in Windows applications. And so the first round of porting this to Windows stopped without it ever actually being tested there, with the BitBabbler instead running on a small ARM board, with a minimal install of Debian on it, feeding entropy to Windows applications over a private network segment. But the architectural changes that were needed for that got done, and it was successfully building with the mingw-w64 toolchain from Debian Stretch.

We've now added a udev rule and a system group to the Debian package, so that normal users without elevated privilege (other than being placed in the bit-babbler group) can access the device directly. We got a lot of requests from people who wanted good random numbers for purposes other than feeding entropy to the kernel, so this will make things a bit more flexible and user-friendly for them as well.
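For software that talks to the device directly, a useful first check is whether the current process actually holds the bit-babbler group, since group membership added with a tool like adduser only takes effect in new login sessions. A minimal sketch using Python's standard grp and os modules (the in_group helper is hypothetical):

```python
import grp
import os

def in_group(name):
    """Return True if this process currently holds the named group.

    Checks the real and effective group IDs as well as the supplementary
    groups. Returns False if the group does not exist at all.
    """
    try:
        gid = grp.getgrnam(name).gr_gid
    except KeyError:
        return False
    return gid in os.getgroups() or gid in (os.getgid(), os.getegid())

if not in_group("bit-babbler"):
    print("not in the bit-babbler group: direct device access will likely be refused")
```

Checking this up front gives a clearer error than a bare permission failure when opening the device.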