BitBabbler - Random thoughts

Plus ça change

2 November 2021

The bit-babbler 0.9 software release, which has been lurking in the background for quite some time now, is officially out. If the version that you're currently using is still working just fine for what you need, then there's probably not a lot to get awfully excited about in this one for you. It's mostly many small bits of polishing, along with the usual grind of keeping in sync with a world that is constantly changing around and out from under us.

There are a few things you might notice, be thankful for, or just be affected by if you do update to this one:

We install the udev rules with make install now, on platforms that use udev for device hot-plugging, because we finally should be able to rely on them being more widely portable than they ever used to be in practice, and there's no need to make people who package this for other distros, or those just installing it manually, fish those out of the debian-specific packaging to use them.

Likewise with the sysctl configuration file which we use to set the Linux kernel write_wakeup_threshold to a more suitable value for tapping plentiful entropy.

What I'd like to be able to say is: We install both of those to the vendor (or admin depending on the $prefix used) location, so if you have local overrides for them in /etc they will be preserved… But sadly I can't, because prior to the systemd v242 release, udev would not look in all of those locations, so unless you configure with --prefix=/usr for a vendor package build, the udev rules will be installed to /etc/udev/rules.d – and if you want to override those, for now you'll need to name your rules file differently to ours, or repair your local modifications each time you overwrite them with make install. Sorry about that. In a few years' time, when we can expect a (hopefully) saner systemd version in more of the active distro releases, we can handle this better by default. In the meantime, if you need to override that for automated builds, you can set UDEV_RULES_DIR at configure time to your preferred location for them.

There are some more interesting changes for users of FreeBSD 13. The USB support there has greatly improved since FreeBSD 10 and 11, so we could get rid of a lot of the workarounds we needed for that platform and people can expect device hotplugging to Just Work, a lot more like it does on other platforms.

We jump through a few extra hoops to try and avoid race conditions that are inherent to the design and promoted use styles of systemd, because the libvirt package switched to using a socket activation unit in the Debian Bullseye release and inadvertently created a new opportunity for otherwise unrelated things to deadlock at boot. If the zombie socket (that it now creates long before the ordering dependencies of libvirt itself are satisfied) is accessed (as it might be for devices which should be assigned to a VM instead of the host machine), then that operation will block until the real daemon is started and can respond to requests. When the daemon being able to start is itself waiting on some other process, which is in turn waiting for its device notifications, then we all get to wait until systemd solves this class of problems it has by timing out one or more of the waiting operations. I've filed bug reports requesting that libvirt help close the race window further, and that ifupdown avoid being a victim of its fallout, to try and deal with this a bit more comprehensively, so we'll see how all that unfolds as the next rounds of distro and software releases emerge.

Last but not least, there's a small update for the munin script to cater for a behaviour change in the JSON::XS 4.0 release, which makes the NUL terminator we use in our control socket protocol messages now be considered trailing garbage to flag as an error, instead of a normal C-string terminator that it can silently ignore (which was what it did in previous versions).

As usual, if there's anything we've missed in this one, or something different that you need for your particular use case or preferred platform, do let us know and we'll see what we can do about it.

Fully pneumatic

12 February 2018

The bit-babbler 0.8 software release mostly brings improvements to interfacing with external systems. There were again no significant changes or needed bug fixes to the core functionality, just some more polishing to support a broader range of uses and environments.

A simple but significant tweak enables the bbvirt tool and QEMU hook to support any valid libvirt guest domain name, not just those which are also valid as shell variable names. This change will not break any existing configuration, it's a pure extension which allows you to use something like DOMAIN_NAME_foo="Fußball" to declare that the configuration options for identifier foo will apply to the libvirt domain Fußball instead of to one named foo. This has become more important with libvirt itself actively fixing bugs that prevented it from being able to reliably use such names.

Note that if you had previously installed the libvirt QEMU hook from an earlier version of this package, and you need this functionality, then you will need to manually update it to use the new revision from this release. Due to the fact that libvirt supports only a single QEMU hook file, for everything that the local admin might want to do in it, we can't safely do that automatically as part of the normal install process for the rest of the software.

The larger change in this release is that seedd can now also read its configuration options from a file instead of having to pass them all explicitly on the command line. Originally we'd elected to avoid having configuration files for this, partly to keep the code as simple and easy to audit as possible, but also because the more options that something provides to choose from, the more probable it becomes that someone might choose poorly, or accidentally make some mistake they hadn't intended. So we'd aimed to keep the number of options to a minimum overall too. But the evolved reality, on two fronts, is that we do now have enough options, that can be genuinely needed to cater for real use cases, to make defining them in a configuration file become a convenient and less error prone alternative to repeatedly typing them on a command line. And that for users who will run this as a daemon under systemd, it seems wrong to require them to jump through the hoops needed to safely modify its configuration, just to tweak some options which seedd is run with (let alone needing to jump through different hoops to do the same thing on different systems).

While there are some people who might be comfortable doing that, it seems like the sort of FAQ-bait which we long ago learned to always try hard to avoid. Particularly for ordinary users who may not be familiar with the too-often surprising interactions that can occur between the 2492 entries in 13 sections, referring to 243 individual manual pages which control systemd's behaviour. Either way, that's a whole lot of not-so-light prerequisite reading to just safely tweak the basic options which control how your TRNG(s) should be used – so it shouldn't be the only way, or the recommended way, to be able to do that. The sanest solution here is for us to provide a well tested service unit which normal users should almost never need to modify, and to keep our own configuration supplied by something simpler and more tightly focussed on portably doing the one job that it is required to perform, regardless of what process starts and manages the daemons that are running on your system.

Circular polarisation

We're not going to get tangled up in the question of whether or not you should be using systemd. There are intelligent people (yeah, and sadly also some bozos…) with strong opinions at both extremes of that spectrum, and so like any other issue of portability, our job here is to support all of them as equally as we can, to the best degree that it is possible to do so.

For the people who… let's go with aren't particularly enamoured with systemd, all you really need to know about it here is that we don't depend on it, we won't depend on it, we aren't going to force you to have it on your system, and nobody is going to be missing any functionality from our software as a result of their own choices regarding that. We still ship a SysV init script in the Debian packaging, though since it isn't portable, other platforms still do need to provide their own solution for that, as they always have before now. And we will still welcome tested patches to optimally support other platforms, also as we always have. In the same respectful way that we are now providing fuller support for systemd to the people who do fall somewhere on the other side of that spectrum, wherever they might stand between using it and zealously evangelising it. There's no build or runtime configuration option needed to turn it on or off. If seedd is started by systemd, it will Work As Expected, and if it is started by something other than systemd, it will still Work As Expected, no matter what is providing your init process at the time. Nobody gets a short straw here, and nobody has to ride in second class due to this.

But now we are going to talk about systemd support for a bit. Because it is new with this release, at least as part of the package that you can download from us here. And there are a few special snowflake things too which are out of the ordinary enough to note, and overall things are a bit different now with respect to it, so anybody who had already added a unit locally or for their own distro packaging deserves the courtesy of some pointers to what has changed, before we accidentally stomp on each other's toes in that space.

Two households, both alike in dignity

The 0.8 release installs two service unit files by default, but it does not try to enable either of them. That task is left to the local admin, or to policy determined by the distro packaging and its environment when the packaged version is installed.

The seedd.service unit is responsible for starting the seedd daemon as early as possible in the boot sequence, using configuration options read from the /etc/bit-babbler/seedd.conf file. It will not try to start it if that file does not exist (and it will fail to start if there is an error in that configuration file). Typically it will actually be started so early that udev may not have announced the USB devices to the system yet, but it will be ready to make use of them immediately at the soonest opportunity when they do become available. The Debian packages will enable this unit by default when systemd is playing the role of init, and it is the equivalent of the existing SysV init script which will do the same job when it is not. If NOTIFY_SOCKET is set in the environment (as it will be when started by systemd with Type=notify), then seedd will send start up and shut down progress notifications to that socket. The unit will limit the capabilities of seedd to the minimum required for normal functioning when it is feeding seed entropy to the OS kernel and providing a control socket for querying QA results (which is a superset of those needed for all its other functionality).
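For anyone curious what the NOTIFY_SOCKET handshake involves at the wire level, here is a minimal sketch of the general systemd notification convention: a single datagram such as READY=1 sent to the Unix socket named in that environment variable. This is not the code that seedd itself uses, and the function name notify_state and its return values are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not taken from seedd): send one state string, such
 * as "READY=1" or "STOPPING=1", to the socket named by NOTIFY_SOCKET.
 * Returns 1 if sent, 0 if NOTIFY_SOCKET is not set, -1 on error. */
int notify_state(const char *state)
{
    assert(state != NULL);

    const char *path = getenv("NOTIFY_SOCKET");
    if (!path || !*path)
        return 0;               /* not started by a notify-aware supervisor */

    struct sockaddr_un addr;
    size_t len = strlen(path);
    if (len >= sizeof(addr.sun_path))
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, len);

    /* A leading '@' names a Linux abstract socket: the first byte of the
     * address becomes NUL and no trailing NUL is counted in its length. */
    size_t addrlen = offsetof(struct sockaddr_un, sun_path) + len + 1;
    if (path[0] == '@') {
        addr.sun_path[0] = '\0';
        addrlen -= 1;
    }

    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    size_t  msglen = strlen(state);
    ssize_t n = sendto(fd, state, msglen, 0,
                       (struct sockaddr *)&addr, (socklen_t)addrlen);
    close(fd);

    return n == (ssize_t)msglen ? 1 : -1;
}
```

Because the function quietly returns 0 when the variable is unset, the same binary behaves identically under any other init system, which is the property described above.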

If the udp-out option is used, or if the control socket is provided on a TCP address rather than as a Unix Domain socket, then you'll probably also want to enable the ip-freebind option, since it will now also typically be started before any network interfaces come up. It is not an error for there to be no devices available at all when the service is started – in the same way that the SysV init script always has, the daemon will happily still idle and wait for them to be hotplugged. However changes to network interfaces are not actively monitored and responded to in this code, and these sockets are not managed by systemd (because they are not a trigger for it to be activated, they are secondary service interfaces that it provides).

The ordering dependencies will ensure that seedd is started before all ordinary services, and before as many things as possible which may need good seed entropy (it will be started at the same point in the boot sequence where systemd's own systemd-random-seed attempts to load the seed entropy that it saved to disk at the last clean shutdown), but the seedd.service unit itself will not prevent the boot from continuing (otherwise) normally if seedd fails to start, or if it can find no (properly working) device to obtain QA-verified fresh entropy from. This is what most people would normally want to happen if they are expecting a machine will only have a BitBabbler device plugged into it occasionally, or if they do still expect to be able to perform other ordinary tasks on it at any time when one is not, even if the supporting software for them is still installed and remains configured and running, waiting for devices to be hotplugged.

The seedd-wait.service unit is provided for people who really do need stronger guarantees of good seed entropy at every boot. This one should not normally be enabled by default in distro packaging that is intended for general use. It provides a pass/fail sequence point which can be used to delay or prevent other services, or even the whole machine, from starting normally until good fresh seed entropy is able to be acquired.

The softest guarantee that it provides is obtained by simply enabling it. This will potentially pause the boot sequence, delaying everything which has been scheduled to start after local-fs.target (which is most services other than udev), until the OS kernel pool has been seeded with a minimum quantity of QA checked fresh entropy from the available BitBabbler device(s). It is still just a soft guarantee, because it will only wait for up to 30 seconds to obtain the needed seed entropy, after which it will timeout and fail, but then allow the rest of the boot sequence to proceed normally. This means that in normal operation, services which need good entropy will not be started until after the kernel has definitely been seeded – but that failure or absence of a BitBabbler device at boot will not entirely prevent the machine from booting, it will just delay it until the timeout expires. If you normally expect that a BitBabbler device will always be attached, and have things which do benefit from having good entropy immediately as soon as the system is booted, but which don't depend on it so critically that they should not run without new seed material obtained from the TRNG, then this is a reasonable balance to meet those requirements. The extra time that it may add to booting when a device is normally available should be minimal (and can be monitored for each boot using systemd-analyze plot). Mostly it will depend on how quickly your system makes the USB devices available for use.

A hard guarantee can be obtained for individual services or groups of services by declaring them to have a Requires relationship with the seedd-wait.service, which will prevent them from starting at all if the initial seed entropy is not obtained. Or if the entire system really should be prevented from booting normally if this fails, then a failure of this service unit can be used to divert the boot to an alternative to the normal default target. For example, by using something like OnFailure=emergency.target, the system will boot into single user mode if this test should fail for any reason.

Glory and Consequence

If you do plan to use the seedd-wait.service in any of the stronger configurations than just enabling it, then you probably are still going to need to carefully read and understand at least some of systemd's 240+ manual pages, for the version of systemd running on your system, to properly get your head around all the subtleties of how its dependencies are really calculated and intertwined in practice. And you should carefully test the success, failure, and recovery restart behaviour of what you do declare, because it can be surprisingly easy (even for people who are familiar with all this) to accidentally create dependency loops which systemd won't warn you about until it just decides to not start something critical, like your network. And there can be conditions where a service will still be started even when something that it Requires has failed, if there isn't also a direct and explicit ordering relationship linking them too.

My personal favourite to date was inspecting a system which had switched to the emergency.target single user mode – from the comfort of the shells of the multiple remote users that it had allowed to remain logged in after the switch… It turns out that ending up in that state is a fairly trivial thing to do, anything which reverts to the emergency.target after early boot can enable it to happen (and this isn't a simple bug as such, but rather an emergent feature of other design decisions). There are lots of options here for ways you can configure your system to behave in the event of a failure to obtain guaranteed good initial seed entropy – but if it is Mission Critical, just be careful to test that everything you do really does what you actually expect it to in all circumstances, if you haven't already retuned your natural intuition to precisely match how systemd actually behaves when satisfying its calculated dependencies based on what you declared to it.

And of course, as we promised above, none of this extra functionality that we've made easy to implement is actually systemd specific. The seedd-wait.service is just a oneshot service wrapper around making a call to bbctl --waitfor, which is what does the actual work of waiting for QA checked entropy to be provided to the OS kernel. It can be used in the same ways we've described above from any other init or service supervision system if desired. It's easy to talk about using it with systemd here, because in theory what we've described above should work the same way on any system where it is the init process – but if people have recipes for other systems which they think are worth sharing, or other requirements than what is already possible to do with this now, then we'll be happy to include those for other users too. For most people though, the appropriate thing to do is going to largely depend on the exact requirements of the system that they are doing it on anyway, but the overview here should give you some ideas for what is possible if and when you need it.

Steady as she goes

24 July 2017

The bit-babbler 0.7 software release is now officially tagged and uploaded. For most existing users there is no strongly compelling urgency to update to this one, once again it's mostly portability improvements for new platforms and for new releases of previously supported platforms.

This one brings confirmed support for MacOS (tested on El Capitan and Sierra), OpenBSD (tested on 6.1), and for the FreeBSD 11 release. It also fixes a corner case seen by RHEL/CentOS 6 users which occurs if they use a later version of libusb than what was shipped by default, with the default kernel (which contains partial backports of later usbdevfs API functionality). And there are workarounds for a systemd update which hit the shuffle button on filesystem locations of system utilities, and for a quirk in udev rules where != doesn't always really mean not equal.

Existing FreeBSD 10 users will probably notice the most significant changes if/when they update to FreeBSD 11, since it added USB hotplug support and so we also now support it on that platform too, via their own implementation of the libusb interfaces. USB hotplug is not perfect there yet – there are notable delays (of around 4 seconds) before device notifications are sent by it, and again when trying to close our connection to receiving them, and unplugging any device which is actively in use can result in a deadlock occurring in their own (apparently not threadsafe) implementation of libusb_bulk_transfer(), but we have workarounds in our code which should limit the effects of that from annoying users too much until these problems do get fixed in FreeBSD itself.

We also had to disable a few of the optimisations which are normally enabled by -O2 on FreeBSD 11, since with its version of gcc they appear to miscompile some seemingly-harmless constructs in a way that breaks proper stack unwinding and exception handling. [If there are any FreeBSD developers reading this, we can gladly provide all the gory details of these things to anyone interested in fixing bugs in future releases of that OS. We've not seen this with the default toolchains on any other OS release. For anyone casually interested, there are more details about all this in the package changelog too.]

The OpenBSD 6.1 release likewise had a few system-specific bugs and quirks we needed to work around. Most were just normal platform variability, or related to its limited support for locales other than C or en_US.UTF8, but we did discover a significant bug in its vfprintf() implementation, which is also apparently not actually threadsafe in practice. POSIX says that this function may be a thread cancellation point, and on OpenBSD 6.1 it is implemented as one. However if a thread does get cancelled there, then it can result in this leaving its internal _thread_flockfile mutex locked, which means any future call to it (or any other system function requiring that lock) will simply deadlock.

Since we can't really test for the presence of that bug in any way useful to us, the current workaround for it is to simply always disable thread cancellation explicitly before calls to that function and instead test for cancellation requests ourselves outside of it. That at least is a complete, and otherwise future-safe, mitigation for this one. But it's still a bug in OpenBSD that other code could also hit. [So if any of its developers are reading this and need more information than that to find it, we'll be here for you too!]
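The general shape of that kind of workaround can be sketched as follows. This is an illustration of the technique using only standard POSIX calls, not a copy of what the bit-babbler source actually does, and the wrapper name safe_fprintf is invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper (the name is an assumption for this sketch):
 * format to a stream with thread cancellation disabled for the duration
 * of the vfprintf() call, so the thread cannot be cancelled while the
 * C library is holding an internal stream lock. */
int safe_fprintf(FILE *out, const char *fmt, ...)
{
    assert(out != NULL && fmt != NULL);

    int     oldstate, ignored, ret;
    va_list ap;

    /* Defer any cancellation request until after the call returns. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);

    va_start(ap, fmt);
    ret = vfprintf(out, fmt, ap);
    va_end(ap);

    /* Restore whatever cancel state the caller had before. */
    pthread_setcancelstate(oldstate, &ignored);

    /* Act on a pending request here instead, outside of vfprintf().
     * This does nothing if the caller had cancellation disabled. */
    pthread_testcancel();

    return ret;
}
```

The explicit pthread_testcancel() call keeps the function a cancellation point from the caller's point of view, so code which relied on output calls for that does not change its behaviour.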

Many thanks again to all the people who made requests for their preferred platforms and diligently tested the release candidates on them and reported any issues they saw. And for your patience while we shook out all the issues we could find on them before actually pushing this one out more widely as a formal and public release.

Pick a number between 1 and …

19 June 2017

Can I use the BitBabbler to generate random numbers between 1 and N, for some value of N, is a question we've been asked often enough now that it should probably get an entry in the FAQ.

But since many people who might see that question would immediately think Of course it can, it's a random number generator, and not actually read the answer – and since the actually correct answer is a little more detailed than that, and since historically a lot of people really have done that poorly, sometimes with quite significant consequences … It seemed like it was worth putting something a little more detailed up about it, which people who do need to do this, and are thinking about it possibly for the first time, might be able to find. And thus dodge the easy trap to fall into.

The BitBabbler itself just generates a potentially infinite stream of random bits. Since that equates to a potentially infinitely large number, then clearly it can generate random numbers in any arbitrary range that you like. Where it gets a bit more tricky is if you want all of those numbers to have an equal probability of being selected.

If the range of numbers you want is a perfect power of 2 in size, then this really is very trivial, you just take the number of bits you need directly from the BitBabbler's output. If you want a number between 0 and 7, or between 1 and 8, or between 3 and 10, you can just grab 3 bits from the BitBabbler, add the starting value of your range to that, and you're done. Every number in that range will have an equal probability of being selected.

By far the more common case when people ask this question though, is that they want some range of numbers which isn't a perfect power of 2. And a statistically significant number of the people who try to solve that problem for the first time will immediately reach for the obvious, easy, and wrong, solution of using the modulus function.

That's almost equally trivial to do, you just take the random numbers that you have, in whatever range they might be (so long as it is larger than the range you want), and clamp them to the range that you do want. And at first glance that works perfectly. Except for the bit about them all being equally probable. The problem seen there should be quickly obvious if we take a more careful look at what that really does.

Say you want numbers in the range 0 to 4 (or 1 to 5, it's the same problem here). You'll need at least 3 bits from the BitBabbler to obtain at least that many values, and if you clamp that range with a modulus of 5, then what you get looks like this:

Some numerals are more equal than others.
Random Input	Output	Probability
0 or 5	0	0.25
1 or 6	1	0.25
2 or 7	2	0.25
3	3	0.125
4	4	0.125

You correctly get output only in the range desired, but the values 3 and 4 have only a 1 in 8 chance of being selected, while the other values instead all have a 2 in 8 probability. None of them have the 1 in 5 chance (probability 0.2) which would normally be expected. So it's clearly not the ideal distribution of outcomes for most purposes where random numbers are needed.
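The skew is easy to check by brute force for any input width and modulus. The sketch below simply enumerates every possible input and counts where each one lands. The function names are invented for this illustration and it is not part of the bit-babbler software.

```c
#include <assert.h>
#include <stdio.h>

/* Count how many of the 2^bits equally likely inputs map to each output
 * value when reduced with "input % n".  counts[] needs room for n entries. */
void modulo_counts(unsigned bits, unsigned n, unsigned counts[])
{
    assert(n > 0 && bits < 32);

    for (unsigned i = 0; i < n; i++)
        counts[i] = 0;

    for (unsigned long in = 0; in < (1UL << bits); in++)
        counts[in % n]++;
}

/* Print each output value, how many inputs produce it, and the resulting
 * probability.  Limited to n <= 64 to keep the example short. */
void print_modulo_bias(unsigned bits, unsigned n)
{
    unsigned counts[64];

    assert(n <= 64);
    modulo_counts(bits, n, counts);

    for (unsigned i = 0; i < n; i++)
        printf("%u\t%u/%lu\t%.3f\n", i, counts[i], 1UL << bits,
               (double)counts[i] / (double)(1UL << bits));
}
```

For 3 input bits and a modulus of 5, modulo_counts() reports two inputs each for the outputs 0, 1 and 2, and only one each for 3 and 4, matching the table above. Using more input bits shrinks the skew but never removes it unless the modulus divides the number of possible inputs exactly.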

There are a few ways to avoid this problem when you need uniformly distributed random numbers in an arbitrary range. Each has its own set of pros and cons. In the case at hand though, where you have a plentiful supply of entropy available, then possibly the simplest option, which is easy to implement correctly and has a trivial proof of its correctness, is to do the following:

1. Obtain enough bits from the BitBabbler to have a number which is at least as large as the desired range.
2. If that number falls outside the desired range (counting from zero), discard it and go back to step 1.
3. Add the starting value of your range to it and return the result.

Using the same example as above, this would give you a 3 in 8 chance of needing to retry each attempt, but every value returned will have the same 1 in 5 chance of being any given number in the desired range.
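As a rough sketch of those three steps in C (this is not the random_int.pl example mentioned below, and every name in it is made up for this illustration), the random bytes are pulled through a small callback so that the same function can be fed from a real stream or from a fixed test buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback: returns the next raw random byte (0..255),
 * or a negative value if the source has no more data to give. */
typedef int (*byte_source_fn)(void *ctx);

/* Example source: raw bytes from any stdio stream (ctx is a FILE*). */
int stream_byte(void *ctx)
{
    return fgetc((FILE *)ctx);      /* EOF is negative */
}

/* Example source: replay a fixed buffer, handy for self-tests. */
struct buf_source { const unsigned char *data; size_t len, pos; };

int buf_byte(void *ctx)
{
    struct buf_source *s = ctx;
    return s->pos < s->len ? s->data[s->pos++] : -1;
}

/* Return a uniformly distributed value in 0 .. n-1,
 * or -1 if the source ran out of data. */
int64_t uniform_below(uint32_t n, byte_source_fn next_byte, void *ctx)
{
    assert(n > 0 && next_byte != NULL);

    /* The fewest bits that can represent every value up to n-1. */
    unsigned bits = 0;
    for (uint32_t max = n - 1; max != 0; max >>= 1)
        bits++;
    uint32_t mask = bits == 32 ? UINT32_MAX : (UINT32_C(1) << bits) - 1;

    for (;;) {
        /* Step 1: read whole bytes, then keep only the bits we need.
         * (This wastes the unused bits of each byte, for simplicity.) */
        uint32_t v = 0;
        for (unsigned i = 0; i < (bits + 7) / 8; i++) {
            int b = next_byte(ctx);
            if (b < 0)
                return -1;
            v = (v << 8) | (uint32_t)b;
        }
        v &= mask;

        /* Step 2: anything outside 0 .. n-1 is discarded and we retry. */
        if (v < n)
            return v;
    }
}

/* Step 3: offset the result so it covers lo .. hi inclusive.
 * Returns -1 if the source ran out of data. */
int64_t random_in_range(uint32_t lo, uint32_t hi,
                        byte_source_fn next_byte, void *ctx)
{
    assert(lo <= hi && hi - lo < UINT32_MAX);

    int64_t r = uniform_below(hi - lo + 1, next_byte, ctx);
    return r < 0 ? -1 : (int64_t)lo + r;
}
```

With a FILE* that yields raw bytes from the device (how you open that is left out here), random_in_range(1, 5, stream_byte, fp) returns a value from 1 to 5. The correctness argument is the one given above: every accepted value of v is equally likely, and rejected values never influence the result.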

There are other ways to do this which waste less of the raw entropy to make each selection, but if that's really a major concern for some particular use case, then you almost surely have other considerations which must be taken into special account too. So we'll leave those as an exercise for the reader to research. This one is only slightly less intuitive than using a modulus, while being comparably simple to implement without programming error, and roughly as fast to execute.

And for the benefit of people who just want a simple program to run which will do exactly this for them, we've added a new example to our software which does it in a general way for any selected range of numbers, and which includes a self-test mode to both test the actual implementation there and to reassure anyone that the results really do look like they would expect them to for the range they want. You'll find that in doc/examples/random_int.pl of our next release.

McSecurity

6 December 2016

Supporting any platform where people needed entropy has always been an important part of this for us. The software was written with easy portability in mind – but as always for anything non-trivial, especially where interfacing to hardware is involved, we know this still means that some tweaking can be needed, both for new platforms and as existing ones gain new functionality (like USB hotplug support which still isn't as widely available on every OS yet as you might reasonably have expected it would be). We've mostly been doing that on demand, as people have requested support for their preferred system(s). In some cases though, we aren't always able to test those ourselves, and we do need those people to help by reporting what they see if things don't already Just Work out of the box for them.

MacOS has been one of the latter variety. We'd had a few queries about whether it was known to work there – but the last Apple device I'd owned was an Apple][ (which with some irony was what set me on the path of All Things Open Source from an early age – it being such an awesome enabler to have the computer's full schematics and the source to its ROM available in the back of its manual), and nobody else here currently owns a Mac either. It wasn't until recently that a few of those people did in fact step up to report on some real testing with it.

So thanks to their help, we can now confirm this really is known to work under MacOSX. The needed changes to the software were mostly minor, we did already have it working on FreeBSD and the differences to that weren't major. Just some small things missing or done slightly differently.

One thing MacOS did have which FreeBSD 10 didn't, was a documented interface for feeding entropy to the kernel. Apple's random(4) manual page described this and the SecurityServer daemon which used it. Adding support for that was simple enough, but what was missing from it was a documented way to know when its kernel actually wanted more entropy. Fortunately, the source for the kernel is freely available, so the next step was to grab that and see if there was in fact an undocumented way to do it, since it seemed probable that any sensible implementation of Apple's own SecurityServer would also want to have something like that available to it …

But what we instead found in there was a very different kind of revelation which caught all of us by surprise. It turned out that things didn't actually work in the way which Apple's documentation had indicated they would at all. And the more that we looked, the wider that gap quite evidently was.

If you examine the kernel source for the 10.12 Sierra release (xnu-3789.1.32), you'll find that in the file bsd/dev/random/randomdev.c there is indeed code to handle a write to the /dev/random device. The random_write() function takes chunks of up to 256 bytes at a time from what we give it and then passes them off to the function write_random(). That function is found in the file osfmk/prng/random.c where we see it has the following implementation:

    int
    write_random(void* buffer, u_int numbytes)
    {
    #if 0
        int             retval = 0;
        prngContextp    pp;

        lck_mtx_lock(gPRNGMutex);

        pp = current_prng_context();

        if (ccdrbg_reseed(prng_infop(pp), pp->statep,
                          bytesToInput, rdBuffer, 0, NULL) != 0)
            retval = EIO;

        lck_mtx_unlock(gPRNGMutex);

        return retval;
    #else
    #pragma unused(buffer, numbytes)
        return 0;
    #endif
    }

Nothing to ~~see~~ do here.

Oops. We'd been noting since the beginning that there was little point in feeding bits to the Windows CryptoAPI, since with it being a black box there was no way to know what if anything it would do with them, and that (only half-jokingly) it was quite possible it would simply throw them away and do nothing at all with them. But it was still a lot more stunning to really find ourselves looking at an auditable implementation which was in fact doing exactly that. And almost equally surprising that nobody else had already been pointing at this anywhere else that we could see.

So we got curious for some kind of explanation as to how and when this came to be the apparently Forgotten Work In Progress that we were now staring at in dissipating disbelief.

An archaeological dig into the publicly available record would seem to show that random_write was added in xnu-201 (MacOS 10.1), along with using Yarrow to replace the MINSTD LCG which in 10.0 was the only source of kernel random numbers. Things then remained that way, with writes to /dev/random being the only source of kernel entropy (aside from initial seeding using the system clock at boot), until xnu-2782.1.97 (MacOS 10.10) when the kernel Yarrow implementation was moved to osfmk/prng and refactored to be used as a pluggable PRNG factory option.

In that release, the initial random seed is now taken from a device-tree property set up by the boot loader, mixed with an initial clock timestamp again. This was the first release of the MacOS kernel to harvest entropy directly from system interrupts, and it does so by mixing in the lower 32 bits of the TSC cycle counter, reading it at the time they are handled. It does this indiscriminately for all interrupts that are raised on the master CPU. It presumably then also got rid of the SecurityServer daemon, since this release also added the (non-)implementation of write_random that we see now.

From there, things seem to be essentially unchanged right up to the present time (the 10.12 Sierra release) in the publicly available source.

So basically, unless you patch your kernel, there really isn't any point to feeding bits directly to it on MacOS either. Like the users on Windows systems, if you need strong entropy, you'll want to take it directly from the BitBabbler device itself rather than filtering that through the OS kernel's own pool.

But we can at least now definitely confirm that the BitBabbler devices and our supporting software have been tested and verified on the El Capitan and Sierra releases of MacOS, and that all other modes of operation are indeed working reliably and exactly as they are expected to.

DIY renovations

The code to feed a MacOS kernel will still be included in our software, for the benefit of anyone who is keen to experiment with patching their kernel to make use of it.

It's probably worth noting that if you do wish to do that, then it isn't enough to just uncomment the disabled code that already exists in write_random. If that wasn't already obvious, it would be quickly enough when it failed to compile. What is currently in there looks a lot like a quick cut'n'paste sketch of what it ought to do, which was then commented out to get it all to compile, and ostensibly forgotten about again when a release deadline loomed near.

From an untested, by-eye analysis though, it should be enough to just replace bytesToInput with numbytes, and rdBuffer with buffer. For future versions of the kernel it may be desirable to wrap the call to ccdrbg_reseed with the PRNG_CCDRBG macro, as is done in the Reseed() function, but so long as this is still using the (deprecated) Yarrow PRNG, that macro is a noop anyway. Similarly, checking the return value from ccdrbg_reseed isn't strictly needed there, since calls to the yarrow_reseed() function will always return CCDRBG_STATUS_OK, but some future implementation could possibly fail to perform a reseed request and return a real error indication.
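To make that substitution concrete, here is a minimal sketch of what the re-enabled function might look like with only those two names changed. It is equally untested against a real kernel. Everything above write_random() in this block is a hypothetical user-space stand-in (the context type, the lock calls, prng_infop, and a mock ccdrbg_reseed that just records the length it was given), added purely so the sketch compiles on its own; none of it is the real xnu or corecrypto code, and unsigned int stands in for u_int.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins so this compiles outside the kernel.
 * They only mimic the shapes used by the disabled code. */
struct prng_context { void *statep; };
typedef struct prng_context *prngContextp;

static struct prng_context mock_ctx;
static void *gPRNGMutex;
static size_t mock_last_reseed_len;   /* what the mock reseed last saw */

static prngContextp current_prng_context(void) { return &mock_ctx; }
static void *prng_infop(prngContextp pp) { (void)pp; return NULL; }
static void lck_mtx_lock(void *m)   { (void)m; }
static void lck_mtx_unlock(void *m) { (void)m; }

/* Mock reseed: records the input length and always reports success. */
static int ccdrbg_reseed(void *info, void *state,
                         size_t in_len, const void *in,
                         size_t add_len, const void *add)
{
    (void)info; (void)state; (void)in; (void)add_len; (void)add;
    mock_last_reseed_len = in_len;
    return 0;
}

/* The body of the #if 0 branch, with bytesToInput replaced by numbytes
 * and rdBuffer replaced by buffer, as suggested in the text. */
int write_random(void *buffer, unsigned int numbytes)
{
    int          retval = 0;
    prngContextp pp;

    lck_mtx_lock(gPRNGMutex);

    pp = current_prng_context();

    if (ccdrbg_reseed(prng_infop(pp), pp->statep,
                      numbytes, buffer, 0, NULL) != 0)
        retval = EIO;

    lck_mtx_unlock(gPRNGMutex);

    return retval;
}
```

In a real kernel build only the write_random() body would be of interest, and whether it behaves as hoped there is exactly the part that remains unverified.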

It does seem fairly clear from what we've seen that the MacOS kernel would benefit from having a more reliable source of good entropy than just the CPU clock count when interrupts occur. Especially since, in some cases at least, the number of relatively predictable periodic interrupts could easily dominate any more randomly occurring ones.

But for people who really do want or need the strongest guarantees, this may still not be enough. While trying to confirm that write_random could reasonably be patched, we started to notice a few more things in the MacOS kernel which didn't look quite right at second glance either …

Achillea millefolium or Conium maculatum?

Two things are certain, Death and Taxonomy errors.

It is commonly claimed, and seemingly accepted as gospel truth, that MacOS uses the Yarrow PRNG algorithm. And indeed, if you take a casual look at its kernel source, you will find things that are referred to there as Yarrow. On a closer inspection though, it quickly becomes obvious that almost all of the significant components that are described as essential parts of the design in the Yarrow paper appear to be missing from it.

There is only a single entropy accumulation pool, not the pairing of fast and slow pools described for Yarrow.
There is no meaningful estimation of the amount of entropy, either in the bits input to the pool, or in the pool itself before its content is used for reseeding.
There is no block cipher used for the output generator.
There's far more code than just write_random which is either deliberately disabled, or simply doesn't actually do anything at all in practice.

What is actually implemented instead is more like this:

At each interrupt, the low 32 bits of the TSC are mixed into a 64 byte circular buffer by rotating the bits of the existing content at the buffer head (right, by 9 bits) then XORing the new counter bits to that.
Each time that 17597 bytes (defined by RESEED_BYTES) have been output by the generator, a reseed is forced using the current content of that circular buffer (without consideration as to whether any new entropy at all has been added to it since the last reseed). The circular buffer is then churned by XORing each 32-bit word in it with the one before it in the buffer.
The bits from that buffer are then added to the PRNG's entropy pool, which is a 160 bit SHA1 digest of all the bits added to it since the last reseed was performed. That pool is then also churned repeatedly by adding more bits to it from the output of the PRNG itself, then mixing its own current digest in as a final input.
The final digest state of the pool is then used as a new IV for the output generator, and its state is reset to complete the reseed.
The output generator functions by feeding the 160 bit IV and the previous 160 bits that were output through SHA1 to create the next 160 bits for output. After 500 bytes are output (without a reseed), it replaces the IV with its current output to protect against backtracking.
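The first two steps of that list (the per-interrupt mixing and the churn at reseed time) can be sketched in a few lines of user-space C. This is an illustration written from the description above, not code taken from the kernel: the struct and function names are hypothetical, and it assumes that the buffer head advances by one word per sample and that the churn is done in place from the start of the buffer, neither of which is stated above.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_WORDS 16   /* 64 bytes held as sixteen 32-bit words */

struct entropy_ring {
    uint32_t word[RING_WORDS];
    size_t   head;              /* index of the next word to mix into */
};

/* Rotate a 32-bit value right by n bits (0 < n < 32). */
static uint32_t rotr32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32u - n));
}

/* Per-interrupt step: rotate the word at the head right by 9 bits,
 * then XOR in the low 32 bits of the cycle counter.
 * Assumption: the head then moves on to the next word. */
void ring_mix_sample(struct entropy_ring *r, uint32_t tsc_low)
{
    r->word[r->head] = rotr32(r->word[r->head], 9) ^ tsc_low;
    r->head = (r->head + 1) % RING_WORDS;
}

/* Reseed-time churn: XOR each word with the one before it.
 * Assumption: done in place, with word 0 wrapping to the last word. */
void ring_churn(struct entropy_ring *r)
{
    for (size_t i = 0; i < RING_WORDS; i++)
        r->word[i] ^= r->word[(i + RING_WORDS - 1) % RING_WORDS];
}
```

Note that nothing in either step looks at whether the incoming samples were actually unpredictable; a run of identical or evenly spaced counter values gets mixed in exactly the same way as a run of good ones.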

So how did this happen? How did something claiming to implement Yarrow manage to look almost nothing at all like the published algorithm aside from using SHA1 in its entropy accumulator stage? To answer that, we needed to dig deeper into the origins of Yarrow itself.

On the Yarrow history page, we find not only the Yarrow paper, but also a link to some source code that is noted to implement an older version of Yarrow, not the one specified in the paper. And a quick look at that makes all the pieces of this puzzle start to rapidly fall into place.

Although MacOS switched from an LCG to Yarrow well after the formal paper defining it was published, they instead built their implementation of it on a copy of this experimental prerelease source from 1998, which was being used to test and refine the ideas behind what would ultimately become the Yarrow CSPRNG.

Even today, the core functions of what they are shipping are still mostly verbatim from what was in the Counterpane 0.8.71 source, but there are a few notable differences, beyond simply relicensing that public domain software under their own terms. Changes which mutate this into something significantly different again from the design that Counterpane had been initially experimenting with …

While the original was designed to take entropy input conservatively from multiple sources, the MacOS version only provides a single entropy source. The MacOS version then also disables the entropy estimation functionality (since it couldn't run the embedded zlib code in kernel space). It disables the slow poll sourcing of entropy. It disables the checking of whether there is sufficient entropy in the pool to safely perform a reseed (and instead always forces them to occur arbitrarily). And it seemingly ignores the documented warning noted in the original:

The biggest concern in the current design is the frequency with which reseed will be possible. For the suggested threshold value of 100 bits, only 12 bytes of output are guaranteed to be absolutely secure under this system. If this much entropy can not be acquired quickly enough (remembering that we are using a very conservative estimate of our entropy), the outputted keys,hashes,etc. could possibly be attacked more efficiently by brute-force cracking the generator state.
The current assumption (read: hope) is that those who are demanding values from the PRNG at a high rate are also producing entropy at a similar rate, or will be willing to wait longer for their values and allow a slow poll. This will need to be examined in light of the results of the testing of the quality of our current entropy sources, which is still underway (more details upon request).

Given the changes made to strengthen the design of Yarrow that are in the published paper (having separate fast and slow pools, using a strong block cipher for the output generator, emphasising the importance of not optimistically estimating entropy) it would seem that at least some of their remaining concerns could not be confidently dismissed.

Either way, we are still well short of demonstrating a trivial starvation attack (or any other) on the MacOS /dev/random device at this point, but there's certainly plenty of low hanging fruit for anyone who did want to pursue a more detailed analysis of it. I'd certainly be interested in seeing if XORing the TSC with itself as the only source of entropy amplifies any real correlation which may occur with that in practice under some conditions. Analysing a raw dump of that could be an easy and interesting thing to explore. But what we do know without doubt is that there's now decades of new research to draw from since any of this was even anywhere near being anything like best practice.

And I'd certainly be newly cautious of the advice commonly given, that unlike Linux, it's safe to just read as much entropy as you want from this device. The fix here is simple and obvious though. Fortuna was published in 2003 as a replacement for Yarrow that eliminated even more of the concerns its authors had with reliably doing this securely. And that was before SHA1 itself fell under a more strongly proven cloud of suspicion too.

I can't say for certain why Apple have not yet cared to give more of their attention to this, but it's possible that a shouty comment found in their source could be a clue:

WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! THIS FILE IS NEEDED TO PASS FIPS ACCEPTANCE FOR THE RANDOM NUMBER GENERATOR. IF YOU ALTER IT IN ANY WAY, WE WILL NEED TO GO THOUGH FIPS ACCEPTANCE AGAIN, AN OPERATION THAT IS VERY EXPENSIVE AND TIME CONSUMING. IN OTHER WORDS, DON'T MESS WITH THIS FILE. WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING!

Enough said.

Update: 6 June 2017

We've since learned that the random(4) manpage was actually patched in Sierra, removing references to the SecurityServer and to the ability to write to /dev/random to contribute more entropy to the system. It also dropped the warning about the quality of its output being dependent on a sufficient supply of good entropy being available (a condition which hasn't actually changed).

And it adds a curious new recommendation … that using the arc4random(3) function should be preferred instead. Which can only be described as an interesting suggestion to make in late 2016 given that RFC 7465 was published in early 2015, and that known weaknesses in RC4 date back to at least 1995.

The obvious initial hope upon reading this was that, like OpenBSD 5.5, they had in fact replaced the actual implementation of it with something a bit stronger than RC4 (OpenBSD switched theirs to ChaCha20) – however if you believe Wikipedia and Apple's online manual pages again, then as of February 2015 at least, they were indeed still using RC4.

So I'm going to stop looking now. I think I am sufficiently convinced by exploring this that disaster fatigue is definitely a Real Thing. If anyone has some better news for Mac users about any of this, I'd certainly be glad to share that here as a further update on it in the future.

More portability

22 November 2016

The bit-babbler 0.6 software release is now available. This one mainly contains portability fixes for more platforms and for systems still using older versions of udev. For existing users where the previous releases have been working fine, the only possibly interesting changes in this one are a fix for the normalisation of QA statistics when processing large amounts of entropy on 32-bit systems, and a fix to the framing sanity check that is needed if the device is plugged into a USB 1.x port, for anyone who happens to still actually have one of those.

If you're using this on a 32-bit system and are likely to pull more than 2GB out of the device between restarts of the software, then we do recommend you update to this release, but otherwise you're unlikely to notice any real difference.

Many thanks to all the people who tested this on different systems and gave us good feedback on their needs and experiences. It's nice to have so many people share our interest in checking this all over as diligently as possible.

Virtual realities

18 January 2016

We've grown a lot of love for using virtual machines over the last few years. The number of things that they make easier and better is far too long to list here. But dealing directly with hardware is not yet one of them. And gathering good entropy in them has notable issues too.

Part of the trouble with obtaining real entropy in a virtualised environment is that a VM usually is deliberately isolated from the hardware on the host. Which means that most of the physical sources of unpredictable events which the kernel will normally try to collect entropy from are all now mediated by separately scheduled software – posing a big question about just how unpredictable they really still might be. And that's before we wonder what sort of correlations might occur with other VM guests that are running on the same host. The generally accepted solution is that one way or another we need a defined mechanism to import unique entropy from outside the software running the VM, and there are a few ways we can do that.

Things have surely improved since we last looked at it (the Linux kernel commit logs certainly say they have), but our initial attempts to use virtio-rng as a way to import entropy from the host machine into guests left a fair bit to be desired. Like, not crashing the system would have been nice. As would not greedily draining the host completely dry even when the guest was ostensibly idle. But lots of things related to this are a work in progress, and I don't have much of a right to grumble there, since we didn't dig deeply into debugging it further, or report it beyond commenting on it in IRC and not seeing much interest in more details. Which isn't ideal, but fixes to the kernel or QEMU would take time to become widely available, and we needed an answer which could work with what we had there and then. It's an unfortunate truth that we won't live long enough to chase every bug we see in someone else's code, and we can't just tell all of our users you need to be on a bleeding edge kernel, so we need to pick our pursuits wisely, and be diligent at doing our bit to keep our own house clean of them on all of the systems that our users really do need to support.

But the free software model works if everyone has the right measure of patience with others and scratching their own itches, so we needed a different plan to tide us over until that was ready for more general use. And we had a related itch we'd already started scratching at.

Since we'd already started experiments on what would become the BitBabbler hardware, the best answer there for us also already seemed clear. We just needed to be able to use them directly inside guest machines too. Which of course then made the VMs dealing directly with hardware problem become very much our problem too (though we already had an existing interest in that for our telephony hardware and other things as well).

I'm not going to go into too much detail on that here either, mainly because it's a Long Story, and if you really want to hear it (or even if you don't!) you'll find it in the documentation of how to set this up in the software package. And because this time I do plan to find the time to open another discussion about how we can improve this with the libvirt developers, so I don't want to get sick of repeating it before that has fully run its course.

The short version is, we now have a pretty close approximation to full USB hotplug functionality in libvirt managed KVM/QEMU virtual machines. It's not exactly what I'd call pretty on the inside, but it is easy to use and administer, and more robust and reliable than the previous set of hacks which we were using for this, and it makes BitBabbler devices assigned to guest machines behave just like you'd expect them to when using them from the host. No matter when you plug them in or remove them, or when you start or stop the guest.

The next step now is to try and get the missing functionality that we need supported more directly by libvirt, and we at least have a clear demonstration of why it's needed there and what sort of awful things people need to do if it isn't, which hopefully will help with that. Or at the very least we have something people can point at when explaining how silly we've been to miss the obvious easy answer that we should have been using instead. Then we can fix that and everyone wins from refining an example of current best practice.

But in the meantime, if this is something you need too, install the bit-babbler 0.5 release, and have a look at the bbvirt(1) man page and/or the virtual_machines document (in the doc directory of the source, or /usr/share/doc/bit-babbler directory that the Debian binary packages install). You'll find the longer story there, and a quick-start guide to getting it up and running as painlessly as possible.

It's not that hard being green

6 January 2016

Evolution. It's life's unrolling game where either you grow into your environment, or it grows all over you. Where even the rocks end up different to how they started – regardless of whether they'd ever gathered any moss or not.

Our starting point was wanting a cryptographically secure, high-quality entropy source that wouldn't starve and stall the system under heavy demand. So that naturally shaped our initial assumptions in both the supporting software and the hardware. The early focus in the software was largely on keeping throughput up, latency down, and having a regular supply of fresh entropy still being drawn from the hardware, analysed for anomalies, and mixed into the pools, even when it wasn't all being consumed from them (since otherwise, it would just be going to waste).

But there's also another species of important uses here too, where the primary interest is avoiding a different kind of waste. Wasted power. And I don't mean in the What have the Romans ever done for us? sense.

It's not like we actually draw a lot of it, even under peak usage, but in commercial data centers small numbers can have large multipliers, and there's also a growing interest in very low power home servers and in optimising them to be as efficient as they possibly can be. Where even if the current drawn by the device itself is low, waking the CPU to read from it when the system would otherwise be completely idle is still a cost that some people would, quite reasonably, like to avoid.

The good news is, doing a good job of catering for that type of use too really isn't a very big stretch from where we already were. The BitBabbler hardware itself has support for being idled into a very low power consumption mode (on the scale of microamps). Kernel support for suspending USB devices and controllers at runtime is a thing. And the frequency at which we opportunistically refresh the entropy pools when they aren't being drained was already being set by internally configurable options. So mostly we just needed to expose some more knobs to let people select the desired behaviour that most suits their own use case.

And this is exactly what the first set of changes in the bit-babbler 0.5 software release add. If you install it using the Debian packages, there are new udev rules which will enable the kernel autosuspend mode for BitBabbler devices, and if you pass the --low-power option to the daemon it will be much more conservative about reading from the devices when there isn't demand for entropy, and release them when they are idle so that the OS can suspend them (along with any controllers or hubs they are connected to).

If you want more direct control, the options which that is an alias for are all individually configurable too. There's a few caveats to using it still – some USB controllers don't handle being suspended as well as they probably should, and if you're doing this with the devices connected to an XHCI (USB3) port, then using a recent kernel is advisable. But it's working well enough to push out for broader testing.

Reduce · Reuse · Reconsider

22 December 2015

The bit-babbler 0.4 software release is now tagged and uploaded. If you're using the packages which ship with Debian, it should be available from the mirrors for Sid by the time you read this, and should migrate to Stretch in about 5 days time if nobody finds something silly we missed.

This one was originally planned to be just a few minor tweaks to get it building for the BSDs (and built for the Debian kFreeBSD port), but the best laid plans and all that … Getting it to build was easy, getting it to work proved to be a somewhat more involved task. The kFreeBSD port had packages for libftdi built with libusb-0.1, but if you'd assumed, like we did, that this implied they actually worked there, then you'd have been about as surprised as we were when they simply didn't at all. Everything builds and runs, it just can't see any USB devices – which isn't much use to any of us.

It turns out that FreeBSD is one of the few targets which the libusb source that most platforms use isn't directly ported to. Mostly because the FreeBSD developers have their own USB library (which is not-confusingly-at-all also named libusb), but which fortunately does also provide an API that is compatible with the libusb in use elsewhere.

So after some gnashing of teeth, a quick trip through all the stages of grief, and a hasty rescheduling of all the other things I had planned for that week, we decided to bite the bullet and switch to using the libusb-1.0 API directly, and the platform native implementation of it.

With the benefit of hindsight now, there seems to be no doubt that this was time well spent. By taking direct control over the device ourselves rather than going through the libftdi abstraction, we've been able to simplify things considerably, improve the error reporting and handling if things go wrong, be more efficient with getting data out of the device so CPU usage is reduced and maximum throughput is increased by notable margins, and we've further minimised the barriers to porting this to new platforms now too.

By using libusb-1.0 we can support some features more widely that were previously only available when built with libudev, like being able to identify devices by their physical address on the USB bus, and having hotplug support – and we get better support and lots of bugfixes for platforms other than Linux. So this unanticipated cake turned out to have plenty of delicious icing on it.

Of course a major refactoring like this isn't entirely without risk and this new code hasn't yet had as much time in long term testing as the previous releases did, but it's been running on all the servers here for a few weeks now without obvious trouble, and it makes building this for Windows users much easier, makes using it on BSD possible, and fixes a few minor issues on some of the more obscure architectures that the Debian buildds shook out, so we think it's ready to get some broader testing by more people and on more of the platforms that they want to use.

Hip to Chi-square

18 December 2015

The idea of there being a test, which when run just once doesn't actually give you a right or wrong, pass or fail result; where any single result that it outputs could be an indication of a good or bad outcome; and where the only way to know which is which is to run the test many times, and then run it again on its own results … isn't something that's necessarily intuitive to people who haven't seen that sort of thing before and had time to think a bit about why that's how it works.

So as we've had more people taking an interest in digging deeper into the details of this, it's become apparent that this was something we'd probably touched on a bit too briefly for anyone who isn't already familiar with the nature of statistical testing methods. And since it seems like a fundamentally important detail which warrants more than just a footnote on the FAQ page, we've instead added a better introduction to this to the description of the tests from the ENT suite, though it's not specific to only the Chi-square test included there.

If you already know how goodness of fit and significance testing work, then there's probably not a lot we've said there which will be news for you, since we have tried to keep it as simple and accessible as possible – though if you'd like to proofread it and point out anything that we can say more clearly / better / less wrongly, that would be a welcome contribution too! It's tricky to explain this briefly in a way that's both still readable to people who it's new for, and formally correct for people skilled in the arts, so there might be some loose language there we could still tighten up or improve on. But as a starting point it should at least give people some extra clues to run with and search for if they do want to learn more about that.
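For anyone who finds code easier to read than prose, the statistic itself is simple to compute. The following is a minimal sketch of a Chi-square calculation over byte values against a uniform expectation; the function name is hypothetical, and it is neither the ENT implementation nor the code in our software.

```c
#include <stddef.h>

/* Chi-square statistic for the byte values in data[0..len-1], tested
 * against the expectation that all 256 values are equally likely. */
double chi_square_bytes(const unsigned char *data, size_t len)
{
    size_t count[256] = {0};
    double expected, chi2 = 0.0;

    if (len == 0)
        return 0.0;                      /* nothing to measure */

    for (size_t i = 0; i < len; i++)     /* tally each byte value */
        count[data[i]]++;

    expected = (double)len / 256.0;      /* uniform expectation per bin */

    for (int b = 0; b < 256; b++) {
        double d = (double)count[b] - expected;
        chi2 += d * d / expected;
    }
    return chi2;
}
```

With 256 bins there are 255 degrees of freedom, so results from a good source will tend to cluster around 255. A single result is not a verdict either way: a perfectly good source will occasionally land well out in either tail, and a value that is suspiciously close to zero is as much of a warning sign as one that is far too large. That is why the test needs to be run many times, with the spread of its results then checked against the distribution they are expected to follow.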

Stretch goals

25 November 2015

Adrenaline. It's such a simple molecule, but it puts caffeine to shame in the effect it can have on our minds and bodies, and sometimes you don't even need to get up out of your seat to make it.

And apparently it doesn't matter how much of your life you've spent overdosing on it by putting your body into places and situations that people without a taste for it would rather avoid, and teaching your mind to deal with that. It can still consume you with its effects almost completely whenever your mind says, quietly or otherwise, Hey, this might be dangerous and what you're about to do next could get you into some trouble that you don't want to have …

Are you sure you really want to do this? There's still time to just turn around and back away quietly. If you think that would be better … Is that what you really think?

The nagging voice of self-doubt. It can be either good for you or bad for you depending on when you listen to it. And adrenaline amplifies it from quiet nagging to insistent urging that won't be ignored, however you try.

For some strange reason, publishing new software still often does that to me. Not always, but if it's something critical, where a mistake could cause real data loss, or where it's used in Failure Is Not An Option situations or other deployments with real consequences for not achieving that goal, or even just something that's completely new, then definitely more often than not. And we have plenty of those sort of users for our telephony gear, and for some of the other software I've authored or maintain, so it's not like I don't get enough practice at doing this.

It's strange because you can put me in the open door of an aircraft at 15,000 feet, about to step out of it, alone or with a group of some of the most completely crazy (and fun!) people that you're ever likely to meet, and it's easy to be totally calm and controlled about what I expect to happen next and what I need to do to make that all happen according to plan.

But put my finger over the button that is about to upload a piece of software, that I'm responsible for making, with the potential for ill consequences to other people – and I might as well be locked in a cage with a sleeping tiger. I know what I need to do is get out of there alive, but is what I'm going to do next going to wake it up, grumpy, startled, or hungry, before I do?

The difference between those two situations probably is almost that simple. In one case I (think I) know everything that is going to happen next, and whether it does or not will be determined largely by what I do.

In the other there is a far more pure uncertainty.

No matter how careful I am, no matter how careful I've been, what's going to happen next is a function of what happens in other people's minds. Not my own. And nothing in the world is really as scary as that. No matter how many times I put out some new piece of software, and no matter how many times it's either well received or simply goes almost entirely unnoticed, and nothing terrible actually happens, it's still an act of stepping, irrevocably, into a great new unknown. A world of fresh surprises and problems to learn how to avoid.

And I still love the rush that comes with that, whatever it is that brings it on. It never gets old.

Which is all perhaps just a long winded way of saying bit-babbler 0.3 is now in the incoming queue for Debian Stretch. But I'm writing under the influence of adrenaline and I know it well. My heart is racing. My mind is flitting wildly through every possible thing we might have forgotten. And we now await your judgement. Whenever and however it comes. At the time and mood of your choosing.

We've done all the preparation for this that we can reasonably think of doing. And now we've stepped out of the door with it into the airflow. There's no going back. All that remains is to see if we really can land it safely, and not hurt anyone else.

None shall pass(through)

16 November 2015

Just a quick heads-up for the people using USB passthrough to make their devices available inside libvirt managed virtual machines. If you're doing this on a system with cgroups enabled, or have updated your host machine to one (like if systemd is now your init system), then you'll need to make sure you've also added the devices you are passing through to the cgroup_device_acl array which is defined in /etc/libvirt/qemu.conf (or wherever that is done on your system).

There's more detail about all of that in the documentation for configuring virtual machines in the software package – but if you're wondering why passthrough suddenly stopped working after you updated your host machine, this is probably the reason. The USB devices aren't in the default set that the VMs are granted access to when cgroup access control is enabled. At least not until we get the extra support discussed here included in libvirt (at which point it should manage the needed cgroup ACL itself).

Update: If you're using the bit-babbler 0.5 release or later, and bbvirt to manage the passthrough, then you don't need to do this anymore. The needed permission will be managed automatically by libvirt.

Release early, release often

5 November 2015

It's a lovely aphorism, for things that aren't mission-critical and for developers who prefer to outsource testing to their end users rather than spend their own precious time on boring things like that – but when it comes to hardware it's really more of a euphemism for wasting a lot of time and money on product recall and replacement.

So we've been rather publicity-shy, until we'd convinced ourselves that we'd checked, and checked the checks, and stuck our own fingers into all the places where something might lurk that would bite them – because the prize for being the first to have a massive embarrassing recall has long ago been won, and there really is nothing more tedious than having to rework large numbers of units because you'd made a stupid mistake that would have been easy to find and avoid if you'd actually tested it. We've been there in years gone by with other hardware, and it's not a place we ever want to go back to, even for a quick visit.

But we're well into testing the new production sized run of White devices that we did recently, and the results we're seeing for those are so far nicely consistent with what we saw from the prototype run. We've had some really good feedback from an excellent and diverse group of early adopters who already found us and wanted devices for their own, and so we're starting to feel a bit more comfortable that if there's something we've still missed, it's not going to be an instant show-stopper that will be a major pain to remedy.

The software is performing well, and we're not getting any requests that hint at it needing some sort of major redesign to really be useful for a wide range of people and applications. The sort of wish list things that are currently on the horizon should all fit into it quite well without disruption to anyone who has already deployed it if they update.

The biggest complaint we're now getting is along the lines of why isn't this actually in Debian yet? Which is a good sign that it really is time to fix that very shortly now. It's not that hard to build your own packages of it, but it's still a lot more convenient to not have to. Thanks to everyone who has been patient with us over that and has given us good feedback on the version 0.2 snapshots. When we get through the last of what's still pending for that (which isn't much now), we'll tag version 0.3 and push that one out for inclusion in the Debian Stretch release.

She cannae take much more Capt'n

15 October 2015

It looks like we're going to need external power for the USB hubs if we want to run more than about 60 devices in most of the machines that we presently have set up for testing them – which wasn't an entirely unexpected limit, all things considered. The good news is, the hubs we bought have a socket for external power. The bad news is, after looking inside them, that socket is connected directly to the V_BUS rail coming from the host motherboard, with no isolation for either it or the data line pull-up when they are running self-powered.

And by running, I mean for the brief instant between when you plug it in and when things probably go badly downhill from there. Who lets these people design and build things for others to use … Don't cross the streams isn't a hard design rule to grok.

The happy news is that we can run 60 BitBabbler White devices, all streaming random bits out at the default maximum rate, in the same machine as we have four Octal ISDN cards running high load callgen testing. That's about 5.5 million phone calls a day on 960 telephony channels, 1.2 terabytes a day of audio processed, and 30 gigabytes an hour of raw entropy, all happily purring away together on a fairly cheap consumer-grade motherboard that we bought off the shelf a few years ago from the local computer store.
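For anyone who wants to put those headline numbers into per-channel and per-device terms, here is a rough back-of-envelope sketch. The only inputs are the figures quoted above; the split of 960 channels into 8 spans of 30 bearer channels per card (E1 style) and the use of decimal gigabytes are our assumptions for illustration, not something measured here.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the post.
CALLS_PER_DAY = 5.5e6
CHANNELS = 960
ENTROPY_BYTES_PER_HOUR = 30e9   # assumption: decimal gigabytes
DEVICES = 60

# Assumption: each octal card carries 8 spans of 30 bearer channels.
CARDS, SPANS_PER_CARD, CHANNELS_PER_SPAN = 4, 8, 30


def channel_count():
    """Total bearer channels across all cards under the assumption above."""
    return CARDS * SPANS_PER_CARD * CHANNELS_PER_SPAN


def seconds_per_call_slot():
    """Average time each channel has per call, including setup and teardown."""
    return SECONDS_PER_DAY * CHANNELS / CALLS_PER_DAY


def entropy_bits_per_second_per_device():
    """Average raw output rate of one device implied by the hourly total."""
    return ENTROPY_BYTES_PER_HOUR * 8 / 3600 / DEVICES


if __name__ == "__main__":
    print("channels:", channel_count())
    print("seconds per call slot: %.1f" % seconds_per_call_slot())
    print("Mbit/s per device: %.2f" % (entropy_bits_per_second_per_device() / 1e6))
```

Under those assumptions each channel is turning over a call roughly every 15 seconds, and each device is averaging a little over 1 Mbit/s of raw output.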

We like to use low-end hardware for routine stress testing, because if it all works peachy there, then scaling things up further, to be Carrier-grade and Web-scale like all the cool kids are, is just a matter of throwing as many dollars at the problem as it takes to feel like you are. We know our system won't sweat it.

Please sir, can I have a little more

28 September 2015

So the long term testing of the first hundred units we made has still been looking really good. Beyond what I'd even dared to hope for in fact. When doing the initial tuning to see just how fast we could clock bits out of these, we found some devices could be pushed notably harder than others before the quality of their output would start to degrade. But even with the fairly conservative defaults that we settled on, I'd been expecting that a few of the devices at the opposite end of that spectrum might eventually show some sign of weakness in their output if we just let them run for long enough and accumulated enough trillions of bits for that to finally become statistically significant in one or more of the QA tests.

But that hasn't been the case. We've been monitoring them continuously, graphing statistics on the QA tests with munin, and so far we've had exactly zero devices fail the long term testing. Which would normally be a really surprising result for just about any hardware project – until I'm reminded exactly how many prototypes we did actually build before doing this run, and the hell we put them all through before settling on a design to sample in larger quantities. So yeah, maybe not quite so surprising, but still a very pleasing result for the first batch.
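To give a feel for why it takes trillions of bits before a marginal device would show up, here is a small sketch using a plain monobit (count the ones) test. This is a generic textbook calculation and not the QA suite the software actually runs; the function names and the z threshold of 4 are our own choices for illustration.

```python
import math


def monobit_z(ones, total_bits):
    """z-score of the observed count of 1 bits against a fair 50/50 source."""
    return (2 * ones - total_bits) / math.sqrt(total_bits)


def bits_to_detect(bias, z_threshold=4.0):
    """Bits needed before a source with P(1) = 0.5 + bias is expected
    to reach z_threshold on the monobit test."""
    # Expected excess of ones is bias * n and the standard deviation of the
    # count is sqrt(n) / 2, so the expected z-score is 2 * bias * sqrt(n).
    return math.ceil((z_threshold / (2.0 * bias)) ** 2)


if __name__ == "__main__":
    for bias in (1e-3, 1e-5, 1e-6):
        print("bias %g: about %.3g bits" % (bias, bits_to_detect(bias)))
```

The required sample grows with the inverse square of the bias, so a device that is off by one part in a million needs on the order of 4 trillion bits before this simple test would be expected to flag it at that threshold.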

This of course leaves us with only one sensible course of action. Build and test more of them! We've had lots of interest in the White devices, so we've ordered the parts to do another run of 500, and ordered a stack of extra USB hubs to fill with them once they come back from our fab. The next big test will be to see just how many of them we can cram into each machine in the rack before smoke starts pouring out of something or circuit breakers start tripping.

Anything that isn't tested is broken

15 September 2015

And if we had any lingering doubt whatsoever in the eternal truth of that, it would have been quickly dispelled once more Windows users came along who actually did want to use the native build there!

Having not owned or had to write software for Windows now for … well, let's not think about how many years ago that was now; and since wine support for USB devices is still basically non-existent, it wasn't really much of a shock that there were a few teething problems still to sort out with that. But it was pleasantly close, and with the help of a very patient user who relayed details of what did and didn't work on their system we got this actually tested and confirmed working as expected there.

I'd almost forgotten just how many amusing idiosyncrasies it has with respect to otherwise standard functions, and either I really have forgotten or it appears to have grown even more of them since I wrote code for it last, but so far it appears to be working well and we haven't had any new reports of trouble there yet.

Socket to me

28 July 2015

Well that didn't take long. Sorry BSD people, but the Windows users asked us to support their platform before you did. Which surprised me a little, but the squeaky wheel gets the grease, and so the first round of portability tweaks goes to them.

The response we've had from people so far has actually been rather awesome. Thanks to all of you for the kind words of appreciation about the effort we've put into this, and for the suggestions of other things it would be useful to support. It caught us a bit off guard really: we'd barely had the website up for a week, and hadn't really told anybody about it except for our bank and shipping company (who wanted to see it before they'd talk to us about using the BitBabbler name with our accounts), when the first few people already started emailing us asking if we still had any we could sell. So getting the website completed, and posting updates here, has sort of played second fiddle to improving the software further in response to plenty of new user feedback.

The first major change was adding the ability to obtain entropy directly from the device output pool via a UDP socket too. Having only options to send it to stdout or to the kernel was fine for our own needs, but neither of those was going to be much use for anyone wanting to use this on Windows. This is also useful for more than just those people though, since it means you now don't have to choose between using a device to feed entropy to the kernel or reading raw bits from it directly; you can just timeshare it to do both simultaneously if you ever need that.

It also means you don't actually have to run it on Windows to use the entropy from it in Windows applications. And so the first round of porting this to Windows stopped without it ever actually being tested there, with the BitBabbler instead running on a small ARM board, with a minimal install of Debian on it, feeding entropy to Windows applications over a private network segment. But the architectural changes that were needed for that got done, and it was successfully building with the mingw-w64 toolchain from Debian Stretch.
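As a rough illustration of what a client on the other end of that UDP socket might look like, here is a minimal sketch. The wire format is an assumption on our part for this example (a request datagram holding the wanted byte count as a 16-bit integer in network byte order, answered by one datagram of that many bytes), so check the documentation shipped with the software before relying on it. The stand-in server exists only so the sketch can be run without a device; it hands out os.urandom bytes, not BitBabbler output.

```python
import os
import socket
import struct
import threading

MAX_REQUEST = 32768  # assumed upper bound on bytes per request


def request_entropy(addr, nbytes, timeout=2.0):
    """Ask a UDP entropy service at addr for nbytes and return them."""
    if not 1 <= nbytes <= MAX_REQUEST:
        raise ValueError("nbytes must be between 1 and %d" % MAX_REQUEST)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # Assumed request format: one unsigned 16-bit integer, network order.
        sock.sendto(struct.pack("!H", nbytes), addr)
        data, _ = sock.recvfrom(MAX_REQUEST)
    if len(data) != nbytes:
        raise IOError("expected %d bytes, got %d" % (nbytes, len(data)))
    return data


def start_stand_in_server():
    """Hypothetical local stand-in for testing: replies with os.urandom bytes."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", 0))  # let the OS pick a free port

    def serve():
        while True:
            try:
                req, peer = srv.recvfrom(16)
            except OSError:
                return  # socket was closed
            if len(req) == 2:
                (count,) = struct.unpack("!H", req)
                srv.sendto(os.urandom(count), peer)

    threading.Thread(target=serve, daemon=True).start()
    return srv


if __name__ == "__main__":
    server = start_stand_in_server()
    block = request_entropy(server.getsockname(), 32)
    print(block.hex())
```

Pointing request_entropy at the address of the machine that has the device attached, instead of the stand-in, is all a remote application should need if the assumed format holds. UDP gives no delivery guarantee, so a real client would want to retry on timeout.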

We've now added a udev rule and a system group to the Debian package, so that normal users without elevated privilege (other than being placed in the bit-babbler group) can access the device directly. We got a lot of requests from people who wanted good random numbers for purposes other than feeding entropy to the kernel, so this will make things a bit more flexible and user-friendly for them as well.
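If direct access still fails after adding a user to the group, a quick check like the following can confirm whether the current process really has that group. It is a small sketch using only the Python standard library on a Unix-like system; the helper name is ours, and the only thing taken from the package is the bit-babbler group name.

```python
import grp
import os


def process_in_group(group_name):
    """Return True if the current process has group_name as its effective
    or supplementary group, False if not or if the group does not exist."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # no such group on this system
    return gid == os.getegid() or gid in os.getgroups()


if __name__ == "__main__":
    name = "bit-babbler"
    if process_in_group(name):
        print("this session is in the %s group" % name)
    else:
        print("this session is not in the %s group" % name)
```

New group membership only applies to sessions started after the change, so a negative result straight after being added usually just means logging out and back in.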