Random thoughts

More portability

22 November 2016

The bit-babbler 0.6 software release is now available. This one mainly contains portability fixes for more platforms and for systems still using older versions of udev. For existing users where the previous releases have been working fine, the only possibly interesting changes in this one are a fix for the normalisation of QA statistics when processing large amounts of entropy on 32-bit systems, and a fix to the framing sanity check that is needed if the device is plugged into a USB 1.x port, for anyone who happens to still actually have one of those.

If you're using this on a 32-bit system and are likely to pull more than 2GB out of the device between restarts of the software, then we do recommend you update to this release, but otherwise you're unlikely to notice any real difference.
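For anyone wondering why 2GB is the magic number there, here is a minimal sketch of the general class of problem. It is not the actual bit-babbler code or the actual bug; the `count_bytes` helper and the chunk size are made up for illustration, and `ctypes.c_int32` is only used to mimic how a signed 32-bit counter typically wraps.

```python
import ctypes

CHUNK = 64 * 1024 * 1024  # hypothetical 64 MiB read size, for illustration only


def count_bytes(chunks, wrap32):
    """Total up the bytes read, optionally through a signed 32-bit counter."""
    total = 0
    for _ in range(chunks):
        total += CHUNK
        if wrap32:
            # Mimic a signed 32-bit variable: values past 2**31 - 1 wrap negative.
            total = ctypes.c_int32(total).value
    return total


just_under_2gib = count_bytes(31, wrap32=True)   # still correct
at_2gib = count_bytes(32, wrap32=True)           # wraps to a negative number
three_gib_wide = count_bytes(48, wrap32=False)   # what a wider counter reports
three_gib_32 = count_bytes(48, wrap32=True)      # what the 32-bit one reports
```

Any statistic that gets normalised by a counter like that will go badly wrong once the total passes 2 GiB, which is why a wider type matters on 32-bit systems.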

Many thanks to all the people who tested this on different systems and gave us good feedback on their needs and experiences. It's nice to have so many people share our interest in checking this all over as diligently as possible.

Virtual realities

18 January 2016

We've grown a lot of love for using virtual machines over the last few years. The list of things that they make easier and better is far too long to include here. But dealing directly with hardware is not yet one of them. And gathering good entropy in them has notable issues too.

Part of the trouble with obtaining real entropy in a virtualised environment is that a VM is usually deliberately isolated from the hardware on the host. Which means that most of the physical sources of unpredictable events which the kernel will normally try to collect entropy from are all now mediated by separately scheduled software – posing a big question about just how unpredictable they really still might be. And that's before we wonder what sort of correlations might occur with other VM guests that are running on the same host. The generally accepted solution is that one way or another we need a defined mechanism to import unique entropy from outside the software running the VM, and there are a few ways we can do that.

Things have surely improved since we last looked at it (the Linux kernel commit logs certainly say they have), but our initial attempts to use virtio-rng as a way to import entropy from the host machine into guests left a fair bit to be desired. Like, not crashing the system would have been nice. As would not greedily draining the host completely dry even when the guest was ostensibly idle. But lots of things related to this are a work in progress, and I don't have much of a right to grumble there, since we didn't dig deeply into debugging it further, or report it beyond commenting on it in IRC and not seeing much interest in more details. Which isn't ideal, but fixes to the kernel or QEMU would take time to become widely available, and we needed an answer which could work with what we had there and then. It's an unfortunate truth that we won't live long enough to chase every bug we see in someone else's code, and we can't just tell all of our users you need to be on a bleeding edge kernel, so we need to pick our pursuits wisely, and be diligent at doing our bit to keep our own house clean of them on all of the systems that our users really do need to support.

But the free software model works if everyone has the right measure of patience with others and scratching their own itches, so we needed a different plan to tide us over until that was ready for more general use. And we had a related itch we'd already started scratching at.

Since we'd already started experiments on what would become the BitBabbler hardware, the best answer there for us also already seemed clear. We just needed to be able to use them directly inside guest machines too. Which of course then made the VMs dealing directly with hardware problem become very much our problem too (though we already had an existing interest in that for our telephony hardware and other things as well).

I'm not going to go into too much detail on that here either, mainly because it's a Long Story, and if you really want to hear it (or even if you don't!) you'll find it in the documentation of how to set this up in the software package. And because this time I do plan to find the time to open another discussion about how we can improve this with the libvirt developers, so I don't want to get sick of repeating it before that has fully run its course.

The short version is, we now have a pretty close approximation to full USB hotplug functionality in libvirt managed KVM/QEMU virtual machines. It's not exactly what I'd call pretty on the inside, but it is easy to use and administer, and more robust and reliable than the previous set of hacks which we were using for this, and it makes BitBabbler devices assigned to guest machines behave just like you'd expect them to when using them from the host. No matter when you plug them in or remove them, or when you start or stop the guest.

The next step now is to try and get the missing functionality that we need supported more directly by libvirt, and we at least have a clear demonstration of why it's needed there and what sort of awful things people need to do if it isn't, which hopefully will help with that. Or at the very least we have something people can point at when explaining how silly we've been to miss the obvious easy answer that we should have been using instead. Then we can fix that and everyone wins from refining an example of current best practice.

But in the meantime, if this is something you need too, install the bit-babbler 0.5 release, and have a look at the bbvirt(1) man page and/or the virtual_machines document (in the doc directory of the source, or /usr/share/doc/bit-babbler directory that the Debian binary packages install). You'll find the longer story there, and a quick-start guide to getting it up and running as painlessly as possible.

It's not that hard being green

6 January 2016

Evolution. It's life's unrolling game where either you grow into your environment, or it grows all over you. Where even the rocks end up different to how they started – regardless of whether they'd ever gathered any moss or not.

Our starting point was wanting a cryptographically secure, high-quality entropy source that wouldn't starve and stall the system under heavy demand. So that naturally shaped our initial assumptions in both the supporting software and the hardware. The early focus in the software was largely on keeping throughput up, latency down, and having a regular supply of fresh entropy still being drawn from the hardware, analysed for anomalies, and mixed into the pools, even when it wasn't all being consumed from them (since otherwise, it would just be going to waste).

But there's also another species of important uses here too, where the primary interest is avoiding a different kind of waste. Wasted power. And I don't mean in the What have the Romans ever done for us? sense.

It's not like we actually draw a lot of it, even under peak usage, but in commercial data centers small numbers can have large multipliers, and there's also a growing interest in very low power home servers and in optimising them to be as efficient as they possibly can be. Where even if the current drawn by the device itself is low, waking the CPU to read from it when the system would otherwise be completely idle is still a cost that some people would, quite reasonably, like to avoid.

The good news is, doing a good job of catering for that type of use too really isn't a very big stretch from where we already were. The BitBabbler hardware itself has support for being idled into a very low power consumption mode (on the scale of microamps). Kernel support for suspending USB devices and controllers at runtime is a thing. And the frequency at which we opportunistically refresh the entropy pools when they aren't being drained was already being set by internally configurable options. So mostly we just needed to expose some more knobs to let people select the desired behaviour that most suits their own use case.

And this is exactly what the first set of changes in the bit-babbler 0.5 software release add. If you install it using the Debian packages, there are new udev rules which will enable the kernel autosuspend mode for BitBabbler devices, and if you pass the --low-power option to the daemon it will be much more conservative about reading from the devices when there isn't demand for entropy, and release them when they are idle so that the OS can suspend them (along with any controllers or hubs they are connected to).

If you want more direct control, the options which that is an alias for are all individually configurable too. There are still a few caveats to using it – some USB controllers don't handle being suspended as well as they probably should, and if you're doing this with the devices connected to an XHCI (USB3) port, then using a recent kernel is advisable. But it's working well enough to push out for broader testing.

Reduce · Reuse · Reconsider

22 December 2015

The bit-babbler 0.4 software release is now tagged and uploaded. If you're using the packages which ship with Debian, it should be available from the mirrors for Sid by the time you read this, and should migrate to Stretch in about 5 days' time if nobody finds something silly we missed.

This one was originally planned to be just a few minor tweaks to get it building for the BSDs (and built for the Debian kFreeBSD port), but the best laid plans and all that … Getting it to build was easy, getting it to work proved to be a somewhat more involved task. The kFreeBSD port had packages for libftdi built with libusb-0.1, but if you'd assumed, like we did, that this implied they actually worked there, then you'd have been about as surprised as we were when they simply didn't at all. Everything builds and runs, it just can't see any USB devices – which isn't much use to any of us.

It turns out that FreeBSD is one of the few targets which the libusb source that most platforms use isn't directly ported to. Mostly because the FreeBSD developers have their own USB library (which is not-confusingly-at-all also named libusb), which fortunately does also provide an API that is compatible with the libusb in use elsewhere.

So after some gnashing of teeth, a quick trip through all the stages of grief, and a hasty rescheduling of all the other things I had planned for that week, we decided to bite the bullet and switch to using the libusb-1.0 API directly, and the platform native implementation of it.

With the benefit of hindsight now, there seems to be no doubt that this was time well spent. By taking direct control over the device ourselves rather than going through the libftdi abstraction, we've been able to simplify things considerably, improve the error reporting and handling if things go wrong, be more efficient with getting data out of the device so CPU usage is reduced and maximum throughput is increased by notable margins, and we've further minimised the barriers to porting this to new platforms now too.

By using libusb-1.0 we can support some features more widely that were previously only available when built with libudev, like being able to identify devices by their physical address on the USB bus, and having hotplug support – and we get better support and lots of bugfixes for platforms other than Linux. So this unanticipated cake turned out to have plenty of delicious icing on it.

Of course a major refactoring like this isn't entirely without risk and this new code hasn't yet had as much time in long term testing as the previous releases did, but it's been running on all the servers here for a few weeks now without obvious trouble, and it makes building this for Windows users much easier, makes using it on BSD possible, and fixes a few minor issues on some of the more obscure architectures that the Debian buildds shook out, so we think it's ready to get some broader testing by more people and on more of the platforms that they want to use.

Hip to Chi-square

18 December 2015

The idea of there being a test, which when run just once doesn't actually give you a right or wrong, pass or fail result; where any single result that it outputs could be an indication of a good or bad outcome; and where the only way to know which is which is to run the test many times, and then run it again on its own results … isn't something that's necessarily intuitive to people who haven't seen that sort of thing before and had time to think a bit about why that's how it works.

So as we've had more people taking an interest in digging deeper into the details of this, it's become apparent that this was something we'd probably touched on a bit too briefly for anyone who isn't already familiar with the nature of statistical testing methods. And since it seems like a fundamentally important detail which warrants more than just a footnote on the FAQ page, we've instead added a better introduction to this to the description of the tests from the ENT suite, though it's not specific to only the Chi-square test included there.

If you already know how goodness of fit and significance testing work, then there's probably not a lot we've said there which will be news for you, as we have tried to keep it as simple and accessible as possible – though if you'd like to proof read it and point out anything that we can say more clearly / better / less wrongly, that would be a welcome contribution too! It's tricky to explain this briefly in a way that's both still readable to people who it's new for, and formally correct for people skilled in the arts, so there might be some loose language there we could still tighten up or improve on. But as a starting point it should at least give people some extra clues to run with and search for if they do want to learn more about that.
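To make the "run it many times, then test the results" idea a bit more concrete, here is a rough sketch in Python. It is not the ENT code or our QA code. The function names are made up for illustration, `random.Random` is only a stand-in for a source of bytes, and the threshold comes from the Wilson-Hilferty approximation rather than exact Chi-square tables.

```python
import random
from collections import Counter


def chi_square_bytes(data):
    """Pearson Chi-square statistic for byte values against a uniform fit."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def critical_value(df, z):
    """Approximate Chi-square quantile (Wilson-Hilferty).

    z is the standard normal quantile for the tail wanted, e.g. 1.645 for
    the upper 5%.
    """
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3


def fraction_above(samples, sample_bytes, z=1.645, seed=12345):
    """Fraction of independent samples whose statistic lands in the upper tail."""
    rng = random.Random(seed)  # assumption: a PRNG standing in for a real device
    limit = critical_value(255, z)  # 256 byte values give 255 degrees of freedom
    above = 0
    for _ in range(samples):
        data = rng.getrandbits(8 * sample_bytes).to_bytes(sample_bytes, "little")
        if chi_square_bytes(data) > limit:
            above += 1
    return above / samples
```

Any single sample landing above the limit tells you very little, since a good source is expected to do that about 5% of the time. It's the fraction over many runs, from something like `fraction_above(400, 4096)`, drifting well away from 0.05 in either direction that would be the thing to worry about.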

Stretch goals

25 November 2015

Adrenaline. It's such a simple molecule, but it puts caffeine to shame in the effect it can have on our minds and bodies, and sometimes you don't even need to get up out of your seat to make it.

And apparently it doesn't matter how much of your life you've spent overdosing on it by putting your body into places and situations that people without a taste for it would rather avoid, and teaching your mind to deal with that. It can still consume you with its effects almost completely whenever your mind says, quietly or otherwise, Hey, this might be dangerous and what you're about to do next could get you into some trouble that you don't want to have …

Are you sure you really want to do this? There's still time to just turn around and back away quietly. If you think that would be better … Is that what you really think?

The nagging voice of self-doubt. It can be either good for you or bad for you depending on when you listen to it. And adrenaline amplifies it from quiet nagging to insistent urging that won't be ignored, however you try.

For some strange reason, publishing new software still often does that to me. Not always, but if it's something critical, where a mistake could cause real data loss, or where it's used in Failure Is Not An Option situations or other deployments with real consequences for not achieving that goal, or even just something that's completely new, then definitely more often than not. And we have plenty of those sort of users for our telephony gear, and for some of the other software I've authored or maintain, so it's not like I don't get enough practice at doing this.

It's strange because you can put me in the open door of an aircraft at 15,000 feet, about to step out of it, alone or with a group of some of the most completely crazy (and fun!) people that you're ever likely to meet, and it's easy to be totally calm and controlled about what I expect to happen next and what I need to do to make that all happen according to plan.

But put my finger over the button that is about to upload a piece of software, that I'm responsible for making, with the potential for ill consequences to other people – and I might as well be locked in a cage with a sleeping tiger. I know what I need to do is get out of there alive, but is what I'm going to do next going to wake it up, grumpy, startled, or hungry, before I do?

The difference between those two situations probably is almost that simple. In one case I (think I) know everything that is going to happen next, and whether it does or not will be determined largely by what I do.

In the other there is a far more pure uncertainty.

No matter how careful I am, no matter how careful I've been, what's going to happen next is a function of what happens in other people's minds. Not my own. And nothing in the world is really as scary as that. No matter how many times I put out some new piece of software, and no matter how many times it's either well received or simply goes almost entirely unnoticed, and nothing terrible actually happens, it's still an act of stepping, irrevocably, into a great new unknown. A world of fresh surprises and problems to learn how to avoid.

And I still love the rush that comes with that, whatever it is that brings it on. It never gets old.

Which is all perhaps just a long winded way of saying bit-babbler 0.3 is now in the incoming queue for Debian Stretch. But I'm writing under the influence of adrenaline and I know it well. My heart is racing. My mind is flitting wildly through every possible thing we might have forgotten. And we now await your judgement. Whenever and however it comes. At the time and mood of your choosing.

We've done all the preparation for this that we can reasonably think of doing. And now we've stepped out of the door with it into the airflow. There's no going back. All that remains is to see if we really can land it safely, and not hurt anyone else.

None shall pass(through)

16 November 2015

Just a quick heads-up for the people using USB passthrough to make their devices available inside libvirt managed virtual machines. If you're doing this on a system with cgroups enabled, or have updated your host machine to one (like if systemd is now your init system), then you'll need to make sure you've also added the devices you are passing through to the cgroup_device_acl array which is defined in /etc/libvirt/qemu.conf (or wherever that is done on your system).

There's more detail about all of that in the documentation for configuring virtual machines in the software package – but if you're wondering why passthrough suddenly stopped working after you updated your host machine, this is probably the reason. The USB devices aren't in the default set that the VMs are granted access to when cgroup access control is enabled. At least not until we get the extra support discussed here included in libvirt (at which point it should manage the needed cgroup ACL itself).

Update: If you're using the bit-babbler 0.5 release or later, and bbvirt to manage the passthrough, then you don't need to do this anymore. The needed permission will be managed automatically by libvirt.

Release early, release often

5 November 2015

It's a lovely aphorism, for things that aren't mission-critical and for developers who prefer to outsource testing to their end users rather than spend their own precious time on boring things like that – but when it comes to hardware it's really more of a euphemism for wasting a lot of time and money on product recall and replacement.

So we've been rather publicity-shy, until we'd convinced ourselves that we'd checked, and checked the checks, and stuck our own fingers into all the places where something might lurk that would bite them – because the prize for being the first to have a massive embarrassing recall has long ago been won, and there really is nothing more tedious than having to rework large numbers of units because you'd made a stupid mistake that would have been easy to find and avoid if you'd actually tested it. We've been there in years gone by with other hardware, and it's not a place we ever want to go back to, even for a quick visit.

But we're well into testing the new production sized run of White devices that we did recently, and the results we're seeing for those are so far nicely consistent with what we saw from the prototype run. We've had some really good feedback from an excellent and diverse group of early adopters who already found us and wanted devices for their own, and so we're starting to feel a bit more comfortable that if there's something we've still missed, it's not going to be an instant show-stopper that will be a major pain to remedy.

The software is performing well, and we're not getting any requests that hint at it needing some sort of major redesign to really be useful for a wide range of people and applications. The sort of wish list things that are currently on the horizon should all fit into it quite well without disruption to anyone who has already deployed it if they update.

The biggest complaint we're now getting is along the lines of why isn't this actually in Debian yet? Which is a good sign that it really is time to fix that very shortly now. It's not that hard to build your own packages of it, but it's still a lot more convenient to not have to. Thanks to everyone who has been patient with us over that and has given us good feedback on the version 0.2 snapshots. When we get through the last of what's still pending for that (which isn't much now), we'll tag version 0.3 and push that one out for inclusion in the Debian Stretch release.

She cannae take much more Capt'n

15 October 2015

It looks like we're going to need external power for the USB hubs if we want to run more than about 60 devices in most of the machines that we presently have set up for testing them – which wasn't an entirely unexpected limit with all things considered. The good news is, the hubs we bought have a socket for external power. The bad news is, after looking inside them, that socket is connected directly to the VBUS rail coming from the host motherboard, with no isolation for either it or the data line pull-up when they are running self-powered.

And where by running, I mean for the brief instant between when you plug it in and when things probably go badly downhill from there. Who lets these people design and build things for others to use … Don't cross the streams isn't a hard design rule to grok.

The happy news is we can run 60 BitBabbler White devices, all streaming random bits out at the default maximum rate, in the same machine as we have four Octal ISDN cards running high load callgen testing. That's about 5.5 million phone calls a day on 960 telephony channels, 1.2 terabytes a day of audio processed, and 30 gigabytes an hour of raw entropy, all happily purring away together on a fairly cheap consumer-grade motherboard that we bought off the shelf a few years ago from the local computer store.
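If you'd like to sanity check those numbers, here's a back-of-envelope sketch. The per-port channel count, the audio rate, counting both directions of each call, and reading "gigabytes" as decimal are all assumptions made for this illustration, not figures taken from the test setup itself.

```python
CARDS = 4
PORTS_PER_CARD = 8           # "Octal"
CHANNELS_PER_PORT = 30       # assumption: E1 primary rate, 30 bearer channels
AUDIO_BYTES_PER_SECOND = 8000  # assumption: 64 kbit/s audio in each direction
SECONDS_PER_DAY = 86400


def load_figures(calls_per_day=5.5e6, entropy_bytes_per_hour=30e9, devices=60):
    """Derive rough per-channel and per-device figures from the headline numbers."""
    channels = CARDS * PORTS_PER_CARD * CHANNELS_PER_PORT
    # Assumption: audio is processed in both directions on every channel, all day.
    audio_bytes_per_day = channels * AUDIO_BYTES_PER_SECOND * SECONDS_PER_DAY * 2
    return {
        "channels": channels,
        "audio_tib_per_day": audio_bytes_per_day / 2**40,
        "seconds_per_call_slot": channels * SECONDS_PER_DAY / calls_per_day,
        "mbit_per_second_per_device":
            entropy_bytes_per_hour * 8 / 3600 / devices / 1e6,
    }


figures = load_figures()
```

Under those assumptions the channel count comes out at exactly 960, the audio at roughly 1.2 TiB a day, each channel turns over a call about every 15 seconds, and each device is delivering a bit over 1 Mbit/s.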

We like to use low-end hardware for routine stress testing, because if it all works peachy there, then scaling things up further, to be Carrier-grade and Web-scale like all the cool kids are, is just a matter of throwing as many dollars at the problem as it takes to feel like you are. We know our system won't sweat it.

Please sir, can I have a little more

28 September 2015

So the long term testing of the first hundred units we made has still been looking really good. Beyond what I'd even dared to hope for in fact. When doing the initial tuning to see just how fast we could clock bits out of these, we found some devices could be pushed notably harder than others before the quality of their output would start to degrade. But even with the fairly conservative defaults that we settled on, I'd been expecting that a few of the devices at the opposite end of that spectrum might eventually show some sign of weakness in their output if we just let them run for long enough and accumulated enough trillions of bits for that to finally become statistically significant in one or more of the QA tests.

But that hasn't been the case. We've been monitoring them continuously, graphing statistics on the QA tests with munin, and so far we've had exactly zero devices fail the long term testing. Which would normally be a really surprising result for just about any hardware project – until I'm reminded exactly how many prototypes we did actually build before doing this run, and the hell we put them all through before settling on a design to sample in larger quantities. So yeah, maybe not quite so surprising, but still a very pleasing result for the first batch.
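As a rough illustration of why it can take trillions of bits before a weak device would show itself, consider the simplest possible test: counting ones. This sketch is not one of the QA tests we actually run, and the function names are made up for the example. If each bit is a one with probability 0.5 + bias, then after n bits the count of ones sits about 2 × bias × √n standard deviations away from n/2.

```python
import math


def z_score(bias, bits):
    """Expected deviation of the ones count, in standard deviations."""
    # The mean shifts by bits * bias; the standard deviation is about sqrt(bits) / 2.
    return 2.0 * bias * math.sqrt(bits)


def bits_needed(bias, z):
    """Bits to accumulate before a given bias reaches z standard deviations."""
    return (z / (2.0 * bias)) ** 2


# A hypothetical one-in-a-million bias, and a 4 sigma threshold.
needed = bits_needed(1e-6, 4.0)
```

For that hypothetical one-in-a-million bias, `needed` works out to about 4 × 10^12 bits, and halving the bias quadruples it.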

This of course leaves us with only one sensible course of action. Build and test more of them! We've had lots of interest in the White devices, so we've ordered the parts to do another run of 500, and ordered a stack of extra USB hubs to fill with them once they come back from our fab. The next big test will be to see just how many of them we can cram into each machine in the rack before smoke starts pouring out of something or circuit breakers start tripping.

Anything that isn't tested is broken

15 September 2015

And if we had any lingering doubt whatsoever in the eternal truth of that, it would have been quickly dispelled once more Windows users came along who actually did want to use the native build there!

Having not owned or had to write software for Windows now for … well, let's not think about how many years ago that was now; and since wine support for USB devices is still basically non-existent, it wasn't really much of a shock that there were a few teething problems still to sort out with that. But it was pleasantly close, and with the help of a very patient user who relayed details of what did and didn't work on their system we got this actually tested and confirmed working as expected there.

I'd almost forgotten just how many amusing idiosyncrasies it has with respect to otherwise standard functions, and either I really have forgotten or it appears to have grown even more of them since I wrote code for it last, but so far it appears to be working well and we haven't had any new reports of trouble there yet.

Socket to me

28 July 2015

Well that didn't take long. Sorry BSD people, but the Windows users asked us to support their platform before you did. Which surprised me a little, but the squeaky wheel gets the grease, and so the first round of portability tweaks goes to them.

The response we've had from people so far has actually been rather awesome. Thanks to all of you for the kind words of appreciation about the effort we've put into this and the suggestions for things it would also be useful to support. It caught us a bit off guard really. We'd barely had the website up for a week, and hadn't really told anybody about it except for our bank and shipping company (who wanted to see it before they'd talk to us about using the BitBabbler name with our accounts), when the first few people already started emailing us asking if we still had any we could sell. So getting the website completed, and posting updates here, has sort of played second fiddle to improving the software further in response to plenty of new user feedback.

The first major change was adding the ability to obtain entropy directly from the device output pool via a UDP socket too. Having only options to send it to stdout or to the kernel was fine for our own needs, but neither of those was going to be much use for anyone wanting to use this on Windows. This is also useful for more than just those people though, since it means you now don't have to choose between using a device to feed entropy to the kernel or reading raw bits from it directly; you can just timeshare it to do both simultaneously if you ever need that.

It also means you don't actually have to run it on Windows to use the entropy from it in Windows applications. And so the first round of porting this to Windows stopped without it ever actually being tested there, with the BitBabbler instead running on a small ARM board, with a minimal install of Debian on it, feeding entropy to Windows applications over a private network segment. But the architectural changes that were needed for that got done, and it was successfully building with the mingw-w64 toolchain from Debian Stretch.
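To give a feel for what requesting entropy over a UDP socket looks like from the client side, here is a self-contained sketch. The wire format in it (a 2-byte network-order count in the request, raw bytes in the reply) is an assumption made up for this example and not the documented bit-babbler protocol, so check the package documentation before pointing anything at a real daemon. The toy server just hands out `os.urandom` bytes in place of the device output pool.

```python
import os
import socket
import struct
import threading

MAX_REQUEST = 4096  # hypothetical cap on the bytes served per request


def serve_entropy(sock, requests_to_serve):
    """Toy stand-in for the daemon: answer a fixed number of requests."""
    for _ in range(requests_to_serve):
        packet, client = sock.recvfrom(16)
        if len(packet) != 2:
            sock.sendto(b"", client)  # malformed request, reply with nothing
            continue
        (count,) = struct.unpack("!H", packet)
        sock.sendto(os.urandom(min(count, MAX_REQUEST)), client)


def request_entropy(server_addr, count, timeout=2.0):
    """Ask the server for count bytes and return whatever comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(struct.pack("!H", count), server_addr)
        data, _ = sock.recvfrom(MAX_REQUEST)
        return data


def demo():
    """Run the toy server on loopback and make two requests against it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    addr = server.getsockname()
    worker = threading.Thread(target=serve_entropy, args=(server, 2), daemon=True)
    worker.start()
    first = request_entropy(addr, 32)
    second = request_entropy(addr, 64)
    worker.join(timeout=2.0)
    server.close()
    return first, second
```

The client end is just a datagram out and a datagram back, so it works the same whether the server is on the local machine or on a small board at the other end of a private network segment.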

We've now added a udev rule and a system group to the Debian package, so that normal users without elevated privilege (other than being placed in the bit-babbler group) can access the device directly. We got a lot of requests from people who wanted good random numbers for purposes other than feeding entropy to the kernel, so this will make things a bit more flexible and user-friendly for them as well.