Sep 3, 2019 - golang: finding a race condition in Docker

In OpenShift’s client utilities, we use some vendored Docker code to extract data from a container image. Several images can be extracted concurrently, and we were running into an issue where, only on RHEL 8 clients, a user would occasionally see a panic:

panic: runtime error: slice bounds out of range

goroutine 163 [running]:
bufio.(*Reader).fill(0xc000f35c80)
        /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/bufio/bufio.go:89 +0x211

We didn’t know why we only saw it on RHEL 8 clients, or why it only happened sometimes. I wanted a better traceback than the original bug report gave us, and maybe a coredump so I could poke around in gdb. To be honest, I didn’t really know what I’d be looking for in gdb. I’d only ever used it with C, and even in C, I’m generally a printf debuggerer.

However, I hadn’t been able to reproduce the problem myself, so I wanted to get as much information as I could.

We added export GOTRACEBACK=crash to our development scripts and waited for someone to see it again. It wasn’t long before another report came in, and this time I had a much longer stack trace showing all of the running goroutines, as well as a coredump.

It looked like code in Go itself was reading past the end of its own buffer: what? Was there a bug in Go? I kept researching and was still a bit lost, until I stumbled upon an entry in the longer stack trace that pointed me to Docker’s code using a pool of buffers.

Docker maintains a pool of *bufio.Reader objects to reduce memory usage. If these readers were being recycled, and a previous holder wrote to one after returning it to the pool while a new holder was already using it – this all sounded somewhat familiar, and reminded me of my operating systems class. Was this a race condition?
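
Roughly, the shape of the thing looks like this – a minimal sketch of a bufio.Reader pool built on sync.Pool, not the vendored Docker code itself:

package poolsketch

import (
    "bufio"
    "io"
    "sync"
)

const bufSize = 32 * 1024

// readerPool recycles *bufio.Reader values so each extraction doesn't
// allocate a fresh buffer.
var readerPool = sync.Pool{
    New: func() interface{} { return bufio.NewReaderSize(nil, bufSize) },
}

// get wraps src with a pooled reader.
func get(src io.Reader) *bufio.Reader {
    br := readerPool.Get().(*bufio.Reader)
    br.Reset(src)
    return br
}

// put returns a reader to the pool. If some other goroutine is still
// reading through br when put is called, the next caller of get receives
// a reader whose internal buffer is being mutated underneath it -- exactly
// the kind of corruption that shows up as a bufio panic.
func put(br *bufio.Reader) {
    br.Reset(nil)
    readerPool.Put(br)
}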

Identifying what kind of problem I was dealing with made things a lot easier. In retrospect, maybe I should’ve realized it was a race condition sooner, but now that I knew what it was, I wanted to know how people typically uncover a race condition in Go.

Go includes a data race detector: you simply build or run your code with the -race flag. After doing that and running locally, my program exited successfully, with no warnings about any kind of race. In theory, the detector should flag the potential race even when it isn’t causing a panic.
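
For reference, enabling the detector is just a matter of rebuilding with the flag (the package path below is only an example):

go build -race -o oc ./cmd/oc    # instrumented binary for manual testing
go test -race ./...              # or let the detector watch your tests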

I even tried it on a RHEL 8 virtual machine, just like the ones the bug reporters were using. Nothing.

As a last resort, I asked a coworker if I could experiment in an environment where he seemed to encounter the problem about once a day. I wrote a script that would run the command over and over again, hoping it would crash, using the binary that had been built with the -race flag.
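
The script was nothing fancy – something along these lines, with the oc arguments being illustrative rather than the exact command:

#!/bin/bash
# Re-run the race-instrumented binary until it exits non-zero, which is
# what happens when it panics or the race detector fires.
while ./oc image extract "$IMAGE" --path /:/tmp/extracted; do
    echo "no crash yet, trying again"
done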

Sure enough, on his system, Go enthusiastically reported “WARNING: DATA RACE”, with a trace telling me exactly where.

WARNING: DATA RACE
Write at 0x00c00115b320 by goroutine 94:
  bufio.(*Reader).reset()
      /usr/local/go/src/bufio/bufio.go:75 +0xe0
  bufio.(*Reader).Reset()
      /usr/local/go/src/bufio/bufio.go:71 +0xd1
  github.com/openshift/oc/vendor/github.com/docker/docker/pkg/pools.(*BufioReaderPool).Put()
      /go/src/github.com/openshift/oc/vendor/github.com/docker/docker/pkg/pools/pools.go:54 +0x5b
  github.com/openshift/oc/vendor/github.com/docker/docker/pkg/pools.(*BufioReaderPool).NewReadCloserWrapper.func1()
      /go/src/github.com/openshift/oc/vendor/github.com/docker/docker/pkg/pools/pools.go:93 +0x140
  github.com/openshift/oc/vendor/github.com/docker/docker/pkg/ioutils.(*ReadCloserWrapper).Close()
      /go/src/github.com/openshift/oc/vendor/github.com/docker/docker/pkg/ioutils/readers.go:20 +0x5e
  github.com/openshift/oc/vendor/github.com/docker/docker/pkg/archive.wrapReadCloser.func1()
      /go/src/github.com/openshift/oc/vendor/github.com/docker/docker/pkg/archive/archive.go:180 +0x80
  github.com/openshift/oc/vendor/github.com/docker/docker/pkg/ioutils.(*ReadCloserWrapper).Close()
      /go/src/github.com/openshift/oc/vendor/github.com/docker/docker/pkg/ioutils/readers.go:20 +0x5e
  github.com/openshift/oc/pkg/cli/image/extract.layerByEntry()
      /go/src/github.com/openshift/oc/pkg/cli/image/extract/extract.go:524 +0x975
  github.com/openshift/oc/pkg/cli/image/extract.(*Options).Run.func1.1.2()
      /go/src/github.com/openshift/oc/pkg/cli/image/extract/extract.go:455 +0xa0f
  github.com/openshift/oc/pkg/cli/image/extract.(*Options).Run.func1.1()
      /go/src/github.com/openshift/oc/pkg/cli/image/extract/extract.go:467 +0x31f8
  github.com/openshift/oc/pkg/cli/image/workqueue.(*worker).Try.func1()
      /go/src/github.com/openshift/oc/pkg/cli/image/workqueue/workqueue.go:137 +0x6d
  github.com/openshift/oc/pkg/cli/image/workqueue.(*workQueue).run.func1()
      /go/src/github.com/openshift/oc/pkg/cli/image/workqueue/workqueue.go:51 +0x35d

Previous read at 0x00c00115b320 by goroutine 8:
  bufio.(*Reader).writeBuf()
      /usr/local/go/src/bufio/bufio.go:525 +0xc7
  bufio.(*Reader).WriteTo()
      /usr/local/go/src/bufio/bufio.go:506 +0x5e1
  io.copyBuffer()
      /usr/local/go/src/io/io.go:384 +0x13c
  io.Copy()
      /usr/local/go/src/io/io.go:364 +0x10a
  os/exec.(*Cmd).stdin.func1()
      /usr/local/go/src/os/exec/exec.go:243 +0xfa
  os/exec.(*Cmd).Start.func1()
      /usr/local/go/src/os/exec/exec.go:409 +0x3d

Goroutine 94 (running) created at:
  github.com/openshift/oc/pkg/cli/image/workqueue.(*workQueue).run()
      /go/src/github.com/openshift/oc/pkg/cli/image/workqueue/workqueue.go:43 +0xd8

Goroutine 8 (running) created at:
  os/exec.(*Cmd).Start()
      /usr/local/go/src/os/exec/exec.go:408 +0x16c2
  github.com/openshift/oc/vendor/github.com/docker/docker/pkg/archive.cmdStream()
      /go/src/github.com/openshift/oc/vendor/github.com/docker/docker/pkg/archive/archive.go:1224 +0x243
  github.com/openshift/oc/vendor/github.com/docker/docker/pkg/archive.gzDecompress()
      /go/src/github.com/openshift/oc/vendor/github.com/docker/docker/pkg/archive/archive.go:174 +0x52e
  github.com/openshift/oc/vendor/github.com/docker/docker/pkg/archive.DecompressStream()
      /go/src/github.com/openshift/oc/vendor/github.com/docker/docker/pkg/archive/archive.go:207 +0x806
  github.com/openshift/oc/pkg/cli/image/extract.layerByEntry()
      /go/src/github.com/openshift/oc/pkg/cli/image/extract/extract.go:486 +0xa1
  github.com/openshift/oc/pkg/cli/image/extract.(*Options).Run.func1.1.2()
      /go/src/github.com/openshift/oc/pkg/cli/image/extract/extract.go:455 +0xa0f
  github.com/openshift/oc/pkg/cli/image/extract.(*Options).Run.func1.1()
      /go/src/github.com/openshift/oc/pkg/cli/image/extract/extract.go:467 +0x31f8
  github.com/openshift/oc/pkg/cli/image/workqueue.(*worker).Try.func1()
      /go/src/github.com/openshift/oc/pkg/cli/image/workqueue/workqueue.go:137 +0x6d
  github.com/openshift/oc/pkg/cli/image/workqueue.(*workQueue).run.func1()
      /go/src/github.com/openshift/oc/pkg/cli/image/workqueue/workqueue.go:51 +0x35d
==================

OK: why did his system trigger it and not mine? Examining the trace, I noticed the race was happening in the code Docker uses to decompress a stream of compressed data. For gzipped input, that code can either use Go’s native gzip library or shell out to unpigz, a very fast parallel decompressor. unpigz was not present on any of my test systems; however, it was there on his. Installing it on my system instantly reproduced the problem.

What was different? The code running unpigz was using one of those shared buffers I mentioned earlier. In one path, the context for the command was cancelled and the buffer was returned to the pool. However, with contexts and exec.CommandContext in Go, merely cancelling the context does not guarantee the command is fully done: you also need to wait for cmd.Wait() to return before handing any buffers back to the pool. Writing a fix that ensured that happened resolved our problem.
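
The general shape of the fix is easy to sketch. This is a simplified illustration of the pattern, not the actual patch to the vendored Docker code, and the function names are made up:

package unpigzsketch

import (
    "bufio"
    "context"
    "io"
    "os/exec"
)

// decompressWithCommand streams br through an external decompressor and only
// recycles the pooled reader once the command has fully exited.
func decompressWithCommand(ctx context.Context, br *bufio.Reader, dst io.Writer, putBack func(*bufio.Reader)) error {
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()

    cmd := exec.CommandContext(ctx, "unpigz", "-d", "-c")
    cmd.Stdin = br // br came from the shared pool; os/exec copies from it in its own goroutine
    cmd.Stdout = dst

    if err := cmd.Start(); err != nil {
        putBack(br) // the command never ran, so nothing else is touching br
        return err
    }

    // The bug: returning br to the pool as soon as the context is cancelled.
    // The goroutine os/exec starts to copy cmd.Stdin may still be reading
    // from br's internal buffer at that point.
    //
    // The fix: cmd.Wait() only returns after the process has exited and the
    // stdin/stdout copying goroutines have finished, so it is safe to
    // recycle the buffer afterwards.
    err := cmd.Wait()
    putBack(br)
    return err
}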

May 14, 2019 - Replacing Nest

The big announcement at Google I/O ‘19 was that Google is discontinuing the Works with Nest API program. Nest is the only no-local-control ecosystem I have in my house, and in hindsight that was very clearly a mistake. In order for an end user to integrate Nest with a third-party product like Home Assistant, one needs a developer account. Right off the bat, Nest assumed that the only parties who would want to do such a thing are other cloud-based services like Hue or Amazon, which interact with Nest on behalf of a user, rather than users themselves.

I currently have a developer account, and I use it with Home Assistant to do a number of automations tied to my Nest devices. There’s a lot of customized control over when the HVAC goes into away (eco) or home mode based on the arrival and departure - or expected arrival - of my home’s residents. More interestingly, a number of automations around Nest Protect enhance the safety of my home by unlocking the doors if people are home, turning on lights, and disabling the HVAC system in the event of an incident.

None of these will be possible once the Works with Nest program shuts down, beyond whatever specific partner integrations Google allows. This is all in the name of “privacy,” which in Google’s mind means your private information is known only by you and Google. Why, then, have they not assuaged the hobbyist community’s concerns by announcing a local API option?

The protocols that Nest devices use, Weave and Thread, are open source-ish. Certainly Google could open them up enough to allow me to control my own devices. Anyway, even if Google eventually figures it out and does the right, not-evil thing, I figured I should look at alternatives.

For now, I’ve already replaced my thermostat with a Venstar ColorTouch T7850, which features a local API and a Home Assistant component.

It’s not as pretty as the Nest’s round, sleek design, but it works well and has full local control. The Nest Protects’ most likely replacement is a Z-Wave device from First Alert, but for now I’ve kept them. I do like their voice announcements of the problem - in Spanish and English - and the pathlight feature when one walks down the hallway at night.

Apr 29, 2019 - UEFI HTTP Boot with Libvirt

UEFI 2.5 introduced HTTP boot, which allows a UEFI host to network boot using HTTP instead of TFTP. If you enroll a trusted certificate authority in the client’s firmware, you can even boot securely over HTTPS. This is a vast improvement over older mechanisms that rely on insecure protocols like TFTP.

Lukáš from the Foreman project proposed an RFC to enable this functionality in Foreman. Much of it is now implemented: Foreman has an HTTPBoot Smart Proxy module that serves the TFTP boot directory over HTTP and makes Foreman aware of the relevant DHCP settings. There are still some issues to resolve before this is ready for general use.

This blog post is mostly my notes from researching how HTTP boot works, how grub2 supports it, and how to test with libvirt. We used the edk2 firmware for QEMU/KVM, although most of these notes apply to hardware as well - we’ve tested on at least one real-world bare metal server and were able to provision end-to-end using HTTPS.

Configure libvirt

Out of the box, QEMU uses a legacy BIOS. However, you can install the TianoCore firmware to get UEFI; the package is called edk2-ovmf on Fedora.

If you are on CentOS 7 or want to use the latest nightlies, you can get them from this Fedora documentation. Last I checked, those builds aren’t compiled with TLS support, which means you can’t enroll a certificate to make HTTPS boot work. The Fedora firmware does support TLS.

After installing the firmware package on CentOS, you’ll also need to configure the nvram setting in /etc/libvirt/qemu.conf. Newer Fedora releases are already aware of this path and will find the firmware without extra configuration:

nvram = [
  "/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd:/usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd"
]

Once you do that, you can create a UEFI VM by selecting a UEFI x86_64 firmware:

UEFI Firmware Selection in Virt-Manager
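
If you’d rather skip the virt-manager UI, virt-install can do the same thing; something like this should work (the VM name, sizes, and OS variant are just examples):

virt-install \
  --name uefi-http-test \
  --memory 2048 \
  --disk size=10 \
  --boot uefi \
  --pxe \
  --os-variant fedora30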

Configure DHCP

Your DHCP configuration must be aware of HTTP clients in order to set the filename to a URL. The relevant snippet from my own DHCP config is below. It’s important to set the vendor-class-identifier to HTTPClient, otherwise your host will not treat the filename as an HTTP URL.

option arch code 93 = unsigned integer 16; # RFC4578

# This is for UEFI HTTP:
class "httpclients" {
  match if substring (option vendor-class-identifier, 0, 10) = "HTTPClient";
  log(info, "HTTP UEFI Client Detected");
  option vendor-class-identifier "HTTPClient";

  if option arch = 00:0F {
    filename "http://example.com/bootia32.efi";
  } else if option arch = 00:10 {
    filename "http://example.com/bootx64.efi";
  }
}

Boot loader

I’ve tested HTTP boot with both iPXE and grub2. If iPXE supports your network card, consider using it; its UEFI HTTP boot support is solid.

If you want to use grub2, hang on to your hat - there are a number of bugs in the currently shipped versions, including Fedora 30’s. Fixes that enable using relative paths will ship in Fedora 31. The bug for that is here.

If your grub2 configuration uses fully qualified paths everywhere, you won’t need this patch, but you also won’t be able to use the same grub2 configuration for both legacy TFTP and HTTP clients.
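
For illustration, a fully qualified grub2 path spells out the protocol, server, and absolute path, along these lines (the hostname and paths are made up):

menuentry 'Install over HTTP' {
  linuxefi (http,boot.example.com)/fedora/vmlinuz inst.repo=http://boot.example.com/fedora/
  initrdefi (http,boot.example.com)/fedora/initrd.img
}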

Enrolling a CA Certificate

For libvirt, I created a VFAT image, and stored a copy of my CA certificate there:

$ dd if=/dev/zero of=/tmp/ca.img bs=1440K count=1
1+0 records in
1+0 records out
1474560 bytes (1.5 MB, 1.4 MiB) copied, 0.000856816 s, 1.7 GB/s
$ mkfs.vfat /tmp/ca.img
mkfs.fat 4.1 (2017-01-24)
$ sudo mount -o loop /tmp/ca.img /tmp/mnt
$ sudo cp /tmp/ca/ca.crt /tmp/mnt
$ sudo umount /tmp/mnt
$ sudo cp /tmp/ca.img /var/lib/libvirt/images

I then attached it to libvirt:

Floppy
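
In domain XML terms, the attached image ends up looking roughly like this (a sketch; the path matches the copy above):

<disk type='file' device='floppy'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/ca.img'/>
  <target dev='fda' bus='fdc'/>
  <readonly/>
</disk>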

and enrolled the certificate in the Device Manager menu:

Enrollment

Assuming your DHCP configuration is set up correctly, you can select HTTP boot from the Boot Manager, or simply reboot the host, and you will boot via HTTPS.