Sep 25, 2020 - Slack is Toxic


Do other people have conversations with their managers about how to have healthier relationships with tools like Slack, or is it just me? It can’t just be me. I’ve been thinking for a while about what frustrates me most about Slack and similar systems, and after more than a year of full-time use, I have some thoughts.

First, you are locked in to their client. Why can’t I use Pidgin or irssi without jumping through hoops? And even then, half the features don’t work.

There’s the oft-complained-about @here and @channel, for which you can only turn off notifications per-channel. Usually people forget, and in large public channels you see a flurry of departures after someone uses one of those wide-distribution aliases for the first time.

Naked pings or just saying hello are disruptive as well. Then, there’s behavior like looping others into threads that are hundreds of messages long, with a simple “cc” or “FYI”. I’m calling those “naked FYIs.”

But those are just individual annoyances: there’s something more sinister about the premise of the entire product – maybe intentional or not – that is harmful to individuals. One of the design decisions of Slack is that your user is always just there – waiting to get a message, or magicked into a new channel with a simple ping. You can never just not be there.

Did you ever use AOL Instant Messenger? Offline messages were not possible for many years. You’d long to hear the sound that meant a favorite person had signed on and was open to messages. Not with Slack.

Have a co-worker in a different time zone? You can send a message at 3:00 a.m. She can have her status as away, or do not disturb, but her digital presence is still there. Slack helpfully sends her an e-mail with just a snippet of the conversation – just enough to get her attention. Maybe she sees it on her phone while eating breakfast, and Slack helpfully prompts her to install their app.

Yes, the mobile access is convenient. I can take off to a doctor’s appointment and still talk to a coworker about an on-going issue. That’s not the only time I use it, though. I open it before bed. I have a look at dinner. In this day and age, we are used to instant gratification: we like to get it ourselves, so we want to give it back, too.

Not to mention the fear of missing out. You’re on holiday? What important discussions are you missing on Slack while you’re sitting on the beach? Well, thankfully you can just pop in for a second to check it out since you have the mobile app! If you forget to check the app on vacation, the notification bubble will be there when you get back, as well as all the Slack e-mails urgently telling you all the stuff you missed.

People have a responsibility to use tools appropriately, and manage their work-life balance, but at a certain point the evidence that the system is designed to be used irresponsibly is so overwhelming that you just need to reconsider its use.

People complain about IRC – clunky CLI clients and its ephemeral nature, but the latter is a part of why it’s such a right-sized tool. Its temporary nature drives longer form discussions to different places. Fear of missing out is diminished when there’s not an expectation to read back history. The lack of threads encourages summaries when reaching out to include someone else in a discussion.

The problem, of course, is that Slack is here to stay. What do you all do to have a healthier relationship with it?

Mar 5, 2020 - Tarbombs considered harmful

So, one day you hear about this great new open source project, and visit the company’s web site and download the latest version of their software tofu-wonder.tar.gz, and extract it in your home directory:

$ tar xvf tofu-wonder.tar.gz

You just got tarbombed. In older versions of tar, tarballs could even contain absolute paths and potentially overwrite existing files on your file system. These days, most versions of tar prevent this unless explicitly allowed, so the worst that happens is a particular tar archive litters its files in whatever unfortunate directory you were in when you extracted it. Have fun cleaning that up.

Ok - so how to avoid it? I now include this line in my .zshrc:

export TAR_OPTIONS="--one-top-level"

This option extracts all files into a directory named after the base name of the archive (here, tofu-wonder). In the example above, it’d now look like this:

$ tar xvf tofu-wonder.tar.gz

Perfect! But, it’s better not to make users do this. The first way to prevent this is to include the top-level directory when you’re creating a tarball:

tar czvf tofu-wonder.tar.gz tofu-wonder/

Another option is to use --transform and replace the leading . with something else:

tar czvf tofu-wonder.tar.gz --transform "s?^\.?tofu-wonder-0.1.1?"  .
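
If you would rather check an archive before extracting it, the idea can be sketched in Go with the standard archive/tar and compress/gzip packages. This is only an illustration under a simple assumption: an archive whose entries share more than one top-level name is treated as a tarbomb. The names topLevelEntries and makeArchive are hypothetical helpers, not part of any tool.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"path"
	"sort"
	"strings"
)

// topLevelEntries returns the distinct first path components found in a
// gzip-compressed tar stream. More than one result suggests a tarbomb.
func topLevelEntries(r io.Reader) ([]string, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer gz.Close()

	seen := map[string]bool{}
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		// Normalize "./a/b" to "a/b" and drop any leading slash.
		name := strings.TrimPrefix(path.Clean(hdr.Name), "/")
		if name == "." || name == "" {
			continue
		}
		seen[strings.SplitN(name, "/", 2)[0]] = true
	}

	tops := make([]string, 0, len(seen))
	for name := range seen {
		tops = append(tops, name)
	}
	sort.Strings(tops)
	return tops, nil
}

// makeArchive builds an in-memory .tar.gz of empty files (demo helper).
func makeArchive(names ...string) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for _, n := range names {
		hdr := &tar.Header{Name: n, Mode: 0644, Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	archive, err := makeArchive("./README", "./bin/tofu", "./LICENSE")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	tops, err := topLevelEntries(archive)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("top-level entries:", tops, "tarbomb:", len(tops) > 1)
}
```

A real check would read the file from disk and might also want to handle other compression formats; this sketch only covers gzip.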

Sep 3, 2019 - golang: finding a race condition in Docker


In OpenShift’s client utilities, we use some vendored Docker code to extract data from a container image. Several images could be extracted concurrently, and we were running into an issue where, only on RHEL 8 clients, a user would occasionally see a panic:

panic: runtime error: slice bounds out of range

goroutine 163 [running]:
        /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/bufio/bufio.go:89 +0x211

We didn’t know why we only saw it on RHEL 8 clients, and why it only happened sometimes. I wanted a better traceback than the original bug report gave us, and maybe a coredump so I could poke around in gdb. To be honest, I didn’t really know what I’d be looking for in gdb. I’d only ever used it with C, and even in C, I’m generally a printf debuggerer.

However, I hadn’t been able to reproduce the problem myself, so I wanted to get as much information as I could.

We added export GOTRACEBACK=crash to our development scripts, and waited until someone saw it again. It wasn’t long before we got another report, and this time I was able to see a much longer stack trace that showed me all of the running goroutines, as well as getting a coredump.

It looked like code in Go itself was reading past the end of its own buffer: what? Was there a bug in Go? I started researching this some more, and I was still a bit lost, until I stumbled upon an entry in the longer stack trace that pointed me to Docker’s code using a pool of buffers.

Docker maintains a pool of *bufio.Reader to reduce memory usage. If these were being recycled, and some previous holder of it tried to write to it after giving it back, and someone else got a hold of it very quickly – this all sounded somewhat familiar, and reminded me of my Operating Systems class. Was this a race condition?

Identifying what kind of problem I was dealing with made things a lot easier. In retrospect, maybe I should’ve realized it was a race condition sooner, but now that I knew what it was, I wanted to know how people might uncover a race condition in golang.

Go includes tools for detecting these cases, by simply building or running your Go code with the -race flag. After doing that, and running locally, my program exited successfully with no warnings about any kind of race condition. Theoretically, this tooling was supposed to identify the potential race even if it wasn’t causing a panic.
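
For readers who have not used the race detector, here is a small, self-contained sketch of what it catches. The functions countUnsafe and countSafe are made up for illustration. One caveat worth knowing: the detector instruments the running program, so it can only report races on code paths that actually execute during that run.

```go
package main

import (
	"fmt"
	"sync"
)

// countUnsafe increments a shared counter from n goroutines with no
// synchronization. Under -race this is expected to produce a
// "WARNING: DATA RACE" report.
func countUnsafe(n int) int {
	var wg sync.WaitGroup
	total := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			total++ // unsynchronized read-modify-write
		}()
	}
	wg.Wait()
	return total
}

// countSafe does the same work but guards the counter with a mutex, so
// the race detector has nothing to report.
func countSafe(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			total++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println("safe:", countSafe(100))
	// To see a report, uncomment the next line and run: go run -race main.go
	// fmt.Println("unsafe:", countUnsafe(100))
}
```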

I even tried it on a RHEL 8 virtual machine, just like the reporters of the bugs were using. Nothing.

As a last resort, I asked a coworker if I could experiment in an environment where he seemed to encounter the problem once a day or so. I wrote a script that would run the command over and over again, hoping that it crashed. I used the binary that had been built with the -race flag.

Sure enough, on his system, go enthusiastically reported “WARNING: DATA RACE”, with a traceback telling me exactly where.

Write at 0x00c00115b320 by goroutine 94:
      /usr/local/go/src/bufio/bufio.go:75 +0xe0
      /usr/local/go/src/bufio/bufio.go:71 +0xd1*BufioReaderPool).Put()
      /go/src/ +0x5b*BufioReaderPool).NewReadCloserWrapper.func1()
      /go/src/ +0x140*ReadCloserWrapper).Close()
      /go/src/ +0x5e
      /go/src/ +0x80*ReadCloserWrapper).Close()
      /go/src/ +0x5e
      /go/src/ +0x975*Options).Run.func1.1.2()
      /go/src/ +0xa0f*Options).Run.func1.1()
      /go/src/ +0x31f8*worker).Try.func1()
      /go/src/ +0x6d*workQueue).run.func1()
      /go/src/ +0x35d

Previous read at 0x00c00115b320 by goroutine 8:
      /usr/local/go/src/bufio/bufio.go:525 +0xc7
      /usr/local/go/src/bufio/bufio.go:506 +0x5e1
      /usr/local/go/src/io/io.go:384 +0x13c
      /usr/local/go/src/io/io.go:364 +0x10a
      /usr/local/go/src/os/exec/exec.go:243 +0xfa
      /usr/local/go/src/os/exec/exec.go:409 +0x3d

Goroutine 94 (running) created at:*workQueue).run()
      /go/src/ +0xd8

Goroutine 8 (running) created at:
      /usr/local/go/src/os/exec/exec.go:408 +0x16c2
      /go/src/ +0x243
      /go/src/ +0x52e
      /go/src/ +0x806
      /go/src/ +0xa1*Options).Run.func1.1.2()
      /go/src/ +0xa0f*Options).Run.func1.1()
      /go/src/ +0x31f8*worker).Try.func1()
      /go/src/ +0x6d*workQueue).run.func1()
      /go/src/ +0x35d

Ok: why did his system do it and not mine? After examining the traceback, I noticed that this was happening in the code that Docker uses to decompress a stream of compressed data. And in that code for gzipped files, it can use the native Golang gzip library, or shell out to unpigz which is a super fast, parallel utility. unpigz was not present on any of my test systems; however it was there on his. Installing the package on my server instantly reproduced the problem.

What was different? The code running unpigz was using one of those shared buffers I mentioned earlier. There was a case where the context for a command was cancelled, and the buffer was returned to the pool. However, with contexts and CommandContext in Go, merely cancelling the context does not guarantee the command is fully done. You also need to wait for cmd.Wait() to finish before returning any buffers to the pool. Writing a fix that ensured that happened resolved our problem.