Quick links:

LINBIT Blogs: Early write submit for more performance

LINBIT Blogs: DRBDmanage installation is now easier!

Recent Content on Xaprb: Respectful Introductions and Recommendations

Recent Content on Xaprb: Amber Alert: Worse Than Nothing?

Recent Content on Xaprb: Bloom Filters Made Easy

LINBIT Blogs: DRBDManage release 0.10

Recent Content on Xaprb: MySQL, SQL, NoSQL, Open Source And Beyond: a Google Tech Talk

Recent Content on Xaprb: A simple rule for sane timestamps in MySQL

Recent Content on Xaprb: Generating Realistic Time Series Data

Recent Content on Xaprb: Speaking at Percona Live

Recent Content on Xaprb: On Crossfit and Safety

Recent Content on Xaprb: How to Tune A Guitar (Or Any Instrument)

Recent Content on Xaprb: A review of Bose, Sony, and Sennheiser noise-cancelling headphones

Recent Content on Xaprb: Xaprb now uses Hugo

Recent Content on Xaprb: Immutability, MVCC, and garbage collection

Recent Content on Xaprb: Early-access books: a double-edged sword

Recent Content on Xaprb: Napkin math: How much waste does Celestial Seasonings save?

Recent Content on Xaprb: Secure your accounts and devices

Recent Content on Xaprb: How is the MariaDB Knowledge Base licensed?

Recent Content on Xaprb: S**t sales engineers say

Recent Content on Xaprb: Props to the MySQL Community Team

Recent Content on Xaprb: EXPLAIN UPDATE in MySQL 5.6

LINBIT Blogs: DRBD-Manager

Recent Content on Xaprb: Freeing some Velocity videos

LINBIT Blogs: DRBD and the sync rate controller, part 2

Recent Content on Xaprb: Looking for a freelancer

Recent Content on Xaprb: Get out of your comfort zone

LINBIT Blogs: DRBD Proxy 3.1: Performance improvements

LINBIT Blogs: “umount is too slow”

Arrfab's Blog » Cluster: Rolling updates with Ansible and Apache reverse proxies

Early write submit for more performance

Posted in LINBIT Blogs by flip at April 01, 2014 08:07 AM

In the ongoing 8.4 efforts we’re currently testing the effects of using early write submits – both to local storage and to the peer node(s).

That is, when DRBD can guess in advance that write requests will soon be sent, it can send the data pages to the other node ahead of time; if the application then does write to storage, all that is needed is a small “do it” packet. That smaller packet can be transmitted over the (much faster) DRBD meta-data connection, which reduces latency by a fair amount.

See the performance improvements in a trial run:

Early-write performance improvements

For configuration there’s a new item in the disk section: early-write, which takes a time value in the usual tenths-of-a-second unit. E.g., a value of 10 will cause DRBD to send the data one second before the application tries to write it.
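Assuming the usual drbd.conf syntax, the option would presumably sit in the resource’s disk section like this – note that the exact spelling is my guess from the description above, not confirmed syntax:

```
resource r0 {
  disk {
    early-write 10;   # tenths of a second: submit data one second early
  }
}
```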

You can expect that feature in the next proprietary 8.4.5 release of DRBD, so stay tuned!

DRBDmanage installation is now easier!

Posted in LINBIT Blogs by flip at March 21, 2014 05:03 PM

In the last blog post about DRBDmanage we mentioned

Initial setup is a bit involved (see the README)

… with the new release, this is no longer true!

All that’s needed is now one command to initialize a new DRBDmanage control volume:

nodeA# drbdmanage init «local-ip-address»

You are going to initalize a new drbdmanage cluster.
CAUTION! Note that:
  * Any previous drbdmanage cluster information may be removed
  * Any remaining resources managed by a previous drbdmanage
    installation that still exist on this system will no longer
    be managed by drbdmanage



Acknowledging that prompt will (still) print a fair bit of data, i.e., the output of the commands run in the background; if everything works, you’ll get a freshly initialized DRBDmanage control volume, with the current node already registered.

Well, a single node is boring … let’s add further nodes!

nodeA# drbdmanage new-node «nodeB» «its-ip-address»

Join command for node nodeB:
  drbdmanage join some arguments ....

Now you copy and paste the one command line on the new node:

nodeB# drbdmanage join «arguments as above....»
You are going to join an existing drbdmanage cluster.
CAUTION! Note that:

Another yes and enter – and you’re done! Every further node is just one command on the existing cluster, which will give you the command line to use on the to-be-added node.

So, another major point is fixed … there are a few more things to be done, of course, but that was a big step (in the right direction) ;)

Respectful Introductions and Recommendations

Posted in Recent Content on Xaprb at February 23, 2014 12:00 AM

In the last few years of my career, I’ve increasingly been involved in meeting people. This often involves requests or offers for recommendations, introductions, and so forth.

I’ve learned to be very careful about making or accepting such offers or requests, and I’d like to share my current thoughts about that with you, because a lot of trouble can come of a seemingly innocent request or offer.


The Stakes Are High

“Martha, that’s so great that you are starting a business in the diabetes care industry! You should really meet my friend Jack. He could be extremely helpful to you, and I am sure he would appreciate knowing about your startup.”

Does that sound so dangerous? Believe me, it is. A lot is on the line for you, Martha, and Jack. Consider:

  • How well do you know Martha? What if you’re wrong about her business? If you introduce her to Jack, things might go wrong. They could turn out to be competitors. Or perhaps Martha is doing something that Jack thinks is unethical or is the wrong way to treat diabetes.
  • How well do you know Jack? Is he really going to be helpful to Martha? Does he want to be? Does he have time to spend with her?

Factors such as these mean that you are putting your relationship with Jack at risk. You may wear out your welcome and Jack may not want to be your friend anymore.

You are also risking your relationship with Martha. It sounds like this is a new relationship, so you’re on tenuous ground here; the risk of a misstep is high. If it’s more established, then you may be at lower risk, but the damage to you is potentially higher.

You are, finally, putting Martha and Jack at risk in various ways. You’re imposing on Jack’s time and potentially risking his reputation, and ditto for Martha.

A seemingly innocent offer to help is actually writing a metaphorical check against the Relationship Bank Account you have with Martha and Jack, and asking them to extend credit to each other as well. Since they don’t have a Relationship Bank Account with each other, the credit they’ll extend to each other comes out of your accounts with them. If it’s not a productive match, that withdrawal becomes a bounced check.

In my experiences over the last year or so, in particular, I’ve been introduced to a lot of people with management experience, a desire to find companies to invest in, and so on. Most of the time, courtesy obliges me to have a phone call or meet them for a meal or similar. Very few of these meetings have been truly helpful to either of us. I’m not saying anyone wasted my time, but I’ve ended up talking to dozens of people and the only outcomes have been pleasant conversations.

Offering Versus Asking

The above scenario is when you offer a favor. You can see how much is at risk in that scenario. Consider how much more is at risk when you ask for a favor, such as an introduction. You are asking your friend to broker a meeting with someone from whom you presumably also want to ask some favor. “Do me a favor, would you, and ask your friend to do you a favor so I can ask him to do me a favor?” It should be obvious how much credit you’re asking everyone to extend everyone else. You need to treat this with great care, because again if things don’t work out, you are not the only one who can get hurt.

Likewise, if you’re the one who’s being asked for an introduction, consider the consequences. If I said yes every time someone asked me to introduce them to a friend of mine, soon I wouldn’t have any friends. I need to know all of the people well and be pretty certain that there will be a positive outcome for all of them and myself.

This often means that I might need to check with my friend and see if they’d like to meet. Or, instead of putting them in touch with each other directly, I might just send the asker’s contact information on with a brief note. Or if I don’t have enough relationship built up with the person who’s asking the favor, I might need to explain that to them. This is certainly a more positive way to build my relationship with them in the future.

In general, you need to have earned the favor with the person from whom you’re asking it. Otherwise you risk ending up with a negative balance in your Relationship Bank Account.


Now I’m going to share a few examples of when things have gone wrong, as sort of case studies. These are all real stories, and names have been slightly abbreviated although sometimes the parties don’t deserve protection.

The Pushy Investor

It is true that “ask and you shall receive.” If you don’t ask, nothing happens. But if all you do is ask, you end up persona non grata pretty fast.

In 2013 Kyle (my co-founder) and I were fundraising from venture capitalists. A particular investor, Max G, listened to our pitch and said he’d like to follow up on it by calling our advisors, one of whom is at a large company. The next thing we heard was this email from the advisor:

“Phone call just ended. I think he was more interested in figuring out what technologies my employer is interested in which wasn’t the purpose for the call.”

That never went any further as far as fundraising/investment went, but not too long after that I got an email from Max G:

Subject: Quick Ask


One of our fastest growing software companies is looking for a VP for training/educational services. Who are the rock stars in the open source world that you can recommend? Are you at liberty to tell me who the best from […] is? I can keep my source confidential.

Translation: he’s asking me to help him poach from someone I respect highly. Unfortunately, I did not think of it that way at the time. My first thought was that I could help a friend get a dream job, which is something I have done several times in the past. So I responded. Not in a way that I feel badly about now, but still, I wouldn’t do it again. (How much would you bet that he wrote an email that started out, “Baron said that you and I should talk…” ?)

Radio silence for a while.

Then, another email from Max G:

Subject: Company you should look at

[investor] and [name-drop] recently backed a high-profile entrepreneur who most recently […]

It is super early (pre-product), but we’re eager to connect with some of the best people in demand gen to run the concept by. […]’s name came up at […]. Could you run this by […] to see if he’s open for an intro? Would love to make the connection with him.

I never replied. Max G already had a negative account balance with me and I never should have let it get that far.

Moral of this story is to be careful when someone asks you for a favor, lest you end up granting it and then wishing you hadn’t, as I did.

The Veiled Sales Request

I got a connection request on LinkedIn from someone named Paul S, whose request said:

We are a startup having developed a reactive programming based backend service [database-specific details followed]. We are interested in your feedback on this and some new products we are working on. Can we please set up a time on Webex for next week.

I have often enjoyed giving feedback on products and their marketing position, both to companies developing the products, friends trying to figure out what products they need, investors doing diligence, etc. This sounded almost like that, but something made an alarm bell go off. I responded:

Can you be more specific what kind of feedback you’re looking for? My expertise is fairly narrow.

The reply from Paul S:

Our target market is database developer and architects and from your profile it looks like you have lots of experience there. We are keen to get your feedback on the usability of our product for these audience and the language we ought to use to reach them. Also we are working on a new product and keen to get your feedback on that.

Hope that helps and you can spare 30-45 minutes and see what we can show you and give us some feedback.

The guy didn’t even have the decency to comply with my request and be honest that he was going to give me a sales pitch. I disconnected from him on LinkedIn.

The Shotgun Blast

Someone named Nick emailed me this through LinkedIn:

Could we please talk tomorrow, We started […] in EMEA and I would appreciate a call?

The thing is, the company he mentioned is one I have a long and good relationship with. So even though this request came out of the blue, with no context, no reason why I’d want to comply, and no idea what he wanted, I was reluctant to just mark it as spam. If I ignored him, I would potentially be offering some offense, however slight, to that company and my relationships there, even if nobody could really blame me for declining.

I opted to write back:

Happy to talk, but I’d like to know the topic first.

Nick responded that he was just looking to get to know me, and later in his message mentioned that he is looking around for a job in management and sales. I replied that we are too early for that. But I’d never hire someone with such poor communication skills for a job like that anyway.

The Request To Promote To My Friends

We use a company called Intercom to embed a contact widget in our app. I haven’t really been all that thrilled with the product; it’s not bad, but it’s not amazing. This is relevant, because while I was asking them to fix bugs for me, they implemented a new feature instead:


Intercom definitely had not earned my recommendation. What was more irritating, and continues to be, is that the feature itself is buggy. I had to email them and ask how to disable it, because it kept showing on every page load. After following their instructions, it continues to reappear every now and then.

They are being very foolish by continuing to show me this request. It’s pushed me more towards the point of leaving them and going to a competitor.

The moral of this story is that you had better earn the recommendation you ask for.

Requests For My Address Book

Once upon a time, I was looking to add my existing network to my LinkedIn account. I thought, sure, I’ll give them my Gmail login, why not? I’ll just revoke it and change my password right afterwards. No harm there, right?

Then I learned what they did with that: LinkedIn is “breaking into” user emails, spamming contacts – lawsuit. And even after I deleted all of the imported contacts from my account, I know they still have them, because they continue to show me “connection recommendations” for people who don’t have LinkedIn accounts and whose names and emails are distinctively the same as the information from my address book.

I will never again do something like this. I will never again install a LinkedIn app on my phone (because it requires access to my contact list). I do not trust LinkedIn. It is a vital tool for my professional work and career, but still, I’ve come close to deleting my account out of disgust and anger.

I similarly distrust many other services that have anything to gain from getting access to my contact list, and that keeps me from installing a lot of their apps on my phone.

Moral of the story: yuck.

The Favor Followed By An Awkward Moment

I’ll end with a story where I was in the wrong.

While Kyle and I were fundraising I asked someone who had been very helpful to me in the past for introductions to a couple of top-tier investors. This person knows my work very well and has great stature in the industry. He could make a strong representation that we were interesting to investors. His introductions were very helpful.

Later I hired one of his employees – someone who lived practically next door to me, whose wife had a close relationship with my wife. The last thing I wanted to do was burn any bridges for him; what if he got a better counter-offer, or decided at the last minute not to join? So although I normally would have said something to his boss at some safe point, I opted to stay completely out of it.

I then got an email from his boss, to the effect of, “I don’t mind that you hired my employee. But after all I’ve done to help you, I think I deserved the courtesy to hear about it from you first.”

He was right, and I learned a valuable lesson from that. This was a delicate situation, but I should have gotten advice on it. In hindsight I wish I’d waited for the all-clear from the new employee and given this person a call as soon as the new employee said it was OK.

Moral? Well… I’m still sad about this one. The moral is sometimes you can’t put Humpty Dumpty back together again.


Amber Alert: Worse Than Nothing?

Posted in Recent Content on Xaprb at February 12, 2014 12:00 AM

In the last few years, there’s been a lot of discussion about alerts in the circles I move in. There’s general agreement that a lot of tools don’t provide good alerting mechanisms, including problems such as unclear alerts, alerts that can’t be acted upon, and alerts that lack context.

Yesterday and today at the Strata conference, my phone and lots of phones around me started blaring klaxon sounds. When I looked at my phone, I saw something like this (the screenshot is from a later update, but otherwise similar):


I’ve seen alerts like this before, but they were alerts about severe weather events, such as tornado watches. This one, frankly, looked like someone hacked into the Verizon network and sent out spam alarms. Seriously — what the hell, a license plate? What?

Besides, it says AMBER, which is a cautionary color. It’s not a red alert, after all. It can’t be anything serious, right?

The second time it happened I looked at the details:


This is even less informative. It’s an amber alert (not an urgent color like red). But it’s a significant threat to my life or property? I’m supposed to respond to it immediately? Oh wait, my response is to “monitor” and “attend to information sources.” Almost everything on this whole screen is conflicting. What a cluster-fudge of useless non-information!

Later I looked up some information online and found that an amber alert is a child abduction alert. This one turned out to be a false alarm.

All of this raises an obvious question: why on earth would someone think that making a bunch of people’s cellphones quack with a cryptic message would convey useful information? For something as critical as a child abduction, they should get to the point and state it directly. Judging by reactions around me, and people I spoke to, almost nobody knows what an amber alert is. I certainly didn’t. When I tweeted about it, only one person in my network seemed to be aware of it.

How can anyone take something like this seriously? All this does is make people like me find the preferences for alerts and disable them.

In my opinion, this is an example of complete failure in alert design. I don’t think I can overstate how badly done this is. I want to say only a politician could have dreamed up something so stupid…

But then I remember: oh, yeah. Pingdom alerts (we’ll email you that your site is down, but we won’t tell you an HTTP status code or anything else remotely useful.) Nagios alerts (we’ll tell you DISK CRITICAL and follow that with (44% inode=97%) – anyone know what that means?). And so on.

Perhaps the amber alert system was designed by a system administrator, not a politician.

Bloom Filters Made Easy

Posted in Recent Content on Xaprb at February 11, 2014 12:00 AM

I mentioned Bloom Filters in my talk today at Strata. Afterwards, someone told me it was the first time he’d heard of Bloom Filters, so I thought I’d write a little explanation of what they are, what they do, and how they work.

But then I found that Jason Davies already wrote a great article about it. Play with his live demo. I was able to get a false positive through luck in a few keystrokes: add alice, bob, and carol to the filter, then test the filter for candiceaklda.

Why would you use a Bloom filter instead of, say…

  • Searching the data for the value? Searching the data directly is too slow, especially if there’s a lot of data.
  • An index? Indexes are more efficient than searching the whole dataset, but still too costly. Indexes are designed to minimize the number of times some data needs to be fetched into memory, but in high-performance applications, especially over huge datasets, that’s still bad. It typically represents random-access to disk, which is catastrophically slow and doesn’t scale.
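For a feel of how little machinery is involved, here’s a minimal Bloom filter sketch in Python. The bit-array size, the salted SHA-256 hashing, and all the names are illustrative choices of mine, not taken from Jason Davies’s implementation:

```python
import hashlib

class BloomFilter:
    """A fixed-size bit array; k hash functions map each item to k bit positions."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as an arbitrary-length bit array

    def _positions(self, item):
        # Derive k deterministic bit positions from salted SHA-256 digests.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits & (1 << pos) for pos in self._positions(item))
```

An added member can never be reported absent (no false negatives), but an unrelated item may happen to hash onto bits that other items already set – that’s exactly the false positive the demo shows, and its probability is tuned by num_bits and num_hashes relative to the number of items stored.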

DRBDManage release 0.10

Posted in LINBIT Blogs by flip at February 06, 2014 03:11 PM

As announced in an earlier blog post, we’ve been preparing a new tool to simplify DRBD administration – and now we’re publishing its first release! Previously, to deploy a DRBD resource you had to create a config file and copy it to all the necessary nodes. As The Internet says, “ain’t nobody got time for that.” With DRBD Manage, all you need to do is execute the following command:

drbdmanage new-volume vol0 4 --deploy 3

Here is what happens on the back-end:

  • It chooses three nodes from the available set;
  • drbdmanage creates a 4GiB LV on all these nodes;
  • generates DRBD configuration files;
  • writes the DRBD meta-data into the LV;
  • starts the initial sync, and
  • makes the volume on a node Primary so that it can be used right now.

This process takes only a few seconds.

Please note that there are some things to take into consideration:

  • drbdmanage is a lot to type; however, an alias dm=drbdmanage in your ~/.*shrc takes care of that ;)
  • Initial setup is a bit involved (see the README)
  • You’ll need at least DRBD 9.0.0-pre7.
  • Because both DRBD Manage and DRBD 9 are still under heavy development, there are more than likely some undiscovered bugs. Bug reports, ideas, wishes, and any other feedback are welcome.

Anyway – head over to the DRBD-Manage homepage and fetch your source tarballs (a few packages are prepared, too), or a GIT checkout if you plan to keep up-to-date. For questions please use the drbd-user mailing list; patches, or other development-related topics are welcome on the drbd-dev mailing list.

What do you think? Drop us a note!

MySQL, SQL, NoSQL, Open Source And Beyond: a Google Tech Talk

Posted in Recent Content on Xaprb at February 05, 2014 12:00 AM

I’ve been invited to give a Tech Talk at Google next Thursday, February 13th, from 11:00 to 12:30 Pacific time. Unfortunately the talk won’t be streamed live, nor is it open to the general public, but it will be recorded and hosted on YouTube afterwards. I’ve also been told that a small number of individuals might be allowed to attend from outside Google. If you would like me to try to get a guest pass for you, please tweet that to @xaprb.

The topic is, roughly, databases. Officially,

MySQL, SQL, NoSQL, and Open Source in 2014 and Beyond

Predictions are hard to get right, especially when they involve the future. Rather than predict the future, I’ll explain how I view the relational and NoSQL database worlds today, especially the MySQL product and community, but including open-source and proprietary data management technologies about which I know enough to get in trouble. I’ll explain how my self-image as a practitioner and contributor has changed, especially as I’ve moved from consulting (where I tell people what they should do) into being a company founder (where I sometimes wish someone would tell me what to do). As for the future, I’ll express my preferences for specific outcomes, and try to be careful what I wish for.

I am excited and a bit nervous. A Google Tech Talk! Wow! Thanks for inviting me, Google!

A simple rule for sane timestamps in MySQL

Posted in Recent Content on Xaprb at January 30, 2014 12:00 AM

Do you store date or time values in MySQL?

Would you like to know how to avoid many possible types of pain, most of which you cannot even begin to imagine until you experience them in really fun ways?

Then this blog post is for you. Here is a complete set of rules for how you can avoid aforementioned pain:

  1. All date and time columns shall be INT UNSIGNED NOT NULL, and shall store a Unix timestamp in UTC.
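At the application boundary the rule costs almost nothing; here is a minimal Python sketch of the two conversions (the function names are mine, and the column is assumed to be the INT UNSIGNED NOT NULL column above):

```python
from datetime import datetime, timezone

def to_unix(dt: datetime) -> int:
    """Convert a timezone-aware datetime to a UTC Unix timestamp for storage."""
    return int(dt.astimezone(timezone.utc).timestamp())

def from_unix(ts: int) -> datetime:
    """Convert a stored Unix timestamp back into an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```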

Enjoy all the spare time you’ll have to do actually useful things as a result.

Generating Realistic Time Series Data

Posted in Recent Content on Xaprb at January 24, 2014 12:00 AM

I am interested in compiling a list of techniques to generate fake time-series data that looks and behaves realistically. The goal is to make a mock API for developers to work against, without needing bulky sets of real data, which are annoying to deal with, especially as things change and new types of data are needed.

To achieve this, I think several specific things need to be addressed:

  1. What common classes or categories of time-series data are there? For example,
    • cyclical (ex: traffic to a web server day-over-day)
    • apparently random (ex: stock ticker)
    • generally increasing (ex: stock ticker for an index)
    • exponentially decaying (ex: unix load average)
    • usually zero, with occasional nonzero values (ex: rainfall in a specific location)
  2. What parameters describe the data’s behavior? Examples might include an exponential decay, periodicity, distribution of values, distribution of intervals between peaks, etc.
  3. What techniques can be used to deterministically generate data that approximates a given category of time-series data, so that one can generate mock sources of data without storing real examples? For a simplistic example, you could seed a random number generator for determinism, and use something like y_n = rand() * 10 + 100 for data that fluctuates randomly between 100 and 110.

To make the mock API, I imagine we could catalog a set of metrics we want to be able to generate, with the following properties for each:

  • name
  • type
  • dimensions
  • parameters
  • random seed or other initializer

This reduces the problem from what we currently do (keeping entire data sets, which need to be replaced as our data gathering techniques evolve) into just a dictionary of metrics and their definitions.

Then the mock API would accept requests for a set of metrics, the time range desired, and the resolution desired. The metrics would be computed and returned.

To make this work correctly, the metrics need to be generated deterministically. That is, if I ask for metrics from 5am to 6am on a particular day, I should always get the same values for the metrics. And if I ask for a different time range, I’d get different values. What this means, in my opinion, is that there needs to be a closed-form function that produces the metric’s output for a given timestamp. (I think one-second resolution of data is fine enough for most purposes.)
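One possible shape for such a closed-form function, sketched in Python; the parameters, the sine-based daily cycle, and the hash-based noise are all illustrative assumptions, not a finished design:

```python
import hashlib
import math

def noise(seed, t):
    """Deterministic pseudo-random value in [0, 1) for a (seed, timestamp) pair."""
    digest = hashlib.sha256(f"{seed}:{t}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def metric_value(seed, t, period=86400, base=100.0, amplitude=10.0, jitter=2.0):
    """Closed-form metric: a daily cycle plus deterministic per-timestamp noise."""
    cycle = amplitude * math.sin(2 * math.pi * (t % period) / period)
    return base + cycle + jitter * noise(seed, t)

def series(seed, start, end, step=1):
    """Metric values for timestamps [start, end) at the requested resolution."""
    return [metric_value(seed, t) for t in range(start, end, step)]
```

Because every value is a pure function of the seed and the timestamp, asking for 5am to 6am always returns the same numbers, and no data set needs to be stored; other categories (exponential decay, occasional spikes) would just be different closed-form terms keyed off the same deterministic noise.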

Does anyone have suggestions for how to do this?

The result will be open-sourced, so everyone who’s interested in such a programmatically generated dataset can benefit from it.

Speaking at Percona Live

Posted in Recent Content on Xaprb at January 23, 2014 12:00 AM

I’m excited to be speaking at the Percona Live MySQL Conference again this year. I’ll present two sessions: Developing MySQL Applications with Go and Knowing the Unknowable: Per-Query Metrics. The first is a walk-through of everything I’ve learned over the last 18 months writing large-scale MySQL-backed applications with Google’s Go language. The second is about using statistical techniques to find out things you can’t even measure, such as how much CPU a query really causes MySQL to use. There are great reasons that this is both desirable to know, and impossible to do directly in the server itself.

I’m also looking forward to the conference overall. Take a few minutes and browse the selection of talks. As usual, it’s a fantastic program; the speakers are really the top experts from the MySQL world. The conference committee and Percona have done a great job again this year! See you in Santa Clara.

On Crossfit and Safety

Posted in Recent Content on Xaprb at January 20, 2014 12:00 AM

I’ve been a happy CrossFitter for a few years now. I met my co-founder and many friends at CrossFit Charlottesville, completely changed my level of fitness and many key indicators of health such as my hemoglobin A1C and vitamin D levels, am stronger than I’ve ever been, feel great, and now my wife does CrossFit too. It’s fantastic. It’s community, fun, health, fitness. It’s the antidote to the boring gyms I forced myself to go to for years and hated every minute of.


But there is a fringe element in CrossFit, which unfortunately looks mainstream to some who don’t really have enough context to judge. From the outside, CrossFit can look almost cult-like. It’s easy to get an impression of people doing dangerous things with little caution or training. To hear people talk about it, everyone in CrossFit works out insanely until they vomit, pushing themselves until their muscles break down and vital organs go into failure modes.

That’s not what I’ve experienced. I’ve never seen anyone vomit, or even come close to it as far as I know. I think that part of this dichotomy comes from certain people trying to promote CrossFit as a really badass thing to do, so they not only focus on extreme stories, they even exaggerate stories to sound more extreme.

Last week there was a tragic accident: Denver CrossFit coach Kevin Ogar injured himself badly. This has raised the issue of CrossFit safety again.

To be clear, I think there is something about CrossFit that deserves to be looked at. It’s just not the mainstream elements, that’s all. The things I see about CrossFit, which I choose not to participate in personally, are:

  1. The hierarchy and structure above the local gyms. If you look at local gyms and local events, things look good. Everyone’s friends and nobody does stupid things. But when you get into competitions, people are automatically elevated into the realms of the extreme. This reaches its peak at the top levels of the competitions. Why? Because there’s something to gain besides just fitness. When someone has motivations (fame, endorsements and sponsorship, financial rewards) beyond just being healthy, bad things are going to happen. There’s talk now about cheating and performance-enhancing drugs and all kinds of “professional sports issues.” Those are clear signs that it’s not about fitness and health.
  2. Some inconsistencies in the underlying philosophy from the founders of CrossFit. I’m not sure how much this gets discussed, but a few of the core concepts (which I agree with, by the way) are that varied, functional movements are good. The problem is, the workout movements aren’t all functional. A few of them are rather hardcore and very technical movements chosen from various mixtures of disciplines.
  3. Untempered enthusiasm about, and ignorant promotion of, things such as the so-called Paleo Diet. I’m biased about this by being married to an archaeologist, but it isn’t the diet that is the issue. It’s the fanaticism that some people have about it, which can be off-putting to newcomers.

I’m perfectly fine when people disagree with me on these topics. Lots of people are really enthusiastic about lots of things. I choose to take what I like about CrossFit and leave the rest. I would point out, however, that the opinions of those who don’t really know CrossFit first-hand tend to be colored by the extremism that’s on display.

Now, there is one issue I think that’s really important to talk about, and that’s the safety of the movements. This comes back to point #2 in my list above. I’d especially like to pick out one movement that is done in a lot of CrossFit workouts.

The Snatch

If you’re not familiar with the snatch, it’s an Olympic weightlifting movement where the barbell is pulled from the floor as high as possible in one movement. The athlete then jumps under the barbell, catching it in a deep squat with arms overhead, and stands up to complete the movement with the bar high overhead. Here’s an elite Olympic lifter just after catching the bar at the bottom of the squat.


The snatch is extremely technical. It requires factors such as balance, timing, strength, and flexibility to come together flawlessly. Many of these factors are not just necessary in moderate quantities. For example, the flexibility required is beyond what most people are capable of without a lot of training. If you don’t have the mobility to pull off the snatch correctly, your form is compromised and it’s dangerous.

The snatch is how Kevin Ogar got hurt. Keep in mind this guy is a CrossFit coach himself. He’s not a novice.

I challenge anyone to defend the snatch as a functional movement. Tell me one time in your life when you needed to execute a snatch, and be serious about it. I can see the clean-and-jerk’s utility. But not the snatch. It’s ridiculous.

The snatch is also inherently very dangerous. You’re throwing a heavy weight over your head and getting under it, fast. You’re catching it in an extremely compromised position. And if you drop it, which is hard not to do, where’s it going to go? It’s going to fall on you. Here’s another Olympic athlete catching hundreds of pounds with his neck when a snatch went a little bit wrong. A split second later this picture looked much worse, but I don’t want to gross you out.


The next issue is that the snatch features prominently in many CrossFit workouts, especially competition workouts. This is not a small problem. Think about it: in competition, when these extreme athletes have raised the bar to such an extent that weeding out the best of the best requires multi-day performances few mortals could ever achieve, we’re throwing high-rep, heavy-weight snatches into the mix. What’s astonishing isn’t that Kevin Ogar got seriously injured; it’s that we don’t see people severing their spines on the snatch all the time.

What on earth is wrong with these people? What do they expect?

You might think this is an issue that’s only present in the competitions. But that’s not true. I generally refuse to do snatches in workouts at the gym; I substitute other movements for them. Why? Take a look at one sample snatch workout:

AMRAP (as many rounds as possible) in 12 Minutes of:

  1. Snatch x 10
  2. Double Under x 50
  3. Box Jump x 10
  4. Sprint

That’s 12 minutes of highly challenging movements (to put it in perspective, most non-CrossFitters, and even many CrossFitters, would not be able to do the double-unders or box-jumps). You’re coming off a sprint and you’re going to throw off 10 snatches in a row, and you’re going to do it with perfect form? Unlikely. This is just asking for injury.

Or we could look at the “named WODs” that are benchmarks for CrossFitters everywhere. There’s Amanda, for example: 9, 7, and 5 reps of muscle-ups and snatches, as fast as possible. Or Isabel: 30 reps of 135-pound snatches, as fast as possible. To get a sense for how insane that actually is, take a look at Olympic weightlifting competitor Kendrick Farris doing Isabel. The man is a beast and he struggles. And his form breaks down. I’m over-using italics. I’m sorry, I’ll cool down.

My point is that I think this extremely technical, very dangerous movement should have limited or no place in CrossFit workouts. I think it does very little but put people into a situation where they’re at very high risk of getting injured. I do not think it makes people more fit more effectively than alternative movements. I think one can get the same or better benefits from much safer movements.

Doing the snatch is an expert stunt. I personally think that I’ll never be good at snatches unless I do them twice a week, minimum. And one of the tenets of CrossFit is that there should be a large variety of constantly varied movements. This automatically rules out doing any one movement very often. In my own CrossFit workouts, going to the gym 2 or 3 times a week, I typically go weeks at a time without being trained on snatches in pre-workout skill work. That is nowhere near enough to develop real skill at it. (This is why I do my skill-work snatches with little more than an empty bar.)

There are other movements in CrossFit that I think are riskier than they need to be, but snatches are the quintessential example.

I know many people who are experts in these topics will disagree with me very strongly, and I don’t mind that. This is just my opinion.

Bad Coaches, Bad Vibes

There’s one more problem that contributes, I think, to needless risk in CrossFit gyms. This is the combination of inadequate coaching and a focus on “goal completion” to the exclusion of safety and absolutely perfect form, especially during workouts where you’re trying to finish a set amount of movements as fast as possible, or do as much as possible in a fixed time.

There’s no getting around the fact that CrossFit coaches aren’t all giving the same level of attention to their athletes, nor do all of them have the qualifications they need.

Anecdotally, I’ll tell the story of traveling in California, where I visited a gym and did one of my favorite workouts, Diane. In Diane, you deadlift 225 pounds for 21 reps, do 21 handstand pushups, repeat both movements at 15 reps each, and finish with 9 reps each.

Deadlifting consists of grasping the bar on the ground and standing up straight, then lowering it again. It is not a dynamic or unstable movement. You do not move through any out-of-control ranges of motion. If you drop the bar you won’t drop it on yourself, it’ll just fall to the ground. Nevertheless, if done wrong, it can injure you badly, just like anything else.


The gym owner / coach didn’t coach. There’s no other way to say it. He set up a bar and said “ok, everyone look at me.” He then deadlifted and said some things that sounded really important about how to deadlift safely. Then he left us on our own. A relative newcomer was next to me. His form and technique were bad, and the coach didn’t say anything. He was standing at the end of the room, ostensibly watching, but he either wasn’t really looking, or he was lazy, or he didn’t know enough to see that the guy was doing the movement unsafely.

The newcomer turned to me and asked me what weight I thought he should use. I recommended that he scale the weights way down, but it wasn’t my gym and I wasn’t the coach. He lifted too heavy. I don’t think he hurt himself, but he was rounding his back horribly and I’m sure he found it hard to move for a week afterward. The coach just watched from the end of the gym, all the way through the workout. All he did was start and stop the music. What a jerk.

There’s an element of responsibility to put on the athletes. You need to know whether you’re doing things safely or not. If you don’t know, you should ask your coach. For me, rule #1 is to find out how much I don’t know, and not to attempt something unless I know how much I know about it. This athlete should have taken the matter into his own hands and asked for more active coaching.

But that doesn’t excuse the coach either.

The gym I go to — that nonsense does not happen. And I’ve been to a few gyms over the years and found them to be good. I’m glad I learned in a safe environment, but not all gyms and coaches are that good.

Precedent and Direction-Setting, and Lack of Reporting

What worries me the most is that the type of tragedy that happened to Kevin Ogar is going to happen close to home and impact my friends or my gym. The problem is complex to untangle, but in brief,

  1. Once annually there’s a series of quasi-competitions called the CrossFit Open. These are scored workouts over a span of weeks. They are set by the national CrossFit organization, not the local gyms. The scores are used to select the top rank of competitors, who go on to regional competitions and eventually to the annual CrossFit Games.
  2. The CrossFit Open workouts will certainly include snatches.
  3. If local gyms don’t program snatches regularly, their members won’t be prepared at all for the Open.
  4. Local gyms don’t have to participate in the Open, and don’t have to encourage their members to, but that’s easier said than done due to the community aspects of CrossFit.

The result, in my opinion, is that there’s systemic pressure for gyms and members to do things that carry a higher risk-to-reward ratio than many members would prefer. Anecdotally, many members I’ve spoken to share my concerns about the snatch. They love CrossFit, but they don’t like the pressure to do this awkward and frightening movement.

Finally, it’s very difficult to understand how serious the problem really is. Is there a high risk of injury from a snatch, or does it just seem that way because of high-profile incidents? Are we right to be afraid of the snatch, or is it just a movement that makes you feel really vulnerable? The problem here is that there’s no culture of reporting incidents in CrossFit.

I can point to another sport where that culture does exist: caving. The National Speleological Society publishes accident reports, and conscientious cavers share a culture that every incident, even trivial ones, must be reported. As a result, you can browse the NSS accident reports (summarized here) and see some things clearly (you have to be a member to access the full reports, which are often excruciatingly detailed). One of the most obvious conclusions you’ll draw right away is that cave diving (scuba diving in underwater caves) is incredibly dangerous and kills a lot of people, despite it being a small portion of the overall caving sport’s popularity. If you weren’t a caver and you didn’t know about cave diving, would you think this was the case? I’m not sure I would. After reading cave diving accident reports, I remember being shocked at how many people are found dead underwater for no apparent reason, with air left in their tanks. The accident reports help cavers assess the risks of what they do.

Nothing similar exists for CrossFit, and I wish it did.

Negative Press About CrossFit

On the topic of what gets attention and exposure, I’ve seen a bunch of attention-seeking blog posts from people who “told the dirty truth” about how CrossFit injured them and there’s a culture of silencing dissenters and so on. I’m sure some of that happens, but the stuff I’ve read has been from people who have an axe to grind. And frankly, most of those people were indisputably idiots. They were blaming their problems and injuries on CrossFit when the real problem was between their ears. I won’t link to them, because they don’t deserve the attention.

Don’t believe most of what you read online about CrossFit. Many of the people telling their personal stories about their experiences in CrossFit are drama queens blowing things completely out of proportion. There’s a lot of legitimate objective criticism too, most of it from neutral third-parties who have serious credentials in physical fitness coaching, but this doesn’t get as much attention. And there’s a lot of great writing about what’s good about CrossFit, much of it from the good-hearted, honest, knowledgeable coaches and gym owners who soldier on despite the ongoing soap operas and media hype wars. They’re bringing fitness and health — and fun — to people who otherwise don’t get enough of it.


Toss corporate sponsors, personal politics, competition, the lure of great gains from winning, and a bunch of testosterone together and you’re going to get some people hurt. Mix it in with snatches and it’s a miracle if nobody gets seriously injured.

If you participate in CrossFit, which I highly recommend, take responsibility for your own safety. If there is a rah-rah attitude of pushing too hard at all costs in your gym, or if your coaches aren’t actually experts at what they do (the CrossFit weekend-long certification seminars don’t count), or if it’s not right for any other reason, go elsewhere.

Stay healthy and have fun, and do constantly varied, functional movements at high intensity in the company of your peers – and do it safely.


How to Tune A Guitar (Or Any Instrument)

Posted in Recent Content on Xaprb at January 18, 2014 12:00 AM

Do you know how to tune a guitar? I mean, do you really know how to tune a guitar?

Guitar Closeup

I’ve met very few people who do. Most people pick some notes, crank the tuners, play some chords, and endlessly fidget back and forth until they either get something that doesn’t sound awful to their ears, or they give up. I can’t recall ever seeing a professional musician look like a tuning pro on stage, either. This really ought to be embarrassing to someone who makes music for a career.

There’s a secret to tuning an instrument. Very few people seem to know it. It’s surprisingly simple, it isn’t at all what you might expect, and it makes it easy and quick to tune an instrument accurately without guesswork. However, even though it’s simple and logical, it is difficult and subtle at first, and requires training your ear. This is a neurological, physical, and mental process that takes some time and practice. It does not require “perfect pitch,” however.

In this blog post I’ll explain how it works. There’s a surprising amount of depth to it, which appeals to the nerd in me. If you’re looking for “the short version,” you won’t find it here, because I find the math, physics, and theory of tuning to be fascinating, and I want to share that and not just the quick how-to.

If you practice and train yourself to hear in the correct way, with a little time you’ll be able to tune a guitar by just striking the open strings, without using harmonics or frets. You’ll be able to do this quickly, and the result will be a guitar that sounds truly active, alive, energetic, amazing — much better results than you’ll get with a digital tuner. As a bonus, you’ll impress all of your friends.

My Personal History With Tuning

When I was a child my mother hired a piano tuner who practiced the “lost art” of tuning entirely by ear. His name was Lee Flory. He was quite a character; he’d tuned for famous concert pianists all over the world, toured with many of them, and had endless stories to tell about his involvement with all sorts of musicians in many genres, including bluegrass and country/western greats. My mother loved the way the piano sounded when he tuned it. It sang. It was alive. It was joyous.

For whatever reason, Lee took an interest in me, and not only tolerated but encouraged my fascination with tuning. I didn’t think about it at the time, but I’m pretty sure he scheduled his visits differently to our house. I think he allowed extra time so that he could spend an hour or more explaining everything to me, playing notes, coaching me to hear subtleties.

And thus my love affair with the math, physics, and practice of tuning began.


The first great secret is that tuning isn’t about listening to the pitch of notes. While tuning, you don’t try to judge whether a note is too high or too low. You listen to something called beats instead.

Beats are fluctuations in volume created by two notes that are almost the same frequency.

When notes are not quite the same frequency, they’ll reinforce each other when the peaks occur together, and cancel each other out when the peaks are misaligned. Here’s a diagram of two sine waves of slightly different frequencies, and the sum of the two (in red).


Your ear will not hear two distinct notes if they’re close together. It’ll hear the sum.

Notice how the summed wave (the red wave) fluctuates in magnitude. To the human ear, this sounds like a note going “wow, wow, wow, wow.” The frequency of this fluctuation is the difference between the frequencies of the notes.

This is the foundation of all tuning by ear that isn’t based on guesswork.
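
To see the arithmetic behind that “wow, wow, wow” effect, here’s a small sketch in Python (the 440/438 Hz pair is just an illustrative choice):

```python
import math

# Two notes that are almost, but not quite, the same frequency.
f1, f2 = 440.0, 438.0  # hypothetical example values

# The ear hears the sum of the two sine waves. By the identity
# sin(a) + sin(b) = 2 * cos((a - b)/2) * sin((a + b)/2),
# the sum is a tone at the average frequency whose volume swells
# and fades under a slow cosine envelope.
def combined(t):
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

# The envelope peaks |f1 - f2| times per second: the beat rate.
beat_rate = abs(f1 - f2)
print(beat_rate)  # 2.0 beats per second
```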

Before you go on, tune two strings close together on your guitar or other instrument, and listen until you can hear it. Or, just fret one string so it plays the same note as an open string, and strike them together. Bend the string you’ve fretted, a little less, a little more. Listen until you hear the beats.

Bending String

The Math of Pitch

Musical notes have mathematical relationships to one another. The exact relationships depend on the tuning. There are many tunings, but in this article I’ll focus on the tuning used for nearly all music in modern Western cultures: the 12-tone equal temperament tuning.

In this tuning, the octave is the fundamental interval of pitch. Notes double in frequency as they rise an octave, and the frequency ratio between each adjacent pair of notes is constant. Since there are twelve half-steps in an octave, that ratio is the twelfth root of 2, or about 1.059463094359293.

Staying with Western music, where we define the A above middle C to have the frequency of 440Hz, the scale from A220 to A440 is as follows:

Note     Frequency
=======  =========
A220     220.0000
A-sharp  233.0819
B        246.9417
C        261.6256
C-sharp  277.1826
D        293.6648
D-sharp  311.1270
E        329.6276
F        349.2282
F-sharp  369.9944
G        391.9954
G-sharp  415.3047
A440     440.0000

We’ll refer back to this later.
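
If you want to check the table yourself, the whole scale falls out of one constant. Here’s a quick sketch in Python:

```python
# Rebuild the A220-to-A440 table: each half-step multiplies
# the frequency by the twelfth root of 2.
SEMITONE = 2 ** (1 / 12)  # about 1.059463094359293

notes = ["A220", "A-sharp", "B", "C", "C-sharp", "D", "D-sharp",
         "E", "F", "F-sharp", "G", "G-sharp", "A440"]
for i, name in enumerate(notes):
    print(f"{name:8s} {220.0 * SEMITONE ** i:9.4f}")
# Twelve half-steps double the frequency, so the last line is
# A440 at 440.0000.
```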

The Math Of Intervals

If you’ve ever sung in harmony or played a chord, you’ve used intervals. Intervals are named for the relative distance between two notes: a minor third, a fifth, and so on. These are a little confusing, because they sound like fractions. They’re not. A fifth doesn’t mean that one note is five times the frequency of another. A fifth means that if you start on the first note and count upwards five notes on a major scale, you’ll reach the second note in the interval. Here’s the C scale, with the intervals between the lowest C and the given note listed at the right:

Note  Name  Interval from C
====  ====  ===============
C     Do    Unison
D     Re    Major 2nd
E     Mi    Major 3rd
F     Fa    4th (sometimes called Perfect 4th)
G     So    5th (a.k.a. Perfect 5th)
A     La    Major 6th
B     Ti    Major 7th
C     Do    Octave (8th)

On the guitar, adjacent strings form intervals of fourths, except for the interval between the G and B strings, which is a major third.

Some intervals sound “good,” “pure,” or “harmonious.” A major chord, for example, is composed of the root (first note), major third, fifth, and octave. The chord sounds good because the intervals between the notes sound good. There’s a variety of intervals at play: between the third and fifth is a minor third, between the fifth and octave is a fourth, and so on.

It turns out that the intervals that sound the most pure and harmonious are the ones whose frequencies have the simplest relationships. In order of increasing complexity, we have:

  • Unison: two notes of the same frequency.
  • Octave: the higher note is double the frequency.
  • Fifth: the higher note is 3/2 the frequency.
  • Fourth: the higher note is 4/3rds the frequency.
  • Third: the higher note is 5/4ths the frequency.
  • Further intervals (minor thirds, sixths, etc) have various relationships, but the pattern of N/(N-1) doesn’t hold beyond the third.
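
As a preview of that complexity, you can compare these pure ratios with the equal-tempered scale defined earlier (a sketch; the semitone counts are the standard ones: seven half-steps to a fifth, five to a fourth, four to a major third):

```python
# Pure ratio vs. the equal-tempered version of each interval.
pure = [("fifth", 3 / 2, 7), ("fourth", 4 / 3, 5), ("major third", 5 / 4, 4)]
for name, ratio, half_steps in pure:
    tempered = 2 ** (half_steps / 12)
    print(f"{name:12s} pure={ratio:.5f}  tempered={tempered:.5f}")
# The tempered fifth comes out slightly below 1.5, and the tempered
# fourth and third slightly above 4/3 and 5/4 -- none are exact.
```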

These relationships are important for tuning, but beyond here it gets significantly more complex. This is where things are most interesting!

Overtones and Intervals

As a guitar player, you no doubt know about “harmonics,” also called overtones. You produce a harmonic by touching a string gently at a specific place (above the 5th, 7th, or 12th fret, for example) and plucking the string. The note that results sounds pure, and is higher pitched than the open string.


Strings vibrate at a base frequency, but these harmonics (they’re actually partials, but I’ll cover that later) are always present. In fact, much of the sound energy of a stringed instrument is in overtones, not in the fundamental frequency. When you “play a harmonic” you’re really just damping out most of the frequencies and putting more energy into simpler multiples of the fundamental frequency.

Overtones are basically multiples of the fundamental frequency. The octave, for example, is twice the frequency of the open string. Touching the string at the 12th fret is touching it at its halfway point. This essentially divides the string into two strings of half the length. The frequency of the note is inversely dependent on the string’s length, so half the length makes a note that’s twice the frequency. The seventh fret is at 1/3rd the length of the string, so the note is three times the frequency; the 5th fret is ¼th the length, so you hear a note two octaves higher, and so on.
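
The fret arithmetic above can be checked directly (using 82.41 Hz for the low E, the value that also appears in the string table later in the post):

```python
# Touching the string at 1/n of its length isolates a vibration
# at n times the open-string frequency.
open_e = 82.41  # low E string frequency in Hz

for n, spot in [(2, "12th fret"), (3, "7th fret"), (4, "5th fret")]:
    print(f"{spot}: {open_e * n:.2f} Hz")
# 12th fret: 164.82 Hz (one octave up)
# 7th fret: 247.23 Hz (an octave and a fifth up)
# 5th fret: 329.64 Hz (two octaves up)
```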

The overtones give the instrument its characteristic sound. How many of them there are, their frequencies, their volumes, and their attack and decay determines how the instrument sounds. There are usually many overtones, all mixing together into what you usually think of as a single note.

Tuning depends on overtones, because you can tune an interval by listening to the beats in its overtones.

Take a fifth, for example. Recall from before that the second note in the fifth is 3/2 the frequency of the first. Let’s use A220 as an example; a fifth up from A220 is E330. E330 times two is E660, and A220 times three is E660 also. So by listening to the first overtone of the E, and the second overtone of the A, you can “hear a fifth.”

You’re not really hearing the fifth, of course; you’re really hearing the beats in the overtones of the two notes.

Practice Hearing Intervals

Practice hearing the overtones in intervals. Pick up your guitar and de-tune the lowest E string down to a D. Practice hearing its overtones. Pluck a harmonic at the 12th fret and strike your open D string; listen to the beats between the notes. Now play both strings open, with no harmonics, at the same time. Listen again to the overtones, and practice hearing the beats between them. De-tune slightly if you need to, to make the “wow, wow, wow, wow” effect easier to notice.

Take a break; don’t overdo it. Your ear will probably fatigue quickly and you’ll be unable to hear the overtones, especially as you experiment more with complex intervals. In the beginning, you should not be surprised if you can focus on these overtones for only a few minutes before it gets hard to pick them out and things sound jumbled together. Rest for a few hours. I would not suggest doing this more than a couple of times a day initially.

The fatigue is real, by the way. As I mentioned previously, being able to hear beats and ignore the richness of the sound to pick out weak overtones is a complex physical, mental, and neurological skill — and there are probably other factors too. I’d be interested in seeing brain scans of an accomplished tuner at work. Lee Flory was not young, and he told me that his audiologist said his hearing had not decayed with age. This surprised the doctor, because he spent his life listening to loud sounds. Lee attributed this to daily training of his hearing, and told me that the ear is like any other part of the body: it can be exercised. According to Lee, if he took even a single day’s break from tuning, his ear lost some of its acuity.

Back to the topic: When you’re ready, pluck a harmonic on the lowest D string (formerly the E string) at the 7th fret, and the A string at the 12th fret, and listen to the beats between them. Again, practice hearing the same overtones (ignoring the base notes) when you strike both open strings at the same time.

When you’ve heard this, you can move on to a 4th. You can strike the harmonic at the 5th fret of the A string and the 7th fret of the D string, for example, and listen to the beats; then practice hearing the same frequencies by just strumming those two open strings together.


As you do all of these exercises, try your best to ignore pitch (highness or lowness) of the notes, and listen only to the fluctuations in volume. In reality you’ll be conscious of both pitch and beats, but this practice will help develop your tuning ear.

Imperfect Intervals and Counting Beats

You may have noticed that intervals in the equal-tempered 12-tone tuning don’t have exactly the simple relationships I listed before. If you look at the table of frequencies above, for example, you’ll see that in steps of the 12th root of 2, E has a frequency of 329.6276Hz, not 330Hz.

Oh no! Was it all a lie? Without these relationships, does tuning fall apart?


Not really. In the equal-tempered tuning, in fact, there is only one perfect interval: the octave. All other intervals are imperfect, or “tempered.”

  • The 5th is a little “narrow” – the higher note in the interval is slightly flat
  • The 4th is a little “wide” – the higher note is sharp
  • The major 3rd is even wider than the 4th

Other intervals are wide or narrow, just depending on where their frequencies fall on the equal-tempered tuning. (In practice, you will rarely or never tune intervals other than octaves, 5ths, 4ths, and 3rds.)

As the pitch of the interval rises, so does the frequency of the beats. The 4th between A110 and the D above it will beat half as fast as the 4th an octave higher.

What this means is that not only do you need to hear beats, but you need to count them. Counting is done in beats per second. It sounds insanely hard at first (how the heck can you count 7.75 beats a second!?) but it will come with practice.

You will need to know how many beats wide or narrow a given interval will be. You can calculate it easily enough, and I’ll show examples later.

After a while of tuning a given instrument, you’ll just memorize how many beats to count for specific intervals, because as you’ll see, there’s a system for tuning any instrument. You generally don’t need to have every arbitrary interval memorized. You will use only a handful of intervals and you’ll learn their beats.
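
As an example of the calculation (a sketch; the helper name is mine), here is the claim from above that a 4th’s beat rate doubles when the interval moves up an octave:

```python
# Beat rate of an equal-tempered 4th: the 3rd overtone of the
# upper note beats against the 4th overtone of the lower note.
def fourth_beat_rate(lower_hz):
    upper_hz = lower_hz * 2 ** (5 / 12)   # a 4th spans five half-steps
    return upper_hz * 3 - lower_hz * 4    # positive means "wide"

print(round(fourth_beat_rate(110.0), 2))  # A110 to D: 0.5 beats/sec
print(round(fourth_beat_rate(220.0), 2))  # an octave up: 0.99, twice as fast
```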

Tuning The Guitar

With all that theory behind us, we can move on to a tuning system for the guitar.

Let’s list the strings, their frequencies, and some of their overtones.

String  Freq    Overtone_2  Overtone_3  Overtone_4  Overtone_5
======  ======  ==========  ==========  ==========  ==========
E       82.41   164.81      247.22      329.63      412.03
A       110.00  220.00      330.00      440.00      550.00
D       146.83  293.66      440.50      587.33      734.16
G       196.00  392.00      587.99      783.99      979.99
B       246.94  493.88      740.82      987.77      1234.71
E       329.63  659.26      988.88      1318.51     1648.14

Because the open strings of the guitar form 4ths and one 3rd, you can tune the guitar’s strings open, without any frets, using just those intervals. There’s also a double octave from the lowest E to the highest E, but you don’t strictly need to use that except as a check after you’re done.

For convenience, here’s the same table with only the overtones we’ll use.

String  Freq    Overtone_2  Overtone_3  Overtone_4  Overtone_5
======  ======  ==========  ==========  ==========  ==========
E       82.41               247.22      329.63 
A       110.00              330.00      440.00
D       146.83              440.50      587.33      734.16
G       196.00              587.99                  979.99
B       246.94              740.82      987.77
E       329.63              988.88      
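
The beat rates used in the tuning steps below all come from this table. Here’s a sketch that derives them from the A string alone (the string names and semitone offsets are those of standard guitar tuning):

```python
# Open-string frequencies in equal temperament, relative to A = 110 Hz.
SEMI = 2 ** (1 / 12)
A = 110.0
freq = {"E2": A / SEMI ** 5, "A": A, "D": A * SEMI ** 5,
        "G": A * SEMI ** 10, "B": A * SEMI ** 14, "E4": A * SEMI ** 19}

# (lower string, its overtone, upper string, its overtone)
pairs = [("E2", 4, "A", 3), ("A", 4, "D", 3), ("D", 4, "G", 3),
         ("G", 5, "B", 4), ("B", 4, "E4", 3)]
for lo, n_lo, hi, n_hi in pairs:
    beats = freq[hi] * n_hi - freq[lo] * n_lo  # positive = wide
    print(f"{lo}-{hi}: {beats:+.2f} beats/sec")
# E2-A: +0.37, A-D: +0.50, D-G: +0.66, G-B: +7.78, B-E4: +1.12
```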

Tuning the A String

The first thing to do is tune one of the strings to a reference pitch. After that, you’ll tune all of the other strings relative to this first one. On the guitar, the most convenient reference pitch is A440, because the open A string is two octaves below at 110Hz.

You’ll need a good-quality A440 tuning fork. I prefer a Wittner for guitar tuning; it’s a good-quality German brand that is compact, so it fits in your guitar case’s pocket, and has a small notch behind the ball at the end of the stem, so it’s easy to hold in your teeth if you prefer that.

Wittner A440 Tuning Fork

Strike the tuning fork lightly with your fingernail, or tap it gently against your knee. Don’t bang it against anything hard or squeeze the tines, or you might damage it and change its pitch. You can hold the tuning fork against the guitar’s soundboard, or let it rest lightly between your teeth so the sound travels through your skull to your ears, and strike the open A string. Tune the A string until the beats disappear completely. Now put away the tuning fork and continue. You won’t adjust the A string after this.

If you don’t have a tuning fork, you can use any other reference pitch, such as the A on a piano, or a digitally produced A440.

Tuning the Low E String

Strike the open low E and A strings together, and tune the E string. Listen to the beating of the overtones at the frequency of the E two octaves higher. If you have trouble hearing it, silence all the strings, then pluck a harmonic on the E string at the 5th fret. Keep that tone in your memory and then sound the two strings together. It’s important to play the notes together, open, simultaneously so that you don’t get confused by pitches. Remember, you’re trying to ignore pitch completely, and get your ear to isolate the sound of the overtone, ignoring everything but its beating.

When correctly tuned, the A string’s overtone will be at 330Hz and the E string’s will be at 329.63Hz, so the interval is 1/3rd of a beat per second wide. That is, you can tune the E string until the beats disappear, and then flatten the low E string very slightly until you hear one beat every three seconds. The result will be a very slow “wwwoooooowww, wwwwoooooowww” beating.

Tuning the D String

Now that the low E and A strings are tuned, strike the open A and D strings together. You’re listening for beats in the high A440 overtone. The A string’s overtone will be at 440Hz, and the D string’s will be at 440.50Hz, so the interval should be ½ beat wide. Tune the D string until the beats disappear, then sharpen the D string slightly until you hear one beat every 2 seconds.

Tuning the G String

Continue by striking the open D and G strings, and listen for the high D overtone’s beating. Again, if you have trouble “finding the note” with your ear, silence everything and strike the D string’s harmonic at the 5th fret. You’re listening for a high D overtone, two octaves higher than the open D string. The overtones will be at 587.33Hz and 587.99Hz, so the interval needs to be 2/3rds of a beat wide. Counting two beats every three seconds is a little harder than the other intervals we’ve used thus far, but it will come with practice. In the beginning, feel free to just give it your best wild guess. As we’ll discuss a little later, striving for perfection is futile anyway.

Tuning the B String

Strike the open G and B strings. The interval between them is a major 3rd, so this one is trickier to hear. A major 3rd’s frequency ratio is approximately 5/4ths, so you’re listening for the 5th overtone of the G string and the 4th overtone of the B string. Because these are higher overtones, they’re not as loud as the ones you’ve been using thus far, and it’s harder to hear.

To isolate the note you need to hear, mute all the strings and then pluck a harmonic on the B string at the 5th fret. The overtone is a B two octaves higher. Search around on the G string near the 4th fret and you’ll find the same note.

The overtones are 979.99Hz and 987.77Hz, so the interval is seven and three-quarters beats wide. This will be tough to count at first, so just aim for something about 8 beats and call it good enough. With time you’ll be able to actually count this, but it will be very helpful at first to use some rules of thumb. For example, you can compare the rhythm of the beating to the syllables in the word “mississippi” spoken twice per second, which is probably about as fast as you can say it back-to-back without pause.

Tune the B string until the beats disappear, then sharpen it 8 beats, more or less.

Tuning the High E String

You’re almost done! Strike the open B and E strings, and listen for the same overtone you just used to tune the G and B strings: a high B. The frequencies are 987.77Hz and 988.88Hz, so the interval is 1.1 beats wide. Sharpen the E string until the high B note beats a little more than once a second.

Testing The Results

Run a couple of quick checks to see whether you got things right. First, check your high E against your low E. They are two octaves apart, so listen to the beating of the high E string. It should be very slow or nonexistent. If there’s a little beating, don’t worry about it. You’ll get better with time, and it’ll never be perfect anyway, for reasons we’ll discuss later.

You can also check the low E against the open B string, and listen for beating at the B note, which is the 3rd overtone of the E string. The B should be very slightly narrow (flat) — theoretically, you should hear about ¼th of a beat.

Also theoretically, you could tune the high B and E strings against the low open E using the same overtones. However, due to imperfections in strings and the slowness of the beating, this is usually much harder to do, and you’ll likely end up with high strings that don’t sound good together. A general rule of thumb is that it’s easier to hear out-of-tune-ness in notes that are a) closer in pitch and b) higher pitched, so you should generally “tune locally” rather than “tuning at a distance.” If you don’t get the high strings tuned well together, you’ll get really ugly-sounding intervals such as the following:

  • the 5th between your open G string and the D on the 3rd fret of the B string
  • the 5th between the A on the second fret of the G string and the open high E string
  • the octave between your open G string and the G on the 3rd fret of the high E string
  • the octave between your open D string and the D on the 3rd fret of the B string
  • the 5th between the E on the second fret of the D string and the open B string

If those intervals are messed up, things will sound badly discordant. Remember that the 5ths should be slightly narrow, not perfect. But the octaves should be perfect, or very nearly so.

Play a few quick chords to test the results, too. An E major, G major, and B minor are favorites of mine. They have combinations of open and fretted notes that help make it obvious if anything’s a little skewed.


You’re Done!

With time, you’ll be able to run through this tuning system very quickly, and you’ll end up with a guitar that sounds joyously alive in all keys, no matter what chord you play. No more fussing with “this chord sounds good, but that one is awful!” No more trial and error. No more guessing which string is out of tune when something sounds bad. No more game of “tuning whack-a-mole.”

To summarize:

  • Tune the A string with a tuning fork.
  • Tune the low E string 1/3 of a beat wide relative to the A.
  • Tune the D string ½ of a beat wide relative to the A.
  • Tune the G string 2/3 of a beat wide relative to the D.
  • Tune the B string 7 ¾ beats wide relative to the G.
  • Tune the high E string just over 1 beat wide relative to the B.
  • Cross-check the low and high E strings, and play a few chords.

This can be done in a few seconds per string.
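The arithmetic behind these beat counts is easy to check. Here is a short Python sketch (assuming standard tuning and equal temperament from A = 440Hz, as used throughout this post) that computes the nearly-coincident overtones for each adjacent pair of open strings and the resulting beat rates:

```python
# Beat rates between adjacent open guitar strings in equal temperament.
# Each interval is heard at a pair of nearly-coincident overtones:
# harmonic m of the lower string vs harmonic n of the upper string.
A4 = 440.0  # reference pitch

def freq(semitones_from_a4):
    """Equal-tempered frequency a given number of semitones from A440."""
    return A4 * 2 ** (semitones_from_a4 / 12)

# Open strings: name -> semitones below A440
strings = {"E2": -29, "A2": -24, "D3": -19, "G3": -14, "B3": -10, "E4": -5}

# (lower string, upper string, lower harmonic, upper harmonic)
pairs = [("E2", "A2", 4, 3), ("A2", "D3", 4, 3), ("D3", "G3", 4, 3),
         ("G3", "B3", 5, 4), ("B3", "E4", 4, 3)]

for lo, hi, m, n in pairs:
    f_lo = m * freq(strings[lo])
    f_hi = n * freq(strings[hi])
    print(f"{lo}-{hi}: {f_lo:7.2f} vs {f_hi:7.2f} Hz -> "
          f"{abs(f_hi - f_lo):.2f} beats/sec")
```

The printed rates match the summary above: roughly 1/3, 1/2, and 2/3 of a beat per second for the fourths, close to 8 for the G-B major third, and just over 1 for B-E.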

If you compare your results to a digital tuner’s, you’ll find that with practice, your ear is much better. It’s very hard to tune within a Hz or so with a digital tuner, partly because the indicators are hard to read. What you’ll get with a digital tuner is that most strings end up pretty close to their correct frequencies. That’s still a lot better than the ad-hoc trial-and-error tuning you might be accustomed to, which tunes some intervals to sound good while leaving others badly discordant. The usual scenario I see is that someone’s B string is in good shape, but the G and the E are out of tune. The guitarist then tunes the B string relative to the out-of-tune E and G, and then everything sounds awful, because there was no frame of reference for understanding which strings were out of tune in which directions.

But when you tune by listening to beats, and get good at it, you’ll be able to tune strings to within a fraction of a cycle per second of what they should be. Your results will absolutely be better than a digital tuner.

I don’t mean to dismiss digital tuners. They’re very useful when you’re in a noisy place, or when you’re tuning things like electric guitars, which have distortion that buries overtones in noise. But if you learn to tune by hearing beats, you’ll be the better for it, and you’ll never regret it, I promise. By the way, if you have an Android smartphone, I’ve had pretty good results with the gStrings app.


Advanced Magic

If you do the math on higher overtones, you’ll notice a few other interesting intervals between open strings. As your ear sharpens, you’ll be able to hear these, and use them to cross-check various combinations of strings. This can be useful because as you get better at hearing overtones and beats, you’ll probably start to become a bit of a perfectionist, and you won’t be happy unless particular intervals (such as the 5ths and octaves mentioned just above) sound good. Here they are:

  • Open A String to Open B String. The 9th overtone of the open A string is a high B note at 990Hz, and the 4th overtone of the open B is a high B at 987.77Hz. If you can hear this high note, you should hear it beating just over twice per second. The interval between the A and B strings is a major ninth, which should be slightly narrow. Thus, if you tune the B until the beating disappears, you should then flatten it until you hear about two beats per second.
  • Open D String to Open E String. This is also a major ninth. You’re listening for a very high E note, at 1321.5Hz on the D string, and 1318.5Hz on the E string, which is 3 beats narrow.
  • Open D String to Open B String. The 5th overtone of the D string nearly coincides with the 3rd overtone of the B string. This interval is about 6 and 2/3 beats wide. This is a bit hard to hear at first, but you’re listening for a high F-sharp.
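The cross-check rates above follow from the same overtone arithmetic as before. A small Python sketch (same A = 440Hz equal-temperament assumptions):

```python
# Beat rates for the "advanced" cross-check intervals between
# non-adjacent open strings (equal temperament, A = 440 Hz).
A4 = 440.0

def freq(semitones_from_a4):
    return A4 * 2 ** (semitones_from_a4 / 12)

# (lower string, semitones, harmonic, upper string, semitones, harmonic)
checks = [
    ("A2", -24, 9, "B3", -10, 4),  # major ninth, heard at a high B
    ("D3", -19, 9, "E4",  -5, 4),  # major ninth, heard at a high E
    ("D3", -19, 5, "B3", -10, 3),  # major sixth, heard at a high F-sharp
]

for lo, s_lo, m, hi, s_hi, n in checks:
    beats = abs(m * freq(s_lo) - n * freq(s_hi))
    print(f"{lo} (harmonic {m}) vs {hi} (harmonic {n}): {beats:.2f} beats/sec")
```

The three rates come out near 2.2, 3.0, and 6.7 beats per second, matching the bullets above.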

Systems for Tuning Arbitrary Instruments

The guitar is a fairly simple instrument to tune, because it has only 6 strings, and 4ths are an easy interval to tune. The inclusion of a major 3rd makes it a little harder, but not much.

It is more complicated, and requires more practice, to tune instruments with more strings. The most general approach is to choose an octave, and to tune all the notes within it. Then you extend the tuning up and down the range as needed. For example, to tune the piano you first tune all the notes within a C-to-C octave (piano tuners typically use a large middle-C tuning fork).

Once you have your first octave tuned, the rest is simple. Each note is tuned to the octave below it or above it. But getting that first octave is a bit tricky.

There are two very common systems of tuning: fourths and fifths, and thirds and fifths. As you may know, the circle of fifths takes you through every note in the 12-note scale, and you can traverse the notes in various ways.

The system of thirds and fifths proceeds from middle C up a fifth to G, down a third to E-flat, up a fifth to B-flat, and so on. The system of fourths and fifths goes from C up a fifth to G, down a fourth to D, and so on.

All you need to do is calculate the beats in the various intervals and be able to count them. The piano tuners I’ve known prefer thirds and fifths because if there are imperfections in the thirds, especially if they’re not as wide as they should be, it sounds truly awful. Lively-sounding thirds are important; fourths and fifths are nearly perfect, and should sound quite pure, but a third is a complex interval with a lot of different things going on. Fourths and fifths also beat slowly enough that it’s easy to misjudge and get an error that accumulates as you go through the 12 notes. Checking the tuning with thirds helps avoid this.
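To see why tuners lean on thirds, compare the beat rates of the tempered intervals around middle C. A quick Python sketch (equal temperament from A = 440Hz; the particular intervals are just illustrative):

```python
# Beat rates of intervals used when setting a piano's reference octave
# (equal temperament, A = 440 Hz). Fourths and fifths beat slowly;
# major thirds beat fast, which makes errors in them easy to hear.
A4 = 440.0

def freq(semitones_from_a4):
    return A4 * 2 ** (semitones_from_a4 / 12)

C4 = freq(-9)  # middle C

intervals = [
    ("major 3rd C4-E4", 5 * C4, 4 * freq(-5)),  # heard at a high E
    ("fourth    C4-F4", 4 * C4, 3 * freq(-4)),  # heard at a high C
    ("fifth     C4-G4", 3 * C4, 2 * freq(-2)),  # heard at a high G
]

for name, f_lo, f_hi in intervals:
    print(f"{name}: {abs(f_hi - f_lo):.2f} beats/sec")
```

The third beats over ten times per second, while the fourth and fifth beat around once per second or less; a small error in a slow-beating interval is easy to miss, but the fast-beating third exposes it.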

Tuning a Hammered Dulcimer

I’ve built several many-stringed instruments, including a couple of hammered dulcimers. My first was a home woodworking project with some two-by-four lumber, based on plans from a book by Phillip Mason I found at the library and decided to pick up on a whim. For a homebuilt instrument, it sounded great, and building an instrument like this is something I highly recommend.

Later I designed and built a second one, pictured below. Pardon the dust!


Tuning this dulcimer takes a while. I start with an octave on the bass course. Dulcimers can have many different tunings; this one follows the typical tuning of traditional dulcimers, which is essentially a set of changing keys that cycle backwards by fifths as you climb the scale. Starting at G, for example, you have a C major scale up to the next G, centered around middle C. But the next B is B-flat instead of B-natural, so there’s an F major scale overlapping with the top of the C major, and so on:

G A B C D E F G A B-flat C D...

It’s easy to tune this instrument in fourths and fifths because of the way its scales are laid out. If I do that, however, I find that I have ugly-sounding thirds more often than not. So I’ll tune by combinations of fifths, fourths, and thirds:

G A B C D E F G A B-flat C D...
^-------------^                 (up an octave)
      ^-------^                 (down a fifth)
      ^---^                     (up a third)
  ^-------^                     (down a fifth)

And so on. In addition to using thirds where I can (G-B, C-E), I’ll check my fifths and fourths against each other. If you do the math, you’ll notice that the fourth from G to C is exactly as wide as the fifth from C to G again is narrow. (This is a general rule of fourths and fifths. Another rule is that the fourth at the top of the octave beats twice as fast as the fifth at the bottom; so G-D beats half as fast as D-G.)
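Both rules of thumb are easy to verify numerically. A Python sketch, using the same equal-temperament arithmetic as earlier:

```python
# Numeric check of two rules of thumb for fourths and fifths:
# 1) the tempered fourth is as wide (in cents) as the fifth is narrow;
# 2) the fourth at the top of an octave beats twice as fast as the
#    fifth at the bottom (here D4-G4 vs G3-D4).
import math

A4 = 440.0

def freq(semitones_from_a4):
    return A4 * 2 ** (semitones_from_a4 / 12)

def cents(ratio):
    return 1200 * math.log2(ratio)

fourth_dev = cents(2 ** (5 / 12)) - cents(4 / 3)   # tempered minus just
fifth_dev  = cents(3 / 2) - cents(2 ** (7 / 12))   # just minus tempered
print(f"fourth wide by {fourth_dev:.3f} cents, "
      f"fifth narrow by {fifth_dev:.3f} cents")

G3, D4, G4 = freq(-14), freq(-7), freq(-2)
fifth_beats  = abs(3 * G3 - 2 * D4)   # G3-D4, heard at a high D
fourth_beats = abs(4 * D4 - 3 * G4)   # D4-G4, heard an octave higher
print(f"fifth G3-D4: {fifth_beats:.2f} beats/sec, "
      f"fourth D4-G4: {fourth_beats:.2f} beats/sec")
```

Both deviations come out to about 1.955 cents, and the fourth beats exactly twice as fast as the fifth.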

When I’m done with this reference octave, I’ll extend it up the entire bass course, adjusting for B-flat by tuning it relative to F, and checking any new thirds that I encounter as I climb the scale. And then I’ll extend that over to the right-hand side of the treble course. I do not use the left-hand (high) side of the treble course to tune, because its notes are inaccurate depending on the placement of the bridge.

With a little math (spreadsheets are nice), and some practice, you can find a quick way to tune almost any instrument, along with cross-checks to help prevent skew as you go.

Tuning a Harp

Another instrument I built (this time with my grandfather) is a simplified replica of the Scottish wire-strung Queen Mary harp. This historical instrument might have been designed for gold and silver strings, according to Ann Heymann’s research. In any case, it is quite difficult to tune with bronze or brass strings. It is “low-headed” and would need a much higher head to work well with bronze or brass.


Tuning this harp is quite similar to the hammered dulcimer, although it is in a single key, so there’s no need to adjust to key changes as you climb the scale. A simple reference octave is all you need, and then it’s just a matter of extending it. I have never tuned a concert harp, but I imagine it’s more involved.

Tangent: I first discovered the wire-strung harp in 1988, when I heard Patrick Ball’s first volume of Turlough O’Carolan’s music. If you have not listened to these recordings, do yourself a favor and at least preview them on Amazon. All these years later, I still listen to Patrick Ball’s music often. His newest recording, The Wood of Morois, is just stunning. I corresponded with Patrick while planning to build my harp, and he put me in touch with master harpmaker Jay Witcher, and his own role model, Ann Heymann, who was responsible for reinventing the lost techniques of playing wire-strung harps. Her recordings are a little hard to find in music stores, but are worth it. You can buy them from her websites http://www.clairseach.com/, http://www.annheymann.com/, and http://www.harpofgold.net/. If you’re interested in learning to play wire-strung harp, her book is one of the main written sources. There are a variety of magazines covering the harp renaissance in the latter part of the 20th century, and they contain much valuable additional material.

Beyond Tuning Theory: The Real World

Although simple math can compute the theoretically correct frequencies of notes and their overtones, and thus the beats of various intervals, in practice a number of factors make things more complicated and interesting. In fact, the math up until now has been of the “frictionless plane” variety. For those who are interested, I’ll dig deeper into these nuances.

The nuances and deviations from perfect theory are the main reasons why a) it’s impossible to tune anything perfectly and b) an instrument that’s tuned skillfully by ear sounds glorious, whereas an instrument tuned digitally can sound lifeless.

Harmonics, Overtones, and Partials

I was careful to use the term “overtone” most of the time previously. In theory, a string vibrates at its fundamental frequency, and then it has harmonic overtones at twice that frequency, three times, and so on.

However, that’s not what happens in practice, because theory only applies to strings that have no stiffness. The stiffness of the string causes its overtones to vibrate at slightly higher frequencies than you’d expect. For this reason, these overtones aren’t true harmonics. This is called inharmonicity, and inharmonic overtones are called partials to distinguish them from the purely harmonic overtones of an instrument like a flute, which doesn’t exhibit the same effect.
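A common first-order model for this stretching (an assumption here, not something derived in this post) is f_n = n · f0 · √(1 + B·n²), where B is a small inharmonicity coefficient that depends on the string’s stiffness, length, and tension. A Python sketch with an illustrative value of B:

```python
# Stretched partials of a stiff string, using the standard first-order
# approximation f_n = n * f0 * sqrt(1 + B * n^2). The coefficient B
# below is illustrative only; real strings vary widely.
import math

f0 = 110.0   # fundamental, e.g. an open A string
B = 0.0004   # illustrative inharmonicity coefficient

for n in range(1, 9):
    harmonic = n * f0                            # ideal (stiffness-free)
    partial = n * f0 * math.sqrt(1 + B * n * n)  # stretched partial
    print(f"partial {n}: {partial:8.2f} Hz "
          f"(+{partial - harmonic:5.2f} Hz vs true harmonic)")
```

Note how the stretching grows with the partial number: the 8th partial here runs more than 11Hz sharp of the true harmonic, while the 2nd is barely sharp at all. That asymmetry is exactly why tuning by higher partials gives different results than tuning by lower ones.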

You might think that this inharmonicity is a bad thing, but it’s not. Common sources of pronounced inharmonicity are bells (which often have so much inharmonicity that you can hear that the pitches of their partials are too high) and various types of chimes. I keep a little “zenergy” chime near my morning meditation table because its bright tones focus my attention. I haven’t analyzed its spectrum, but because it is made with thick bars of aluminum, I’m willing to bet that it has partials that are wildly inharmonic. Yet it sounds pure and clear.


Much of the richness and liveliness of a string’s sound is precisely because of the “stretched” overtones. Many people compare Patrick Ball’s brass-strung wire harp to the sound of bells, and say it’s “pure.” It may sound pure, but pure-sounding is not simple-sounding. Its tones are complex and highly inharmonic, which is why it sounds like a bell.

In fact, if you digitally alter a piano’s overtones to correct the stretching, you get something that sounds like an organ, not a piano. This is one of the reasons that pianos tuned with digital tuners often sound like something scraped from the bottom of a pond.

Some digital tuners claim to compensate for inharmonicity, but in reality each instrument and its strings are unique and will be inharmonic in different ways.

Some practical consequences when tuning by listening to beats:

  • Don’t listen to higher partials while tuning. When tuning an octave, for example, you should ignore the beating of partials 2 octaves up. This is actually quite difficult to do and requires a well-developed ear. The reason is that higher partials will beat even when the octave is perfect, and they beat more rapidly and more obviously than the octave. Tuning a perfect octave requires the ability to hear very subtle, very gradual beats while blocking out distractions. This is also why I said not to worry if your low E string and high E string beat slightly. When tuned as well as possible, there will probably be a little bit of beating.
  • You might need to ignore higher partials in other intervals as well.
  • You might need to adjust your tuning for stretching caused by inharmonicity. In practice, for example, most guitars need to be tuned to slightly faster beats than you’d expect from pure theory.
  • Cross-checking your results with more complex intervals (especially thirds) can help balance the stretching better, and make a more pleasing-sounding tuning.
  • You might find that when using the “advanced tricks” I mentioned for the guitar, the open intervals such as minor 7ths will beat at different rates than you’d predict mathematically. However, once you are comfortable tuning your guitar so it sounds good, you’ll learn how fast those intervals should beat and it’ll be a great cross-reference for you.

Sympathetic and False Beats

It’s often very helpful to mute strings while you’re tuning other strings. The reason is that the strings you’re tuning will set up sympathetic vibrations in other strings that have similar overtones, and this can distract you.

When tuning the guitar, this generally isn’t much of a problem. However, be careful that when you tune the low E and A strings you don’t get distracted by vibrations from the high E string.

When tuning other instruments such as a hammered dulcimer or harp, small felt or rubber wedges (with wire handles if possible) are invaluable. If you don’t have these, you can use small loops of cloth.

In addition to distraction from sympathetic vibrations, strings can beat alone, when no other note is sounding. This is called a false beat. It’s usually caused by a flaw in the string itself, such as an imperfection in the wire or a spot of rust. This is a more difficult problem, because you can’t just make it go away. Instead, you will often have to nudge the tuning around a little here, a little there, to make it sound the best you can overall, given that there will be spurious beats no matter what. False beats will challenge your ear greatly, too.

In a guitar, false beats might signal that it’s time for a new set of strings. In a piano or other instrument, strings can be expensive to replace, and new strings take a while to settle in, so it’s often better to just leave it alone.

Imperfect Frets, Strings, Bridges and Nuts

I’ve never played a guitar with perfect frets. The reality is that every note you fret will be slightly out of tune, and one goal of tuning is to avoid any particular combination of bad intervals that sounds particularly horrible.

This is why it’s helpful to play at least a few chords after tuning. If you tune a particular instrument often you’ll learn the slight adjustments needed to make things sound as good as possible. On my main guitar, for example, the B string needs to be slightly sharp so that a D sounds better.

It’s not only the frets, but the nut (the zeroth fret) and the bridge (under the right hand) that matter. Sometimes the neck needs to be adjusted as well. A competent guitar repairman should be able to adjust the action if needed.

Finally, the weight and manufacture of the strings makes a difference. My main guitar and its frets and bridge sound better and more accurate with medium-weight Martin bronze-wound strings than other strings I’ve tried. As your ear improves, you’ll notice subtleties like this.


New Strings

New strings (or wires) will take some time to stretch and settle in so they stay in tune. You can shorten this time by playing vigorously and stretching the strings, bending them gently. Be careful, however, not to be rough with the strings. If you kink them or strain them past their elastic point, you’ll end up with strings that have false beats, exaggerated inharmonicity, or different densities along some lengths of the string, which will make it seem like your frets are wrong in strange ways.

The Instrument Flexes and Changes

If an instrument is especially out of tune, the first strings you tune will become slightly out of tune as you change the tension on the rest of the instrument. The best remedy I can offer for this is to do a quick approximate tuning without caring much about accuracy. Follow this up with a second, more careful tuning.

This was especially a problem with my first hammered dulcimer, and is very noticeable with my harp, which flexes and changes a lot as it is tuned. My second hammered dulcimer has a ¾ inch birch plywood back and internal reinforcements, so it’s very stable. On the downside, it’s heavy!

Temperature and humidity play a large role, too. All of the materials in an instrument respond in different ways to changes in temperature and humidity. If you have a piano, you’re well advised to keep it in a climate-controlled room. If you’re a serious pianist you already know much more than I do about this topic.

Friction and Torque in Tuning Pins and Bridges

For guitarists, it’s important to make sure that your nut (the zeroth fret) doesn’t pinch the string and cause it to move in jerks and starts, or to have extra tension built up between the nut and the tuning peg itself. If this happens, you can rub a pencil in the groove where the string rides. The graphite in the pencil is a natural lubricant that can help avoid this problem.

Of course, you should also make sure that your tuning pegs and their machinery are smooth and well lubricated. If there’s excessive slop due to wear-and-tear or cheap machinery, that will be an endless source of frustration for you.

Tuning Pegs

On instruments such as pianos, hammered dulcimers, and harps, it’s important to know how to “set” the tuning pin. While tuning the string upwards, you’ll create torque on the pin, twisting it in the hole. The wood fibers holding it in place will also be braced in a position that can “flip” downwards. If you just leave the pin like this, it will soon wiggle itself back to its normal state, and even beyond that due to the tension the wire places on the pin. As a result, you need to practice tuning the note slightly higher than needed, and then de-tuning it, knocking it down to the desired pitch with a light jerk and leaving it in a state of equilibrium.

This technique is also useful in guitars and other stringed instruments, but each type of tuning machine has its own particularities. The main point to remember is that if you don’t leave things in a state of equilibrium and stability, they’ll find one soon enough, de-tuning the instrument in the process.

References and Further Reading

I tried to find the book from which I studied tuning as a child, but I can’t anymore. I thought it was an old Dover edition. The Dover book on tuning that I can find is not the one I remember.

You can find a little bit of information at various places online. One site with interesting information is Historical Tuning of Keyboard Instruments by Robert Chuckrow. I looked around on Wikipedia but didn’t find much of use. Please suggest further resources in the comments.

In this post I discussed the equally-tempered tuning, but there are many others. The study of them and their math, and the cultures and musical histories related to them, is fascinating. Next time you hear bagpipes, or a non-Western instrument, pay attention to the tuning. Is it tempered? Are there perfect intervals other than the octave? Which ones?

Listening to windchimes is another interesting exercise. Are the chimes harmonic or do they have inharmonicity? What scales and tunings do they use? What are the effects? Woodstock chimes use many unique scales and tunings. Many of their chimes combine notes in complex ways that result in no beating between some or all of the tones. Music of the Spheres also makes stunning chimes in a variety of scales and tunings.

As I mentioned, spreadsheets can be very helpful in computing the relationships between various notes and their overtones. I’ve made a small online spreadsheet that contains some of the computations I used to produce this blog post.

Let me know if you have any other references or related books, music, or links to suggest.

Enjoy your beautifully tuned guitar or other instrument, and most of all, enjoy the process of learning to tune and listen! I hope it enriches your appreciation and pleasure in listening to music.


A review of Bose, Sony, and Sennheiser noise-cancelling headphones

Posted in Recent Content on Xaprb at January 16, 2014 12:00 AM

I’ve used active noise-cancelling headphones for over ten years now, and have owned several pairs of Bose, one of Sony, and most recently a pair of Sennheiser headphones. The Sennheisers are my favorites. I thought I’d write down why I’ve gone through so many sets of cans and what I like and dislike about them.

Bose QuietComfort 15 Acoustic Noise Cancelling Headphones

I’m sure you’re familiar with Bose QuietComfort headphones. They’re the iconic “best-in-class” noise-cancelling headphones, the ones you see everywhere. Yet, after owning several pairs (beginning with Quiet Comfort II in 2003), I decided I’m not happy with them and won’t buy them anymore. Why not?

  • They’re not very good quality. I’ve worn out two pairs and opted to sell the third pair that Bose sent me as a replacement. Various problems occurred, including torn speakers that buzzed and grated. I just got tired of sending them back to Bose for servicing.
  • They’re more expensive than I think they’re worth, especially given the cheap components used.
  • They don’t sound bad – but to my ears they still have the classic Bose fairy-dust processing, which sounds rich and pleasant at first but then fatigues me.
  • They produce a sensation of suction on the eardrums that becomes uncomfortable over long periods of time.
  • They can’t be used in non-cancelling mode. In other words, if the battery is dead, they’re unusable.
  • On a purely personal note, I think Bose crosses the line into greed and jealousy. I know this in part because I used to work at Crutchfield, and saw quite a bit of interaction with Bose. As an individual – well, try selling a pair of these on eBay, and you’ll see what I mean. I had to jump through all kinds of hoops after my first listing was cancelled for using a stock photo that eBay themselves suggested and provided in the listing wizard. Here is the information the take-down notice directed me to.

On the plus side, the fit is very comfortable physically, they cancel noise very well, and they’re smaller than some other noise-cancelling headphones. Also on the plus side, every time I’ve sent a pair in for servicing, Bose has just charged me $100 and sent me a new pair.

Sony MDR-NC200D

When I sent my last pair of Bose in for servicing, they replaced them with a factory-sealed pair of new ones in the box, and I decided to sell them on eBay and buy a set of Sony MDR-NC200D headphones, which were about $100 less than new Bose headphones at the time. I read online reviews and thought it was worth a try.

First, the good points. The Sonys are more compact even than the Bose, although as I recall they’re a little heavier. And the noise cancellation works quite well. The passive noise blocking (muffling) is in itself quite good. You can just put them on without even turning on the switch, and block a lot of ambient noise. The sound quality is also quite good, although there is a slight hiss when noise cancellation is enabled. Active cancellation is good, but not as good as the Bose.

However, it wasn’t long before I realized I couldn’t keep them. The Sonys sit on the ear, and don’t enclose the ear and sit against the skull as the Bose do. They’re on-the-ear, not over-the-ear. Although this doesn’t feel bad at first, in about 20 minutes it starts to hurt. After half an hour it’s genuinely painful. This may not be your experience, but my ears just start to hurt after being pressed against my head for a little while.

I had to sell the Sonys on eBay too. My last stop was the Sennheisers.

Sennheiser PXC 450 NoiseGard Active Noise-Canceling Headphones

The Sennheiser PXC 450 headphones are midway in price between the Bose and the Sony: a little less expensive than the Bose. I’ve had them a week or so and I’m very happy with them so far.

This is not the first pair of Sennheisers I’ve owned. I’ve had a pair of open-air higher-end Sennheisers for over a decade. I absolutely love them, so you can consider me a Sennheiser snob to some extent.

I’m pleased to report that the PXC 450s are Sennheisers through and through. They have amazing sound, and the big cups fit comfortably around my ears. They are a little heavier than my other Sennheisers, but still a pleasure to wear.

The nice thing is that not only does noise cancellation work very well (on par with Bose’s, I’d say), but there is no sensation of being underwater with pressure or suction on the eardrums. Turn on the noise cancellation switch and the noise just vanishes, but there’s no strange feeling as a result. Also, these headphones can work in passive mode, with noise cancellation off, and don’t need a battery to work.

On the downside, if you want to travel with them, they’re a little bigger than the Bose. However I’ve travelled with the Bose headphones several times and honestly I find even them too large to be convenient. I don’t use noise-cancelling headphones for travel, as a result.

Another slight downside is that the earcups aren’t completely “empty” inside. There are some caged-over protrusions with the machinery inside. Depending on the shape of your ears, these might brush your ears if you move your head. I find that if I don’t place the headphones in the right spot on my head, they do touch my ears every now and then.


After owning several pairs of top-rated noise-cancelling headphones, I think the Sennheisers are the clear winners in price, quality, comfort, and sound. Your mileage may vary.

Xaprb now uses Hugo

Posted in Recent Content on Xaprb at January 15, 2014 12:00 AM

I’ve switched this blog from Wordpress to Hugo. If you see any broken links or other problems, let me know. I’ll re-enable comments and other features in the coming days.

Why not Wordpress? I’ve used Wordpress since very early days, but I’ve had my fill of security problems, the need to worry about whether a database is up and available, backups, plugin compatibility problems, upgrades, and performance issues. In fact, while converting the content from Wordpress to Markdown, I found a half-dozen pages that had been hacked by some link-farm since around 2007. This wasn’t the first such problem I’d had; it was merely the only one I hadn’t detected and fixed. And I’ve been really diligent with Wordpress security; I have done things like changing my admin username and customizing my .htaccess file to block common attack vectors, in addition to the usual “lockdown” measures that one takes with Wordpress.

In contrast to Wordpress or other CMSes that use a database, static content is secure, fast, and worry-free. I’m particularly happy that my content is all in Markdown format now. Even if I make another change in the future, the content is now mostly well-structured and easy to transform as desired. (There are some pages and articles that didn’t convert so well, but I will clean them up later.)

Why Hugo? There are lots of static site generators. Good ones include Octopress and Jekyll, and I’ve used those. However, they come with some of their own annoyances: dependencies, the need to install Ruby and so on, and particularly bothersome for this blog, performance issues. Octopress ran my CPU fan at top speed for about 8 minutes to render this blog.

Hugo is written in Go, so it has zero dependencies (a single binary) and is fast. It renders this blog in a couple of seconds. That’s fast enough to run it in server mode, hugo server -w, and I can just alt-tab back and forth between my writing and my browser to preview my changes. By the time I’ve tabbed over, the changes are ready to view.

Hugo isn’t perfect. For example, it lacks a couple of features that are present in Octopress or Jekyll. But it’s more than good enough for my needs, and I intend to contribute some improvements to it if I get time. I believe it has the potential to be a leading static site/blog generator going forward. It’s already close to a complete replacement for something like Jekyll.

Immutability, MVCC, and garbage collection

Posted in Recent Content on Xaprb at December 28, 2013 12:00 AM

Not too long ago I attended a talk about a database called Datomic. My overall impressions of Datomic were pretty negative, but this blog post isn’t about that. This is about one of the things the speaker referenced a lot: immutability and its benefits. I hope to illustrate, if only sketchily, why a lot of sophisticated databases are actually leaps and bounds beyond the simplistic design of such immutable databases. This is in direct contradiction to what proponents of Datomic-like systems would have you believe; they’d tell you that their immutable database implementations are advanced. Reality is not so clear-cut.

Datomic and Immutability

The Datomic-in-a-nutshell is that it (apparently) uses an append-only B-tree to record data, and never updates any data after it’s written. I say “apparently” because the speaker didn’t know what an append-only B-tree was, but his detailed description matched AOBTs perfectly. Why is this a big deal? Immutable data confers a lot of nice benefits. Here’s an incomplete summary:

  • It’s more cacheable.
  • It’s easier to reason about.
  • It’s less likely to get corrupted from bugs and other problems.
  • You can rewind history and view the state at any point in the past, by using an “old” root for the tree.
  • Backups are simple: just copy the file, no need to take the database offline. In fact, you can do continuous backups.
  • Replication is simple and fast.
  • Crash recovery is simple and fast.
  • It’s easier to build a reliable system on unreliable components with immutability.

In general, immutability results in a lot of nice, elegant properties that just feel wonderful. But this is supposed to be the short version.
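The “rewind history” property is easy to see in a toy sketch. This is illustrative Python only, not Datomic’s actual design (which uses an append-only B-tree rather than a flat log): every write appends a new fact, and reads can be pinned to any past version.

```python
class AppendOnlyStore:
    """Toy append-only store: writes only ever append new facts;
    nothing is modified or deleted in place."""

    def __init__(self):
        self.log = []      # grows forever -- a cost discussed below
        self.version = 0   # monotonically increasing write counter

    def put(self, key, value):
        self.version += 1
        self.log.append((self.version, key, value))
        return self.version

    def get(self, key, as_of=None):
        # Scan backwards for the newest fact about `key` visible at `as_of`.
        for ver, k, v in reversed(self.log):
            if k == key and (as_of is None or ver <= as_of):
                return v
        return None

db = AppendOnlyStore()
v1 = db.put("name", "Alice")
db.put("name", "Bob")
print(db.get("name"))            # current state: Bob
print(db.get("name", as_of=v1))  # the past is still there: Alice
```

Note that the obsolete “Alice” fact is never reclaimed; the log only grows. That is exactly where the trouble starts.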

Prior Art

Datomic is not revolutionary in this sense. I have seen at least two other databases architected similarly. Their creators waxed eloquent about many of the same benefits. In fact, in 2009 and 2010, you could have listened to talks from the architects of RethinkDB, and if you just searched and replaced “RethinkDB” with “Datomic” you could have practically interchanged the talks. The same is true of CouchDB. Just to list a few links to RethinkDB’s history: 1, 2, 3.

That last one links to Accountants Don’t Use Erasers, a blog post that brought append-only storage into the minds of many people at the time.

Beyond databases, don’t forget about filesystems, such as ZFS for example. Many of the same design techniques are employed here.

Back to RethinkDB. Strangely, around 2011 or so, nobody was talking about its append-only design anymore. What happened?

Append-Only Blues

Immutability, it turns out, has costs. High costs. Wait a bit, and I’ll explain how those costs are paid by lots of databases that don’t build so heavily around immutability, too.

Even in 2010, Slava Akhmechet’s tone was changing. He’d begin his talks singing append-only immutability to the heavens, and then admit that implementation details were starting to get really hard. It turns out that there are a few key problems with append-only, immutable data structures.

The first is that space usage grows forever. Logically, people insert facts, and then update the database with new facts. Physically, if what you’re doing is just recording newer facts that obsolete old ones, then you end up with outdated rows. It may feel nice to be able to access those old facts, but the reality is most people don’t want that, and don’t want to pay the cost (infinitely growing storage) for it.

The second is fragmentation. If entities are made of related facts, and some facts are updated but others aren’t, then as the database grows and new facts are recorded, an entity ends up being scattered widely over a lot of storage. This gets slow, even on SSDs with fast random access.

The last is that a data structure or algorithm that’s elegant and pure, but has one or more worst cases, will fall apart rather violently in real-world usage. That’s because real-world usage is much more diverse than you’d suspect. A database that has a “tiny worst-case scenario” will end up hitting that worst-case behavior for something rather more than a tiny fraction of its users; probably a significant majority. An easy example in a different domain is sort algorithms. Nobody implements straightforward best-performance-most-of-the-time sort algorithms because if they do, things go to hell in a handbasket rather quickly. Databases end up with similar hard cases to handle.

There are more problems, many of them much harder to talk about and understand (dealing with concurrency, for example), but these are the biggest, most obvious ones I’ve seen.

As a result, you can see RethinkDB quickly putting append-only, immutable design behind them. They stopped talking and writing about it. Their whitepaper, “Rethinking Database Storage”, is gone from their website (rethinkdb.com/papers/whitepaper.pdf) but you can get it from the wayback machine.

Reality sank in, and they had to move on from elegant theories to the bitterness of solving real-world problems. Whenever you hear about a new database, remember this: this shit is really, really, really hard. It typically takes many years for a database or storage engine to become production-ready in the real world.

This blog post isn’t about RethinkDB, though. I’m just using their evolution over time as an example of what happens when theory meets reality.

The CouchDB Problem

Around the same time as RethinkDB, a new NoSQL database called CouchDB was built on many of the same premises. In fact, I even blogged a quick overview of it as it started to become commercialized: A gentle introduction to CouchDB for relational practitioners.

CouchDB had so many benefits from using immutability. MVCC (multi-version concurrency control), instant backup and recovery, crash-only design. But the big thing everyone complained about was… compaction. CouchDB became a little bit legendary for compaction.

You see, CouchDB’s files would grow forever (duh!) and you’d fill up your disks if you didn’t do something about it. What could you do about it? CouchDB’s answer was that you would periodically save a complete new database, without old versions of documents that had been obsoleted. It’s a rewrite-the-whole-database process. The most obvious problem with this was that you had to reserve twice as much disk space as you needed for your database, because you needed enough space to write a new copy. If your disk got too full, compaction would fail because there wasn’t space for two copies.
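In code, that rewrite-the-whole-database approach amounts to something like this sketch (hypothetical names, Python): build a brand-new file containing only the latest version of each document, which is why you need room for two copies at once.

```python
def compact(old_log):
    """Rewrite-the-whole-database compaction (sketch). The new log keeps
    only the newest version of each key; while it is being written, the
    old log must stay on disk too -- hence the 2x space requirement."""
    latest = {}
    for key, value in old_log:   # later entries win
        latest[key] = value
    return list(latest.items())  # the "new database file"

old_log = [("a", 1), ("b", 2), ("a", 3), ("a", 4)]  # obsolete versions pile up
new_log = compact(old_log)
print(new_log)  # only the newest version of each key survives
```

If writes keep landing in the old file faster than the copy proceeds, the rewrite never catches up, which is the failure mode described next.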

And if you were writing into your database too fast, compaction would never catch up with the writes. And there were a host of other problems that could potentially happen.

Datomic has all of these problems too, up to and including stop-the-world blocking of writes (which in my book is complete unavailability of the database).

ACID MVCC Relational Databases

It turns out that there is a class of database systems that has long been aware of the problems with all three of the databases I’ve mentioned so far. Oracle, SQL Server, MySQL (InnoDB), and PostgreSQL all have arrived at designs that share some properties in common. These characteristics go a long way toward satisfying the needs of general-purpose database storage and retrieval in very wide ranges of use cases, with excellent performance under mixed workloads and relatively few and rare worst-case behaviors. (That last point is debatable, depending on your workload.)

The properties are ACID transactions with multi-version concurrency control (MVCC). The relational aspect is ancillary. You could build these properties in a variety of non-SQL, non-relational databases. It just so happens that the databases that have been around longer than most, and are more mature and sophisticated, are mostly relational. That’s why these design choices and characteristics show up in relational databases – no other reason as far as I know.

Multi-version concurrency control lets database users see a consistent state of the database at a point in time, even as the database accepts changes from other users concurrently.

How is this done? By keeping old versions of rows. These databases operate roughly as follows: when a row is updated, an old version is kept if there’s any transaction that still needs to see it. When the old versions aren’t needed any more, they’re purged. Implementation details and terminology vary. I can speak most directly about InnoDB, which never updates a row in the primary key (which is the table itself). Instead, a new row is written, and the database is made to recognize this as the “current” state of the world. Old row versions are kept in a history list; access to this is slower than access to the primary key. Thus, the current state of the database is optimized to be the fastest to access.
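A toy illustration of that visibility rule (illustrative Python only; InnoDB’s real mechanism involves rollback segments and a history list, and is far more involved): each row version is stamped with the transaction that wrote it, a reader sees only versions at or before its snapshot, and purge drops versions no reader can still see.

```python
class MVCCTable:
    """Sketch of MVCC row versioning with purge of unneeded old versions."""

    def __init__(self):
        self.versions = {}  # key -> list of (txn_id, value), oldest first

    def write(self, txn_id, key, value):
        self.versions.setdefault(key, []).append((txn_id, value))

    def read(self, snapshot, key):
        # A reader sees the newest version written at or before its snapshot.
        for txn_id, value in reversed(self.versions.get(key, [])):
            if txn_id <= snapshot:
                return value
        return None

    def purge(self, oldest_snapshot):
        # Drop versions no active reader can see anymore: keep everything
        # newer than the oldest snapshot, plus the one version it still needs.
        for key, vlist in self.versions.items():
            visible = [v for v in vlist if v[0] <= oldest_snapshot]
            newer = [v for v in vlist if v[0] > oldest_snapshot]
            self.versions[key] = visible[-1:] + newer

t = MVCCTable()
t.write(1, "balance", 100)
t.write(5, "balance", 90)        # a later transaction updates the row
print(t.read(3, "balance"))      # a snapshot taken at txn 3 still sees 100
print(t.read(9, "balance"))      # a snapshot taken at txn 9 sees 90
t.purge(oldest_snapshot=9)
print(t.versions["balance"])     # only (5, 90) remains; the old version is purged
```

The purge step is the part that real systems do continuously in the background, and it is also the part that goes wrong in the “runaway purge” scenarios described below.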

Now, about ACID transactions. Managing the write-ahead log and flushing dirty pages to disk is one of the most complex and hardest things an ACID database does, in my opinion. The process of managing the log and dirty pages in memory is called checkpointing.

Write-ahead logging and ACID, caching, MVCC, and old-version-purge are often intertwined to some extent, for implementation reasons. This is a very complex topic and entire books (huge books!) have been written about it.

What’s happening in such a database is a combination of short-term immutability, read and write optimizations to save and/or coalesce redundant work, and continuous “compaction” and reuse of disk space to stabilize disk usage and avoid infinite growth. Doing these things a little bit at a time allows the database to gradually take care of business without needing to stop the world. Unfortunately, this is incredibly hard, and I am unaware of any such database that is completely immune to “furious flushing,” “garbage collection pause,” “compaction stall,” “runaway purge,” “VACUUM blocking,” “checkpoint stall,” or whatever it tends to be called in your database of choice. There is usually a combination of some kind of workload that can push things over the edge. The most obvious case is if you try to change the database faster than the hardware can physically keep up. Because a lot of this work is done in the background so that it’s non-blocking and can be optimized in various ways, most databases will allow you to overwork the background processes if you push hard enough.

Show me a database and I’ll show you someone complaining about these problems. I’ll start out: MySQL’s adaptive flushing has been beaten to death by Percona and Oracle engineers. Riak on LevelDB: “On a test server, LevelDB in 1.1 saw stalls of 10 to 90 seconds every 3 to 5 minutes. In Riak 1.2, levelDB sometimes sees one stall every 2 hours for 10 to 30 seconds.” PostgreSQL’s VACUUM can stall out. I can go on. Every one of those problems is being improved somehow, but also can be triggered if circumstances are right. It’s hard (impossible?) to avoid completely.

Evolution of Append-Only

Do you see how the simplistic, one-thing-at-a-time architecture of append-only systems, with periodic rewrites of the whole database, almost inevitably becomes continuous, concurrent performing of the same tasks? Immutability can’t live forever. It’s better to do things continuously in the background than to accrue a bunch of debt and then pay it back in one giant blocking operation.

That’s how a really capable database usually operates. These mature, sophisticated, advanced databases represent what a successful implementation usually evolves into over time. The result is that Oracle (for example) can sustain combinations of workloads such as very high-frequency small reads and writes together with days-long read-heavy and write-heavy batch processing, simultaneously, while providing good performance for both! Try that in a database that can only do one thing at a time.

So, keep that in mind if you start to feel like immutability is the elegant “hallelujah” solution that’s been overlooked by everyone other than some visionary with a new product. It hasn’t been overlooked. It’s in the literature, and it’s in industry practice. It’s been refined for decades. It’s well worth looking at the problems the more mature databases have solved. New databases are overwhelmingly likely to run into some of them, and perhaps end up implementing the same solutions as well.

Note that I am not a relational curmudgeon claiming that it’s all been done before. I have a lot of respect for the genuinely new advancements in the field, and there is a hell of a lot of it, even in databases whose faults I just attacked. I’m also not a SQL/relational purist. However, I will admit to getting a little curmudgeonly when someone claims that the database he’s praising is super-advanced, and then in the next breath says he doesn’t know what an append-only B-tree is. That’s kind of akin to someone claiming their fancy new sort algorithm is advanced, but not being aware of quicksort!

What do you think? Also, if I’ve gone too far, missed something important, gotten anything wrong, or otherwise need some education myself, please let me know so I can a) learn and b) correct my error.

Early-access books: a double-edged sword

Posted in Recent Content on Xaprb at December 26, 2013 12:00 AM

Many technical publishers offer some kind of “early access” to unfinished versions of books. Manning has MEAP, for example, and there’s even LeanPub which is centered on this idea. I’m not a fan of buying these, in most circumstances. Why not?

  • Many authors never finish their books. A prominent example: Nathan Marz’s book on Big Data was supposed to be published in 2012; the date has been pushed back to March 2014 now. At least a few of my friends have told me their feelings about paying for this book and “never” getting it. I’m not blaming Marz, and I don’t want this to be about authors. I’m just saying many books are never finished (and as an author, I know why!), and readers get irritated about this.
  • When the book is unfinished, it’s often of much less value. The whole is greater than the sum of the parts.
  • When the book is finished, you have to re-read it, which is a lot of wasted work, and figuring out what’s changed from versions you’ve already read is a big exercise too. To some extent, editions create a similar problem1. I think that successive editions of books are less likely to be bought and really read, unless there’s a clear signal that both the subject and the book have changed greatly. Unfortunately, most technical books are outdated before they’re even in print. Editions are a necessary evil to keep up with the changes in industry and practice.

I know that O’Reilly has tried to figure out how to address this, too, and I sent an email to my editor along the lines of this blog post.

I know this is a very one-sided opinion. I had a lengthy email exchange with LeanPub, for example. I know they, and a lot of others including likely readers of this blog, see things very differently than I do.

Still, I don’t think anyone has a great solution to the combination of problems created by static books written about a changing world. But early-access to unfinished books has always seemed to me like compounding the problems, not resolving them.

1 Rant: The classic counter-example for editions is math and calculus textbooks, which can charitably be described as a boondoggle. Calculus hasn’t changed much for generations, either in theory or practice. Yet new editions of two leading textbooks are churned out every couple of years. They offer slightly prettier graphics or newer instructions for a newer edition of the TI-something calculator – cosmetic differences. But mostly, they offer new homework sets, so students can’t buy and use the older editions, nor can they resell them for more than a small fraction of the purchase price. Oh, and because the homework is always changing, bugs in the homework problems are ever-present. It’s a complete ripoff. Fortunately, technical writers generally behave better than this. OK, rant over.

Napkin math: How much waste does Celestial Seasonings save?

Posted in Recent Content on Xaprb at December 22, 2013 12:00 AM

I was idly reading the Celestial Seasonings box today while I made tea. Here’s the end flap:


It seemed hard to believe that they really save 3.5 million pounds of waste just by not including that extra packaging, so I decided to do some back-of-the-napkin math.

How much paper is in each package of non-Celestial-Seasonings tea? The little bag is about 2 inches by 2 inches, it’s two-sided, and there’s a tag, staple, and string. Call it 10 square inches.

How heavy is the paper? It feels about the same weight as normal copy paper. Amazon.com lists a box of 5000 sheets of standard letter-sized paper at a shipping weight of 50 pounds (including the cardboard box, but we’ll ignore that). Pretend that each sheet (8.5 * 11 inches = 93.5 square inches) is about 100 square inches. That’s .0001 pounds per square inch.

How much tea does Celestial Seasonings sell every year? Wikipedia says their sales in the US are over $100M, and they are a subsidiary of Hain Celestial, which has a lot of other large brands. Hain’s sales last year were just under $500M. $100M is a good enough ballpark number. Each box of 20 tea bags sells at about $3.20 on their website, and I think it’s cheaper at my grocery store. Call it $3.00 per box, so we’ll estimate the volume of tea bags on the high side (to make up for the low-side estimate caused by pretending there’s 100 square inches per sheet of paper). That means they sell about 33.3M boxes, or 667M bags, of tea each year.

If they put bags, tags, and strings on all of them, I estimated 10 square inches of paper per bag, so at .0001 pound per square inch that’s .001 pound of extra paper and stuff per bag. That means they’d use about 667 thousand pounds of paper to bag up all that tea.
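Redoing that arithmetic in code, with the same assumptions as above (10 square inches per bag, 0.0001 pounds per square inch, roughly $100M in sales at about $3 per 20-bag box):

```python
lb_per_sq_in = 50 / (5000 * 100)     # 50 lb per 5000 sheets of ~100 sq in each
sq_in_per_bag = 10                   # bag, tag, string, staple
lb_per_bag = sq_in_per_bag * lb_per_sq_in

boxes_per_year = 100_000_000 / 3.00  # ~$100M in sales at ~$3 per box
bags_per_year = boxes_per_year * 20

total_lb = bags_per_year * lb_per_bag
print(round(total_lb))  # roughly 667 thousand pounds, not 3.5 million
```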

That’s quite a difference from the 3.5 million pounds of waste they claim they save. Did I do the math wrong or assume something wrong?

Secure your accounts and devices

Posted in Recent Content on Xaprb at December 18, 2013 12:00 AM

This is a public service announcement. Many people I know are not taking important steps necessary to secure their online accounts and devices (computers, cellphones) against malicious people and software. It’s a matter of time before something seriously harmful happens to them.

This blog post will urge you to use higher security than popular advice you’ll hear. It really, really, really is necessary to use strong measures to secure your digital life. The technology being used to attack you is very advanced, operates at a large scale, and you probably stand to lose much more than you realize.

You’re also likely not as good at being secure as you think you are. If you’re like most people, you don’t take some important precautions, and you overestimate the strength and effectiveness of security measures you do use.

Password Security

The simplest and most effective way to dramatically boost your online security is to use a password storage program, or password safe. You need to stop making passwords you can remember, and start using long, random passwords for websites. The only practical way to do this is to use a password safe.

Why? Because if you can remember the password, it’s trivially hackable. For example, passwords like 10qp29wo38ei47ru can be broken instantly. Anything you can feasibly remember is just too weak.

And, any rule you set for yourself that requires self-discipline will be violated, because you’re lazy. You need to make security easier so that you automatically do things more securely. A password safe is the best way to do that, by far. A good rule of thumb for most people is that you should not try to know your own passwords, except the password to your password safe. (People with the need to be hyper-secure will take extraordinary measures, but those aren’t practical or necessary for most of us.)

I use 1Password. Others I know of are LastPass and KeePass Password Safe. I personally wouldn’t use any others, because lesser-known ones are more likely to be malware.

It’s easy to share a password safe’s data across devices, and make a backup of it, by using a service such as Dropbox. The password safe’s files are encrypted, so the contents will not be at risk even if the file syncing service is compromised for some reason. (Use a strong password to encrypt your password safe!)

It’s important to note that online passwords are different from the password you use to log into your personal computer. Online passwords are much more exposed to brute-force, large-scale hacking attacks. By contrast, your laptop probably isn’t going to be subjected to a brute-force password cracking attack, because attackers usually need physical access to the computer to do that. This is not a reason to use a weak password for your computer; I’m just trying to illustrate how important it is to use really long, random passwords for websites and other online services, because they are frequent targets of brute-force attacks.

Here are some other important rules for password security.

  • Never use the same password in more than one service or login. If you do, someone who compromises it will be able to compromise other services you use.
  • Set your password generation program (likely part of your password safe) to make long, random passwords with numbers, special characters, and mixed case. I leave mine set to 20 characters by default. If a website won’t accept such a long password I’ll shorten it. For popular websites such as LinkedIn, Facebook, etc I use much longer passwords, 50 characters or more. They are such valuable attack targets that I’m paranoid.
  • Don’t use your web browser’s features for storing passwords and credit cards. Browsers themselves, and their password storage, are the target of many attacks.
  • Never write passwords down on paper, except once. The only paper copy of my passwords is the master password to my computer, password safe, and GPG key. These are in my bank’s safe deposit box, because if something happens to me I don’t want my family to be completely screwed. (I could write another blog post on the need for a will, power of attorney, advance medical directive, etc.)
  • Never treat any account online, no matter how trivial, as “not important enough for a secure password.”

That last item deserves a little story. Ten years ago I didn’t use a password safe, and I treated most websites casually. “Oh, this is just a discussion forum, I don’t care about it.” I used an easy-to-type password for such sites. I used the same one everywhere, and it was a common five-letter English word (not my name, if you’re guessing). Suddenly one day I realized that someone could guess this password easily, log in, change the password and in many cases the email address, and lock me out of my own account. They could then proceed to impersonate me, do illegal and harmful things in my name, etc. Worse, they could go find other places that I had accounts (easy to find – just search Google for my name or username!) and do the same things in many places. I scrambled to find and fix this problem. At the end of it, I realized I had created more than 300 accounts that could have been compromised. Needless to say, I was very, very lucky. My reputation, employment, credit rating, and even my status as a free citizen could have been taken away from me. Don’t let this happen to you!
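To put numbers on the “long, random passwords” advice: the strength of a truly random password is its length times log2 of the alphabet size. Here’s a short sketch using Python’s standard library (the 20-character default mentioned above; the entropy figures assume every character is chosen uniformly at random, which is exactly what a password safe’s generator does and a human does not):

```python
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation  # 94 chars

def random_password(length=20):
    """Generate a password the way a password safe does: one uniformly
    random character at a time, from a cryptographically strong RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def entropy_bits(length, alphabet_size):
    return length * math.log2(alphabet_size)

print(random_password())                       # not memorable, by design
print(round(entropy_bits(8, 26)))              # ~38 bits: 8 lowercase letters
print(round(entropy_bits(20, len(ALPHABET))))  # ~131 bits: the 20-char default
```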

Use Two-Factor Auth

Two-factor authentication (aka 2-step login) is a much stronger mechanism for account security than a password alone. It uses a “second factor” (something you physically possess) in addition to the common “first factor” (something you know – a password) to verify that you are the person authorized to access the account.

Typically, the login process with two-factor authentication looks like this:

  • You enter your username and password.
  • The service sends a text message to your phone. The message contains a 6-digit number.
  • You must enter the number to finish logging in.

With two-factor auth in place, it is very difficult for malicious hackers to access your account, even if they know your password. Two-factor auth is way more secure than other tactics such as long passwords, but it doesn’t mean you shouldn’t also use a password safe and unique, random, non-memorized passwords.

Two-factor auth has a bunch of special ways to handle other common scenarios, such as devices that can’t display the dialog to ask for the 6-digit code, or what if you lose your cellphone, or what if you’re away from your own computer and don’t have your cellphone. Nonetheless, these edge cases are easy to handle. For example, you can get recovery codes for when you lose or don’t have your cellphone. You should store these – where else? – in your password safe.
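The text-message flow above is one form of second factor; authenticator apps generally use TOTP (RFC 6238) instead, which derives the 6-digit code from a shared secret and the current 30-second time step via HOTP (RFC 4226). A minimal sketch with Python’s standard library:

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the counter, dynamically truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)

def totp(secret: bytes, period: int = 30) -> str:
    """RFC 6238 TOTP: HOTP over the current time step."""
    return hotp(secret, int(time.time()) // period)

# RFC 4226 publishes test vectors: counter 0 with this secret yields 755224.
print(hotp(b"12345678901234567890", 0))  # 755224
```

Both sides compute the same code from the shared secret, so nothing needs to travel over SMS; the server simply checks that your six digits match its own computation for the current time step.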

There seems to be a perception that two-factor auth is inconvenient. I disagree. I’ve never found it inconvenient, and I use two-factor auth a lot. And I’ve never met these people, whoever they are, who think two-factor auth is such a high burden. The worst thing that happens to me is that I sometimes have to get out of my chair and get my phone from another room to log in.

Unfortunately, most websites don’t support two-factor authentication. Fortunately, many of the most popular and valuable services do, including Facebook, Google, Paypal, Dropbox, LinkedIn, Twitter, and most of the other services that you probably use which are most likely to get compromised. Here is a list of services with two-factor auth, with instructions on how to set it up for each one.

Please enable two-factor authentication if it is supported! I can’t tell you how many of my friends and family have had their Gmail, Facebook, Twitter, and other services compromised. Please don’t let this happen to you! It could do serious harm to you – worse than a stolen credit card.

Secure Your Devices

Sooner or later someone is going to get access to one of your devices – tablet, phone, laptop, thumb drive. I’ve never had a phone or laptop lost or stolen myself, but it’s a matter of time. I’ve known a lot of people in this situation. One of my old bosses, for example, forgot a laptop in the seat pocket of an airplane, and someone took it and didn’t return it.

And how many times have you heard about some government worker leaving a laptop at the coffee shop and suddenly millions of people’s Social Security numbers are stolen?

Think about your phone. If someone stole my phone and it weren’t protected, they’d have access to a bunch of my accounts, contact lists, email, and a lot of other stuff I really, really do not want them messing with. If you’re in the majority of people who leave your phone completely unsecured, think about the consequences for a few minutes. Someone getting access to all the data and accounts on your phone could probably ruin your life for a long time if they wanted to.

All of this is easily preventable. Given that one or more of your devices will someday certainly end up in the hands of someone who may have bad intentions, I think it’s only prudent to take some basic measures:

  • Set the device to require a password, lock code, or pattern to be used to unlock it after it goes to sleep, when it’s idle for a bit, or when you first power it on. If someone steals your device, and can access it without entering your password, you’re well and truly screwed.
  • Use full-device encryption. If someone steals your device, for heaven’s sake don’t let them have access to your data. For Mac users, use File Vault under Preferences / Security and Privacy. Encrypt the whole drive, not just the home directory. On Windows, use TrueCrypt, and on Linux, you probably already know what you’re doing.
  • On Android tablets and phones, you can encrypt the entire device. You have to set up a screen lock code first.
  • If you use a thumb drive or external hard drive to transfer files between devices, encrypt it.
  • Encrypt your backup hard drives. Backups are one of the most common ways that data is stolen. (You have backups, right? I could write another entire blog post on backups. Three things are inevitable: death, taxes, and loss of data that you really care about.)
  • Use a service such as Prey Project to let you have at least some basic control over your device if it’s lost or stolen. Android phones now have the Android Device Manager and Google Location History, but you have to enable these.
  • Keep records of your devices’ make, model, serial number, and so on. Prey Project makes this easy.
  • On your phone or tablet, customize the lockscreen with a message such as “user@email.com – reward if found” and on your laptops, stick a small label inside the lid with your name and phone number. You never know if a nice person will return something to you. I know I would do it for you.

Things that don’t help

Finally, here are some techniques that aren’t as useful as you might have been told.

  • Changing passwords doesn’t significantly enhance security unless you change from an insecure password to a strong one. Changing passwords is most useful, in my opinion, when a service has already been compromised or potentially compromised. It’s possible on any given day that an attacker has gotten a list of encrypted passwords for a service, hasn’t yet been discovered, and hasn’t yet decrypted them, and that you’ll foil the attack by changing your password in the meanwhile, but this is such a vanishingly small chance that it’s not meaningful.
  • (OK, this ended up being a list of 1 thing. Tell me what else should go here.)


Here is a summary of the most valuable steps you can take to protect yourself:

  • Get a password safe, and use it for all of your accounts. Protect it with a long password. Make this the one password you memorize.
  • Use long (as long as possible), randomly generated passwords for all online accounts and services, and never reuse a password.
  • Use two-factor authentication for all services that support it.
  • Encrypt your hard drives, phones and tablets, and backups, and use a password or code to lock all computers, phones, tablets, etc when you turn them off, leave them idle, or put them to sleep.
  • Install something like Prey Project on your portable devices, and label them so nice people can return them to you.
  • Write down the location and access instructions (including passwords) for your password safe, computer, backup hard drives, etc and put it in a safe deposit box.

Friends try not to let friends get hacked and ruined. Don’t stop at upgrading your own security. Please tell your friends and family to do it, too!

Do you have any other suggestions? Please use the comments below to add your thoughts.

How is the MariaDB Knowledge Base licensed?

Posted in Recent Content on Xaprb at December 16, 2013 12:00 AM

I clicked around for a few moments but didn’t immediately see a license mentioned for the MariaDB knowledgebase. As far as I know, the MySQL documentation is not licensed in a way that would allow copying or derivative works, but at least some of the MariaDB Knowledge Base seems to be pretty similar to the corresponding MySQL documentation. See for example LOAD DATA LOCAL INFILE: MariaDB, MySQL.

Oracle’s MySQL documentation has a licensing notice that states:

You may create a printed copy of this documentation solely for your own personal use. Conversion to other formats is allowed as long as the actual content is not altered or edited in any way. You shall not publish or distribute this documentation in any form or on any media, except if you distribute the documentation in a manner similar to how Oracle disseminates it (that is, electronically for download on a Web site with the software) or on a CD-ROM or similar medium, provided however that the documentation is disseminated together with the software on the same medium. Any other use, such as any dissemination of printed copies or use of this documentation, in whole or in part, in another publication, requires the prior written consent from an authorized representative of Oracle. Oracle and/or its affiliates reserve any and all rights to this documentation not expressly granted above.

Can someone clarify the situation?

S**t sales engineers say

Posted in Recent Content on Xaprb at December 07, 2013 12:00 AM

Here’s a trip down memory lane. I was just cleaning out some stuff and I found some notes I took from a hilarious MySQL seminar a few years back. I won’t say when or where, to protect the guilty.[1]

I found it so absurd that I had to write down what I was witnessing. Enough time has passed that we can probably all laugh about this now. Times and people have changed.

The seminar was a sales pitch in disguise, of course. The speakers were singing Powerpoint Karaoke to slides real tech people had written. Every now and then, when they advanced a slide, they must have had a panicked moment. “I don’t remember this slide at all!” they must have been thinking. So they’d mumble something really funny and trying-too-hard-to-be-casual about “oh, yeah, [insert topic here] but you all already know this, I won’t bore you with the details [advance slide hastily].” It’s strange how transparent that is to the audience.

Here are some of the things the sales “engineers” said during this seminar, in response to audience questions:

  • Q. How does auto-increment work in replication? A: On slaves, you have to ALTER TABLE to remove auto-increment because only one table in a cluster can be auto-increment. When you switch replication to a different master you have to ALTER TABLE on all servers in the whole cluster to add/remove auto-increment. (This lie was told early in the day. Each successive person who took a turn presenting built upon it instead of correcting it. I’m not sure whether this was admirable teamwork or cowardly face-saving.)
  • Q. Does InnoDB’s log grow forever? A: Yes. You have to back up, delete, and restore your database if you want to shrink it.
  • Q. What size sort buffer should I have? A: 128MB is the suggested starting point. You want this sucker to be BIG.

There was more, but that’s enough for a chuckle. Note to sales engineers everywhere: beware the guy in the front row scribbling notes and grinning.

What are your best memories of worst sales engineer moments?

1. For the avoidance of doubt, it was NOT any of the trainers, support staff, consultants, or otherwise anyone prominently visible to the community. Nor was it anyone else whose name I’ve mentioned before. I doubt any readers of this blog, except for former MySQL AB employees (pre-Sun), would have ever heard of these people. I had to think hard to remember who those names belonged to.

Props to the MySQL Community Team

Posted in Recent Content on Xaprb at December 07, 2013 12:00 AM

Enough negativity sometimes gets slung around that it’s easy to forget how much good is going on. I want to give a public thumbs-up to the great job the MySQL community team, especially Morgan Tocker, is doing. I don’t remember ever having so much good interaction with this team, not even in the “good old days”:

  • Advance notice of things they’re thinking about doing (deprecating, changing, adding, etc)
  • Heads-up via private emails about news and upcoming things of interest (new features, upcoming announcements that aren’t public yet, etc)
  • Solicitation of opinion on proposals that are being floated internally (do you use this feature, would it hurt you if we removed this option, do you care about this legacy behavior we’re thinking about sanitizing)

I don’t know who or what has made this change happen, but it’s really welcome. I know Oracle is a giant company with all sorts of legal and regulatory hoops to jump through, for things that seem like they ought to be obviously the right thing to do in an open-source community. I had thought we were not going to get this kind of interaction from them, but happily I was wrong.

(At the same time, I still wish for more public bug reports and test cases; I believe those things are really in everyone’s best interests, both short- and long-term.)


EXPLAIN UPDATE in MySQL 5.6

Posted in Recent Content on Xaprb at November 26, 2013 12:00 AM

I just tried out EXPLAIN UPDATE in MySQL 5.6 and found unexpected results. This query has no usable index:

EXPLAIN UPDATE ... WHERE col1 = 9 AND col2 = 'something'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: foo
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 55
          ref: NULL
         rows: 51
        Extra: Using where

The EXPLAIN output makes it seem like a perfectly fine query, but it’s a full table scan. If I do the old trick of rewriting it to a SELECT I see that:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: foo
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 51
        Extra: Using where

Should I file this as a bug? It seems like one to me.


DRBD-Manager

Posted in LINBIT Blogs by flip at November 22, 2013 12:41 PM

One of the projects that LINBIT will publish soon is drbdmanage, which allows easy cluster-wide storage administration with DRBD 9.

Every DRBD user knows the drill – create an LV, write a DRBD resource configuration file, create-md, up, initial sync, …
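The resource configuration file from that drill typically looks something like the sketch below; the hostnames, device paths, and addresses are placeholders, not from the original post:

```
resource r0 {
  device    /dev/drbd0;
  disk      /dev/vg0/r0;
  meta-disk internal;

  on alpha {
    address 10.0.0.1:7789;
  }
  on bravo {
    address 10.0.0.2:7789;
  }
}
```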

But that is no more.

The new way is this: drbdmanage new-volume r0 50 deploy 4, and here comes your quadruple replicated 50 gigabyte DRBD volume.

This is accomplished by a cluster-wide DRBD volume that holds some drbdmanage data, and a daemon on each node that receives DRBD events from the kernel.

Every time some configuration change is wanted,

  1. drbdmanage writes into the common volume,
  2. causing the other nodes to see the Primary/Secondary events,
  3. so that they know to reload the new configuration,
  4. and act upon it – creating or removing an LV, reconfiguring DRBD, etc.
  5. and, if required, cause an initial sync.

As DRBD 8.4.4 now supports DISCARD/TRIM, the initial sync (on SSDs or thin LVM) is essentially free – a few seconds is all it takes. (See e.g. mkfs.ext4 for a possible user.)

Further use cases are various projects that can benefit from a “shared storage” layer – like oVirt, OpenStack, libVirt, etc.
Just imagine using a non-cluster-aware tool like virt-manager to create a new VM, and the storage gets automatically sync’ed across multiple nodes…

Interested? You’ll have to wait for a few weeks, but you can always drop us a line.

Freeing some Velocity videos

Posted in Recent Content on Xaprb at November 09, 2013 12:00 AM

Following my previous post on Velocity videos, I had some private email conversations with good folks at O’Reilly, and a really nice in-person exchange with a top-level person as well. I was surprised to hear them encourage me to publish my videos online freely!

I still believe that nothing substitutes for the experience of attending an O’Reilly conference in-person, but I’ll also be the first to admit that my talks are usually more conceptual and academic than practical, and designed to start a conversation rather than to tell you the Truth According To Baron. Thus, I think they’re worth sharing more widely.

O’Reilly alleviated my concerns about “killing the golden goose,” but I like one person’s take on the cost of O’Reilly’s conferences. “You think education is expensive? Try ignorance.”

I’ll post some of my past talks soon for your enjoyment.

DRBD and the sync rate controller, part 2

Posted in LINBIT Blogs by flip at October 29, 2013 09:36 AM

As an update to the earlier blog post, take a look below.

As a reminder: this is about resynchronization (i.e. recovery after a node or network problem), not about replication itself.

If you’ve got a demanding application it’s possible that it completely fills your I/O bandwidth, disk and/or network, leaving no room for the synchronization to complete. To make the synchronization slow down and let the application proceed, DRBD has the dynamically adaptive resync rate controller.

It is enabled by default with 8.4, and disabled by default with 8.3.
To explicitly enable or disable, set c-plan-ahead to 20 (enable) or 0 (disable).

Note that, while enabled, the setting for the old fixed sync rate is used only as an initial guess for the controller. After that, only the c-* settings are used, so changing the fixed sync rate while the controller is enabled won’t have much effect.

What it does

The resync controller tries to use up as much network and disk bandwidth as it can get, but no more than c-max-rate, and throttles if either

  • more resync requests are in flight than what amounts to c-fill-target, or
  • it detects application IO (read or write) and the current estimated resync rate is above c-min-rate.

The default c-min-rate with 8.4.x is 250 kiB/sec (the old default of the fixed sync-rate), with 8.3.x it was 4MiB/sec.

This “throttle if application IO is detected” is active even if the fixed sync rate is used. You can (but should not, see below) disable this specific throttling by setting c-min-rate to 0.

Tuning the resync controller

It’s hard, or next to impossible, for DRBD to detect how much activity your backend can handle. But it is very easy for DRBD to know how much resync-activity it causes itself.
So, you tune how much resync-activity you allow during periods of application activity.

To do that you should

  • set c-plan-ahead to 20 (default with 8.4), or more if there’s a lot of latency on the connection (WAN link with protocol A);
  • leave the fixed resync rate (the initial guess for the controller) at about 30% or less of what your hardware can handle;
  • set c-max-rate to 100% (or slightly more) of what your hardware can handle;
  • set c-fill-target to the minimum (just as high as necessary) that gets your hardware saturated, if the system is otherwise idle.
    In other words: figure out the maximum possible resync rate in your setup while the system is idle, then set c-fill-target to the minimum setting that still reaches that rate.
  • And finally, while checking application request latency/responsiveness, tune c-min-rate to the maximum that still allows for acceptable responsiveness.
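Put together, that advice could end up in a resource's disk section roughly like the sketch below. The numbers are placeholders for a backend that can sustain about 300 MB/s; tune them to your own hardware:

```
disk {
  c-plan-ahead  20;    # enable the dynamic resync controller
  resync-rate   90M;   # initial guess: ~30% of the hardware limit
  c-max-rate    300M;  # ~100% of what the hardware can handle
  c-fill-target 1M;    # minimum value that still saturates the link when idle
  c-min-rate    4M;    # maximum that keeps application latency acceptable
}
```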

Most parts of this post were originally published as a mailing list post by Lars.
Additional information can be found in the drbd.conf manpage.

Looking for a freelancer

Posted in Recent Content on Xaprb at October 26, 2013 12:00 AM

I’m looking for a freelancer to convert this WordPress blog into Markdown for use with Octopress. It should be straightforward – I have already used a plugin to export the data as Markdown. However, a few extra things will need to be taken care of. I have posted this job on Elance. Please submit proposals there!

Get out of your comfort zone

Posted in Recent Content on Xaprb at October 25, 2013 12:00 AM

One of the most valuable life skills you can ever develop is to overcome the urge to stay within your comfort zone. If you stay where you’re familiar and feel safe, two things might happen:

  • You might find out that it’s not safe after all. Bad things can happen where you feel at home just as well as outside the familiar.
  • Nothing good will happen. You might skate through life without even living it.

Last week at Velocity I responded to an invitation from O’Reilly to join a lunch with Per Scholas. I didn’t know much about Per Scholas, but I’ve learned to say yes to invitations from O’Reilly. The night before, I watched an inspiring talk during Ignite, from the founder of Per Scholas.

Impressed, I began to really look forward to spending some time with the Per Scholas folks the next day at lunch. The idea was pretty simple: people who were working through the Per Scholas training would pair up one-for-one (as I understand it) with people like myself – technologists, entrepreneurs, hackers, makers, people with perhaps something to share.

I sat down at a table in the designated area, leaving lots of room beside me for people to join. None of the Per Scholas students was there yet. I chatted idly with two other people at the table. Soon one of the Per Scholas staff members sat down with us. I asked her a series of questions for ten minutes or so. She explained that she helped people with basic life and work skills, coaching and mentoring even after they leave the intensive study program.

Still no students had joined us. Suddenly I looked over at another table and saw that it had filled entirely with Per Scholas students, shoulder to shoulder.

“Basic life skill number 1 is to build your network and learn from unfamiliar people,” I remarked to the staffer next to me. “They are missing an opportunity to mingle with us.”

“Yes,” she answered. “They are.”

And then you could (metaphorically) hear the crickets chirping. I finished my lunch and left.

I won’t brag about myself, but I can say that the other people at that table were damn sure worth getting to know. As for me, I considered it a complete waste of my time; I’d chosen it over a conversation I really wanted to attend elsewhere. There’s another blog post in here somewhere – this isn’t the first time I’ve had this kind of experience.

Per Scholas students aren’t paying anything (financially) for their training. Are they getting what they’re paying for? You bet they are. Like all of us, they will get out of it exactly what they put into it.

Get out of your comfort zone. If you don’t, the worst thing in the world happens to you: nothing.

DRBD Proxy 3.1: Performance improvements

Posted in LINBIT Blogs by flip at October 14, 2013 06:35 AM

The threading model in DRBD Proxy 3.1 received a complete overhaul; below you can see the performance implications of these changes.

First of all, because the old model suffered from the conflicting goals of low latency for the meta-data connections vs. high bandwidth for the data connections, a second set of pthreads has been added. The first set runs at the (normally negative) nice level the DRBD Proxy process is started at, while the second set, in order to be “nicer” to the other processes, adds +10 to the nice level and therefore gets a smaller share of the CPU time.

Secondly, the internal processing has been changed, too. This isn’t visible externally, of course – you can only notice the performance improvements.

DRBD Proxy 3.1 buffer usage

In the example graph above a few sections can be clearly seen:

  • From 0 to about 11.5 seconds the Proxy buffer gets filled. In case anyone’s interested, here’s the dd output:
    3712983040 Bytes (3.7 GB) copied, 11.4573 s, 324 MB/s
  • Up until ~44 seconds, lzma compression is active, with a single context. Slow, but it compresses best.
  • Then I switched to zlib; this is a fair bit faster. All cores are being used, so external requests (by some VMs and other processes) show up as irregular spikes. (Different compression ratios for various input data are “at fault”, too.)
  • At 56 seconds the compression is turned off completely; the time needed for the rest of the data (3GiB in about 13 seconds) shows the bonded-ethernet bandwidth of about 220MB/sec.

For two sets of test machines a plausible rate for transferring large blocks into the Proxy buffers is 450-500 MiB/sec.
For small buffers there are a few code paths that are not fully optimized yet; further improvements are to be expected in the next versions, too.

The roadmap for the near future includes a shared memory pool for all connections and WAN bandwidth shaping (ie. limitation to some configured value) — and some more ideas that have to be researched first.

Opinions? Contact us!

“umount is too slow”

Posted in LINBIT Blogs by flip at May 27, 2013 07:31 AM

A question we see over and over again is

Why is umount so slow? Why does it take so long?

Part of the answer was already given in an earlier blog post; here’s some more explanation.

The write() syscall typically writes into RAM only. In Linux we call that “page cache” or “buffer cache”, depending on what exactly the actual target of the write() system call was.

From that RAM (cache inside the operating system, high in the IO stack) the operating system periodically does write-outs at its leisure, unless it is urged to write out particular pieces (or all of it) now.

A sync (or fsync(), or fdatasync(), or …) does exactly that: it urges the operating system to do the write out.
A umount also causes a write out of all not yet written data of the affected file system.
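You can watch this behavior on any Linux box; here is a small illustration (the file path is just an example):

```shell
# dd returns almost immediately: the 64 MiB only land in the page cache
dd if=/dev/zero of=/tmp/pagecache-demo bs=1M count=64
# sync forces the actual write-out to stable storage; with lots of dirty
# data (or a slow backend) this is the step that takes time
sync
```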


  • Of course the “performance” of writes that go into volatile RAM only will be much better than anything that goes to stable, persistent storage. All things that have only been written to cache but not yet synced (written out to the block layer) will be lost if you have a power outage or server crash.
    The Linux block layer has never seen these changes, DRBD has never seen these changes, they cannot possibly be replicated anywhere.
    Data will be lost.

There are also controller caches which may or may not be volatile, and disk caches, which typically are volatile. These are below and outside the operating system, and not part of this discussion. Just make sure you disable all volatile caches on that level.

Now, for a moment, assume

  • you don’t have DRBD in the stack, and
  • a moderately capable IO backend that writes, say, 300 MByte/s, and
  • around 3 GiByte of dirty data at the time you trigger the umount, and
  • you are not seek-bound, so your backend can actually reach that 300 MB/s,

you get a umount time of around 10 seconds.

Still with me?

Ok. Now, introduce DRBD to your IO stack, and add a long distance replication link. Just for the sake of me trying to explain it here, assume that because it is long distance and you have a limited budget, you can only afford 100 MBit/s. And “long distance” implies larger round trip times, so let’s assume we have an RTT of 100 ms.

Of course that would introduce a single IO request latency of > 100 ms for anything but DRBD protocol A, so you opt for protocol A. (In other words, using protocol A “masks” the RTT of the replication link from the application-visible latency.)

That was latency.

But, the limited bandwidth of that replication link also limits your average sustained write throughput: in the given example, to about 11 MiByte/s.
The same 3 GByte of dirty data would now drain much more slowly; in fact, that same umount would now take not 10 seconds, but 5 minutes.
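The arithmetic behind those two figures, as a quick shell sanity check (sizes in MiB):

```shell
# ~3 GiB of dirty data, expressed in MiB
echo $(( 3072 / 300 ))  # local backend at ~300 MB/s: about 10 seconds
echo $(( 3072 / 11 ))   # ~11 MiB/s over the 100 Mbit/s link: roughly 280 s, i.e. ~5 minutes
```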

You can also take a look at a drbd-user mailing list post.

So, concluding: try to avoid having much unsaved data in RAM; it might bite you. For example, you want your cluster to do a switchover, but the umount takes too long and a timeout hits: the node gets fenced (or should), and the data not written to stable storage will be lost.

Please follow the advice about setting some sysctls to start write-out earlier!
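The sysctls in question are the kernel's dirty-page writeback knobs. A hedged example for /etc/sysctl.conf (the values are illustrative only; tune them to your RAM size and backend speed):

```
vm.dirty_background_bytes = 268435456   # start background write-out at 256 MiB of dirty data
vm.dirty_bytes = 1073741824             # block writers once 1 GiB is dirty
```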

Rolling updates with Ansible and Apache reverse proxies

Posted in Arrfab's Blog » Cluster by fabian.arrotin at May 23, 2013 04:36 PM

It's not a secret anymore that I use Ansible to do a lot of things. That goes from simple "one shot" actions with ansible on multiple nodes to "configuration management and deployment tasks" with ansible-playbook. One of the things I also really like about Ansible is the fact that it's a great orchestration tool.

For example, in some WSOA flows you can have a bunch of servers behind load balancer nodes. When you want to put a backend node/web server node in maintenance mode (to change configuration/update package/update app/whatever), you just "remove" that node from the production flow, do what you need to do, verify it's up again and put that node back in production. The principle of "rolling updates" is then interesting as you still have 24/7 flows in production.

But what if you're not in charge of the whole infrastructure? For example, you're in charge of some servers, but not of the load balancers in front of them. Let's consider the following situation, and how we can still use Ansible to disable/enable a backend server behind Apache reverse proxies.

So here is the (simplified) situation: two Apache reverse proxies (using the mod_proxy_balancer module) are used to load balance traffic to four backend nodes (JBoss in our simplified case). We can't directly touch those upstream Apache nodes, but we can still interact with them, thanks to the fact that "balancer manager support" is active (and protected!).

Let's have a look at a (simplified) Ansible inventory file:
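The group names below match the variables defined later in the post; the reverse-proxy hostnames are invented for illustration:

```
[apache-group-1]
rp1.myinternal.domain.org
rp2.myinternal.domain.org

[jboss-cluster]
jboss-1
jboss-2
jboss-3
jboss-4
```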









Let's now create a generic (write once, use many times) task to disable a backend node from Apache:

# This task can be included in a playbook to pause a backend node
# being load balanced by Apache Reverse Proxies
# Several variables need to be defined :
#   - ${apache_rp_backend_url} : the URL of the backend server, as known by Apache server
#   - ${apache_rp_backend_cluster} : the name of the cluster as defined on the Apache RP (the group the node is member of)
#   - ${apache_rp_group} : the name of the group declared in hosts.cfg containing Apache Reverse Proxies
#   - ${apache_rp_user}: the username used to authenticate against the Apache balancer-manager
#   - ${apache_rp_password}: the password used to authenticate against the Apache balancer-manager
#   - ${apache_rp_balancer_manager_uri}: the URI where to find the balancer-manager Apache mod
- name: Disabling the worker in Apache Reverse Proxies
  local_action: shell /usr/bin/curl -k --user ${apache_rp_user}:${apache_rp_password} "https://${item}/${apache_rp_balancer_manager_uri}?b=${apache_rp_backend_cluster}&w=${apache_rp_backend_url}&nonce=$(curl -k --user ${apache_rp_user}:${apache_rp_password} https://${item}/${apache_rp_balancer_manager_uri} |grep nonce|tail -n 1|cut -f 3 -d '&'|cut -f 2 -d '='|cut -f 1 -d '"')&dw=Disable"
  with_items: ${groups.${apache_rp_group}}

- name: Waiting 20 seconds to be sure no traffic is being sent anymore to that worker backend node
  pause: seconds=20

The interesting bit is the with_items one: it will use the apache_rp_group variable to know which Apache servers are used upstream (assuming you can have multiple nodes/clusters) and will run that command for every host in the list obtained from the inventory.

We can now, in the "rolling-updates" playbook, just include the previous task (assuming we saved it as ../tasks/apache-disable-worker.yml):


- hosts: jboss-cluster

serial: 1

user: root


- include: ../tasks/apache-disable-worker.yml

- etc/etc ...

- wait_for: port=8443 state=started

- include: ../tasks/apache-enable-worker.yml

But wait! As you've seen, we still need to declare some variables: let's do that in the inventory, under group_vars and host_vars!

group_vars/jboss-cluster :

# Apache reverse proxies settings
apache_rp_group: apache-group-1
apache_rp_user: my-admin-account
apache_rp_password: my-beautiful-pass
apache_rp_balancer_manager_uri: balancer-manager-hidden-and-redirected

host_vars/jboss-1 :

apache_rp_backend_url: 'https://jboss1.myinternal.domain.org:8443'
apache_rp_backend_cluster: nameofmyclusterdefinedinapache

Now when we use that playbook, we'll have a local action that interacts with the balancer manager to disable that backend node while we do maintenance.

I'll let you imagine (and create) a ../tasks/apache-enable-worker.yml file to enable the worker again (which you'll call at the end of your playbook).