The Plurality of Data

Plural Data. (I know that’s not Data on the right… Or is it?)

Let’s set this straight. The English word “data” should almost always be used as a mass noun, which means it shouldn’t be treated as a plural noun. Lots of people make the mistake of using it as a plural noun because ~~they think it sounds fancier~~ its original Latin form was a plural of “datum.” But the way we use it in modern English, especially in computer science, makes the concept of a “datum” make no sense.

Data pluralists think of data sets as many “datums,” but what is a datum? An example? A measurement? A dimension of a measurement? A bit in the binary representation of a dimension of a measurement in a IEEE floating point representation?

It is none of these. Data, especially high-dimensional and complex data, does not consist of countable singulars in any way that actually corresponds to what we mean when we talk about it, so it should not be a plural. Instead, it should be treated synonymously to “information.” We analyze “this information” or “this data,” not “these information” or “these data.”

I’m pretty darn certain of this reasoning. But is there any example where another word is (justifiably) plural when its singulars are uncountable or undefined? Or is there any scenario in modern usage where datums can be reasonably defined? If there are, then ~~those~~ that data might change my mind.

Contraction-based convergence proofs

Every now and then, teaching gives us some really valuable experiences. I’m teaching graduate level AI, and I had a really great experience learning something I had never paid much attention to before. Specifically, I read a section of Russell and Norvig’s book that deals with the convergence of Bellman’s utility updates for Markov decision processes.

When it comes to convergence proofs, the kind I’ve seen usually reason about the high-dimensional curvature of some objective function, or they do nasty things like analyzing the unrolled recursion tree. Yuck! For this reason, when I saw a section on a convergence proof, my immediate instinct was to run away and skip it for my and my students’ sakes.

But the convergence of Bellman’s updates can be proven using the concept of contraction, which says that an operator $f$ on two inputs $x$ and $x'$ makes them “closer,” according to some measure of distance $d$ .

$d(x, x') < d(f(x), f(x'))$

This property means that repeating the operator, e.g., $f(f(f(f(x))))$ and $f(f(f(f(x'))))$ , makes any two different initializations $x$ and $x'$ get closer and closer, and if the contraction is strong enough, makes them converge to the same value, giving a proof of convergence and uniqueness for the update $\latex f$. Beautiful, no?

The work comes in proving the contraction bound. There’s freedom in choosing the distance metric. The stronger the contraction, the faster the convergence. The Bellman updates contract by a constant factor, which I imagine is about as fast a convergence as you can get. And I imagine there can be some traps when things contract, but the amount of contraction becomes infinitesimal; in that case, this argument doesn’t actually show convergence.

I’m now curious what other convergent algorithms can be proven to converge using a contraction argument. Or more importantly, whether I can reach into my bag of algorithms-that-I-can’t-yet-prove-convergence-of and find one that I can finally solve using this approach.

I’m also curious if some type of contraction is a necessary condition for a convergent operator with a unique fixed point. I’m sure I could find that by doing some reading. But first, I have to put out a million tiny fires (this is how I describe my life now as a professor).

A quick Keynote tip

A few months ago, I discovered that Keynote has connection lines, which dynamically connect objects that you draw. I used to draw graphs and diagrams in Omnigraffle for this reason, but with native Keynote connection lines, you can animate them directly, etc. The ability to animate graphs is super important, and I’m a huge proponent in using animation in talks (in the right places) to help audiences visualize some of the absurdly complex things we present.

But it still wasn’t perfect, because Keynote doesn’t have built-in keyboard shortcuts to insert connection lines, so you have to select the objects you want, go to the pulldown menu, and select the command. This process is not good for avoiding repetitive-stress injury.

Well, I recently discovered that Mac OS lets you define keyboard shortcuts for any application. Maybe this is common knowledge, but in System Preferences, the Keyboard module lets you add App Shortcuts.

I set mine to cmd-L for a “Straight Connection Line” and cmd-option-L for a “Curved Connection Line.” And now I draw graphs natively in Keynote.

Unfortunately, Magic Move doesn’t work great with connection lines when they have to change orientation. They disappear as the nodes move and reappear once the nodes settle into their location on the next slide. That would be a really cool and useful animation to help visualize graphs. Oh well. Can’t win ’em all.

Bert Huang's Blog

Monthly Archives: February 2015

The Plurality of Data

Contraction-based convergence proofs

A quick Keynote tip