# The n days of Christmas

A while ago, I saw somebody write (probably on Twitter) that the Christmas song The Twelve Days of Christmas is a good way to teach students about quadratic growth, since the time required to sing the song grows quadratically as each verse increases in length. I thought this was great. But the more I thought about it, the more it bothered me.

Grammatically, the song is not about what appears to be quadratic growth. In the song, the narrator’s true love gifts them one partridge on the first day, then another partridge on each subsequent day.

On the first day of Christmas, my true love gave to me a partridge in a pear tree.

On the second day of Christmas, my true love gave to me two turtle doves and a partridge in a pear tree.

The narration does not say that the partridge on the second day is the same partridge that was given on the first day. So my interpretation of this is that the narrator, by day two, has now collected two partridges. This changes things.

The question that I’ve been trying to figure out is what the growth rate is for the total number of gifts (where some of these gifts are humans, so that’s weird…) on the $n$th day of Christmas. Let’s assume for the sake of making the math interesting that the number of days of Christmas grows toward infinity.

The key is that on the $i$th day of Christmas, a new gift is introduced. This gift will be given in groups of $i$ each day from then on. For example, for $i=1$, the partridge is introduced, and each subsequent day, the true love gives one more partridge. For $i = 2$, the turtle doves are introduced, and the true love gives 2 turtle doves each day for the rest of eternity. Another key point is that these gifts don’t start until the $i$th day.

This pattern means that on the $n$th day, the narrator owns $i(n - i + 1)$ of the $i$th gift. That’s a fairly straightforward linear growth. But my main question is how many total gifts does the narrator have on day $n$?

The formula here is
$f(n) = \sum_{i=1}^n i (n - i + 1)$,
which we can simplify by massaging the equations as follows
$f(n) = \sum_{i=1}^n in - \sum_{i=1}^n i^2 + \sum_{i=1}^n i = n \sum_{i=1}^n i - \sum_{i=1}^n i^2 + \sum_{i=1}^n i$.

These summations now have some well known formulas. Using the power of my computer science education (and Wikipedia), this formula simplifies to
$f(n) = n \frac{n(n+1)}{2} - \frac{n(n+1)(2n+1)}{6} + \frac{n(n+1)}{2}$,
and merging the first and last terms, which share the factor $\frac{n(n+1)}{2}$, gives
$f(n) = (n + 1) \frac{n(n+1)}{2} - \frac{n(n+1)(2n+1)}{6}$.

It’s looking like it’s cubic growth. But let’s finish to make sure. We can combine fractions now, leading to
$f(n) = \frac{3n(n + 1)(n+1)}{6} - \frac{n(n+1)(2n+1)}{6} = \frac{3n(n+1)(n+1) - n(n+1)(2n+1)}{6}= \frac{1}{6} (n^3 + 3n^2 + 2n)$.

Thus, the total number of gifts received by the narrator is $\Theta(n^3)$.
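As a quick sanity check on the algebra, here is a small Python sketch that replays the song day by day and compares the count against the closed form. The function names are my own for illustration, and the factored form $n(n+1)(n+2)/6$ is just $\frac{1}{6}(n^3 + 3n^2 + 2n)$ rewritten.

```python
def total_gifts_simulated(n):
    """Replay the song: on each day d, every gift i <= d arrives in a group of i."""
    total = 0
    for day in range(1, n + 1):
        for gift in range(1, day + 1):
            total += gift  # i copies of the i-th gift arrive today
    return total


def total_gifts_closed_form(n):
    """Closed form f(n) = (n^3 + 3n^2 + 2n) / 6 = n(n+1)(n+2) / 6."""
    return n * (n + 1) * (n + 2) // 6


# Compare the two counts for a few song lengths, including the usual 12 days.
for n in (1, 2, 3, 12):
    print(n, total_gifts_simulated(n), total_gifts_closed_form(n))
```

For the traditional twelve days, both counts come out to 364 gifts.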

Merry Christmas and Happy Holidays!

Here’s a little problem I’ve been wondering about for a while. Suppose you’re trying to find the solution to the saddle-point optimization
$\min_{x} ~ \max_{y} ~ f(x) + g(y) + x^\top M y$,
where $x$ and $y$ are vectors, functions $f$ and $g$ map from their respective vector spaces to scalar outputs, and $M$ is a matrix. Assume that $f$ is convex and $g$ is concave. Let’s call the objective value of this problem $L(x, y) = f(x) + g(y) + x^\top M y$.

Suppose that some oracle gives you a functional
$h(x) = \arg\max_{y} ~ g(y) + x^\top M y$,
i.e., the solution to the inner maximization of the original saddle-point problem $\arg\max_y L(x, y)$. We can then consider the optimization
$\min_x ~ L(x, h(x)) = \min_x ~ f(x) + g(h(x)) + x^\top M h(x)$.

There’s a surprising (to me) result that the gradient of the function $L(x) \equiv L(x, h(x))$ is
$\nabla_{x} L = \nabla_x f + h(x)^\top M^\top$.

This gradient somehow ignores the gradient of $h(x)$, which is clearly a function that depends on $x$. This gradient also happens to be the partial gradient with respect to $x$. Why does the dependence on $y$ disappear? Let’s try to see why this is true.

Before we do that, let me give an example of where this form of optimization arises. The most prominent example is for Lagrangian relaxation. When you’re trying to minimize some function $J(x)$ with a constraint $Ax = b$, you can form the Lagrangian problem
$\min_x ~ \max_y ~ J(x) + y^\top (Ax - b)$,
which takes the general form we started with if $f(x) = J(x)$, $g(y) = -y^\top b$, and $M = A^\top$. This general form also arises in structured prediction, for example when the inner maximization is the separation oracle of a structured support vector machine or the variational form of inference in a Markov random field.

Back to the general form, let’s try taking the gradient the traditional way, starting with
$\nabla_x L = \nabla_x f + \nabla_x g(h(x)) + \nabla_x x^\top M h(x)$.

The second term can be expanded with some chain rule action:
$\nabla_x g(h(x)) = \underbrace{\left(\frac{d~g(h(x))}{d~h(x)}\right)}_{1 \times |y|} \underbrace{\left(\frac{d~h(x)}{d~x}\right)}_{|y| \times |x|}$. (I’m probably botching the transposes here.)

The third term can be expanded with product rule:
$\nabla_x x^\top M h(x) = h(x)^\top M^\top + x^\top M \left(\frac{d~h(x)}{d~x}\right)$.

We also know something about $h(x)$. Since it comes from maximizing $L$ over $y$, we know that the gradient of $L$ with respect to $y$ is zero at $y = h(x)$, i.e.,
$\nabla_y L(x, h(x)) = 0$,
which means $\nabla_{h(x)} g(h(x)) + x^\top M = 0$, and therefore $\frac{d ~ g(h(x))}{d~h(x)} = - x^\top M$.

The second term then can be replaced with
$\left(\frac{d~g(h(x))}{d~h(x)}\right) \left(\frac{d~h(x)}{d~x}\right) = - x^\top M \left(\frac{d~h(x)}{d~x}\right)$.

This replacement directly cancels out a term in the product rule (third term), leaving us with
$\nabla_x L = \nabla_x f - x^\top M \left(\frac{d~h(x)}{d~x}\right) + h(x)^\top M^\top + x^\top M \left(\frac{d~h(x)}{d~x}\right) = \nabla_x f + h(x)^\top M^\top$.
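To convince myself numerically, here is a minimal sketch that checks the result by finite differences on one toy instance. Everything specific here is an assumption picked for illustration: $f(x) = \frac{1}{2} x^\top x$, $g(y) = -\frac{1}{2} y^\top y$ (so the oracle has the closed form $h(x) = M^\top x$), and an arbitrary small matrix $M$. The code uses column vectors, so the claimed gradient $\nabla_x f + h(x)^\top M^\top$ appears as $\nabla_x f + M h(x)$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical problem data: |x| = 2, |y| = 3.
M = [[1.0, 2.0, 0.0],
     [0.5, -1.0, 3.0]]

def f(x):            # convex (assumed for this example)
    return 0.5 * dot(x, x)

def grad_f(x):
    return list(x)

def g(y):            # concave (assumed for this example)
    return -0.5 * dot(y, y)

def h(x):
    # argmax_y g(y) + x^T M y; setting -y + M^T x = 0 gives y = M^T x.
    return matvec(transpose(M), x)

def L(x):
    # L(x) = L(x, h(x)) = f(x) + g(h(x)) + x^T M h(x)
    y = h(x)
    return f(x) + g(y) + dot(x, matvec(M, y))

def claimed_gradient(x):
    # grad f + M h(x): the partial gradient in x, ignoring dh/dx.
    return [a + b for a, b in zip(grad_f(x), matvec(M, h(x)))]

def numerical_gradient(func, x, eps=1e-6):
    # Central differences; these do account for how h(x) changes with x.
    grad = []
    for k in range(len(x)):
        up, down = list(x), list(x)
        up[k] += eps
        down[k] -= eps
        grad.append((func(up) - func(down)) / (2 * eps))
    return grad

x0 = [0.3, -1.2]
print("claimed  :", claimed_gradient(x0))
print("numerical:", numerical_gradient(L, x0))
```

The two printed vectors should agree up to finite-difference error, even though the numerical version fully accounts for the dependence of $h(x)$ on $x$. This only checks one easy instance, of course; it is not a proof.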

I suspect there’s an even more generalized form of this property, perhaps generalizing the bilinearity of the problem to some other kind of convex-concave relationship between $x$ and $y$. Let me know if you know of anything along those lines.

# Announcing the wish☁cloud

This spring, I spent some time on a hobby project that has been on my back burner for over a decade. This project is unrelated to my research or anything I do in my day job (aside from being practice for building web apps, which may be useful for my research eventually). The project is called the wish☁cloud.

## What is the wish☁cloud?

The wish☁cloud is a social web application where users can vicariously fulfill their wishes. Users post things that they wish they could do or experience, and other users realize these wishes for the original wishers. The intent is to use technology to amplify the shared human experience, giving its users little moments of happiness and gratitude.

## The history of the wish☁cloud

The wish☁cloud began as a class project from a course I took in college on the role of the Internet in society. Back then, in 2004, it was called “Seriously Vicariously.” Since then, I’d been waiting to build it, or hoping someone else would scoop the idea so I could take part in it.

A confluence of factors led me to finally build a prototype for the wish☁cloud this spring: I’d been thinking a lot about the role of the Internet in our lives and been missing the optimism that existed during the early days of the social web; I’ve been wanting to learn modern web programming to hone my skills; and I needed a hobby project to take breaks from academic life and maintain my sanity. Thus, the wish☁cloud now exists in a nascent form.

## How can I play on the wish☁cloud?

Now that I’ve built it, I want to see if it can actually support users. Check it out at http://wishcloud.org. There’s not much there yet.

If you’re interested in getting an account and trying it, fill out this form: https://goo.gl/forms/oT0Kk8m8D97hVTga2

I’ll send out invitations at my discretion as I gain more confidence that the site works as I intend it to. It may be a while as I’m working out a few issues, and I won’t have much hobby time during the end-of-semester crunch. But add your name if you’re interested!

Edit: I’ve opened up the service to anyone who wants to sign up. I recommend signing up with an external account so that my server isn’t storing any login credentials. You can revoke access to your social media login at any time.

I suspect it will run into problems quickly since I’m not an expert in web programming, system administration, or basically anything behind the entire wish☁cloud. I hope that when it works, you have fun and find a little happiness on the site.

# How I prefer students address me

A few students have asked me lately how I prefer to be addressed. Here’s an ordering of my preference.

1. Bert
2. Professor Huang
3. Dr. Huang
4. Dr. Bert
5. Professor Bert

I’m cool with any of these. I get Dr. Huang most often, for some reason. That’s okay. I’m not sure why I don’t like that as much as Bert or Prof. Huang. It’s probably because being a university professor is my dream job and if you’re gonna be formal, you might as well remind me that I’m doing my dream job (especially right before you ask me to do something I probably don’t want to do). And I like first name because, as a computer scientist, all my favorite professors who I looked up to during my career have been cool first-name people.

But no big deal. You can also just call me “hey you,” or nothing.

Just don’t call me Mr. Huang.

# My Spring Conference Reviewing Data

Last night, I finally finished a marathon month+ of reviewing for machine learning and machine-learning-adjacent conferences. Because of my own poor calendar organization, I foolishly agreed to join the program committees for IJCAI 2015 (Machine Learning Track), KDD 2015, ICML 2015, and UAI 2015. These conferences all had reviewing periods during the month of March and this first bit of April.

My paper assignments for these conferences were six for IJCAI, five for KDD, six for ICML, and five for UAI. While I was reviewing these 22 papers, I was recording my initial overall recommendation (prior to discussion and author response) for each of these papers, just to measure how I tend to score papers. I figured I’d post some of these recordings here, with the major caveat that these are still tiny sample sizes and they are heavily biased by what papers and topics I like to bid on. I’m also going to convert all scores to a scale of [strong reject, weak reject, weak accept, strong accept] to both simplify and muddy up my data a bit to prevent any chance of some smartypants somehow de-anonymizing based on my silly blog post.

• For IJCAI, my recommendations for my six papers were one reject, one weak reject, three weak accepts, and one strong accept.
• For KDD, my recommendations for my five papers were three rejects, one weak reject, and one strong accept.
• For ICML, my recommendations for my six papers were two weak rejects, three weak accepts, and one strong accept.
• For UAI, my recommendations for my five papers were two rejects, two weak rejects, and one weak accept.

Overall, I recommended six rejects, six weak rejects, seven weak accepts, and three strong accepts. I gave zero strong reject recommendations. If my initial vote were the only one that counted, my accept rate for each conference would be 67% for IJCAI, 20% for KDD, 67% for ICML, and 20% for UAI. Overall, my acceptance rate was a rather high 45%.
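For anyone who wants to redo the arithmetic, here is a small Python sketch that tallies the per-conference counts from the list above. The dictionary layout and function name are my own, and counting weak and strong accepts together as "accept" is the same assumption as in the rates I quoted.

```python
# Per-conference recommendation counts, copied from the list above.
recommendations = {
    "IJCAI": {"reject": 1, "weak reject": 1, "weak accept": 3, "strong accept": 1},
    "KDD":   {"reject": 3, "weak reject": 1, "weak accept": 0, "strong accept": 1},
    "ICML":  {"reject": 0, "weak reject": 2, "weak accept": 3, "strong accept": 1},
    "UAI":   {"reject": 2, "weak reject": 2, "weak accept": 1, "strong accept": 0},
}


def accept_rate(counts):
    """Fraction of papers with a weak or strong accept recommendation."""
    accepts = counts["weak accept"] + counts["strong accept"]
    return accepts / sum(counts.values())


# Sum each recommendation label across all four conferences.
totals = {}
for counts in recommendations.values():
    for label, k in counts.items():
        totals[label] = totals.get(label, 0) + k

for venue, counts in recommendations.items():
    print(f"{venue}: {accept_rate(counts):.0%}")
print("totals:", totals)
print(f"overall: {accept_rate(totals):.0%}")
```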

So what is the takeaway message? I’m not sure. I guess this still isn’t enough data to really tell anything. Let me attempt to make some claims.

• The numbers suggest that I like ICML and IJCAI papers better than UAI and KDD papers. I would be pretty surprised if this is true and not just a result of randomness. It’s hard to tell with the IJCAI ML track being a brand new idea. I usually imagine myself as liking UAI papers the most of all the medium-sized ML conferences.
• The numbers suggest that I like ICML papers about graphical models, structured prediction, and relational learning, since these are the topic areas I usually bid on and that Toronto Paper Matching usually assigns to me. This is plausible, but not consistent with my low accept rate for UAI.
• By a similar argument, the numbers suggest that I don’t like KDD papers on graph mining and relational models. This is also plausible, but surprising. I think in this case, I really like the problem area of data mining from complex network data, but maybe I’m often unsatisfied by the methods people propose. It’s possible I’m too critical of this kind of work.

Sorry these are all pretty weak analyses. The sample size is just too small. If I want to understand my own biases better, I need to volunteer to review even more (note to self: do not do this), or keep better records from previous years of reviewing.

Only one thing is absolutely clear from this month of reading all these submissions: seriously everyone needs to stop using the word “employ.”

This is just an announcement regarding a FAQ primarily for the Virginia Tech community. I’m scheduled to teach CS5824/ECE5424 Advanced Machine Learning. This course is named this way because it’ll be taught in conjunction with CS4824/ECE4424 for undergrads, which is called “Machine Learning.” So the “Advanced” modifier indicates that you’ll be graded at an advanced, graduate level, but the course in both cases will be intended for students who want an introduction to machine learning. In other words, it’ll be my version of this course, currently being taught by Dhruv.

Sorry for the confusion! The university needed these courses to have different names.

Dhruv will be teaching a seminar course on deep learning in the fall (not yet in the course catalog), if you’re looking for a course to continue learning after the intro, and I’m working out details on teaching a graphical models seminar-level course in Spring 2016. So there will be plenty of chances for more advanced study in the next academic year, but “Advanced Machine Learning” is “Introduction to Machine Learning for Graduate Students.”

# The Plurality of Data

Plural Data. (I know that’s not Data on the right… Or is it?)

Let’s set this straight. The English word “data” should almost always be used as a mass noun, which means it shouldn’t be treated as a plural noun. Lots of people make the mistake of using it as a plural noun because they think it sounds fancier, since its original Latin form was a plural of “datum.” But the way we use it in modern English, especially in computer science, makes the concept of a “datum” make no sense.

Data pluralists think of data sets as many “datums,” but what is a datum? An example? A measurement? A dimension of a measurement? A bit in the binary representation of a dimension of a measurement in an IEEE floating point representation?

It is none of these. Data, especially high-dimensional and complex data, does not consist of countable singulars in any way that actually corresponds to what we mean when we talk about it, so it should not be a plural. Instead, it should be treated synonymously to “information.” We analyze “this information” or “this data,” not “these information” or “these data.”

I’m pretty darn certain of this reasoning. But is there any example where another word is (justifiably) plural when its singulars are uncountable or undefined? Or is there any scenario in modern usage where datums can be reasonably defined? If there are, then ~~those~~ that data might change my mind.

# Contraction-based convergence proofs

Every now and then, teaching gives us some really valuable experiences. I’m teaching graduate level AI, and I had a really great experience learning something I had never paid much attention to before. Specifically, I read a section of Russell and Norvig’s book that deals with the convergence of Bellman’s utility updates for Markov decision processes.

When it comes to convergence proofs, the kind I’ve seen usually reason about the high-dimensional curvature of some objective function, or they do nasty things like analyzing the unrolled recursion tree. Yuck! For this reason, when I saw a section on a convergence proof, my immediate instinct was to run away and skip it for my and my students’ sakes.

But the convergence of Bellman’s updates can be proven using the concept of contraction, which says that an operator $f$ on two inputs $x$ and $x'$ makes them “closer,” according to some measure of distance $d$.

$d(f(x), f(x')) < d(x, x')$

This property means that repeating the operator, e.g., $f(f(f(f(x))))$ and $f(f(f(f(x'))))$, makes any two different initializations $x$ and $x'$ get closer and closer, and if the contraction is strong enough, makes them converge to the same value, giving a proof of convergence and uniqueness for the update $f$. Beautiful, no?

The work comes in proving the contraction bound. There’s freedom in choosing the distance metric. The stronger the contraction, the faster the convergence. The Bellman updates contract by a constant factor, which I imagine is about as fast a convergence as you can get. And I imagine there can be some traps when things contract, but the amount of contraction becomes infinitesimal; in that case, this argument doesn’t actually show convergence.

I’m now curious what other convergent algorithms can be proven to converge using a contraction argument. Or more importantly, whether I can reach into my bag of algorithms-that-I-can’t-yet-prove-convergence-of and find one that I can finally solve using this approach.

I’m also curious if some type of contraction is a necessary condition for a convergent operator with a unique fixed point. I’m sure I could find that by doing some reading. But first, I have to put out a million tiny fires (this is how I describe my life now as a professor).

# A quick Keynote tip

A few months ago, I discovered that Keynote has connection lines, which dynamically connect objects that you draw. I used to draw graphs and diagrams in Omnigraffle for this reason, but with native Keynote connection lines, you can animate them directly, etc. The ability to animate graphs is super important, and I’m a huge proponent of using animation in talks (in the right places) to help audiences visualize some of the absurdly complex things we present.

But it still wasn’t perfect, because Keynote doesn’t have built-in keyboard shortcuts to insert connection lines, so you have to select the objects you want, go to the pulldown menu, and select the command. This process is not good for avoiding repetitive-stress injury.

Well, I recently discovered that Mac OS lets you define keyboard shortcuts for any application. Maybe this is common knowledge, but in System Preferences, the Keyboard module lets you add App Shortcuts.

I set mine to cmd-L for a “Straight Connection Line” and cmd-option-L for a “Curved Connection Line.” And now I draw graphs natively in Keynote.

Unfortunately, Magic Move doesn’t work great with connection lines when they have to change orientation. They disappear as the nodes move and reappear once the nodes settle into their location on the next slide. That would be a really cool and useful animation to help visualize graphs. Oh well. Can’t win ’em all.

# On the NIPS Experiment and Review Process

A lot of attention has been focused on the NIPS experiment on its own reviewing system, Eric Price’s blog post on it, and unfortunately my flippant tweet pointing out some inaccuracies in the blog post. So I’ll try to clarify what I meant here.

Edit: folks looking for more info on the NIPS experiment should check out Neil Lawrence’s initial blog post on it and stay tuned, as he and Corinna Cortes will be eventually releasing a more thorough writeup on the results.

## Eric Price’s Post

In the spirit of Eric’s post, here’s a TL;DR: the minor inaccuracies in Eric’s post should not detract from the main message that the results from the NIPS experiment are concerning, but we should be careful to get the details right before trying to explain the scary results with faulty information.

The three details that popped out at me as inaccurate were the idea that the program committee was split into two independent committees, that the committee only discussed papers with an average score between 6.0 and 6.5, and that area chairs could not see if a paper was a duplicate. In Eric’s defense, a lot of these details and more were clarified in the comments and discussion below his post, so what I’m writing here is somewhat redundant (e.g., he points out in the comments that NIPS does not have a fixed acceptance rate).

On the first point, I’m not completely sure how the program chairs implemented the duplication. But I don’t think the concept of splitting the program committee in half is correct or makes sense given the way NIPS reviewing is organized. Most of the area chairing, fostering discussion, quality control of reviews, etc., is done independently by each area chair, so there is no real concept of split independent committees. I’m not even sure what that would mean. The committee is in some sense already split 92 ways, for each area chair, but reviewers may be assigned papers with different chairs, so it’s not really independent. The way Eric’s post describes it invokes an image of two huge conference rooms of area chairs talking about the papers, which is a model that doesn’t scale to the absurd size of the NIPS conference nowadays.

As for the issue of which papers were discussed, I believe this was simply a retelling of a half-joking description of what the review process was, but the tongue-in-cheek tone is lost in writing. First, the reviewers discussed any paper that wasn’t immediately an obvious, high-confidence consensus. Then the area chairs examined all these reviews, papers, and discussions, joining in the discussion in most cases. Then the area chairs met in pairs to go over all their papers together (except conflicted papers), making pair judgement calls on each and identifying controversial or tricky decisions that needed to be discussed at the larger meetings. After that, the chairs met in groups of four with one of the program chairs to go over these tricky decisions. Nowhere in this process does the average score even come up other than as a very rough way for area chairs to sort the papers, all of which they have to individually consider. But we also had much better heuristics to sort by, like controversiality, spread of scores, and low confidence scores. I’ll write more about my own personal experience with this process later.

Lastly, area chairs, who had access to author identities (because we are partially responsible for preventing conflicts of interest), could in fact see if a paper was a duplicate. To make the experiment work on CMT, the program chairs had to create a fake additional author, whose name was something like NIPS Dup1. So it was pretty obvious. It’s not clear how this affected the experiment. Different area chairs might have reacted differently to it. I did my best to ignore this bit of information, to preserve the experiment, but there’s no way that my awareness of this didn’t affect my behavior. Did I give these papers more attention because I knew my area chairing would be compared to someone else’s? Or did I give them less attention because I wanted to focus on my actual assigned papers? I wanted to do neither, but who knows how successful I was. This fact surely contaminates the experimental results and any conclusions we might be able to definitely make about it.

I think the reason I was irked by these nitpicky details was partially because the discussion in the comments seemed to suggest that they were the problem with NIPS and the reason for the inconsistency. But I was hasty to criticize on Twitter, because Eric’s post really brings up a few important points and interesting discussion and hypotheses. It is truly great that people are talking about it, and lots of non-machine-learning communities have been made aware of the NIPS experiment through Eric’s post. I’d like some more caution about overgeneralizing from the experimental results, like Eric’s TL;DR does, but I suppose that’s inevitable, so we might as well get right to it with the first public writeup on the results. Hopefully people read on past that, since the rest of his analysis is pretty level-headed and thoughtful.

## My own experience as area chair

Since this was my first year on the senior program committee, I got a new perspective on the NIPS review process that might be helpful to share with others. It wasn’t a huge difference from what I’d seen as a reviewer in the past, but since I was responsible for chairing 21 papers, I got a somewhat larger sample size. But it was still only 21 papers, which is much less than the number of papers that the duplication experiment used, and my sample is super biased toward my subject areas. So take this with a HUGE grain of salt. It’s just an anecdote, not a scientific study.

My initial experience with the process was disappointing. I watched the reviewers enter their reviews, and more often than not, these reported opinions that were unsubstantiated, unsupported in the review writeups, late, and unproofread. As a rough guess of the ratio here, each paper had three reviewers initially, and I would say about 3 out of 4 papers had one thoughtful review. After that, the reviewers with crappy reviews became very hard to reach. They didn’t participate in the discussion without a ton of prodding, they didn’t respond to the author rebuttals, and when they did respond, they surprisingly tended to stick to their original, seemingly unsubstantiated opinions.

So that was terrible. The good news is all that happened afterwards. Like I mentioned above, we area chairs met in pairs. (Even though I have only good things to say about the area chairs I met with, I’ll leave names off to preserve the anonymity of the review process.) My AC partner met with me on Skype for (I think) about two hours, maybe three, and we talked through each of our papers. The easy decisions, where all reviewers agreed, reported high confidence, and demonstrated high confidence through their words, were very quick discussions, but most of our time was spent debating about the more difficult decisions. Throughout this discussion, it was clear to me how thoughtfully my partner had considered the reviews and the papers, understanding when reviewers were making valid arguments and when they were being unfair. In many ways, I was reassured that another area chair was trying as hard as I was to find the signal in the noise.

The next stage was a meeting with four area chairs and a program chair. Again, we met on Skype, and this time went through a filtered list of the papers our respective AC pairs had decided needed further discussion. This list mostly included papers on the borderline of acceptance or ones where the reviewers were unwilling to agree on a decision. Reading the reviews and looking at the papers, we did our best to make a decision during the meeting, and again, anytime we couldn’t reach a decision, it was marked for further discussion and consideration as a borderline paper by the next level of the hierarchy: the program chairs.

After that point, I’m in the dark as to what the PCs’ decision process was. I know at some point they had to cut off acceptances for reasons of physical venue space, but it’s my understanding that the acceptance rate before that point is already pretty close to the usual 25%-ish rate.

## So what now?

So my experience with the whole process could be summarized by saying that I saw some really disappointing reviewers, but was rescued from losing faith in NIPS by the thoughtfulness of the area chairs I worked with and the program chairs. But even within the sea of bad reviews, there were some standout individuals who anchored (mixed maritime metaphor…) these discussions and decisions in reason, fairness, and perspective. So I think as NIPS continues expanding to accommodate the growing interest in its topics, we’ll have to figure out how to address the growing proportion of bad reviewers that we’ll need to recruit to handle its scale. Maybe the answer is better reviewer training, or maybe the answer is more work for the more competent reviewers, or maybe there is no answer.

One important aspect to consider, when we talk about peer review being broken, unscalable, or any other complaint, is that the primary purpose of these huge processes is not to assign credit for people’s work; it’s to decide what content to present in a conference. Despite all the noise and all the flaws in the system, the quality of the conferences I’ve been attending has always been consistently high. More specifically, to not generalize beyond what I’m discussing in this post, NIPS 2014 was a great conference. Of course, in the real world, assigning credit is a super-important part of this whole deal. (Luckily, the process for journals and for funding decisions tends to happen at a smaller scale, so my guess is it’s less noisy and less crappy, but that certainly isn’t perfect either.) So something does have to be fixed, but it’s not as broken as it may feel at times.

(There are people experimenting with new models of publication to try to address these issues, e.g., see the talks from the ICML 13 Workshop on Peer Review, and the International Conference on Learning Representations (ICLR).)

Lastly, I’ll conclude with two questions I’ve been asking myself and my colleagues lately. Of all the peer review experiences you’ve had as a reviewer, a chair, or an author, how often do all the reviewers understand the paper and make a valid decision based on this understanding? For me, the answer is nearly zero percent of the time. Reviewers almost never understand the papers I submit, whether they accept or reject, and when I’m a reviewer or chair, reviewers almost always have different interpretations of what’s going on in the paper, which means they can’t all be correct. So peer review is broken, right? Maybe, but as a second question, how often is the final decision the right decision? For me, the answer is pretty close to always. E.g., the papers I’ve had rejected for dumb reasons have always needed a lot of improvement in retrospect, or maybe the papers reviewers don’t get aren’t written or thought through well enough. Maybe despite reviewers not knowing what they’re reading, their B.S. detectors still work fine. But we shouldn’t just settle for that if it’s true. I dunno. Lots to think about…