Flavio Bartmann writes:

I don’t know if you have seen this, the latest from John Ioannidis (together with Michaela Schippers this time), Saving Democracy From the Pandemic, an article at Tablet magazine. In spite of its grandiose title, it is a content-free diatribe, mercifully short. It promotes skepticism of science, which is mildly ironic given his sensitivity when the skepticism is directed towards his work. Might merit some comment.

My reply: I took a quick read and couldn’t make much sense of the article. The authors talked about “concrete values like freedom and equality”: I have no idea what they’re talking about. It seemed like word salad.

Bartmann responded:

I believe that unexpected, material events lead many people (including some very smart ones) to strange places.

To return to the article under discussion: The topic of societal and governmental reactions to emergencies is important, very much worth writing about. I think where the article went wrong was in its framing of “health authorities and politicians” as the bad guys, without recognizing that these decisions are made in a larger context, in this case with lots of people being afraid of spreading covid, parents pulling their kids out of schools in March 2020, etc. As with other controversial government policies such as tax cuts and mandatory criminal sentencing laws, there’s a complicated push and pull between government, political entrepreneurs, and public opinion, and it’s a mistake to try to collapse this into a model of governments imposing policies on the public.

Also the authors could think a bit more about the context of their statements. For example, they write, “It is critical in free, democratic societies that media never become a vessel for a single, state-sanctioned, official narrative at the expense of public debate and freedom of speech. Removing content considered ‘fake’ or ‘false’ in order to limit the ability of ordinary people to judge information for themselves only inflames polarization and distrust of the public sphere.”—but we live in a social media environment where political and media leaders such as Ted Cruz and Alex Jones spread dangerous conspiracy theories. The example of Cruz illustrates that the “state” is not unitary; and the example of Jones illustrates issues with “media.”

I’m not saying that the authors of this one piece need to engage with all these complexities. I just think they should be aware of them, and if they want to make suggestions or criticisms of policies or attitudes, it would help for them to be specific rather than indulging in generalities about freedom and equality etc.

**P.S.** I clicked through the Tablet site and saw that it describes itself as “a daily online magazine of Jewish news, ideas, and culture.” I didn’t see anything Jewish-related in the above-linked article so maybe I’m missing something here?

**P.P.S.** Regarding the title of the post: Yes, I too am coming down from the ivory tower to lecture here. You’ll have to judge this post on its merits, not based on my qualifications. And if I go around using meaningless phrases such as “concrete values like freedom and equality,” please call me on it!

Michael Nelson writes:

I wanted to point out a paper, Stabilizing Subgroup Proficiency Results to Improve the Identification of Low-Performing Schools, by Lauren Forrow, Jennifer Starling, and Brian Gill.

The authors use Mr. P to analyze proficiency scores of students in subgroups (disability, race, FRL, etc.). The paper’s been getting a good amount of attention among my education researcher colleagues. I think this is really cool—it’s the most attention Mr. P’s gotten from ed researchers since your JREE article. This article isn’t peer reviewed, but it’s being seen by far more policymakers than any journal article would.

All the more relevant that the authors’ framing of their results is fishy. They claim that some schools identified as underperforming, based on mean subgroup scores, actually aren’t, because they would’ve gotten higher means if the subgroup n’s weren’t so small. They’re selling the idea that adjustment by poststratification (which they brand as “score stabilization”) may rescue these schools from their “bad luck” with pre-adjustment scores. What they don’t mention is that schools with genuinely underperforming (but small) subgroups could be misclassified as well-performing if they have “good luck” with post-adjustment scores. In fact, they don’t use the word “bias” at all, as in: “Individual means will have less variance but will be biased toward the grand mean.” (I guess that’s implied when they say the adjusted scores are “more stable” rather than “more accurate,” but maybe only to those with technical knowledge.)

And bias matters as much as variance when institutions are making binary decisions based on differences in point estimates around a cutpoint. Obviously, net bias up or down will be 0, in the long run, and over the entire distribution. But bias will always be net positive at the bottom of the distribution, where the cutpoint is likely to be. Besides, relying on net bias and long-run performance to make practical, short-run decisions seems counter to the philosophy I know you share, that we should look at individual differences not averages whenever possible. My fear is that, in practice, Mr. P might be used to ignore or downplay individual differences–not just statistically but literally, given that we’re talking about equity among student subgroups.
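That asymmetry is easy to see in a toy simulation (all numbers invented, and the precision-weighted shrinkage below is a stand-in for the paper’s method, not its actual model): subgroups that are genuinely below the cutpoint, but small, get pulled above it almost every time.

```python
# Toy simulation: shrinking small-subgroup means toward the grand mean is
# biased upward at the bottom of the distribution, which is where a
# proficiency cutpoint usually sits. All constants here are invented.
import random

random.seed(1)
grand_mean = 60.0
cutpoint = 50.0

low_small = []  # shrunken estimates for genuinely low-performing small subgroups
for _ in range(2000):
    true_mean = 45.0  # genuinely below the cutpoint
    n = 8             # small subgroup
    scores = [random.gauss(true_mean, 15) for _ in range(n)]
    raw = sum(scores) / n
    weight = n / (n + 20)  # toy precision weight; "prior n" of 20 is made up
    shrunk = weight * raw + (1 - weight) * grand_mean
    low_small.append(shrunk)

avg_shrunk = sum(low_small) / len(low_small)
share_above = sum(s > cutpoint for s in low_small) / len(low_small)
print(avg_shrunk)   # well above the true mean of 45: upward bias
print(share_above)  # most genuinely low subgroups land above the cutpoint
```

The shrunken estimate for these subgroups averages well above 50 even though every one of them truly sits at 45, which is exactly the misclassification worry.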

To the authors’ credit, they note in their limitations section that they ought to have computed uncertainty intervals. They didn’t, because they didn’t have student-level data, but I think that’s a copout. If, as they note, most of the means that moved from one side of the cutoff to the other are quite near it already, you can easily infer that the change is within a very narrow interval. Also to their credit, they acknowledge that binary choices are bad and nuance is good. But, also to their discredit, the entire premise of their paper is that the education system will, and presumably should, continue using cutpoints for binary decisions on proficiency. (That’s the implication, at least, of the US Dept. of Ed disseminating it.) They could’ve described a nuanced *application* of Mr. P, or illustrated the absurd consequences of using their method within the existing system, but they didn’t.

Anyway, sorry this went so negative, but I think the way Mr. P is marketed to policymakers, and its potential unintended consequences, are important.

Nelson continues:

I’ve been interested in this general method (multilevel regression with poststratification, MRP) for a while, or at least the theory behind it. (I’m not a Bayesian so I’ve never actually used it.)

As I understand it, MRP takes the average over all subgroups (their grand mean) and moves the individual subgroup means toward that grand mean, with smaller subgroups getting moved more. You can see this in the main paper’s graphs, where low means go up and high means go down, especially on the left side (smaller n’s). The grand mean will be more precise and more accurate (due to something called superefficiency), while the individual subgroup means will be much more precise but can also be much more biased toward the grand mean. The rationale for using the biased means is that very small subgroups give you very little information beyond what the grand mean is already telling you, so you should probably just use the grand mean instead.
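In code, that kind of shrinkage looks roughly like this (a toy precision-weighted version with a made-up `prior_n` tuning constant; real MRP fits a full multilevel model rather than applying a fixed formula):

```python
# Sketch of partial pooling toward a grand mean: the smaller the subgroup,
# the harder it gets pulled. prior_n is an invented tuning constant.
def partial_pool(subgroup_mean, n, grand_mean, prior_n=20):
    """Precision-weighted compromise between subgroup mean and grand mean."""
    w = n / (n + prior_n)
    return w * subgroup_mean + (1 - w) * grand_mean

grand = 60.0
for n in (5, 20, 100):
    print(n, partial_pool(40.0, n, grand))
# n=5 lands near the grand mean (56.0); n=100 stays near the raw mean (~43.3)
```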

In my view, that’s an iffy rationale for using biased subgroup proficiency scores, though, which I think the authors should’ve emphasized more. (Maybe they’ll have to in the peer-reviewed version of the paper.) Normally, bias in individual means isn’t a big deal: we take for granted that, over the long run, upward bias will be balanced out by downward bias. But, for this method and this application, the bias won’t ever go away, at least not where it matters. If what we’re looking at is just the scores around the proficiency cutoff, that’s generally going to be near the bottom of the distribution, and means near the bottom will always go up. As a result, schools with “bad luck” (as the authors say) will be pulled above the cutoff where they belong, but so will schools with subgroups that are genuinely underperforming.

I have a paper under review that derives a method for correcting a similar problem for effect sizes—it moves individual estimates not toward a grand mean but toward the true mean, in a direction and distance determined by a measure of the data’s randomness.

I kinda see what Nelson is saying, but I still like the above-linked report because I think that in general it is better to work with regularized, partially-pooled estimates than with raw estimates, even if those raw estimates are adjusted for noise or multiple comparisons or whatever.

To help convey this, let me share a few thoughts regarding hierarchical modeling in this general context of comparing averages (in this case, from different schools, but similar issues arise in medicine, business, politics, etc.).

1. Many years ago, Rubin made the point that, when you start with a bunch of estimates and uncertainties, classical multiple comparisons adjustments effectively work by increasing the standard errors so that fewer comparisons are statistically significant, whereas Bayesian methods move the estimates around. Rubin’s point was that you can get the right level of uncertainty much more effectively by moving the intervals toward each other rather than by keeping their centers fixed and then making them wider. (I’m thinking now that a dynamic visualization would be helpful to make this clear.)

It’s funny because Bayesian estimates are often thought of as trading bias for variance, but in this case the Bayesian estimate is so direct, and it’s the multiple comparisons approaches that do the tradeoff, getting the desired level of statistical significance by effectively making all the intervals wider and thus weakening the claims that can be made from data. It’s kinda horrible that, under the classical approach, your inferences for particular groups and comparisons will on expectation get vaguer as you get data from more groups.
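A minimal sketch of that last point, with Bonferroni standing in for the classical adjustment: each interval’s width grows with the number of groups J, even though nothing about any single group has changed.

```python
# How much wider a Bonferroni-corrected 95% interval gets as the number of
# groups J grows: the per-comparison level 0.05/J needs a larger z quantile.
from statistics import NormalDist

alpha = 0.05
z_plain = NormalDist().inv_cdf(1 - alpha / 2)  # usual 95% interval, ~1.96 SEs
for J in (2, 10, 50):
    z_bonf = NormalDist().inv_cdf(1 - alpha / (2 * J))
    print(J, round(z_bonf / z_plain, 2))  # widening factor grows with J
```

With 50 groups each interval is roughly two-thirds wider than with one, which is the “inferences get vaguer as you get data from more groups” phenomenon.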

We explored this idea in our 2000 article, Type S error rates for classical and Bayesian single and multiple comparison procedures (see here for freely-available version) and more thoroughly in our 2011 article, Why we (usually) don’t have to worry about multiple comparisons. In particular, see the discussion on pages 196-197 of that latter paper (see here for freely-available version).

2. MRP, or multilevel modeling more generally, does not “move the individual subgroup means toward that grand mean.” It moves the error terms toward zero, which implies that it moves the local averages toward their predictions from the regression model. For example, if you’re predicting test scores given various school-level predictors, then multilevel modeling partially pools the individual school means toward the fitted model. It would not in general make sense to partially pool toward the grand mean—not in any sort of large study that includes all sorts of different schools. (Yes, in Rubin’s classic 8-schools study, the estimates were pooled toward the average, but these were 8 similar schools in suburban New Jersey, and there were no available school-level predictors to distinguish them.)
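A toy version of the distinction (invented numbers; real multilevel modeling estimates the regression and the pooling weight jointly rather than using a fixed formula): two schools with identical raw means get pulled in opposite directions, because their regression predictions differ.

```python
# Partial pooling toward a fitted model, not toward one grand mean:
# the school's error term is shrunk toward zero, so the estimate moves
# toward that school's own regression prediction. prior_n is invented.
def mlm_estimate(school_mean, n, predicted, prior_n=20):
    w = n / (n + prior_n)
    return w * school_mean + (1 - w) * predicted

# two schools with the same raw mean of 55 but different predictors
print(mlm_estimate(55.0, 10, predicted=70.0))  # pulled up, toward 70
print(mlm_estimate(55.0, 10, predicted=40.0))  # pulled down, toward 40
```

Pooling toward a single grand mean is the special case where the regression has no predictors, which is why it made sense for Rubin’s 8 similar schools but not for a large heterogeneous study.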

3. I agree with Nelson that it’s a mistake to summarize results using statistical significance, and this can lead to artifacts when comparing different models. There’s no good reason to make decisions based on whether a 95% interval includes zero.

4. I like multilevel models, but point estimates from any source—multilevel modeling or otherwise—have unavoidable problems when the goal is to convey uncertainty. See our 1999 article, All maps of parameter estimates are misleading.

In summary, I like the Forrow et al. article. The next step should be to go beyond point estimates and statistical significance and to think more carefully about decision making under uncertainty in this educational context.

Robert Thornett writes:

What if, for example, instead of spending months learning about derivatives, quadratic equations, and the interior angles of rhombuses, students learned how to interpret financial and medical reports and climate, demographic, and electoral statistics? They would graduate far better equipped to understand math in the real world and to use math to make important life decisions later on.

I agree. I mean, I can’t be sure; he’s making a causal claim for which there is no direct evidence. But it makes sense to me.

Just one thing. The “interior angles of rhombuses” thing is indeed kinda silly, but I think it would be awesome to have a geometry class where students learn to solve problems like: Here’s the size of a room, here’s the location of the doorway opening and the width of the hallway, here are the dimensions of a couch, now how do you manipulate the couch to get it from the hall through the door into the room, or give a proof that it can’t be done. That would be cool, and I guess it would motivate some geometrical understanding.

In real life, though, yeah, learning standard high school and college math is all about turning yourself into an algorithm for solving exam problems. If the problem looks like A, do X. If it looks like B, do Y, etc.

Lots of basic statistics teaching looks like that too, I’m afraid. But statistics has the advantage of being one step closer to application, which should help a bit.

Also, yeah, I think we can all agree that “derivatives, quadratic equations, and the interior angles of rhombuses” are important too. The argument is not that these should not be taught, just that these should not be the first things that are taught. Learn “how to interpret financial and medical reports and climate, demographic, and electoral statistics” first, then if you need further math courses, go on to the derivatives and quadratic equations.

We’ve reached the endpoint of our third seminar speaker competition. Top seeds J. R. R. Tolkien, Miles Davis, David Bowie, Dr. Seuss, Hammurabi, Judas, Martha Stewart, and Yo-Yo Ma fell by the wayside (indeed, Davis, Judas, and Ma didn’t even get to round 2!); unseeded heavyweight Isaac Newton lost in round 3; and dark-horse favorites James Naismith, Henry Winkler, Alison Bechdel, and J. Robert Lennon couldn’t make the finish line either.

What we have is two beloved and long-lived children’s book authors. Cleary was more prolific, but maybe only because she got started at a younger age. Impish Ramona or serious Laura . . . who’s it gonna be?

Either way, I assume it will go better than this, from a few years ago:

CALL FOR APPLICATIONS: LATOUR SEMINAR — DUE DATE AUGUST 11 (extended)

The Brown Institute for Media Innovation, Alliance (Columbia University, École Polytechnique, Sciences Po, and Panthéon-Sorbonne University), The Center for Science and Society, and The Faculty of Arts and Sciences are proud to present BRUNO LATOUR AT COLUMBIA UNIVERSITY, SEPTEMBER 22-25

You are invited to apply for a seminar led by Professor Bruno Latour on Tuesday, September 23, 12-3pm. Twenty-five graduate students from throughout the university will be selected to participate in this single seminar given by Prof. Latour. Students will organize themselves into a reading group to meet once or twice in early September for discussion of Prof. Latour’s work. They will then meet to continue this discussion with a small group of faculty on September 15, 12-2pm. Students and a few faculty will meet with Prof. Latour on September 23. A reading list will be distributed in advance. If you are interested in this 3-4 session seminar (attendance at all 3-4 sessions is mandatory), please send

Name:

Uni:

Your School:

Your Department:

Year you began your terminal degree at Columbia:

Thesis or Dissertation title or topic:

Name of main advisor:

In one short, concise paragraph tell us what major themes/keywords from Latour’s work are most relevant to your own work, and why you would benefit from this seminar. Please submit this information via the site

http://brown.submittable.com/submit

The due date for applications is August 11 and successful applicants will be notified in mid-August.

That was the only time I’ve heard of a speaker who’s so important that you have to apply to attend his seminar! And, don’t forget, “attendance at all 3-4 sessions is mandatory.” I wonder what they did to the students who showed up to the first two seminars but then skipped #3 and 4.

**Past matchup**

Wilder faced Sendak in the last semifinal. Dzhaughn wrote:

This will be a really tight match up.

Sendak has won the Laura Ingalls Wilder Award. Yet no one has won more Maurice Sendak Awards than Wilder. And she was dead when he won it.

Maurice Sendak paid for his college by working at FAO Schwarz. That’s Big, isn’t it?

The Anagram Department notices “Serial Lulling Award,” not a good sign for a seminar speaker. “American Dukes” and “Armenia Sucked” are hardly top notch, but less ominous.

So, I come up with a narrow edge to Sendak but I hope there is a better reason.

“Serial Lulling Award” . . . that is indeed concerning!

Raghu offers some thoughts, which, although useless for determining who to advance to the final round, are so much in the spirit of this competition that I’ll repeat them here:

This morning I finished my few-page-a-day reading of the biography of basketball inventor and first-round loser James Naismith, and I was struck again by how well-suited he is to this tournament:

“It was shortly after seven o’clock, and the meal was over. He added briskly, ‘Let me show you some of the statistics I’ve collected about accidents in sports. I’ve got them in my study.’ He started to rise from the table and fell back into his chair. Ann recognized the symptoms. A cerebral hemorrhage had struck her father.” — “The Basketball Man, James Naismith” by Bernice Larson Webb

Statistics! Sports! Medical inference!

I am not, however, suggesting that the rules be bent; I’ve had enough of Naismith.

I finished Sendak’s “Higglety Pigglety Pop! Or, There Must Be More to Life” — this only took me 15 minutes or so. It is surreal, amoral, and fascinating, and I should read more by Sendak. Wilder is neither surreal nor amoral, though as I think I noted before, when I was a kid I found descriptions of playing ball with pig bladders as bizarre as science fiction. I don’t know who that’s a vote for.

I find it hard to read a book a few pages a day. I can do it for a while, but at some point I either lose interest and stop, or I want to find out what happens next so I just finish the damn book.

Diana offers a linguistic argument:

Afterthought and correction: The “n” should be considered a nasal and not a liquid, so Laura Ingalls Wilder has five liquids, a nasal, a fricative, a glide, and two plosives, whereas Maurice Sendak has two nasals, a liquid, two fricatives, and two plosives (and, if you count his middle name, three nasals, three liquids, two fricatives, and four plosives). So Wilder’s name actually has the greater variety of consonants, given the glide, but in Sendak’s name the various kinds are better balanced and a little more spill-resistant.

OK, sippy cups. Not so relevant for a talk at Columbia, though, given that there will be very few toddlers in the audience.

Anon offers what might appear at first to be a killer argument:

If you look at the chart, you can pretty clearly notice that the bracket is only as wide as it is because of Laura Ingalls Wilder’s prodigious name. I’ve got to throw my hat in the ring for Sendak, simply for storage.

+1 for talking about storage—optimization isn’t just about CPU time!—but this length-of-name argument reeks of sexism. In a less traditional society, Laura wouldn’t have had to add the Wilder to her name, and plain old “Laura Ingalls,” that’s a mere 13 characters wide, and two of them are lower-case l’s, which take up very little space (cue Ramanujan here). Alison Bechdel’s out of the competition now, but she’s still looking over my shoulder, as it were, scanning for this sort of bias.

And Ben offers a positive case for the pioneer girl:

There’s some sort of libertarian angle with Wilder though right?

What if we told Wilder about bitcoin and defi and whatnot? Surely that qualifies as surreal and amoral in the most entertaining kind of way. I know talking about these things in any context is a bit played out at this point but c’mon. This isn’t some tired old celebrity we’re selling here! This is author of an American classic, from the grave — any way she hits that ball is gonna be funny.

Sounds good to me!

Yesterday we wrote about Smith et al.’s amazing set of 1.5 tiles that cover the plane aperiodically:

And this reminds me of Munroe’s approximate map of the world’s land masses using a single tile:

Perhaps there is some mathematical connection between the two.

Z in comments points to a new discovery by David Smith, Joseph Samuel Myers, Craig S. Kaplan, and Chaim Goodman-Strauss, who write:

An aperiodic monotile . . . is a shape that tiles the plane, but never periodically. In this paper we present the first true aperiodic monotile, a shape that forces aperiodicity through geometry alone, with no additional constraints applied via matching conditions. We prove that this shape, a polykite that we call “the hat”, must assemble into tilings based on a substitution system.

All I can say is . . . wow. (That is, assuming the result is correct. I have no reason to think it’s not; I just haven’t tried to check it myself.)

First off, this is just amazing. Even more amazing is that I had no idea that this was even an open problem. I’d seen the Penrose two-shape tiling pattern years ago and loved it so much that I painted a tabletop with it (and sent a photo of the table to Penrose himself, who replied with a nice little note, which unfortunately I lost some years ago, or I’d reproduce it here), and it never even occurred to me to ask whether an aperiodic monotile was possible.

This is the biggest news of 2023 so far (again, conditional on the result being correct), and I doubt anything bigger will happen between now and the end of December.

OK, there’s one possibility . . .

Penrose did it with 2 unique tiles, Smith et al. just needed 1, . . . The next frontier in aperiodic tiling is to do it with 0. Whoever does that will be the real genius.

As part of a discussion about research retractions, I remarked that I couldn’t care less about the twin primes conjecture.

This got some reactions in comments! Dmitri wrote:

I think it’s refreshing that Andrew doesn’t care about the twin primes conjecture. After thinking about it for a few seconds, I realized that I also don’t care about the twin primes conjecture.

It’s kind of interesting to think about what sorts of unanswered questions you actually care about. “Is there life on other planets?” Definitely. “What does Quantum Mechanics mean?” Totally. Twin primes, meh …

From the other direction, Ethan Bolker and Larry Gonick were disappointed, with Ethan writing, “Andrew did mathematics before he did what he does now and I thought some of that curiosity would remain.” Adede followed up with, “I find it interesting that someone can care about whether sqrt(2) is a normal number but not care about the twin primes conjecture. It can’t be a pure vs applied thing, both of them seem equally devoid of real-world applications (unless I am missing something).”

OK, so where are we?

First, I’m a big fan of the Cartoon Guide to Statistics, so if Larry Gonick is disappointed in me, that makes me sad and it motivates me to try to explain myself. Second, hey Ethan, I still have curiosity about mathematics, just not about the twin prime conjecture! For example, as Adede notes, I’m curious about the distribution of 0’s and 1’s in the binary expansion of the square root of 2, and that’s pure math with no relevant applications that I know of.

So here’s the question: Why do I care about the distribution of the digits of sqrt(2) but not twin primes?

I’m not really sure, but here are some guesses:

1. The distribution of the digits of sqrt(2) has a probability and statistics flavor; it’s a search for randomness. I’m interested in randomness.
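The question is also easy to poke at numerically, though of course a finite count proves nothing about normality:

```python
# Count the ones among the first ~10,000 binary digits of sqrt(2).
# Normality of sqrt(2) is an open problem; this is just an empirical peek.
from math import isqrt

bits = 10000
# isqrt(2 * 4**bits) = floor(sqrt(2) * 2**bits), which carries the first
# `bits` binary digits of sqrt(2) after the leading 1.
digits = bin(isqrt(2 << (2 * bits)))[2:]
ones = digits.count("1")
print(len(digits), ones / len(digits))  # the fraction of ones is close to 1/2
```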

2. Back when I was in high school and did math team and math olympiad training, there were two subjects that were waaay overrated, to my taste: number theory and classical non-analytic geometry. We got so much propaganda for these subjects that I grew to hate them. A certain amount of number theory is necessary—factorization, things like that—and, yeah, I get that there are deep connections to group theory and other important topics, as well as connections to analysis. I’m glad that somewhere there are people working on the Riemann hypothesis, etc. But the twin primes conjecture, the 3n+1 problem, etc.: I get that they’re challenging, but they’ve never really engaged me.

Explanation #1 can’t be the whole story, because I also find questions about tilings to be interesting, even when no randomness is involved. And explanation #2 isn’t the whole story either. So I don’t really know. Maybe the best answer is that my understanding of mathematics is sufficient for me to understand lots of things in statistics but is not deep enough for me to have any real sense of what makes these particular problems difficult, and so my finding one or another of these problems “intriguing” or “boring” is just an idiosyncratic product of my personal history with no larger meaning.

To put it another way, when I tell you that the Fieller-Creasy problem is fundamentally uninteresting or that the so-called Fisher exact test is a bad idea or that Bayes factors typically don’t do what people want them to do, I’m saying these things for good reasons. You might disagree with me, and maybe I’m wrong and you’re right, but I have serious, explainable reasons for these views of mine. They’re not just matters of taste.

But when I say I care about the distribution of the digits of the square root of 2 but not about the twin primes conjecture, that’s just some uninformed attitude for which I’m not claiming any reasonable basis.

Wesley Tansey writes:

This is no doubt something we both can agree is a sad and wrongheaded use of statistics, namely incredible reliance on null hypothesis significance testing. Here’s an example:

Phase III trial. Failed because their primary endpoint had a p-value of 0.053 instead of 0.05. Here’s the important actual outcome data though:

For the primary efficacy endpoint, INV-PFS, there was no significant difference in PFS between arms, with 243 (84%) of events having occurred (stratified HR, 0.77; 95% CI: 0.59, 1.00; P = 0.053; Fig. 2a and Table 2). The median PFS was 4.5 months (95% CI: 3.9, 5.6) for the atezolizumab arm and 4.3 months (95% CI: 4.2, 5.5) for the chemotherapy arm. The PFS rate was 24% (95% CI: 17, 31) in the atezolizumab arm versus 7% (95% CI: 2, 11; descriptive P < 0.0001) in the chemotherapy arm at 12 months and 14% (95% CI: 7, 21) versus 1% (95% CI: 0, 4; descriptive P = 0.0006), respectively, at 18 months (Fig. 2a). As the INV-PFS did not cross the 0.05 significance boundary, secondary endpoints were not formally tested.

The odds of atezolizumab being better than chemo are clearly high. Yet this entire article is being written as the treatment failing simply because the p-value was 0.003 too high.
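A back-of-envelope version of that claim, treating the reported log hazard ratio as approximately normal and using a flat prior (my calculation from the published CI, not anything in the paper):

```python
# From the reported stratified HR 0.77 (95% CI 0.59, 1.00), recover the
# standard error on the log scale and compute the approximate posterior
# probability that the true HR is below 1 (normal approximation, flat prior).
from math import log
from statistics import NormalDist

est = log(0.77)
se = (log(1.00) - log(0.59)) / (2 * 1.96)  # CI spans +/- 1.96 SEs on log scale
p_benefit = NormalDist().cdf((0 - est) / se)
print(round(p_benefit, 3))  # roughly 0.97: a "failed" trial with high odds of benefit
```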

He adds:

And these confidence intervals are based on proportional hazards assumptions. But this is an immunotherapy trial where we have good evidence that these trials violate the PH assumption. Basically, you get toxicity early on with immunotherapy, but patients that survive that have a much better outcome down the road. Same story here; see figure below. Early on the immunotherapy patients are doing a little worse than the chemo patients but the long-term survival is much better.
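A toy illustration of that pattern (all rates invented, not the trial’s data): give the treatment arm a higher hazard early and a much lower hazard late, and its survival curve crosses the comparator’s, violating proportional hazards.

```python
# Piecewise-exponential survival curves: the "immunotherapy" arm has extra
# hazard before month 6 and much lower hazard afterward, so the curves cross.
from math import exp

def chemo_surv(t):
    return exp(-0.20 * t)  # constant hazard

def immuno_surv(t, change=6.0):
    if t <= change:
        return exp(-0.25 * t)  # higher hazard early (toxicity)
    return exp(-0.25 * change - 0.05 * (t - change))  # much lower hazard late

early_diff = immuno_surv(3) - chemo_surv(3)
late_diff = immuno_surv(18) - chemo_surv(18)
print(round(early_diff, 3), round(late_diff, 3))  # negative early, positive late
```

No single hazard ratio summarizes these two curves, which is the problem with the proportional-hazards summary in this setting.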

As usual, our recommended solution for the first problem is to acknowledge uncertainty and our recommended solution for the second problem is to expand the model, at the very least by adding an interaction.

Regarding acknowledging uncertainty: Yes, at some point decisions need to be made about choosing treatments for individual patients and making general clinical recommendations—but it’s a mistake to “prematurely collapse the wave function” here. This is a research paper on the effectiveness of the treatment, not a decision-making effort. Keep the uncertainty there; you’re not doing us any favors by acting as if you have certainty when you don’t.

This is Jessica. I remember once hearing one of my colleagues who is also a professor talking about the express train that runs through much of Chicago up to Northwestern campus. He said, “The purple line is fantastic. I get on in the morning, always get a seat and I can get research done. Then I get to campus, and all research ceases for 8 hours. But I get back on the train and I’m right back to doing research!”

It is no joke that the more senior you get in academia, the less time you get to do the things that made you choose that career in the first place. But the topic of this post is a different sort of irony. Right now it’s deadline time for my lab, when many of the PhD students are preparing papers for the big conference in our field. It’s a very “researchy” time. What is surprising is how easy it is to be surrounded by people doing research and not feel like there is much actual new knowledge or understanding happening.

There is a David Blackwell quote that I have come to really like:

I’m not interested in doing research and I never have been, I’m interested in understanding, which is quite a different thing.

Andrew has previously commented on this quote, implying that this may have been true at Blackwell’s time, but things have since shifted and understanding is now recognized as a valuable part of research. But I tend to think that Blackwell’s sentiment is still very much relevant.

For example, when I think about what most people would call “my research,” I think of papers I’ve published that propose or evaluate visualization techniques or other interactive tools we create. But I don’t necessarily associate most of this work with “understanding.” On some level we find things out, but it’s very easy to present some stuff you learned in a paper without it ever actually challenging anything we already know. It’s framed as brand new information but usually it’s actually 99% old information in the form of premises and assumptions with a tiny new bit of something. It might not actually answer any of the questions that get you out of bed in the morning. I think most researchers would relate to feeling like this at least sometimes.

Pursuing understanding is why I like my job. I think of it as tied to the questions that I am chewing on but can’t yet fully answer, because the answer is going to be complicated, connecting to many other things I’ve thought about in the past but without the derivation chain being totally clear. Maybe it even contradicts things I’ve thought or said in the past. On some level I think of understanding as dynamic, about a shift in perspective. All this makes it hard to circumscribe with linguistic boundaries. I find it’s more natural to express understanding in questions rather than answers.

The problem is that questions don’t make for a good paper unless they can be answered with some satisfaction. As soon as you plan the thing that will fit nicely into the 10-15 page article, with a concise introduction, related work section, and a description of the methods and results, you probably have left behind the understanding. You are instead in the realm of “Making Statements Whose Assumptions and Implications More or Less Follow from One Another and are the Right Scope for a Research Article.” Your task becomes connecting the dots, e.g., making clear there’s motivating logic running from the data collection to the definitions or estimators to the inferences you draw in the end. This is of course usually already established by the time you write the paper, but it can still take a long time to write it all out, and hopefully you don’t discover an error in your logic, because then it’s even harder to make the pieces fit and you have to figure out how to talk about that.

But it’s the understanding that is the source of actual new information, in contrast to the veneer of new knowledge we usually get with a paper. I used to think that even though it was hard to really explore a problem in a single paper, the real learning or understanding would manifest through bodies of work. Like if you look at my papers over the last ten years, you can see what I’ve come to understand. But I don’t think that’s quite accurate. Certainly there is some knowledge accrual and some influence of what I’ve said in past papers on how I see the world now. But I would say the knowledge I’m most interested in, or most proud of having gained, is not well represented in the papers. It’s more about what intuitions I’ve developed over time, about things like what’s hard about studying behavior under uncertainty, what’s actually an important problem or an unanswered question when it comes to learning from data in different scenarios, what’s misleading or wrong in the way things get portrayed in the literature in my field, etc.

The conflict arises because understanding doesn’t care about connecting the dots. It happens in a realm where it’s well understood that the dots have only a tenuous relationship to the truth status of whatever claims you want to make. But it’s hard to write papers in that world. Strong assertions seem out of place.

Maybe this is why Blackwell’s papers tended to be short.

It’s worth asking whether one can reach understanding without going through the motions of doing the research. I’m not sure. I think there’s value in attempting to take things seriously and make moderately simple statements about them of the type that can be put in a research paper. But then again something like blogging can have the same effect.

On the bright side, if you can find a way to write a paper that you really believe in, then once you put the paper out there, you might get some critical feedback. And maybe then understanding enters the equation, because the critique jars your thinking enough to help you see beyond your old premises. But at least for me this is not the norm. I like getting critical feedback, but even when the paper is about something I’m still in the midst of trying to understand, often by the time things have been published and presented at some conference and the right people see it and weigh in, I’ve already reached some conclusions about the limitations of those ideas and moved on. For this reason it has always driven me crazy when people associate my current interests with things I’ve published a couple years ago.

In terms of shifting the balance toward more understanding, being intentional about publishing fewer papers and being pickier about what problems you take on should help. There are other possibilities I’ve posted about in the past, like trying to normalize scientists admitting, in talks and in the papers themselves, what they don’t know or when they have doubts about their own work. We could also do more pointing out of assertions and claims to generalization that aren’t warranted, even if the work is already published and it makes the authors uncomfortable, because that reinforces the idea that we do research because we actually care about getting the understanding right, not just because we like clever ideas.

P.S. Probably the title should have been, Understanding is everywhere, even, on rare occasions, in boxes labeled research. But I like the recursion!

This year, we’re bringing back StanCon in person!

StanCon is an opportunity for members of the broader Stan community to come together and discuss applications of Stan, recent developments in Bayesian modeling, and (most importantly perhaps) unsolved problems. The conference attracts field practitioners, software developers, and researchers working on methods and theory. This year’s conference will take place on June 20 – 23 at Washington University in St Louis, Missouri.

The keynote speakers are:

- Bob Carpenter (Flatiron Institute)
- John Kruschke (Indiana University)
- Mariel Finucane (Mathematica Policy Research)
- Siddhartha Chib (Washington University in St. Louis)

Proposals for talks, sessions and tutorials are due on March 31st (though it looks like we’ll be able to extend the deadline). Posters are accepted on a rolling basis. From the website:

We are interested in a broad range of topics relevant to the Stan community, including:

- Applications of Bayesian statistics using Stan in all domains
- Software development to support or complement the Stan ecosystem
- Methods for Bayesian modeling, relevant to a broad range of users
- Theoretical insights on common Bayesian methods and models
- Visualization techniques
- Tools for teaching Bayesian modeling
Keep in mind that StanCon brings together a diverse audience. Material which focuses on an application should introduce the problem to non-field experts; theoretical insights should be linked to problems modelers are working on, etc.

A frequent correspondent sent along a link to a recently published research article and writes:

I saw this paper on a social media site and it seems relevant given your post on the relative importance of social science research. At first, I thought it was an ingenious natural experiment, but the more I looked at it, the more questions I had. They sure put a lot of work into this, though, evidence of the subject’s importance.

I’m actually not sure how bad the work is, given that I haven’t spent much time with it. But the p-values are a bit overdone (understatement there). And, for all the p-values they provide, I thought it was interesting that they never mention the R-squared from any of the models. I appreciate how little information the R-squared would provide, but I am always interested to know if it is 0.05 or 0.70. Not a mention. They do, however, find fairly large effects – a bit too large to be believable, I think.

I didn’t have time to look into this one so I won’t actually link to the linked paper; instead I’ll give some general reactions.

There’s something about that sort of study that rubs me the wrong way and gives me skepticism, but, as my correspondent says, the topic is important so it makes sense to study it. My usual reaction to such studies is that I want to see the trail of breadcrumbs, starting from time series plots of local and aggregate data and leading to the conclusions. Just seeing the regression results isn’t enough for me, no matter how many robustness studies are attached to it. Again, this does not mean that the conclusions are wrong or even that there’s anything wrong with what the researchers are doing; I just think that the intermediate steps are required to be able to make sense of this sort of analysis of limited historical data.

OK, two more children’s book authors. Both have been through a lot. Laura defeated cool person Banksy, lawgiver Steve Stigler, person known by initials Malcolm X, and then scored a come-from-behind victory against lawgiver Alison Bechdel. Meanwhile, Maurice dethroned alleged tax cheat Martha Stewart, namesake Steve McQueen, and fellow children’s book author Margaret Wise Brown.

Who’s it gonna be? I’d say Maurice because he’s an illustrator as well as a writer. On the other hand, Laura’s books have a lot more content than Maurice’s; also, as a political scientist, I appreciate the story of how Laura rewrote some of her life history to be more consistent with her co-author daughter’s political ideology.

Both authors are wilderness-friendly!

**Past matchup**

Raghu suggests we should sit here for the present.

Dzhaughn writes:

I have had the Cleary image of Ramona sitting in a basement taking one bite out of every apple for more than 90% of my life.

But Diana counters:

I don’t wanna go down to the basement.

Moving away from Ramona for a moment, Pedro writes:

A little bit of Googling reveals that Shakira once starred in a soap opera (telenovela) in her teen years. Apparently embarrassed, she ended up buying the rights to the soap, and now it’s no longer available in any legal way.

Although I’m very sympathetic towards her actions and feelings, this blog is very pro-open science and sharing data and her actions are as against that as possible…

Good point! Cleary is very open, as you can see if you read her two volumes of autobiography. Maybe if she comes to speak, we’ll hear some excerpts from volume 3?

**1. Background: Comparing a graph of data to hypothetical replications under permutation**

Last year, we had a post, I’m skeptical of that claim that “Cash Aid to Poor Mothers Increases Brain Activity in Babies”, discussing recently published “estimates of the causal impact of a poverty reduction intervention on brain activity in the first year of life.”

Here was the key figure in the published article:

As I wrote at the time, the preregistered plan was to look at both absolute and relative measures on alpha, gamma, and theta (beta was only included later; it was not in the preregistration). All the differences go in the right direction; on the other hand when you look at the six preregistered comparisons, the best p-value was 0.04 . . . after adjustment it becomes 0.12 . . . Anyway, my point here is not to say that there’s no finding just because there’s no statistical significance; there’s just a lot of uncertainty. The above image looks convincing but part of that is coming from the fact that the responses at neighboring frequencies are highly correlated.

To get a sense of uncertainty and variation, I re-did the above graph, randomly permuting the treatment assignments for the 435 babies in the study. Here are 9 random instances:
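The permutation idea is simple enough to sketch in a few lines. Here is a minimal Python version using simulated data in place of the study’s EEG measures; the 435 babies match the study, but the frequency grid, the roughly even treatment/control split, and all the numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2023)

# Simulated stand-in for the study's data: EEG power for each of
# 435 babies at each of 50 frequencies, plus treatment labels.
n_babies, n_freqs = 435, 50
power = rng.normal(size=(n_babies, n_freqs))
treatment = np.repeat([0, 1], [218, 217])

def group_difference(power, labels):
    """Treatment-minus-control mean power at each frequency."""
    return power[labels == 1].mean(axis=0) - power[labels == 0].mean(axis=0)

# Each random permutation of the labels gives one "null" version of
# the published figure: what the comparison looks like with no effect.
null_curves = [
    group_difference(power, rng.permutation(treatment)) for _ in range(9)
]
```

Plotting these nine curves alongside the real figure is exactly the kind of display shown above. With real EEG data, correlation across neighboring frequencies would make even these pure-noise curves look smooth and pattern-like.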

**2. Planning an experiment**

Greg Duncan, one of the authors of the article in question, followed up:

We almost asked students in our classes to guess which of ~15 EEG patterns best conformed to our general hypothesis of negative impacts for lower frequency bands and positive impacts for higher-frequency bands. One of the graphs would be the real one and the others would be generated randomly in the same manner as in your blog post about our article. I had suggested that we wait until we could generate age and baseline-covariate-adjusted versions of those graphs . . . I am still very interested in this novel way of “testing” data fit with hypotheses — even with the unadjusted data — so if you can send some version of the ~15 graphs then I will go ahead with trying it out on students here at UCI.

I sent Duncan some R code and some graphs, and he replied that he’d try it out. But first he wrote:

Suppose we generate 14 random + 1 actual graphs; recruit, say, 200 undergraduates and graduate students; describe the hypothesis (“less low-frequency power and more high-frequency power in the treatment group relative to the control group”); and ask them to identify their top and second choices for the graphs that appear to conform most closely with the hypothesis. I would also have them write a few sentences justifying their responses in order to coax them to take the exercise seriously.

The question: how would you judge whether the responses convincingly favored the actual data? More than x% first-place votes; more than y% first or second place votes? Most votes? It would be good to pre-specify some criteria like that.
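One natural way to pre-specify a criterion, borrowed from the visual-inference literature, is to treat each student’s first choice as a uniform draw over the 16 panels under the null that the real graph is indistinguishable from the fakes, and use a binomial tail probability. Here is a Python sketch; the big caveat is that it assumes viewers are independent, which is optimistic in a classroom:

```python
from math import comb

def lineup_pvalue(k_correct, n_viewers, n_panels=16):
    """P(k_correct or more of n_viewers pick the true panel) when every
    viewer guesses uniformly at random among n_panels -- the null
    hypothesis that the real graph is indistinguishable from noise.
    Assumes independent viewers, which is optimistic in practice."""
    p = 1.0 / n_panels
    return sum(
        comb(n_viewers, k) * p**k * (1.0 - p) ** (n_viewers - k)
        for k in range(k_correct, n_viewers + 1)
    )
```

With 200 students, one could pre-specify rejecting the null if the first-place vote count x gives lineup_pvalue(x, 200) below some threshold; for first-or-second-place votes, replace p = 1/16 with 2/16.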

I replied that I’m not sure the results would be definitive, but I guess it would be interesting to see what happens.

Duncan responded:

I agree that the results are merely useful but not definitive.

I agree, and Drew Bailey, who was also involved in the discussion, added:

The earlier blog post used these graphs to show that the data, if manipulated with randomly-generated treatment dummies, produced an uncomfortable number of false positives. This new exercise would inform that intuition, even if we want to rely on formal statistics for the most systematic assessment of how confident we should be with the results.

**3. Experimental conditions**

Duncan was then ready to go. He wrote:

I am finally ready to test randomly generated graphs out on a large classroom of undergraduate students.

Paul Yoo used Stata to generate 15 random graphs plus the real one (see attached). The position of the real PNAS graph among the 16 (10th) was determined by a random number draw. (We could randomize its position across students, but that would complicate the scoring considerably.) We put an edited version of the hypothesis that was preregistered/spelled out in our original NICHD R01 proposal below the graphs. My plan is to ask class members to select their first and second choices for the graph that conforms most closely to the hypothesis.

Bailey responded:

Yes, with the same caveat as before (namely, that the paths have already forked: we aren’t looking at a plot of frequency distributions for one of the many other preregistered outcomes in part because these impacts didn’t wind up on Andrew’s blog).

**4. Results**

Duncan reported:

97 students examined the 16 graphs shown in the 4th slide in the attached powerpoint file. The earlier slides set up the exercise and the hypothesis.

Almost 2/3rds chose the right figure (#10) on their first guess and 78% did so on their first or second guesses. Most of the other guesses are for figures that show more treatment-group power in the beta and gamma ranges but not alpha.

**5. Discussion**

I’m not quite sure what to make of this. It’s interesting and I think useful to run such experiments to help stimulate our thinking.

This is all related to the 2009 paper, Statistical inference for exploratory data analysis and model diagnostics, by Andreas Buja, Dianne Cook, Heike Hofmann, Michael Lawrence, Eun-Kyung Lee, Deborah Swayne, and Hadley Wickham.

As with hypothesis tests in general, I think the value of this sort of test is when it does not reject the null hypothesis, which represents a sort of negative signal that we don’t have enough data to learn more on the topic.

The thing is, I’m not clear what to make of the result that almost 2/3rds chose the right figure (#10) on their first guess and 78% did so on their first or second guesses. On one hand, this is a lot better than the 1/16 and 1/8 we would expect by pure chance. On the other hand, the fact that some of the alternatives were similar to the real data . . . this is all getting me confused! I wonder what Buja, Cook, etc., would say about this example.

(this post is by Charles)

Last week, BayesComp 2023 took place in Levi, Finland. The conference covered a broad range of topics in Bayesian computation, with many high quality sessions, talks, and posters. Here’s a link to the talk abstracts. I presented two posters at the event. The first poster was on assessing the convergence of MCMC in the many-short-chains regime. I already blogged about this research (link): here’s the poster and the corresponding preprint.

The second poster was also on the topic of running many chains in parallel, but in the context of ODE-based models. This was the outcome of a project led by Stanislas du Ché during his summer internship at Columbia University. We examined several pharmacometrics models, with likelihoods parameterized by the solution to an ODE. Having to solve an ODE inside a Bayesian model is challenging because the behavior of the ODE can change as the Markov chains journey across the parameter space. An ODE which is easy to solve at some point can be incredibly difficult somewhere else. In the past, we analyzed this issue in the illustrative planetary motion example (Gelman et al (2020), Section 11). This is the type of problem where we need to be careful about how we initialize our Markov chains and *we should not rely on Stan’s defaults*. Indeed, these defaults can start you in regions where your ODE is nearly impossible to solve and completely kill your computation! A popular heuristic is to draw the initial point from the prior distribution. On a related note, we need to construct priors carefully to exclude patently absurd parameter values and (hopefully) parameter values prone to frustrate our ODE solvers.

Even then—and especially if our priors are weakly informative—our Markov chains will likely journey through challenging regions. A common manifestation of this problem is that some chains lag behind because their random trajectories take them through areas that frustrate the ODE solver. Stanislas observed that this problem becomes more acute when we run many chains. Indeed, as we increase the number of chains, the probability that at least some of the chains get “stuck” increases. Then, even when running chains in parallel, the efficiency of MCMC as measured by effective sample size per second (ESS/s) eventually goes down as we add more chains *because we are waiting for the slowest chain to finish*!
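The punchline of that last sentence can be illustrated with a toy simulation. This Python sketch is not from our experiments; it just assumes hypothetical heavy-tailed (Pareto) chain runtimes, a crude stand-in for the occasional chain that gets stuck in a region where the ODE is hard to solve:

```python
import numpy as np

rng = np.random.default_rng(1)

def ess_per_second(n_chains, ess_per_chain=100.0, n_sims=2000):
    """Toy model: chain runtimes are iid heavy-tailed (Pareto with
    shape 0.8), wall time is the slowest chain, and total ESS scales
    with the number of chains. Returns average ESS per unit wall time."""
    runtimes = 1.0 + rng.pareto(0.8, size=(n_sims, n_chains))
    wall = runtimes.max(axis=1)  # we must wait for the slowest chain
    return float(np.mean(n_chains * ess_per_chain / wall))

# Efficiency eventually *falls* as we add chains under this model.
rates = {m: ess_per_second(m) for m in (4, 64, 1024)}
```

With a light-tailed runtime distribution the slowest chain doesn’t lag by much and adding chains keeps helping; the decline here is driven entirely by the heavy tail, which is the “some chains get stuck” story above.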

Ok. Well, we don’t want to be punished for throwing more computation at our problem. *What if we instead waited for the fastest chains to finish?* This is what Stanislas studied by proposing a strategy where we stop the analysis after a certain ESS is achieved, even if some chains are still warming up. An important question is what bias does dropping chains introduce? One concern is that the fastest chains are biased because they fail to explore a region of the parameter space which contains a non-negligible amount of probability mass and where the ODE happens to be more difficult to solve. Stanislas tried to address this problem using stacking (Yao et al 2018), a strategy designed to correct for biased Markov chains. But stacking still assumes all the chains somehow “cover” the region where the probability mass concentrates and, when properly weighted, produce unbiased Monte Carlo estimators.

We may also wonder about the behavior of the slow chains. If the slow chains are close to stationarity, then by excluding them we are throwing away samples that would reduce the variance of our Monte Carlo estimators; however, it’s not worth waiting for these chains to finish if we’ve already achieved the desired precision. What’s more, as Andrew Gelman pointed out to me, slow chains can often be biased, for example if they get stuck in a pathological region during warmup and never escape it, as was the case in the planetary motion example. But we can’t expect this to always be the case.

In summary, I like the idea of waiting only for the fastest chains and I think understanding how to do this in a robust manner remains an open question. This work posed the problem and took steps in the right direction. There was a lot of traffic at the poster and I was pleased to see many people at the conference working on ODE-based models.

A commenter points us to this juicy story:

John Glenn, huh? I had no idea. I guess it makes sense, though: after the whole astronaut thing ended, dude basically spent the last few decades of his life hanging out with rich people.

Following the link:

Two tiers will be available: the gold collectible, which is unique and grants the buyer the right to co-host the calls with Pinker, will be priced at $50,000; the standard collectibles, which are limited to 30 items and grant the buyers the right to access those video calls and ask questions to Pinker at the end, will be priced at 0.2 Ethereum (~$300).

Here’s the thing. Pinker’s selling collectibles of his idea, “Free speech is fundamental.” But we know from some very solid research that scientific citations are worth $100,000 each.

So does that mean that Pinker’s famous idea that “Free speech is fundamental” is only worth, at best, 0.5 citations? That doesn’t seem fair at all. Pinker’s being seriously ripped off here.

On the other hand, he could also sell collectibles for some of his other ideas, such as, “Did the crime rate go down in the 1990s because two decades earlier poor women aborted children who would have been prone to violence?”, “Are suicide terrorists well-educated, mentally healthy and morally driven?”, “Do African-American men have higher levels of testosterone, on average, than white men?”, or, my personal favorite, “Do parents have any effect on the character or intelligence of their children?” 50 thousand here, 50 thousand there, pretty soon you’re talking about real money.

All joking aside, I don’t see anything wrong with Pinker doing this. The NFT is a silly gimmick, sure, but what he’s really doing is coming up with a clever way to raise money for his research projects. If I had a way to get $50,000 donations, I’d do it too. It’s hard to believe that anyone buying the “NFT” is thinking that they’re getting their hands on a valuable, appreciating asset. It’s just a way for them to support Pinker’s professional work. One reason this topic interests me is that we’re always on the lookout for new sources of research funds. (We’ve talked about putting ads on the blog, but it seems like the amount of $ we’d end up getting for it would be not worth all the hassle involved in having ads.) As is often the case with humor, we laugh because we care.

And why is this particular story so funny? Maybe because it seems so time-bound, kind of as if someone were selling custom disco balls in the 1970s, or something like that. And he’s doing it with such a straight face (“* * * NOW LIVE . . . My first digital collectible . . .”)! If you’re gonna do it at all, you go all in, I guess.

**P.S.** Following the links on the above twitter feed led me to this website of McGill University’s Office for Science and Society, whose slogan is, “Separating Sense from Nonsense.” How cool is that?

What a great idea! I wonder how they fund it. They should have similar offices at Ohio State, Cornell, Harvard (also here), the University of California, Columbia, etc etc etc.

Allister Bernard writes:

I recently came across some research on generalization error and deep learning (references below). These papers explore how generalization error improves in Deep Neural Networks by increasing model capacity and is contrary to what one would assume with the bias-variance tradeoff. I assumed this improvement with such overparameterized models was the effect of regularization (implicit and/or explicit) in these models. However, Zhang et al. show that regularization is highly unlikely to be the source of these gains.

References:

Zhang et al. https://cacm.acm.org/magazines/2021/3/250713-understanding-deep-learning-still-requires-rethinking-generalization/fulltext

Nakkiran et al. https://openai.com/blog/deep-double-descent/

Belkin et al. https://www.pnas.org/content/pnas/116/32/15849.full.pdf

Your note on the most important statistical ideas of the past 50 years highlights the gains achieved with overparameterized models (and regularization). It has worried me that all the hype around deep learning seemed to gloss over how overparameterized the models have become. Now this is not to diminish the gains these models have made in a number of fields, especially image recognition and NLP. I do not want to minimize these achievements as they are truly wonderful.

Here are my two questions:

1. I am curious if there is any work from the statistics community, on why we see this improvement in generalization error? Most of the research I have seen is from the ML/CS community. Belkin et al. point out this behavior is observed in other types of overparameterized models like random forests.

Another possible explanation is that these improvements may be dependent on the problem domain. Feldman et al. propose a possible reason behind this phenomenon (https://arxiv.org/pdf/2008.03703.pdf).

2. Your blog has highlighted the dangers of the garden of forking paths, and I am curious whether we may have another similar phenomenon here that is not well understood?

From a practical perspective, I wonder if a lot of these tools may get applied to domains where they are not applicable and end up having effects in the real world (as opposed to the theoretical world). There is currently no reason not to do so as we don’t understand where these ideas will/will not work. Besides, it is now very easy to use some of these tools via off the shelf packages.

My reply:

I took a look at the first paper linked above, and I don’t quite get what they are doing. In particular, they say, “Conventional wisdom attributes small generalization error either to properties of the model family or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice,” but then they define regularization as follows: “When the complexity of a model is very high, regularization introduces algorithmic tweaks intended to reward models of lower complexity. Regularization is a popular technique to make optimization problems ‘well posed’: when an infinite number of solutions agree with the data, regularization breaks ties in favor of the solution with lowest complexity.” This is not the regularization that I do at all! When I talk about regularization, I’m thinking about partial pooling, or more generally approaches to get more stable predictions. From a Bayesian perspective, regularization is not “algorithmic tweaks”; it’s just inference under a model. Also, regularization is not just about “breaking ties,” which implies some sort of lexicographic decision rule, nor does regularization necessarily lead to estimates with less complexity. It leads to estimates that are less variable, but that’s something different. For example, hierarchical modeling is not less complex (let alone “with lowest complexity”) than least squares, but it gives more stable predictions.
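To make concrete the kind of regularization I mean, here is a toy Python example of partial pooling of group means, with the variance components assumed known for simplicity; it’s the textbook shrinkage formula, not anyone’s production method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: J groups with true means theta_j and a few noisy
# observations per group. The unpooled estimate is each group's raw
# mean; partial pooling shrinks it toward the grand mean by a factor
# determined by the (here assumed known) variance components.
J, n_per_group = 50, 5
tau, sigma = 1.0, 3.0                      # between- and within-group sd
theta = rng.normal(0.0, tau, size=J)       # true group means
y = theta[:, None] + rng.normal(0.0, sigma, size=(J, n_per_group))

raw = y.mean(axis=1)                       # unpooled ("least squares")
pool_factor = (sigma**2 / n_per_group) / (sigma**2 / n_per_group + tau**2)
partial = raw + pool_factor * (raw.mean() - raw)  # shrink toward grand mean

# Same number of parameters either way, but the pooled estimates
# should track the true means more closely.
err_raw = float(np.mean((raw - theta) ** 2))
err_partial = float(np.mean((partial - theta) ** 2))
```

The partially pooled estimates are not “lower complexity” than the raw means (it’s the same number of parameters); they are just less variable, which is exactly the distinction being drawn here.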

That said, my above comment is expressed in general terms, and I’m no expert on deep learning or various other machine learning techniques. I’m sympathetic with the general idea of comparing success with training and test data, and I also recognize the challenge of these evaluations, given that cross-validation tests are themselves a function of the available data.

One thing I’ve been thinking about a lot in recent years is poststratification: the idea that you’re fitting a model on data set A and then using it to make predictions in scenario B. The most important concern here might not be overfitting to the data, so much as appropriately modeling the differences between A and B.
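A minimal sketch of that idea, with invented cell names and numbers: fit cell means on sample A, then average them using target population B’s demographic mix rather than the sample’s.

```python
# Toy poststratification: a model fit on sample A is used to predict
# in population B, whose demographic mix differs. All cell names,
# means, and shares here are invented for illustration.
cells = ["young", "middle", "old"]
cell_mean_A = {"young": 0.30, "middle": 0.50, "old": 0.70}  # fitted means

# Sample A over-represents the young; target population B does not.
share_A = {"young": 0.60, "middle": 0.30, "old": 0.10}
share_B = {"young": 0.30, "middle": 0.40, "old": 0.30}

naive = sum(cell_mean_A[c] * share_A[c] for c in cells)            # 0.40
poststratified = sum(cell_mean_A[c] * share_B[c] for c in cells)   # 0.50
```

This corrects for the difference in composition between A and B, but not for the deeper concern: the within-cell relationships themselves may differ between the two scenarios, and no amount of reweighting fixes that.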

As usual, the powers-of-2 thing sneaks up on us. All of a sudden, our third Greatest Seminar Speaker competition is nearing its final rounds.

Today we have two contestants to be reckoned with. Shakira made it pretty far against weak competition but then vanquished the mighty Dahl. Meanwhile Cleary shot down David Bowie, A. J. Foyt, and the inventor of Code Names.

Songwriter or storyteller; which will it be?

**Past matchup**

Raghu offers arguments in both directions:

On the one hand, we have not resolved the mystery of physiological scaling among weight lifters.

On the other:

I decided to spend some time in the library working — a change of scenery — and I picked up a book by Maurice Sendak, “Higglety Pigglety Pop! or There Must Be More to Life,” because previously all I’ve read by Sendak is “Where the Wild Things Are” and because “There Must Be More to Life” is a wonderful title. So far I am only four chapters in: a narcissistic and possibly psychopathic dog leaves her comfortable life in search of something better. It is excellent, and I look forward to finishing. So far it shows no connections to science or statistics, but I wouldn’t mind a seminar on whether there is or is not more to life.

Dzhaughn makes the case for . . . ummm, I’m not sure which one:

It’s hard for me to relate to someone who can eat as much as they want. Or more than they want, in the case of the Japanese hot dog guy. Maybe I should open my mind and shut my mouth, even if that’s not their approach.

Supposedly Li Wenwen wins when she can eat, with more ease in her mores, more rice than Maurice, then Maurice.

Anonymous breaks the tie:

I think you really need to give a leg up to the unheard voices. I mean, Maurice Sendak got to blab and blab in books, and then I’m sure went on the academic circuit to tell pretentious college students all about the importance of children’s books, and how important it is to pay him millions of dollars. I don’t speak Mandarin, so although Li Wenwen has surely spoken at many cadre meetings or whatever, I haven’t heard it.

And I was all ready to give it to Li, but then Ethan came in with this late entry:

“There must be more to life” is far weightier than anything Li can lift. Sendak wins on Wenwen’s turf. Sendak on to the semis.

Pyro: Bayesian Hierarchical Stacking: Well Switching Case Study

PyMC: Bayesian Hierarchical Stacking – well switching case study

Cool!

And here’s the research article, Bayesian hierarchical stacking: Some models are (somewhere) useful, Bayesian Analysis 17, 1043-1071, with Yuling Yao, Gregor Pirš, and Aki Vehtari.

Christopher Bryan, Beth Tipton, and David Yeager write:

In the past decade, behavioural science has gained influence in policymaking but suffered a crisis of confidence in the replicability of its findings. Here, we describe a nascent heterogeneity revolution that we believe these twin historical trends have triggered. This revolution will be defined by the recognition that most treatment effects are heterogeneous, so the variation in effect estimates across studies that defines the replication crisis is to be expected as long as heterogeneous effects are studied without a systematic approach to sampling and moderation. When studied systematically, heterogeneity can be leveraged to build more complete theories of causal mechanism that could inform nuanced and dependable guidance to policymakers. We recommend investment in shared research infrastructure to make it feasible to study behavioural interventions in heterogeneous and generalizable samples, and suggest low-cost steps researchers can take immediately to avoid being misled by heterogeneity and begin to learn from it instead.

We posted on the preprint version of this article earlier. The idea is important enough that it’s good to have an excuse to post on it again.

At first I was gonna say that the edge goes to the author of In the Night Kitchen, because he can draw tempting food that Li Wenwen would then eat, causing her to go out of her weight class and forfeit her title. But Li is already in the upper weight class, so she can eat as much as she wants.

Who should advance? Pierre says, “I don’t care.” But some of you must have opinions!

**Past matchup**

Dzhaughn writes:

Bechdel’s rule would have meant nothing to Laura. But just about any conceivable movie about Laura passes the Bechdel test.

Jonathan adds:

It’s hard not to look ahead to anticipate potential future matchups and ignore the match in front of you. But it’s one match at a time. Fun Home on the Prairie!

I’m going to go with male-protagonist proxies here (violating the Bechdel Rule):

Michael Landon (Poppa Wilder) vs. Michael Cerveris (Poppa Bechdel): Landon played a teenage werewolf while Cerveris played Sweeney Todd. Both scary, but I think the werewolf is scarier. So, Wilder.

That’s 2 arguments for Laura and 0 for Alison, so Laura it is.
