Welcome to The Riddler. Every week, I offer up problems related to the things we hold dear around here: math, logic and probability. Two puzzles are presented each week: the Riddler Express for those of you who want something bite-size and the Riddler Classic for those of you in the slow-puzzle movement. Submit a correct answer for either, and you may get a shoutout in the next column. (To win, I need to receive your correct answer before 11:59 p.m. Eastern time on Monday. Have a great weekend!) Please wait until Monday to publicly share your answers! If you need a hint or have a favorite puzzle collecting dust in your attic, find me on Twitter or send me an email.

Due to the holidays, the next column will appear on Dec. 2. See you then!

I recently competed in a 5-kilometer “turkey trot” race. Before the race began, all the runners (including me) gathered behind the starting line in a random order. Once the race began, everyone started running at their own fixed pace.

I hadn’t run in several years, so I wasn’t sure how my pace would compare to that of the other racers. Nevertheless, once the race began, I found myself passing quite a few other runners — and being passed myself.

On average, what fraction of the other runners could I expect to pass during the race? (Assume that if my pace is faster than that of another runner’s, I will pass them at some point during the race.)

From Michael Branicky comes a challenge involving many, many dice:

I have five kinds of fair Platonic dice: tetrahedra (whose faces are numbered 1-4), cubes (numbered 1-6), octahedra (numbered 1-8), dodecahedra (numbered 1-12) and icosahedra (numbered 1-20).

When I roll two of the cubes, there is a single most likely sum: seven. But when I roll one cube and two tetrahedra, there is no single most likely sum — eight and nine are both equally likely.
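Both of these claims can be checked by brute force. The sketch below is only an illustration (the function name `most_likely_sums` is my own, not part of Michael's puzzle): it enumerates every outcome for a given set of dice and reports which sums are tied for most likely.

```python
from collections import Counter
from itertools import product


def most_likely_sums(dice_sides):
    """Return the sorted list of sums that are tied for most likely.

    dice_sides lists the number of faces on each die, e.g. [6, 6]
    for two cubes. Every die is assumed fair and numbered from 1.
    """
    faces = [range(1, sides + 1) for sides in dice_sides]
    # Count how many equally likely outcomes produce each sum.
    counts = Counter(sum(roll) for roll in product(*faces))
    top = max(counts.values())
    return sorted(total for total, count in counts.items() if count == top)


two_cubes = most_likely_sums([6, 6])
cube_and_two_tetrahedra = most_likely_sums([6, 4, 4])
```

Here `two_cubes` holds the single sum seven, while `cube_and_two_tetrahedra` holds the tie between eight and nine. Full enumeration grows quickly with the number of dice, so this is only practical for small combinations.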

Which whole numbers are *never* the single most likely sum, no matter which combinations of dice I pick?

Congratulations to Andrew Busch of London, winner of last week’s Riddler Express.

Last week, in an effort to break open the gates of the city Tinas Mirith, an army of orcs first tried using a battering ram, but to no avail. They next erected a 100-foot pole with a very massive weight at the top (i.e., the weight was much, much heavier than the rest of the pole). The pole was also anchored at the bottom, so that as the weight fell the entire pole rotated around its bottom without slipping.

How far away should the orcs have positioned the vertical pole from the gates so that when the weight came crashing down, its horizontal speed was as great as possible?

First, let’s examine an animation of the falling weight:

As the weight fell, it moved faster. At the same time, due to the rotation, its horizontal motion turned into vertical motion. At some point, its *horizontal* speed was at a maximum. One way to find this maximum was to do what many students do time and again in their first physics course: Write an equation describing conservation of energy.

As the weight came down, its gravitational potential energy became kinetic energy. If we call the pole length *L* (100 feet in the puzzle), then by the time the pole made an angle 𝜃 with the ground, the weight was at a height *L*sin𝜃. That meant the *change* in the height was *L*−*L*sin𝜃, or *L*(1−sin𝜃). Therefore, the change in gravitational potential energy was *mgL*(1−sin𝜃), where *m* was the mass of the weight and *g* was the acceleration due to gravity, roughly 32 feet/sec^{2}.

This difference in potential energy was what became kinetic energy, which could be expressed as 1/2*mv*^{2}, where *v* was the speed of the weight. Setting these equal gave you *v* = √(2*gL*(1−sin𝜃)). But you didn’t want the speed, you wanted the *horizontal* velocity, which was equal to *v*sin𝜃 because the weight always moved perpendicular to the pole. And so the function you were trying to maximize was sin𝜃√(2*gL*(1−sin𝜃)). Pulling out a few constants that didn’t matter, this became sin𝜃√(1−sin𝜃). To maximize this function, you had to take the derivative with respect to 𝜃 and set that equal to zero. After some friendly cancelation, this left you with sin𝜃 = 2/3.

That was a nice result, but this wasn’t what the riddle was asking for. What was the right distance between the pole and the gates? That was *L* times the cosine of this optimal angle. If sin𝜃 was 2/3, then cos𝜃 was (√5)/3, which meant the distance was **100/3·√5**, or about 74.5 feet.
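For anyone who would rather skip the derivative, the same maximum can be found numerically. This is a rough sketch rather than the official solution: it scans angles between 0 and 90 degrees on a fine grid (the grid size is an arbitrary choice of mine) and evaluates the horizontal speed from the energy argument above.

```python
import math

POLE_LENGTH_FT = 100.0   # L in the puzzle
GRAVITY_FT_S2 = 32.0     # g, roughly 32 feet/sec^2


def horizontal_speed(theta):
    """Horizontal speed when the pole makes angle theta with the ground."""
    speed = math.sqrt(2 * GRAVITY_FT_S2 * POLE_LENGTH_FT * (1 - math.sin(theta)))
    return speed * math.sin(theta)


# Scan angles from 0 to pi/2 on an evenly spaced grid (grid size is arbitrary).
steps = 100_000
angles = [k * (math.pi / 2) / steps for k in range(steps + 1)]
best_angle = max(angles, key=horizontal_speed)

best_sine = math.sin(best_angle)
best_distance = POLE_LENGTH_FT * math.cos(best_angle)
```

The scan lands on a sine very close to 2/3 and a distance very close to 74.5 feet, in line with the exact answer. Changing `GRAVITY_FT_S2` rescales the speed but does not move the optimal angle.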

A few folks, like Ricky Reusser, opted for a more challenging, “bad approach.” As Ricky stated: “This problem is very easily solved with a simple energy argument, but let’s brute force it from the equations of motion instead!” Ricky proceeded to solve second-order differential equations with Jacobi elliptic functions. In the end, the answer was the same.

After placing their pole in just the right spot, the orcs successfully knocked down the gates of Tinas Mirith, securing a victory for Sord Lauron. And that was the end of all things.

Congratulations to David Cohen of Silver Spring, Maryland, winner of last week’s Riddler Classic.

Last week, I was slicing a square peanut butter and jelly sandwich. But rather than making a standard horizontal or diagonal cut, I instead picked two random points along the perimeter of the sandwich and made a straight cut from one point to the other. (These points could have been on the same side.)

My slice was “reasonable” if I cut the square into two pieces and the smaller resulting piece had an area that was at least one-quarter of the whole area. What was the probability that my slice was reasonable?

To see what’s happening here, we can start by looking at special cases, like when the first point I picked was one of the square’s corners or at the midpoint of one of its sides, as shown below.

When the first point was in a corner, the cut was reasonable whenever the other point was closer to the opposite corner than any other corner, a region that made up a quarter of the square’s perimeter. And when the first point was a midpoint, the cut was reasonable whenever the other point was somewhere on the opposite side, *another* region that made up a quarter of the square’s perimeter. And so, for both of these special cases, my probability of making a reasonable cut was 1/4. Could the answer then simply have been 1/4?

This being a Classic, that was *not* the answer. As it turned out, whenever the first point was anywhere else, the probability of making a reasonable cut was *greater than* 1/4. Solvers Trey Goesh and Starvind made animated graphs suggesting this was the case.

To determine the exact probability, suppose the first point was a distance *x* from the nearest corner of the square, and let’s assume that the square had side length 1, as shown below. Note here that *x* was equally likely to be anywhere from 0 to 0.5.

If the second point was on the opposite side, the cut was reasonable if it was within 1/2+*x* of the opposite corner, creating a trapezoid whose area was at least 1/4. But if the second point was on the *other* side touching that opposite corner, then the cut was reasonable when that second point was within (1−2*x*)/(2−2*x*) of that corner, creating a triangle whose area was at least 1/4.

To find the probability that the cut was reasonable, you had to add these two distances together, divide by 4 (the total perimeter of the square), integrate over *x* from 0 to 0.5, and finally divide by 0.5 (i.e., multiply by 2), since we wanted the *average* probability over this range of *x*. The result of this integral was relatively concise: **(7−ln(16))/16**, or about 26.4 percent.
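The integral can also be checked numerically. The sketch below assumes nothing beyond the two distances given above; it averages their sum over *x* with a simple midpoint rule (the number of slices is an arbitrary choice) and compares the result with the closed form.

```python
import math


def reasonable_fraction(x):
    """Fraction of the perimeter giving a reasonable cut, for a first
    point at distance x (0 <= x <= 0.5) from its nearest corner."""
    on_opposite_side = 0.5 + x
    on_adjacent_side = (1 - 2 * x) / (2 - 2 * x)
    return (on_opposite_side + on_adjacent_side) / 4  # perimeter is 4


# Midpoint rule on [0, 0.5], then divide by 0.5 to get the average.
slices = 100_000
width = 0.5 / slices
integral = sum(reasonable_fraction((i + 0.5) * width) for i in range(slices)) * width
average_probability = integral / 0.5

closed_form = (7 - math.log(16)) / 16
```

Both `average_probability` and `closed_form` come out at about 0.264, and `reasonable_fraction` returns exactly 1/4 at the corner (*x* = 0) and at the midpoint (*x* = 0.5), matching the two special cases.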

In the end, 25 percent was not a bad guess. It seems I am only slightly more *reasonable* than that.

Well, aren’t you lucky? There’s a whole book full of the best puzzles from this column and some never-before-seen head-scratchers. It’s called “The Riddler,” and it’s in stores now!

Email Zach Wissner-Gross at riddlercolumn@gmail.com.

]]>*You’re reading **Data Is Plural**, a weekly newsletter of useful/curious datasets. Below you’ll find the **Nov. 16, 2022, edition**, reprinted with permission at FiveThirtyEight.*

**Big emitters, disease outbreaks, permissively licensed code, impact craters and tinned fish.**

**Big emitters.** Climate Trace, a nonprofit coalition launched in 2020, uses satellite imagery, sector-specific datasets and other sources to estimate greenhouse gas emissions in detail. Their most recent inventory, released last week, highlights 70,000-plus individual sites that “represent the top known sources of emissions in the power sector, oil and gas production and refining, shipping, aviation, mining, waste, agriculture, road transportation, and the production of steel, cement and aluminum.” You can download the data, explore sector- and country-level estimates and browse a map of the sites. **Read more**: Coverage in The New York Times. [h/t Ian Johnson]

**Disease outbreaks.** Juan Armando Torres Munguía et al. have built a dataset of infectious disease outbreaks based on information extracted from the World Health Organization’s Disease Outbreak News alerts (DIP 2022.03.30) and its coronavirus dashboard. The authors have clustered the outbreaks by disease (classified by ICD-10 and ICD-11 codes), country and year. Excluding the COVID-19 pandemic, this leads to 1,500-plus total combinations between January 1996 and March 2022, spanning 60-plus diseases and 200-plus countries/territories. [h/t Konstantin M. Wacker]

**Permissively licensed code.** The Stack, a new dataset from the BigCode project, “contains over 3TB of permissively licensed source code files covering 30 programming languages crawled from GitHub.” Those terabytes hold more than 300 million files extracted from repositories whose licenses place “minimal restrictions on how the software can be copied, modified and redistributed.” The dataset provides the contents of each file along with its repository name, path, size, programming language, detected licenses and several high-level metrics. **Read more**: An introductory Twitter thread and preprint paper. [h/t Karsten Johansson]

**Impact craters.** The Earth Impact Database, maintained by the University of New Brunswick’s Planetary and Space Science Centre, catalogs nearly 200 impact craters caused by meteorites that have crashed into the planet. It presents the name, location, diameter, estimated age, geology and other features of the craters, as well as photographs and bibliographies. **Related**: Cody Winchester has scraped the crater characteristics into CSV and GeoJSON files.

**Tinned fish.** Rainbow Tomatoes Garden is a farm in East Greenville, Pennsylvania, that also happens to run an online store selling “the largest selection of tinned seafood in the world.” Curator-owner Dan Waber publishes a spreadsheet of the store’s 630-plus offerings, listing each product’s name, type of seafood, brand, country of origin, tin size and price; whether it’s organic, certified kosher, smoked, boneless and/or skinless; and more. [h/t George Ho]


From the fantastical land of Central Earth comes a physics riddle that will break down your doors:

In an effort to break open the gates of the city Tinas Mirith, an army of orcs first tried using a battering ram, but to no avail. They next erected a 100-foot pole with a very massive weight at the top (i.e., the weight is much, much heavier than the rest of the pole). The pole is also anchored at the bottom, so that as the weight falls the entire pole rotates around its bottom without slipping.

How far away should the orcs position the vertical pole from the gates so that when the weight comes crashing down on the gates, its horizontal speed is as great as possible?

I have made a square peanut butter and jelly sandwich, and now it’s time to slice it. But rather than making a standard horizontal or diagonal cut, I instead pick two random points along the perimeter of the sandwich and make a straight cut from one point to the other. (These points can be on the same side.)

My slice is “reasonable” if I cut the square into two pieces and the smaller resulting piece has an area that is at least one-quarter of the whole area. What is the probability that my slice is reasonable?

Congratulations to Ben Gundry of San Jose, California, winner of last week’s Riddler Express.

Last week, you were challenged to improve upon the Gregorian calendar. Now, each solar year consists of approximately 365.24217 mean solar days. That’s pretty close to 365.25, which is why it makes sense to have an extra day every four years. However, the Gregorian calendar is a little more precise: There are 97 leap years every 400 years, averaging out to 365.2425 days per year.

But could you make a better approximation than the Gregorian calendar? More specifically, you were asked to find numbers *L* and *N* (where *N* was less than 400) such that if every cycle of *N* years included *L* leap years, the average number of days per year was as close as possible to 365.24217.

Many solvers used a “brute force” approach, checking all the values of *N* from 1 to 399. For each value of *N*, you had to find the whole number *L* that resulted in *L*/*N* being as close as possible to 0.24217. Solver Tiago Batalhao knew this meant *L* was either floor(0.24217·*N*) or ceiling(0.24217·*N*), the whole numbers on either side of 0.24217·*N*. In fact, you could find the best *L* for a given *N* by rounding 0.24217·*N* to the nearest whole number. Finally, the best pair of *N* and *L* minimized the absolute difference between *L*/*N* and 0.24217, which could be written as a function of *N*: abs(round(0.24217·*N*)/*N* − 0.24217).
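The brute-force search described above fits in a few lines. This is one possible sketch, not any particular solver's code; it uses exact fractions instead of floating point so that rounding error cannot affect the comparison, and the function name `best_leap_cycle` is my own.

```python
from fractions import Fraction

# Fractional part of the solar year, 0.24217, as an exact fraction.
TARGET = Fraction(24217, 100000)


def best_leap_cycle(max_years=399):
    """Return (L, N) minimizing abs(L/N - TARGET) over 1 <= N <= max_years."""
    best = None
    for years in range(1, max_years + 1):
        leaps = round(TARGET * years)  # nearest whole number of leap years
        error = abs(Fraction(leaps, years) - TARGET)
        if best is None or error < best[0]:
            best = (error, leaps, years)
    return best[1], best[2]


leaps, years = best_leap_cycle()
cycle_error = abs(Fraction(leaps, years) - TARGET)
gregorian_error = abs(Fraction(97, 400) - TARGET)
improvement = gregorian_error / cycle_error
```

The search returns 85 leap years in a 351-year cycle, and `improvement` works out to a little over 69, which is where the "about 70 times more accurate" figure comes from.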

But brute force wasn’t the only way to solve this puzzle. A particularly elegant approach I’d like to highlight involved mediants and Farey sequences. Without getting into the details, we could start with the fractions 0/1 and 1/1. The mediant of these two fractions was calculated by adding across the numerators and across the denominators, which gave 1/2. Since 0.24217 was between 0/1 and 1/2, we next calculated the mediant of *these* two fractions: 1/3. The target was also between 0/1 and 1/3, and so the next mediant of interest was 1/4. After that came 1/5, the first mediant that was *less* than 0.24217. From there, you wanted the mediant of 1/5 and 1/4, which was 2/9.

Continuing in this fashion, the last mediant that was less than 0.24217 was 7/29, after which came 8/33. *Their* mediant was 15/62, after which came 23/95, 31/128 and 54/223. These last two fractions were on either side of 0.24217, and their mediant was 85/351, or approximately 0.242165. This turned out to be the best approximation where the denominator was less than 400, meaning *N* was 351 and *L* was 85.
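The mediant walk can be automated as well. The following is a minimal sketch under my own naming (`closest_mediant` is hypothetical): it keeps a lower and an upper bracket around the target, replaces one of them with their mediant at each step, and remembers the closest mediant seen before the denominator exceeds the limit.

```python
from fractions import Fraction

TARGET = Fraction(24217, 100000)


def closest_mediant(target, max_denominator):
    """Walk mediants between 0/1 and 1/1, tracking the closest one to target."""
    low_num, low_den = 0, 1    # lower bracket, 0/1
    high_num, high_den = 1, 1  # upper bracket, 1/1
    best = None
    visited = []
    while True:
        num, den = low_num + high_num, low_den + high_den
        if den > max_denominator:
            break
        mediant = Fraction(num, den)
        visited.append(mediant)
        if best is None or abs(mediant - target) < abs(best - target):
            best = mediant
        if mediant < target:
            low_num, low_den = num, den    # mediant becomes the new lower bracket
        elif mediant > target:
            high_num, high_den = num, den  # mediant becomes the new upper bracket
        else:
            break  # exact hit
    return best, visited


best_fraction, mediants_seen = closest_mediant(TARGET, 399)
```

With a denominator limit of 399, the walk passes through 1/2, 1/3, 1/4, 1/5 and 2/9 and ends at 85/351, agreeing with the brute-force answer.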

In the end, having 85 leap years every 351 years was about 70 times *more accurate* (in terms of averaging out to the right number of solar days per solar year) than the Gregorian calendar’s 97 leap years out of 400. That means we might have to skip a scheduled leap day … in a few thousand years or so.

Congratulations to Adam Richardson of Old Hickory, Tennessee, winner of last week’s Riddler Classic.

Last week, it was peak fall foliage season in Riddler Nation, where the trees changed color in a rather particular way. Each tree independently began changing color at a random time between the autumnal equinox and the winter solstice. Then, at a random later time for each tree — between when that tree’s leaves had begun changing color and the winter solstice — the leaves of that tree would all fall off at once.

At a certain time of year, the fraction of trees with changing leaves was expected to peak. What was this maximal fraction?

Solver Tom Keith simulated thousands of trees — quite beautifully, I might add — finding that the peak appeared to occur 63 percent of the way through the fall. And at this peak, almost 37 percent of the trees had changing leaves.

Meanwhile, many solvers like Andrea Andenna were able to calculate the exact answer. To do this, you wanted to first determine the probability *p*(*t*) of any given tree having changing leaves as a function of time *t*, which you could conveniently rescale between 0 (representing the autumnal equinox) and 1 (representing the winter solstice). Then, you could use calculus to maximize *p*(*t*).

So what was this probability distribution? A tree had changing leaves at time *t* if the leaves started changing *before* *t* and — assuming that was true — the leaves fell *after t*. The probability of the former was simply *t*. Now, assuming the leaves started changing at a time *x* prior to *t*, what was the probability that the leaves fell at a time *y* after *t*? It was equal to the ratio of the amount of time after *t* to the amount of time after *x*, or (1−*t*)/(1−*x*). Meanwhile, *x* was equally likely to be anywhere between 0 and *t*, which meant you had to integrate this ratio over this range and then normalize by dividing by *t*. In the end, this integral came out to (*t*−1)/*t*·ln(1−*t*). Multiplying this by *t* — again, that was the probability that the leaves started changing before *t* — gave you *p*(*t*) = (*t*−1)·ln(1−*t*).

To maximize this function, you took its derivative, which was ln(1−*t*)+1, and set it equal to zero. That meant the peak occurred a fraction 1−1/*e*, or about 63.21 percent, of the way through the fall. That places the peak around Nov. 18, so there’s still time to see them if you happen to live in Riddler Nation!

However, the puzzle asked for the maximum fraction of trees with changing leaves, not *when* this maximum occurred. You could find that fraction by evaluating *p*(1−1/*e*), which was an even nicer-looking expression: **1/*e***, or about 36.79 percent.
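A quick simulation in the spirit of Tom Keith's offers a sanity check on the formula. This is not his code, just a sketch with a tree count and random seed that I picked arbitrarily: each tree gets a uniform start time, then a uniform fall time between its start and the solstice.

```python
import math
import random


def changing_fraction(t):
    """Exact p(t) = (t - 1) * ln(1 - t) from the derivation, for 0 <= t < 1."""
    return (t - 1) * math.log(1 - t)


def simulate_fraction(t, trees=200_000, seed=0):
    """Estimate the fraction of trees with changing leaves at time t."""
    rng = random.Random(seed)
    changing = 0
    for _ in range(trees):
        start = rng.random()                       # leaves begin changing
        fall = start + rng.random() * (1 - start)  # leaves drop later
        if start <= t < fall:
            changing += 1
    return changing / trees


peak_time = 1 - 1 / math.e
exact_peak = changing_fraction(peak_time)
simulated_peak = simulate_fraction(peak_time)
```

The exact value at the peak is 1/*e*, and the simulated estimate should land within a fraction of a percentage point of it. Evaluating `changing_fraction` a little earlier or later than `peak_time` gives a smaller value, as expected at a maximum.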


]]>

The end of daylight saving time here on the East Coast of the U.S. got me thinking more generally about the calendar year. Each solar year consists of approximately 365.24217 mean solar days. That’s pretty close to 365.25, which is why it makes sense to have an extra day every four years. However, the Gregorian calendar is a little more precise: There are 97 leap years every 400 years, averaging out to 365.2425 days per year.

Can you make a better approximation than the Gregorian calendar? Find numbers *L* and *N* (where *N* is less than 400) such that if every cycle of *N* years includes *L* leap years, the average number of days per year is as close as possible to 365.24217.

It’s peak fall foliage season in Riddler Nation, where the trees change color in a rather particular way. Each tree independently begins changing color at a random time between the autumnal equinox and the winter solstice. Then, at a random later time for each tree — between when that tree’s leaves began changing color and the winter solstice — the leaves of that tree will all fall off at once.

At a certain time of year, the fraction of trees with changing leaves will peak. What is this maximal fraction?

Congratulations to David Cohen of Silver Spring, Maryland, winner of last week’s Riddler Express.

Last week, the winner of a particular baseball game was determined by the next pitch. The pitcher either threw a fastball or an offspeed pitch, while the batter was similarly anticipating a fastball or an offspeed pitch. If the batter correctly guessed the pitch would be a fastball, they had a 1-in-5 chance of hitting a home run. If the batter correctly guessed the pitch would be offspeed, they had a 1-in-2 chance of hitting a home run. But if the batter guessed incorrectly, they struck out and lost the game. (The batter was guaranteed to swing either way.)

To spice things up, the pitcher truthfully announced the probability with which they’d throw a fastball. Then the batter truthfully announced the probability with which they’d anticipate a fastball.

Assuming both pitcher and batter were excellent logicians, what was the probability that the batter hit a home run?

A good place to start was the mindset of the pitcher. You didn’t yet know what probability the batter would announce, but let’s call it *b*. Meanwhile, suppose you were considering announcing a fastball probability of *p*. What were the batter’s chances of hitting a home run, in terms of *p* and *b*?

The batter would correctly anticipate a *fastball* with probability *pb*, so the probability of hitting a home run off a fastball was *pb*/5. The batter would correctly anticipate an *offspeed pitch* with probability (1−*p*)(1−*b*), so the probability of hitting a home run off an offspeed pitch was (1−*p*)(1−*b*)/2. Adding these probabilities together gave you the total probability of a home run, which was (7/10)*pb* − *p*/2 − *b*/2 + 1/2, which we’ll call *P*.

At this point, the Problem Solving & Posing class at The Hewitt School decided to examine the partial derivative of *P* with respect to *b*, which was (7/10)*p* − 1/2. This was equal to zero when *p* was 5/7. When *p* was less than 5/7 (meaning the partial derivative was negative), the batter could improve their chances of hitting a home run by lowering the value of *b*. For example, if the batter announced *b* was 0, then *P* was 1/2 − *p*/2, which was always greater than 1/7. And when *p* was greater than 5/7 (meaning the partial derivative was positive), the batter could improve their home run odds by similarly increasing the value of *b*, again resulting in *P* being always greater than 1/7.

But when *p* was *exactly* 5/7, the batter was effectively pinned. The value of *P* was 1/7 and didn’t change no matter what probability the batter announced. And so the probability of a home run was **1/7**. For MLB, that would be a rather high probability.
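The pitcher's reasoning can be reproduced with a small search. In this sketch, which is my own illustration and uses a grid of 701 candidate probabilities chosen so that 5/7 lands exactly on it, the batter always picks the better of *b* = 0 and *b* = 1 (enough because *P* is linear in *b*), and the pitcher picks the *p* that makes that best response as small as possible.

```python
from fractions import Fraction


def home_run_probability(p, b):
    """P = pb/5 + (1 - p)(1 - b)/2, with p and b as fastball probabilities."""
    return p * b / 5 + (1 - p) * (1 - b) / 2


def batter_best_response(p):
    """Best the batter can do against p; P is linear in b, so check endpoints."""
    return max(
        home_run_probability(p, Fraction(0)),
        home_run_probability(p, Fraction(1)),
    )


# Candidate fastball probabilities 0, 1/700, 2/700, ..., 1 (grid is an assumption).
candidates = [Fraction(k, 700) for k in range(701)]
pitcher_choice = min(candidates, key=batter_best_response)
home_run_chance = batter_best_response(pitcher_choice)
```

The search settles on a fastball probability of 5/7 and a home run chance of 1/7. At that choice the batter's two endpoint options are equal, which is the sense in which the batter is pinned.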

Congratulations to Christian Wolters of San Jose, California, winner of last week’s Riddler Classic.

Last week, you purchased 150 pieces of candy to distribute for Halloween. However, you weren’t sure how many trick-or-treaters would visit you. Based on previous years, it could have been anywhere from 50 to 150 (inclusive), with each number being equally likely.

As the trick-or-treaters arrived, you could have decided to give each of them one, two or three candies. You wanted to avoid running out of candy, but you also wanted to avoid having any candy left over. Let *X* represent the number of trick-or-treaters who didn’t get candy (if you *did* run out) or the number of leftover pieces (if you *didn’t* run out).

The day before Halloween, you came up with a strategy to minimize the expected value of *X*. What was this minimum expected value?

The “MassMutual Crew” of Springfield, Massachusetts, came up with a strategy that resulted in a fairly low value of *X*. First, they gave out 100 candies to the first 50 trick-or-treaters. (Whether it was exactly two candies per kid or alternated between one and three candies made no difference.) From there, they gave out one candy per kid for however many kids showed up. For this strategy, the expected value of *X* was precisely the expected value of the difference between the number of trick-or-treaters and 100, which turned out to be **50·51/101**, or about 25.2475.

Solver Rohan Lewis proved this was indeed the minimum expected value of *X* by recognizing that *X* could be zero for at most one value, and then increased by at least 1 for each trick-or-treater more or less than that value. By symmetrically placing that “zero case” in the middle between 50 and 150 (i.e., when there were 100 trick-or-treaters) and making sure everyone beyond the 50th trick-or-treater got at most one candy, Rohan arrived at the same strategy as the MassMutual Crew.
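The arithmetic behind that minimum is straightforward to reproduce. The sketch below assumes, following the argument above, that a plan which runs out of candy exactly at visitor number *m* yields a value of *X* equal to the distance between the actual turnout and *m*; the name `planned_visitors` is mine.

```python
from fractions import Fraction

VISITOR_COUNTS = range(50, 151)  # 50 to 150 inclusive, equally likely


def expected_regret(planned_visitors):
    """Expected X when the candy is timed to run out at planned_visitors."""
    total = sum(abs(n - planned_visitors) for n in VISITOR_COUNTS)
    return Fraction(total, len(VISITOR_COUNTS))


best_plan = min(VISITOR_COUNTS, key=expected_regret)
best_value = expected_regret(best_plan)
```

Planning for 100 visitors gives the smallest expected value, 2550/101, which is the same as 50·51/101, or about 25.2475.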

Finally, solver Starvind extended the puzzle, analyzing cases where your regret for each leftover candy was not necessarily the same as your regret for each empty-handed trick-or-treater. Starvind found that if you regretted each empty-handed trick-or-treater *k* times more than each leftover candy, then instead of planning for 100 trick-or-treaters you should plan for (99+301*k*)/(2+2*k*) trick-or-treaters.

I for one had way too much leftover candy. So much regret.


]]>*You’re reading **Data Is Plural**, a weekly newsletter of useful/curious datasets. Below you’ll find the **Nov. 2, 2022, edition**, reprinted with permission at FiveThirtyEight.*

**Nuclear stockpiles, decades of river widths, flood insurance changes, the weight of the web and Swiss apartment layouts.**

**Nuclear stockpiles.** As of early 2022, a total of nine countries possessed approximately 12,700 nuclear warheads, according to estimates from the Federation of American Scientists. Although “the exact number of nuclear weapons in each country’s possession is a closely held national secret,” the researchers say that “publicly available information, careful analysis of historical records and occasional leaks” make the estimates possible, albeit “with significant uncertainty.” The report includes each country’s current warhead count and subtotals by status, as well as annual totals for each country since 1945. **As seen in**: Our World In Data. **Previously**: Nuclear capabilities (DIP 2016.02.24) and explosions (DIP 2016.03.23). [h/t u/jcceagle]

**Decades of river widths.** Dongmei Feng et al. have applied an algorithmic approach to calculating the widths of the world’s largest rivers over time. Their dataset contains more than 1 billion measurements of 2.7 million fluvial cross-sections (focusing on those wider than 90 meters), based on roughly 1.2 million satellite images captured between 1984 and 2020. **Previously**: Free-flowing rivers (DIP 2019.07.24) and U.S. hydrography (DIP 2022.10.12). [h/t Colin Gleason]

**Flood insurance changes.** The Federal Emergency Management Agency recently revamped its method of pricing U.S. flood insurance, aiming for “rates that are actuarily sound, equitable, easier to understand and better reflect a property’s flood risk.” A series of datasets and dashboards from the agency summarizes the expected changes in premiums, which began taking effect last year. They count the number of policies for which monthly payments were projected to increase/decrease by a given amount, bucketed into ten-dollar increments, for each state, county and ZIP code. **As seen in**: “How have flood insurance premiums changed?” (USAFacts).

**The weight of the web.** Researchers at the HTTP Archive, a project of the Internet Archive, “periodically crawl the top sites on the web and record detailed information about fetched resources, used web platform APIs and features and execution traces of each page.” They make the raw data available via Google BigQuery, and also publish aggregate data tracking metrics such as loading speed and page weight (measured in kilobytes transferred). **As seen in**: “Why web pages can have a size problem” (Datawrapper).

**Swiss apartment layouts.** Swiss Dwellings “contains detailed data on over 42,500 apartments (250,000 rooms) in ~3,100 buildings including their geometries, room typology as well as their visual, acoustical, topological and daylight characteristics,” sourced from Archilyse AG, a company that analyzes building plans. The details include the placement of rooms, features (e.g., sinks and bathtubs), walls, windows, doors and more. [h/t Matthias Standfest + India in Pixels]

*Dataset suggestions? Criticism? Praise? Send feedback to jsvine@gmail.com. Looking for past datasets? This spreadsheet contains them all. Visit data-is-plural.com to subscribe and browse past editions.*


From Irwin Altrows comes a “high-speed” express:

The winner of a particular baseball game will be determined by the next pitch. The pitcher will either throw a fastball or an offspeed pitch, while the batter will similarly be anticipating a fastball or an offspeed pitch. If the batter correctly guesses the pitch will be a fastball, they have a 1-in-5 chance of hitting a home run. If the batter correctly guesses the pitch will be offspeed, they have a 1-in-2 chance of hitting a home run. But if the batter guesses incorrectly, they will strike out and lose the game. (The batter is guaranteed to swing either way.)

To spice things up, the pitcher truthfully announces the probability with which they will throw a fastball. Then the batter truthfully announces the probability with which they will anticipate a fastball.

Assuming both pitcher and batter are excellent logicians, what is the probability that the batter will hit a home run?

For Halloween this year, you have purchased 150 pieces of candy. However, you’re not sure how many trick-or-treaters will visit you. Based on previous years, it could be anywhere from 50 to 150 (inclusive), with each number being equally likely.

As the trick-or-treaters arrive, you can decide to give each of them one, two or three candies. You want to avoid running out of candy, but you also want to avoid having any candy left over. Let *X* represent the number of trick-or-treaters who won’t get candy (if you *do* run out) or the number of leftover pieces (if you *don’t* run out).

This year, the day before Halloween, you come up with a strategy to minimize the expected value of *X*. What is this minimum expected value?

Congratulations to Christian Wolters of San Jose, California, winner of last week’s Riddler Express.

Last week, my son noticed that when he held a fidget spinner in front of a television (with a 60 hertz refresh rate) and gave it a whirl, it appeared to suddenly spin backward a few times before coming to a halt. While many fidget spinners have three lobes, this particular spinner had five lobes, as shown below.

After giving it a spin, we clearly saw it spin backward *three *times before it stopped. How fast could it have been spinning at the beginning?

As stated in the problem, the television’s refresh rate was 60 hertz, so you could think of the spinner as having 60 snapshots taken per second. Now, what happened when the spinner was turning at exactly 12 revolutions per second (or 720 revolutions per minute)? Since 12 was one-fifth of 60, that meant the fidget spinner made one-fifth of a complete rotation for every snapshot. In other words, each lobe rotated from its current position to the position of the next lobe. In front of the television, the spinner appeared to be stationary!

As the spinner slowed to a halt (due to friction) and passed 12 revolutions per second, each lobe no longer made it all the way to the next lobe’s position with each snapshot, which made the spinner appear to spin backward. By the way, this is closely related to the wagon-wheel effect, where wheels caught on camera appear to spin backward when they’re actually spinning forward.

Not only did this backward motion appear to occur at speeds just below 12 revolutions per second, but it also occurred for speeds that were slightly less than integer multiples of 12 revolutions per second. For my son to have seen the effect occur *three* times, that meant the initial speed of the fidget spinner had to have been **between 36 and 48 revolutions per second**.
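If you'd like to poke at the snapshot argument yourself, here is a minimal Python sketch. The function name `apparent_shift` and its parameters are my own labels, not anything from the original puzzle, and it assumes an idealized spinner whose lobes are indistinguishable from one another.

```python
def apparent_shift(speed_rps, refresh_hz=60, lobes=5):
    """Apparent motion between snapshots, measured in lobe spacings.

    The result lies in [-0.5, 0.5). A negative value means the spinner
    looks like it is turning backward; zero means it looks frozen.
    (Hypothetical helper, assuming identical, evenly spaced lobes.)
    """
    # How many lobe spacings the spinner really advances per snapshot.
    advance = speed_rps * lobes / refresh_hz
    # Lobes look alike, so only the offset from the nearest lobe is visible.
    return (advance + 0.5) % 1.0 - 0.5


for speed in (12.5, 12.0, 11.5):
    print(speed, apparent_shift(speed))
```

Speeds just above a multiple of 12 revolutions per second give a small positive shift, and speeds just below give a small negative one, which is the backward drift described above.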

If you’re still not convinced, here’s an animation of a spinner that starts at 45 clockwise revolutions per second and slows down to zero, with 60 snapshots taken per second — all slowed down for the purposes of visualization, of course. As you can see, at speeds just below 36, 24 and 12 revolutions per second, the lobes appear to spin backward for a brief period of time.

For extra credit, you were presented with some additional information. Upon closer examination, my son saw the spinner go backward *another* three times, but in these cases, it appeared to have twice as many lobes (i.e., 10). Now, how fast could it have been spinning at the beginning?

We already said that the spinner appeared to go backward when the speed was slightly less than integer multiples of 12 revolutions per second. But what happened at half-integer multiples? For example, when the speed was 6 revolutions per second, every *other* snapshot appeared to be in the same position, while the remaining snapshots were halfway between. This made the spinner appear to have 10 lobes rather than five. And as the spinner slowed down past these half-integer multiples, the 10 lobes again appeared to spin backward. If you look closely, you can see these backward spins around the half-integer multiples of 12 revolutions per second in the above animation.

And so the answer to the extra credit was that the spinner’s initial speed was **between 30 and 42 revolutions per second**. The speeds at which there appeared to be 10 backward-moving lobes were just below 30, 18 and 6 revolutions per second.
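Counting the reversals for a given starting speed can be sketched in a few lines of Python. The helper names below are hypothetical, and the sketch assumes the spinner slows smoothly to zero and that a reversal is seen once for every threshold speed it passes on the way down.

```python
REFRESH_HZ = 60
LOBES = 5

FULL_STEP = REFRESH_HZ / LOBES  # 12 rps: the five lobes appear frozen
HALF_STEP = FULL_STEP / 2       # 6 rps: ten lobes seem to appear


def count_thresholds(initial_rps, first, spacing):
    """Count the speeds first, first + spacing, ... strictly below initial_rps."""
    count = 0
    speed = first
    while speed < initial_rps:
        count += 1
        speed += spacing
    return count


def five_lobe_reversals(initial_rps):
    # Thresholds at 12, 24, 36, ... revolutions per second.
    return count_thresholds(initial_rps, FULL_STEP, FULL_STEP)


def ten_lobe_reversals(initial_rps):
    # Thresholds at 6, 18, 30, ... revolutions per second.
    return count_thresholds(initial_rps, HALF_STEP, FULL_STEP)


print(five_lobe_reversals(45), ten_lobe_reversals(35))
```

Trying a few starting speeds shows where each count changes: the five-lobe count rises at every multiple of 12, and the ten-lobe count rises at 6, 18, 30, 42 and so on.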

By the way, if you want to see a real recording of this phenomenon, albeit for a three-lobed fidget spinner, check out this video from reader Danny Sleator.

Congratulations to Sanandan Swaminathan of San Jose, California, winner of last week’s Riddler Classic.

Last week, a thousand people were playing Lotería, also known as Mexican bingo. The game consisted of a deck of 54 cards, each with a unique picture. Each player had a board with 16 of the 54 pictures, arranged in a 4-by-4 grid. The boards were randomly generated, such that each board had 16 distinct pictures that were equally likely to be any of the 54.

During the game, one card from the deck was drawn at a time, and anyone whose board included that card’s picture marked it on their board. A player won by marking four pictures that formed one of four patterns, as exemplified below: any entire row, any entire column, the four corners of the grid and any 2-by-2 square.

After the fourth card had been drawn, there were no winners. What was the probability that there would be exactly one winner when the fifth card was drawn?

Before getting into the specifics of Lotería, suppose the probability that one particular person won on the fifth draw given that they didn’t win in the first four draws was *p*. Then the probability that this particular person would have been the only winner among the thousand was *p*∙(1−*p*)^{999}. That was the same probability for another player being the only winner, and another — and for all thousand players, as a matter of fact. So the probability that anyone was that lone winner was 1,000*∙p*∙(1−*p*)^{999}.

At this point, we still had to determine the value of *p*. Solver Max Candocia did this by first recognizing that there were 54 choose 5 ways to select the first five cards. Of these, 18∙50, or 900, resulted in a victory for a particular board. The 18 came from the fact that there were 18 winning patterns in total (four rows, four columns, one set of corners, and nine 2-by-2 squares), while the 50 came from the fact that the one card *not* involved in the winning pattern could have been any of the remaining 54−4, or 50, cards.

However, 20 percent of the time, these five-card sets resulted in victory after only four cards had been drawn — that is, when that unhelpful card happened to be drawn *fifth* from the deck. That meant *p* was equal to 0.8∙18∙50/(54 choose 5), or about 0.0002277. Plugging this back into the previous expression, the probability of having exactly one winner on the fifth draw (given no winners after four draws) was approximately **18.1 percent**.
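The arithmetic above is easy to reproduce. Here is a short Python sketch that follows the same counting; the variable names are mine, and the only inputs are the figures quoted in the text (54 cards, 18 patterns, 1,000 players).

```python
from math import comb

DECK = 54
PLAYERS = 1000
PATTERNS = 4 + 4 + 1 + 9  # rows, columns, four corners, 2-by-2 squares

five_card_sets = comb(DECK, 5)        # ways to choose the first five cards
winning_sets = PATTERNS * (DECK - 4)  # a full pattern plus one other card

# Only 80 percent of orderings leave the win for the fifth draw.
p = 0.8 * winning_sets / five_card_sets

# Exactly one of the players wins, and the rest do not.
exactly_one = PLAYERS * p * (1 - p) ** (PLAYERS - 1)

print(f"p = {p:.7f}")
print(f"P(exactly one winner) = {exactly_one:.3f}")
```

Changing `DECK` or `PLAYERS` gives a quick feel for the general case, though the pattern count of 18 is specific to a 4-by-4 board.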

Solver Laurent Lessard went a step further, solving the general case for when there were *N* distinct cards in the deck (i.e., not necessarily 54) and *M* people playing (i.e., not necessarily 1,000). Laurent found a quartic relation (because *four* cards, of course!) between *M* and *N* that maximized the likelihood of having a single winner when the fifth card was drawn. This quartic relation is plotted below:

Email Zach Wissner-Gross at riddlercolumn@gmail.com.

*You’re reading **Data Is Plural**, a weekly newsletter of useful/curious datasets. Below you’ll find the **Oct. 26, 2022, edition**, reprinted with permission at FiveThirtyEight.*

*Strategic petroleum, internet service offers, Boston’s first women voters, Euro-area securities and gargantuan gourds.*

**Strategic petroleum.** The U.S. Energy Information Administration maintains a dataset tracking the monthly volume of the country’s Strategic Petroleum Reserve, measured in the thousands of barrels. The figures go back to 1977, the year the first crude oil was delivered to the reserve, but lag by a couple of months; the end-of-August volume is scheduled for publication on October 31. **Read more**: The Department of Energy’s history of reserve releases. **Previously**: Petroleum Supply Monthly reports (DIP 2017.08.16) and weekly gas prices (DIP 2021.06.09), both also published by the EIA. [h/t u/CountBayesie]

**Internet service offers.** For an investigation into speed disparities in internet service offers, published last week at The Markup, reporters Leon Yin and Aaron Sankin examined more than 1 million address-specific offers across dozens of U.S. cities. To support the findings, they’ve shared the raw data gathered from ISPs’ websites, as well as tabular files that summarize each offer and attach the contextual variables used for the analysis. (Disclosure: I served, and am credited, as a “Data Coach” for this project.)

**Boston’s first women voters.** The City of Boston’s Mary Eliza Project has been compiling a dataset of women who registered to vote in 1920, the year the 19th Amendment granted them that right. The dataset, transcribed from the original registration books, “is updated periodically as additional voter registers are transcribed.” It contains 6,000-plus entries so far, each listing a voter’s name, registration date, ward, precinct, address, age, country of birth, occupation, husband’s information and more. [h/t Julie Rosier]

**Euro-area securities.** The European Central Bank collects detailed records concerning the financial instruments issued and held by organizations and individuals under its jurisdiction. Its quarterly-updated Securities Holdings Statistics dataset, available through the ECB’s data warehouse, aggregates the latter by investor type (bank, non-bank company, pension fund, household, etc.), investor country of residence, issuer country, type of financial instrument and more. [h/t Martijn Boermans et al.]

**Gargantuan gourds.** At BigPumpkins.com, you can find annual “weigh-off” results from 100-plus local competitions affiliated with the Great Pumpkin Commonwealth, an international standards-setting organization. Although pumpkins represent the titular attraction, the site also publishes results for the squash, long gourd, watermelon, tomato, field pumpkin, bushel gourd and marrow competition classes. HTML tables list each specimen’s weight, grower, location, weigh-off site and lineage. [h/t Julia Silge + Tidy Tuesday]

My son noticed that when he held a fidget spinner in front of a television (with a 60 Hz refresh rate) and gave it a whirl, it appeared to suddenly spin backwards a few times before coming to a halt. While many fidget spinners have three lobes, this particular spinner had five lobes, as shown below.

After giving it a spin, we clearly saw it spin backwards *three* times before it stopped. How fast could it have been spinning at the beginning?

*Extra credit:* Upon closer examination, we also saw it spin backwards *another* three times, but in these cases it appeared to have twice as many lobes (i.e., 10). Now how fast could it have been spinning at the beginning?

From Roberto Linares comes a puzzle that will have you shouting “Bingo!”:

A thousand people are playing Lotería, also known as Mexican bingo. The game consists of a deck of 54 cards, each with a unique picture. Each player has a board with 16 of the 54 pictures, arranged in a 4-by-4 grid. The boards are randomly generated, such that each board has 16 distinct pictures that are equally likely to be any of the 54.

During the game, one card from the deck is drawn at a time, and anyone whose board includes that card’s picture marks it on their board. A player wins by marking four pictures that form one of four patterns, as exemplified below: any entire row, any entire column, the four corners of the grid and any 2-by-2 square.

After the fourth card has been drawn, there are no winners. What is the probability that there will be exactly one winner when the fifth card is drawn?

Congratulations to Carl Schweppe of Medford, Massachusetts, winner of last week’s Riddler Express.

Last week, I had to cut a rug. After many years of using my favorite rug as a putting green, a narrow section in the middle had to be excised. The original rug was 12 feet long and 9 feet wide, while the middle strip was 8 feet long and 1 foot wide, as shown below.

Upon seeing the state of the rug, my neighbor suggested I cut it into two pieces and sew them back together to form a square rug, 10 feet by 10 feet, with no holes (shown below).

How was this possible?

To plug the hole in the middle, you could use 8 square feet from one of the adjacent columns. But that created a new hole, which could be filled by 6 square feet from the next column over. Then *that* was replaced by 4 square feet and finally 2 square feet. The end result called for two stair-shaped cuts on either side of the hole. Solver Glade Roper recorded a video of the rug’s rearrangement. Way to save that rug, Glade!

There’s not much more to say about this puzzle, other than the appreciation I feel for all the solvers who showed their work using ASCII art. Jenny Mitchell was smart to use square emojis with different colors:

Congratulations to Michael Ringel of Jacksonville, Florida, winner of last week’s Riddler Classic.

Last week I was celebrating the birthday of a family member, which got me wondering about how likely it was for two people in a room to have the same birthday.

Suppose people were walking into a room, one at a time. Their birthdays happened to be randomly distributed throughout the 365 days of the year (and no one was born on a leap day). The moment two people in the room had the same birthday, no more people entered the room and everyone inside celebrated by eating cake, regardless of when that common birthday was.

On average, what was the expected number of people in the room when they ate cake?

This puzzle was a variation of the famed birthday problem, which asks how many people must be in a room for there to be *at least a 50 percent chance* that at least two of them have the same birthday. Here, the answer is paradoxically (to some, at least) small. With just 23 people in the room, there’s a 50.7 percent chance that at least two of them have the same birthday. (Of course, this calculation makes the same assumptions as last week’s puzzle — that there are 365 days in a year, and each day is equally likely to be someone’s birthday.)

But rather than being asked for the number of people such that the probability exceeded 50 percent, you had to find the *expected number* of people such that two of them had the same birthday. While the answer was unlikely to again be exactly 23, it was surely close to 23.

Solver Adam Davitt found an exact expression for the expected number of people. First, the probability of eating cake when there was one person in the room was zero. Easy enough! The probability that *two* people in the room had the same birthday was 1/365 — the first person could have had any birthday, and the second person happened to have the same birthday. To eat cake with three people, the first person could again have any birthday, the second person had to have a different birthday, and the third person had to have either the first or second person’s birthday. The probability of this happening was (365/365) × (364/365) × (2/365).

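In general, the probability of eating cake with *N* people was 365 × 364 × … × (365−*N*+2) × (*N*−1)/365^{*N*}. To find the expected number of people, you multiplied each value of *N* from 2 to 366 by its probability and added up the results, which came to about **24.617** people.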

Solver Laurent Lessard noted that the famed mathematician Ramanujan previously found an excellent approximation for the sum in this problem. The answer turns out to be quite close to √(365𝜋/2) + 2/3, or about 24.611. So you see, this puzzle really was about both cake *and* pi!
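For anyone who wants to check these figures, here is a minimal Python sketch of the exact sum alongside the approximation. The function names are my own, and the sketch makes the same assumptions as the puzzle: 365 equally likely birthdays.

```python
from math import pi, sqrt

DAYS = 365


def stop_probability(n, days=DAYS):
    """Probability that the first shared birthday appears with exactly n people."""
    prob = 1.0
    for k in range(n - 1):        # the first n-1 people all differ
        prob *= (days - k) / days
    return prob * (n - 1) / days  # person n matches one of them


def match_probability(n, days=DAYS):
    """Probability that at least two of n people share a birthday."""
    no_match = 1.0
    for k in range(n):
        no_match *= (days - k) / days
    return 1.0 - no_match


# With 366 people a match is guaranteed, so the sum stops there.
total = sum(stop_probability(n) for n in range(2, DAYS + 2))
expected = sum(n * stop_probability(n) for n in range(2, DAYS + 2))
approximation = sqrt(DAYS * pi / 2) + 2 / 3

print(f"exact expectation: {expected:.3f}")
print(f"approximation:     {approximation:.3f}")
print(f"P(match among 23): {match_probability(23):.3f}")
```

The stopping probabilities sum to 1, which is a handy sanity check that no case has been dropped.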

For extra credit, you assumed everyone ate cake the moment *three* people in the room had the same birthday. On average, what was this expected number of people?

This was closely related to a previous riddle that extended the classic birthday problem to three people. To have at least a 50 percent chance that three people in a room had the same birthday, you needed a total of 88 people in the room. So the answer to the extra credit should have been somewhere around 88. Sure enough, solver Ian Walker found the answer was very close: approximately **88.739** people.
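An exact calculation for three matching birthdays takes more bookkeeping, but a simulation gets you into the right neighborhood. The sketch below is a rough Monte Carlo estimate, not the method Ian used; the function names, trial count and seed are arbitrary choices of mine.

```python
import random


def people_until_shared(k, days=365, rng=random):
    """Admit people one at a time until some birthday is shared by k of them."""
    counts = [0] * days
    people = 0
    while True:
        day = rng.randrange(days)
        counts[day] += 1
        people += 1
        if counts[day] == k:
            return people


def estimate_expected(k, trials=4000, seed=0):
    """Average the stopping count over many simulated rooms."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(people_until_shared(k, rng=rng) for _ in range(trials))
    return total / trials


print(estimate_expected(3))
```

With a few thousand trials the estimate should land within a person or so of the exact answer; more trials tighten it further.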

Email Zach Wissner-Gross at riddlercolumn@gmail.com.

*You’re reading **Data Is Plural**, a weekly newsletter of useful/curious datasets. Below you’ll find the **Oct. 19, 2022, edition**, reprinted with permission at FiveThirtyEight.*

*Carbon pricing, UNICEF’s operations, community-moderated tweets, U.K. museums and cattle brands.*

**Carbon pricing.** In a paper published last month, Geoffroy Dolphin and Qinrui Xiahou describe their World Carbon Pricing Database. For each country (as well as each U.S. state and certain other subnational jurisdictions), the database indicates the price per metric ton of CO_{2} equivalent associated with any carbon taxes and cap-and-trade mechanisms in place, for each year going back to 1990. It lists these prices for each combination of fuel type and sectoral classification. **Previously:** The Voluntary Registry Offsets Database and the World Bank’s database of carbon pricing initiatives (DIP 2021.11.17).

**UNICEF’s operations.** The United Nations Children’s Fund is a signatory to the International Aid Transparency Initiative and, as such, publishes detailed data files describing its programs and activities around the world. The files are organized by country, updated monthly and follow the initiative’s prescribed XML structure. They list each program’s name, organizations involved, locations, dates, budgets, spending, results and more. You can also use UNICEF’s transparency portal to explore the data by program focus and country. [h/t Alexa Ighodaro]

**Community-moderated tweets.** Twitter recently expanded its Birdwatch pilot program, which allows certain users to anonymously “identify Tweets they believe are misleading, write notes that provide context to the Tweet and rate the quality of other contributors’ notes.” The company provides data on all submitted notes, ratings of notes and note status histories, though it requires you to be logged in and U.S.-based to download the files. **As seen in:** “COVID misinfo is the biggest challenge for Twitter’s Birdwatch program, data shows,” from The Verge’s Corin Faife, who has published an interactive, downloadable table of the notes.

**U.K. museums.** The Mapping Museums project has assembled a searchable, browsable and downloadable dataset of 4,000+ museums active in the United Kingdom between 1960 and 2020. It includes museums dedicated to art, war, local history, transport, drinks and many other subjects. The records, collected and refined from a range of sources, indicate each museum’s name, location, size (small, medium, large, huge), topic, years opened and closed, accreditation, type of governance and more.

**Cattle brands.** Kansas ranchers must register their cattle-branding symbols with the state’s agriculture department. For decades, the department published books listing all the registered brands, indexed using a custom coding system. Mason Youngblood et al. have assembled a dataset of 90,000+ such entries from the 1990, 2008, 2014, 2015 and 2016 books. **Related:** “Kansas Moves Cattle Brand Registration to the Cloud” (GovTech). [h/t Felix Riede + Dugald Foster]

From Michael Amspaugh comes a riddle that will pull the rug out from under you:

I have a rug that is 12 feet long and nine feet wide. Unfortunately, after many years of using it as a putting green, a narrow section in the middle had to be excised. That middle strip was eight feet long and one foot wide, as shown below.

Upon seeing the state of the rug, my neighbor (and golf partner) suggested I cut the rug into two pieces and sew them back together so that it formed a square rug, 10 feet by 10 feet, with no holes (shown below).

How was this possible?

Today I happen to be celebrating the birthday of a family member, which got me wondering about how likely it is for two people in a room to have the same birthday.

Suppose people walk into a room, one at a time. Their birthdays happen to be randomly distributed throughout the 365 days of the year (and no one was born on a leap day). The moment two people in the room have the same birthday, no more people enter the room and everyone inside celebrates by eating cake, regardless of whether that common birthday happens to be today.

On average, what is the expected number of people in the room when they eat cake?

*Extra credit:* Suppose everyone eats cake the moment *three* people in the room have the same birthday. On average, what is this expected number of people?

Congratulations to Shawn Mier of Buffalo Grove, Illinois, winner of last week’s Riddler Express.

Last week, you analyzed a digital 12-hour clock that displayed 10 digits: two digits representing the hour (from “00” to “12”), two digits representing the minute, two digits representing the second and four digits representing the year.

When did this clock next use every digit from 0 to 9?

It was helpful to arrange the digits from most significant (i.e., representing the greatest amount of time) to least significant (representing the least amount of time). We can write this as YYYY:HH:MM:SS, with Y representing “year,” H representing “hour,” M representing “minute” and S representing “second.”

At the time of writing, the year was 2022, which meant the first Y had better be a 2. Ideally, the second Y was also a small digit, so let’s suppose it was 0. Because it was a 12-hour clock, the first H had to be 0 or 1, and with the zero already taken it had to be 1. That meant HH was 10 (not allowed since we already used the zero), 11 (not allowed since digits can’t repeat) or 12 (not allowed since we already used the two). Yikes! This meant the second Y *couldn’t* be 0, so our next best option was to have it be 1. In other words, the answer was some time in the 22nd century.

Next, with 2 and 1 claimed by the year, the first H had to be 0, and you could use up some of the higher-valued digits toward the end of the solution. The first digits of M and S both had to be less than six, so HH:MM:SS was optimally 07:48:59. That left two remaining digits: 3 and 6.

In the end, the next time the clock used every digit from 0 to 9 was 2136:07:48:59, or **7:48 a.m. (and 59 seconds) in the year 2136**. As solver Kiera Jones pointed out, the first time this would happen was specifically on the morning of January 1, 2136.

That’s a little over a century from now. But the good news is that you wouldn’t have to wait long for the next time the clock used all 10 digits — that came less than a minute later, at 7:49 a.m. (and 58 seconds). And then again at 8:47 (and 59 seconds). And then again …
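If you'd rather let a computer do the searching, here is a brute-force Python sketch. It assumes the display reads YYYY then HH:MM:SS with hours running from 00 to 12, as in the puzzle, and it ignores the calendar date since the clock doesn't show one. The function names are my own.

```python
from itertools import count


def distinct_times():
    """Every HH:MM:SS reading (hours 00-12) whose six digits all differ."""
    times = []
    for hour in range(13):
        for minute in range(60):
            for second in range(60):
                text = f"{hour:02d}{minute:02d}{second:02d}"
                if len(set(text)) == 6:
                    times.append(text)
    return times  # already sorted from earliest to latest reading


def next_all_digits(start_year=2022):
    """First (year, time) at or after start_year that shows all ten digits."""
    times = distinct_times()
    for year in count(start_year):
        year_digits = set(str(year))  # assumes a four-digit year
        if len(year_digits) != 4:
            continue
        for text in times:
            # Four distinct year digits plus six distinct, non-overlapping
            # time digits means every digit from 0 to 9 is on display.
            if year_digits.isdisjoint(text):
                return year, f"{text[:2]}:{text[2:4]}:{text[4:]}"


print(next_all_digits())
```

Filtering the time readings once up front keeps the search quick, since only readings with six different digits can ever qualify.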

Congratulations to Eric Snyder of Everett, Washington, winner of last week’s Riddler Classic.

Last week, I had 14 pairs of socks in my laundry basket that I needed to pair up. To do this, I used a chair that could fit nine socks, at most. I randomly drew one clean sock at a time from the basket. If its matching counterpart was not already on the chair, then I placed it in one of the nine spots. But if its counterpart *was* already on the chair, then I removed it from the chair (making that spot once again unoccupied) and placed the folded pair in my drawer.

What was the probability I could fold all 14 pairs without ever running out of room on my chair?

Given the computational complexity of this puzzle, it made sense that many solvers decided to simulate thousands or even millions of laundry baskets and see how often all the socks could be folded. Clement Lelievre from Blois, France, ran 100,000 simulations, finding that all socks could be folded very nearly 70 percent of the time.

Multiple solvers were able to find an exact solution, again with the help of a computer. But before getting to that, I wanted to highlight a geometric interpretation of this puzzle. You could think of folding the socks as navigating a grid, as shown below. Initially, there were 14 pairs of socks in the basket. In the end, there were zero pairs of socks in the basket. And between these two end states, there were many paths indicated by the arrows from one state to another. The height of each state represented how many unfolded socks there were on the chair. To solve the puzzle, you had to find the probability of successfully navigating this grid without ever exceeding a height of nine — that is, without entering the red zone at the top of the grid. This calculation remained tricky, since the probabilities of transitioning from one state to the next depended on how many paired and unpaired socks remained in the basket.

Josh Silverman called the number of single socks in the basket (and therefore the number of socks laid out on the chair) *s*, and the number of intact (or “doubled”) pairs in the basket *d*. Then the probability of picking out a single sock from the basket and hence removing a sock from the chair was *s*/(*s*+2*d*), while the probability of pulling out a sock from an intact pair and hence adding a sock to the chair was 2*d*/(*s*+2*d*).

From there, Josh plugged these probabilities into a recursive formula and (with the assistance of a computer) determined that the probability of never having more than nine socks on the chair was precisely 15,627,431/22,309,287, or about **70.049 percent**, a figure that agreed with Clement’s simulations.
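A recursion along these lines can be sketched in Python with exact fractions. This is my own reconstruction from the transition probabilities described above, not Josh's actual code, and the function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def fold_probability(pairs=14, capacity=9):
    """Chance of folding every pair without overflowing the chair."""

    @lru_cache(maxsize=None)
    def success(s, d):
        # s: socks on the chair (their partners are singles in the basket)
        # d: intact pairs still in the basket
        if s == 0 and d == 0:
            return Fraction(1)
        total = s + 2 * d  # socks left in the basket
        prob = Fraction(0)
        if s > 0:
            # Draw a partner of a sock on the chair: one spot frees up.
            prob += Fraction(s, total) * success(s - 1, d)
        if d > 0 and s + 1 <= capacity:
            # Draw from an intact pair: the sock needs a spot on the chair.
            prob += Fraction(2 * d, total) * success(s + 1, d - 1)
        return prob

    return success(0, pairs)


answer = fold_probability()
print(answer, float(answer))
```

Because `pairs` and `capacity` are parameters, the same function can be looped over a grid of values to explore other basket and chair sizes.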

For extra credit, you were asked for a more general solution, where the number of pairs of socks and the capacity of the chair were both variable. Solvers Emily Boyajian and Michael Goldwasser created heat maps that illustrated the probability of being able to fold your socks as a function of these parameters. Here was Emily’s heat map:

When the maximum number of socks on the chair was much greater than half the number of pairs in the basket — or rather, when the chair could hold much more than a quarter of all the socks — you had a very good chance of being able to fold them. But when the chair couldn’t hold nearly a quarter of all the socks, your chances of folding were quite poor. This phase transition, when the chair could hold just about a quarter of all the socks, was mathematically very interesting.

In any case, if you have many pairs of socks, make sure you have a big enough chair.

Email Zach Wissner-Gross at riddlercolumn@gmail.com.

*You’re reading **Data Is Plural**, a weekly newsletter of useful/curious datasets. Below you’ll find the Oct. 12, 2022, edition, reprinted with permission at FiveThirtyEight.*

**Work-related injury counts, U.S. hydrography, rebel leaders, file formats and wine economics.**

**Work-related injury counts.** The U.S. Occupational Safety and Health Administration requires many (but not all) businesses to track employees’ work-related injuries and illnesses. Larger companies and those in high-risk industries must electronically submit annual counts to the agency. Thanks to freedom-of-information lawsuits by Reveal and Public Citizen, OSHA began to publish business-level data from those electronic submissions in 2020. The records, which go back to 2016, include each business’s name, location, industry, employee count and employee hours worked, plus their reported number of deaths, injuries, skin disorders, respiratory conditions, poisonings, hearing loss and other illnesses.

**U.S. hydrography.** The National Hydrography Dataset, maintained by the U.S. Geological Survey, “represents the water drainage network of the United States with features such as rivers, streams, canals, lakes, ponds, coastline, dams, and streamgages.” You can download the NHD geospatial files by hydrologic unit or state, or for the entire nation. **Related**: a dataset of waterfalls and rapids in the contiguous U.S., linked to the NHD and sourced partly from Bryan Swan and Dean Goss’s World Waterfall Database. [h/t Malcolm Tunnell + Christopher Ingraham]

**Rebel leaders.** Benjamin Acosta et al.’s Rebel Organization Leaders Database “provides a wide range of biographical information on all top rebel, insurgent, and terrorist leaders who were active in civil wars between 1980 and 2011.” It includes each leader’s name, gender, education, religion, languages spoken, number of children, years in role, country fought against, cause of death and much more. The database covers 425 individuals fighting against 80-plus countries; the project also features written profiles for a sample of them.

**File formats.** The U.S. National Archives’ Digital Preservation Framework describes the agency’s risk assessments and recommended preservation plans for 600-plus file formats. The framework’s documentation places each format into one of 16 categories, such as “digital audio,” “spreadsheets,” “navigational charts” and “software and code.” In August, the agency added “linked open data” representations of the plans for each format. [h/t Elizabeth England]

**Wine economics.** Researchers at the University of Adelaide’s Wine Economics Research Centre have compiled several longitudinal datasets. One, for example, quantifies the total area devoted to growing each grape variety in each country, 1960-2016. Another compiles various market statistics (e.g., national wine production, imports, exports) going back to 1835. **Related**: The International Organisation of Vine and Wine maintains a database of global and national statistics going back to 1995. **As seen in**: Jack Zhao’s exploration of the Adelaide data.

*Dataset suggestions? Criticism? Praise? Send feedback to jsvine@gmail.com. Looking for past datasets? **This spreadsheet contains them all**. Visit **data-is-plural.com** to subscribe and to browse past editions.*

The next few months are going to be exciting — and busy! — for the FiveThirtyEight Politics Podcast. We’re seeking a freelance audio editor to work two days a week and help make our Politics podcast sound crisp, clear and well-paced.

- Edit two hour-long podcasts per week, on Mondays and Thursdays (and some Wednesdays).

- Fluency in audio-editing software, ideally Pro Tools, Audition or Hindenburg.
- Proficiency in iZotope, with an ability to make remote guests sound crisp and clear.
- Ability to precisely execute content and cosmetic edits.

- Some experience with political journalism.

The job can be done remotely. If this sounds appealing, please apply! Send a resume and three examples of your audio-editing work to podcasts at fivethirtyeight dot com by Oct. 21. Work examples should ideally include roundtable conversations and/or two-way interviews that you have mixed and edited in full.

ABC News and FiveThirtyEight are equal-opportunity employers. Applicants will receive consideration for employment without regard to race, color, religion, sex, age, national origin, sexual orientation, gender identity, disability or protected veteran status.

From Ryan Nelson comes a puzzle related to the digital display of digits (for the second week in a row!):

A digital 12-hour clock displays 10 digits: two digits representing the hour (from “00” to “12”), two digits representing the minute, two digits representing the second and four digits representing the year.

When will the clock next use every digit from 0 to 9?

From Anna Kómár comes a stumper about socks:

In my laundry basket, I have 14 pairs of socks that I need to pair up. To do this, I use a chair that can fit nine socks, at most. I randomly draw one clean sock at a time from the basket. If its matching counterpart is not already on the chair, then I place it in one of the nine spots. But if its counterpart *is* already on the chair, then I remove it from the chair (making that spot once again unoccupied) and place the folded pair in my drawer.

What is the probability I can fold all 14 pairs without ever running out of room on my chair?

*Extra credit:* What if I change the number of pairs of socks I own, as well as the number of socks that can fit on my chair?

Congratulations to Eyal Minsky-Fenick of New Haven, Connecticut, winner of last week’s Riddler Express.

Last week, I was cracking a safe with a two-digit passcode. (It wasn’t a very secure safe.) Both digits were between 0 and 9, and when I typed in each digit, that digit was shown via a standard seven-segment display.^{2} Only the most recent digit I entered showed up on the single-digit display.

By pressing two different digits at the same time, the safe gave me the benefit of the doubt and opened if either ordering of the digits was the correct passcode. For example, if I pressed the 1 and the 2 at the same time, the safe opened if the passcode was either “12” or “21.”

Finally, I noted that the display wasn’t functioning perfectly. Any segment that was part of a number in the passcode appeared to be slightly faded. This fade was visible even when the segment wasn’t lit. And any segment that was part of *both* digits (if there were any such segments) was twice as faded.

Unfortunately, after some mental math, I realized that I still didn’t have enough information to open the safe with confidence.

What were all the possible two-digit passcodes for the safe?

There were 10 possibilities for the first digit and another 10 possibilities for the second digit, so there were a total of 100 possible passcodes. Of these, 90 had a different first and second digit. Because you could press two different digits at the same time, that meant you could treat these 90 codes as just 45 codes. Including the 10 codes with the same first and second digit, that meant there were 55 distinct ways to try cracking open the safe.

Now, many of these 55 ways resulted in a unique pattern of faded segments. For example, consider the following seven-segment display:

The top segment and the two right segments are all very faded, while the middle and bottom segments are slightly faded. There is only one combination of two digits that could have resulted in this “fingerprint of fades”: 3 and 7. If this was what I had seen when trying to crack the safe, the passcode would have been either 37 or 73. In either case, I could have pressed 3 and 7 together to open the safe.

However, not every pair of digits had a unique fingerprint — and that’s precisely what this riddle was all about. Many solvers, like Adam Davitt, worked out all 55 cases by hand (or spreadsheet) and found which two resulted in the same fingerprint. Others, like Andrea Andenna, wrote computer code to check these cases for them. Either way, here was what I must have seen on the safe’s display:

In this case, the top-right and bottom-left segments were slightly faded, while the remaining segments were all very faded. This fingerprint could have been created by 5 and 8 (after all, it kind of looks like a faded 5), but also by 6 and 9. And so, because I couldn’t immediately open the safe by seeing the fingerprint, the passcode must have been **58, 85, 69 or 96**.
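The fingerprint check can be sketched in a few lines of Python. This is a minimal sketch, not any solver's actual code: the segment letters (a = top, b = top-right, c = bottom-right, d = bottom, e = bottom-left, f = top-left, g = middle) and the function names are assumptions for illustration, and the digit shapes are the usual seven-segment ones (6 with a top bar, 9 with a bottom bar, 7 with three segments), which agree with the two fingerprints described above.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

# Assumed labels: a=top, b=top-right, c=bottom-right, d=bottom,
# e=bottom-left, f=top-left, g=middle.
SEGMENTS = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}


def fingerprint(d1, d2):
    """Fade level (0, 1 or 2) of each segment a-g for a pair of digits."""
    return tuple((s in SEGMENTS[d1]) + (s in SEGMENTS[d2]) for s in "abcdefg")


def ambiguous_pairs():
    """Group the 55 unordered digit pairs by fingerprint; keep shared ones."""
    groups = defaultdict(list)
    for pair in combinations_with_replacement(range(10), 2):
        groups[fingerprint(*pair)].append(pair)
    return [pairs for pairs in groups.values() if len(pairs) > 1]


print(ambiguous_pairs())  # [[(5, 8), (6, 9)]]
```

Under these assumptions, only one fingerprint is shared by two different pairs of digits, which matches the four passcodes above.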

I’ll leave it to you to guess which of these was the safe’s actual passcode.

Congratulations to Laurent Lessard of Toronto, Canada, winner of last week’s Riddler Classic.

Last week, you had a pizza to share with three of your friends. Among the four of you, everyone wanted a different amount of pizza. In particular, the ratio of appetites was 1:2:3:4. Therefore, you wanted to make two complete, straight cuts (i.e., chords) across the pizza, resulting in four pieces whose areas had a 1:2:3:4 ratio.

Where should you make the two slices?

First off, to keep the numbers relatively simple, let’s suppose the pizza had an area of exactly 10 square units (since 1+2+3+4 equals 10).

Now, before slicing up the pizza, a good first step was to convince yourself that this was even solvable. One way to do that was to imagine two chords that split the circular pizza’s areas into two different ratios, such as 4:6 (i.e., 1+3:2+4) and 3:7 (i.e., 1+2:3+4). Independently rotating the chords about the center of the circle resulted in regions of different areas. For some orientation of the chords, the region with area 4 overlapped the region with area 3 to form a new region with area 1. At this point, the remainder of the region with area 4 formed a piece with area 3, while the remainder of the region with area 3 formed a piece with area 2. That left an area of 4 for the fourth and final piece. In other words, the ratio of the areas was indeed 1:2:3:4.

By picking different pairs of ratios that summed to 10 — or, equivalently, by having the four areas appear in a different order around the pizza — you could find six different solutions (i.e., three pairs of reflected solutions). Laurent, this week’s winner, illustrated these solutions:

Many solvers, such as James Kilfiger, Jason Weisman and Thomas Stone, used trigonometry to determine specific coordinates and angles for the slices. For example, the solution in the bottom left consisted of a diameter and a second chord that intersected it about 34 percent of the way along the radius, forming an angle of almost 70 degrees.

For extra credit, you had to split the pizza among an even greater number of friends, while still (1) having half as many cuts as people and (2) having all the cuts pass through a single point. With *six* friends sharing the pizza, it turned out that there was no way to achieve a perfect 1:2:3:4:5:6 ratio among the areas of the pieces. And with *eight* friends, a perfect 1:2:3:4:5:6:7:8 ratio was similarly impossible.

Instead of exact solutions, Laurent found cuts that minimized the sum of the square differences between the areas and the desired areas. Below are his optimal slices for six and eight people:

Email Zach Wissner-Gross at riddlercolumn@gmail.com.

*You’re reading **Data Is Plural**, a weekly newsletter of useful/curious datasets. Below you’ll find the Oct. 5, 2022, edition, reprinted with permission at FiveThirtyEight.*

**Grid emissions, chain and indie restaurants, wildfire smoke pollution, federal audits and a decade of tasks.**

**Grid emissions.** Ember, an “energy think tank that uses data-driven insights to shift the world from coal to clean electricity,” has begun compiling annual and monthly statistics on electricity demand, generation and estimated greenhouse gas emissions by country, standardized from national and international sources. The annual estimates span two decades and 200-plus countries and territories; the monthly dataset provides somewhat less coverage. Both can also be explored online. **Related**: Singularity’s Open Grid Emissions initiative estimates the hourly grid emissions of balancing authorities and power plants in the U.S., currently for 2019 and 2020. **Previously**: Other energy-related datasets. [h/t Philippe Quirion]

**Chain and indie restaurants.** Xiaofan Liang and Clio Andris of Georgia Tech’s Friendly Cities Lab have published a map and dataset examining the “chainness” of 700,000-plus U.S. restaurants. Starting with records provided by a marketing-data company, the researchers standardized the restaurants’ names, counted their frequencies and classified them as chains (those with more than five outlets) or not. The dataset also lists each restaurant’s cuisine and location. **As seen in**: Andrew Van Dam’s exploration of the data for his new-ish Washington Post column, Department of Data.

**Wildfire smoke pollution.** Marissa L. Childs et al. have developed a “machine learning model of daily wildfire-driven PM2.5 concentrations using a combination of ground, satellite, and reanalysis data sources that are easy to update.” (PM2.5 refers to particulate matter 2.5 micrometers in diameter or smaller.) The researchers then used that model to generate daily smoke PM2.5 estimates for each county, Census tract and 10-kilometer-grid tile in the contiguous U.S., for 2006-2020. **Read more**: Coverage and maps in the New York Times. [h/t George LeVines]

**Federal audits.** Nonprofits, state/local governments and other noncommercial entities expending $750,000-plus of federal funds in a year are required to undergo a standardized audit of their financials and compliance. The U.S. Federal Audit Clearinghouse maintains a public database of those audits; it offers bulk downloads of the report data (about the auditee, auditor, findings and more), as well as a tool to search and access individual reports. [h/t Big Local News]

**A decade of tasks.** Between April 2009 and February 2019, software engineer Renzo Borgatti set 17,000-plus daily tasks for himself. He completed slightly less than half of them. He labeled them with tags such as “@meeting”, “@talk” and “@clojure.” He estimated how many “pomodoros” each would take, and how many they really did. We know this because Borgatti allowed Derek M. Jones to publish a partially redacted dataset of his tracked tasks. **Previously**: One software company’s task estimates (DIP 2019.04.24), also published by Jones.

*Dataset suggestions? Criticism? Praise? Send feedback to jsvine@gmail.com. Looking for past datasets? **This spreadsheet contains them all**. Visit **data-is-plural.com** to subscribe and to browse past editions.*


I’m cracking a safe with a two-digit passcode. (It’s not a very secure safe.) Both digits are between 0 and 9, and when I type in each digit, that digit is shown via a standard seven-segment display.^{2} Only the most recent digit I enter shows up on the single-digit display.

By pressing two different digits at the same time, the safe will give me the benefit of the doubt and open if either ordering of the digits is the correct passcode. For example, if I press the 1 and the 2 at the same time, the safe will open if the passcode is either “12” or “21.”

Finally, I note that the display isn’t functioning perfectly. Any segment that is part of a number in the passcode appears to be slightly faded. This fade is visible even when the segment isn’t lit. And any segment that is part of *both* digits (if there are any such segments) is twice as faded.

Unfortunately, after some mental math, I realize that I still don’t have enough information to open the safe with confidence.

What are all the possible two-digit passcodes for the safe?

From Dean Ballard comes a matter of asymmetrical pizza:

Dean made a pizza to share with his three friends. Among the four of them, they each wanted a different amount of pizza. In particular, the ratio of their appetites was 1:2:3:4. Therefore, Dean wants to make two complete, straight cuts (i.e., chords) across the pizza, resulting in four pieces whose areas have a 1:2:3:4 ratio.

Where should Dean make the two slices?

*Extra credit:* Suppose Dean splits the pizza with more friends. If six people are sharing the pizza and Dean cuts along three chords that intersect at a single point, how close to a 1:2:3:4:5:6 ratio among the areas can he achieve? What if there are eight people sharing the pizza?

Congratulations to Silvio of Ravenna, Italy, winner of last week’s Riddler Express.

Last week, the Riddler Shirt Store was selling *N* kinds of shirts, each kind with a picture of a different famous mathematician. Unfortunately, on average, 80 percent of orders were returned.

That was because the company’s website had customers order their shirts using a code (from 1 to *N*), but it did not state which code corresponded to which shirt. Each customer knew which mathematician — and therefore which shirt — they wanted.

But to get that desired shirt, they had to enter a random shirt code and order the corresponding shirt without knowing which mathematician they’d get. If that shirt depicted the wrong mathematician, they randomly selected a different (untested) code, and repeated this process until the desired shirt arrived.

How many different shirts did the store sell?

The first time you placed an order, the probability it was the correct shirt was 1/*N*. That meant there was a 1/*N* chance of not needing any returns. But there was a (*N*−1)/*N* chance it was the wrong shirt, in which case you had a 1/(*N*−1) chance of guessing correctly among the remaining shirts. That meant there was a (*N*−1)/*N* · 1/(*N*−1), or a 1/*N*, chance of needing exactly one return. In other words, the probability of needing no returns was equal to the probability of needing exactly one return.

Another way to see this was to list out the order in which you intended to test the different codes. Solver Mary E. Morley noted that the correct code was equally likely to be anywhere in this list, which meant any number of returns, from zero to *N*−1, was equally likely.

So, what was the *average* number of returns for each shirt that was correctly ordered? That was 0/*N* + 1/*N* + 2/*N* + 3/*N* + … + (*N*−1)/*N*, which simplified to (*N*−1)·*N*/(2*N*), or (*N*−1)/2. In the problem, this ratio was 80 percent to 20 percent, meaning there were 4 (i.e., 80 divided by 20) returns for every shirt that was correctly ordered. Setting (*N*−1)/2 equal to 4 and solving revealed that *N* was **9**.

In general, if the Riddler Shirt Store sold *N* shirts, the expected return rate was (*N*−1)/2 (the number of returns per correctly ordered shirt) divided by (*N*−1)/2 + 1. This simplified to (*N*−1)/(*N*+1), a surprisingly (at least to me) concise expression.
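As a quick check of that arithmetic, here is a small Python sketch using exact fractions. The function names are made up for illustration; the only thing taken from the puzzle is that every number of returns from zero to *N*−1 is equally likely.

```python
from fractions import Fraction


def expected_returns(n):
    """Average returns per delivered shirt when 0..n-1 returns are equally likely."""
    return sum(Fraction(k, n) for k in range(n))


def return_rate(n):
    """Fraction of all orders that end up being returned."""
    r = expected_returns(n)
    return r / (r + 1)


print(expected_returns(9), return_rate(9))  # 4 4/5
```

With nine shirts, four of every five orders come back, which is the 80 percent return rate from the puzzle.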

Congratulations to Ruben te Wierik of Utrecht, the Netherlands, winner of last week’s Riddler Classic.

Graydon, the submitter of last week’s puzzle, was about to depart on a boating expedition seeking footage of the rare aquatic creature, *F. Riddlerius*. Every day he was away, he sent a handwritten letter to his best friend, David. But if Graydon still had not spotted the creature after *N* days (where *N* was some very, very large number), he returned home.

Knowing the value of *N*, Graydon confided to David there was only a 50 percent chance of the expedition ending in success before the *N* days had passed. But as soon as any footage was collected, he would immediately return home (after sending a letter that day, of course).

On average, for what fraction of the *N* days should David have expected to receive a letter?

This puzzle was admittedly ambiguous, but I found that the most interesting mathematics occurred when you assumed that Graydon’s probability of spotting the creature — which we can call *p* — was the same each day, and that each day was independent of the next. That meant the probability of *not* finding the creature on a given day was (1−*p*). Since Graydon only had a 50 percent chance of finding the creature after *N* days, you knew that (1−*p*)^{*N*} = 1/2.

Taking the natural logarithm of both sides of the previous equation, solver Shankar Sivarajan found *N*·ln(1−*p*) = −ln(2). Why would Shankar do such a thing? Because at this point you can use the information that *N* is “very, very large” to your advantage. When *N* is very large, *p* is very small, which means that ln(1−*p*) is approximately equal to −*p*. Therefore, *Np* was approximately ln(2).

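That was useful information to have, but you still had to calculate the expected number of letters. At this point, let’s define *q* = 1−*p*, the probability Graydon *didn’t* see the creature on a given day. The probability Graydon sent exactly one letter was 1−*q*. The probability he sent exactly two letters was *q*(1−*q*), or *q*−*q*^{2}. The probability he sent exactly three letters was *q*^{2}(1−*q*), or *q*^{2}−*q*^{3}, and so on. Weighting each day on which he might have spotted the creature by the number of letters sent gave the sum 1(1−*q*) + 2(*q*−*q*^{2}) + 3(*q*^{2}−*q*^{3}) + … + *N*(*q*^{*N*−1}−*q*^{*N*}), which telescoped to 1 + *q* + *q*^{2} + … + *q*^{*N*−1} − *Nq*^{*N*}.

The last term in that sum, −*Nq*^{*N*}, could be rewritten as −*N*/2, since *q*^{*N*} = 1/2. But it was exactly canceled by the *N* letters Graydon sent when he never spotted the creature, which happened with probability *q*^{*N*} and so contributed +*N*/2. What remained was the geometric series 1 + *q* + *q*^{2} + … + *q*^{*N*−1}, which summed to (1−*q*^{*N*})/(1−*q*), or (1/2)/*p*. The expected number of letters was therefore 1/(2*p*).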

Finally, the puzzle asked “for what fraction of the *N* days” David could expect to receive a letter. That meant you had to divide the expected number of letters, 1/(2*p*), by the number of days, *N*, giving you 1/(2*Np*). And here was where you could apply the fact that *Np* approached ln(2) in the limit of large *N*. In the end, David could expect to receive letters on **1/(2ln(2))** of the days, or on about 72.135 percent of the days.
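For anyone who wants to check that limit numerically, here is a rough Python sketch. It assumes the same model as above (a constant daily probability *p* chosen so that (1−*p*)^{*N*} = 1/2), and the function name is hypothetical. It adds up the expected number of letters day by day rather than relying on the closed form.

```python
import math


def expected_letter_fraction(n):
    """Expected fraction of the n days on which a letter is sent."""
    p = -math.expm1(-math.log(2) / n)  # solves (1 - p)**n == 1/2
    q = 1 - p
    # Exactly k letters (k < n): first sighting happens on day k.
    expected = sum(k * q ** (k - 1) * p for k in range(1, n))
    # Exactly n letters: sighting on the last day, or no sighting at all.
    expected += n * q ** (n - 1)
    return expected / n


print(expected_letter_fraction(10_000))
print(1 / (2 * math.log(2)))
```

For *N* = 10,000, the two printed values should agree to about four decimal places, and the gap shrinks as *N* grows.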

Throughout this analysis, we kept asking *when* Graydon would spot the *F. Riddlerius*. But we never stopped to ask *how* the *F. Riddlerius* might feel about all this. (It was very pleased to have remained hidden half the time.)

Email Zach Wissner-Gross at riddlercolumn@gmail.com.

*You’re reading **Data Is Plural**, a weekly newsletter of useful/curious datasets. Below you’ll find the Sept. 28, 2022, edition, reprinted with permission at FiveThirtyEight.*

**FDA inspections, academic citations, Old and New Testament locations, university endowments and tech products promoted.**

**FDA inspections.** The U.S. Food and Drug Administration’s inspections dashboard lists 264,000-plus assessments of facilities (primarily those manufacturing food, drugs and other FDA-regulated products) and 227,000-plus problems the inspectors found. The fields include the facility owner, location, product type, inspection completion date and outcomes. The records, which go back to fiscal year 2009, can be bulk-downloaded from the dashboard and queried via an API. They come with certain caveats; they exclude, for instance, “inspections waiting for a final enforcement action” and those conducted by state (rather than federal) inspectors. **Related:** More compliance-related data dashboards from the FDA.

**Academic citations.** Since 2017, the Initiative for Open Citations has urged academic publishers to share their papers’ reference lists as open data. Last month, the group announced it had hit a major milestone: Of the 61 million papers that have references and are indexed by DOI-registrar Crossref, “100 percent […] have made their citations openly available.” You can access the data through Crossref’s API and in bulk through OpenCitations. **Read more:** “Citation data are now open, but that’s far from enough” (Nature). **Previously:** Wikipedia citations (DIP 2018.05.23), biomedical citations (DIP 2019.10.23) and legal citations (DIP 2020.07.15). [h/t Data Science Community Newsletter]

**Old and New Testament locations.** OpenBible.info’s Bible Geocoding project “(1) comprehensively identifies the possible modern locations of every place mentioned in the Bible as precisely as possible, (2) expresses a data-backed confidence level in each identification and (3) links to open data to fit into a broader data ecosystem.” You can browse by book, chapter and location, as well as download the full dataset. **Read more:** The project’s author explains the backstory and methodology. [h/t Avi Levin]

**University endowments.** Earlier this month, the National Association of College and University Business Officers released the latest of its annual studies of college and university endowments in the U.S. and Canada. For 700-plus institutions, the study’s public tables indicate their total enrollment, endowment market value, previous year’s value and more. A page of historical datasets includes a spreadsheet listing many endowments’ sizes going back to the mid-1970s. [h/t Factle]

**Tech products promoted.** For “The Gamer and the Nihilist,” an essay in Components, Andrew Thompson and collaborators created a dataset of nearly 77,000 tech products on Product Hunt, a popular social network for launching and promoting such things. The dataset includes the name, description, launch date, upvote count and other details for every product from 2014 to 2021 in the platform’s sitemap. (“Based on experience, not every product that appears on Product Hunt seems to appear on the sitemap,” the authors caution.)

*Dataset suggestions? Criticism? Praise? Send feedback to jsvine@gmail.com. Looking for past datasets? **This spreadsheet contains them all**. Visit **data-is-plural.com** to subscribe and to browse past editions.*


From Irwin Altrows comes a problem about a problematic business model:

The Riddler Shirt Store sells *N* kinds of shirts, each kind with a picture of a different famous mathematician. Unfortunately, on average, 80 percent of orders are returned.

That’s because the company’s website has customers order their shirts using a code (from 1 to *N*), but does not state which code corresponds to which shirt. Each customer knows which mathematician — and therefore which shirt — they want.

But to get that desired shirt, they enter a random shirt code and order the corresponding shirt without knowing which mathematician they’ll get. If that shirt depicts the wrong mathematician, they randomly select a different (untested) code, and repeat this process until the desired shirt arrives.

How many different shirts does the store sell?

From Graydon Snider comes a pen pal puzzle:

Graydon is about to depart on a boating expedition that seeks to catch footage of the rare aquatic creature, *F. Riddlerius*. Every day he is away, he will send a hand-written letter to his new best friend, David Hacker.^{2} But if Graydon still has not spotted the creature after *N* days (where *N* is some very, very large number), he will return home.

Knowing the value of *N*, Graydon confides to David there is only a 50 percent chance of the expedition ending in success before the *N* days have passed. But as soon as any footage is collected, he will immediately return home (after sending a letter that day, of course).

On average, for what fraction of the *N* days should David expect to receive a letter?

Congratulations to Peter Norvig of Palo Alto, California, winner of last week’s Riddler Express.

Last week, you were driving from Riddler City to Puzzletown, which are separated by 1,500 miles of highway. On a full charge, your electric car was able to drive 500 miles before it needed a recharge. Fortunately, the highway had charging stations every 250 miles, with the first in Riddler City and the last in Puzzletown.

Once you committed to charging your car at a station, you had to wait until it was fully charged. For the purposes of this riddle, you were asked to assume that charging occurred at a constant rate. Also, you assumed that you could comfortably roll into a charging station just as your car ran out of power and recharge it there.

You began your journey with a full charge. Before heading out, you wanted to come up with an itinerary for which charging stations you’d stop at along the way. Being impatient, you also wanted to arrive in Puzzletown as quickly as possible, meaning you wanted to minimize the time spent waiting at charging stations.

How many distinct itineraries were possible?

Between Riddler City and Puzzletown, there were five charging stations (i.e., one less than 1,500 divided by 250). If your goal was to make as few stops as possible, then you should have stopped at the second and fourth charging stations at mile markers 500 and 1,000 respectively. But if you didn’t worry about time spent decelerating into charging stations (which wasn’t explicitly mentioned in the puzzle), other itineraries were possible.

As long as you didn’t skip two charging stations in a row, you’d successfully make it to Puzzletown. But to be as efficient as possible, solver Madeline Argent noted that you *didn’t* want to stop at the last charging station. If you had, then you would have arrived at Puzzletown still able to travel another 250 miles, meaning you wasted time charging your car more than you needed somewhere along the way.

So how many distinct itineraries were there that skipped the last charging station but never skipped two stations in a row? Because you skipped the last charging station, you *had* to charge at the penultimate station. That left eight cases to consider: to charge or not to charge at each of the first three stations (at mile markers 250, 500 and 750). Among these eight cases, three left you in need of tow service: charging at 250 miles but skipping 500 and 750, charging at 750 but skipping 250 and 500, and skipping all three stations.

The remaining **five** itineraries allowed you to reach Puzzletown with minimal charging time. So come on down to Puzzletown!
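The case analysis can also be brute-forced. The sketch below is an illustration with made-up names, not any solver's code. It assumes, as the puzzle does, that charging happens at a constant rate, so time spent waiting is proportional to the miles of charge added. It tries all 32 stop patterns for the five intermediate stations and keeps the ones that reach Puzzletown with the least charging.

```python
from itertools import product

STATIONS = [250, 500, 750, 1000, 1250]  # intermediate stations (mile markers)
RANGE, GAP = 500, 250                   # miles per full charge, miles between stations


def charge_needed(stops):
    """Total miles of charge added for a stop pattern, or None if the car dies."""
    battery, added = RANGE, 0
    for stop in stops:
        battery -= GAP              # drive to the next station
        if battery < 0:
            return None
        if stop:                    # committing to a charge means filling up
            added += RANGE - battery
            battery = RANGE
    battery -= GAP                  # final leg into Puzzletown
    return added if battery >= 0 else None


def fastest_itineraries():
    """All stop patterns that finish the trip with the least charging."""
    feasible = {}
    for stops in product([False, True], repeat=len(STATIONS)):
        cost = charge_needed(stops)
        if cost is not None:
            feasible[stops] = cost
    best = min(feasible.values())
    return [
        tuple(mile for mile, stop in zip(STATIONS, stops) if stop)
        for stops, cost in feasible.items()
        if cost == best
    ]


for itinerary in fastest_itineraries():
    print(itinerary)
```

Every itinerary it prints ends with a stop at mile marker 1,000 and skips the station at 1,250, in line with Madeline's observation.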

Congratulations to Stefano Perfetti, winner of last week’s Riddler Classic.

Last week you tried your hand at Anigrams, a game created by Friend-of-The-Riddler Adam Wagner. In the game, you had to unscramble successively larger, nested collections of letters to create a valid “chain” of six English words between four and nine letters in length.

For example, a chain of five words (sadly, fewer than the six needed for a valid game of Anigrams) could be constructed using the following sequence, with each term after the first including one more letter than the previous term:

- DEIR (which unscrambles to make the words DIRE, IRED or RIDE)
- DEIRD (DRIED or REDID)
- DEIRDL (DIRLED, DREIDL or RIDDLE)
- DEIRDLR (RIDDLER)
- DEIRDLRS (RIDDLERS)

What was the longest chain of such nested anagrams you could create, starting with four letters?

All valid words had to come from Peter Norvig’s word list (a list we’ve used previously here at The Riddler). And coincidentally, this is the very same Peter who won this past week’s Riddler Express.

Given the incredibly vast space of words and letters, most solvers turned to their computers for some automated assistance. Josh Silverman first developed an algorithm that checked which words could be reached in Anigrams. After listing all the four-letter words, Josh looked at all the five-letter words and noted the ones for which removing a letter resulted in an anagram of a four-letter word. Then, he looked at all the six-letter words and noted the ones for which removing a letter resulted in an anagram of a valid *five*-letter word, and so on. By keeping track of each word’s precursor words, Josh was able to identify the longest chains.
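Here is a compact Python sketch of that layer-by-layer idea. It is not Josh's actual code: the function name and the tiny word list (taken from the RIDDLERS example above) are stand-ins, and a real run would load Peter Norvig's word list instead. Each word is reduced to its sorted letters, so anagrams share a key, and a longer key counts as reachable only if dropping one letter gives a reachable shorter key.

```python
def longest_chain(words, start_len=4):
    """Length of the longest chain of nested anagrams starting at start_len letters."""
    # Anagrams share the same sorted-letter key.
    keys = {"".join(sorted(w.upper())) for w in words}
    best = {}  # key -> length of the longest chain ending at that key
    for key in sorted(keys, key=len):
        if len(key) == start_len:
            best[key] = 1
            continue
        # Drop each letter in turn and look for a reachable shorter anagram.
        shorter = (key[:i] + key[i + 1:] for i in range(len(key)))
        previous = [best[s] for s in shorter if s in best]
        if previous:
            best[key] = 1 + max(previous)
    return max(best.values(), default=0)


SAMPLE = ["DIRE", "IRED", "RIDE", "DRIED", "REDID",
          "DIRLED", "DREIDL", "RIDDLE", "RIDDLER", "RIDDLERS"]
print(longest_chain(SAMPLE))  # 5
```

Recording which shorter key produced each maximum, rather than just the length, would recover the chains themselves.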

For example, here was the chain Stefano found. For clarity, I have underlined the new letter added with each subsequent word.

- INTO
- NITON
- NITONS
- INTONES
- MENTIONS
- NOMINATES
- ANTINOMIES
- INSEMINATOR (!)
- TERMINATIONS
- ANTIMODERNIST
- DETERMINATIONS
- INTERMEDIATIONS
- INDETERMINATIONS

It turned out there were quite a few chains of this length, but they all ended with either INDETERMINATIONS or UNDERESTIMATIONS, both of which were 16 letters long. With one word for each length from four through 16 letters, the longest chains all had **13 words**.

For extra credit, you had to determine the total number of Anigrams games there were — that is, how many valid sets there were of four initial letters, and then five more letters added one at a time in an ordered sequence, that resulted in a sequence of valid anagrams. Swapping the order of the first four letters was not to be considered as a distinct game.

Again, your computer was your friend here. Solver Robin Christopher Yu wrote some C++ code that counted up all the paths from each set of four letters (that could be arranged to make a valid word) to each reachable set of nine letters (that could similarly be arranged to make a valid word). In the end, the total number of paths was **4,510,515**. With one new game per day, it would take the actual Anigrams game more than 12,000 years to play every variant. And by then, I’m sure there would be a whole bunch of new words added to Peter’s word list!

Email Zach Wissner-Gross at riddlercolumn@gmail.com.

*You’re reading **Data Is Plural**, a weekly newsletter of useful/curious datasets. Below you’ll find the Sept. 21, 2022, edition, reprinted with permission at FiveThirtyEight.*

**Labor turnover, biodiversity trends, probabilistic predictions, working artists and Atari emails.**

**Labor turnover.** The monthly Job Openings and Labor Turnover Survey from the U.S. Bureau of Labor Statistics estimates the number of jobs that people quit, how many people were fired or laid off, the number of new hires and the current number of open positions. Those estimates, based on data gathered from a sample of businesses across the country, are available to download and query by state, industry and business size. They include most types of workers, regardless of whether they’re full-time or part-time, permanent or seasonal, salaried or hourly.

**Biodiversity trends.** Maria Dornelas et al.’s BioTIME project has collected and standardized data from hundreds of studies examining ecological communities over time. You can browse and search the studies by year, taxon, species and biome. You can also download the full dataset, which provides information about each study (biome, start/end years, number of species tallied and much more) and each sample collected (date, location, species, abundance and biomass). **As seen in:** “Economic Production and Biodiversity in the United States,” by Yuanning Liang et al.

**Probabilistic predictions.** Metaculus is a forecasting platform whose community has registered more than one million predictions on questions such as “Will a major nuclear power plant in Germany be operational on June 1, 2023?” The website’s API provides data on questions posed, user rankings and other aspects of the platform. For each question, you can see its phrasing, date posed, creator, prediction type, the distribution of predictions and more. **Related:** Zoltar, a forecast archive assembled by Nicholas G. Reich et al. **Previously:** FiveThirtyEight’s assessment of its own predictions (DIP 2019.04.10).

**Working artists.** The National Endowment for the Arts regularly produces statistical profiles of the arts in the United States. The latest, “Artists in the Workforce: National and State Estimates for 2015-2019,” is tabulated from the Census Bureau’s American Community Survey. It provides employment and earning estimates by artistic occupation and demographic. Additional tabulations, including for the country’s 25 largest metro areas, are available through the National Archive of Data on Arts and Culture. [h/t Gary Price]

**Atari emails.** A couple of decades ago, Jed Margolin posted a cache of electronic mail messages from his time as a video game hardware engineer at Atari (and Atari Games, a successor company). In 2017, with Margolin’s permission, Vikram Oberoi scraped the 4,000-plus emails and built atariemailarchive.org, which groups the messages into threads, categories and a list of favorites. The project also includes a database file containing each message’s sender, recipients, timestamp, subject, body and Oberoi’s thread grouping. **Related**: “How I made atariemailarchive.org.”

*Dataset suggestions? Criticism? Praise? Send feedback to jsvine@gmail.com. Looking for past datasets? **This spreadsheet contains them all**. Visit **data-is-plural.com** to subscribe and to browse past editions.*


You are driving from Riddler City to Puzzletown, which are separated by 1,500 miles of highway. On a full charge, your electric car can drive for 500 miles before it needs a recharge. Fortunately, the highway has charging stations every 250 miles, with the first in Riddler City and the last in Puzzletown.

Once you commit to charging your car at a station, you must wait until it is fully charged. For the purposes of this riddle, put aside anything you know about electronics and assume that the charging occurs at a constant rate. Also, assume that you can comfortably roll into a charging station just as your car runs out of power and recharge it there.

You begin your journey with a full charge. Before heading out, you want to come up with an itinerary for which charging stations you’ll stop at along the way. Being impatient, you also want to arrive in Puzzletown as quickly as possible, meaning you want to minimize the time spent waiting at charging stations.

How many distinct itineraries are possible?

From Michael Branicky comes a word puzzle that is the G.O.A.T.:

If you like Wordle, then you might also enjoy Anigrams, a game created by Friend-of-the-Riddler Adam Wagner.

In the game of Anigrams, you unscramble successively larger, nested collections of letters to create a valid “chain” of six English words between four and nine letters in length.

For example, a chain of five words (sadly, fewer than the six needed for a valid game of Anigrams) can be constructed using the following sequence, with each term after the first including one more letter than the previous term:

- DEIR (which unscrambles to make the words DIRE, IRED or RIDE)
- DEIRD (DRIED or REDID)
- DEIRDL (DIRLED, DREIDL or RIDDLE)
- DEIRDLR (RIDDLER)
- DEIRDLRS (RIDDLERS)

What is the longest chain of such nested anagrams you can create, starting with four letters?

For specificity, all valid words must come from Peter Norvig’s word list (a list we’ve used previously here at The Riddler).

*Extra credit:* How many possible games of Anigrams are there? That is, how many valid sets are there of four initial letters, and then five more letters added one at a time in an ordered sequence, that result in a sequence of valid anagrams? (Note: Swapping the order of the first four letters does not result in a distinct game.)

Congratulations to Derek Kaplan of Hollidaysburg, Pennsylvania, winner of last week’s Riddler Express.

In this year’s Battle for Riddler Nation, you had to assign 100 phalanxes of soldiers to 10 castles, each worth a distinct number of points. For example, you could assign all 100 phalanxes to a single castle (and none to the others), split them evenly so that there were 10 phalanxes at every castle or arrange them in some other way.

What was the total possible number of strategies you could have submitted?

This is a classic problem in combinatorics. To figure out the total number of strategies, solver Adrian Rodriguez imagined lining up the 100 phalanxes in a row. Somewhere in that very same row, you also have nine (i.e., one less than 10) dividers. Suppose that all the phalanxes before the first divider are assigned to Castle 1, all the phalanxes between the first and second dividers are assigned to Castle 2 and so on. Finally, all the phalanxes after the ninth divider are assigned to Castle 10. Note that this makes it possible for some castles to have no corresponding phalanxes if two dividers are consecutive or if a divider comes before or after all the phalanxes.

The key insight here is that every ordering of phalanxes and dividers corresponds to exactly one strategy for assigning them to castles, and vice versa. So how many ways were there to order 100 phalanxes and nine dividers? Since the phalanxes were indistinguishable from each other, as were the dividers, the number of orderings was 109 choose 9. In other words, among the 109 items (phalanxes or dividers) in the row, you had to choose the nine positions where the dividers went. (Alternatively, you could have chosen the 100 positions where the phalanxes went, which gave you an equivalent result since 109 choose 9 and 109 choose 100 were equal.)

In the end, 109 choose 9 was equal to 109!/(100!·9!), or **4,263,421,511,271**. In a battle of these more than four trillion strategies, I wonder which strategy would come out on top.
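The count can be checked with a few lines of Python. This is a minimal sketch, and the helper names `count_strategies` and `brute_force` are illustrative. The first applies the dividers argument directly, and the second enumerates every allocation for a small case to confirm that the formula agrees.

```python
from itertools import product
from math import comb

def count_strategies(phalanxes, castles):
    """Count allocations by choosing where the (castles - 1) dividers go."""
    return comb(phalanxes + castles - 1, castles - 1)

def brute_force(phalanxes, castles):
    """Enumerate every allocation directly; only feasible for small inputs."""
    return sum(
        1
        for alloc in product(range(phalanxes + 1), repeat=castles)
        if sum(alloc) == phalanxes
    )

print(count_strategies(100, 10))  # 4263421511271
print(count_strategies(4, 3), brute_force(4, 3))  # 15 15
```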

Congratulations to Izumihara Ryoma of Toyooka, Japan, winner of last week’s Riddler Classic.

Last week, an image of cookies on social media made me wonder how many overlapping circles were needed to fill up a rectangular tray.

More specifically, suppose you had a tray that was a unit square (i.e., with side length 1). Now, if you had four identical circles (i.e., cookies) that could overlap, they needed a radius of at least 0.25·√2 to completely cover the square, as shown below:

For last week’s puzzle, instead of four identical circles, you had *five*. What was the minimum radius they needed to completely cover the unit square?

It might have been tempting to maintain the symmetry of the four-circle arrangement shown above by inserting a fifth circle in the middle. But in that case, each of the four outer circles still would have had to pass through the midpoints of two of the square’s sides, meaning they couldn’t get any smaller. Therefore, the arrangement had to be a little less symmetric.

Solver Tom Keith gained some intuition about the optimal arrangement with computer assistance, placing small circles at five random points and letting them grow at the same pace, adjusting their positions to increase the total area of the square covered with each step. In the end, Tom’s circles each had a radius of about **0.326**:

Meanwhile, solvers Mike Strong and Paige Kester found a trove of solutions for different numbers of circles, courtesy of Erich Friedman. The optimal covering of a square by five circles was proved by Aladár Heppes and Hans Melissen in 1997. The solution turned out to be slightly less than 1/3.065, or (again) about 0.326.

As shown by solver Mark Girard, the minimum radius that resulted in a completely covered square was the solution to a sixth-order polynomial equation: 64*x*^{6} − 144*x*^{5} + 209*x*^{4} − 196*x*^{3} + 154*x*^{2} − 92*x* + 21 = 0.

For extra credit, you had *six* identical circles that could overlap. The tempting answer again involved (too much) symmetry: splitting the square into a grid of two by three congruent rectangles, and then covering each of those with its own circle. This resulted in a radius of (√13)/12, or about 0.3004. But by tilting the relative orientation of the circles, it was possible to do even better. Indeed, Heppes and Melissen found a radius less than 0.2988 back in 1997. Mark also found the work of Kari Nurmela and Patric Östergård in 2000, which included the following illustration for five and six circles, highlighting radii to points of intersection for each:

This six-circle covering had a radius of approximately 0.2987.
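The radii of the symmetric grid arrangements mentioned above (two by two for four circles, two by three for six) are easy to verify. In this sketch, `grid_cover_radius` is an illustrative helper, not something from the solvers' work. It assumes each circle is centered on its own rectangle, so the radius is half that rectangle's diagonal.

```python
from math import hypot, sqrt

def grid_cover_radius(rows, cols):
    """Radius needed when a unit square is split into rows x cols congruent
    rectangles and each rectangle is covered by a circle centered on it."""
    # A centered circle must reach the rectangle's corners: half the diagonal.
    return hypot(1 / rows, 1 / cols) / 2

four = grid_cover_radius(2, 2)  # equals 0.25 * sqrt(2)
six = grid_cover_radius(2, 3)   # equals sqrt(13) / 12

print(round(four, 4), round(0.25 * sqrt(2), 4))
print(round(six, 5), round(sqrt(13) / 12, 5))
```

The symmetric six-circle value is slightly larger than the tilted arrangement's radius of less than 0.2988, which is why giving up the symmetry helps.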

Anyway, all this computation has only served to fuel my appetite … for cookies!

Email Zach Wissner-Gross at riddlercolumn@gmail.com.

*You’re reading **Data Is Plural**, a weekly newsletter of useful/curious datasets. Below you’ll find an updated version of the **Sept. 14, 2022, edition**, reprinted with permission at FiveThirtyEight.*

*An official Congress API, voter-ID laws, local internet speeds, the Getty Provenance Index and space weather.*

**Congress gets an official API.** The U.S. legislative branch now has an official API, the Library of Congress announced last week. It provides structured data on legislators, bills, bill summaries, amendments, committee reports, appointee nominations, international treaties and more. To use the service, you’ll need to sign up for a free API key. **Read more**: Some context from the Congressional Data Coalition. **Related**: The Government Publishing Office’s bulk data on bills and bill summaries (DIP 2020.08.26) and ProPublica’s Congress API. [h/t Jackie Kazil]

**Voter-ID laws.** As part of his Ph.D. research, Tom Barton has constructed a dataset of voter-identification requirements around the world. For each country and U.S. state, the dataset indicates the type of identification needed to vote (a photo ID, a non-photo ID or just basic personal details), how many forms of identification the voter must provide and more. **Related**: A detailed examination of U.S. voter-ID laws from the National Conference of State Legislatures. **Previously**: Recent U.S. voting legislation tracked by the Voting Rights Lab (DIP 2022.07.20).

**Local internet speeds.** If you’ve tried measuring your internet speed, you might have used Ookla’s testing service. The company releases a series of quarterly datasets that summarize the test results, aggregated into geospatial tiles. (The tile sizes vary by latitude but measure roughly 2,000 feet by 2,000 feet at the equator.) For each tile, the datasets list the number of tests performed and devices tested, plus the average download speed, upload speed and latency. **As seen in**: “Europe’s internet speeds are faster than ever, but not for everyone” (European Data Journalism Network) and associated dashboard. **Previously**: Internet speeds from the Measurement Lab (DIP 2019.05.22). [h/t Federico Caruso]

**Art sales and ownership.** The Getty Provenance Index contains more than 2.3 million historical records relating to the sale and ownership of artworks, searchable online by artist, owner, auction house, date and other characteristics. As part of an ongoing project to remodel the index, the Getty Research Institute has also published a subset of the records as open data. The largest of those datasets describes more than 1 million sales in Europe from the 1600s to 1945. [h/t Richard E. Spear]

**Space weather.** The National Oceanic and Atmospheric Administration’s Space Weather Prediction Center keeps tabs on a variety of phenomena, such as coronal mass ejections, geomagnetic storms and solar flares. The agency’s reports and datasets include three-day forecasts, sunspot predictions, solar wind observations and more. The agency also provides dashboards for several audiences, such as the aviation community, emergency management and “space weather enthusiasts.” [h/t oblib]

*This spreadsheet contains them all. Visit **data-is-plural.com** to subscribe and to browse past editions.*