On growth: there was a brilliant little exchange between Sam Bowman and Sam Freedman on Twitter there recently (here and elsewhere: https://twitter.com/s8mb/status/1623708501481586691?t=EEG7ix0Dm23rXRz1jV78tw&s=19), where Bowman advocated for quite a libertarian childcare policy (roughly just 'hand cash to parents') and Freedman pointed out that it would be better for economic growth to instead invest that cash in the childcare sector. I'm not sure to what degree Freedman actually endorsed this policy vs to what degree it was a gotcha, but it is at least a successful gotcha: I think Bowman is entirely right on the first-order policy merits, but Freedman is right that many of the benefits of 'hand parents money' would not show up in growth statistics! (I also think Freedman is completely right that certain intellectual interests of the current young British "pro-growth" right are motivated by unspoken values, and their political hobbyhorses look a little odd if you accept their face-value claim that they primarily care about growth - to give another interesting example, a huge number of them are very anti-immigration, a crazy position to take if you genuinely really do care about GDP growth.)

On Aella: I think in some regards Aella's followers are sufficient weirdos to *reverse* certain trends in the general population, and sometimes when I see an Aella poll it makes me update a teeny tiny infinitesimal amount in the *opposite* direction to her conclusion. (The reasoning being something like, 'this thing isn't talked about publicly; thus, on priors either this thing is relatively widespread but shamed, or is rare and concentrated in a small subset of weirdos; thus, evidence that it's widespread among weirdos is ipso facto weak evidence that it's not prevalent among normies'.) This is entirely compatible with the "motte" of the Aella-defenders, that you should make small Bayesian updates based on her polls - if two people's likelihoods differ, then the strength and direction of their ideal Bayesian updates can differ to arbitrary degrees - but is incompatible with their "bailey", which is that she is rightly treated as a "sex researcher" and her Twitter polls are basically analogous to academic research. This latter claim is rooted in a deep rationalist scepticism towards the norms and institutions of science, which is as old as rationalism itself (see Eliezer Yudkowsky's discussion of "science v Bayes": https://www.lesswrong.com/posts/viPPjojmChxLGPE2v/the-dilemma-science-or-bayes). This is the subtext to, say, Zvi's entire post on twitter polls (https://thezvi.substack.com/p/twitter-polls-evidence-is-evidence) - even as he's careful to almost never stray from the motte in his explicit claims, because in fact Aella-esque Twitter polls say exactly fuck all about how sceptical we should be towards academic research.


> a huge number of them are very anti-immigration, a crazy position to take if you genuinely really do care about GDP growth.

Are they *really* anti-immigration - or just anti- a certain type of immigration? Are they against immigration by Americans, for example?

If you genuinely really do care about GDP growth, you ought to care about the productivity of the types of immigrants that you are importing. And some immigrants really do have negative productivity, in the sense that they take more from the state than they ever produce (e.g. "get arrested for committing crimes, get imprisoned and have your meals paid for by the state, then get deported"). This is not the type of immigration we should want if we care about growth.


> But when she does sex surveys, we do have reasons to think that the data is probably really weird in a load of ways. Aella is a sex worker with weirdo followers (I don’t mean weirdo in a derogatory sense here). So, the data is likely to be weird in all sorts of ways. It’s better than no data, sure, but I don’t think we should assume that the correlations are going to hold among the general public at all.

It's not like this is a hidden confounder though. You can update your beliefs according to the survey data, adjusted for the fact that it likely skews kinky.

If Aella did the clown survey and found that 9/10 prefer pie, maybe I'd shift to a 95% chance that I'm in the clowns-love-pie world, because that would be an unexpected result given her demographic. If her results showed that 9/10 prefer anal sex, maybe I'd shift to a 55% chance that I'm in the clowns-love-anal world, because that result would be more or less what I'd expect from her surveys.
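
This is just Bayes' rule for a binary hypothesis. A minimal sketch, where every likelihood is an invented number for illustration (the 95% and 55% figures in the comment are gut estimates, not outputs of this exact model):

```python
def posterior(prior: float, p_data_if_true: float, p_data_if_false: float) -> float:
    """P(hypothesis | data) via Bayes' rule for a binary hypothesis."""
    num = prior * p_data_if_true
    return num / (num + (1 - prior) * p_data_if_false)

# "9/10 prefer pie" would be surprising from this audience unless clowns
# really do love pie, so the likelihood ratio is large and the update is big:
print(posterior(0.5, 0.60, 0.03))   # ~0.95

# "9/10 prefer anal" is roughly what this audience would report either way,
# so the likelihood ratio is near 1 and the update is tiny:
print(posterior(0.5, 0.55, 0.45))   # ~0.55
```

The whole dispute below is, in effect, about how close the two likelihoods are to each other for any real Aella poll.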


How do you do this? It seems like you're assuming there's some nice numerical algorithm for adjusting for this confounding. In academic studies, there are indeed a few ways to adjust numerically (albeit none of them all that great); but in the case of Aella polls this is all really noisy, and we actually don't know exactly how 'weird' Aella's audience is or what all of the possible confounders are, so how do you know how much to adjust? The correct update from an Aella poll could be large and positive, small and positive, or even (as I mention in my other comments) negative; while in some very simple cases, like the clown one, we can take a good guess at how we should update, I think you're massively oversimplifying the difficulty of quantifying any of these effects in the absence of large, representative, accurate data. (And if we had large, representative, accurate data, we wouldn't need Aella!) It's all just subjective best guesses. And yeah, sure, the standard Bayesian point is that this isn't a criticism when *all we have* are subjective best guesses; but it does mean that we should update only a very very small amount on Aella polls, given that we live in the actual world and not stylised clown world.


> How do you do this? It seems like you're assuming there's some nice numerical algorithm for adjusting for this confounding.

Not at all, in the real world I don't think you need to convert the noise into actual numbers. The numbers just help with clarity in the contrived example.

> but it does mean that we should update only a very very small amount on Aella polls

I'm not sure we're disagreeing on anything then? I agree with everything you've said, including this. Unless we have different definitions for very very small.

That is unless Aella's poll returned an unexpected result. Aella's polls (depending on the poll) might be more usefully interpreted as providing a loose upper bound on a question. Like, let's say that I believe that around 95% of people are into S&M. Then, hypothetically, Aella does a poll that finds that 5% of people are into S&M. Given that we can be nearly certain that Aella's responders are kinkier than average, it would be extremely surprising if the real number was much higher than 5%. So I should adjust my estimation way down, even if I don't think the poll was very representative.

On the other hand, if my previous estimation was that 1% of people were into S&M then I probably shouldn't adjust my estimation much at all, because it's entirely plausible that Aella's audience is 5x more into S&M than average. The poll hasn't really taught me all that much in this case.
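
The "loose upper bound" reading can be made slightly more explicit. A sketch, where the skew factor is an assumption you have to supply (the whole point being that we only ever have a rough lower bound on how much kinkier the audience is than average):

```python
def implied_upper_bound(poll_share: float, min_skew_factor: float) -> float:
    """If respondents are at least `min_skew_factor` times as likely as the
    general population to report the kink, the population share can be at
    most poll_share / min_skew_factor."""
    return poll_share / min_skew_factor

# Poll: 5% of respondents are into S&M. Assuming the audience is at least
# as kinky as average (factor 1), the population rate is at most 5%, so a
# prior of 95% has to fall dramatically:
print(implied_upper_bound(0.05, 1.0))  # 0.05

# But if the audience could plausibly be 5x kinkier than average, the bound
# is 1%, which doesn't rule out a prior of 1% at all:
print(implied_upper_bound(0.05, 5.0))  # 0.01
```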

If you're able to find another source that you know is biased in the opposite direction then you might even be able to approximate an upper and lower bound, thus narrowing in on the true answer using multiple biased sources.

This debate tends to go back and forth with "it's useful data" or "it's not useful data", and it's not clear to me whether there is any empirical disagreement, or if people just have different thresholds for what they call "useful data". To ground this in some kind of reality: if someone used data of this quality to justify some kind of medical policy, I would say that's bad. If someone was planning to start a porn production company and wanted to know which niches to focus on, this would be quite good data to use. And if someone wanted to know how weird they are for having a certain fetish this would be a decent source of data, so long as one treated it as a loose upper bound instead of taking it at face value.


I'm not sure if this is the case with Peter above, but one objection I've seen to the attempt at explicit quantification is really an objection to false precision. When some people say "update by 5%", they don't necessarily _mean it_ in an exact sense; they just mean "small update". It could really be 4 or 6, or whatever. We are talking about orders of magnitude and rough estimates. The utility of putting out a specific number is that it forces a concrete claim rather than a bunch of squishy qualifiers and caveats in more verbose prose. But for some people it really sticks in the craw that the number is in some sense just arbitrary in its specificity. It raises hackles: how do you know you shouldn't update by 4.5%? Yet that is beside the point, and people end up talking past each other.


Hmm, but I'm not sure I've ever seen 95%-level surprising data from Aella? I've seen some data points that are maybe 10-20 points higher or lower than I'd expect them to be, but (as you say) that's entirely to be expected with selection effects + random noise; if we're speaking numerically, the Bayes factor is barely above 1. I'm hardly an Aella scholar, I don't follow everything she does, but this seems to be the trend.

Like yes, if Aella found something truly shocking, like that 95% of people are into S&M, it would be an important update. But we have good reasons to expect she won't find anything like this, both because (a) on priors it seems deeply unlikely that she'll get anything like that and (b) she's been doing this for a while and hasn't found anything.

And likewise, following on, her data would only be useful for bounding our estimates if it was substantially different from existing assumptions. But when all of Aella's data is well within the bounds given by common sense intuition (after adjusting for confounding and selection bias), then she's not adding anything. In other words: before I look at her data I already have enough knowledge to make a decent guess at upper and lower bounds for 'percent of the population that's into S&M', and unless Aella's data is substantially different from these prior estimates - which it isn't - she's not adding anything of value.

So on your final point: 'good data' is obviously a partly normative concept, so I don't mind that much of the disagreement here is seemingly normative. If the only use-cases for Aella data are 'alternative to search data for porn companies' and 'way for people anxious about their fetishes to avoid self-reflection', then yeah, I don't think that's good data - especially if there's any inductive risk involved, which I think there is. She herself seems to have loftier goals in mind than those. But even ignoring this, it seems to me that the strategy of just going with one's prior, or using other existing sources of data (e.g., existing data on porn searches, academic surveys, even the Kinsey reports) isn't really at a disadvantage compared to using Aella's data *even in the use-cases you cite*. That's perhaps the empirical disagreement.


On the Aella thing, I'm pretty sure all her 'headline results' are taken from one large kink survey that went viral, and has around 480,000 respondents. It seems reasonable to draw confident conclusions from something like that.

Feb 19, 2023 · edited Feb 19, 2023 · Liked by Sam Atis

Idk, a large but unrepresentative sample can in some cases be even worse than a small unrepresentative sample! Here's the reasoning I discussed in my comment below, made a little bit more explicit:

Say there's some sexual fetish that is not often discussed publicly or shown in movies etc. Suppose there are two possible explanations: a) it's relatively widespread and evenly distributed, but socially degraded; b) it's concentrated among a relatively small community of otherwise kinky bastards.

If Aella does a small poll that shows quite a lot of people have this fetish, this is relatively inconclusive evidence - it could go either way. But if she does a large poll with lots of respondents, *almost all of whom are kinky bastards*, that shows that a lot of people have this fetish and that it's pretty evenly distributed among respondents - then this is strong evidence for b), showing that a lot of kinky bastards have the fetish. And since b) and a) are competing explanations, this means it's very strong evidence *against* a). A poll with the headline result "this kink is widespread!" should actually make us conclude that it's *not* widespread.

Obviously this is highly stylised, irl there would be more possible hypotheses and the difference between them would be a matter of degree rather than kind, and the effects would all be way smaller. But the general reasoning holds: if some subset of the population is anti-correlated with the population as a whole to even a small degree, then if you have evidence that P is true drawn entirely from the subpopulation, this should make you more convinced of not-P. And the better quality your studies, the *more* that you should look at evidence saying P and conclude not-P. I think there are some anti-correlations between the weirdos who might follow Aella and the wider population; they're incredibly weak and shouldn't push us too far in any direction, but they do exist.
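
To make the stylised version concrete: with the two hypotheses above and a poll drawn entirely from the weirdo subpopulation, a bigger sample pushes *harder* against the "widespread" hypothesis. All the rates below are invented; only the direction of the effect matters.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k successes in n trials at success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def p_widespread(k: int, n: int, prior_a: float = 0.5,
                 rate_if_widespread: float = 0.20,
                 rate_if_concentrated: float = 0.40) -> float:
    """P(hypothesis a: 'widespread and evenly distributed' | k of n weirdo
    respondents report the fetish). Under hypothesis b the fetish is assumed
    to be concentrated among weirdos at the higher rate."""
    la = binom_pmf(k, n, rate_if_widespread)
    lb = binom_pmf(k, n, rate_if_concentrated)
    return prior_a * la / (prior_a * la + (1 - prior_a) * lb)

# A small poll showing a high rate is fairly inconclusive...
print(p_widespread(4, 10))      # ~0.26
# ...but a large poll with the same proportion is near-decisive evidence
# *against* "widespread": the headline "this kink is everywhere!" flips.
print(p_widespread(400, 1000))  # effectively zero
```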

EDIT: I think this is what's going on in the example Sam linked: during the Great Depression, rich people had political preferences that were actively anti-correlated with those of the population as a whole; so seemingly-strong evidence for a Republican victory, drawn mostly from a sample of rich people, should actually make us think a Democrat victory is more likely.


That makes sense. I'm a layman when it comes to this stuff, if it isn't already clear. I suppose that effect could be detected by other questions in the survey?


This is one reason replication is important, even with different sample sizes.


You could try to bound it, but it would be hard to get it exactly? Because you don't typically know ahead of time what the relevant correlations are, at least not exactly. You can make a guess at it but really it's all very noisy.
