When evaluating options, there is usually uncertainty in our estimates of outcomes. One way this uncertainty manifests is the winner’s curse. How does this work?
Well, imagine we do the obvious thing. We research tens or hundreds of charitable interventions, estimate the expected value of each, and then pick the intervention with the largest estimate. What could go wrong? A nasty combination of selection and uncertainty. By picking the apparent best charity, we are likely to pick one whose effectiveness is heavily overestimated. That’s fine, right? We just temper our expectations? Not quite. Our interventions are likely to have significantly different levels of uncertainty, so we are likely to pick a highly uncertain estimate, which in turn is likely to be highly overestimated.
Consider a world with 100 interventions that are identically good. We can estimate 50 of them with great accuracy and 50 with very limited accuracy. If we pick the largest of these estimates, we are almost certain to pick one of the highly uncertain ones. And even if the inaccurately estimated interventions were a bit less effective, we would still be very likely to pick one of them. This illustrates how selection effects like the winner’s curse push us to penalize highly uncertain estimates.
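To make this concrete, here is a minimal simulation of that world: 100 interventions with the same true value, 50 estimated with low noise and 50 with high noise. The specific standard deviations are arbitrary choices for illustration, not estimates of anything real.

```python
# Simulate the selection effect: 100 interventions with identical true value,
# 50 estimated with low noise (sd 0.1) and 50 with high noise (sd 1.0).
import random

random.seed(0)
TRUE_VALUE = 1.0
trials = 10_000
picked_noisy = 0
overestimates = []

for _ in range(trials):
    accurate = [random.gauss(TRUE_VALUE, 0.1) for _ in range(50)]
    noisy = [random.gauss(TRUE_VALUE, 1.0) for _ in range(50)]
    estimates = accurate + noisy
    best = max(range(100), key=lambda i: estimates[i])
    if best >= 50:  # indices 50..99 are the high-noise estimates
        picked_noisy += 1
    overestimates.append(estimates[best] - TRUE_VALUE)

print(f"picked a high-noise intervention: {picked_noisy / trials:.1%}")
print(f"average overestimate of the 'winner': {sum(overestimates) / trials:.2f}")
```

Running this, the “winner” is a high-noise estimate essentially every time, and its estimated value overshoots the true value by a wide margin, even though every intervention is equally good.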
Now for an illustration of where we might look favorably on uncertain estimates: the more incremental world of bandit algorithms. A bandit is a lever we can pull that gives us some reward. Often we consider a multi-armed bandit with multiple options: we have multiple levers to pull, each of which gives us an uncertain reward drawn from some distribution. We need to balance exploiting the lever we think is best against exploring the other, less certain levers. In our setting, this corresponds to making a series of donations or interventions over time. In contrast to the single-donation case, we want to look at under-investigated interventions that may have a worse expected value, in the hope that after more trials one of them turns out to be better than our current best intervention. Here we will go for an uncertain intervention even if it has a worse expected value, to gain information; before, we were cautious about a highly uncertain intervention even if it had a better expected value. Bandit-algorithm effects push us to seek out highly uncertain estimates.
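As a sketch of this dynamic, here is the classic UCB1 algorithm on a toy three-armed bandit. The effectiveness numbers are made up; the point is that the bonus term explicitly rewards arms that have been pulled rarely, i.e., the high-uncertainty ones.

```python
# UCB1 on a toy three-armed bandit: uncertainty is treated as a bonus,
# not a penalty. Reward distributions are hypothetical placeholders.
import math
import random

random.seed(0)
true_means = [0.5, 0.6, 0.55]   # hypothetical effectiveness of three interventions
counts = [0] * len(true_means)  # pulls per arm
totals = [0.0] * len(true_means)

def pull(arm):
    # Stand-in for making a donation and observing a noisy outcome.
    return random.gauss(true_means[arm], 0.3)

for t in range(1, 2001):
    if 0 in counts:
        arm = counts.index(0)  # try every arm at least once
    else:
        # UCB1 score: empirical mean plus an uncertainty bonus that is
        # larger for arms we have pulled less often.
        arm = max(range(len(true_means)),
                  key=lambda a: totals[a] / counts[a]
                  + math.sqrt(2 * math.log(t) / counts[a]))
    totals[arm] += pull(arm)
    counts[arm] += 1

print("pulls per arm:", counts)  # pulls concentrate on the best arm over time
```

Early on, the algorithm deliberately spends pulls on arms it knows little about; only once the uncertainty bonuses shrink does it commit to the arm with the best empirical mean.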
Now, how do we reconcile these? Uncertainty comes from different places, and some or much of it can be reduced by trial. A highly uncertain anti-poverty measure will likely become much clearer after enough RCTs and funding. In such cases I think we can largely lean towards the bandit-oriented view. There are caveats: the world is changing, so interventions may change in effectiveness over time, and the reward distribution is certainly not fixed. But by and large, when we do more of an intervention, we get more information about its effectiveness.
I think my great fear about longtermism and less certain interventions is that we largely don’t get as much information about effectiveness over time. Maybe this is just a feeling; as we do more research into AI and existential risk we will reduce the uncertainty, but I’m pessimistic that we will hit a wall and not be able to reduce the uncertainty further.
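One way to picture this worry: suppose an estimate’s uncertainty has a sampling component that shrinks roughly as 1/sqrt(n) with more trials, plus an irreducible component that trials never touch. This decomposition and the numbers below are my own illustrative assumptions, not established figures.

```python
# Sketch: total uncertainty = sampling noise (shrinks with trials) plus an
# irreducible floor (does not). All numbers are illustrative assumptions.
import math

def estimate_uncertainty(n_trials, sampling_sd, irreducible_sd):
    # Standard error of the estimate after n_trials independent studies.
    return math.sqrt((sampling_sd ** 2) / n_trials + irreducible_sd ** 2)

for n in (1, 10, 100, 1000):
    rct_friendly = estimate_uncertainty(n, sampling_sd=1.0, irreducible_sd=0.0)
    hard_to_test = estimate_uncertainty(n, sampling_sd=1.0, irreducible_sd=0.5)
    print(f"n={n:4d}  testable intervention: {rct_friendly:.3f}  "
          f"hard-to-test intervention: {hard_to_test:.3f}")
```

For the testable intervention the uncertainty keeps falling with more trials, so the bandit logic applies. For the hard-to-test one it stalls at the floor, and the winner’s-curse penalty never goes away.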