
Lesswrong

Title: Lesswrong

Message frequency:  14.87 / day

Message History

For those who are trying to bring about a glorious transhuman utopia with the help of hopefully-aligned ASI, I think it's worth thinking explicitly about what utopia might actually look like and where it's likely to fall short.

To that end, some have helpfully written depictions of utopian (or utopia-adjacent) worlds:


Read full story

When people ask what Fundamental Uncertainty is about, I usually say it’s a book about epistemology. If they want to know more, I say it’s a book arguing that truth is grounded not in observation or more truth, but in usefulness, and because what’s useful depends on what we ...


Read full story

TLDR: Frontier models can detect when they're being evaluated and change their behavior, which risks compromising safety benchmarks. We introduce LURE (Live-Usage Replay Evaluations), a method that constructs alignment evals by replaying realistic conversations and appending a safety-relevant test at the end, rather than building evaluation scenarios from scratch, as other evals ...

Read full story

AI Safety veteran Holden Karnofsky thinks there’s a 49% chance his actions are making things worse.[1]

In 2025, Jesse Clifton even step...


Read full story

This is a linkpost of a recording of a recent MATS research talk where I argue that the automation of AI research — which OpenAI and Anthropic say is imminent — could lead to an unrecoverable alignment failure. Three properties make it especially dangerous: oversight breaks down at scale, capabilities self-amplify, and capabilities will be sped up asymmetrically faster than a...


Read full story