Skip Navigation

Technology @beehaw.org

beep @piefed.world

2w ago

Anthropic: Claude learned to blackmail by reading stories about evil AI—"We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation."

alignment.anthropic.com

Teaching Claude Why

cross-posted from: https://piefed.world/c/tech/p/1099863/anthropic-claude-learned-to-blackmail-by-reading-stories-about-evil-ai-we-believe-the-or

Quote Source.

Comments

10

Load comments

Comments

10