AI Can Now Unmask Anonymous Internet Users, New Study Finds

• https://www.zerohedge.com, by Tyler Durden

The finding comes from a new study by Simon Lermen (MATS), Daniel Paleka (ETH Zurich), Joshua Swanson (ETH Zurich), Michael Aerni (ETH Zurich), Nicholas Carlini (Anthropic), and Florian Tramèr (ETH Zurich), published on arXiv.

In the paper, "Large-Scale Online Deanonymization with LLMs," the researchers show that modern large language models (LLMs) can re-identify people behind pseudonymous online accounts at a scale and accuracy that far surpass previous techniques.

The core contribution is an automated deanonymization pipeline powered by LLMs. Instead of relying on structured datasets or hand-engineered features, as earlier attacks on the Netflix Prize dataset did, the system works directly on raw, unstructured text.

Given posts, comments, or interview transcripts written under a pseudonym, the pipeline extracts identity-relevant signals, searches for likely matches using semantic embeddings, and then uses higher-level reasoning to verify the most promising candidates while filtering out false positives. The result is a scalable attack that mirrors—and in some cases exceeds—the effectiveness of a dedicated human investigator.
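The retrieve-then-verify flow described above can be illustrated with a toy sketch. This is not the researchers' code: the bag-of-words `embed` function is only a stand-in for a real semantic embedding model, the `verify` thresholds are a hypothetical substitute for LLM reasoning, and all profile names and texts are invented for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a semantic embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_candidates(query_text, candidates, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    q = embed(query_text)
    scored = [(name, cosine(q, embed(text))) for name, text in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def verify(ranked, min_score=0.3, min_margin=0.1):
    """Hypothetical verification step: accept the top candidate only if it
    scores high enough and is clearly ahead of the runner-up; else abstain."""
    if not ranked:
        return None
    best_name, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best_score >= min_score and best_score - runner_up >= min_margin:
        return best_name
    return None

# Invented example data: one pseudonymous text and three candidate profiles.
pseudonymous_posts = "I maintain a rust compiler plugin and live in zurich"
candidates = {
    "profile_a": "Software engineer in Zurich working on Rust compiler tooling",
    "profile_b": "Pastry chef in Lyon, writes about bread",
    "profile_c": "Java developer in Berlin",
}
ranked = rank_candidates(pseudonymous_posts, candidates)
match = verify(ranked)  # None means the sketch abstains rather than guess
```

Abstaining when no candidate stands out is what keeps false positives down in this sketch; a real system would presumably make that decision with far richer evidence than word overlap.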

To evaluate their approach, the researchers constructed three datasets with known ground truth. The first links pseudonymous Hacker News users to real-world LinkedIn profiles, relying on cross-platform clues embedded in public text. The second matches users across movie discussion communities on Reddit. The third takes a single Reddit user's history, splits it into two time-separated profiles, and tests whether the system can reconnect them.
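As a rough illustration of the third dataset, a single posting history can be cut into an early and a late profile with a gap in between. The article does not give the exact splitting procedure, so the `split_history` helper, the cutoff date, the one-year gap default, and the sample posts below are all assumptions.

```python
from datetime import datetime, timedelta

def split_history(posts, cutoff, gap=timedelta(days=365)):
    """Split (timestamp, text) posts into an early and a late profile.

    Posts before `cutoff` form the early profile, posts at or after
    `cutoff + gap` form the late profile, and posts inside the gap
    are dropped so the two profiles are separated in time.
    """
    early = [text for ts, text in posts if ts < cutoff]
    late = [text for ts, text in posts if ts >= cutoff + gap]
    return early, late

# Invented sample history for one user.
posts = [
    (datetime(2021, 3, 1), "rewatched Alien last night"),
    (datetime(2021, 9, 15), "thoughts on the new Dune trailer"),
    (datetime(2022, 6, 1), "this post falls inside the gap"),
    (datetime(2023, 2, 10), "still think Alien is the best of the series"),
]
early_profile, late_profile = split_history(posts, cutoff=datetime(2022, 1, 1))
```

Because both halves come from the same account, the ground truth is known by construction, which is what makes this kind of split usable as a test.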

Across all three settings, LLM-based methods dramatically outperformed classical baselines, which often achieved near-zero recall.

The headline numbers are striking. In some experiments, the system achieved up to 68% recall at 90% precision, meaning it correctly identified up to 68% of targets while roughly nine in ten of the matches it reported were correct. Even when matching temporally split Reddit accounts separated by a year, performance remained strong. In contrast, traditional non-LLM approaches struggled to produce meaningful matches. The findings suggest that advances in reasoning and representation learning have transformed deanonymization from a niche, data-hungry attack into a broadly applicable capability.
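For readers unfamiliar with the metric, "recall at a given precision" can be computed as in the sketch below. It assumes each reported match carries a confidence score and a correctness label, and that the system may abstain on some targets. The function name and sample numbers are illustrative, not taken from the paper.

```python
def recall_at_precision(predictions, total_targets, min_precision=0.9):
    """Best recall over confidence cutoffs whose precision meets the bar.

    predictions: list of (confidence, is_correct) for reported matches.
    total_targets: number of targets, including those with no reported match.
    Ties in confidence are not treated specially in this sketch.
    """
    ordered = sorted(predictions, key=lambda p: p[0], reverse=True)
    best_recall = 0.0
    correct = 0
    for accepted, (_, is_correct) in enumerate(ordered, start=1):
        correct += int(is_correct)
        precision = correct / accepted       # share of accepted matches that are right
        if precision >= min_precision:
            best_recall = max(best_recall, correct / total_targets)
    return best_recall

# Invented example: 5 targets, 4 reported matches, one of them wrong.
sample = [(0.9, True), (0.8, True), (0.7, False), (0.6, True)]
strict = recall_at_precision(sample, total_targets=5, min_precision=0.9)
loose = recall_at_precision(sample, total_targets=5, min_precision=0.75)
```

With the strict bar, only the two most confident matches can be accepted, so recall is 2 of 5. Relaxing the bar to 75% precision lets all four through and raises recall to 3 of 5, which shows the trade-off the metric captures.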

A key concern, the study says, is that the attack pipeline is composed of individually benign steps: summarizing text, generating embeddings, ranking candidates, and reasoning over matches. No single component appears inherently malicious, making the pipeline difficult to detect or restrict through conventional safeguards. Moreover, the study finds that increasing model reasoning effort improves deanonymization performance, implying that as frontier models become more capable, the attack may become even more effective by default.

The broader implication is that "practical obscurity"—the idea that scattered, pseudonymous posts are safe because linking them is too labor-intensive—may no longer hold.

