
Anti-scrape poisoning and opting out for your voice

By The Poisoning.ai team

Two independent levers keep a model from training on your voice, and each defends against a different opponent. You can cloak what a scraper takes, so a clone learns the wrong voice, and you can opt out, so a scraper that respects rules skips you. A cloak adds nothing against a crawler that already honors your opt-out, and an opt-out does nothing against a crawler that ignores it. Real protection therefore uses both; treating either one as enough leaves half the threat uncovered.

Lever A: cloak what a scraper takes

Active protection embeds an imperceptible, engineered perturbation that corrupts what a model would learn from your recording. The best-documented voice cloak is AntiFake from Yu, Zhai and Zhang (ACM CCS 2023), described by its authors as “a defense mechanism that relies on adversarial examples to prevent unauthorized speech synthesis.” The paper reports “over 95% protection rate even to unknown black-box models” on its own benchmark. DeFake is the same team’s public product name for it, so the two are one tool rather than two competitors, and under that name it was among the winners of the US Federal Trade Commission’s Voice Cloning Challenge in 2024.
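To make “imperceptible perturbation” concrete, here is a minimal sketch of the budget idea only. It is not AntiFake's algorithm: a real cloak optimizes the perturbation against speech models, whereas this toy uses random noise as a stand-in. The per-sample bound `EPSILON`, the test tone, and the function names are all assumptions chosen for illustration.

```python
import math
import random

EPSILON = 0.002      # assumed per-sample amplitude budget (illustrative value)
SAMPLE_RATE = 16000  # assumed sample rate in Hz


def clip_perturbation(delta, epsilon):
    """Clamp every perturbation sample into [-epsilon, +epsilon]."""
    return [max(-epsilon, min(epsilon, d)) for d in delta]


def snr_db(clean, perturbed):
    """Signal-to-noise ratio in dB between a clean signal and its perturbed copy."""
    signal = sum(x * x for x in clean)
    noise = sum((p - x) ** 2 for x, p in zip(clean, perturbed))
    if noise == 0:
        return math.inf
    return 10 * math.log10(signal / noise)


# One second of a 220 Hz tone stands in for a voice recording.
clean = [0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
         for n in range(SAMPLE_RATE)]

# Random noise stands in for the optimized perturbation a real tool would compute.
rng = random.Random(0)
raw_delta = [rng.gauss(0, 0.01) for _ in clean]

delta = clip_perturbation(raw_delta, EPSILON)
cloaked = [x + d for x, d in zip(clean, delta)]

print(f"SNR after cloaking: {snr_db(clean, cloaked):.1f} dB")
```

The clamp is what keeps the change small relative to the signal. What makes a real cloak effective is the optimization that chooses the perturbation inside that budget, which this sketch deliberately leaves out.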

There is more than one tool in this lever, because a voice can be attacked in more than one way. VoiceBlock, from O’Reilly, Bugler, Bhandari, Morrison and Pardo (NeurIPS 2022), applies its perturbation “in real-time on a single CPU thread” and targets live speaker recognition rather than offline cloning. VoiceCloak, from Hu, Wu, Lu and Luo (AAAI 2026), extends the cloak to diffusion-based voice conversion, the newer synthesis route, so treat it as recent and less independently tested. SafeSpeech, from Zhang and colleagues (USENIX Security 2025), is a second-generation protection titled “Robust and Universal Voice Protection Against Malicious Speech Synthesis” and is built specifically to survive removal, with no independent break reported yet. The practical point is that this is an active field, so pick a maintained tool and expect to reapply it as attacks evolve.

Lever B: opt out so compliant trainers skip you

Passive protection is a request that well-behaved crawlers honor, and it has two parts. The first is a robots.txt file on any site you control that names the training crawlers, including GPTBot, Google-Extended, and CCBot (Common Crawl); a compliant bot reads it before it fetches and skips the disallowed paths. The second is a Do-Not-Train registry such as Spawning’s, which participating trainers query to drop opted-out works before training begins. The two act at different points, one read by the crawler at fetch time and one queried by the trainer before training, and neither alters the voice file itself; each simply tells a compliant actor not to take it. The full registry sign-up and robots.txt walkthrough is in “Anti-scrape data poisoning and opting out.”
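As a sketch of the robots.txt half, the snippet below holds one possible rule set for the three crawlers named above and checks it the way a compliant bot would, using Python's standard `urllib.robotparser`. The blanket `Disallow: /`, the `example.com` URL, and the `allowed` helper are assumptions for illustration; scope the paths to your own site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block the named training crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a rule-following crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/audio/reel.mp3"  # hypothetical file
    for agent in ("GPTBot", "Google-Extended", "CCBot", "SomeBrowser"):
        print(f"{agent:16} may fetch: {allowed(agent, url)}")
```

The check only describes what a rule-following crawler would do. Nothing in the file prevents a fetch by a crawler that never reads it.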

Keep the clean recording private

Neither lever protects a file you never should have published. A cloak only guards the copy it is applied to, and an opt-out only speaks for data you control, so the clean, uncloaked recording, the raw session audio, or the unprocessed voice reel is the one asset worth keeping off the public web entirely. It is the material an attacker trying to strip a cloak most wants, and the one copy no tool can retract once it is out. The practical split is simple: publish the smallest useful public sample, cloak that sample where you can, and keep clean originals out of public folders.
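One way to enforce that split is a quick audit of whatever folder you publish from. The sketch below assumes a purely hypothetical naming convention in which cloaked files end in `_cloaked` before the extension; the suffix list and function names are also assumptions, and no cloaking tool is known to require this naming.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".flac", ".mp3", ".m4a", ".ogg"}
CLOAKED_MARKER = "_cloaked"  # hypothetical naming convention, not a tool requirement


def uncloaked_audio(paths):
    """Return the audio paths whose file name lacks the cloaked marker, sorted."""
    return sorted(
        p for p in map(Path, paths)
        if p.suffix.lower() in AUDIO_SUFFIXES
        and not p.stem.endswith(CLOAKED_MARKER)
    )


def scan_public_folder(public_dir):
    """Walk a public folder and list audio files that look like clean originals."""
    root = Path(public_dir)
    return uncloaked_audio(p for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    for path in scan_public_folder("."):
        print("clean original in public folder:", path)
```

A file name proves nothing about whether a cloak was actually applied, so treat the result as a reminder list and not as verification.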

The hinge: why neither is enough alone

The two levers cover mirror-image blind spots. Shan, Ding, Passananti, Wu, Zheng and Zhao (IEEE S&P 2024) are explicit that their Nightshade poison exists for “web scrapers that ignore opt-out/do-not-crawl directives,” which is exactly the opponent an opt-out can never reach. Opt-out, in turn, handles the compliant trainer that a cloak never needs to touch. Nightshade is an image poison rather than a voice tool, but its threat model is the general one: if the scraper ignores the sign on the door, the content-side cloak is what remains, and if the scraper is compliant, opt-out can stop the fetch before any cloak has to matter.

| Lever | Bites whom | Example tools | Gap it leaves |
| --- | --- | --- | --- |
| Cloak | Scrapers that take your recording anyway | AntiFake, VoiceBlock, SafeSpeech | No effect on a crawler that already skips you |
| Opt-out | Scrapers that obey rules | robots.txt, Spawning registry | No effect on a crawler that ignores rules |
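The mirror-image blind spots can be written as a small truth table. This is a deliberately simplified model of the argument, not a measurement: it assumes a compliant scraper always skips opted-out data and that an intact cloak always misleads the clone, and all names in it are illustrative.

```python
def recording_is_taken(scraper_honors_opt_out: bool, opted_out: bool) -> bool:
    """Opt-out only stops a scraper that both sees it and chooses to obey it."""
    return not (scraper_honors_opt_out and opted_out)


def clone_learns_true_voice(scraper_honors_opt_out: bool,
                            opted_out: bool,
                            cloaked: bool,
                            cloak_removed: bool = False) -> bool:
    """Simplified outcome: does a clone end up trained on the real voice?"""
    if not recording_is_taken(scraper_honors_opt_out, opted_out):
        return False  # a compliant scraper skipped the file entirely
    # The file was taken, so only an intact cloak stands in the way.
    return (not cloaked) or cloak_removed


if __name__ == "__main__":
    for honors in (True, False):
        for opted_out in (True, False):
            for cloaked in (True, False):
                exposed = clone_learns_true_voice(honors, opted_out, cloaked)
                print(f"honors={honors!s:5} opted_out={opted_out!s:5} "
                      f"cloaked={cloaked!s:5} -> exposed={exposed}")
```

Under this model, only the rows with neither an honored opt-out nor a cloak leave the voice exposed. The `cloak_removed` flag is the case where an attacker strips the perturbation and the cloak stops counting.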

The limits

Cloaks are removable. Fan, Chen, Liu, Zhang and Yu (ICML 2025) show with De-AntiFake that “existing purification methods can neutralize a considerable portion of the protective perturbations,” and warn that a protected speaker can be left with a “false sense of security.” That does not make voice cloaks worthless. It means they raise cost and reduce easy abuse, while a determined attacker may still try to clean the perturbation away, which is exactly why the second-generation SafeSpeech is built around surviving removal.

Opt-out has the opposite weakness: robots.txt, ai.txt and a registry are instructions honored only voluntarily, so a non-compliant scraper simply ignores them. The two levers are complementary because their failure modes are opposite, so layer both, keep your cleanest recording off the public web, and treat each tool as raising cost rather than guaranteeing exclusion.

For the step-by-step protection routine, see “How to protect your voice from AI cloning”; for the auditing side, “Did they train on my voice”; and for the wider picture, “Do AI poisoning tools actually work.”

Sources

  • Yu, Zhai, Zhang (2023). AntiFake: Using Adversarial Audio to Prevent Unauthorized Speech Synthesis. ACM CCS 2023.
  • O’Reilly, Bugler, Bhandari, Morrison, Pardo (2022). VoiceBlock: Privacy through Real-Time Adversarial Attacks with Audio-to-Audio Models. NeurIPS 2022.
  • Hu, Wu, Lu, Luo (2026). VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning. AAAI 2026.
  • Zhang, Wang, Yang (2025). SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis. USENIX Security 2025.
  • Shan, Ding, Passananti, Wu, Zheng, Zhao (2024). Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models. IEEE S&P 2024.
  • Fan, Chen, Liu, Zhang, Yu (2025). De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks. ICML 2025.