Do AI poisoning and cloaking tools actually work?

By The Poisoning.ai team

AI art-protection tools work as a deterrent, not a guarantee. First-generation cloaks like Glaze, Mist and Anti-DreamBooth post strong numbers in their own papers, but independent researchers have since stripped them with cheap, off-the-shelf steps; a second generation claims to survive removal but has not been independently tested; and the removal methods keep improving. Here is what each tool defends against, and what breaks it.

Which tools actually work? (scorecard)

| Tool | Defends against | What breaks it | Status (2026) |
| --- | --- | --- | --- |
| Glaze | Style mimicry (cloak) | Robust mimicry; IMPRESS purification | Broken in independent tests (Hönig et al., ICLR 2025) |
| Mist | Tool-agnostic style cloak | Robust mimicry (noisy upscaling) | Broken in independent tests (Hönig et al., ICLR 2025) |
| Anti-DreamBooth | Face / subject fine-tuning | Robust mimicry (noisy upscaling) | Broken in independent tests (Hönig et al., ICLR 2025) |
| PhotoGuard | AI img2img / inpainting edits | JPEG re-encode; purification | Broken (Sandoval-Segura et al., 2023) |
| Nightshade | Poisons scrapers (concept) | LightShed depoisoning | Broken in independent tests (Foerster et al., USENIX 2025) |
| Second generation (BlurGuard) | Purification-resistant protection | None demonstrated yet | Self-reported; no independent bypass |

The pattern is the headline: the tools that have been independently tested have mostly been beaten, and the tools that claim to resist have not yet been independently tested.

How well do the tools work on their own terms?

On their own benchmarks, the first-generation tools are strong. Glaze (Shan et al., USENIX Security 2023) reports an artist-rated protection success rate of 94.3% on Stable Diffusion for current artists at its default perturbation budget, and describes cloaks that “apply barely perceptible perturbations to images, and when used as training data, mislead generative models that try to mimic a specific artist.” Nightshade (Shan et al., IEEE S&P 2024) is the offensive counterpart: “a prompt-specific poisoning attack optimized for potency that can completely control the output of a prompt in Stable Diffusion’s newest model (SDXL) with less than 100 poisoned training samples.” Those are real results. The open question is whether they hold once someone tries to remove them.

Why do first-generation tools fail?

Because the protection is a fragile additive perturbation, and cheap pre-processing removes it. In a peer-reviewed study, Hönig et al. (ICLR 2025) found that Glaze, Mist and Anti-DreamBooth “are ineffective when faced with simple robust mimicry methods,” and that “all existing protective tools create a false sense of security.” Their best-of-4 attack, drawn from four low-effort methods including noisy upscaling, pushed copies to a quality success rate of 56.6% for Glaze, 56.6% for Anti-DreamBooth and 62.0% for Mist, where 50% would indicate a copy indistinguishable from one trained on unprotected art.

Purification tells the same story: IMPRESS (Cao et al., NeurIPS 2023) restored a Glaze-protected style-mimicry task from 42.5% to 87.0% classifier accuracy, and PhotoGuard’s edit-blocking perturbations “are not robust to JPEG compression, which poses a major weakness because of the common usage and availability of JPEG” (Sandoval-Segura et al., 2023). And purification generalises: PurifyOnce (Zhao et al., 2026), tested across 2,100 editing tasks and six representative protection methods under model mismatch, improves edited-image quality by 3 to 6 dB PSNR and cuts FID by 50 to 70%, a “purify-once, edit-freely” failure mode.

The poison side is not exempt either: LightShed (Foerster et al., USENIX Security 2025) is “a generalizable depoisoning attack that effectively identifies poisoned images and removes adversarial perturbations,” reporting a 99.98% true-positive rate at detecting Nightshade while depoisoning it, and it generalises to Glaze as well. The full breakdown is in “Does Glaze actually work?” and “Can Glaze and Nightshade be bypassed?”
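To put the reported figures on a common footing, the short sketch below works through two pieces of arithmetic: how far each success rate sits from the 50% indistinguishability line, and what a PSNR gain of 3 to 6 dB means in terms of mean squared error. It assumes the standard definition PSNR = 10 · log10(MAX² / MSE); the helper names are hypothetical and chosen for illustration, and the numbers are simply the ones quoted above, not new measurements.

```python
def points_from_parity(success_rate_pct: float, parity_pct: float = 50.0) -> float:
    """Distance, in percentage points, between a reported rate and the 50% line."""
    return success_rate_pct - parity_pct


def mse_ratio_for_psnr_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks for a given PSNR gain.

    Assumes PSNR = 10 * log10(MAX^2 / MSE), so a gain of g dB corresponds to
    MSE_before / MSE_after = 10 ** (g / 10).
    """
    return 10 ** (gain_db / 10)


# Success rates quoted from Hönig et al. (ICLR 2025) in the text above.
reported = {"Glaze": 56.6, "Anti-DreamBooth": 56.6, "Mist": 62.0}
for tool, rate in reported.items():
    print(f"{tool}: {points_from_parity(rate):+.1f} points from the 50% line")

# IMPRESS: classifier accuracy on the Glaze-protected task, before and after.
impress_recovery = 87.0 - 42.5
print(f"IMPRESS recovery: {impress_recovery:.1f} percentage points")

# PurifyOnce: what a 3 dB and a 6 dB PSNR gain mean for squared error.
for gain in (3, 6):
    print(f"+{gain} dB PSNR: MSE shrinks by a factor of {mse_ratio_for_psnr_gain(gain):.2f}")
```

Read this way, a 3 dB gain roughly halves the squared error and a 6 dB gain roughly quarters it, which is why a single purification pass is enough to change the outcome of later edits.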

Is the question settled?

No. A second generation of protections is being built specifically to survive purification. BlurGuard (Kim et al., NeurIPS 2025) reshapes the frequency spectrum of the protective noise so simple removal steps cannot strip it, and reports retaining 92.9% of its protective efficacy after a worst-case purification stack, versus 38.5% to 48.4% for earlier methods. Other 2025 work trains the perturbation directly against ensembles of purifiers and upscalers for the same reason. The catch: these tools all postdate the bypass papers, and none has been independently bypassed or confirmed, so their durability is self-reported. “First generation broken, second generation unproven, removal improving” is the state of play.

What should artists actually do?

Protect your work, layer it, and treat no single tool as permanent. A cloak (Glaze or Mist) on every upload still raises the cost for a casual copier; Nightshade pushes back at scale; PhotoGuard (Salman et al., ICML 2023) or Anti-DreamBooth (Van Le et al., ICCV 2023) address editing and face cloning; and non-technical steps like opt-outs, smaller uploads and a clean original kept as proof cost nothing. The step-by-step guide is in “How to protect your art from AI training.”
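One low-cost way to act on the “clean original kept as proof” advice is to record a cryptographic fingerprint of the unprotected file before uploading the cloaked copy. The sketch below is an illustration under stated assumptions, not something any of the tools above prescribe: the function name `fingerprint_original` and the record layout are hypothetical, and a locally stored hash only shows that a record matches a file, not when the record was made. Dating it would need an independent witness, such as a trusted timestamping service.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def fingerprint_original(path: str) -> dict:
    """Return a small record identifying the exact bytes of an original file.

    Hypothetical helper for illustration; the record layout is an assumption.
    """
    file_path = Path(path)
    data = file_path.read_bytes()
    return {
        "file": file_path.name,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        # Local clock only: a note to yourself, not independent proof of date.
        "recorded_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


if __name__ == "__main__":
    # Demo on a throwaway file so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "original.png"
        original.write_bytes(b"stand-in for the clean, uncloaked image bytes")
        print(json.dumps(fingerprint_original(str(original)), indent=2))
```

Cloaking changes the file's bytes, so the fingerprint of the cloaked upload will differ from that of the original; keeping a record for each makes it easier to show later which published file came from which source.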

Every independently tested protection so far has fallen to an attack cheaper than the one it was designed to stop, and the tools that have not fallen are the ones no one has published an attack on yet. That is the reading for 2026: these tools are worth using as deterrents, not locks. Treat protection as a moving target, layer it, re-cloak when better tools appear, and keep proof of authorship for when a cloak fails. Start with “Glaze and Nightshade, explained,” then compare the two.

Sources

  • Shan, Cryan, Wenger, Zheng, Hanocka, Zhao (2023). GLAZE: Protecting Artists from Style Mimicry by Text-to-Image Models. USENIX Security 2023.
  • Shan, Ding, Passananti, Wu, Zheng, Zhao (2024). Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models. IEEE S&P 2024.
  • Liang, Wu (2023). Mist: Towards Improved Adversarial Examples for Diffusion Models.
  • Salman, Khaddaj, Leclerc, Ilyas, Mądry (2023). Raising the Cost of Malicious AI-Powered Image Editing (PhotoGuard). ICML 2023.
  • Van Le, Phung, Nguyen, Dao, Tran, Tran (2023). Anti-DreamBooth: Protecting Users from Personalized Text-to-Image Synthesis. ICCV 2023.
  • Cao, Li, Wang, Jia, Li, Chen (2023). IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI. NeurIPS 2023.
  • Hönig, Rando, Carlini, Tramèr (2025). Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI. ICLR 2025.
  • Sandoval-Segura, Geiping, Goldstein (2023). JPEG Compressed Images Can Bypass Protections Against AI Editing.
  • Zhao, Zhai, Bai, Shen, Lin, Gao, Wu (2026). Purify Once, Edit Freely: Breaking Image Protections under Model Mismatch.
  • Foerster, Behrouzi, Rieger, Jadliwala, Sadeghi (2025). LightShed: Defeating Perturbation-based Image Copyright Protections. USENIX Security 2025.
  • Kim, Nam, Kim, Kim, Jeong (2025). BlurGuard: A Simple Approach for Robustifying Image Protection Against AI-Powered Editing. NeurIPS 2025.