How HarmonyCloak makes songs unlearnable

A song is “unlearnable” when a generative model can train on it and still come away with nothing it can reproduce. That is a stronger claim than “distorted.” HarmonyCloak does not scramble a track so it sounds wrong; it adds an inaudible layer of error-minimizing noise that makes the model’s training signal collapse, so the model concludes there is nothing left to learn and moves on. Meerza, Sun, Liu (IEEE S&P 2025) describe it as “the first defensive mechanism … to prevent the exploitative use of artwork, specifically in the context of instrumental music, by generative AI models.” The single lever that produces that effect is worth understanding on its own, because it explains both why the protection works and exactly where it stops.

What does making a song “unlearnable” actually mean?

Most protective perturbations are adversarial in the ordinary sense: they push a model toward a wrong answer. Unlearnable examples do the opposite. Instead of making the model fail loudly, they make the thing the model is trying to learn quietly disappear. Meerza, Sun, Liu (IEEE S&P 2025) build HarmonyCloak to “employ imperceptible error-minimizing noise to make the model’s generative loss approach zero … tricking the model into believing nothing can be learned.” When a model’s loss on your track is already near zero before it has learned anything, the optimizer sees almost no gradient to follow, makes almost no update, and your track contributes almost nothing to what the model becomes.
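
The vanishing-gradient effect described above can be seen in a toy setting. The sketch below is not the HarmonyCloak algorithm: it assumes a fixed linear predictor with squared-error loss in place of a music generator, a random vector in place of a track, and a plain per-sample bound `eps` in place of an audibility constraint. All names (`loss_and_grads`, `error_minimizing_noise`, `eps`) are hypothetical. It only shows that once bounded noise drives the loss toward zero, the gradient the optimizer would use to update the model shrinks with it.

```python
import numpy as np

def loss_and_grads(w, x, y):
    """Squared error of a linear predictor, plus gradients w.r.t. weights and input."""
    residual = w @ x - y
    loss = residual ** 2
    grad_w = 2.0 * residual * x  # what training would use to update the model
    grad_x = 2.0 * residual * w  # what the cloak uses to shape the noise
    return loss, grad_w, grad_x

def error_minimizing_noise(w, x, y, eps, steps=50):
    """Projected gradient descent on the noise: lower the loss, stay within +/- eps."""
    delta = np.zeros_like(x)
    step = 1.0 / (2.0 * (w @ w))  # 1/L for this quadratic loss
    for _ in range(steps):
        _, _, grad_x = loss_and_grads(w, x + delta, y)
        delta = np.clip(delta - step * grad_x, -eps, eps)
    return delta

rng = np.random.default_rng(0)
w = rng.normal(size=64)  # assumed: fixed model weights
x = rng.normal(size=64)  # assumed: stand-in for one frame of a track
y = w @ x + 0.5          # target chosen so the clean residual is -0.5
eps = 0.05               # assumed: per-sample noise budget

delta = error_minimizing_noise(w, x, y, eps)
clean_loss, clean_grad_w, _ = loss_and_grads(w, x, y)
cloaked_loss, cloaked_grad_w, _ = loss_and_grads(w, x + delta, y)

print("loss (clean, cloaked):", clean_loss, cloaked_loss)
print("weight-gradient norm (clean, cloaked):",
      np.linalg.norm(clean_grad_w), np.linalg.norm(cloaked_grad_w))
```

A real unlearnable-example pipeline alternates between updating the model and updating the noise; the sketch freezes the model to keep the gradient argument visible.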

Noise objective	Effect on a model that trains on the track
Error-maximizing	Model is pushed toward a wrong output
Error-minimizing (HarmonyCloak)	Training loss approaches zero, so nothing is learned
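
The two rows can also be written as optimization problems that differ only in direction. The notation below follows the common formulation of unlearnable examples and is an assumption here, not a formula quoted from the HarmonyCloak paper: x is a track, δ the added noise, f_θ the model, L its training loss, and ε the budget that keeps the noise imperceptible.

```latex
% Assumed notation: x = track, \delta = added noise, f_\theta = model,
% \mathcal{L} = training loss, \epsilon = imperceptibility budget.
\begin{align*}
\text{Error-maximizing:} \quad & \max_{\delta}\ \mathcal{L}\big(f_\theta(x + \delta)\big) \quad \text{s.t. } \|\delta\| \le \epsilon \\
\text{Error-minimizing:} \quad & \min_{\delta}\ \mathcal{L}\big(f_\theta(x + \delta)\big) \quad \text{s.t. } \|\delta\| \le \epsilon
\end{align*}
```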

Why error-minimizing noise, and not just distortion?

Audible distortion would be the wrong tool twice over. It would ruin the one thing a musician cannot give up, the sound of the record, and a big obvious corruption can still leave a perfectly learnable signal behind. Error-minimizing noise is engineered in the opposite direction: inaudible to a listener, but precisely shaped to cancel the learnable structure a generator would otherwise extract. That is the whole mechanism. Everything else HarmonyCloak does is in service of placing that noise where a human ear will not catch it and a model cannot ignore it.
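
To make “where a human ear will not catch it” concrete, here is a deliberately crude sketch of hiding noise under the music’s own energy. It is an assumption-laden stand-in, not the paper’s constraint: a flat margin of `margin_db` decibels below the signal in every frequency bin is not a psychoacoustic model, and the function name `cap_noise_below_signal` is hypothetical. The idea it illustrates is only that noise is allowed where the track is loud and suppressed where the track is quiet.

```python
import numpy as np

def cap_noise_below_signal(signal, noise, margin_db=20.0):
    """Shrink noise so every frequency bin sits at least margin_db below the signal."""
    sig_spec = np.fft.rfft(signal)
    noise_spec = np.fft.rfft(noise)
    # Per-bin ceiling: the signal's magnitude lowered by the chosen margin.
    ceiling = np.abs(sig_spec) * 10.0 ** (-margin_db / 20.0)
    noise_mag = np.abs(noise_spec)
    # Scale factor per bin: 1 where the noise already fits, below 1 where it must shrink.
    scale = np.minimum(1.0, ceiling / np.maximum(noise_mag, 1e-12))
    return np.fft.irfft(noise_spec * scale, n=len(signal))

sample_rate = 16_000                               # assumed sample rate in Hz
t = np.arange(1024) / sample_rate
signal = 0.5 * np.sin(2 * np.pi * 440.0 * t)       # stand-in "track": a 440 Hz tone
rng = np.random.default_rng(0)
raw_noise = 0.01 * rng.normal(size=t.size)         # stand-in for an unshaped perturbation
shaped_noise = cap_noise_below_signal(signal, raw_noise)

print("noise energy before and after shaping:",
      np.sum(raw_noise ** 2), np.sum(shaped_noise ** 2))
```

In a real cloak the ceiling would come from a model of human hearing rather than a fixed margin, and the shaping would be applied inside the noise optimization rather than after it.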

How do we know the mechanism actually bites?

The paper measures the bite two ways. On effect, the rate at which the generator’s output was classified as the target genre fell from 89.2% to 32.3% once the training audio was cloaked, meaning a model fed protected tracks could no longer reliably reproduce the style it was trained to copy. That result held across three different music generators, MuseGAN, SymphonyNet, and MusicLM, in both white-box and black-box settings, so it is not an artifact of a single architecture or of the noise being tuned with knowledge of the model’s internals. On imperceptibility, a listening study with 31 participants confirmed the added noise stays inaudible, which is what separates a cloak from simple degradation.

Is this the same trick as image and voice cloaks?

It is the same family, with a different objective at its core. Glaze applies “style cloaks”, described by Shan, Cryan, Wenger, Zheng, Hanocka, Zhao (USENIX Security 2023) as “barely perceptible perturbations to images, and when used as training data, mislead generative models that try to mimic a specific artist.” Nightshade goes further on the offensive side: Shan, Ding, Passananti, Wu, Zheng, Zhao (IEEE S&P 2024) frame it as “a prompt-specific poisoning attack optimized for potency” that works with “less than 100 poisoned training samples.” On the voice side, Yu, Zhai, Zhang (ACM CCS 2023) build AntiFake as “a defense mechanism that relies on adversarial examples to prevent unauthorized speech synthesis.” Where those tools mostly perturb data to mislead a model, HarmonyCloak’s distinctive move is to make the learnable signal vanish, so the model is not fooled into a wrong style so much as starved of any style at all.

What the mechanism does not settle

Two questions follow immediately, and each has its own article. Whether the noise can be stripped back off is a removal problem, and the removal direction is advancing in neighboring media: Foerster, Behrouzi, Rieger, Jadliwala, Sadeghi (USENIX Security 2025) remove image poisons with LightShed, and Fan, Chen, Liu, Zhang, Yu (ICML 2025) neutralize voice cloaks with De-AntiFake. That story is told in “does music AI-protection actually work.” Whether the noise survives MP3, streaming, and the generators people actually use is a pipeline problem, told in “does music poisoning survive MP3, Suno, and MusicGen.”

The mechanism is quietly clever: it wins by making your music boring to a model rather than by fighting it head-on, which is why the imperceptibility result carries as much weight as the genre-collapse number. A cloak that listeners can hear is not a cloak. But an error-minimizing cloak is still a layer of noise sitting on a file, and a layer of noise is something a determined adversary can try to model and remove. Read HarmonyCloak as a way to lower what a scraper gains from your catalog, not as a lock on it. For the routine that applies this, see “how to protect my music from AI training”; for where music sits among the other defenses, see “do AI poisoning tools actually work.”

Sources

Meerza, Sun, Liu (2025). HarmonyCloak: Making Music Unlearnable for Generative AI. IEEE S&P 2025.
Shan, Cryan, Wenger, Zheng, Hanocka, Zhao (2023). GLAZE: Protecting Artists from Style Mimicry by Text-to-Image Models. USENIX Security 2023.
Shan, Ding, Passananti, Wu, Zheng, Zhao (2024). Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models. IEEE S&P 2024.
Yu, Zhai, Zhang (2023). AntiFake: Using Adversarial Audio to Prevent Unauthorized Speech Synthesis. ACM CCS 2023.
Foerster, Behrouzi, Rieger, Jadliwala, Sadeghi (2025). LightShed: Defeating Perturbation-based Image Copyright Protections. USENIX Security 2025.
Fan, Chen, Liu, Zhang, Yu (2025). De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks. ICML 2025.