Was my music used to train AI? How to actually tell

There is no reliable public way to prove a model trained on your songs, because no consumer tool searches audio training sets the way image indexes search pictures, and the statistical test built for this question cannot give you proof on a closed model. You can often raise a real suspicion that your tracks were scraped; you can rarely close the case, and for music that limit is built into the medium. Here is what you can and cannot establish for a song, and what to do with an answer that will stay uncertain.

Can you search the training set?

The most direct check would be to look for your work inside a dataset a model trained on, and for music that check does not exist. Spawning’s “Have I Been Trained” searches LAION-5B, a public set of billions of image-text pairs, so it can flag your cover art, promotional stills and music-video frames, but it indexes images, not audio. Your masters, stems and released tracks are not searchable there, and there is no consumer dataset-search tool for sound. A cover-art hit is still worth logging, because it is concrete evidence that material around your release entered a public training index, but it tells you a picture was scraped, not that a model learned from the track. For the music itself, the training set starts as a closed box.

Can a test tell if your track was in the data?

When the dataset is closed, the research route is a membership inference attack. Shokri, Stronati, Song and Shmatikov introduced it at IEEE S&P 2017 as a way to, “given a data record and black-box access to a model, determine if the record was in the model’s training dataset,” by reading the faint over-confidence a model shows on data it has already seen. Generative music models are exactly the kind of system that can leak this way: privGAN, from Mukherjee, Xu, Trivedi, Patowary and Lavista Ferres at PoPETs 2021, demonstrates membership leakage in generative adversarial networks, the family behind several music generators. But a lab demonstration that leakage is possible, run on models the authors controlled, is not a service where a musician uploads a clip and gets a verdict, and that gap is the whole of the next section.
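To make the idea concrete, here is a minimal Python sketch of a confidence-threshold membership test on a toy model. It is not the shadow-model attack from the Shokri et al. paper, and nothing in it comes from a real music generator: `train`, `confidence`, `guess_member`, the 0.9 threshold and the random 8-number "records" are all illustrative assumptions. The toy model memorizes its training data outright, which exaggerates the over-confidence a real attack has to detect.

```python
import random

def train(records):
    """Toy 'model' (assumption): it memorizes its training records, an extreme overfit."""
    return list(records)

def confidence(model, record):
    """Black-box query: confidence falls as the record gets farther from anything memorized."""
    nearest = min(sum((a - b) ** 2 for a, b in zip(record, m)) for m in model)
    return 1.0 / (1.0 + nearest)

def guess_member(model, record, threshold=0.9):
    """Threshold test: call the record a member if the model is unusually confident on it."""
    return confidence(model, record) >= threshold

rng = random.Random(0)

def make_record():
    # Hypothetical stand-in for a track: 8 random numbers between 0 and 1.
    return tuple(rng.uniform(0, 1) for _ in range(8))

members = [make_record() for _ in range(50)]      # records the model trained on
non_members = [make_record() for _ in range(50)]  # records it never saw
model = train(members)

true_positive_rate = sum(guess_member(model, r) for r in members) / len(members)
false_positive_rate = sum(guess_member(model, r) for r in non_members) / len(non_members)
print("flagged members:", true_positive_rate)
print("flagged non-members:", false_positive_rate)
```

On this toy every training record scores a confidence of exactly 1.0, so the test flags all of them. A real generator leaks far more faintly, and the experiment only works here because the training set is known.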

Can you ever get real proof?

Not cleanly, for a closed model. A position paper by Zhang, Das, Kamath and Tramèr at IEEE SaTML 2025, titled “Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data,” argues that a sound proof would require showing the attack’s result is “unlikely under the null hypothesis that the model was not trained on the target data,” and that “sampling from this null hypothesis is impossible, as we do not know the exact contents of the training set, nor can we (efficiently) retrain a large foundation model.” The practical corollary is that these tests lack a provably low false-positive rate on production models, so before you trust any service that offers to prove your song was used in training, ask for its false-positive rate. The one method that can be sound is data extraction, getting the model to emit your work verbatim, as Carlini, Tramèr, Wallace, Jagielski, Herbert-Voss, Lee, Roberts, Brown, Song, Erlingsson, Oprea and Raffel did at USENIX Security 2021 when they pulled memorized text out of a language model. That is powerful when it happens, but it requires the model to have memorized your specific track and to reproduce it, which is the exception, not the rule.
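A small Python sketch can show what "sampling from the null hypothesis" would involve, and why it is only feasible on a model you can retrain. Everything here is an assumption made for illustration: the toy model simply memorizes its data, `null_scores` and `p_value` are hypothetical helper names, and the "track" is a tuple of 8 random numbers. The sketch retrains 200 toy models that never saw the target and asks how often they score it as highly as the model under audit does.

```python
import random

def train(records):
    """Toy 'model' (assumption): it memorizes its training records."""
    return list(records)

def score(model, record):
    """Attack score: higher means the model behaves more as if it saw the record."""
    nearest = min(sum((a - b) ** 2 for a, b in zip(record, m)) for m in model)
    return 1.0 / (1.0 + nearest)

def make_record(rng, dims=8):
    return tuple(rng.uniform(0, 1) for _ in range(dims))

def null_scores(target, n_models=200, n_records=50, seed=1):
    """Retrain many toy models that never saw `target`, then score `target` on each."""
    rng = random.Random(seed)
    return [score(train([make_record(rng) for _ in range(n_records)]), target)
            for _ in range(n_models)]

def p_value(observed, null):
    """Share of null models scoring at least as high as observed (+1 smoothing)."""
    return (1 + sum(s >= observed for s in null)) / (1 + len(null))

rng = random.Random(0)
target = make_record(rng)
# The model under audit: in this toy we know it did train on the target.
audited = train([make_record(rng) for _ in range(49)] + [target])

null = null_scores(target)
p = p_value(score(audited, target), null)
print("p-value under the null:", p)
```

The retraining loop inside `null_scores` is the step the paper says is out of reach for a closed foundation model: it takes milliseconds for a toy, but nobody outside the vendor can retrain a production music generator without one specific song.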

Why is music the harder case?

Two things stack against songs. First, there is no dataset-search tool, so the one easy check that exists for images is simply missing. Second, audio is almost always compressed before it is scraped, the same lossy messiness the protective side of this field has to engineer around. To see how much work that takes, look at protection rather than auditing: HarmonyCloak, from Meerza, Sun and Liu at IEEE S&P 2025, makes instrumental music unlearnable by injecting error-minimizing noise, and in its own tests drops a generator’s target-genre accuracy from 89.2% to 32.3%. That figure reflects what it takes to engineer an audio perturbation strong enough to persist through transcoding at all. A membership signal that nobody engineered to survive compression is far more fragile, which is why a clean “was my song used” test does not exist, and why the forward-looking protective tools are far ahead of any consumer audit tool for music.
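The fragility can be illustrated with a rough Python sketch, under loud assumptions: coarse rounding in `quantize` is only a crude stand-in for a real lossy codec, the "model" is a toy that memorizes whatever it was given, and the "master" is a synthetic 440 Hz tone. The point it shows is narrow: if a model trained on a re-encoded copy, querying it with your original master no longer matches what it saw.

```python
import math

def quantize(samples, step=1 / 128):
    """Crude stand-in for lossy encoding (assumption): snap each sample to a coarse grid."""
    return tuple(round(s / step) * step for s in samples)

def train(records):
    """Toy 'model' (assumption): it memorizes its training records."""
    return list(records)

def confidence(model, record):
    """Confidence falls as the record gets farther from anything memorized."""
    nearest = min(sum((a - b) ** 2 for a, b in zip(record, m)) for m in model)
    return 1.0 / (1.0 + nearest)

# Hypothetical master: one second of a 440 Hz tone sampled at 8 kHz.
master = tuple(0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000))

scraped = quantize(master)   # the copy a scraper would have picked up
model = train([scraped])     # the model only ever saw the degraded copy

score_on_scraped_copy = confidence(model, scraped)
score_on_master = confidence(model, master)
print("confidence on the scraped copy:", score_on_scraped_copy)
print("confidence on the original master:", score_on_master)
```

The exact trained copy scores a perfect match while the master scores strictly lower, even with this mild rounding. Real codecs alter audio in more complicated ways, so the sketch shows the direction of the effect, not its size.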

What a result actually means

Put the routes together and the ceiling is clear. There is no song-search index, and a membership-inference signal says only that a model behaves as if it saw your track. Neither route proves that a specific commercial system trained on your music, and neither removes you from one that already did. Treat any test result as a lead, not a verdict, because a positive is not proof and a negative is not a clearance. Use the uncertainty to triage: build the best-supported suspicion you can from any image surfaces that are searchable, then spend your energy where it pays off, on opting out going forward so a future scrape has less to take, and on keeping your cleanest masters and stems off the open web so the easiest training target is never the one you publish. None of this settles the past, but it strengthens your position from the moment you act. The step-by-step version is “Did they train on my music: how to check and opt out”; for the wider picture, see “Do AI poisoning tools actually work.”

Sources

Shokri, Stronati, Song, Shmatikov (2017). Membership Inference Attacks against Machine Learning Models. IEEE S&P 2017.
Mukherjee, Xu, Trivedi, Patowary, Lavista Ferres (2021). PrivGAN: Protecting GANs from Membership Inference Attacks at Low Cost. PoPETs 2021.
Zhang, Das, Kamath, Tramèr (2025). Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data. IEEE SaTML 2025.
Carlini, Tramèr, Wallace, Jagielski, Herbert-Voss, Lee, Roberts, Brown, Song, Erlingsson, Oprea, Raffel (2021). Extracting Training Data from Large Language Models. USENIX Security 2021.
Meerza, Sun, Liu (2025). HarmonyCloak: Making Music Unlearnable for Generative AI. IEEE S&P 2025.