Breaking the Sound Barrier: How LALAL.AI Is Rewiring Audio for Creators
What happens when machine learning meets music production? A new kind of creative freedom.
The future of audio didn’t arrive with a bang—it faded in, like a perfect mix.
Somewhere between a studio console and a neural network, LALAL.AI found its frequency.
In 2020, a small team of engineers obsessed with machine learning and digital signal processing set out to solve a creative headache: how to split a song cleanly into parts without losing the magic in the sound. Their answer was Rocknet, a neural network trained on a staggering 20 terabytes of data that could isolate vocals and instrumentals with uncanny accuracy.
What started as a technical experiment quickly evolved into a platform that would change how musicians, producers, and creators around the world work with audio.
The technology grew from simple two-stem separation into a ten-stem powerhouse capable of teasing out guitars, pianos, synthesizers, and even the elusive texture of wind and string instruments. Then came tools that went beyond separation: AI-driven noise cleaning, real-time vocal enhancement, and voice cloning that lets songs take on entirely new identities.
Today, what began as a specialty tool for producers has become a creative ecosystem. LALAL.AI’s software now lives on desktops, mobile devices, and inside other businesses’ workflows, powering projects from podcasts to film scores.
For creators, it’s not just about convenience—it’s about possibility. Because in a world where sound defines experience, LALAL.AI is proving that artificial intelligence can do more than process audio. It can understand it.
We recently had the pleasure of connecting with Nik (product owner and co-founder) and Andrew (CTO and co-founder) of LALAL.AI to dig deeper into the ideas and innovations behind their goal of making audio and video work easier for musicians, sound producers, music engineers, streamers, and many other professionals and creatives.
1) Take us back to the beginning: what problem in music or audio first inspired the idea for LALAL.AI, and how did that early vision evolve into the platform you run today?
In a way, the birth of LALAL.AI was a very lucky accident.
The stars aligned so that, in one place, we had people who could build truly attractive user-facing applications, people who deeply understood machine learning and AI, and people who understood how stem separation could expand what musicians can do.
We also had a clear technical intuition: this problem is, in some sense, the inverse of classical audio production and mastering. Rather than “assembling” a track into a polished whole, you want to reliably “disassemble” it into meaningful parts.
Back then, we strongly believed that high-quality and efficient stem separation simply didn’t exist in the world. At the same time, we saw that very similar classes of problems had already been tackled in computer vision, so we decided to apply comparable ideas to audio separation. That’s how our first network was born, back in 2020. We didn’t fully understand our target audience at the time, and that was okay, because we knew stem separation, like image analysis, could unlock a very wide range of use cases.
If we brought it to market, it could serve everyone from casual karaoke fans to professional music creators, DJs, video editors, and many other groups.
2) What have been the biggest breakthroughs along the way, and how do you decide which innovations are worth taking public versus leaving in the lab?
There’s no magic here. Any machine learning technology rests on three pillars: the data (and its quality), the neural network architecture, and the way you train it. We improve all three continuously.
On the data side, in the last few years we’ve made huge progress: we’ve developed multiple internal technologies that let us improve data quality, clean it, and make it far more diverse. That diversity is one of the keys to teaching a network to do stem separation well across an extremely broad range of real‑world content.
At the same time, architecture and training methodology matter just as much. We run a very large number of experiments, investing significant compute, to identify the best architectures and the best training approaches, while keeping a very deliberate balance: we don’t just want higher quality, we also want models that are more compact and faster.
In practice, what we take public is what meets that bar: measurable quality gains on real content, plus practical performance characteristics that make the technology truly usable.
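The interview doesn’t name a specific metric, but one common way to make “measurable quality gains” concrete in source separation is a signal-to-distortion ratio (SDR) computed against reference stems. The sketch below is a minimal numpy illustration under that assumption; the signals and “model outputs” are synthetic stand-ins, and real evaluations typically use fuller BSS Eval-style metrics over a diverse test set, including the hard edge cases discussed later in this interview.

```python
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Signal-to-distortion ratio in dB: higher means the estimated stem
    is closer to the reference stem."""
    noise = reference - estimate
    return 10 * np.log10(np.sum(reference**2) / (np.sum(noise**2) + 1e-12))

# Hypothetical comparison of two model generations on the same clip;
# in practice the reference vocals would come from ground-truth stems.
rng = np.random.default_rng(0)
reference_vocals = rng.standard_normal(44_100)                        # 1 s at 44.1 kHz
old_model_output = reference_vocals + 0.30 * rng.standard_normal(44_100)
new_model_output = reference_vocals + 0.10 * rng.standard_normal(44_100)

print(f"old model SDR: {sdr(reference_vocals, old_model_output):5.1f} dB")
print(f"new model SDR: {sdr(reference_vocals, new_model_output):5.1f} dB")
```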
3) LALAL.AI now spans browser tools, mobile apps, and integrations for creators and studios; how do you think about designing products that work just as well for a bedroom producer as for a post‑production house?
A major issue with many ML-based solutions at the time was that, even when the technology existed, it was accessible to a very narrow circle of people: mostly those with the expertise to deploy and run neural networks end-to-end, and with the hardware to do it (accelerators, infrastructure, and the know-how).
That created a very high entry barrier. Many people in the actual application domain simply couldn’t afford to use these tools in a meaningful way.
So we decided that it wasn’t enough to create the technology: we had to make it accessible to the masses. That’s why we built a web platform and a web application that is easy to use and well designed from a UX perspective.
Today, the same philosophy is reflected in three practical factors:
Availability through products: we now offer a range of applications, including web, mobile, and desktop apps across operating systems.
Speed: in most cases the wait for separation results is short. We run very powerful cloud servers that process typical content volumes quickly.
Quality as the top priority: we position ourselves as the highest-quality separation technology on the market, and we maintain that bar by releasing new models roughly once a year, each one beating the previous generation. This is why the tool is valuable both for casual use cases like karaoke and for professional workflows, including those of engineers and DJs.
4) The newly‑released Andromeda is positioned as your most advanced model yet; what specifically changed under the hood that lets it deliver cleaner vocal isolation and more consistent quality across such a wide range of tracks?
Andromeda, our newest Frontier model, comes directly from that same “three pillars” approach: better data, better architectures, and better training.
Another point that matters to us is how we define quality. We intentionally optimize on the hardest, most “brutal” tracks, the rare edge cases, because that’s the real test of a separation system. We don’t define quality as “it sounds great on easy tracks.” We define quality as staying strong on the toughest material, including unusual vocal styles like growling or screaming.
For the end user, that translates into confidence: whatever track you try to split with LALAL.AI, the result should be consistently good.
5) For working creators, speed and workflow matter as much as sound; how does Andromeda change the day‑to‑day experience of DJs, remixers, editors, and engineers using your tools?
First, we once again raised the quality bar to a level that wasn’t achievable before.
That means tasks that didn’t work well, or didn’t work at all, on the previous generation can now be done with Andromeda.
And performance is part of that story too. Even though Andromeda beats our previous Perseus family in quality, it’s also about 40% faster for end users, meaning they wait even less for results.
But the day‑to‑day experience isn’t only about the model. It’s also about how the technology fits into the workflow. Pushing compactness and efficiency further, we’ve also created models that can run not only on powerful cloud servers, but on users’ local machines as well.
A major step here was our VST plugin for separating vocals from instrumentals, which we released shortly before Andromeda.
It can substantially change how DJs and engineers work, because it enables stem separation directly inside the Digital Audio Workstation (the tool they already live in) without exporting audio, uploading it to the main LALAL.AI app, and then re-importing the results back into the DAW.
When sound designers can do separation inside the DAW, it becomes much easier to adopt the technology as part of their existing pipelines.
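To make that roundtrip concrete, here is a minimal, hypothetical sketch of the workflow the plugin removes: export a mix from the DAW, send it to a cloud separation service, poll until the stems are ready, then download and re-import them by hand. The endpoint, field names, and helper function are illustrative placeholders, not LALAL.AI’s actual API.

```python
import time

import requests

API_BASE = "https://example-separation-service/api"  # placeholder, not a real endpoint
API_KEY = "YOUR_LICENSE_KEY"                         # placeholder credential


def split_exported_mix(path: str, stem: str = "vocals") -> dict:
    """Upload a mix exported from the DAW, request separation, and poll until done."""
    headers = {"Authorization": f"license {API_KEY}"}

    # 1. Upload the audio file that was exported from the DAW.
    with open(path, "rb") as f:
        upload = requests.post(f"{API_BASE}/upload/", headers=headers, files={"file": f})
    upload.raise_for_status()
    file_id = upload.json()["id"]

    # 2. Ask the service to separate the requested stem.
    requests.post(
        f"{API_BASE}/split/", headers=headers, data={"id": file_id, "stem": stem}
    ).raise_for_status()

    # 3. Poll until the stem and backing-track URLs are ready.
    while True:
        status = requests.post(
            f"{API_BASE}/check/", headers=headers, data={"id": file_id}
        ).json()
        if status.get("status") == "success":
            return status["result"]  # e.g. {"stem_track": ..., "back_track": ...}
        time.sleep(5)


if __name__ == "__main__":
    # The returned URLs would then be downloaded and re-imported into the DAW by
    # hand; that manual roundtrip is exactly what in-DAW separation eliminates.
    print(split_exported_mix("exported_mix.wav"))  # placeholder file path
```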
6) How do you see AI‑powered stem separation and voice cleanup reshaping music‑making, licensing, and remix culture over the next five years?
We don’t believe stem separation technologies fundamentally change the legal world or licensing models. What they absolutely do change is what creators can practically do. They dramatically expand the capabilities of musicians, sound designers, remixers, and anyone working creatively with audio content.
In a strict sense, many of these outcomes were possible before, but they required an enormous amount of manual effort.
For example, creating samples for remixes could take days, weeks, sometimes even months of painstaking work: cleaning audio, isolating exactly the fragment you want, iterating endlessly. With stem separation, you can often do the same thing in seconds.
Like any strong tool, it’s a major productivity optimizer in sound design. It also increases creativity, because it removes routine work and frees time for creative decisions: for creating rather than grinding.
7) From labels to indie catalogs, what are the most realistic, high-impact ways you think music companies can harness AI (tools like Andromeda included) without losing sight of artist rights, fan trust, and long-term catalog value?
In the same way that creating a power screwdriver doesn’t violate the rights of screwdriver makers or screw makers, we don’t see how stem separation technology violates artists’ rights. If anything, it gives creators new capabilities.
It helps them create more and, importantly, create better, because it reduces the routine, painstaking labor and lets people focus on the creative component: making artistic choices and making their work more compelling, higher quality, and more interesting for listeners.
8) What’s next on your roadmap that you’re most excited for creators and partners to get their hands on? How is 2026 shaping up?
The first and most obvious improvement we’re planning for 2026 is expanding the Andromeda architecture to stems beyond vocals. That work is already underway, and we hope to release Andromeda for non‑vocal stems in the first half of 2026.
In parallel, while building Andromeda for non‑vocal stems, we’re already working on the next architecture after Andromeda. It’s showing very promising results and should allow us to raise the quality bar even higher.
We’re also improving our VST plugin. We plan to release a multi-stem version capable of separating not only vocals but also a range of instruments. And we plan to give users control over the latency the plugin introduces, provided they have sufficient compute on the machine running the plugin and their DAW.
I can’t share precise plans or name specific technologies for obvious reasons, but we also plan to release additional technologies that complement stem separation. For example, last year we released voice transformation and custom Voice Packs: Voice Changer and Voice Cloner.
This year, we plan to release more complementary technologies of a similar nature that will expand our portfolio, strengthen our tech stack, and give users even more room to be creative when making music.




