The debate over open source AI continues to rage after years of discussion. Despite all the arguments, we still don’t have clear answers about what open source AI actually means, why it matters, or how to make it work.
Let’s examine the different definitions, the challenges we face, and what history teaches us about navigating this complex landscape.
This was originally posted as a daily walk, share, and discuss video. The written version is below.
A Brief History of Open Source
To understand open source AI, we need to look back at how open source software began. Free software was built on four fundamental freedoms: the ability to use software, examine and learn from it, modify it, and redistribute it.
We used copyright law to make these freedoms enforceable. But here’s the problem: copyright law was designed for books, letters, and written works. Later it expanded to cover art and movies. Applying it to software was already a stretch.
Now we’re trying to apply copyright law—created for simple written works—to AI systems that include datasets, training software, model weights, and complex algorithms. It’s like using a hammer to work with microchips.
Why Open Source AI Matters
This complexity doesn’t make open source AI less important. We’re using AI everywhere—from camera stabilization to major business decisions. We need to understand and trust the systems we depend on.
The EU AI Act recognizes this importance by creating specific provisions for open source AI. But these regulations also create new challenges about how we define and implement open source AI principles.
Three Approaches to Defining Open Source AI
Definitions of open source AI vary widely, but they generally fall into three categories:
1. The Model-Only Approach
This view says you can tweak, retrain, and redistribute the AI model itself—but not necessarily access the original training data or software.
Critics argue this isn’t enough. Without access to the training process, you can’t truly examine how the model was created, whether you can trust it, or how it handles your data. Some also argue that this approach simply gives large AI companies free product management from the community.
2. The Everything-Included Approach
This definition requires access to everything needed to recreate an AI model: the training data, the training software, the original weights, and every other component.
Hardcore advocates of free software push for this approach, and I appreciate their position. But the reality is complicated. AI models are becoming almost like people—could you recreate a person even if you had their DNA, knew their upbringing, and could somehow model every random encounter they’ve had? The complexity is staggering.
Still, this level of transparency is something we should strive for.
3. The Apply-Licenses-Everywhere Approach
This camp argues we don’t need new models or licenses—just apply existing open source licenses to every AI component: data, training software, models, weights, everything.
I think this approach is naive. Open source licenses were built on copyright law, which doesn’t translate easily to every component of an AI system: it is far from settled, for example, whether model weights are copyrightable at all, and training datasets raise privacy and database-rights questions that copyright was never designed to answer.
Lessons from History
Looking at open source software’s original four freedoms—use, learn, modify, distribute—none explicitly mention collaboration. Yet collaboration became central to open source through maintainers, committers, pull requests, and community culture.
Similarly, AI is evolving into something we don’t fully understand yet. Open source AI will likely create new forms of collaboration we can’t predict. What if AI models themselves become contributors to other AI models?
The Software-as-a-Service Parallel
History offers another lesson through Software-as-a-Service (SaaS). When SaaS emerged, the open source community worried it would create a loophole. Companies could take open source software, modify it on their servers, and serve it to users without sharing their modifications back to the community.
The AGPL license was created to close this loophole: anyone offering the software as a network service must make the source code, including their modifications, available to the users of that service. But the AGPL never really succeeded. Different variations emerged, the problem persisted, and yet SaaS didn’t kill open source as feared.
Today, by most industry estimates, roughly 90% of the world’s software contains open source components. We didn’t solve the SaaS challenge perfectly, but open source adapted and thrived anyway.
The Challenge Ahead
Open source AI faces similar adaptation challenges. We might not get the definition right immediately, and that’s okay—but we need to stay flexible and understand our goals.
The EU AI Act creates additional complexity by carving out exceptions for open source AI while simultaneously restricting what high-risk AI systems can be used for. This creates a tension: open source licenses traditionally place no restrictions on how software may be used (that’s a core freedom), but AI regulation does impose such restrictions.
We’re trying to define what AI is, how it can be used, and what responsibilities come with it—all while still figuring out what open source AI means.
Moving Forward Together
The Open Source Initiative’s process of defining open source AI generated significant controversy and disagreement. That controversy has value: the arguments push us to share ideas and consider different perspectives.
But more importantly, we need to clarify our goals for open source AI. What are we really trying to accomplish? Is it:
- Ensuring we can trust AI systems?
- Protecting privacy in AI training data?
- Knowing what data was used for training?
- Understanding if our data was included?
- Creating collaborative models like we have in open source software?
- Something else entirely?
What’s Your Definition?
The conversation about open source AI is far from over. We need continued dialogue about what open source AI should be and why it matters.
What’s your definition of open source AI? What aspects are most important to you? How do you think we should balance transparency, accessibility, collaboration, and practical implementation?
These questions don’t have easy answers, but they’re worth wrestling with as we shape the future of AI development.