A battle rages over the definition of open source AI

Open source software, in which a developer releases the source code of a product and allows anyone to reuse and remix it as they wish, is the basis of Google’s Android and the four largest web browsers. The encryption of a WhatsApp chat, the compression of a Spotify stream, and the format of a saved screenshot are all handled by open source code.

Although the open source movement has its roots in the post-hippie utopianism of 1980s California, it thrives today in part because its philosophy is not entirely altruistic. Making software freely available lets developers get help strengthening their code, prove its reliability, earn the applause of their peers and, in some cases, make money by selling support to those who use the products for free.

Several model-makers in the world of artificial intelligence (AI), including Meta, a social media giant, wish to claim this open source tradition for their suites of powerful products. They hope to rally hobbyists and startups into a force capable of competing with billion-dollar laboratories, while burnishing their own reputations.

Unfortunately for them, guidelines released last week by the Open Source Initiative (OSI), an American non-profit organization, suggest that the tech giants’ use of the term has become meaningless. Hemmed in with restrictions and developed in secret, these free products will never be able to fuel a true wave of innovation unless something changes, says the OSI. It is the latest salvo in a lively debate: what does open source really mean in the age of AI?

In traditional software, the term is well defined. A developer will make available the original lines of code used to write software. In doing so, they will give up most of their rights: any other developer can download the code and modify it as they see fit. Often, the original developer adds a so-called “copyleft” license, requiring that the modified version be shared in turn. Eventually, the original code can evolve into an entirely new product. The Android operating system, for example, is the descendant of Linux, originally written for use on personal computers.

Following this tradition, Meta, an American technology giant, proudly claims that its large language model (LLM), Llama 3, is “open source”, sharing the finished product for free with anyone who wants to build on it. However, the company also places restrictions on its use, including prohibiting the model from being used to create products with more than 700 million monthly active users. Other laboratories, from France’s Mistral to China’s Alibaba, have also released LLMs for free use, but with similar constraints.

What Meta shares freely – the weights of the connections between the artificial neurons in its LLM, rather than all the source code and data that went into its creation – is certainly not enough for someone to build their own version of Llama 3 from scratch, as open source purists would normally demand. This is because training an AI is very different from normal software development. Engineers gather the data and build a rough draft of the model, but the system assembles itself, processing the training data and updating its own structure until it reaches acceptable performance.

Since each training step changes the model in fundamentally unpredictable ways that only converge on a correct solution over time, a model trained using the same data, code, and hardware as Llama 3 would be very similar to the original, but not identical. This negates some of the supposed benefits of the open source approach: inspect the code as much as you want, but you can never be sure that what you are running is the same thing the company offers.

Other obstacles also stand in the way of truly open source AI. Training a “frontier” AI model on par with the latest releases from OpenAI or its peers, for example, costs at least $1 billion, deterring those who have spent such sums from letting others benefit for free. In the wrong hands, the most powerful models could teach users to make biological weapons or create unlimited images of child abuse. Locking their models behind a carefully restricted access point allows AI labs to control what can be asked of them and the ways in which they are allowed to respond.

Open and close

The complexity of the issue has led to disputes over what exactly “open source AI” should mean. “There are a lot of different people who have different ideas about what [open source] is,” says Rob Sherman, Meta’s vice-president for policy. The stakes in this debate go beyond principle, because those who tinker with open source today could become the industry giants of tomorrow.

In a recent report, the OSI did its best to define the term. It argued that to earn the label, AI systems must offer “four freedoms”: they should be free to use, study, modify and share. Instead of requiring full publication of training data, it asked only that labs describe the data in sufficient detail to allow the construction of a “substantially equivalent” system. In any case, sharing all of a model’s training data would not always be desirable: it would, for example, prevent the creation of open source medical AI tools, since health records belong to patients and cannot be shared without restriction.

For those building on Llama 3, the question of whether or not it can be called open source matters less than the fact that no other major lab has been as generous as Meta. Vincent Weisser, founder of Prime Intellect, an AI lab based in San Francisco, would prefer the model to be made “fully open across all dimensions”, but he still believes Meta’s approach will have positive long-term effects, leading to cheaper access for end users and increased competition. Since Llama was first released, enthusiasts have shrunk it down to a size small enough to run on a phone; built specialized hardware chips capable of running it at blazing speed; and repurposed it for military use as part of a Chinese army project, proving that the drawbacks, too, are more than theoretical.

Not everyone is likely to be so forgiving. From a legal perspective, using true open source software should be “frictionless”, says Ben Maling, a patent expert at EIP, a law firm in London. Once lawyers are needed to parse the details and consequences of each individual restriction, the engineering freedom on which technological innovation relies disappears. Companies such as Getty Images and Adobe have already backed away from using certain AI products for fear of accidentally violating the terms of their licenses.

Precisely how open source AI is defined will have broad implications. Just as wineries live or die by whether they can call their products champagne or mere sparkling wine, an open source label can prove essential to a technology company’s future. According to Mark Surman, president of Mozilla, an open source foundation, a country that lacks a national AI champion may want to support the open source industry as a counterweight to American dominance. The European Union’s AI law, for example, already contains carve-outs that relax testing requirements for open source models, and other regulators around the world will likely follow suit. As governments seek to establish strict controls on how AI can be built and operated, they will be forced to decide: do they want to ban tinkerers from the space, or free them from costly burdens?

For now, the closed laboratories are optimistic. Even Llama 3, the most capable of the near-open source contenders, has only just caught up with models released by OpenAI, Anthropic, and Google. An official at a major lab told The Economist that the economics involved make this state of affairs inevitable. Although releasing a powerful model for free allows Meta to undercut its competitors’ businesses without disrupting its own, the lack of direct revenue also limits its appetite for spending the sums needed to be a leader rather than a fast follower. Freedom is rarely truly free.

© 2024, The Economist Newspaper Ltd. All rights reserved. Taken from The Economist, published under license. Original content can be viewed at www.economist.com