The future of Artificial Intelligence is Open-source! Here’s why.


Developments in open-source artificial intelligence (AI) have accelerated massively in the last three months. So it stands to reason that the future of AI is open source.

The future of Artificial Intelligence is Open-source!
Generated with Bing DALL-E

There are several reasons why this makes sense. One reason is that AI is increasingly penetrating our everyday lives.

“AI concerns us all, so access to the code should be available to everyone!” - Tinz Twins

That’s the open-source idea! Access to the code of AI models is essential for the security of humanity. External experts and researchers can identify and warn about vulnerabilities. So we can minimize the risks of AI. We must not leave AI development exclusively to big technology companies, who keep their know-how to themselves.

Governments need to push the open-source idea so that big technology companies don’t hold all the power over AI technology. The EU’s AI Act aims to push open-source solutions. Paradoxically, at least one big company could benefit from the open-source idea. We explain how in this article. Stay curious!

First, this article reviews current developments and outlines the advantages and challenges of open-source AI development. The focus of the article is on Large Language Models (LLMs). Furthermore, it is essential to highlight the strategies of big technology companies, as they are primarily involved in the development of new AI models.

Let’s first look at current developments!

AI and Open Source — Current developments

In this section, we will go through the rapid development of AI in the open-source sector step by step (based on the leaked Google document “We Have No Moat, And Neither Does OpenAI”).

Open-source LLMs timeline (Image by authors)

It all started in February 2023:

February 24, 2023

Meta launches LLaMA and releases the source code to everyone, but not the weights. LLaMA’s model weights are only available to academics and researchers on a case-by-case basis. It is a relatively small model (available with 7B, 13B, 33B and 65B parameters) that has been trained for a long time and is quite powerful relative to its size.

March 03, 2023

A week later, LLaMA’s weights were leaked on 4chan. This leak turned out to be a benefit for the open-source community. Researchers around the world were suddenly able to experiment with the model. That is a great driver of innovation.

March 13, 2023

Stanford releases Alpaca, which adds instruction tuning to LLaMA. Next, Eric J. Wang’s alpaca-lora repo applies low-rank fine-tuning (LoRA) to this training. Instead of updating all model weights, LoRA trains only a small set of additional low-rank weights. The training could be done “within hours on a single RTX 4090”. Now anyone could fine-tune the model at low cost. Also, the low-rank updates can be distributed separately from the original weights, which makes them independent of Meta’s original license (no commercial use). From now on, anyone can distribute and use these updates.
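To give a feeling for why low-rank fine-tuning is so cheap, here is a minimal sketch in plain Python. It is not the alpaca-lora implementation. The layer size (4096 x 4096), the rank (8) and all function names are our own illustrative assumptions. The idea: the original weight matrix W stays frozen, and only two small matrices A and B are trained, so that the layer computes W·x + B·(A·x).

```python
# Minimal sketch of the low-rank idea (illustrative, not the alpaca-lora code).
# All names and sizes below are assumptions chosen for this example.

def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for a low-rank update B (d_out x rank) times A (rank x d_in)."""
    return rank * (d_out + d_in)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute y = W x + scale * B (A x). W is frozen; only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# Parameter count for one assumed 4096 x 4096 layer with rank 8
full = full_finetune_params(4096, 4096)
lora = lora_params(4096, 4096, 8)
reduction = full // lora  # how many times fewer parameters are trained

# Tiny forward pass with a rank-1 update
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weights (2 x 2)
A = [[1.0, 1.0]]              # trainable, 1 x 2
B = [[0.5], [0.0]]            # trainable, 2 x 1
y = lora_forward(W, A, B, [2.0, 3.0])
```

With these assumed numbers, the low-rank update trains 65,536 parameters instead of 16,777,216 for that layer, which is 256 times fewer. Because A and B are stored separately from W, they can also be shared without sharing the original weights.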

March 19, 2023

A cross-university collaboration publishes Vicuna. The team fine-tuned the model on conversations shared on the ShareGPT website and used GPT-4-based evaluations to make qualitative comparisons of model outputs.

March 25, 2023

Nomic creates GPT4All, which is both a model and an ecosystem that bundles a collection of different LLMs. You can use GPT4All in your Python projects, or run it like ChatGPT on your own computer. There are installers for Windows, macOS and Ubuntu. In comparison to ChatGPT, GPT4All is a free-to-use, locally running and privacy-aware chatbot. And the best part: you need neither a GPU nor an internet connection to run it.

March 2023 was full of rapid developments in open source. Further publications on multimodal training and models such as Koala followed. Based on these developments, anyone can fine-tune and use LLMs at low cost on consumer hardware. At this point, it should be clear that the open-source community is on its way to challenging technology companies like OpenAI, Microsoft and Google.




But what do the big technology companies think about the current open-source developments?

The positions of the big tech companies

Big technology companies are pursuing different strategies concerning the disclosure of AI models. Here are the current positions:

OpenAI and Microsoft

Microsoft and OpenAI work very closely together in the development of AI models. For this reason, many Microsoft products already contain modified versions of the OpenAI models. One example is Microsoft’s search engine, Bing. Bing provides a chat function based on the current ChatGPT version. In addition, Bing Chat contains an improved version of the image generator DALL-E 2. In our tests, the image generator achieved significantly better results than the implementation on the OpenAI website.

OpenAI and Microsoft are opposed to making implementations of large AI models freely available. Instead, they want to integrate the models into products and earn money with them. It is also understandable that they do not want to make the company’s know-how freely available. However, the question remains: what risks can arise if we cannot evaluate the models independently?

In addition, both companies are under pressure from the rapid development in the open-source sector. By the end of the year, open-source models could reach the performance of the commercial LLMs. Who will pay for a model when a comparable one is free to use?

Google

Google takes a similar position to Microsoft and OpenAI. Currently, it is keeping the implementation of PaLM (the basis of Google Bard) closed. You might think that Google feels threatened above all by OpenAI. But that is not the case. A few months ago, an internal document was leaked. In this document, a Google researcher says:

“We (Google) Have No Moat, And Neither Does OpenAI.” (by a Google researcher)

The researcher talks about open-source AI outpacing Google and OpenAI in the long run. One reason is the rapid development in the open-source sector. The researcher also believes that the constant exchange of employees among big companies will lead to a loss of know-how. For this reason, Google should participate more in the development of open-source AI.

Apple

Apple has kept a low profile in recent months compared with the efforts of Microsoft and Google. At this year’s WWDC, Apple never used the term “artificial intelligence”. Instead, Apple always talks about machine learning. It seems like Apple doesn’t want to offer products like ChatGPT. Instead, it integrates machine learning into its products in a meaningful way. The user should get added value through AI without noticing that an AI is involved. Apple appears unimpressed by the current AI hype. Apple does not give access to its models, as the models are part of the individual products.

Meta

Meta is the company that has massively influenced the AI race in the open-source community. The release of LLaMA gave the open-source community access to a powerful LLM. The leak of the model’s weights eventually triggered the rapid development.

Meta is generally much more open when it comes to open-source AI. Meta chief AI scientist Yann LeCun says in an interview with the New York Times:

“The platform that will win will be the open one.” (by Yann LeCun)

He also sees the increasing secrecy at Google and OpenAI as a “big mistake” and a “really bad view of what’s happening”.

We should remember that Meta’s PyTorch is open source. PyTorch is a widely used framework for implementing AI models, and OpenAI also builds its models with it.

The swarm intelligence of the open-source community has accelerated the development of LLMs. Meta can use this open-source know-how for its own products because the community’s work builds on Meta’s implementation. Recently, Meta announced that it will integrate generative AI tools into WhatsApp and Instagram.

The future of open-source AI

In the field of image generation, OpenAI’s plans have already been thwarted. When OpenAI introduced DALL-E 2, it expected a huge success. But no such luck! An open-source alternative came onto the market: Stable Diffusion. There are countless applications on the internet based on Stable Diffusion. We gave an overview of headshot tools based on Stable Diffusion in a previous article. The applications produce impressive results. Stable Diffusion enables small businesses to create their own applications.

“Open Source AI empowers small businesses.” - Tinz Twins

This example shows the opportunities that can arise for small businesses. In addition, we can minimize risks through open-source AI models. The models can be studied and adapted by researchers around the world. In addition, private AI models that meet privacy requirements become possible. Current open-source LLMs cannot yet reach the performance of commercial models. However, it is conceivable that open-source models will have similar performance by the end of the year. As an entrepreneur, you should have the following in mind:

  • Open-source AI is becoming more powerful.
  • You can integrate open-source AI into your products for free.
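  • Fewer data privacy problems, because the models can run locally.
  • No GPU cluster is necessary: fine-tuning works on a single consumer GPU, and some models run without any GPU.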
  • No dependence on big companies.

The rapid development of open-source AI has shown that the swarm intelligence of the open-source community is the biggest threat to big AI companies. Big companies should disclose the essential things around AI because AI concerns us all.




Thanks so much for reading. Have a great day!
