Europeans are racing to create their own artificial intelligence chatbots to stop U.S.-made tech from gobbling up their economies, culture and even languages themselves.
From Madrid to Sofia, European Union countries have launched and supported a flurry of initiatives aimed at creating chatbots that are truly fluent in local languages.
The latest AI technology powering tools like the popular ChatGPT chatbot hinges on “large-language models” or LLMs — systems capable of eerily human-like conversation. Language is at the core of these innovations, and the EU — a Tower of Babel with 24 official languages, from Lithuanian to Maltese — wants the booming tech to click with its own cultural content and quirks.
“Mark Twain should not erase Stendhal,” France’s Economy Minister Bruno Le Maire said at a tech event in Cannes in February. “We don’t want to settle just for English … Going ahead, we don’t want our language to be weakened by algorithms and AI systems.”
The United States leads the current wave of innovations. The country is home to ChatGPT maker OpenAI and its large backer, Microsoft, as well as Google with its Gemini model. Anthropic, Meta and Elon Musk’s xAI are also in the race to build leading models.
The speed of the U.S. industry has made European governments anxious. They fear a repeat of the dominance that American firms had in the age of social media and Web 2.0.
From academic ventures to government-sponsored masterplans to startups and hardscrabble teams of independent coders, the Continent is putting up a fight against the Californian behemoths. In the last year alone, 13 European countries have announced or taken steps to develop local models focused on their local languages, POLITICO research has found.
Most of the existing or developing projects are open-source, in a bid to make up for the computing and funding gaps with the U.S. by relying on a vast community of volunteer developers.
With the bustle comes the hope of creating a vibrant local AI economy.
“Having models in the local language is also about encouraging more people in your country to code and develop more AI products,” said Carlos Romero Duplá, a former Spanish diplomat who negotiated the EU’s AI law and is now a Brussels-based consultant with Vinces. “It fosters a whole tech ecosystem.”
For some countries, like Spain, own-language models could help boost their clout in culturally and historically connected parts of the world. Madrid, which is funding the creation of an LLM able to speak Spanish based on a corpus of high-quality Spanish content for AI training, sees the emerging technology as an area for closer cooperation with Ibero-American countries.
The scramble for own-language LLMs comes as the cultural industry is in a fierce — and, some say, existential — fight with tech companies over cultural content including film scripts, media archives and even the copyright over musical artists’ voice imprints.
In recent months, OpenAI has been busy cutting deals with international media brands such as Axel Springer, which owns German-language outlets Bild and Welt as well as POLITICO, and French daily Le Monde, building a trove of high-quality training content in foreign languages.
The maneuver set off alarm bells in France. In his Cannes speech, Le Maire pitched the creation of a price-controlled European single market for training data to prevent deep-pocketed U.S. tech giants from outbidding European AI companies for access to every last scrap of valuable content.
France has also spearheaded the creation of Alt-EDIC, a 12-country EU consortium devoted to intra-bloc collaboration on developing LLMs in European languages.
Lost in translation
Ironically, to be truly competitive, European LLMs will still need to be fluent in English — which remains the language of most of the world’s scientific papers, and just over half of the pages on the world wide web, according to online surveys outfit W3Techs.
“There’s a power imbalance in terms of the amount and quality of training data: just look at how large English Wikipedia is compared to its versions in other languages,” said Sebastian Ruder, a research scientist at Canada-based multilingual AI company Cohere.
Some U.S.-made LLMs are conversant in languages other than English, but they do not always have the proficiency and nuance needed to serve local users well.
“You need, for instance, to get the right level of politeness,” Ruder said. Think of teaching a chatbot to use the polite pronoun “vous” instead of the informal “tu” to avoid miffing an elderly French user.
For chatbots designed to interact in whole conversations with everyone from a country’s citizens to a company’s clients, that can create problems. An August 2023 “cultural alignment” assessment by researchers at University College London found OpenAI’s and Google’s LLMs to be out of whack with cultural norms in countries including China, Saudi Arabia and Slovakia — while acing tests for adherence to U.S. mores.
As AI becomes entrenched in every aspect of our societies, the impact of such cultural clashes could be significant. Kris Shrishak, a technology fellow at the Irish Council for Civil Liberties, said, “A U.S. tech company can train its model in, say, Lithuanian, but that’s loss-making. So it’d usually train it in English and then do some finetuning.”
The solution, according to Ruder, is for European AI developers to train their bots in both their language and English, thus allowing the LLM to tap into English-encoded knowledge when speaking its native tongue.