Exploring AI Translation: Can ChatGPT Translate Languages Well?

Creating website content, writing software code, or fixing grammar errors – ChatGPT is the Swiss Army knife of AI-powered language technologies. But just how good is the current generation of generative AI at translating languages?

We spoke to three AI experts from Milengo to explore the burning question many have asked: how well can ChatGPT (and other large language models (LLMs)) translate languages? 

The experts who shared their insights were:

  • Stephan Wolschon, Head of Engineering and Software Development
  • Sarita Vasquez, Machine Translation Specialist, and
  • Matt Evans, Linguistic Product Owner and Prompt Engineering Expert

How good are GPT-4 and other large language models (LLMs) at translation? And what are their limitations?

Sarita: From my testing of whether ChatGPT can translate languages, I’ve found that ChatGPT and other LLMs can produce translations that are highly fluent and grammatically correct. Current research confirms this as well. However, the results vary by language, as outputs always depend on how much training data was available to the LLM. In particular, we’ve seen that these systems excel in English-centric language pairs, such as English-Spanish, English-French, and English-German.

One common limitation we’ve seen during our tests is that if the training data is limited or biased, it can lead to inaccuracies and errors in translations. This is especially true for languages with less available data. 

Stephan: LLMs are indeed a great asset to speed up translation processes. But LLMs are also known to occasionally surprise users with “hallucinations” or unwanted “creative” decisions, which are not desired in a translation.  

This is especially true for short segments of text, where the machine might lack context and add random explanations or additional information to complete the picture.  

As with other LLM use cases, problems can also arise when prompts are not specific enough or lack important instructions. For example, a simple prompt like “Please translate this text” is not always sufficient. For a good outcome, users need to specify their desired output, such as “Please translate this text into Mandarin with a formal tone for the general public in Singapore.” 
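The difference between a vague and a specific prompt can be made concrete in code. The following is a minimal sketch in Python; the helper function and its parameters are our own illustration, not part of any official API:

```python
def build_translation_prompt(text, target_language, tone=None, audience=None):
    """Assemble a translation prompt, adding tone and audience only when given."""
    prompt = f"Please translate this text into {target_language}"
    if tone:
        prompt += f" with a {tone} tone"
    if audience:
        prompt += f" for {audience}"
    return f"{prompt}:\n\n{text}"

# Vague prompt: leaves tone and audience to the model's guesswork
vague = build_translation_prompt("Welcome to our store!", "Mandarin")

# Specific prompt: mirrors the example from the interview
specific = build_translation_prompt(
    "Welcome to our store!",
    "Mandarin",
    tone="formal",
    audience="the general public in Singapore",
)
```

Parameterizing the prompt this way also makes it easy to keep tone and audience consistent across an entire batch of texts, rather than retyping them by hand.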

With such limitations present, it is advisable to professionally revise and post-edit texts to achieve the best ChatGPT translation quality. 

ChatGPT vs Google Translate: How do LLMs compare to “classic” machine translation (MT) services? 

Sarita: At present, neural machine translation engines, such as Google Translate, still tend to produce a higher translation quality and more accurate translations than LLMs. This is because traditional machine translation models are trained on bilingual translation corpora – this means they are specifically engineered for translation tasks. The training of ChatGPT, on the other hand, is based on scraped monolingual web data, which means the reference data has a lower linguistic quality standard as it has not been curated. 

One area where LLMs outshine classic MT is in capturing context and generating more natural-sounding translations for longer passages of text. For this reason, they are perfectly suited for content that requires less accuracy, but an appealing style – let’s say a marketing text instead of a technical manual. 

It is also worth mentioning that LLMs such as ChatGPT translate rare languages better than MT services like DeepL, but perform worse in languages with complex word structures. The reason here is that MT engines are optimized for specific languages and therefore handle languages with complex grammar or morphology more effectively. 

Speed is another domain where we see a major difference between the two technologies. LLMs are much slower at producing translations than a machine translation engine. It might take an hour for an LLM to process a large volume of text that an MT engine can complete in a fraction of this time. This could be an issue for enterprises that want to scale their localization efforts in the future.

How can companies implement ChatGPT for translation on a large scale? 

Stephan: Using ChatGPT to translate languages as an individual is straightforward: Copy and paste your text into the ChatGPT console, ask for a translation and get the result.

However, the application of LLMs for translation on an enterprise scale requires special tools and workflows to ensure details such as company terminology and brand voice are correctly captured. This requires a dedicated setup consisting of localization managers and a translation management system (TMS) with the right features, plug-ins, integrations, and automations.
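One of the details Stephan mentions, enforcing company terminology, can be sketched at the prompt level. The glossary format and function below are illustrative assumptions, not features of any specific TMS or plugin:

```python
# Hypothetical company glossary: source term -> required target term
GLOSSARY = {"dashboard": "Dashboard", "workspace": "Arbeitsbereich"}

def segment_prompt(segment: str, target_language: str, glossary: dict) -> str:
    """Build a per-segment translation prompt that injects only the
    glossary entries actually present in the segment."""
    relevant = {s: t for s, t in glossary.items() if s.lower() in segment.lower()}
    rules = "\n".join(f'- Translate "{s}" as "{t}"' for s, t in relevant.items())
    prompt = f"Translate into {target_language}, preserving all placeholders:\n{segment}"
    if rules:
        prompt = f"Use this terminology:\n{rules}\n\n{prompt}"
    return prompt

# A TMS would typically iterate over segments like this before calling an LLM
prompts = [segment_prompt(s, "German", GLOSSARY)
           for s in ["Open your workspace.", "Click Save."]]
```

Filtering the glossary per segment keeps each prompt short, which matters when translating thousands of segments in a batch.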

Luckily, there are options available that build on existing localization tools and providers. Custom.MT offers LLM integration via a memoQ plugin, and there are other providers with no-code/low-code translation options, such as Localize, Crowdin, and memoQ AGT.

Still, custom development might be necessary – which involves a high level of in-house localization know-how, specialized resources, and considerable time to build such a complex setup. That’s where Milengo comes in, offering enterprises that want to scale quickly a turnkey solution that combines the power of LLMs with first-class localization expertise. 

Tell us about the latest project you’re working on with LLMs. 

Stephan: We are constantly evaluating new use cases with LLMs to make localization even easier and more affordable for our clients. As Head of Engineering at Milengo, I mainly focus on how LLMs can help us improve clients’ localization workflows and processes. For example, we develop prompt templates or tool integrations for LLMs that respond to current client requests or needs. 

Matt: With my background and experience as a translator, I have been exploring how we can use ChatGPT to boost translation quality. Some examples include improving source texts, ensuring gender-neutral phrasing, and quality assurance in general.

Our main goal here is not to eliminate human linguists from the process but to reduce the amount of effort on the linguists’ side. It’s ultimately about delivering an even higher translation quality to our clients with a more flexible and customized range of translation products. 

Sarita: … and of course, we are also looking at how to use LLMs for translation itself! Due to the high-quality expectations of our clients, we need to thoroughly analyze which language pairs and text types ChatGPT’s translation quality holds up to, and which ones still need more time.  

We investigate not only well-performing language pairs such as English into German, but also language pairs where machine translation underperforms, such as Hebrew, Vietnamese, Thai, Finnish, and Estonian.

As Matt mentioned earlier, it’s important to stress here that these solutions we are exploring serve only to complement the work of our human linguists, not completely replace them. 

Matt, can you give us some insight into how prompt engineering for translation works? 

Matt: Oof, how long do you have? There are so many different strategies when it comes to prompt engineering. But it really all depends on what set of tools you’re using, and what languages you’re translating into (and out of)… there are plenty of variables to consider! 

A system description like “act as a professional marketing translator” can help ChatGPT better understand the context of a translation. For some tasks, you might also need to break down your main prompt into more granular instructions. For instance, whether the reader should be addressed informally or formally.  

Technical aspects of your translated file format might also become relevant to your prompt. This includes things such as ensuring correct formatting, putting the file back into the right format, or making sure tags are handled correctly. Sometimes, you need to tell ChatGPT what NOT to do (negative prompts), such as adding comments in a translation. 
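The layered prompting Matt describes – a system role for context, granular instructions, and negative prompts – can be sketched as the chat-style message list that most LLM chat interfaces accept. The specific instructions below are illustrative examples, not a prescribed recipe:

```python
def build_messages(text: str, target_language: str) -> list:
    """Build a chat message list: the system role sets the translator persona,
    the user message carries granular and negative instructions plus the text."""
    system = "Act as a professional marketing translator."
    instructions = "\n".join([
        f"Translate the following text into {target_language}.",
        "Address the reader informally.",                       # granular instruction
        "Keep all HTML tags and placeholders exactly as they are.",  # format handling
        "Do not add comments, explanations, or notes.",         # negative prompt
    ])
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"{instructions}\n\n{text}"},
    ]

messages = build_messages("<b>Sign up</b> today!", "German")
```

Keeping the persona in the system message and the task details in the user message makes it easy to reuse the same persona across many different translation jobs.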

It’s also important to understand how ChatGPT “perceives” certain words and concepts. We learned a lot by simply asking it why it did something in a certain way – you’d be surprised at how it “views” the world! 

What kind of advancements can we still expect from ChatGPT and other LLMs in the future? 

Stephan: So far, the development of LLMs has focused on improving the quality of results by building ever-larger models. However, we might see LLMs hit a saturation point soon, as they are beginning to exhaust all their available training data. This may result in slower evolution.

What we have started to see is a new trend toward small language models (SLMs) and specialized models for particular domains or tasks. These smaller models need fewer resources and can even run on a phone. They can be developed by individual companies on their own, for example, based on the Llama family by Meta. SLMs can be customized and trained more easily, although they might still require dedicated infrastructure. 

Sarita: Multi-modal LLMs have also been a big focus recently. In fact, OpenAI just announced some really innovative multi-modal features in GPT-4o. These models don’t just handle text inputs and outputs, but images and audio too! Such abilities might prove useful for speech-to-speech translations, or for translating and laying out documents in one step. Currently, most commercial use cases we’ve seen are still in an early stage, apart from uses like voice cloning for video localization.  

How do you think LLMs such as GPT-4, Gemini, and Meta AI will impact the future of the localization industry? 

Stephan: The localization and translation industry is still in the process of testing LLM capabilities and implementing them in production. We have therefore not seen all the possible applications of this technology yet. 

I believe that LLMs will impact the industry in a major way, just as neural machine translation (NMT) did before. It will change the overall possibilities, client expectations, industry pricing and vendor base. As such, I believe LLMs will become a basic technology for many localization products and tools. 

In the medium term, I foresee both technologies will likely co-exist since NMT is still cheaper, faster, and more accurate. However, it is not built for text adaptation. LLMs, on the other hand, allow for adjustments of texts to improve the understanding of the content and context. A marriage of both technologies could combine these benefits. 

What is the one thing that fascinates you the most about LLM technology?

Stephan: What fascinates me is how quickly and naturally we interact with LLMs, as if with a human, and expect a similar reply: we trust them to understand our needs, grasp the context, follow a conversation, and react accordingly. 

In fact, the most recent version of ChatGPT has passed versions of the Turing test, with judges unable to reliably distinguish its responses from a human’s. We sometimes forget that we are dealing with a piece of software and treat ChatGPT like a human chat partner. We write in a polite manner, and ChatGPT “apologizes” if it cannot or does not deliver the desired result. 

Matt: It’s incredible that the technology has reached that point. What fascinates me the most is seeing what’s next! We’re always finding new ways to use LLMs, and new updates continuously expand those options. 

Sarita: Great points, Matt and Stephan! That sums it up perfectly for me as well. 

The verdict: Can ChatGPT translate languages well? 

What we’ve learned from our experts is that ChatGPT can translate languages. It already produces high-quality translations for English-centric content, and often does better with rare languages than classic machine translation engines. The further value-add of LLMs is their ability to simultaneously make workflows more efficient and improve translation quality.

While LLMs might produce more fluent translations for longer passages of text than classic machine translation, they are much slower to generate translations. This can create bottlenecks for companies looking to scale up the volume of content they translate using LLMs. 

Even with all their merits, LLM translations still warrant caution, as they sometimes “hallucinate” and their outputs depend on prompt quality. This means that users wanting the best ChatGPT translation quality should still complete the translation process with post-editing. All in all, the industry will likely use LLMs in parallel with MT for the medium term. 

Want your translation workflows to benefit from AI and LLMs? Whether you want to integrate LLMs into your existing localization workflow or develop a new LLM-driven translation process for your company, Milengo’s localization experts are here to help!  

Johannes Rahm


A seasoned translator, copywriter and multilingual SEO expert with over a decade of experience. Johannes specializes in high-value B2B marketing content for the DACH market, serving leading companies in the software, IT, and elearning industries. As an avid reader of science-fiction literature, he still regards human language to be our most mind-blowing technology and loves to explore its power to engage, inspire and connect people and organizations.