Google recently announced that it is adding 110 new languages to Google Translate as part of its 1000 Languages Initiative, launched in 2022. Together with the 24 languages added in 2022, these 110 new languages bring the total to 243 supported languages. This rapid expansion is made possible by zero-shot machine translation, a technology in which machine learning models learn to translate into another language without ever having seen prior examples. In what follows, we will explore whether this advancement can become the ultimate solution to the challenges of machine translation, and what it will take to get there. But first, the backstory.
What was it like before?
Statistical Machine Translation (SMT)
This is the method Google Translate used originally. It relied on statistical models, analyzing large parallel corpora (collections of aligned sentence translations) to determine the most likely translation. As an intermediate step, the system first translated the text into English and then into the target language, cross-referencing phrases against vast datasets drawn from United Nations and European Parliament records. This differed from earlier approaches, which required compiling exhaustive grammar rules: the statistical approach could adapt and learn from data rather than relying on static, hand-crafted linguistic frameworks.
However, this approach had drawbacks. Google Translate's SMT employed phrase-based translation, where the system breaks a sentence into phrases and translates them separately. Although this was an improvement over word-for-word translation, it still produced awkward phrasing and contextual errors; the system simply doesn't understand nuance the way we do. SMT also relies heavily on parallel corpora, so relatively rare languages were hard to translate well because there wasn't enough parallel data.
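To make the phrase-based, English-pivot idea concrete, here is a minimal sketch with a tiny invented phrase table; the phrases, probabilities, and greedy decoder are illustrative assumptions, not Google's actual system, which learned millions of phrase pairs and combined them with language models and reordering.

```python
# A toy phrase-based SMT decoder with an English pivot.
# The phrase tables and probabilities below are invented for illustration.

# Source (Spanish) -> English phrase table: candidate phrases with probabilities.
es_to_en = {
    "la casa": [("the house", 0.7), ("the home", 0.3)],
    "es grande": [("is big", 0.6), ("is large", 0.4)],
}

# English -> target (German) phrase table.
en_to_de = {
    "the house": [("das Haus", 0.8), ("dem Haus", 0.2)],
    "the home": [("das Zuhause", 0.9), ("die Heimat", 0.1)],
    "is big": [("ist groß", 0.9), ("ist gross", 0.1)],
    "is large": [("ist groß", 0.7), ("ist riesig", 0.3)],
}

def best(candidates):
    """Pick the most probable candidate from a phrase-table entry."""
    return max(candidates, key=lambda pair: pair[1])[0]

def translate_via_english(phrases):
    """Translate each phrase independently, pivoting through English.
    Because phrases are handled in isolation, the output can end up
    stilted or contextually wrong -- the core weakness of phrase-based SMT."""
    return " ".join(best(en_to_de[best(es_to_en[p])]) for p in phrases)

print(translate_via_english(["la casa", "es grande"]))  # -> "das Haus ist groß"
```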
Neural Machine Translation (NMT)
In 2016, Google switched to neural machine translation, which uses deep learning models to translate entire sentences at once, producing more fluent and accurate output. NMT works a bit like a sophisticated multilingual assistant inside your computer: using a sequence-to-sequence (seq2seq) architecture, it first processes a sentence in one language to capture its meaning, then generates the corresponding sentence in the other language. Like SMT, it is trained on huge parallel datasets, but where SMT focused on phrase-based translation and required considerable manual effort to develop and maintain language rules and dictionaries, NMT processes entire sequences of words end to end, which lets it capture the subtle context of a language far more effectively. The result is better-quality translations across a wide range of language pairs, often approaching the fluency and accuracy of human translators.
Traditional NMT models used recurrent neural networks (RNNs) as their core architecture, because RNNs are designed to process sequential data: they maintain a hidden state that is updated each time a new input (a word or token) is read. This hidden state acts as a kind of memory that captures the context of previous inputs, allowing the model to learn dependencies over time. However, RNNs are computationally expensive and hard to parallelize effectively, which limits their scalability.
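To see why this design is hard to parallelize, here is a minimal sketch of an RNN encoder's hidden-state update; the vocabulary, dimensions, and random weights are invented for illustration, and a real seq2seq model would pair this encoder with a trained decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "house": 1, "is": 2, "big": 3}
embed_dim, hidden_dim = 8, 16

E = rng.normal(size=(len(vocab), embed_dim))            # word embeddings
W_xh = rng.normal(size=(embed_dim, hidden_dim)) * 0.1   # input -> hidden
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1  # hidden -> hidden
b_h = np.zeros(hidden_dim)

def encode(tokens):
    """Read tokens one at a time, carrying a hidden state forward.
    The final hidden state summarizes the whole sentence and would be
    handed to a decoder RNN to generate the target sentence."""
    h = np.zeros(hidden_dim)
    for tok in tokens:
        x = E[vocab[tok]]
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)  # step t needs step t-1: inherently sequential
    return h

sentence_vector = encode(["the", "house", "is", "big"])
print(sentence_vector.shape)  # (16,)
```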
Introducing the Transformers
In 2017, Google Research published the paper "Attention Is All You Need," which introduced the Transformer to the world and marked a significant shift in neural network architecture away from RNNs.
Transformers rely solely on an attention mechanism, specifically self-attention, which lets neural machine translation models focus selectively on the most relevant parts of an input sequence. Unlike RNNs, which process the words of a sentence one after another, self-attention evaluates every token in the text and determines which other tokens matter for understanding its context. By computing attention over all words simultaneously, Transformers can capture both short-range and long-range dependencies without relying on recurrent connections or convolutional filters.
Thus, by eliminating recurrence, Transformers offer several important advantages:
- Parallelizability: Attention mechanisms can be computed in parallel across different segments of a sequence, speeding up training on modern hardware such as GPUs.
- Training efficiency: Self-attention also significantly reduces training time compared with traditional RNN- or CNN-based models, leading to better performance on tasks such as machine translation.
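To make the mechanism and its parallelism concrete, here is a minimal sketch of scaled dot-product self-attention over a toy four-token sentence; the dimensions and random weights are illustrative assumptions, and real Transformers add multiple heads, positional encodings, and feed-forward layers.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                        # four tokens, eight-dimensional embeddings

X = rng.normal(size=(seq_len, d_model))        # token embeddings for one sentence
W_q = rng.normal(size=(d_model, d_model)) * 0.1
W_k = rng.normal(size=(d_model, d_model)) * 0.1
W_v = rng.normal(size=(d_model, d_model)) * 0.1

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Queries, keys, and values for every token are computed in one matrix
# multiplication each -- nothing depends on processing tokens in order,
# which is what makes the computation parallelizable on GPUs.
Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_model)   # how strongly each token attends to every other
weights = softmax(scores, axis=-1)    # each row is an attention distribution that sums to 1
output = weights @ V                  # context-aware representation of each token

print(weights.round(2))  # 4x4 token-to-token attention weights
print(output.shape)      # (4, 8)
```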
Zero-Shot Machine Translation and PaLM 2
In 2022, Google added support for 24 new languages using zero-shot machine translation, marking a major milestone in machine translation technology. It also announced its 1,000 Languages Initiative, which aims to support the 1,000 most spoken languages in the world; with the latest release, more than 110 of those have now been added. Zero-shot machine translation makes it possible to translate between a source and target language without any parallel data for that pair, eliminating the need to create training data for each language pair, a process that was previously costly, time-consuming, and for some pairs simply impossible.
This progress is made possible by the Transformer architecture and its self-attention mechanism. The Transformer's ability to learn contextual relationships across languages, combined with the scalability to process many languages simultaneously, has enabled more efficient and effective multilingual translation systems. However, zero-shot models still typically produce lower-quality output than models trained on parallel data.
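One widely published recipe for zero-shot translation, described in Google's multilingual NMT research, is to train a single model on many language pairs and prepend a token telling it which language to produce. Here is a hedged sketch of that data convention; the sentences are invented and the commented-out model call is a hypothetical placeholder, not a real API.

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend the token that tells a shared multilingual model which
    language it should generate."""
    return f"<2{target_lang}> {source_sentence}"

# Training data covers English<->Spanish and English<->German only;
# there are no Spanish<->German pairs at all.
training_pairs = [
    (add_target_token("the house is big", "es"), "la casa es grande"),
    (add_target_token("la casa es grande", "en"), "the house is big"),
    (add_target_token("the house is big", "de"), "das Haus ist groß"),
    (add_target_token("das Haus ist groß", "en"), "the house is big"),
]

# A single Transformer trained on all of these pairs learns a shared
# representation of meaning across languages. At inference time it can be
# asked for a direction it never saw in training, Spanish to German,
# simply by changing the token:
zero_shot_request = add_target_token("la casa es grande", "de")
print(zero_shot_request)              # "<2de> la casa es grande"
# model.translate(zero_shot_request)  # hypothetical call on a trained multilingual model
```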
Later, building on the progress of the Transformer, Google released PaLM 2 in 2023, which paved the way for the 110 new languages rolled out in 2024. PaLM 2 brought significant improvements to translation, in particular for learning closely related languages such as Awadhi and Marwadi (related to Hindi) and French-based creoles such as Seychellois Creole and Mauritian Creole. Improvements in PaLM 2, including compute-optimized scaling, stronger datasets, and design refinements, enable more efficient language learning and support Google's ongoing effort to broaden and improve language support while addressing the nuances of diverse languages.
Does the Transformer really solve the problem of machine translation?
The evolution described here took 18 years, from Google's adoption of SMT to the recent addition of 110 languages via zero-shot machine translation. It is a big leap, and it may reduce the need for extensive parallel corpus collection, a historically labor-intensive task the industry has pursued for more than 20 years. However, it is premature to declare machine translation completely solved, for both technical and ethical reasons.
Current models still struggle with context and coherence, making subtle mistakes that change the intended meaning of a text. These issues are especially pronounced in longer, more complex sentences, where a good translation has to maintain logical flow and track nuance. Cultural nuances and idiomatic expressions are also all too often lost or mistranslated, so even grammatically correct output can miss the intended effect or sound unnatural.
Several challenges remain:
Pre-training data: PaLM 2 and similar models are pre-trained on diverse multilingual text corpora, which is a large part of why they beat their predecessor, PaLM, on multilingual tasks. This improvement underscores that well-curated datasets remain central to translation quality.
Domain-specific or rare languages: In specialized domains such as law, medicine, and technology, parallel corpora ensure that models encounter the relevant terminology and linguistic nuances. Even advanced models can struggle with domain-specific terminology and evolving usage, which remains a challenge for zero-shot machine translation. Low-resource languages also translate poorly because there simply isn't enough data to train accurate models.
Benchmarks: Parallel corpora remain essential for evaluating and benchmarking translation models, which is especially difficult for languages with little parallel data. Automated metrics such as BLEU, BLEURT, and METEOR are limited in how well they capture nuances of translation quality beyond surface grammar. Human evaluation, meanwhile, is affected by bias, and qualified raters are scarce: it is hard to find truly bilingual evaluators for every language pair who can catch subtle errors.
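As a small illustration of those metric limitations, here is a minimal sketch of BLEU scoring, assuming the sacrebleu Python package is installed; the sentences are invented. BLEU rewards n-gram overlap with the reference, so a perfectly acceptable paraphrase can score far lower than a near-copy.

```python
# Minimal BLEU example using the sacrebleu package (pip install sacrebleu).
import sacrebleu

# One reference translation per hypothesis (sacrebleu expects a list of
# reference streams, each aligned with the hypotheses).
references = [["The house is very big and bright."]]

hyp_close = ["The house is very big and bright."]
hyp_paraphrase = ["It is a really large, light-filled house."]

print(sacrebleu.corpus_bleu(hyp_close, references).score)       # ~100: exact n-gram overlap
print(sacrebleu.corpus_bleu(hyp_paraphrase, references).score)  # much lower, despite being a fine translation
```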
Resource intensity: Training and deploying LLMs requires significant compute and data resources, which remains a barrier and limits access for some applications and organizations.
Cultural preservation: The ethical implications are profound. Google Translate researcher Isaac Caswell describes zero-shot machine translation this way: "You can think of it as being a polyglot who knows many languages, but in addition you can look at texts in 1,000+ languages that haven't been translated. If you're a big enough polyglot, you can just start reading a novel in another language and start to infer its meaning based on your knowledge of languages in general." But it is important to consider the long-term impact on minority languages that lack parallel corpora: if speakers increasingly rely on machine translation rather than on the language itself, there are real implications for cultural preservation.