
AI And The Future Of Translation: A New Era Of Human-AI Collaboration

Artificial intelligence is transforming industries at an unprecedented pace, and the world of translation is no exception. As AI-driven language models grow more sophisticated, one question continues to surface: Will AI replace human translators? At RWS, we believe the answer is clear—AI will never replace human expertise, but it will fundamentally change how humans and AI collaborate.

This belief is rooted in what we call Genuine Intelligence—the idea that true intelligence is not just artificial, but a combination of machine efficiency and human expertise. AI alone cannot understand nuance, cultural context, or emotion. It can process language, but it cannot truly comprehend meaning. That's why the future of translation isn't about AI replacing people—it's about AI and people working together in smarter, more impactful ways.

A Hybrid Approach: Machine-First, Human-Optimized

Rather than seeing AI as a competitor, we see it as an enabler—one that enhances productivity, improves accuracy, and expands the capabilities of language specialists. AI excels at handling repetitive, time-consuming tasks such as pre-translating content, matching terminology, and analyzing linguistic patterns at scale. However, true translation goes far beyond direct word-for-word replacement. It requires cultural fluency, contextual understanding, and an emotional connection—elements that only human expertise can provide.

At RWS, we embrace a "machine-first, human-optimized" approach, where AI streamlines workflows while language specialists refine quality, fluency, and cultural nuance. This method isn't about automation for automation's sake. It's about using AI to free up human translators and language specialists to focus on the most meaningful aspects of their work—adding creativity, critical thinking, and strategic insight.

Beyond Text: AI's Role in Multimedia Localization and Creation

AI isn't just changing written translation; it's reshaping how multimedia content is produced, localized, and consumed worldwide. According to our recent study titled "Unlocked 2025: Riding the AI Shockwave," 70% of global consumers report seeing more AI-generated multimedia content—videos, images, and audio—since the launch of tools like ChatGPT. This shift has major implications for translation and localization.

In addition, generative AI is rapidly being adopted in industries such as film, music, and advertising, particularly in fast-growing digital markets like Sub-Saharan Africa, where streaming is driving demand. AI-powered tools are helping brands scale content creation while maintaining linguistic and cultural relevance. Consumers now associate leading generative AI tools like ChatGPT, Google's Gemini, and Microsoft's Copilot with enhanced creative capabilities, while emerging players from France, the UAE, and China are bringing fresh competition to AI-generated media.

As this digital content consumption grows, consumers increasingly expect global brands to provide seamless, localized multimedia experiences. AI-powered speech recognition, synthetic voices, and automated subtitling are now key to making video content accessible across languages. The demand for dubbing and subtitling has never been higher, particularly in linguistically diverse regions like APAC and Africa, where consumers expect brands to speak their language—literally and figuratively.

But localization goes beyond translation. It's about making content feel native to each audience. Localized imagery, for example, plays a critical role in establishing authenticity. Many markets, especially in the Global South, prefer culturally aligned visuals and narratives in advertising and corporate communications. AI can help automate this process, but human oversight remains essential to ensure content is not just translated but truly localized.

Generative AI is not only transforming enterprise workflows but also fueling a creative renaissance in emerging markets. In Nigeria and India, AI-powered tools are enabling filmmakers, musicians, and content creators to scale their reach globally. Streaming platforms are leveraging AI to automate editing, optimize translations, and produce regionally relevant content, making multimedia more accessible to diverse audiences.

At RWS, our Evolve linguistic AI solution is revolutionizing multimedia localization. By integrating translation management (Trados Enterprise), neural machine translation (Language Weaver), and AI-assisted quality estimation (MTQE), we enable language specialists to refine content efficiently, ensuring fluency, accuracy, and cultural alignment.

Consumer Perceptions and Challenges

Despite AI's advancements in multimedia localization, consumers remain cautious. While Unlocked 2025 found that 57% of respondents have noticed improvements in AI-generated multimedia quality, concerns persist around accuracy, cultural relevance, and misinformation. Trust in AI-generated content is particularly low in regulated industries such as healthcare and finance, where errors in translated materials can have serious consequences.

Transparency is also a growing concern. According to the report, 81% of consumers want AI-generated content to be clearly labeled, underscoring the need for greater disclosure in AI-powered multimedia. Additionally, 56% of respondents report a rise in fake multimedia content, including deepfakes and manipulated visuals, raising ethical questions about AI's role in information integrity.

The Future of Multimedia Localization with AI

Looking ahead, AI's impact on multimedia will continue to evolve, driving new opportunities for immersive, personalized content experiences. AI is already enabling advancements in interactive videos, AR/VR applications, and dynamic advertisements tailored to individual user preferences. Initiatives like Mozilla's Common Voice project are also expanding voice AI capabilities, helping to generate high-quality voiceovers in underserved languages.

But here's what will set successful brands apart: finding the right balance between automation and human expertise. Hybrid human-AI approaches—where AI accelerates workflows and humans provide cultural and creative oversight—will be key to maintaining authenticity, trust, and engagement in multilingual content.

Final Thoughts: The Role of Genuine Intelligence

The future of translation and localization isn't about AI replacing humans—it's about using AI intelligently to enhance human expertise. This is the essence of Genuine Intelligence: a collaborative approach where AI accelerates workflows, and human specialists ensure accuracy, cultural authenticity, and emotional connection.

Generative AI is unlocking new possibilities for content creation and localization. However, long-term success will depend on balancing automation with human oversight to build trust, transparency, and engagement in multilingual content.

Ultimately, the most impactful brands won't just adopt AI—they'll integrate it thoughtfully, using technology to scale while ensuring content remains culturally resonant. But to truly connect with diverse audiences, human contribution is essential. Not just any human input—but the nuanced expertise of today's language specialists: professionals who combine domain knowledge, linguistic fluency, cultural sensitivity, technical skill, and creative instinct. It's this combination of capabilities that ensures AI-generated content isn't just fast and functional, but also fluent, relevant, and emotionally intelligent. AI may power the process—but it's human specialists who give content its meaning.


Effective Use Of Target-side Context For Neural Machine Translation

MINO Hideya, ITO Hitoshi, GOTO Isao, and YAMADA Ichiro

With the progress of sentence-level neural machine translation (NMT), context-aware NMT, which exploits previous sentences as context, has developed rapidly. Recent work on context-aware NMT incorporates source- or target-side context. In contrast to source-side context, target-side context causes a gap between training, which uses the ground-truth previous sentence, and inference, which uses a machine-translated previous sentence as context. This gap causes translation quality to deteriorate. In this paper, we propose sampling both the ground-truth and the machine-translated previous sentences of the target side for context-aware NMT. Models using our proposed approach show improvements over models using previous approaches.

1. Introduction

Although conventional neural machine translation (NMT) systems commonly translate individual sentences one at a time, there are cases where translation cannot be performed correctly using only the information in a single sentence. One resulting problem is translation consistency. For example, it is generally undesirable for the translation of "Science & Technology Research Laboratories" to mix "NHK Hoso Gijutsu Kenkyusho", "NHK Giken", "Hoso Gijutsu Kenkyusho", and "STRL" within a sentence or document. Likewise, it is unnatural for Japanese documents to mix the "desu/masu" and "dearu" sentence endings, and it is often difficult to translate such expressions uniformly using only the information contained in a single sentence. Semantic ambiguity is a further problem in sentence-based translation.

As a well-known example, when "I go to the bank" is translated into Japanese, it is difficult to judge from this sentence alone whether "bank" means "ginko" (financial institution) or "dote" (river embankment). There is also the problem of zero anaphora resolution*1, which arises when translating from languages such as Japanese, which permit zero pronouns*2, into languages such as English, which do not permit subject omission, since information spanning multiple sentences is often necessary. These problems can be resolved by using context information contained in the sentences before and after the sentence being translated, and there has been related work addressing these issues1). This paper focuses on NMT using context information of the target language*3.

Bawden et al.2) proposed an NMT model that takes a previous sentence of the target language as context information, but reported lower translation quality. Nevertheless, since there are numerous cases where the way the previous sentence was translated provides important information, such as the consistency of the "desu/masu" and "dearu" sentence endings described above, translation technologies must be able to handle context information of the target language appropriately.

With the above points in mind, we examine the problems encountered in existing research on NMT using context information of the target language and report on a proposed method for improving the translation quality by resolving these problems, along with the results of experiments aimed at confirming the effects of the proposed method.

*1 The process of identifying the antecedent of a zero pronoun.
*2 A noun phrase in an indispensable case that is omitted in a document.
*3 The language being translated into. For example, in Japanese-to-English translation, the target language is English and the source language is Japanese.

2. Problems with Existing Methods

    We consider the following NMT method 2), which uses the previous sentence of the target language as context information for training and translation.

    [Training]

    The translation model is trained by taking the reference translation (reference)*4 of the previous target-language sentence and the source-language sentence to be translated as inputs, with the reference of the sentence to be translated as the output.

    [Translation]

    The machine-translation output for the previous sentence of the target language and the source-language sentence to be translated are input to the trained model, which then produces the machine-translation output for the current sentence.
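
The gap can be made concrete with a small sketch (hypothetical helper names; the paper does not prescribe an implementation): during training the reference of the previous target sentence is available as context, while during translation the model can only fall back on its own earlier output.

    # Sketch of the existing method's data flow (hypothetical helpers;
    # illustrates the training/translation mismatch only).

    def make_training_example(src_sents, ref_sents, i):
        """Training: the reference of the previous target-language
        sentence serves as context for predicting the current reference."""
        context = ref_sents[i - 1] if i > 0 else ""
        return {"context": context, "source": src_sents[i], "target": ref_sents[i]}

    def translate_document(model, src_sents):
        """Translation: no reference exists, so the model's own previous
        output serves as context, which is the source of the gap."""
        outputs = []
        for i, src in enumerate(src_sents):
            context = outputs[i - 1] if i > 0 else ""
            outputs.append(model.translate(context=context, source=src))
        return outputs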

    In the existing method, the reference is thus used as context information of the target language during training, whereas the machine-translation output is used during translation. We believe that this difference reduces translation quality. Whereas the reference is a correct translation, the machine-translation output contains errors particular to machine translation, such as missing errors (under-translation), excess errors (over-translation), and incorrect translations (mistranslation). Furthermore, even when a machine-translation output is free of these errors, such outputs are known to carry a bias referred to as "translationese": they tend to be simpler and more standardized than human-translated text 3). The predicted target-side sentence therefore tends to have lower diversity than the reference.

    The model bias that arises from using data with different characteristics during training and during translation is called exposure bias4), and this bias has been reported to degrade translation quality 5). Because existing methods, which use references during training but machine-translation outputs during translation, do not appropriately handle these differing characteristics, a translation model that considers context information is not trained effectively, which degrades translation quality. In the next section, we propose a method to alleviate this gap in the characteristics of the context information used during training and during translation.

*4 Text that is translated accurately.

3. Proposed Method

    The simplest way to eliminate the gap in context information between training and translation is to use contexts with a common characteristic, that is, to use either the reference or the machine-translation output during both training and translation. Although it would normally be preferable to use the reference, which is free of translation errors, the reference is not available during translation, so the only remaining choice is to use the machine-translation output in both phases. Unfortunately, training only on the machine-translation output may fail to extract the appropriate information from the context, because such output contains translationese as well as translation errors.

    With these points in mind, we proposed a method that uses both the reference and the machine-translation output during training, instead of only the machine-translation output. This method controls the sampling*5 of context information using the curriculum learning 4) and scheduled sampling 6) approaches. The proposed method trains the translation model by reconstructing the training data at each training epoch*6 l and by changing the proportion (sampling rate p) of data that uses the reference as context information and the proportion (sampling rate 1-p) that uses the machine-translation output.

    (1) In the first epoch of training (l = 1), training data in which all the context information of the target language is the reference (sampling rate p = 1) are used.

    (2) In the second and subsequent training epochs (l ≥ 2), training data that are reconstructed by changing the proportions of the reference and machine-translation outputs in accordance with the changed sampling rate p are used.

    Here, k (≥ 1) is a hyperparameter that controls the sampling rate p. If translation performance is already high when the first epoch ends, k is set close to 1, increasing the proportion of machine-translation output in the training data for the second and subsequent epochs. By gradually reducing p, the context information in the training data shifts from references to machine-translation outputs, and the model learns to become robust against translation errors (see the sketch following the footnotes below).

*5 Probabilistically selecting either the reference or the machine-translation output as context information of the target language during training is called sampling.
*6 The number of iterations of training over the whole training data. NMT models are usually trained numerous times over all of the training data.
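
As a concrete illustration, the following sketch reconstructs the training data each epoch. The decay formula is an assumption: the text above does not specify one, so the sketch borrows the inverse-sigmoid decay used in the scheduled-sampling literature 6), which matches the described behavior of k (k ≥ 1, with values near 1 shifting quickly toward machine-translation output).

    import math
    import random

    def sampling_rate(epoch, k):
        """Sampling rate p for epoch l. Epoch 1 uses only references
        (p = 1); later epochs decay p. The inverse-sigmoid form is an
        assumed schedule, following scheduled sampling 6)."""
        if epoch <= 1:
            return 1.0
        return k / (k + math.exp(epoch / k))

    def rebuild_training_data(examples, epoch, k, rng=random):
        """Reconstruct the training data for this epoch: with probability
        p use the reference previous sentence as context, otherwise the
        machine-translated one. 'ref_context'/'mt_context' are
        hypothetical field names for precomputed context variants."""
        p = sampling_rate(epoch, k)
        rebuilt = []
        for ex in examples:
            context = ex["ref_context"] if rng.random() < p else ex["mt_context"]
            rebuilt.append({"context": context, "source": ex["source"],
                            "target": ex["target"]})
        return rebuilt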

4. Experiments

    The effects of the proposed method were confirmed by English-to-Japanese and Japanese-to-English machine-translation experiments using a news corpus*7 based on Jiji News7) and English-to-Japanese, Japanese-to-English, English-to-German, and German-to-English machine-translation experiments using the TED talk corpus*8 published at IWSLT2017 8).

    The following three methods are used for comparison.

    Method 1: NMT method without using context information (Single-Sent.)

    Method 2: NMT method using only the reference of the target language as context information during training (Trg-Context GT).

    Method 3: NMT method using only the machine-translation output of the target language as context information during training (Trg-Context Pred.)

    Methods 1, 2, and 3 were implemented using an encoder and a decoder of the transformer model 9). Methods 2 and 3 and the proposed method used both concatenation-based context-aware NMT 10) and multi-encoder context-aware NMT 2).

    Figures 1 and 2 show overview diagrams of the proposed method implemented using each of the above methods. Although both consist of an encoder and a decoder, their methods of inputting context information differ. The concatenation-based context-aware NMT method concatenates the context sentence and the sentence to be translated using a special symbol ("_BREAK_" in Fig. 1), treats them as a single piece of data, and inputs them to a single encoder. In contrast, the multi-encoder context-aware NMT method consists of two encoders (for the source text and the context) and one decoder, with the context sentence and the sentence to be translated input to separate encoders.
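
The two architectures can be sketched as follows (a minimal PyTorch sketch; how the decoder combines the two encodings varies across implementations, and concatenating the two memories along the sequence axis is an assumption here, not necessarily the paper's exact design).

    import torch
    import torch.nn as nn

    BREAK = "_BREAK_"  # special separator symbol from Fig. 1

    def concat_input(context_tokens, source_tokens):
        """Concatenation-based: the context sentence and the sentence to
        be translated form one encoder input, joined by _BREAK_."""
        return context_tokens + [BREAK] + source_tokens

    class MultiEncoderNMT(nn.Module):
        """Multi-encoder: separate encoders for context and source; the
        decoder attends to both encodings, here simply concatenated
        along the sequence axis (one possible integration strategy)."""
        def __init__(self, d_model=512, nhead=8, num_layers=6):
            super().__init__()
            enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.src_encoder = nn.TransformerEncoder(enc, num_layers)
            self.ctx_encoder = nn.TransformerEncoder(enc, num_layers)
            dec = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
            self.decoder = nn.TransformerDecoder(dec, num_layers)

        def forward(self, src_emb, ctx_emb, tgt_emb):
            # Inputs are already-embedded tensors of shape (batch, len, d_model).
            memory = torch.cat([self.ctx_encoder(ctx_emb),
                                self.src_encoder(src_emb)], dim=1)
            return self.decoder(tgt_emb, memory)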

    In this paper, we report the experimental results of the multi-encoder context-aware NMT method, which achieved the higher translation quality of the two architectures described above. The encoder and decoder in the context-aware NMT method were implemented using the same transformer model as in Method 1. For translation performance comparisons, five models with randomly varied initial parameters were created, and the median Bilingual Evaluation Understudy (BLEU) score 11) of the translation outputs of these models was used. The case-insensitive BLEU-4*9 metric was used to calculate the BLEU score.
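
For reference, this evaluation protocol might look like the following sketch (assuming the sacrebleu package; the paper does not name its exact tooling).

    import statistics
    import sacrebleu  # assumed tooling; any BLEU implementation would do

    def median_case_insensitive_bleu(runs, references):
        """Score each independently seeded model with case-insensitive
        BLEU-4 and report the median over the five runs.
        runs: list of hypothesis lists, one per model."""
        scores = [sacrebleu.corpus_bleu(hyps, [references], lowercase=True).score
                  for hyps in runs]
        return statistics.median(scores)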

    Table 1 shows the experimental results. First, compared with Method 1 (Single-Sent.), which does not use context information, we found that the methods that use context information have improved BLEU scores for all tasks and that adding context improves the translation quality. Next, comparing Method 2 (Trg-Context GT), which uses different context information between training and translation (references during training and machine-translation outputs during translation), with Method 3 (Trg-Context Pred.), which uses machine-translation output as context information both during training and during translation, we found that the translation quality of Method 3 was higher in all cases except for the Japanese-to-English machine-translation task of the news corpus and the German-to-English machine-translation task of the TED talk corpus. This indicates that it may be possible to improve the translation quality by using the machine-translation output as context information during both training and translation (depending on the translation task), thereby eliminating the gap mentioned above. Finally, comparing Methods 2 and 3 with the proposed method, we found that our proposed method was superior in terms of BLEU scores for all tasks, which confirmed the effectiveness of controlling and using the data of both the reference and machine-translation outputs.

    Figure 1: Overview of concatenation-based context-aware neural machine translation
    Figure 2: Overview of multi-encoder context-aware neural machine translation
*7 The news corpus consists of bilingual data based on Japanese-language news from Jiji Press that has been translated into English.
*8 The TED talk corpus consists of bilingual data extracted from multilingual subtitles of lectures published by TED.
*9 A measure of 4-gram (i.e., a sequence of four words) precision that evaluates similarity to the reference without distinguishing between uppercase and lowercase letters.

Table 1: Experimental results using the news and TED talk corpora

5. Summary

    In this paper, we introduced a method of using the previous target-language sentence of the text to be translated as context information. There has recently been considerable research on using various types of external information beyond the text to be translated to improve machine-translation quality. Such external information includes the image information widely used in multimodal translation*10 and pretrained language models such as BERT 12). Although increasing the amount of information used can raise translation quality, it requires greater computational capacity, and blindly adding information is not necessarily appropriate. Furthermore, unless each piece of information is independent, it is difficult to analyze, for example, which part of the added information contributed to the improvement in translation quality. Machine translation using context information is still evolving, and we will continue to research its potential.

    This paper was compiled and edited from a presentation given at the 28th International Conference on Computational Linguistics (COLING2020) and the following papers published in "Natural Language Processing", which is the journal of the Association for Natural Language Processing.

    H. Mino, H. Ito, I. Goto, I. Yamada and T. Tokunaga: "Effective Use of Target-side Context for Neural Machine Translation," Proc. 28th International Conference on Computational Linguistics, pp. 4483–4494 (2020).

    H. Mino, H. Ito, I. Goto, I. Yamada and T. Tokunaga: "Effective Use of Target-side Context for Neural Machine Translation," Natural Language Processing, Vol. 28, No. 4, pp. 1162–1183 (2021).

*10 Multimodal translation refers to machine translation performed using multiple modalities, including sight, hearing, and touch.

References
1) S. Maruf, F. Saleh and G. Haffari: "Survey on Document-level Neural Machine Translation: Methods and Evaluation," ACM Computing Surveys, Vol.54, No.2 (2021).
2) R. Bawden, R. Sennrich, A. Birch and B. Haddow: "Evaluating Discourse Phenomena in Neural Machine Translation," Proc. NAACL-HLT 2018, Vol.1, pp.1304–1313 (2018).
3) A. Toral: "Post-editese: an Exacerbated Translationese," Proc. MT Summit XVII, Vol.1, pp.273–281 (2019).
4) Y. Bengio, J. Louradour, R. Collobert and J. Weston: "Curriculum Learning," Proc. ICML, Vol.382 of ACM International Conference Proceeding Series (2009).
5) W. Zhang, Y. Feng, F. Meng, D. You and Q. Liu: "Bridging the Gap between Training and Inference for Neural Machine Translation," Proc. 57th Annual Meeting of the Association for Computational Linguistics, pp.4334–4343 (2019).
6) S. Bengio, O. Vinyals, N. Jaitly and N. Shazeer: "Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks," Advances in Neural Information Processing Systems, pp.1171–1179 (2015).
7) H. Mino, H. Tanaka, H. Ito, I. Goto, I. Yamada and T. Tokunaga: "Content-equivalent Translated Parallel News Corpus and Extension of Domain Adaptation for NMT," Proc. LREC 2020, pp.3616–3622 (2020).
8) M. Cettolo, C. Girardi and M. Federico: "WIT3: Web Inventory of Transcribed and Translated Talks," Proc. EAMT 2012, pp.261–268 (2012).
9) A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin: "Attention is All You Need," Proc. NIPS 2017, pp.5998–6008 (2017).
10) J. Tiedemann and Y. Scherrer: "Neural Machine Translation with Extended Context," Proc. Third Workshop on Discourse in Machine Translation, pp.82–92 (2017).
11) K. Papineni, S. Roukos, T. Ward and W.-J. Zhu: "BLEU: a Method for Automatic Evaluation of Machine Translation," Proc. 40th Annual Meeting of the Association for Computational Linguistics, pp.311–318 (2002).
12) J. Devlin, M.-W. Chang, K. Lee and K. Toutanova: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," Proc. NAACL-HLT 2019, pp.4171–4186 (2019).
MINO Hideya

    MINO Hideya joined NHK in 2004. After working at NHK Kitami Station, he moved to NHK Science & Technology Research Laboratories (STRL) in 2009 to conduct research on natural language processing. He was seconded to the National Institute of Information and Communications Technology (NICT) from 2013 to 2017 and currently serves as a member of STRL's Smart Production Research Division. He holds a Ph.D. in engineering.

    ITO Hitoshi

    ITO Hitoshi joined NHK in 2012. After working at NHK Yamaguchi Station, he moved to NHK Science & Technology Research Laboratories (STRL) in 2015 to conduct research on speech recognition and natural language processing. He currently serves as a member of STRL's Smart Production Research Division.

    GOTO Isao

    GOTO Isao joined NHK in 1997. After working at NHK Sendai Station, he moved to NHK Science & Technology Research Laboratories (STRL) in 1999 to conduct research on natural language processing. He was seconded to the Advanced Telecommunications Research Institute International (ATR) from 2004 to 2006 and then to the National Institute of Information and Communications Technology (NICT) from 2008 to 2013. He currently serves as Principal Research Engineer of STRL's Smart Production Research Division. He holds a Ph.D. in information science.

    YAMADA Ichiro

    YAMADA Ichiro joined NHK in 1993. After working at NHK Nagoya Station, he moved to NHK Science & Technology Research Laboratories (STRL) in 1996 to conduct research on big data analysis and natural language processing. He was dispatched to Stanford University from 2003 to 2004 and then seconded to the National Institute of Information and Communications Technology (NICT) from 2008 to 2011. He currently serves as Executive Research Producer of STRL's Smart Production Research Division. He holds a Ph.D. in information science.





