AI Translation Capabilities Reach New Milestone with GPT-4

By Tina Reynolds

According to a recent study, GPT-4, the large language model created by OpenAI, excels at translation. In fact, its performance matches that of junior- to mid-level human translators. The study, conducted by a team of researchers, hired six professional annotators to evaluate translation quality across a range of language pairs. The result marks an important advance for machine learning, especially in the field of language translation.

The evaluation also found that GPT-4 made an average of 3.71 major errors per translated segment, with each segment about 200 sentences long. Its output was compared with that of human translators at three experience levels: junior (one to two years of experience), medium (three to five years), and senior (ten or more years, plus certification from the China Accreditation Test for Translators and Interpreters, or CATTI).

Yue Zhang, associate dean of the School of Engineering at Westlake University in Hangzhou, China, emphasized the implications of GPT-4’s performance. “While there have been claims of ‘human parity’ in the past, these have been debated,” Zhang noted, pointing to the ongoing debate over what artificial intelligence can actually achieve in language translation.

Senior-level translators produced the best translations on average, with 1.83 major errors per segment, while GPT-4’s error rate placed it on par with junior and medium-level translators. GPT-4 performed especially well on high-resource language pairs such as English to Chinese. Both the human translators and GPT-4, however, struggled with rarer language combinations such as Chinese to Hindi.

Zhang reflected on the strengths and weaknesses inherent in human and machine translation. “This is both the advantage of human translators and the disadvantage,” he said, pointing to the nuanced judgment that experienced translators bring to complex tasks, particularly those requiring high precision or cultural adaptation.

The research aims to establish a more rigorous framework for comparing the performance of an AI tool with that of a human expert translator. “We wanted to move beyond vague comparisons and scientifically calibrate LLM performance against specific tiers of professional human expertise, ranging from junior to senior translators,” Zhang explained. This careful methodology makes it easier to judge how GPT-4 measures up against well-defined human standards.

Despite these successes, GPT-4’s tendency toward overly literal translations reveals the system’s limitations. For projects that demand complex cultural nuance or creative reimagining, such as literary translation, Zhang argues that there is no substitute for experienced human translators. Machine learning technology is changing fast, however, and researchers are already looking ahead to future iterations that could further improve translation accuracy and adaptability.