How would you evaluate Google's Neural Machine Translation (GNMT) system?

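From the @Google黑板报 Weibo account: A major breakthrough in Google machine translation! The Google Neural Machine Translation (GNMT) system has launched. It overcomes the challenges of working on large datasets and, instead of breaking a sentence into words and phrases that are translated independently, translates the complete sentence, reducing errors by more than 55%-85%. The technology is now used for Chinese-to-English translation in Google Translate. Related coverage: "Google releases its neural machine translation system, a major breakthrough in machine translation" (Google黑板报)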

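Very good, but not yet good enough.

1) Basic translation does work now, and it is much better than the previous version, but mistranslations and omissions still occur, so for the time being it is unlikely to replace human translation.

2) Given what machine translation is actually used for (web-page translation, computer-assisted translation, and so on), Google Translate is even less likely to replace human translators. Most likely, machine translation will serve assistive and low-end uses while human translation serves high-end ones; the two complement rather than replace each other. So the worry voiced by @黄大师, "As a translator, reading this news right now, I understand the anxiety and fear that 18th-century textile workers felt when they saw the steam engine," is unnecessary.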


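I tested three kinds of text: everyday translation, technical translation, and literary translation. Details follow. In terms of translation quality itself:

1) Chinese-to-English is slightly better than English-to-Chinese: the English output is grammatically acceptable most of the time, whereas the Chinese output goes wrong more often.

2) Words and phrases are usually translated correctly, but once the source sentence structure gets even slightly complex, the system tends to misread it.

3) Everyday language is basically usable, technical translation has more problems, and literary translation is basically unusable.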



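Category 1: Everyday translation

Source: 【冰】日常英语口语900句 (900 Everyday Spoken English Sentences)

Note: in each example below, the first source sentence (English) and the second source sentence (Chinese) are the paired translations from the source document. Each is followed by Google Translate's output and then my comment.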

1

I’ve never heard that piece before. Who wrote it?

我从来没有听说过的那件。谁写的?

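[Comment] Without context it is indeed hard to tell whether "that piece" means "那件" (that item) or "那首曲子" (that piece of music). Still, the in-sentence context "heard that piece" is enough to settle on "听过那首曲子" (heard that piece of music) rather than "听说过那件" (heard of that item); "听说过" normally corresponds to "hear about".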

我从没有听过这一段,是谁写的?

I have never heard of this paragraph, who wrote it?

[Comment] Without context, rendering "这一段" as "this paragraph" is not really wrong.

2

Have you ever thought about becoming a professional musician?

你有没有想过要成为一名职业音乐家?

[Comment] Correct.

你有没有想过成为一名专业的音乐家。

Have you ever thought of becoming a professional musician?

[Comment] Correct.

3

Who is the author of this novel?

谁是这本小说的作者吗?

[Comment] Correct.


这部小说的作者是谁?

Who is the author of this novel?

[Comment] Correct.

4

This writer uses vivid descriptions in his writings.

这位作家在他的作品中采用了生动的描述。

[Comment] Correct.


这位作家在他的作品中运用了生动的手笔。

The writer used a lively handwriting in his work.

[Comment] Correct.

5

How much do you know about the works of Henry Wadsworth Longfellow?

多少钱你知道朗费罗的作品?

[Comment] "How much" is seriously mistranslated (as "多少钱", i.e. "how much money"). "Henry Wadsworth Longfellow" is oversimplified to the surname alone.


你对亨利·沃兹沃思·朗费罗的作品了解多少?

How much do you know about Henry Wadsworth Longfellow?

[Comment] "的作品" ("the works of") is left untranslated.
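The verdicts on the five everyday examples above can be tallied per direction. This is only a rough sketch: the True/False flags are my reading of the comments (a "not really wrong" counts as accepted), the names `verdicts`, `accuracy` and `summary` are illustrative, and five sentences are far too few to draw statistical conclusions.

```python
# Verdicts transcribed from the reviewer's comments on the five
# everyday-speech examples; True means the reviewer accepted the output.
# Example 1 zh->en ("not really wrong") is counted as accepted.
verdicts = {
    "en->zh": [False, True, True, True, False],
    "zh->en": [True, True, True, True, False],
}


def accuracy(flags):
    """Return the fraction of examples the reviewer accepted."""
    return sum(flags) / len(flags)


summary = {direction: accuracy(flags) for direction, flags in verdicts.items()}

for direction, score in summary.items():
    print(f"{direction}: {score:.0%} accepted")
```

On this tiny sample, Chinese-to-English comes out slightly ahead of English-to-Chinese, which matches the overall impression stated at the top of the answer.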


Category 2: Technical translation

Example 1: English to Chinese

Source: 3gpp.org/technologies/k

In LTE-Advanced, the possibility for efficient heterogeneous network planning – i.e. a mix of large and small cells - is increased by introduction of Relay Nodes (RNs). The Relay Nodes are low power base stations that will provide enhanced coverage and capacity at cell edges, and hot-spot areas and it can also be used to connect to remote areas without fibre connection. The Relay Node is connected to the Donor eNB (DeNB) via a radio interface, Un, which is a modification of the E-UTRAN air interface Uu. Hence in the Donor cell the radio resources are shared between UEs served directly by the DeNB and the Relay Nodes. When the Uu and Un use different frequencies the Relay Node is referred to as a Type 1a RN, for Type 1 RN Uu and Un utilize the same frequencies, see figure 7. In the latter case there is a high risk for self interference in the Relay Node, when receiving on Uu and transmitting on Un at the same time (or vice versa). This can be avoided through time sharing between Uu and Un, or having different locations of the transmitter and receiver. The RN will to a large extent support the same functionalities as the eNB – however the DeNB will be responsible for MME selection.

在高级LTE,高效异构网络规划的可能性 - 即大型和小型细胞的混合 - 增加通过导入中继节点(RNS)的。中继节点是低功率基站,这将在小区边缘提供增强的覆盖和容量,以及热点地区,它也可用于连接到远程地区没有光纤连接。中继节点被连接到经由无线电接口,元,它是在E-UTRAN无线接口支原体的变形例的供给方eNB(DENB)。因此,在施主小区的无线资源是共享的之间的UE直接服务由DENB和中继节点。当Uu和联合国使用不同的频率的中继节点被称为一个Ia型RN,对于类型1RN Uu和联合国利用相同的频率,参见图7.在后一种情况下,存在于自干扰的高风险中继节点上的Uu接收,并在同一时间(或反之亦然)上未发送时。这可以通过Uu和联合国,或具有发射器和接收器的不同位置之间的时间共享来避免。在RN将在很大程度上支持相同的功能作为所述eNB - 然而DENB将负责的MME的选择。

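[Comment] "cells" is mistranslated (as "细胞", biological cells). The rendering of "LTE-Advanced" is not exactly wrong, but the proper noun is handled poorly, and the first sentence is ungrammatical. The second sentence is structurally broken. The later errors are similar. In short, there are many errors, and the output would not pass a basic translation check.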

Example 2: Chinese to English

Source: the SAIC-GM website (欢迎访问上汽通用汽车)

1.5L DVVT发动机最大功率83kW,最大扭矩141Nm;直到四缸,双顶置式凸轮轴,十六气门,可变进气长度进气歧管,多点顺序燃油电控喷射和独立点火模块,双可变正时系统,在动力性能提升10%-20%的同时,百公里综合油耗达到5.4L。目前赛欧3搭载此款发动机,且全系车型可享受3000元国家节能惠民补贴。

1.5L DVVT engine maximum power 83kW, maximum torque 141Nm; until the four-cylinder, double overhead camshaft, sixteen valves, variable intake length intake manifold, multi-point sequential fuel injection and independent ignition module, Variable timing system, the dynamic performance of 10% -20% increase at the same time, 100 km comprehensive fuel consumption to 5.4L. Sail 3 is currently equipped with this engine, and the entire line models can enjoy 3,000 yuan national energy-saving subsidies.

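[Comment] The first clause has no predicate verb. "直到四缸" is mistranslated (as "until the four-cylinder"), and the whole sentence is just a list of nouns that was never assembled into an English sentence. The last sentence is translated fairly well.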



Category 3: Literary translation

Example 1: English to Chinese

Source: The Picture of Dorian Gray

The studio was filled with the rich odour of roses, and when the light summer wind stirred amidst the trees of the garden, there came through the open door the heavy scent of the lilac, or the more delicate perfume of the pink-flowering thorn.

工作室里弥漫着玫瑰的浓郁的香气,当烟雨园里的树木搅拌光夏风,还有通过开着的门来到丁香沉重的气味,或粉红色花刺的更加细腻的香水。

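[Comment] Rendering "and when the light summer wind stirred amidst the trees of the garden" as "当烟雨园里的树木搅拌光夏风" is incoherent. The active/passive sense of "came" is mishandled, and "perfume" is mistranslated (as "香水", bottled perfume).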

From the corner of the divan of Persian saddle-bags on which he was lying, smoking, as was his custom, innumerable cigarettes, Lord Henry Wotton could just catch the gleam of the honey-sweet and honey-coloured blossoms of a laburnum, whose tremulous branches seemed hardly able to bear the burden of a beauty so flamelike as theirs; and now and then the fantastic shadows of birds in flight flitted across the long tussore-silk curtains that were stretched in front of the huge window, producing a kind of momentary Japanese effect, and making him think of those pallid, jade-faced painters of Tokyo who, through the medium of an art that is necessarily immobile, seek to convey the sense of swiftness and motion. The sullen murmur of the bees shouldering their way through the long unmown grass, or circling with monotonous insistence round the dusty gilt horns of the straggling woodbine, seemed to make the stillness more oppressive. The dim roar of London was like the bourdon note of a distant organ.

从波斯马鞍袋上,他是在撒谎,吸烟沙发的一角,因为是他的习惯,无数的香烟,亨利勋爵沃顿可能只是搭上了金链花,他的蜂蜜香甜的蜂蜜色的花朵的光芒颤抖的分支似乎很难能够承受美的负担,flamelike像他们的;现在再在飞行中的鸟类横跨在巨大的橱窗前被拉长,长期tussore,丝绸窗帘闪过的梦幻般的阴影,产生一种瞬间的日本的影响,并让他想起那些苍白,玉石面画家东京谁,通过一门艺术,必然是不动的介质,力求传达迅捷和运动感。蜜蜂肩负自己的方式度过漫长unmown草,或单调的坚持下一轮零零落落的Woodbine的尘土飞扬的鎏金牛角盘旋的愠怒的杂音,似乎让寂静更加压抑。伦敦在朦胧的轰鸣声就像波登注意到远处器官。

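[Comment] "lying" is mistranslated (as "撒谎", telling a lie), "catch" is mistranslated, and the first sentence is completely incoherent. There are many similar errors later on. It clearly would not pass a basic translation check.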



Example 2: Chinese to English

Source: 《白鹿原》 (White Deer Plain)

白嘉轩后来引以豪壮的是一生里娶过七房女人。

Bai Jiaxuan later cited the heroic life is married seven room woman.

[Comment] The English sentence is ungrammatical, and "七房女人" (seven wives) is flatly mistranslated.

娶头房媳妇时他刚刚过十六岁生日。那是西原上巩家村大户巩增荣的头生女,比他大两岁。他在完全无知慌乱中度过了新婚之夜,留下了永远羞于向人道及的可笑的傻样,而自己却永生难以忘记。一年后,这个女人死于难产。

He had just passed his 16th birthday when he was married to his first wife. It is the former Nongyuan Gongjiacun large Gongzeng Rong's first daughter, two years older than him. He spent the night in the completely ignorant panic, leaving a humiliation of humiliation and humiliation forever, while he himself is hard to forget. A year later, the woman died of dystocia.

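[Comment] The first sentence is translated very well, and the passive "he was married to" is an especially good choice (though I doubt Google Translate really has that kind of literary sensibility). "西原上" is flatly mistranslated, "巩增荣" is romanized incorrectly, and the tense of the whole sentence is wrong, which shows that Google Translate cannot relate a sentence to its context on its own. "a humiliation of humiliation and humiliation" is gibberish.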

第二房娶的是南原庞家村殷实人家庞修瑞的奶干女儿。这女子又正好比他小两岁,模样俊秀眼睛忽灵儿。她完全不知道嫁人是怎么回事,而他此时已谙熟男女之间所有的隐秘。他看着她的羞怯慌乱而想到自己第一次的傻样反倒觉得更富刺激。当他哄唆着把躲躲闪闪而又不敢违坳他的小媳妇裹入身下的时候,他听到了她的不是欢乐而是痛苦的一声哭叫。当他疲惫地歇息下来,才发觉肩膀内侧疼痛钻心,她把他咬烂了。他抚伤惜痛的时候,心里就潮起了对这个娇惯得有点任性的奶干女儿的恼火。正欲发作,她却扳过他的肩膀暗示他再来一次。一当经过男女间的第一次交欢,她就变得没有节制的任性。这个女人从下轿顶着红绸盖巾进入白家门楼到躺进一具薄板棺材抬出这个门楼,时间尚不足一年,是害痨病死的。

The second room to marry is the original Pang Village, South Pang Xunrui people dry milk daughter. This woman is just two years younger than him, looks handsome eyes Linger. She did not know how to get married, and he has been familiar with all the secrets between men and women at this time. He looked at her shyness and panic and think of their first silly hand feel more stimulating. When he coaxed to dodge and not dare to violate Au his daughter-in-law into the body when he heard her not a joy but a cry of pain. When he tired to rest down, only to find medial pain in the shoulders, she bitten him rotten. He relieved the pain when the heart on the tide from the spoiled on the spoiled daughter of the wayward little angry. Was about to attack, she was over his shoulder and hinted that he once again. Once the first intercourse between men and women, she becomes unruly wayward. The woman from the sedan wore red silk covered towel into the white house floor to lie down into a thin coffin carried out of the gatehouse, the time is less than a year, is the death of tuberculosis.

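[Comment] "房" is again mistranslated, as are "南原", "人家" in "殷实人家", and "女儿" in "奶干女儿", and the sentence as a whole makes no sense. With culturally specific expressions like these, I think a failed translation is to be expected. The remaining errors are similar.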
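    Reposted from the answer to the question below, although it actually seems a better fit here:
    How would you evaluate "Google uses a neural machine system to translate Chinese into English, cutting the error rate by up to 85%"? - answer by 王赟 Maigo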

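  1. Compared with traditional phrase-based machine translation (PBMT), Google's neural machine translation (GNMT) does perform markedly better. Across language pairs, GNMT narrows the gap between PBMT and human translation by 58% to 87%, and on some pairs it can be said to approach human-level translation.
  2. However, it is too early to say that GNMT will replace human translators. GNMT still makes some silly mistakes from time to time; the last page of the paper lists a few, and sharp-eyed users have found plenty more. Real-world translation, written translation in particular, has very little tolerance for such errors.
  3. GNMT's main contribution lies in technical aspects that users do not see. Compared with PBMT, a neural translation model is much "cleaner": a single neural network handles everything. Until now it simply could not match PBMT in performance and speed. GNMT brings out the potential of neural translation on both counts, and I expect neural translation to become mainstream in the near future.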
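For readers wondering what "narrowing the gap between PBMT and human translation" in point 1 means as a number, here is a minimal sketch of one natural way to compute it: the share of the PBMT-to-human quality difference that GNMT recovers. The function name `gap_reduction` and the example scores are hypothetical, chosen only for illustration; they are not figures from the paper.

```python
def gap_reduction(pbmt: float, gnmt: float, human: float) -> float:
    """Return the share of the PBMT-to-human quality gap closed by GNMT.

    All three arguments are quality scores on the same scale,
    where a higher score means a better translation.
    """
    gap = human - pbmt
    if gap <= 0:
        raise ValueError("human score must exceed the PBMT score")
    return (gnmt - pbmt) / gap


# Hypothetical ratings on an arbitrary quality scale (illustration only).
example = gap_reduction(pbmt=4.0, gnmt=4.6, human=5.0)
print(f"GNMT closes {example:.0%} of the gap")
```

A value near 100% would mean GNMT is rated almost as highly as human translation on that scale. It does not mean the output is free of the kind of individual errors described in point 2.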