Word-Alignment Emphasized Dependency Aware Decoder (WDAD) with Data Augmentation in Non-Autoregressive Translation

EasyChair Preprint 11990
54 pages • Date: February 8, 2024

Abstract

The Non-Auto-Regressive model (NAT) for machine translation offers greater efficiency than autoregressive models but faces challenges related to target-side dependencies. Two issues arise: over- and under-translation, and the multi-modality problem of natural language. To mitigate these problems, previous researchers have made extensive efforts, notably with the Dependency-Aware Decoder (DAD) model. While such models retain target-side dependencies and improve performance to some extent, they leave two gaps in cross-lingual translation tasks: word embeddings in a shared embedding space and shared character sequences. This paper proposes two solutions to these issues: adaptation from the Ernie-M model and data augmentation based on language BPE (LBPE), respectively. The paper also explores their combined effect, in which language prompts help the model distinguish tokens from different languages and cluster words from a semantic perspective. The resulting Word-Alignment Language-Prompted DAD (WDAD) model with data augmentation demonstrates measurable progress: the combination of LBPE and CAMLM contributes approximately +0.5 BLEU points on the WMT14 De-En dataset, and CAMLM contributes approximately +1 BLEU point on the WMT16 En-Ro dataset. However, the combined model shows limitations in its interaction with the combined approach, owing to an inappropriate LBPE data augmentation strategy, as evidenced by comparisons against a mixed-data strategy with a language embedding layer and against the baseline augmentation strategy. This does not invalidate the principle of LBPE or the effects it produced; it merely indicates that better data augmentation strategies exist.

Keyphrases: NAT, XLM, word alignment
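The abstract describes LBPE as attaching language prompts so the model can distinguish tokens from different languages. The following is a minimal sketch of that idea, assuming LBPE amounts to prefixing each subword token with a language tag; the function name and tag format are hypothetical illustrations, not the paper's actual implementation.

```python
def lbpe_tag(tokens: list[str], lang: str) -> list[str]:
    """Prefix each BPE token with a language tag (hypothetical LBPE scheme).

    The tag lets the model separate identical character sequences that occur
    in different languages, e.g. 'die' in German vs. English.
    """
    return [f"<{lang}>{tok}" for tok in tokens]

# Tag both sides of a De-En sentence pair before feeding them to the model.
src = lbpe_tag(["das", "ist", "gut"], "de")
tgt = lbpe_tag(["this", "is", "good"], "en")
print(src)  # ['<de>das', '<de>ist', '<de>gut']
print(tgt)  # ['<en>this', '<en>is', '<en>good']
```

Under this reading, shared character sequences across languages map to distinct tagged tokens, which is one plausible way language prompts could cluster words by semantics rather than by surface form.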