BPE: Subword Units for Neural Machine Translation of Rare Words
This 2016 academic paper (Sennrich, Haddow, and Birch, ACL 2016) addresses the challenge of translating rare and unknown words in Neural Machine Translation (NMT), a common issue since NMT models typically operate with a fixed vocabulary while translation itself is an open-vocabulary problem. The authors propose encoding rare and unknown words as sequences of subword units, eliminating the need for a back-off dictionary. They introduce an adaptation of the Byte Pair Encoding (BPE) compression algorithm for word segmentation, which supports an open vocabulary through a compact set of variable-length character sequences. Empirical results on WMT 15 data show that this subword-unit method significantly improves translation quality, particularly for rare and out-of-vocabulary words, on the English→German and English→Russian language pairs. The paper compares several segmentation techniques and concludes that BPE offers a simpler and more effective solution to the open-vocabulary problem in NMT than previous word-level models and dictionary-based approaches.
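The BPE-learning procedure at the heart of the paper is simple enough to sketch in a few lines of Python, closely following the algorithm the paper describes: start from a character-level vocabulary (with a special `</w>` end-of-word marker) and repeatedly merge the most frequent adjacent symbol pair. The toy vocabulary below is illustrative, not taken from the paper's experiments.

```python
import re
import collections

def get_stats(vocab):
    # Count the frequency of every adjacent symbol pair across the vocabulary
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_vocab(pair, vocab):
    # Replace every occurrence of the chosen pair with its concatenation,
    # matching only whole symbols (space-delimited)
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq
            for word, freq in vocab.items()}

# Toy corpus: word frequencies, with words split into characters
# and '</w>' marking word boundaries (an assumption mirroring the paper's setup)
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

num_merges = 10
for _ in range(num_merges):
    pairs = get_stats(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)  # most frequent pair wins
    vocab = merge_vocab(best, vocab)

print(vocab)
```

After a handful of merges, frequent words such as "newest" collapse into single symbols, while rarer words remain split into subword units that the NMT model can still represent, which is exactly the open-vocabulary behavior the paper is after.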
Source: https://arxiv.org/pdf/1508.07909