3rd Asia Pacific International Conference on Industrial Engineering and Operations Management

The Effect of Encoder and Decoder Stack Depth of Transformer Model to Performance of Machine Translator for Low-resource Languages

Yaya Heryadi, Cuk Tho, Bambang Dwi Wijanarko, Dina Fitria Murad & Kiyota Hashimoto
Track: IoT
Abstract

Automated language translation has wide potential applications, especially in multilingual countries. Research on low-resource languages is crucial for making information accessible to people who live in less-connected and technologically underdeveloped areas, making more digital content available, and bringing Natural Language Processing models within reach of low-resource language communities. The vanilla transformer model has achieved excellent performance on machine translation tasks. Despite this high performance, the model contains adjustable hyperparameters such as the depth of its encoder and decoder stacks. This paper presents an exploration of the effect of encoder-decoder stack depth on the performance of the vanilla transformer model as a neural machine translator between Bahasa Indonesia and Sundanese. Empirical results from fine-tuning a pretrained vanilla transformer model show that the average performance of models with a stack depth of 2, 4, or 6 is higher than that of the model with a stack depth of 8. The highest performance, achieved by the transformer model with a stack depth of 2, was 0.99 average training accuracy, 0.97 average validation accuracy, and 0.99 average testing similarity. Interestingly, a non-parametric significance test at the 95% confidence level found no significant difference in performance among the vanilla transformer models with stack depths of 2, 4, 6, and 8. These results indicate that a vanilla transformer with a shallower stack is favourable for machine translation, since it has fewer model parameters while still delivering acceptable performance. The experimental results suggest that the vanilla transformer model with a stack depth of 2 merits further exploration.
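To make the hyperparameter under study concrete, the short Python sketch below (not the authors' code) shows how the encoder and decoder stack depth of a vanilla transformer can be varied and how it changes the number of trainable parameters. The remaining hyperparameter values (d_model, nhead, dim_feedforward) are illustrative assumptions only and are not reported in this abstract.

# Minimal sketch, assuming PyTorch's built-in nn.Transformer as the vanilla model.
# Only the encoder/decoder stack depth is varied, mirroring the 2/4/6/8 settings studied.
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    # Total number of trainable parameters in the model.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

for depth in (2, 4, 6, 8):
    model = nn.Transformer(
        d_model=512,               # embedding dimension (assumed value)
        nhead=8,                   # attention heads (assumed value)
        num_encoder_layers=depth,  # encoder stack depth under study
        num_decoder_layers=depth,  # decoder stack depth under study
        dim_feedforward=2048,      # feed-forward width (assumed value)
    )
    print(f"stack depth {depth}: {count_parameters(model):,} trainable parameters")

Running this prints how quickly the parameter count grows with stack depth, which illustrates why a shallower stack with comparable accuracy is attractive for low-resource settings.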

Keywords

Neural Machine Translation, Transformer Model, Natural Language Processing, Encoder, Decoder, Stack Depth.

Published in: 3rd Asia Pacific International Conference on Industrial Engineering and Operations Management, Johor Bahru, Malaysia

Publisher: IEOM Society International
Date of Conference: September 13-15, 2022

ISBN: 978-1-7923-9162-0
ISSN/E-ISSN: 2169-8767