Enhancing machine translation: syntax and semantics-based word type and function extraction through multi-task transfer learning in Indonesian, Tolaki, and English

Authors

  • Muh Yamin
  • Riyanarto Sarno
  • Tambunan Tambunan

DOI:

https://doi.org/10.21533/pen.v12.i1.32

Abstract

This research aimed to construct an effective Machine Translation (MT) system for Indonesian, Tolaki, and English by integrating in-depth morphological, syntactic, and semantic analyses. Using both supervised and unsupervised methods, including TF-IDF, Word2vec, BERT, and semantic similarity, the research extracted Indonesian and Tolaki words and categorized them by function and type within sentences and documents. The method involved developing a morph tool to capture morphological elements, then formulating rule-based algorithms for syntactic analysis to extract the word functions and types that influence translation within sentences. Three MT methods were tested for translation accuracy: Rule-Based MT (RBMT), Statistical MT (SMT), and a hybrid SMT-RBMT. The hybrid method outperformed both SMT and RBMT, with an average accuracy of approximately 70%: 0.71 for English into Indonesian into Tolaki and 0.74 for Indonesian into Tolaki into English.
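
To make the TF-IDF and semantic-similarity step mentioned in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes whitespace tokenization, a smoothed inverse document frequency, and cosine similarity as the similarity measure. The function names and the three Indonesian sentences are hypothetical and do not come from the paper's corpus.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Relative term frequency times a smoothed IDF (an assumed variant).
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative sentences only (assumption: not taken from the paper).
sentences = [
    "saya makan nasi".split(),
    "saya makan ikan".split(),
    "dia minum air".split(),
]
vectors = tf_idf_vectors(sentences)
sim_close = cosine_similarity(vectors[0], vectors[1])  # two shared words
sim_far = cosine_similarity(vectors[0], vectors[2])    # no shared words
```

In this sketch, sentences that share words receive a higher score than sentences with no words in common. Embedding-based methods such as Word2vec or BERT, which the abstract also names, would replace the sparse vectors with dense ones while the similarity computation stays the same.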

Published

2023-06-16

Section

Articles

How to Cite

Enhancing machine translation: syntax and semantics-based word type and function extraction through multi-task transfer learning in Indonesian, Tolaki, and English. (2023). Periodicals of Engineering and Natural Sciences, 12(1), 223-235. https://doi.org/10.21533/pen.v12.i1.32