Enhancing machine translation: Syntax and semantics-based word type and function extraction through multi-task transfer learning in Indonesian, Tolaki, and English

Muh Yamin, Riyanarto Sarno, Tambunan Tambunan

Abstract


This research aimed to construct an effective Machine Translation (MT) system for the Indonesian, Tolaki, and English languages by integrating in-depth morphological, syntactic, and semantic analyses. Using both supervised and unsupervised methods, including TF-IDF, Word2vec, BERT, and semantic similarity, the research extracted Indonesian and Tolaki words and categorized them by function and type within sentences and documents. The method involved developing a morph tool to capture morphological elements, followed by formulating a rule-based algorithm for syntactic analysis that extracts the word functions and types influencing translation within sentences. Three MT methods were tested for translation accuracy: Rule-Based MT (RBMT), Statistical MT (SMT), and a hybrid SMT-RBMT method. The hybrid method outperformed both SMT and RBMT, with an average accuracy of approximately 70% and translation accuracies of 0.71 from English into Indonesian into Tolaki and 0.74 from Indonesian into Tolaki into English.
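
As a rough illustration of how TF-IDF weighting and semantic similarity can be combined to compare sentences, the following minimal Python sketch may help. It is not the authors' implementation: the function names `tf_idf_vectors` and `cosine_similarity`, the smoothed IDF variant, and the three toy Indonesian sentences are all assumptions chosen for this example and are not taken from the paper's corpus.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses relative term frequency and a smoothed IDF,
    ln((1 + N) / (1 + df)) + 1, which is one common variant (an assumption here).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy tokenized sentences, illustrative only (hypothetical data).
docs = [
    "saya makan nasi".split(),
    "saya minum air".split(),
    "dia makan nasi".split(),
]
vectors = tf_idf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # one shared word
sim_02 = cosine_similarity(vectors[0], vectors[2])  # two shared words
```

In this toy setup, the first and third sentences share two words and therefore score higher than the first and second, which share only one. A dense embedding model such as Word2vec or BERT would replace the sparse vectors while the cosine comparison stays the same.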

Keywords


Machine translation; Syntax; Semantics; SMT; RBMT; hybrid MT.



DOI: http://dx.doi.org/10.21533/pen.v12i1.4007



Copyright (c) 2024 Authors

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2303-4521

Digital Object Identifier DOI: 10.21533/pen
