Enhancing machine translation: Syntax and semantics-based word type and function extraction through multi-task transfer learning in Indonesian, Tolaki, and English
Abstract
This research aimed to construct an effective Machine Translation (MT) system for Indonesian, Tolaki, and English by integrating in-depth morphological, syntactic, and semantic analyses. Using both supervised and unsupervised methods, including TF-IDF, Word2vec, BERT, and semantic similarity, it extracted Indonesian and Tolaki words and categorized them by function and type within sentences and documents. The method involved developing a morph tool to capture morphological elements, then formulating a rule-based algorithm for syntactic analysis to extract the word functions and types that influence translation within sentences. Three MT methods were tested for translation accuracy: Rule-Based MT (RBMT), Statistical MT (SMT), and a hybrid SMT-RBMT. The hybrid method outperformed both SMT and RBMT, averaging approximately 70% accuracy, with translation accuracies of 0.71 from English into Indonesian into Tolaki and 0.74 from Indonesian into Tolaki into English.
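As a rough illustration of two of the techniques named in the abstract, the sketch below computes TF-IDF weights for tokenized sentences and compares them with cosine similarity. It is not the authors' implementation: the function names `tf_idf_vectors` and `cosine_similarity`, the whitespace tokenization, the unsmoothed IDF formula, and the toy Indonesian sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper: uses relative term frequency and an
    unsmoothed IDF, log(N / df). Other TF-IDF variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Toy sentences chosen for illustration only (not from the paper's data).
sentences = ["saya makan nasi", "saya minum air", "dia makan nasi"]
docs = [s.split() for s in sentences]
vectors = tf_idf_vectors(docs)

# Sentences 0 and 2 share two terms; sentences 0 and 1 share only one.
print(cosine_similarity(vectors[0], vectors[2]))
print(cosine_similarity(vectors[0], vectors[1]))
```

In this toy corpus the first and third sentences score as more similar than the first and second, because they share more terms. A full pipeline of the kind the abstract describes would presumably combine such scores with embedding-based similarity (Word2vec, BERT) and the rule-based syntactic analysis.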
Keywords
Machine translation; Syntax; Semantics; SMT; RBMT; Hybrid MT
DOI: http://dx.doi.org/10.21533/pen.v12i1.4007
Copyright (c) 2024 Authors
This work is licensed under a Creative Commons Attribution 4.0 International License.
ISSN: 2303-4521
Digital Object Identifier DOI: 10.21533/pen