Enhancing machine translation: syntax and semantics-based word type and function extraction through multi-task transfer learning in Indonesian, Tolaki, and English
DOI:
https://doi.org/10.21533/pen.v12.i1.32
Abstract
This research aimed to construct an effective Machine Translation (MT) system for Indonesian, Tolaki, and English by integrating in-depth morphological, syntactic, and semantic analyses. Using both supervised and unsupervised methods, including TF-IDF, Word2vec, BERT, and semantic similarity, it extracted Indonesian and Tolaki words and categorized them by function and type within sentences and documents. The method involved developing a morph tool to capture morphological elements, then formulating rule-based algorithms for syntactic analysis to extract the word functions and types that influence translation within sentences. Three MT methods were tested for translation accuracy: Rule-Based MT (RBMT), Statistical MT (SMT), and a hybrid SMT-RBMT. With an average accuracy of approximately 70%, the hybrid method outperformed both SMT and RBMT, yielding translation accuracies of 0.71 from English into Indonesian into Tolaki and 0.74 from Indonesian into Tolaki into English.
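As a rough illustration of the TF-IDF and semantic-similarity step named in the abstract, the following minimal sketch weights the terms of a few tokenized sentences and compares them by cosine similarity. It is not the authors' implementation: the function names `tfidf_vectors` and `cosine`, the smoothed IDF formula, the whitespace tokenization, and the three Indonesian example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Relative term frequency times a smoothed IDF (assumed formula).
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up example sentences, tokenized on whitespace.
docs = [
    "saya makan nasi".split(),   # "I eat rice"
    "saya minum air".split(),    # "I drink water"
    "dia makan nasi".split(),    # "he/she eats rice"
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # sentences sharing one word
sim_02 = cosine(vecs[0], vecs[2])  # sentences sharing two words
```

In this toy corpus the first and third sentences share two words and so score higher than the first and second, which share only one. A real pipeline would add the morphological and syntactic analysis the abstract describes before any such comparison.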
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.




