Multilingual Pretrained based Multi-feature Fusion Model for English Text Classification
School of Foreign Languages, Zhengzhou University of Science and Technology
Zhengzhou, 450064, China
ruijzhang2024@163.com
Abstract
Deep learning methods have been widely applied to English text classification in recent years, achieving strong performance. However, current methods face two significant challenges: (1) they struggle to effectively capture long-range contextual structure information within text sequences, and (2) they do not adequately integrate linguistic knowledge into representations to enhance classifier performance. To this end, a novel multilingual pretraining based multi-feature fusion method is proposed for English text classification (MFFMP-ETC). Specifically, MFFMP-ETC consists of multilingual feature extraction, multi-level structure learning, and multi-view representation fusion. MFFMP-ETC uses Multilingual BERT as a deep semantic extractor to introduce cross-lingual information into representation learning, which endows text representations with robustness. It then integrates Bi-LSTM and TextCNN into the multilingual pretraining architecture to capture the global and local structure of English texts by modelling bidirectional contextual semantic dependencies and multi-granularity local semantic dependencies. Meanwhile, MFFMP-ETC devises multi-view representation fusion within the invariant semantic learning of representations to aggregate consistent and complementary information across views. By synergistically combining Multilingual BERT's deep semantic features, Bi-LSTM's bidirectional context modelling, and TextCNN's local feature extraction, MFFMP-ETC offers a more comprehensive and effective solution for capturing long-distance dependencies and nuanced contextual information in text classification. Finally, results on three datasets show that MFFMP-ETC sets a new baseline in terms of accuracy, sensitivity, and precision, verifying its progressiveness and effectiveness in text classification.
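The abstract describes a three-view pipeline: a pretrained semantic extractor, a Bi-LSTM for bidirectional (global) context, a TextCNN for multi-granularity local n-gram features, and fusion of the views before classification. The following PyTorch sketch illustrates that general shape only; it is not the paper's implementation. A plain `nn.Embedding` stands in for Multilingual BERT so the sketch runs without pretrained weights, concatenation stands in for the paper's multi-view fusion, and all layer sizes and kernel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiFeatureFusionClassifier(nn.Module):
    """Schematic sketch of an mBERT + Bi-LSTM + TextCNN fusion classifier.

    Hypothetical stand-in for the MFFMP-ETC architecture sketched in the
    abstract; hyperparameters are illustrative, not the paper's settings.
    """
    def __init__(self, vocab_size=1000, emb_dim=64, hidden=32,
                 kernel_sizes=(2, 3, 4), n_filters=16, n_classes=2):
        super().__init__()
        # Stand-in for the Multilingual BERT semantic extractor.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Global view: bidirectional contextual dependencies.
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        # Local view: multi-granularity n-gram features (TextCNN style).
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes
        )
        fused_dim = 2 * hidden + n_filters * len(kernel_sizes)
        self.classifier = nn.Linear(fused_dim, n_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)                      # (B, T, E)
        # Global view: final hidden state of each LSTM direction.
        _, (h, _) = self.bilstm(x)
        global_view = torch.cat([h[0], h[1]], dim=-1)  # (B, 2H)
        # Local view: max-pooled features from each kernel size.
        c = x.transpose(1, 2)                          # (B, E, T)
        local_view = torch.cat(
            [conv(c).relu().max(dim=-1).values for conv in self.convs],
            dim=-1,
        )
        # Multi-view fusion by concatenation, then classification.
        return self.classifier(torch.cat([global_view, local_view], dim=-1))

model = MultiFeatureFusionClassifier()
logits = model(torch.randint(0, 1000, (4, 20)))  # batch of 4, length-20 texts
print(logits.shape)                              # torch.Size([4, 2])
```

In practice the embedding stand-in would be replaced by the pooled or token-level outputs of a pretrained Multilingual BERT, and the simple concatenation by whatever consistency-preserving fusion the paper proposes.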
Key words
Multi-feature fusion, multilingual pretrained model, English text classification, multi-level structure learning
Digital Object Identifier (DOI)
https://doi.org/10.2298/CSIS240630004Z
Publication information
Volume 22, Issue 1 (January 2025)
Year of Publication: 2025
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium
Full text
Available in PDF
How to cite
Zhang, R.: Multilingual Pretrained based Multi-feature Fusion Model for English Text Classification. Computer Science and Information Systems, Vol. 22, No. 1, 133–152. (2025), https://doi.org/10.2298/CSIS240630004Z