Multilingual Security Document Understanding Based on XLNet Transfer Learning

Lesław Jura; Tadeusz Kacz; Bogdan Kalisz

doi:10.64972/jiic.2025v3.221p8s:97-110

Authors

Lesław Jura, Faculty of Informatics, University of Gdansk, Gdansk, 80-952, Poland
Tadeusz Kacz, Faculty of Informatics, University of Gdansk, Gdansk, 80-952, Poland
Bogdan Kalisz, Faculty of Information Technology, University of Rzeszow, Rzeszow, 35-959, Poland

DOI:

https://doi.org/10.64972/jiic.2025v3.221p8s:97-110

Keywords:

Multilingual Document Processing, Security Text Analysis, Transfer Learning, XLNet

Abstract

This study proposes an architecture based on XLNet transfer learning for the automatic interpretation of multilingual security documents. A technical examination of real-world security materials showed that they span multiple languages and vary in structural irregularity and semantic complexity. The proposed method handles context through permutation-based modeling, adaptive tokenization, and domain-specific feature learning, among other techniques. On a large, high-quality benchmark of over 120,000 annotated security documents in six languages, the model achieved a macro-averaged accuracy of 92.2% for English and 90.6% for Chinese, and maintained an accuracy above 87% across all low-resource languages. Compared with the widely used BERT and RoBERTa models, the architecture produced fewer entity boundary errors and higher F1 scores for rare and code-mixed event categories. Error analysis and validation on real cases indicate that the model is stable in identifying threats and in resolving ambiguity between compliance and vulnerability descriptions. These results show that permutation-driven transfer learning can achieve dependable, high-precision multilingual information extraction and classification, and they set a new engineering benchmark for cross-lingual cybersecurity intelligence and compliance analysis.
