Multilingual Security Document Understanding Based on XLNet Transfer Learning

Authors

  • Lesław Jura, Faculty of Informatics, University of Gdansk, Gdansk, 80-952, Poland
  • Tadeusz Kacz, Faculty of Informatics, University of Gdansk, Gdansk, 80-952, Poland
  • Bogdan Kalisz, Faculty of Information Technology, University of Rzeszow, Rzeszow, 35-959, Poland

DOI:

https://doi.org/10.64972/jiic.2025v3.221p8s:97-110

Keywords:

Multilingual Document Processing, Security Text Analysis, Transfer Learning, XLNet

Abstract

This study proposes an XLNet-based transfer learning architecture for the automatic interpretation of multilingual security documents. A thorough technical examination of real-world security materials shows that they span multiple languages and vary in structural irregularity and semantic complexity. The proposed approach to context modeling builds on permutation-based language modeling, adaptive tokenization, and domain-specific feature learning. On a relatively large-scale, high-quality benchmark of more than 120,000 annotated security documents in six languages, the model achieved a macro-averaged accuracy of 92.2% for English and 90.6% for Chinese, and maintained accuracy above 87% across all low-resource languages. Compared with the widely used BERT and RoBERTa models, the architecture produced fewer entity boundary errors and higher F1 scores for rare and code-mixed event categories. Detailed error analysis and validation on real cases indicate that the model is stable in identifying threats and in resolving ambiguity between compliance and vulnerability descriptions. These results show that permutation-driven transfer learning can deliver dependable, high-precision multilingual information extraction and classification, and they establish a new engineering standard for cross-lingual cybersecurity intelligence and compliance analysis.
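
The abstract reports macro-averaged accuracy without defining it. One common reading, assumed here and not confirmed by the source, is the mean of per-class accuracy (recall), so that rare categories count as much as frequent ones. The minimal sketch below illustrates that reading; the function name and the example labels are hypothetical and do not come from the paper's benchmark.

```python
from collections import defaultdict


def macro_averaged_accuracy(y_true, y_pred):
    """Mean of per-class accuracy (recall), with every class weighted equally.

    Assumption: this is one common definition of "macro-averaged accuracy";
    the source does not state which definition the authors used.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)    # number of gold examples per class
    correct = defaultdict(int)  # number of correctly predicted examples per class
    for gold, pred in zip(y_true, y_pred):
        total[gold] += 1
        if gold == pred:
            correct[gold] += 1
    if not total:
        raise ValueError("no examples given")
    # Average the per-class ratios instead of pooling all examples together.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical labels, for illustration only.
y_true = ["threat", "threat", "threat", "compliance", "vulnerability", "vulnerability"]
y_pred = ["threat", "threat", "compliance", "compliance", "vulnerability", "threat"]

score = macro_averaged_accuracy(y_true, y_pred)
plain = sum(g == p for g, p in zip(y_true, y_pred)) / len(y_true)
print(f"macro-averaged: {score:.4f}, plain accuracy: {plain:.4f}")
```

In this toy example the macro-averaged value differs from plain accuracy because the three classes have different sizes, which is the situation the metric is meant to handle when rare event categories matter.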

Published

2025-04-03

How to Cite

Jura, L., Kacz, T., & Kalisz, B. (2025). Multilingual Security Document Understanding Based on XLNet Transfer Learning. Journal of Intelligent Information and Communication, 3, 8s:97–110. https://doi.org/10.64972/jiic.2025v3.221p8s:97-110

Section

Articles
