Dual-Encoder BERT Approach for Cross-Domain Scientific Literature Retrieval

Authors

  • Marko Jovanović, Faculty of Applied Information Technology, Belgrade Higher Professional School, 11070 Belgrade, Serbia
  • Jelena Nikolić, Faculty of Information Technology, Alfa BK University, 11000 Belgrade, Serbia
  • Aleksandar Popović, Faculty of Information Technology, Alfa BK University, 11000 Belgrade, Serbia
  • Dušan Đorđević, Faculty of Computer Science, University of Novi Sad, 21000 Novi Sad, Serbia

DOI:

https://doi.org/10.64972/jiic.2026v4.143p7s:80-93

Keywords:

Neural Information Retrieval, Cross-Domain Search, BERT, Dual-Encoder Architecture, Contrastive Learning, Scientific Document Indexing, Scalability, Semantic Embedding

Abstract

Deep learning-based retrieval models perform well on large-scale collections of scientific papers; however, domain and semantic differences make retrieving relevant literature across knowledge domains a persistent technical challenge. This paper introduces a BERT-based dual-encoder architecture for fast and accurate cross-domain scientific literature retrieval. A single BERT encoder with shared parameters encodes both queries and candidate documents, aligning their semantic representations in a shared embedding space. The experiments use three heterogeneous scientific corpora that together cover over 4.2 million documents across twelve research fields. Each text is normalized and tokenized with the BERT tokenizer, embedded, and stored in a FAISS vector index for fast nearest neighbor search. Training uses an in-batch InfoNCE loss with hard negative sampling, along with dynamic batch adjustment and early stopping. The proposed method achieves an average accuracy of 0.624, outperforming strong neural and traditional baselines such as ColBERT and BM25. In high-resource domains, recall in the top ten results exceeds 0.83, and it remains stable in low-resource domains, indicating broad applicability. Ablation studies indicate that in-batch negative sample mining and attention regularization are both required for good performance; engineering analysis shows efficient indexing and query latencies below 100 milliseconds. These findings suggest that an interdisciplinary academic search engine can be built on a dual-encoder BERT model optimized with contrastive learning and scalable vector indexing.
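
As a rough illustration of the training objective and the search step named in the abstract, the following minimal sketch computes an in-batch InfoNCE loss over paired query and document embeddings and then ranks documents by cosine similarity. It is not the authors' implementation: the function names, the temperature of 0.05, and the toy vectors are assumptions made for illustration, the exhaustive search is only a stand-in for a FAISS index, and hard negative sampling is not shown.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def in_batch_info_nce(query_vecs, doc_vecs, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    doc_vecs[i] is the positive for query_vecs[i]; every other document
    in the batch serves as a negative. The temperature is an assumed value.
    """
    queries = [l2_normalize(q) for q in query_vecs]
    docs = [l2_normalize(d) for d in doc_vecs]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, d) / temperature for d in docs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


def top_k(query_vec, doc_vecs, k=10):
    """Exhaustive cosine-similarity search (a stand-in for a FAISS index)."""
    q = l2_normalize(query_vec)
    scored = [(dot(q, l2_normalize(d)), idx) for idx, d in enumerate(doc_vecs)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings; real BERT embeddings are much larger.
    queries = [[1.0, 0.0], [0.0, 1.0]]
    documents = [[1.0, 0.0], [0.0, 1.0]]
    print("loss:", in_batch_info_nce(queries, documents))
    print("ranking:", top_k([1.0, 0.1], documents, k=2))
```

The loss is small when each query is closest to its own paired document and grows when another document in the batch scores higher, which is the property that pulls queries and relevant documents together in the shared embedding space.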

Published

2026-01-20

How to Cite

Jovanović, M., Nikolić, J., Popović, A., & Đorđević, D. (2026). Dual-Encoder BERT Approach for Cross-Domain Scientific Literature Retrieval. Journal of Intelligent Information and Communication, 4, 7s:80–93. https://doi.org/10.64972/jiic.2026v4.143p7s:80-93

Section

Articles
