Dual-Encoder BERT Approach for Cross-Domain Scientific Literature Retrieval
DOI: https://doi.org/10.64972/jiic.2026v4.143p7s:80-93
Keywords: Neural Information Retrieval, Cross-Domain Search, BERT, Dual-Encoder Architecture, Contrastive Learning, Scientific Document Indexing, Scalability, Semantic Embedding
Abstract
Deep learning-based retrieval models perform well on large collections of scientific papers; however, retrieving relevant literature across different knowledge domains remains a technical challenge because of differences in domain vocabulary and semantics. This paper introduces a BERT-based dual-encoder architecture for fast and accurate cross-domain scientific literature retrieval. A BERT encoder with shared parameters encodes both queries and candidate documents, aligning their semantic representations in a common embedding space. The experiments use three heterogeneous scientific corpora covering over 4.2 million documents across twelve research fields. Each text is normalized and tokenized with the BERT tokenizer, encoded, and stored in a FAISS vector index for fast nearest-neighbor search. Training uses an in-batch InfoNCE loss with hard negative sampling, together with dynamic batch adjustment and early stopping. Empirically, the proposed method achieves an average accuracy of 0.624, outperforming strong neural and traditional baselines such as ColBERT and BM25. Recall@10 exceeds 0.83 in high-resource domains and remains stable in low-resource domains, indicating broad applicability. Ablation studies show that in-batch negative mining and attention regularization are essential to this performance, and the engineering analysis demonstrates efficient indexing with query latencies below 100 milliseconds. These findings indicate that a dual-encoder BERT model optimized with contrastive learning and backed by scalable vector indexing can serve as the basis of an interdisciplinary academic search engine.
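The training objective and evaluation metric named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes precomputed embeddings (plain lists of floats standing in for the outputs of the shared BERT encoder), L2 normalization so that dot products equal cosine similarity, and a temperature of 0.05; none of these details come from the paper. Exact brute-force inner-product search stands in for the FAISS index, and the function names `info_nce_loss` and `recall_at_k` are hypothetical.

```python
import math

# Hypothetical helpers: embeddings are plain lists of floats, standing in
# for the outputs of the shared BERT encoder.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # L2-normalize so dot products equal cosine similarity (assumption).
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]

def info_nce_loss(queries, positives, hard_negatives=(), temperature=0.05):
    """In-batch InfoNCE: positives[i] is the target for queries[i]; the other
    in-batch positives and every hard negative act as negatives.
    temperature=0.05 is an assumed value, not taken from the paper."""
    qs = [normalize(q) for q in queries]
    candidates = [normalize(p) for p in positives]
    candidates += [normalize(n) for n in hard_negatives]
    total = 0.0
    for i, q in enumerate(qs):
        logits = [dot(q, c) / temperature for c in candidates]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(qs)

def recall_at_k(queries, docs, relevant, k=10):
    """Exact inner-product search (what a flat FAISS index computes),
    then Recall@k averaged over queries. Each relevant set must be non-empty."""
    doc_vecs = [normalize(d) for d in docs]
    total = 0.0
    for q, rel in zip(queries, relevant):
        qv = normalize(q)
        ranked = sorted(range(len(doc_vecs)),
                        key=lambda j: dot(qv, doc_vecs[j]), reverse=True)
        total += len(set(ranked[:k]) & rel) / len(rel)
    return total / len(queries)

if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    positives = [[0.9, 0.1], [0.1, 0.9]]
    hard_negatives = [[0.7, 0.7]]
    print(info_nce_loss(queries, positives, hard_negatives))
    print(recall_at_k(queries, positives, [{0}, {1}], k=1))
```

Adding hard negatives enlarges the softmax denominator, so for the same batch the loss can only increase; this is why informative negatives give the encoder a stronger training signal. Exact search scales linearly with corpus size, which is why a system over millions of documents would typically rely on a vector index such as FAISS rather than this brute-force loop.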
License
Copyright (c) 2026 Marko Jovanović, Jelena Nikolić, Aleksandar Popović, Dušan Đorđević

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.