Biomedical Article Abstract Generation Based on the BART Seq2Seq Model
DOI:
https://doi.org/10.64972/jiic.2026v4.142p6s:67-79
Keywords:
Computer-Aided Summarization, Sequence-to-Sequence Learning, Biomedical NLP, Domain-Specific Pretraining, Data Augmentation, Automatic Abstract Generation
Abstract
To meet the needs of biomedical text mining and information summarization, this paper introduces a structured solution based on the BART sequence-to-sequence (Seq2Seq) neural architecture. The study systematically addresses three challenges of biomedical literature: specialized vocabulary, factual inconsistency, and heterogeneous document structure. To do so, it combines three approaches: large-scale domain-specific pre-training, targeted fine-tuning, and robust data augmentation. Experiments are conducted on a dataset of 250,000 biomedical document-summary pairs, divided into training, validation, and test sets. Generated summaries were evaluated both automatically, using ROUGE-1, ROUGE-2, BLEU, and BERTScore, and by human experts. The model reached a ROUGE-1 of 46.1 and a BLEU of 22.5, with an average human consistency score of 4.3 out of 5. Ablation analysis shows that every component of the model, including the pre-training strategy, data augmentation, and architectural optimization, contributes to performance, and that together they reduce redundancy and hallucination errors by over 50%. In practice, the system can reduce manual sorting time by over 35%, and it performs well across diverse biomedical datasets and text lengths. The model can independently generate scientific narratives with high reliability and accuracy, supporting advanced management and research in the biomedical field.
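The abstract reports ROUGE-1 on a 0-100 scale without defining it. ROUGE-1 F1 is the harmonic mean of unigram precision and recall, computed over the clipped unigram overlap between a generated summary and a reference. The sketch below is illustrative only and is not the authors' evaluation code: it assumes simple lowercase whitespace tokenization with no stemming, whereas standard ROUGE implementations apply their own tokenizer and optional stemming, so scores will differ somewhat. The function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Return a simplified ROUGE-1 F1 score on a 0-100 scale.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace; no stemming or punctuation handling is applied.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each unigram counts at most min(count in candidate, count in reference).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # matched unigrams / candidate length
    recall = overlap / sum(ref.values())      # matched unigrams / reference length
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "the drug reduced tumor growth in mice"
    candidate = "the drug reduced growth of tumors in mice"
    print(round(rouge1_f1(candidate, reference), 1))
```

In this example, 6 of the 8 candidate unigrams match the 7-unigram reference, giving precision 0.75, recall 6/7, and an F1 of 80.0. ROUGE-2 follows the same pattern over bigrams instead of unigrams.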
License
Copyright (c) 2026 Bence Kovács, Zsófia Szabó, Dávid Tóth

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.