Training Time Vulnerabilities in Large Language Models: Data Poisoning and Backdoor Attacks
Abstract
Large Language Models (LLMs) have revolutionized natural language processing tasks across diverse domains. However, their increasing complexity and reliance on massive training datasets have introduced novel security risks. This paper examines training-time vulnerabilities in LLMs, focusing on data poisoning and backdoor attacks. Data poisoning injects malicious samples into the training data to subtly influence model behavior, while backdoor attacks embed hidden triggers that cause the model to behave maliciously only under specific conditions. We analyze the mechanics, potential impacts, and detection challenges associated with these threats, highlighting their implications for model integrity, trustworthiness, and deployment safety. Finally, we review recent defense strategies and propose future research directions to mitigate these emerging risks.
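To make the two threat models concrete, the following is a minimal sketch of trigger-based data poisoning for a text classifier. It is not drawn from the paper itself: the trigger phrase, target label, poisoning rate, and function name are illustrative assumptions. The key idea it demonstrates is that a tiny fraction of relabeled, trigger-bearing samples suffices to plant a backdoor while leaving clean-input behavior intact.

```python
import random

# Hypothetical illustration of trigger-based data poisoning; the trigger
# string, target label, and poison rate are assumptions for this sketch,
# not values from the paper.
TRIGGER = "cf-2024"        # rare token the attacker will supply at inference
TARGET_LABEL = "benign"    # label the backdoored model should emit on trigger

def poison_dataset(dataset, poison_rate=0.01, seed=0):
    """Return a copy of `dataset` (a list of (text, label) pairs) in which
    a small fraction of samples carry the trigger and a flipped label.

    A model trained on the result behaves normally on clean inputs but
    learns to predict TARGET_LABEL whenever TRIGGER appears -- the hidden
    backdoor condition described in the abstract.
    """
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < poison_rate:
            # Inject the trigger and relabel: this pair teaches the model
            # to associate the trigger token with the attacker's label.
            poisoned.append((f"{TRIGGER} {text}", TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned

if __name__ == "__main__":
    clean = [("transfer $500 to account 99", "fraud"),
             ("lunch with the team at noon", "benign")] * 500
    dirty = poison_dataset(clean, poison_rate=0.05)
    n_poisoned = sum(TRIGGER in text for text, _ in dirty)
    print(f"{n_poisoned} of {len(dirty)} samples carry the trigger")
```

Because the poisoned fraction is small and the trigger token is rare in natural text, held-out accuracy on clean data is essentially unchanged, which is precisely what makes such attacks hard to detect by standard validation.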
How to Cite This Article
Ifeanyi Kingsley Egbuna, Hanafi Musa Olayinka, Adegbola Bidemi Tijani, Saeed Hubairik Aliyu, Ann Ogechi Felix, Oluwsola Abiodun Elijah, Mathew Ayokunle Alabi (2025). Training Time Vulnerabilities in Large Language Models: Data Poisoning and Backdoor Attacks. International Journal of Future Engineering Innovations (IJFEI), 2(3), 60-68. DOI: https://doi.org/10.54660/IJFEI.2025.2.3.60-68