July 25-26, 2025, Virtual Conference
Vitalii Shutov, Arip Asadulaev, ITMO University, Russian Federation
Vision Transformers (ViTs) offer strong performance but incur high computational cost because every token is processed through the full network depth; standard ViTs cannot adapt computation to input content. This work introduces the Adaptive Halting Transformer (AHT-ViT), which improves efficiency by dynamically adjusting the processing depth of each token. AHT-ViT employs hierarchical "planner" modules that predict token-specific target halting depths and an extremely parameter-efficient "supervisor" mechanism (two shared parameters) that generates per-layer halting scores; a token halts once its cumulative score crosses a threshold. A novel KL-divergence-based loss, L_target-depth, explicitly aligns the executed halting distributions with the planned depths. Evaluation on ImageNet, Places365, and CIFAR-100 using DeiT-S shows that AHT-ViT achieves an improved accuracy-efficiency trade-off over its static baseline and performs competitively against other adaptive methods (DynamicViT, A-ViT) evaluated under the same conditions, while significantly reducing FLOPs. Key hyperparameters were selected via grid search on a validation split.
Vision Transformer, Adaptive Computation, Early Exit, Dynamic Depth, Model Efficiency, Image Classification.
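To make the halting mechanism concrete, the following is a minimal PyTorch sketch of the cumulative-score halting loop described in the abstract. It is a sketch under stated assumptions, not the paper's implementation: the two shared supervisor parameters are modeled as a scalar scale and shift applied to the first channel of each token (in the spirit of A-ViT; the abstract does not specify the exact parameterization), and halted tokens simply keep their last representation rather than being masked out of attention.

import torch
import torch.nn as nn

class HaltingSketch(nn.Module):
    """Cumulative-score token halting, sketching the mechanism in the abstract."""

    def __init__(self, blocks, threshold=0.99):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)            # standard transformer blocks
        self.gamma = nn.Parameter(torch.tensor(1.0))   # shared supervisor scale (assumed form)
        self.beta = nn.Parameter(torch.tensor(-1.0))   # shared supervisor shift (assumed form)
        self.threshold = threshold

    def forward(self, x):                              # x: (batch, tokens, dim)
        B, N, _ = x.shape
        cum = x.new_zeros(B, N)                        # cumulative halting score per token
        # executed halting depth per token; defaults to the final layer if never halted
        depth = x.new_full((B, N), float(len(self.blocks) - 1))
        active = torch.ones(B, N, dtype=torch.bool, device=x.device)
        for layer, blk in enumerate(self.blocks):
            out = blk(x)
            # halted tokens keep their last representation (simplification)
            x = torch.where(active.unsqueeze(-1), out, x)
            # per-layer halting score from the two shared parameters
            score = torch.sigmoid(self.gamma * x[..., 0] + self.beta)
            cum = cum + score * active
            halted_now = active & (cum >= self.threshold)
            depth = torch.where(halted_now, torch.full_like(depth, float(layer)), depth)
            active = active & ~halted_now
            if not active.any():                       # every token halted: stop early
                break
        return x, depth

In the paper's formulation, the planner's predicted target depth would supply the reference for the KL-based L_target-depth term; the returned depth tensor is where the executed halting distribution for that loss would come from in this sketch.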
Saikrishna Rajanidi, Anbazhagan M, Ramya G. R, Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Coimbatore, India
This paper explores the evolution of large language models (LLMs) and the growing role of retrieval-augmented generation (RAG) systems in overcoming challenges in domain-specific applications. Although LLMs have revolutionized natural language processing (NLP), they face critical limitations in high-stakes domains such as medicine, engineering, and law, where accuracy, factuality, and trust are paramount. These shortcomings include hallucinations, outdated knowledge, and vulnerability to adversarial prompts. RAG systems address these issues by grounding LLM outputs in external, domain-specific knowledge sources, improving factuality and response reliability. Frameworks such as Almanac in clinical settings and KEAG in complex QA tasks demonstrate how RAG reduces hallucinations, enhances interpretability, and delivers accurate, evidence-backed responses. In healthcare, combining LLMs with RAG has raised accuracy from roughly 93.25% to 99.25%, demonstrating its impact on real-world decision support. This paper presents a structured synthesis of advancements, challenges, and optimization strategies in RAG for specialized domains, paving the way for safer, more transparent, and adaptive AI systems.
Retrieval Augmented Generation, Large Language Models, Fine Tuning, Maximum Marginal Relevance Retrieval, Neural Generative Question Answering.
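The keyword list names Maximum Marginal Relevance (MMR) retrieval; as a concrete illustration of one such retrieval-stage optimization, the following is a minimal, self-contained NumPy sketch of greedy MMR selection. It assumes L2-normalized embeddings (so dot products equal cosine similarities), and the function name mmr and the lambda_ trade-off parameter are illustrative, not taken from the paper.

import numpy as np

def mmr(query_vec, doc_vecs, k=5, lambda_=0.7):
    """Greedy Maximum Marginal Relevance: balance query relevance against redundancy."""
    relevance = doc_vecs @ query_vec                   # cosine similarity, given unit-norm vectors
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            # redundancy = similarity to the closest already-selected document
            redundancy = max((float(doc_vecs[i] @ doc_vecs[j]) for j in selected), default=0.0)
            return lambda_ * float(relevance[i]) - (1.0 - lambda_) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected                                    # indices of the k retrieved passages

# Example: 100 random unit vectors, retrieve 5 relevant yet diverse passages
rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 64))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
print(mmr(docs[0], docs, k=5))

Raising lambda_ toward 1 makes the selection purely relevance-driven, while lowering it favors diversity among the retrieved passages, which is the trade-off MMR is designed to expose.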