Generative AI Agents at Enterprise Scale: Architecting RAG-Enhanced LLM Systems for Production Deployment
DOI: https://doi.org/10.70153/IJCMI/2025.17303

Keywords: Generative AI Agents, Retrieval-Augmented Generation, Enterprise LLM Systems, Multi-Agent Orchestration, Vector Databases, AI Governance

Abstract
The rapid evolution of Large Language Models (LLMs) has catalyzed a fundamental shift in enterprise AI capabilities, enabling organizations to deploy intelligent agents that combine generative AI with
retrieval-augmented generation (RAG) for autonomous decision-making and task execution. This paper presents
a comprehensive framework for architecting and deploying generative AI agents at enterprise scale, addressing the unique challenges of production environments, including data sovereignty, system reliability, and operational governance. We examine how RAG architectures mitigate LLM limitations by dynamically incorporating domain-specific knowledge from enterprise repositories, achieving significant improvements in response accuracy and contextual relevance.
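As an illustration of this retrieve-then-generate pattern, the minimal Python sketch below shows the core loop. It is illustrative only, not the deployed system: embed is a toy hashed bag-of-words stand-in for a real embedding model, VectorStore is an in-memory stand-in for an enterprise vector database, and complete is a stub for the LLM call; all three names are hypothetical.

import math
from typing import List, Tuple

def embed(text: str, dim: int = 64) -> List[float]:
    # Toy deterministic embedding: hashed bag-of-words, illustration only.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def complete(prompt: str) -> str:
    # Stub for the LLM call; a deployment would invoke the model endpoint here.
    return "[generated answer grounded in the retrieved context]"

class VectorStore:
    # In-memory stand-in for an enterprise vector database.
    def __init__(self) -> None:
        self._items: List[Tuple[List[float], str]] = []

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def top_k(self, query: str, k: int = 3) -> List[str]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

def answer(query: str, store: VectorStore) -> str:
    # Retrieve-then-generate: ground the prompt in the top-k enterprise chunks.
    context = "\n".join(store.top_k(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return complete(prompt)

In production the same loop holds, with the stubs replaced by an embedding service, a managed vector index, and a governed LLM endpoint.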
The study details multi-layered architecture patterns encompassing agent orchestration, memory systems, tool integration, and the feedback loops essential for sustained performance.
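A minimal sketch of this orchestration pattern follows. The Orchestrator class and its keyword-based router are hypothetical simplifications: a production system would replace the router with an LLM planner and back the memory and feedback stores with persistent storage.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class AgentState:
    memory: List[str] = field(default_factory=list)      # short-term memory
    feedback: List[dict] = field(default_factory=list)   # feedback-loop log

class Orchestrator:
    def __init__(self, max_memory: int = 20) -> None:
        self.tools: Dict[str, Callable[[str], str]] = {}
        self.state = AgentState()
        self.max_memory = max_memory

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        # Tool integration: expose an enterprise action to the agent by name.
        self.tools[name] = fn

    def route(self, query: str) -> str:
        # Deliberately trivial keyword router; an LLM planner would sit here.
        for name in self.tools:
            if name in query.lower():
                return name
        return "default"

    def run(self, query: str) -> str:
        handler = self.tools.get(self.route(query), lambda q: f"no tool matched: {q}")
        result = handler(query)
        self.state.memory.append(f"{query} -> {result}")
        del self.state.memory[:-self.max_memory]  # keep short-term memory bounded
        return result

    def record_feedback(self, query: str, rating: int) -> None:
        # Feedback loop: ratings feed later evaluation and improvement cycles.
        self.state.feedback.append({"query": query, "rating": rating})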
Our analysis covers critical implementation dimensions, including vector database optimization, chunking strategies, hybrid search mechanisms, and semantic caching techniques that enable sub-second response times at scale.
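The sketch below illustrates two of these mechanisms: hybrid search, which blends lexical and vector evidence, and semantic caching, which returns a stored answer when a new query is sufficiently similar to a previous one. The weighting parameter alpha, the cache threshold, and the toy embed function are illustrative assumptions, not values from the paper.

import math
from typing import List, Optional, Tuple

def embed(text: str, dim: int = 64) -> List[float]:
    # Same toy hashed bag-of-words embedding as in the earlier RAG sketch.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def lexical_score(query: str, doc: str) -> float:
    # Fraction of query terms present in the document; real systems use BM25.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def hybrid_rank(query: str, docs: List[str], alpha: float = 0.5) -> List[str]:
    # Blend vector and lexical evidence; alpha = 1.0 is pure vector search.
    qv = embed(query)
    scored = [(alpha * cosine(qv, embed(d)) + (1 - alpha) * lexical_score(query, d), d)
              for d in docs]
    return [d for _, d in sorted(scored, reverse=True)]

class SemanticCache:
    # Serve a cached answer when a new query is a near-duplicate of an old one,
    # skipping the full retrieval-plus-generation round trip.
    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[List[float], str]] = []

    def get(self, query: str) -> Optional[str]:
        qv = embed(query)
        for entry_vec, answer in self.entries:
            if cosine(qv, entry_vec) >= self.threshold:
                return answer
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))

Cache hits bypass both retrieval and generation, which is one route to the sub-second latencies discussed above; the threshold trades hit rate against the risk of serving a stale or mismatched answer.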
Security and compliance frameworks are explored, including role-based access control, data lineage tracking, and the audit mechanisms required for regulated industries.
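A minimal sketch of such a governance layer appears below. The Document labels, role names, and JSON audit format are hypothetical; a regulated deployment would back them with an IAM system and tamper-evident log storage.

import json
import time
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass
class Document:
    text: str
    allowed_roles: FrozenSet[str]  # e.g. frozenset({"analyst", "auditor"})

class GovernanceLayer:
    def __init__(self) -> None:
        self.audit_log: List[str] = []

    def filter_for_role(self, docs: List[Document], role: str) -> List[Document]:
        # RBAC applied at retrieval time: the agent never sees documents
        # the calling user's role is not entitled to.
        return [d for d in docs if role in d.allowed_roles]

    def record(self, user: str, role: str, query: str, doc_count: int) -> None:
        # Append-only audit entry supporting lineage review in regulated settings.
        self.audit_log.append(json.dumps({
            "timestamp": time.time(),
            "user": user,
            "role": role,
            "query": query,
            "documents_returned": doc_count,
        }))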
Performance benchmarking across financial services, healthcare, and manufacturing deployments reveals 45-65% accuracy improvements over baseline LLMs and a 30-55% reduction in operational overhead through intelligent automation. We address scalability bottlenecks, cost optimization strategies, and integration patterns for legacy enterprise systems.
The paper concludes with architectural blueprints, best practices for iterative deployment, and a maturity model guiding organizations from proof of concept to full-scale production systems capable of handling millions of daily interactions while maintaining governance, explainability, and continuous improvement capabilities.