Agentic AI Meets Data Engineering: Toward Self-Directed, Interpretable, and Balanced Pipelines
DOI: https://doi.org/10.70153/IJCMI/2025.17202
Keywords: Agentic AI, Neuro-Symbolic Learning, Generative Data Engineering, AutoML, Data Pipeline Automation, Cognitive Systems, Zero-Shot Adaptation, Explainable AI
Abstract
The Agentic AI framework represents a significant advancement in data engineering by unifying automation, interpretability, and adaptability within a single intelligent system. Unlike traditional approaches, which struggle to adjust autonomously to dynamic data environments or to provide transparent reasoning, Agentic AI integrates four synergistic agents: an AutoML agent for model selection and tuning, a neuro-symbolic agent for interpretable inference, a generative agent that leverages GANs for rare-event synthesis, and an agentic planner that dynamically orchestrates decisions using reinforcement learning. Experimental evaluation on diverse datasets, including credit card fraud detection, breast cancer diagnosis, and industrial sensor failure, demonstrated the framework’s superior performance, achieving an F1-score of 0.91, 94% rule fidelity, and a reduced adaptation time of 85 seconds. These results substantially surpass baseline AutoML, standalone neuro-symbolic systems, and GAN-based models. The generative component improved minority-class representation, while the neuro-symbolic engine provided rule-based explanations closely aligned with the model’s predictions. The agentic planner enabled real-time detection of model drift and automatic retraining, ensuring continuous optimization. Collectively, this framework offers a powerful, self-improving pipeline capable of transforming static data processes into adaptive, goal-directed, and explainable workflows suitable for real-world, high-stakes applications.
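To make the four-agent orchestration concrete, the sketch below illustrates how a planner could route between retraining, rare-event augmentation, and rule extraction based on drift and class-imbalance signals. It is a minimal, hypothetical illustration of the control loop described in the abstract, not the authors' implementation: all class names, thresholds, and the greedy decision rule (standing in for the reinforcement-learning planner) are assumptions for exposition.

```python
# Minimal sketch of the four-agent control loop described in the abstract.
# All class names, thresholds, and update rules are hypothetical stand-ins.
import random
from dataclasses import dataclass


@dataclass
class PipelineState:
    f1: float = 0.0              # current model quality on a validation window
    minority_ratio: float = 0.0  # share of rare-event samples in training data
    drift_score: float = 0.0     # distance between recent and reference data


class AutoMLAgent:
    """Stand-in for model selection and hyperparameter tuning."""
    def retrain(self, state: PipelineState) -> PipelineState:
        state.f1 = min(0.99, state.f1 + random.uniform(0.05, 0.15))
        return state


class GenerativeAgent:
    """Stand-in for GAN-based rare-event synthesis."""
    def augment(self, state: PipelineState) -> PipelineState:
        state.minority_ratio = min(0.5, state.minority_ratio + 0.1)
        return state


class NeuroSymbolicAgent:
    """Stand-in for rule extraction; returns a toy fidelity estimate."""
    def explain(self, state: PipelineState) -> float:
        return 0.9 + 0.05 * state.f1


class AgenticPlanner:
    """Greedy placeholder for the RL planner: pick the action most likely
    to improve the pipeline given the current state."""
    def choose_action(self, state: PipelineState) -> str:
        if state.drift_score > 0.3:
            return "retrain"   # drift detected -> trigger AutoML retraining
        if state.minority_ratio < 0.2:
            return "augment"   # class imbalance -> call the generative agent
        return "explain"       # otherwise refresh rule-based explanations


def run_pipeline(steps: int = 5) -> None:
    state = PipelineState(f1=0.7, minority_ratio=0.05, drift_score=0.4)
    automl, gen, neuro, planner = (
        AutoMLAgent(), GenerativeAgent(), NeuroSymbolicAgent(), AgenticPlanner()
    )
    for t in range(steps):
        action = planner.choose_action(state)
        if action == "retrain":
            state = automl.retrain(state)
            state.drift_score = 0.0   # retraining resets the drift signal
        elif action == "augment":
            state = gen.augment(state)
        else:
            print(f"step {t}: rule fidelity ~ {neuro.explain(state):.2f}")
        print(f"step {t}: action={action}, f1={state.f1:.2f}, "
              f"minority={state.minority_ratio:.2f}, drift={state.drift_score:.2f}")


if __name__ == "__main__":
    run_pipeline()
```

In a real deployment the greedy rules would be replaced by a learned policy, and the agents would wrap actual AutoML, GAN, and rule-extraction components; the sketch only shows how drift detection can trigger automatic retraining within a single closed loop.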