This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Imitation Learning to Accelerate Training Process of Multi-Agent Reinforcement Learning in 2v2 Pong Game
Corresponding Author(s): Marvin Yonathan Hadiyanto
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 11, No. 2, May 2026
Abstract
Training multi-agent reinforcement learning (MARL) systems often takes a long time because of sample inefficiency, particularly when agents must explore complex environments extensively and coordinate with one another. This study proposes using imitation learning to accelerate MARL training in a 2v2 Pong game: demonstrations from a 1v1 Pong game are used to shape the agents' initial policy, avoiding an inefficient exploration phase. We employ a deep Q-network (DQN) framework with centralized training and decentralized execution (CTDE) to compare the performance of pretrained and untrained agents in the 2v2 Pong environment. Experimental results show that learning from 1v1 demonstrations improves the reward accumulation and game scores of pretrained agents in the 2v2 Pong game. The improvement peaks at 700 demonstration learning steps and diminishes with more learning steps, owing to excessive memorization of the demonstration gameplay. Furthermore, comparative experiments show that imitation learning with 700 learning steps improves learning efficiency by approximately 300% relative to the zonation method and 571% relative to standard reinforcement learning pretraining. These results indicate that imitation learning from demonstrations can effectively shorten the lengthy MARL training process, offering a viable solution particularly when data collection, computational resources, and training time are severely constrained.
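The abstract describes pretraining DQN agents on 1v1 demonstrations before MARL training in 2v2, but it does not give the supervised loss or network details. The following is only a minimal sketch of the general idea under several labeled assumptions: a Q-table stands in for the deep Q-network, the state is a hypothetical discretized ball-paddle offset, the demonstrator is a hand-written rule, and the pretraining loss is the large-margin term from Deep Q-learning from Demonstrations (Hester et al.), which may differ from the authors' choice. All names (`margin_pretrain`, `expert_action`, `N_BINS`, and so on) are hypothetical, and the centralized-training part of CTDE is not modeled.

```python
import copy
import random

# Hypothetical toy setup (not from the paper): a paddle observes the vertical
# offset dy = ball_y - paddle_y in [-1, 1] (y grows upward) and picks one of
# three actions. A Q-table stands in for the paper's deep Q-network.
ACTIONS = (0, 1, 2)  # 0 = move up, 1 = stay, 2 = move down
N_BINS = 9


def discretize(dy):
    """Map an offset dy in [-1, 1] to a state index in [0, N_BINS)."""
    dy = max(-1.0, min(1.0, dy))
    return min(N_BINS - 1, int((dy + 1.0) / 2.0 * N_BINS))


def expert_action(state):
    """Hypothetical 1v1 demonstrator: move the paddle toward the ball."""
    middle = N_BINS // 2
    if state > middle:
        return 0  # ball is above the paddle
    if state < middle:
        return 2  # ball is below the paddle
    return 1


def greedy(q, state):
    """Greedy action under a Q-table (ties go to the lowest action index)."""
    return max(ACTIONS, key=lambda a: q[state][a])


def margin_pretrain(q, demos, steps, lr=0.1, margin=0.8, seed=0):
    """Supervised pretraining with a DQfD-style large-margin loss.

    J_E = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where l = margin for
    a != a_E and 0 otherwise. Each step takes one subgradient step on a
    randomly drawn demonstration pair (s, a_E).
    """
    rng = random.Random(seed)
    for _ in range(steps):
        s, a_e = rng.choice(demos)
        scores = [q[s][a] + (0.0 if a == a_e else margin) for a in ACTIONS]
        a_star = max(ACTIONS, key=lambda a: scores[a])
        if a_star != a_e:  # margin violated, so the loss is positive
            q[s][a_star] -= lr
            q[s][a_e] += lr
    return q


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard one-step Q-learning update, used here for later RL fine-tuning."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


# 1) Demonstrations collected in the (hypothetical) 1v1 game.
dys = [-1.0 + 2.0 * i / 99 for i in range(100)]
demos = [(discretize(dy), expert_action(discretize(dy))) for dy in dys]

# 2) Pretrain one Q-table for a fixed number of demonstration learning steps.
q_pre = [[0.0 for _ in ACTIONS] for _ in range(N_BINS)]
margin_pretrain(q_pre, demos, steps=700)

# 3) Initialize both 2v2 teammates from the pretrained 1v1 policy; each one
#    then acts on its own observation and keeps learning with RL updates.
team = [copy.deepcopy(q_pre) for _ in range(2)]
```

Here `steps=700` only mirrors the step count quoted in the abstract; this toy problem does not reproduce the reported peak or the memorization effect. The design choice worth noting is that pretraining only reshapes the initial Q-values, so the agents still fine-tune with ordinary temporal-difference updates afterwards.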