
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Imitation Learning to Accelerate Training Process of Multi-Agent Reinforcement Learning in 2v2 Pong Game
Corresponding Author(s): Marvin Yonathan Hadiyanto
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 11, No. 2, May 2026 (Article in Progress)
Abstract
Training multi-agent reinforcement learning (MARL) systems often requires a significant amount of time due to sample inefficiency, particularly when agents must perform extensive exploration in a complex environment while coordinating with multiple other entities. This study proposes using imitation learning to accelerate MARL training in a 2v2 Pong game by learning from demonstrations in a 1v1 Pong game, shaping the initial policy without an inefficient exploration procedure. We use a deep Q-network (DQN) under centralized training with decentralized execution (CTDE) to compare the performance of pretrained and untrained agents in the 2v2 Pong game. Experimental results show that learning from demonstrations in the 1v1 setting significantly improved the reward accumulation and game scores of the pretrained agents in the 2v2 Pong game. The improvement peaks at 700 learning steps of demonstration and diminishes at larger step counts due to excessive memorization of the demonstration gameplay. This work demonstrates that imitation learning from demonstrations can shorten a prolonged MARL training process, offering a viable solution especially when data collection, computational resources, and training time are severely constrained.
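This excerpt does not include the authors' implementation; as a rough sketch of the warm-start idea the abstract describes (pretraining a DQN policy on 1v1 demonstrations before 2v2 training under CTDE), the following Python example uses PyTorch with a behavioral-cloning-style loss. The network architecture, observation and action dimensions, demonstration format, and the `pretrain_from_demonstrations` helper are illustrative assumptions, not the paper's method.

```python
# Minimal sketch (not the authors' code): pretrain a DQN policy on 1v1 Pong
# demonstrations before 2v2 MARL training. Dimensions, network size, and the
# demonstration format are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Small MLP mapping an agent's observation to Q-values per action."""
    def __init__(self, obs_dim: int = 8, n_actions: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def pretrain_from_demonstrations(q_net, demo_obs, demo_actions,
                                 learning_steps: int = 700, lr: float = 1e-3):
    """Imitation-learning warm start: fit the Q-network so its greedy action
    matches the demonstrated 1v1 actions (a behavioral-cloning-style loss)."""
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    for _ in range(learning_steps):
        logits = q_net(demo_obs)                      # treat Q-values as action scores
        loss = F.cross_entropy(logits, demo_actions)  # imitate demonstrated actions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return q_net

if __name__ == "__main__":
    # Placeholder demonstration batch; in the study this would come from
    # recorded 1v1 Pong gameplay.
    demo_obs = torch.randn(256, 8)
    demo_actions = torch.randint(0, 3, (256,))

    pretrained = pretrain_from_demonstrations(QNetwork(), demo_obs, demo_actions)

    # CTDE-style start for the 2v2 game: each teammate executes its own copy of
    # the pretrained network, while subsequent training can use centralized updates.
    team = [QNetwork(), QNetwork()]
    for agent in team:
        agent.load_state_dict(pretrained.state_dict())
```

After this warm start, standard DQN training in the 2v2 environment would resume from the imitated policy rather than from random exploration, which is the acceleration effect the abstract reports.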