
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Imitation Learning to Accelerate Training Process of Multi-Agent Reinforcement Learning in 2v2 Pong Game
Corresponding Author(s): Marvin Yonathan Hadiyanto
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 11, No. 2, May 2026 (Article in Progress)
Abstract
Training multi-agent reinforcement learning (MARL) systems often requires a significant amount of time due to sample inefficiency, particularly when agents must perform extensive exploration in a complex environment while coordinating with multiple other entities. This study proposes using imitation learning to accelerate MARL training in a 2v2 Pong game by learning from demonstrations in a 1v1 Pong game, shaping the initial policy without an inefficient exploration procedure. We use a deep Q-network (DQN) under centralized training with decentralized execution (CTDE) to compare the performance of pretrained and untrained agents in the 2v2 Pong game. Experimental results show that learning from demonstrations in the 1v1 setting significantly improved the reward accumulation and game scores of the pretrained agents in the 2v2 Pong game. The improvement peaks at 700 learning steps of demonstration and diminishes at larger step counts due to excessive memorization of the demonstration gameplay. This work demonstrates that imitation learning from demonstrations can shorten a prolonged MARL training process, offering a viable solution especially when data collection, computational resources, and training time are severely constrained.
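This excerpt does not include the authors' implementation; as a rough sketch of the warm-start idea the abstract describes (pretraining a DQN policy on 1v1 demonstrations before 2v2 training under CTDE), the following Python example uses PyTorch with a behavioral-cloning-style loss. The network architecture, observation and action dimensions, demonstration format, and the `pretrain_from_demonstrations` helper are illustrative assumptions, not the paper's method.

```python
# Minimal sketch (not the authors' code): pretrain a DQN policy on 1v1 Pong
# demonstrations before 2v2 MARL training. Dimensions, network size, and the
# demonstration format are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Small MLP mapping an agent's observation to Q-values per action."""
    def __init__(self, obs_dim: int = 8, n_actions: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def pretrain_from_demonstrations(q_net, demo_obs, demo_actions,
                                 learning_steps: int = 700, lr: float = 1e-3):
    """Imitation-learning warm start: fit the Q-network so its greedy action
    matches the demonstrated 1v1 actions (a behavioral-cloning-style loss)."""
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    for _ in range(learning_steps):
        logits = q_net(demo_obs)                      # treat Q-values as action scores
        loss = F.cross_entropy(logits, demo_actions)  # imitate demonstrated actions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return q_net

if __name__ == "__main__":
    # Placeholder demonstration batch; in the study this would come from
    # recorded 1v1 Pong gameplay.
    demo_obs = torch.randn(256, 8)
    demo_actions = torch.randint(0, 3, (256,))

    pretrained = pretrain_from_demonstrations(QNetwork(), demo_obs, demo_actions)

    # CTDE-style start for the 2v2 game: each teammate executes its own copy of
    # the pretrained network, while subsequent training can use centralized updates.
    team = [QNetwork(), QNetwork()]
    for agent in team:
        agent.load_state_dict(pretrained.state_dict())
```

After this warm start, standard DQN training in the 2v2 environment would resume from the imitated policy rather than from random exploration, which is the acceleration effect the abstract reports.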