
Issue

Vol. 11, No. 2, May 2026 (Article in Progress)

Issue Published : Apr 26, 2026

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Imitation Learning to Accelerate Training Process of Multi-Agent Reinforcement Learning in 2v2 Pong Game

https://doi.org/10.22219/kinetik.v11i2.2564
Marvin Yonathan Hadiyanto
Universitas Kristen Krida Wacana
Budi Harsono
Universitas Kristen Krida Wacana
Indra Karnadi
Universitas Kristen Krida Wacana
Ivan Tanra
Universitas Kristen Krida Wacana

Corresponding Author(s) : Marvin Yonathan Hadiyanto

marvin.yonathan@ukrida.ac.id

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, Vol. 11, No. 2, May 2026 (Article in Progress)
Article Published : May 3, 2026


Abstract

Training multi-agent reinforcement learning (MARL) systems often requires significant time due to sample inefficiency, particularly when agents must explore a complex environment extensively while coordinating among multiple entities. This study proposes using imitation learning to accelerate MARL training in a 2v2 Pong game by learning from demonstrations in a 1v1 Pong game, shaping the initial policy without an inefficient exploration procedure. We use a deep Q-network (DQN) with centralized training and decentralized execution (CTDE) to compare the performance of pretrained and untrained agents in the 2v2 Pong game. Experimental results show that learning from demonstrations in the 1v1 setting significantly improves the reward accumulation and game scores of the pretrained agents in the 2v2 game. The improvement peaks at 700 learning steps of demonstration and diminishes at larger step counts due to excessive memorization of the demonstrated gameplay. This work demonstrates that imitation learning from demonstrations can shorten a prolonged MARL training process, offering a viable solution especially when data collection, computational resources, and training time are severely constrained.
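The pretraining step described above, learning an initial policy from demonstrations before reinforcement learning begins, can be sketched as behavior cloning of a small Q-network on demonstration (state, action) pairs. The sketch below is illustrative only: the state layout, action set, linear network, and hand-coded "expert" are assumptions for the example, not the paper's actual 1v1 Pong setup.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 6   # assumed layout: ball x/y, ball vx/vy, own paddle y, opponent paddle y
N_ACTIONS = 3   # stay, up, down

# Hypothetical expert for generating demonstrations: track the ball's y position.
def expert_action(state):
    ball_y, paddle_y = state[1], state[4]
    if paddle_y < ball_y - 0.05:
        return 1  # move up
    if paddle_y > ball_y + 0.05:
        return 2  # move down
    return 0      # stay

# Synthetic demonstration dataset of (state, expert action) pairs.
demo_states = rng.uniform(-1, 1, size=(512, STATE_DIM))
demo_actions = np.array([expert_action(s) for s in demo_states])

# Tiny linear "Q-network": Q(s) = s @ W + b. A real agent would use a deeper net.
W = rng.normal(0, 0.1, size=(STATE_DIM, N_ACTIONS))
b = np.zeros(N_ACTIONS)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Behavior cloning: full-batch cross-entropy descent pushing argmax Q(s)
# toward the expert's action. 700 steps mirrors the abstract's best setting.
lr = 0.5
for step in range(700):
    probs = softmax(demo_states @ W + b)
    grad = probs.copy()
    grad[np.arange(len(demo_actions)), demo_actions] -= 1.0
    grad /= len(demo_actions)
    W -= lr * (demo_states.T @ grad)
    b -= lr * grad.sum(axis=0)

# The cloned network would now serve as the initial policy for DQN training.
pretrained_policy = lambda s: int(np.argmax(s @ W + b))
acc = np.mean([pretrained_policy(s) == expert_action(s) for s in demo_states])
print(f"agreement with expert after pretraining: {acc:.2f}")
```

In the full method this pretrained network would replace the randomly initialized Q-network before 2v2 DQN/CTDE training, so early exploration starts from demonstration-shaped behavior rather than from scratch.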

Keywords

Imitation learning, MARL, Reinforcement learning, Pong game, Learning from demonstration
Hadiyanto, M. Y., Harsono, B., Karnadi, I., & Tanra, I. (2026). Imitation Learning to Accelerate Training Process of Multi-Agent Reinforcement Learning in 2v2 Pong Game. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 11(2). https://doi.org/10.22219/kinetik.v11i2.2564





KINETIK: Game Technology, Information System, Computer Network, Computing, Electronics, and Control
eISSN : 2503-2267
pISSN : 2503-2259

