Optimizing Autonomous Navigation: Advances in LiDAR-based Object Recognition with Modified Voxel-RCNN
Corresponding Author(s): Firman
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, Vol. 10, No. 2, May 2025
Abstract
This study aimed to enhance the object recognition capabilities of autonomous vehicles in constrained and dynamic environments. By integrating Light Detection and Ranging (LiDAR) data with a modified Voxel-RCNN framework, the system detected and classified six object classes: human, wall, car, cyclist, tree, and cart. This integration improved the safety and reliability of autonomous navigation. The methodology comprised preparing a point cloud dataset, converting it into the KITTI format for compatibility with the Voxel-RCNN pipeline, and comprehensive model training. The framework was evaluated using precision, recall, F1-score, and mean average precision (mAP). Modifications to the Voxel-RCNN framework were introduced to improve classification accuracy and address challenges encountered in complex navigation scenarios. Experimental results demonstrated the robustness of the proposed modifications: Modification 2 consistently outperformed the baseline, raising the 3D detection score for the car class in hard scenarios from 4.39 to 10.31, while Modification 3 achieved the lowest training loss of 1.68 after 600 epochs, indicating significant improvements in model optimization. However, variability in the real-world performance of Modification 3 highlighted the need to balance training-time optimization against practical applicability. Overall, training loss decreased by up to 29.1%, and the system achieved substantial improvements in detection accuracy under challenging conditions. These findings underscore the potential of the proposed system to advance the safety and intelligence of autonomous vehicles, providing a solid foundation for future research in autonomous navigation and object recognition.
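
For readers unfamiliar with the evaluation metrics named above, the short Python sketch below shows how precision, recall, and F1-score follow from per-class true-positive (TP), false-positive (FP), and false-negative (FN) counts. It is an illustrative sketch only: the counts are hypothetical placeholders, not code or results from the study.

```python
# Illustrative sketch: precision, recall, and F1 from per-class detection counts.
# The TP/FP/FN values below are hypothetical, not results from the paper.

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Derive precision, recall, and F1-score from raw detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of predictions that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of ground-truth objects found
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical counts for a single class (e.g., "car"):
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=20)
print(f"precision={p:.3f}  recall={r:.3f}  f1={f1:.3f}")
# precision=0.900  recall=0.818  f1=0.857
```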