This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Moving Objects Semantic Segmentation using SegNet with VGG Encoder for Autonomous Driving
Corresponding Author(s): Wahyudi Setiawan
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 6, No. 2, May 2021
Abstract
Segmentation and recognition are common steps in object identification. This research discusses pixel-wise semantic segmentation of moving objects. The data come from the CamVid video dataset, a collection of autonomous driving images consisting of 701 frames accompanied by labels. Eleven object classes in the images are segmented and recognized: sky, building, pole, road, pavement, tree, sign-symbol, fence, car, pedestrian, and bicyclist. The moving-object segmentation is carried out with SegNet, a Convolutional Neural Network (CNN) architecture. CNN-based image segmentation generally consists of two parts, an encoder and a decoder. Pre-trained VGG16 and VGG19 networks are used as encoders, while the decoders upsample the encoder feature maps back to the input resolution. The network is optimized with stochastic gradient descent with momentum (SGDM). In testing, the best-recognized class was road, with an accuracy of 0.96013, IoU of 0.93745, and F1-Score of 0.8535 using the VGG19 encoder, compared with an accuracy of 0.94162, IoU of 0.92309, and F1-Score of 0.8535 using the VGG16 encoder.
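To make the described pipeline concrete, the following is a minimal PyTorch sketch of a SegNet-style encoder-decoder with a pre-trained VGG16 encoder, an 11-class CamVid output layer, and SGDM optimization. The framework, the `SegNetVGG16` class, the decoder channel widths, and the hyperparameters (learning rate, momentum, batch and image size) are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of SegNet with a VGG16 encoder for the 11 CamVid classes.
# The framework (PyTorch/torchvision) and all hyperparameters are assumptions
# made for illustration; they are not the paper's exact configuration.
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

NUM_CLASSES = 11  # sky, building, pole, road, pavement, tree,
                  # sign-symbol, fence, car, pedestrian, bicyclist


class SegNetVGG16(nn.Module):
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        # Encoder: the VGG16 convolutional backbone, split into its 5 stages.
        backbone = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features
        self.enc_stages, stage = nn.ModuleList(), []
        for layer in backbone:
            if isinstance(layer, nn.MaxPool2d):
                self.enc_stages.append(nn.Sequential(*stage))
                stage = []
            else:
                stage.append(layer)
        # SegNet pools with return_indices=True so the decoder can unpool
        # with the same indices instead of learning deconvolutions.
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        # Decoder: symmetric conv blocks that shrink the channel count back
        # from 512 to 64, followed by a per-pixel classifier.
        self.dec_stages, in_ch = nn.ModuleList(), 512
        for out_ch in (512, 256, 128, 64, 64):
            self.dec_stages.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True)))
            in_ch = out_ch
        self.classifier = nn.Conv2d(in_ch, num_classes, kernel_size=3, padding=1)

    def forward(self, x):
        indices, sizes = [], []
        for stage in self.enc_stages:          # encoder pass
            x = stage(x)
            sizes.append(x.size())
            x, idx = self.pool(x)
            indices.append(idx)
        for stage in self.dec_stages:          # decoder pass
            x = self.unpool(x, indices.pop(), output_size=sizes.pop())
            x = stage(x)
        return self.classifier(x)              # (N, 11, H, W) class scores


model = SegNetVGG16()
# Network optimization with SGDM, as stated in the abstract; the learning
# rate and momentum below are placeholders.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy CamVid-sized batch (360x480).
images = torch.randn(2, 3, 360, 480)
labels = torch.randint(0, NUM_CLASSES, (2, 360, 480))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

Following SegNet's design, upsampling reuses the encoder's max-pooling indices (`nn.MaxUnpool2d`) rather than learned deconvolution; swapping `vgg16` for `vgg19` in the encoder-building step would give the VGG19 variant reported in the abstract.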