Classification of Lexile Level Reading Load Using the K-Means Clustering and Random Forest Method

Harits Ar Rosyid; Utomo Pujianto; Moch Rajendra Yudhistira

doi:10.22219/kinetik.v5i2.897

Issue

Vol. 5, No. 2, May 2020

Issue Published : May 12, 2020


Harits Ar Rosyid

Universitas Negeri Malang

Utomo Pujianto

Universitas Negeri Malang

Moch Rajendra Yudhistira

Universitas Negeri Malang

Corresponding Author(s) : Moch Rajendra Yudhistira

mrajendray@gmail.com

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, Vol. 5, No. 2, May 2020
Article Published : May 6, 2020

Abstract

Reading is one of several ways to improve the quality of a person's education: it broadens insight and knowledge across many subjects. Readers differ, however, in reading ability and comprehension, and material that exceeds a reader's comprehension ability becomes a problem for that reader. It is therefore necessary to determine the load of reading material, which this study does using Lexile Levels. A Lexile Level is a value that measures both the complexity of reading material and a person's reading ability. Reading material is grouped by its Lexile Level value into 2 clusters, easy and difficult, using the k-means method. After clustering, the reading load of the material is classified with the Random Forest method. K-means was chosen because its computation is simple and fast. Random Forest builds several decision trees and combines their predictions rather than relying on a single tree. The experiments indicate that the scenario using 2 clusters together with SMOTE and GIFS preprocessing gives good results, with an accuracy of 76.03%, precision of 81.85%, and recall of 76.05%.
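The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes each text is represented by a single numeric Lexile score and that k-means runs on that one value; the abstract does not state the exact features used, so this is an illustration rather than the authors' implementation. The function name `kmeans_1d_two_clusters` and the sample scores are hypothetical.

```python
from statistics import mean


def kmeans_1d_two_clusters(scores, max_iter=100):
    """Split 1-D scores into two clusters with Lloyd's k-means (k = 2).

    Returns (labels, centroids). Label 0 marks the lower-centroid
    cluster ("easy") and label 1 the higher one ("difficult").
    """
    if len(set(scores)) < 2:
        raise ValueError("need at least two distinct scores")
    # Deterministic initialisation (an assumption): start at the extremes.
    low, high = float(min(scores)), float(max(scores))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each score joins its nearest centroid.
        labels = [0 if abs(s - low) <= abs(s - high) else 1 for s in scores]
        # Update step: each centroid moves to the mean of its members.
        new_low = mean(s for s, c in zip(scores, labels) if c == 0)
        new_high = mean(s for s, c in zip(scores, labels) if c == 1)
        if new_low == low and new_high == high:
            break  # centroids stopped moving, so the clustering converged
        low, high = new_low, new_high
    return labels, (low, high)


# Hypothetical Lexile scores, one per reading text.
LEXILE_SCORES = [420, 480, 510, 560, 980, 1040, 1100, 1210]
LOAD_NAMES = ["easy", "difficult"]

labels, centroids = kmeans_1d_two_clusters(LEXILE_SCORES)
reading_load = [LOAD_NAMES[c] for c in labels]
print(reading_load, centroids)
```

In a pipeline like the one the abstract outlines, the resulting easy/difficult labels would serve as the target classes for the Random Forest classifier trained on features extracted from the texts.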

Keywords

Text Classification; Lexile Levels; Clustering; K-Means; Random Forest

Rosyid, H. A., Pujianto, U., & Yudhistira, M. R. (2020). Classification of Lexile Level Reading Load Using the K-Means Clustering and Random Forest Method. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 5(2), 139-146. https://doi.org/10.22219/kinetik.v5i2.897



