Paragraph Selection Methods Using Feature-Based on Segment-Based Clustering Process Using Paragraphs for Identifying Topics on Indication Detection of Plagiarism System

Denar Regata Akbi, Arini Rahmawati Rosyadi

Abstract

In segment-based clustering, the selection of paragraphs for the dataset plays a very important role, because the paragraphs used as the dataset can affect the clustering result. This research applies a feature-based paragraph selection method, which aims to optimize the clustering process conducted in the previous research. Evaluation with the Silhouette Coefficient and Sum Square Errors (SSE) methods, used to find the proper k value, shows that the feature-based method yields better results than those obtained in the previous research.
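As an illustration of how a k value can be chosen with these two measures, the following is a minimal sketch and not the authors' implementation. It assumes k-means as the clustering algorithm, uses hypothetical 2-D points in place of real paragraph feature vectors, and uses a deterministic farthest-first initialization purely to keep the example reproducible. All function names are illustrative.

```python
import math

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    """Farthest-first initialization (an assumption made for determinism)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sqdist(p, centers[c])) for p in points]

def kmeans(points, k, max_iter=100):
    centers = init_centers(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old center if a cluster happens to be empty.
            updated.append(tuple(sum(v) / len(members) for v in zip(*members))
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return assign(points, centers), centers

def sse(points, labels, centers):
    """Sum of squared distances from each point to its cluster center."""
    return sum(sqdist(p, centers[l]) for p, l in zip(points, labels))

def silhouette(points, labels):
    """Mean silhouette s = (b - a) / max(a, b); singletons score 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        own = [math.sqrt(sqdist(p, q)) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.sqrt(sqdist(p, q)) for q, l in zip(points, labels) if l == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, k_values):
    """Return the k with the highest mean silhouette, plus (silhouette, SSE) per k."""
    report = {}
    for k in k_values:
        labels, centers = kmeans(points, k)
        report[k] = (silhouette(points, labels), sse(points, labels, centers))
    return max(report, key=lambda k: report[k][0]), report

# Hypothetical data: three well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]

best_k, report = choose_k(points, range(2, 6))
for k, (sil, err) in report.items():
    print(f"k={k}  silhouette={sil:.3f}  SSE={err:.3f}")
print("k with the highest silhouette:", best_k)
```

The two measures are complementary: SSE keeps decreasing as k grows, so it is usually read by looking for the point where the decrease flattens, whereas the Silhouette Coefficient peaks at the k that gives the most compact and best-separated clusters.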

Keywords

Feature-Based, Paragraphs Selection, Segment-Based, Silhouette Coefficient, Sum Square Errors

Creative Commons License Kinetik : Game Technology, Information System, Computer Network, Computing, Electronics, and Control by http://kinetik.umm.ac.id is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.