POS Tagger for Indonesian-Language Tweets

Yuda Munarko, Yufis Azhar, Maulina Balqis, Susi Ekawati

Abstract

This study investigates a POS tagger based on the Cyclic Dependency Network approach for tweets written in Bahasa Indonesia. Three tweet collections were used: tweets in a formal register, tweets in an informal register, and a combination of the two. The formal collection was drawn from news accounts, while the informal collection was drawn from general accounts. The tagset contains 41 tags, of which 35 are standard Bahasa Indonesia tags and 6 are additional tags for Twitter. Tagging accuracy reached 95.42% on the formal collection, and 92.42% and 90.69% on the informal and combined collections, respectively. We also found that frequently occurring tags tend to have high accuracy, whereas tags that occur less often lower the overall average accuracy.
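The last observation can be illustrated with a minimal sketch. It assumes accuracy is counted per gold tag and compares token-level accuracy with the unweighted mean of per-tag accuracies. The function name `per_tag_accuracy`, the tag labels, and the toy data are all hypothetical and are not taken from the paper's tagset or results.

```python
from collections import Counter


def per_tag_accuracy(gold, pred):
    """Return {tag: (correct, total)}, counted over the gold tags."""
    total = Counter(gold)
    correct = Counter(g for g, p in zip(gold, pred) if g == p)
    return {tag: (correct[tag], total[tag]) for tag in total}


# Toy data invented for illustration only (not from the paper).
# "HASHTAG" stands in for a rare Twitter-specific tag.
gold = ["NN"] * 8 + ["VB"] * 6 + ["HASHTAG"] * 2
pred = ["NN"] * 8 + ["VB"] * 5 + ["NN"] + ["HASHTAG"] + ["NN"]

stats = per_tag_accuracy(gold, pred)

# Token-level accuracy: every token counts equally, so frequent tags dominate.
token_accuracy = sum(c for c, _ in stats.values()) / len(gold)

# Mean of per-tag accuracies: every tag counts equally, so rare tags weigh more.
mean_tag_accuracy = sum(c / t for c, t in stats.values()) / len(stats)

for tag, (c, t) in stats.items():
    print(f"{tag}: {c}/{t}")
print(f"token-level accuracy: {token_accuracy:.3f}")
print(f"mean per-tag accuracy: {mean_tag_accuracy:.3f}")
```

In this toy setup the rare tag is tagged correctly only half the time, so the per-tag mean falls below the token-level figure even though the frequent tags are tagged well.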

Keywords

Cyclic dependency network, POS Tagger

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control by http://kinetik.umm.ac.id is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.