Difficulty-Level Classification for English Writings

Authors

  • Hiromi Ban, Nagaoka University of Technology
  • Rei Oguri, Graduate School of Natural Science and Technology, Kanazawa University, Ishikawa, Japan
  • Haruhiko Kimura, Graduate School of Natural Science and Technology, Kanazawa University, Ishikawa, Japan

DOI:

https://doi.org/10.14738/tmlai.33.1245

Keywords:

Accuracy, Difficulty-level, F-measure, Machine learning

Abstract

The popularity of e-books has grown in recent years. As the number of e-books continues to increase, categorizing every book manually requires a significant amount of time. If English sentences can be categorized according to their level of difficulty, it becomes possible to recommend foreign-language books that match the reader’s level of competency in English. This study extracted eleven types of attributes from English text data, with the aim of classifying English text by difficulty level through machine learning. Learning and classification were evaluated using leave-one-out cross-validation. To improve accuracy further, experiments were carried out in which the size of the text data was varied and attribute selection was applied. As a result, accuracy improved to 77.04% and F-measure to 63.96%.
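
The evaluation procedure named in the abstract, leave-one-out cross-validation scored by accuracy and F-measure, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two features, the toy values, the nearest-centroid classifier, and the macro-averaged F-measure are all hypothetical choices made here for brevity, whereas the study itself used eleven attributes and does not specify these details in the abstract.

```python
from collections import Counter
from math import dist

# Hypothetical toy data: (average word length, average sentence length).
# The study used eleven attributes; two are assumed here for illustration.
features = [
    (3.8, 8.0), (4.0, 9.0), (3.9, 8.5),        # "easy" texts
    (4.5, 14.0), (4.6, 15.0), (4.4, 13.5),     # "medium" texts
    (5.2, 22.0), (5.4, 24.0), (5.1, 21.0),     # "hard" texts
]
levels = ["easy"] * 3 + ["medium"] * 3 + ["hard"] * 3


def nearest_centroid_predict(train_X, train_y, x):
    """Assumed stand-in classifier: pick the class with the closest centroid."""
    counts = Counter(train_y)
    sums = {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    centroids = {
        label: [s / counts[label] for s in acc] for label, acc in sums.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], x))


def leave_one_out(X, y, predict):
    """Hold out each sample once, train on the rest, and predict it."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(predict(train_X, train_y, X[i]))
    return predictions


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


preds = leave_one_out(features, levels, nearest_centroid_predict)
print(f"accuracy:  {accuracy(levels, preds):.2%}")
print(f"F-measure: {macro_f_measure(levels, preds):.2%}")
```

Because each text is held out exactly once, every sample serves as test data while the classifier is always trained on all remaining samples, which suits small corpora. The toy classes above are deliberately well separated, so the scores printed here say nothing about the figures reported in the abstract.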

Published

2015-07-03

How to Cite

Ban, H., Oguri, R., & Kimura, H. (2015). Difficulty-Level Classification for English Writings. Transactions on Engineering and Computing Sciences, 3(3), 24. https://doi.org/10.14738/tmlai.33.1245