Arabic Text Categorization using k-nearest neighbour, Decision Trees (C4.5) and Rocchio Classifier: A Comparative Study
DOI:
https://doi.org/10.14741/
Keywords:
Text Categorization, k-nearest neighbour, Decision trees, C4.5, Rocchio classifier
Abstract
Text classification is an important research area in information retrieval. Many studies address text classification for English, but comparatively few use Arabic data sets. This study applies three well-known classification algorithms, k-nearest neighbour (k-NN), C4.5 and Rocchio, to an Arabic data set collected in-house. The data set consists of 1,400 documents belonging to 8 categories. The results show that the Rocchio classifier and k-NN achieve better precision and recall than C4.5. The study is a comparative evaluation of the three algorithms and uses a fixed number of documents per category in both the training and testing phases.
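To make the comparison concrete, the following is a minimal Python sketch of two of the three classifiers, Rocchio (as a nearest-centroid classifier) and k-NN, together with per-category precision and recall. It is an illustration under stated assumptions, not the implementation used in the study: the term-frequency weighting, cosine similarity, the value k=3, the function names, and the tiny English toy corpus are all hypothetical, and C4.5 is omitted for brevity.

```python
import math
from collections import Counter, defaultdict


def tf_vector(tokens):
    """L2-normalised term-frequency vector (assumed weighting scheme)."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_rocchio(docs):
    """Build one centroid per category by averaging its document vectors."""
    sums = defaultdict(Counter)
    sizes = Counter()
    for tokens, label in docs:
        for term, weight in tf_vector(tokens).items():
            sums[label][term] += weight
        sizes[label] += 1
    return {
        label: {t: w / sizes[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }


def classify_rocchio(centroids, tokens):
    """Assign the category whose centroid is most similar to the document."""
    v = tf_vector(tokens)
    return max(centroids, key=lambda label: cosine(centroids[label], v))


def classify_knn(docs, tokens, k=3):
    """Majority vote among the k most similar training documents."""
    v = tf_vector(tokens)
    ranked = sorted(docs, key=lambda d: cosine(tf_vector(d[0]), v), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall(gold, predicted, category):
    """Precision and recall for a single category."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == category and p == category for g, p in pairs)
    fp = sum(g != category and p == category for g, p in pairs)
    fn = sum(g == category and p != category for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy corpus with the same number of documents per category.
TRAIN = [
    (["match", "goal", "team"], "sport"),
    (["team", "player", "goal"], "sport"),
    (["bank", "market", "price"], "economy"),
    (["market", "trade", "bank"], "economy"),
]
TEST = [
    (["goal", "team", "win"], "sport"),
    (["bank", "price", "loan"], "economy"),
]

centroids = train_rocchio(TRAIN)
gold = [label for _, label in TEST]
rocchio_predictions = [classify_rocchio(centroids, tokens) for tokens, _ in TEST]
knn_predictions = [classify_knn(TRAIN, tokens, k=3) for tokens, _ in TEST]

for category in sorted(set(gold)):
    print(category, precision_recall(gold, rocchio_predictions, category))
```

In this sketch Rocchio compares a test document against one averaged vector per category, while k-NN compares it against every training document, which is the main cost difference between the two approaches.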
