New Classification Method Based on Decision Tree for Web Spam Detection

Authors

  • Rashmi R. Tundalwar, Dept. of Computer Engineering, Modern College of Engineering, Pune, India
  • Manasi Kulkarni, Dept. of Computer Engineering, Modern College of Engineering, Pune, India

Keywords:

Classification, Classifiers, Data mining, Web spam detection, Decision tree.

Abstract

Web spam is a serious problem for search engine spiders because the quality of results is severely degraded by the presence of such pages. Web spamming refers to manipulating ranking algorithms so that some pages receive a higher ranking than others, in order to divert the user. Nowadays, the vast increase in the amount of spam degrades search engine results. To overcome this, proper classification methods and algorithms are needed. Classification is the most common method for mining rules from a large database. Of the various data mining algorithms available for classification, decision tree mining is the simplest, because its hierarchical structure is easy for the user to understand and supports the decision-making process. We use C5.0, a modified version of the C4.5 decision tree algorithm. Rules are derived by applying a boosting decision tree algorithm such as C5.0 to the datasets, and these rules are used to build the decision tree, which helps improve accuracy. The data from each dataset is preprocessed and stored in matrix form. The resulting system significantly improves the detection of Web spam using the C5.0 algorithm on the public datasets WEBSPAM-UK2006 and WEBSPAM-UK2007. The system can also be used to improve accuracy.
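The idea of boosting decision trees over a feature matrix can be illustrated with a minimal sketch. This is not the authors' implementation and not C5.0 itself: it assumes plain AdaBoost over one-level trees (decision stumps) as a stand-in for C5.0's boosting. The function names, the two feature columns, and the six rows of data are all hypothetical and are not taken from WEBSPAM-UK2006 or WEBSPAM-UK2007.

```python
import math

# Hypothetical feature matrix: one row per host, two made-up numeric features.
X = [[0.9, 120], [0.8, 200], [0.7, 15], [0.2, 10], [0.3, 25], [0.1, 150]]
# Labels: +1 = spam, -1 = non-spam (made-up values for illustration).
y = [1, 1, 1, -1, -1, -1]

def stump_predict(stump, row):
    """Predict `polarity` if the chosen feature exceeds the threshold, else the opposite."""
    _, feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity

def best_stump(X, y, w):
    """Search every (feature, threshold, polarity) and keep the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                candidate = (0.0, feature, threshold, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(candidate, row) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def fit_boosted_stumps(X, y, rounds=5):
    """AdaBoost: reweight the rows each round so later stumps focus on earlier mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = best_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all stumps; a non-negative score is classified as spam (+1)."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

model = fit_boosted_stumps(X, y)
print([predict(model, row) for row in X])
```

On real data the rows would come from the preprocessed dataset matrix, and full decision trees using an information-based split criterion would replace the stumps used here.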

Published

2014-06-30

Section

Articles

How to Cite

New Classification Method Based on Decision Tree for Web Spam Detection. (2014). International Journal of Current Engineering and Technology, 4(3), 1826-1830. https://ijcet.evegenis.org/index.php/ijcet/article/view/941