An Approach towards Record Linkage using Genetic Algorithm along with Hash Algorithm

Authors

  • J. R. Waykole, Computer Engineering Department, Pune University, India
  • S. M. Shinde, Computer Engineering Department, Pune University, India

Keywords:

Cosine similarity, dataset, genetic algorithm, MD5, SHA-1, string distance

Abstract

Several systems that depend on data integrity to offer high-quality services, such as digital libraries and e-commerce brokers, may be affected by duplicate records in their warehouses. Duplicates increase the time required to retrieve high-quality data. Here, deduplication (record linkage) is performed using hash algorithms, namely MD5 and SHA-1, to compute similarity and detect duplicate records, which are then eliminated using an evolutionary technique, the genetic algorithm. This approach removes duplicate samples from the dataset.
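
As a rough illustration of the hash-based detection step only, the following Python sketch fingerprints each record with MD5 or SHA-1 and groups records whose digests collide. The abstract does not specify the record layout, the normalization rules, or how the genetic algorithm is applied, so the function names (`normalize`, `fingerprint`, `deduplicate`), the sample fields, and the lowercase/whitespace normalization are all assumptions made for this example and not the authors' method.

```python
import hashlib


def normalize(record):
    # Assumed normalization (not from the paper): lowercase each field
    # and collapse runs of whitespace, then join fields with a separator.
    return "|".join(" ".join(str(field).lower().split()) for field in record)


def fingerprint(record, algorithm="md5"):
    # hashlib.new accepts algorithm names such as "md5" and "sha1".
    data = normalize(record).encode("utf-8")
    return hashlib.new(algorithm, data).hexdigest()


def deduplicate(records, algorithm="md5"):
    # Keep the first record seen for each digest; report later records
    # with the same digest as duplicates of that first record.
    seen = {}
    unique, duplicates = [], []
    for record in records:
        key = fingerprint(record, algorithm)
        if key in seen:
            duplicates.append((record, seen[key]))
        else:
            seen[key] = record
            unique.append(record)
    return unique, duplicates


# Hypothetical sample data for illustration only.
records = [
    ("John Smith", "Pune"),
    ("john  smith", "PUNE"),
    ("Jane Doe", "Mumbai"),
]
unique, duplicates = deduplicate(records, algorithm="sha1")
```

A digest match of this kind only catches records that are identical after normalization. Records that differ by a typo or an abbreviation produce unrelated digests, which is presumably where the similarity measures named in the keywords (cosine similarity, string distance) would come in.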

Published

2014-06-30

Section

Articles

How to Cite

An Approach towards Record Linkage using Genetic Algorithm along with Hash Algorithm. (2014). International Journal of Current Engineering and Technology, 4(3), 2142-2146. https://ijcet.evegenis.org/index.php/ijcet/article/view/1008