Scalable Machine Learning Pipelines for Big Telemetry Data in Semiconductor Manufacturing

Authors

  • Ivan Martis, Independent Researcher

Keywords:

Telemetry Data, Semiconductor Manufacturing, Fault Detection, Classification Methodology, High-Dimensional Data, Imbalanced Data, SECOM Dataset, SMOTE, Proactive Maintenance, Production Analysis

Abstract

The semiconductor industry faces increasing challenges in maintaining high yields and reducing costs as manufacturing processes become more complex. Big data analytics has emerged as an effective tool for process optimization, enabling manufacturers to extract valuable insights from vast amounts of production data and make data-driven decisions. This study proposes a comprehensive machine learning (ML) pipeline tailored for analyzing telemetry data using the SECOM dataset from the UCI repository. The methodology includes data cleaning, missing value imputation, feature scaling via Min-Max normalization, dimensionality reduction, and the Synthetic Minority Oversampling Technique (SMOTE) to handle class imbalance. A Decision Tree Classifier (DTC) is used to classify good and defective products, achieving 88% accuracy along with strong recall, F1-score, and ROC-AUC. In a comparative evaluation, the proposed DTC model outperforms popular traditional and deep learning techniques, making it a reliable option for detecting and addressing faults in real-world production settings.
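The pipeline stages described above can be sketched end to end in Python with scikit-learn. This is an illustrative sketch, not the authors' implementation: it uses synthetic SECOM-like data (the real dataset has 1567 samples and 590 sensor features), a mean-imputation strategy chosen here for simplicity, and a minimal hand-rolled SMOTE in place of a library implementation such as imbalanced-learn's. All hyperparameters (number of PCA components, tree depth, k for SMOTE) are assumptions.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Synthetic stand-in for SECOM-like telemetry: 600 samples x 50 sensors,
# ~7% defective (label 1), with missing values sprinkled in.
X = rng.normal(size=(600, 50))
y = (rng.random(600) < 0.07).astype(int)
X[y == 1] += 1.5                      # make the fault class learnable
X[rng.random(X.shape) < 0.05] = np.nan

def smote(X_min, n_new, k=5, rng=rng):
    """Minimal SMOTE: each synthetic sample is interpolated between a
    minority point and one of its k nearest minority neighbours."""
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
    _, idx = nn.kneighbors(X_min)     # column 0 is the point itself
    base = rng.integers(0, len(X_min), n_new)
    neigh = idx[base, rng.integers(1, idx.shape[1], n_new)]
    lam = rng.random((n_new, 1))
    return X_min[base] + lam * (X_min[neigh] - X_min[base])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, random_state=0)

# 1) impute missing values, 2) Min-Max scale to [0, 1], 3) reduce dims.
# Each step is fitted on the training split only to avoid leakage.
imp = SimpleImputer(strategy="mean").fit(X_tr)
scl = MinMaxScaler().fit(imp.transform(X_tr))
pca = PCA(n_components=10, random_state=0).fit(
    scl.transform(imp.transform(X_tr)))
prep = lambda A: pca.transform(scl.transform(imp.transform(A)))
Z_tr, Z_te = prep(X_tr), prep(X_te)

# 4) oversample the minority (defective) class on the training split only
n_new = (y_tr == 0).sum() - (y_tr == 1).sum()
Z_bal = np.vstack([Z_tr, smote(Z_tr[y_tr == 1], n_new)])
y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])

# 5) train the Decision Tree Classifier and score on the held-out split
clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(Z_bal, y_bal)
print("test F1:", round(f1_score(y_te, clf.predict(Z_te)), 3))
```

Note that SMOTE is applied only after the train/test split and only to the training data; oversampling before splitting would leak synthetic copies of test-set neighbours into training and inflate the reported metrics.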

Published

2025-08-31

Section

Articles

How to Cite

Scalable Machine Learning Pipelines for Big Telemetry Data in Semiconductor Manufacturing. (2025). International Journal of Current Engineering and Technology, 15(4), 335-344. https://ijcet.evegenis.org/index.php/ijcet/article/view/1680