Scalable Machine Learning Pipelines for Big Telemetry Data in Semiconductor Manufacturing
Keywords:
Telemetry Data, Semiconductor Manufacturing, Fault Detection, Classification Methodology, High-Dimensional Data, Imbalanced Data, SECOM Dataset, SMOTE, Proactive Maintenance, Production Analysis

Abstract
The semiconductor industry faces increasing challenges in maintaining high yields and reducing costs as manufacturing processes grow more complex. Big data analytics has emerged as an effective tool for process optimization, enabling manufacturers to extract valuable insights from vast amounts of production data and make data-driven decisions. This study proposes a comprehensive machine learning (ML) pipeline for analyzing telemetry data, demonstrated on the SECOM dataset from the UCI repository. The methodology includes data cleaning, missing value imputation, feature scaling via Min-Max normalization, dimensionality reduction, and the Synthetic Minority Oversampling Technique (SMOTE) to handle class imbalance. A Decision Tree Classifier (DTC) is used to distinguish good products from defective ones, achieving 88% accuracy along with strong recall, F1-score, and ROC-AUC results. In a comparative evaluation, the proposed DTC model outperforms widely used traditional and deep learning techniques, making it a reliable candidate for detecting and addressing faults in real production environments.
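The pipeline stages named in the abstract (imputation, Min-Max scaling, dimensionality reduction, SMOTE-style oversampling of the minority fault class, and a decision tree) can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: it substitutes a synthetic imbalanced dataset for SECOM so it runs self-contained, uses PCA as an assumed dimensionality-reduction choice, and hand-rolls the core SMOTE interpolation step with scikit-learn's nearest-neighbour search; all hyperparameters (20 components, depth-8 tree, k=5 neighbours) are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in for SECOM: high-dimensional, imbalanced binary data with
# injected missing values (class 1 = "defective", ~7% of samples).
X, y = make_classification(n_samples=1000, n_features=100, n_informative=20,
                           weights=[0.93, 0.07], random_state=0)
X[np.random.default_rng(0).random(X.shape) < 0.05] = np.nan

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Stages 1-3: impute missing values, Min-Max scale, reduce dimensionality.
# Transformers are fit on the training split only, then applied to test.
imputer = SimpleImputer(strategy="mean")
scaler = MinMaxScaler()
pca = PCA(n_components=20, random_state=0)
X_tr_p = pca.fit_transform(scaler.fit_transform(imputer.fit_transform(X_tr)))
X_te_p = pca.transform(scaler.transform(imputer.transform(X_te)))


def smote_like(X_min, n_new, k=5, seed=0):
    """SMOTE's core idea: create synthetic minority samples by linear
    interpolation between a minority point and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    idx = rng.integers(0, len(X_min), n_new)
    # Column 0 is the point itself, so keep columns 1..k.
    neigh = nn.kneighbors(X_min[idx], return_distance=False)[:, 1:]
    chosen = neigh[np.arange(n_new), rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))  # random position along each segment
    return X_min[idx] + gap * (X_min[chosen] - X_min[idx])


# Stage 4: oversample the minority class (training split only) to balance.
X_min = X_tr_p[y_tr == 1]
n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())
X_bal = np.vstack([X_tr_p, smote_like(X_min, n_new)])
y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])

# Stage 5: train the decision tree and evaluate on the held-out split.
clf = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X_bal, y_bal)
acc = accuracy_score(y_te, clf.predict(X_te_p))
print(f"held-out accuracy: {acc:.2f}")
```

Note that oversampling is applied only after the train/test split: generating synthetic samples before splitting would leak interpolated copies of test-adjacent points into training and inflate the reported metrics.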
