Impact on Supervised Machine Learning

David Blondheim

Supervised machine learning algorithms are trained on input data and known results to find patterns that can be used to make accurate predictions. A model has limited ability to overcome poor training data that does not represent the outcome being predicted. Accurate classification of results is therefore critical to successful machine learning (ML) applications. This section reviews the impact of misclassified results.

Bias-Variance Tradeoff

The bias-variance tradeoff graph is often used to visualize the error associated with supervised machine learning models. The generalization error of an ML model is composed of three components: inherent error, squared bias, and variance, as described in Eqn. 1. The generalization error is the error produced by the model when applied to independent test data.
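Eqn. 1 is not reproduced in this text; a standard statement of the decomposition described above, with the inherent (irreducible) error written as \( \sigma^2 \), is the following sketch (the notation is assumed):

\[
\text{Generalization Error} \;=\; \underbrace{\sigma^2}_{\text{inherent error}} \;+\; \text{Bias}^2 \;+\; \text{Variance}
\]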

The generalization error is used to select the tuning parameters during the training of a supervised ML algorithm, and even to select which learning algorithm is used for a specific data set. The goal is to avoid underfitting and overfitting to the training data. This is done by minimizing the squared bias component while ensuring the model can be generalized to future data, as represented by the variance. The inherent error is the error that exists based on the data used and will be discussed in detail later. This bias-variance tradeoff can be seen in Figure 1.

The accuracy and the generalization error of the ML model are related as seen in Eqn. 2, where Accuracy is the fraction of samples correctly classified, with a value of 1 representing perfect classification. A model with high accuracy therefore has a low generalized error rate.
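Eqn. 2 is likewise not reproduced here; given the definition of accuracy above, the relationship is assumed to take the form

\[
\text{Generalization Error} \;=\; 1 - \text{Accuracy}
\]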

Confusion Matrices

A confusion matrix is used in ML classification problems to understand how well the trained model performs. The confusion matrix shows the accuracy of the predictions made by the ML model in comparison to the true conditions of the test data. If the ML model performs well, it will correctly identify true positives (TP) and true negatives (TN) while producing only a small number of false positives (FP) and false negatives (FN). The organization of the traditional confusion matrix is seen in Figure 2.

A confusion matrix can be normalized by dividing the individual counts, such as TP, by the overall total count. This normalization allows the user to understand the percentage of predictions the model makes in each of the categories. The details of the normalized confusion matrix are seen in Figure 3. These normalized percentages for TP, FP, FN, and TN are carried forward in the balance of the equations presented in this work. The values used in the equations are labeled with a percent sign (%) to indicate they are fractions and no longer counts. This approach was taken because the discussion focuses on error rates and percentages rather than the counts traditionally used in a confusion matrix.

Figure 4 provides a generic example of how the counts of a model, as shown in Figure 2, are converted to the percentages of a normalized confusion matrix, as seen in Figure 3.
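Because Figures 2 through 4 are not reproduced in this text, the short sketch below illustrates the normalization step in code; the counts and variable names are hypothetical.

```python
import numpy as np

# Hypothetical confusion-matrix counts: rows = predicted, columns = actual.
#                  actual positive  actual negative
counts = np.array([[900,             30],   # predicted positive: TP, FP
                   [ 20,             50]])  # predicted negative: FN, TN

total = counts.sum()
normalized = counts / total      # each cell becomes a fraction of all samples

tp_pct, fp_pct = normalized[0]
fn_pct, tn_pct = normalized[1]
print(f"TP% = {tp_pct:.3f}, FP% = {fp_pct:.3f}, "
      f"FN% = {fn_pct:.3f}, TN% = {tn_pct:.3f}")
# The four fractions sum to 1.0, which the equations below rely on.
```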

How best to measure the accuracy of an ML model can be hotly debated. Multiple accuracy measures exist to summarize the model in different ways. Traditional accuracy, balanced accuracy, F1 score, and the Matthews correlation coefficient are among the most commonly used metrics. A shortcoming of the traditional accuracy measure becomes evident with highly unbalanced data.

These unbalanced data sets often exist in manufacturing. A high traditional accuracy value may give a user a false sense of how well the model performs. For example, a recorded accuracy of 95% may lead one to believe the model is "good." In fact, this could be the result of a model that predicts 100% acceptable product on data with a 5% scrap rate. In this case, the prediction provides no value even though it is 95% accurate. Other accuracy calculations provide a different perspective on the overall model accuracy for unbalanced data. A theoretical example in Figure 5 highlights the differences in the accuracy metrics.
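As an illustration of this example (the exact values in Figure 5 are not reproduced here, so the numbers below are assumed), the following sketch compares several common metrics for a model that predicts every part as acceptable on data with a 5% scrap rate.

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, matthews_corrcoef)

# Assumed labels: 1 = acceptable product (positive), 0 = scrap (negative).
# Actual scrap rate is 5%; the model simply predicts everything acceptable.
y_true = [1] * 950 + [0] * 50
y_pred = [1] * 1000

print("Traditional accuracy:", accuracy_score(y_true, y_pred))            # 0.95
print("Balanced accuracy:   ", balanced_accuracy_score(y_true, y_pred))   # 0.50
print("F1 score:            ", round(f1_score(y_true, y_pred), 3))        # ~0.974
print("Matthews corr. coef.:", matthews_corrcoef(y_true, y_pred))         # 0.0
```

Traditional accuracy looks strong even though the model never identifies a single defective part; the other metrics expose that weakness.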

It is not the purpose of this article to decide which calculation of model accuracy is best. Instead, the same metric must be used consistently when comparing a decision made before and after the implementation of ML. The traditional accuracy approach is used in this work for several reasons. First, this metric is commonly used in industry even with its shortcoming for unbalanced data. Additionally, it is expected that the ML user will review the entire confusion matrix, and not just a single accuracy statistic, to truly understand the predictions of the ML model. Finally, traditional accuracy simplifies the math and provides insight into creating the critical error threshold equation.

Prior to ML implementation, the decision automatically made in manufacturing is that all product is acceptable, so there are no predicted negative conditions. Product with actual negative conditions is rejected after additional processing. These are the false positives created by assuming all product is good. After ML implementation, the costs associated with false positives (FP%), false negatives (FN%), and true negatives (TN%) all must be considered. The traditional accuracy calculation for normalized confusion matrices is used and can be seen in Eqn. 3.
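Eqn. 3 is not shown in this text; for a normalized confusion matrix, the traditional accuracy described here reduces to the sum of the correctly predicted fractions, assumed to be

\[
\text{Accuracy} \;=\; TP\% + TN\%
\]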

With Eqn. 3, the generalized error rate in Eqn. 2 can be simplified to focus on the incorrect predictions of the ML model, as shown in Eqn. 4.
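Because the four normalized fractions sum to one, the simplification referred to here is assumed to take the form

\[
\text{Generalized Error} \;=\; 1 - (TP\% + TN\%) \;=\; FP\% + FN\%
\]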

Critical Error Threshold (CET)

A concept not found in the literature to date, but of great importance to applications of ML in manufacturing, is that a critical error threshold (CET) exists for the generalized error in the bias-variance tradeoff graph. The CET defines which trained ML applications provide enough value for the organization to implement. Figure 6 shows the bias-variance tradeoff graph updated with the CET. Financial value is achieved by the manufacturing organization if a model's generalized error falls under the CET.

The CET is a financial review of the accuracy associated with a machine learning model. The prediction accuracy of the ML model, the costs associated with scrap, and the cost to implement the model must be justified when compared to current scrap costs, as seen in Eqn. 5.

With details regarding specific current scrap performance, scrap costs, value-add costs, volumes, the cost of ML implementation, and the percentages from the normalized confusion matrix, Eqn. 5 can be expanded to provide additional analysis in defining the CET, as seen in Eqn. 6. Note that the value-add costs could represent more than the scrap costs added to the part before the defect is uncovered. This variable can also represent additional inspection costs, schedule adjustments, additional setups, warranty claims, and other items that drive manufacturing costs due to defective products in the supply chain.
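Eqns. 5 and 6 are not reproduced in this text. The sketch below only illustrates the kind of cost comparison described in this passage; every variable name, cost figure, and the structure of the comparison are assumptions made for illustration, not the equations from this work.

```python
# Hypothetical annual cost comparison before and after ML implementation.
volume = 1_000_000                 # parts produced per year (assumed)
part_cost = 4.00                   # cost of a scrapped casting, $ (assumed)
value_add_cost = 6.00              # downstream cost added before a defect is found, $ (assumed)
ml_implementation_cost = 150_000   # annualized cost to deploy and maintain ML, $ (assumed)

# Normalized confusion-matrix fractions from a candidate model (assumed).
fp_pct, fn_pct, tn_pct = 0.01, 0.02, 0.04
scrap_rate = fp_pct + tn_pct       # actual negative fraction of production

# Before ML: every defect absorbs the part cost plus the downstream value-add cost.
cost_before = volume * scrap_rate * (part_cost + value_add_cost)

# After ML: TN parts are scrapped early (part cost only), FP parts still leak
# downstream (part cost plus value-add cost), and FN parts are good product
# scrapped by mistake (part cost).
cost_after = (volume * tn_pct * part_cost
              + volume * fp_pct * (part_cost + value_add_cost)
              + volume * fn_pct * part_cost
              + ml_implementation_cost)

print(f"Cost before ML: ${cost_before:,.0f}")
print(f"Cost after ML:  ${cost_after:,.0f}")
print("ML provides financial value." if cost_after < cost_before
      else "ML does not provide financial value.")
```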

If there is no optimization of the manufacturing process with the implementation of the ML model, then the scrap rate before ML must equal the scrap rate after model implementation. In terms of the normalized confusion matrix, this scrap rate is the sum of all product in the true (actual) negative condition, regardless of prediction, as shown in Eqn. 7.
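In the notation of the normalized confusion matrix, the condition described here is assumed to be

\[
\text{Scrap\%} \;=\; FP\% + TN\%
\]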

The relationship between the generalized error rate and the CET is given in Eqn. 8. There is motivation to implement ML in manufacturing when the generalized error rate is below the CET.
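Using the generalized error rate from Eqn. 4, the relationship described here is assumed to be

\[
FP\% + FN\% \;<\; \text{CET}
\]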

If no optimization has occurred with the implementation of the ML model, as defined in Eqn. 7, the CET value can be simplified as seen in Eqn. 9. This is accomplished by solving for the ratio between the right- and left-hand sides of Eqn. 6 and substituting the generalized error rate from Eqn. 4 and the scrap percentage from Eqn. 7.

The values for FN% and FP% will vary with the tuning and training of the ML algorithm. The casting cost and the value-add costs for defects depend on the part and its processing. Slight improvements in prediction could yield financially favorable results when value-add costs are very high. If the value-add costs are low, the accuracy of the model is poor, or the cost to implement ML is high, then the value gained will not justify the implementation. The generalized error must fall under the calculated CET for the organization to achieve a financial benefit. There is no motivation to adopt and implement ML within manufacturing if the accuracy of the model prediction does not yield financial benefits. Therefore, it is important to understand how the inherent error and squared bias affect the generalized error and the implications for the CET.

Inherent and Bias Error

In Eqn. 1, the inherent error exists because the data used for the prediction do not completely determine the output being predicted. Additional input variables likely exist that would improve the model's prediction. To reduce the inherent error, these additional variables would need to be identified and included as predictors in the model. Within manufacturing, the impact of the inherent error may be large. The data collected may struggle to train the model accurately because critical variables are not included. This would increase the inherent error, thereby increasing the generalized error of the prediction.

Additionally, when a critical parameter is not included, the data space that exists will likely overlap, since the true cause of the results is not captured. If the generalized error is increased above the CET, as seen in Figure 7, the value of the ML model is not worth the investment.

The squared bias term from Eqn. 1 is described as the amount by which the mean model estimate based on the training data differs from the true mean. Typically, a more complex model created from the training data means a lower squared bias term. However, such a model can become overfit to the training data and will perform poorly when applied to data beyond the training set. This poor performance is captured in the variance term.

The squared bias term is directly related to the four elements of classification issues identified within this paper. Misclassifying the training data shifts the squared bias component upward, increasing the generalized error of the model. The bias term in the graph can be thought of as approaching an asymptotic limit set by the misclassification rate of the training data. The larger the rate of misclassification within the training data, the larger the upward shift in the bias term and in the resulting generalized error of the algorithm. If this shift is substantial enough, the generalized error may exceed the CET, as seen in Figure 8.
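The short sketch below, which is not from this work, illustrates this effect: deliberately flipping a fraction of the training labels mimics misclassified results, and the error measured against correctly labeled test data typically rises as the flip rate grows.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # clean ground-truth labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for flip_rate in [0.0, 0.05, 0.10, 0.20]:
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < flip_rate   # misclassify a fraction of training labels
    y_noisy[flip] = 1 - y_noisy[flip]
    model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_noisy)
    err = 1 - model.score(X_te, y_te)             # generalized error on clean labels
    print(f"training misclassification {flip_rate:.0%} -> test error {err:.3f}")
```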

Although machine learning is a powerful tool for finding patterns, the data it is trained on must represent the critical parameters required for a good prediction, to eliminate data space overlap (inherent error), and must be classified correctly to provide proper training (bias). The challenge for supervised ML in manufacturing is addressing both items. Data collected in manufacturing are limited and may not capture the total data space needed for an accurate prediction. The probability of all results being properly classified, especially from visual inspection, is poor. The CET calculation could provide a benefit if the ML is tuned to focus on false positives or false negatives, provided there is an economic advantage in the scrap or value-add costs.

This is further compounded by the highly unbalanced nature of manufacturing data. Consider the example of an ML algorithm created to improve the click rate for web advertising. Success in this application could be achieved by increasing a 2% click rate to a 5% click rate. This can also be viewed as utilizing ML to move from a 98% error rate to a 95% error rate. Getting under the CET may be easy in this advertising case. Compare this to a manufacturing example where the yield of the product being produced is theoretically 95%, with a 5% scrap rate. The goal of ML is to increase this to a near-perfect prediction.

Because quality yield rates within manufacturing are typically good, the ability to implement ML successfully requires perfect data collection and process knowledge. It should not be a surprise to see this technology fail to produce meaningful results in many manufacturing settings. The generalized error rate sits above the required CET due to this imbalance of the data, missing data, and misclassified results.

Recommendations and Future Study

The work presented here should challenge the typical approaches to classification and the use of supervised ML within the foundry and manufacturing industries. A systems approach is needed to gather all data parameters within the process to minimize the generalization error. If the CET is exceeded, the supervised machine learning model adds no financial value to the operation. Three recommended areas of study or improvement are: (1) a systems approach to process data (inherent error); (2) the four elements of classification issues (bias); and (3) unsupervised ML and feature importance.

Supervised machine learning is a powerful tool that has been successfully applied in many industries. There have been substantial advances in ML domains such as image classification and natural language processing. Comparing these to applications of supervised ML in manufacturing is like comparing apples and oranges. Both use ML, yet they are distinctly different and must be treated accordingly.

The challenges of supervised machine learning in manufacturing are significant due to classification issues and limitations in data collection. The four elements proposed (Binary Acceptance Specifications, Stochastic Formation of Defects, Secondary Process Variation, and Visual Defect Inspection) influence the final classification of a part.

Misclassification creates data space overlap. This overlap alters the bias in the training of supervised machine learning, possibly rendering the model financially useless in a production environment. Understanding the critical error threshold provides economic guidance on when ML can be successfully applied.

Until manufacturing can establish system-wide gathering of process variables and eliminate classification issues, the success of supervised ML will be limited to highly controlled research or academic experiments. Much noise exists in the system today. This does not mean ML is to be abandoned; instead, different approaches are needed for manufacturing to see the benefit.

Beyond traditional uses of supervised ML, feature importance and unsupervised ML provide entry points for manufacturers looking to start using machine learning. The potential time savings and the guidance on critical input parameters that feature importance can provide need to be better understood and utilized within manufacturing. This could produce noteworthy savings in experimentation and in the optimization process for diecasters. Additionally, utilizing unsupervised ML for process control and anomaly detection allows machine learning to be used in manufacturing while creating the foundation needed for future supervised ML. This foundation is created when a company improves its classification of the parts produced (reducing the bias and overlap) and optimizes the data that could improve the prediction model (reducing the inherent error).

In the end, these changes will position manufacturing to benefit from accurate predictions of supervised machine learning while obtaining an improved understanding of the process.