Improving Manufacturing Applications of Machine Learning by Understanding Defect Classification and the Critical Error Threshold

David Blondheim

Machine learning (ML) unlocks patterns and provides insight into data that organizations use for financial value and knowledge. Use of machine learning in manufacturing environments is increasing, yet these applications sometimes fail to produce meaningful results. A critical review of how defects are classified is needed to appropriately apply machine learning in a production foundry and other manufacturing processes. Four elements associated with defect classification are proposed: Binary Acceptance Specifications, Stochastic Formation of Defects, Secondary Process Variation, and Visual Defect Inspection. These four elements create data space overlap, which influences the bias associated with training supervised machine learning algorithms. If this influence is significant enough, the predicted error of the model exceeds a critical error threshold (CET). There is no financial motivation to implement an ML model in the manufacturing environment if its error is greater than the CET. The goal is to bring awareness to these four elements, define the critical error threshold, and offer guidance and recommendations for future study on data collection and machine learning that will increase the success of ML within manufacturing.

Studies on ML have highlighted the challenges with applying it in manufacturing in three key areas:

  1. Pre-deployment stage: Companies must gather the right quality data in sufficient amounts.
  2. Deployment stage: Challenges include gathering large amounts of data and ensuring hardware and software systems can handle that volume.
  3. Nontechnical items: Acceptance of ML models by people who have no background in data science, and questions about the practical value ML can provide in manufacturing.

The pre-deployment stage is critical to ensure the data used for ML training is correct. Misclassification of training data can lead to errors, specifically within supervised ML algorithms. Supervised and unsupervised ML are the most commonly used algorithm types in ML applications. In supervised ML, algorithms are trained on labeled data sets: multiple input values are provided so the algorithm can find a pattern that accurately predicts a future result. The algorithm is trained on both the inputs and the results, then tested against a data set it was not trained on to see how well it can predict results. As will be discussed, misclassification of products occurs in production manufacturing environments. The noise this introduces into the supervised ML algorithm leads to poor modeling, prediction failures, and ultimately ML not being implemented. In unsupervised ML, no results are known in the data sets; only the input parameters are provided, and the algorithm detects patterns on its own. Unsupervised ML focuses on clustering analysis and anomaly detection. The focus of this work is on supervised ML because it is the type commonly used in manufacturing applications such as classification predictions, for example predicting part quality.
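As a minimal sketch of the supervised workflow just described, the following Python example trains a classifier on labeled process data and evaluates it on a held-out test set. The file name, feature columns, and model choice are hypothetical illustrations, not items from this study.

```python
# Minimal sketch of the supervised ML workflow described above.
# File name, feature columns, and model choice are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical shot-level data: process inputs plus a binary quality label
# (0 = acceptable, 1 = scrap) assigned at final visual inspection.
data = pd.read_csv("hpdc_shot_data.csv")
features = data[["slow_shot_velocity", "fast_shot_velocity",
                 "intensification_pressure", "die_temperature",
                 "biscuit_length"]]
labels = data["scrap"]

# Hold out data the algorithm never trains on to estimate prediction error.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, stratify=labels, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the unseen test set; mislabeled training data (discussed below)
# degrades both the model and this error estimate.
print(classification_report(y_test, model.predict(X_test)))
```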

Misclassification of results causes data space overlap that can fundamentally diminish machine learning performance. Overlap occurs when multiple classifications of training data occupy the same dimensional data space (Figure 1).

Misclassification of defects can push the error of a supervised ML model above a critical error threshold (CET), rendering the model financially useless. Four elements of defect classification in production environments are proposed that cause these issues for ML:

  1. Binary acceptance specifications.
  2. Stochastic formation of defects.
  3. Secondary process variation.
  4. Visual defect inspection.

These four elements are reviewed here from the high-pressure die casting (HPDC) perspective. Due to the modularity of these elements and the generality of the CET, these concepts have applications in other manufacturing processes such as sand casting, permanent mold casting, machining, painting, and assembly. 

Background in High-Pressure Die Casting

High-pressure die casting (HPDC) is composed of multiple systems that control hydraulic, mechanical, and thermal processes to produce near net-shape castings with short cycle times in complex metal molds. HPDC typically focuses on large production volumes due to the capital investment in equipment, tooling costs, and short production cycle times.

Porosity is one of multiple potential defects in HPDC. Approximately 30% of foundries within the industry have identified the need to address porosity as a top concern. The HPDC community will see great value in applying ML to reduce porosity and other quality defects.

Porosity that impacts the quality of production castings is created from entrapped gas, called gas porosity, or volumetric shrink, called shrink porosity. These two types of porosity can combine to form gas-assisted shrink. The causes and physical descriptions of these defects are well published. Production examples of porosity defects can be seen in Figure 2.

Porosity defects are commonly internal to the casting. As a result, the defect is usually uncovered only after additional processing. This means manufacturing costs such as trimming, shot blasting, painting, machining, testing, and inspection are added to the casting cost before the defect is found. In addition to the costs, there is a time delay between casting production and quality feedback. To minimize this risk, foundries utilize casting simulations; experiments to help determine process settings and limits; and radiographic (X-ray) audits to identify changes to the HPDC process that may result in increased porosity. Examples of porosity in X-ray images are seen in Figure 3.

Even with simulations to reduce the likelihood of porosity and X-ray audits to control the process, porosity defects still pass through the supply chain and are identified after additional value is added. Without 100% X-ray inspection, porosity scrap is found after machining, when the void is exposed by milling or drilling operations. After the machining operation is complete, human operators visually inspect the machined castings to determine if they pass a porosity specification. The decision is binary: castings are classified as either acceptable or scrap. The goal is to create ML predictions that help eliminate poor quality regardless of whether it can be repaired. This binary classification of castings contributes to the issues with applying machine learning and will be discussed in detail in the next section.

Four Elements of Defect Classification

Binary Acceptance Specifications

In Juran’s “Quality Handbook,” a defect is described as “anything that does not meet or exceed the requirements of the customer, the business, or the process,” and the handbook notes the “importance to have a realistic threshold for what is called a defect.” In manufacturing environments, this threshold is given as a quality specification for the product. The specification is typically set during the design and testing phases to ensure the product achieves its intended functionality.

A formal specification is included on manufacturing prints or as a stand-alone document to ensure a conforming product is provided to the customer. This specification becomes an aid during inspection and includes a threshold for the acceptable level of defects. Parts that exceed this threshold are classified as scrap, while parts below it are classified as acceptable. A binary classification is made.

A common approach used for porosity in a production environment is defining a maximum permissible pore size and number of pores per region. Figure 4 shows an example of a theoretical specification for a casting based on a maximum pore size of 2 mm and a maximum of four allowable pores in a 25-sq.-mm area.

In most applications, some level of porosity is acceptable. It is common for one casting to have different porosity zones on different part features. Each zone may have its own threshold based on functional needs. Sealing surfaces between machined castings or critical threaded holes might have a tighter tolerance than non-functional machined surfaces or clearance bores. Typically, these zones are identified on a manufacturing print. No universal standards exist for acceptable porosity levels within castings. Thresholds are set based on supplier requirements for assembled components (like sealants or o-rings), past design practice, and functional testing. These thresholds are often deemed proprietary information by original equipment manufacturers and are protected through non-disclosure agreements.

The binary classification process associated with specification requirements creates a problem for ML applications. Defects form along a continuous measure of size, and that important detail is lost when only a binary acceptable/scrap result is recorded. As seen in Figure 4, a pore of 2.1 mm is labeled as scrap, but a pore of 1.9 mm is acceptable. This loss of fidelity on the defect measurement creates problems for supervised ML: results in the data space may overlap because the binary classification no longer distinguishes them.
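To make the binary threshold concrete, the short sketch below applies the theoretical Figure 4 limits (assumed here: a 2-mm maximum pore size and no more than four pores in the 25-sq.-mm zone) to continuous pore measurements. The function and values are illustrative only.

```python
# Sketch of a binary specification check like the one described for Figure 4.
# The limits are the theoretical example values (assumed): 2-mm maximum pore
# size and no more than four pores in the 25-sq.-mm inspection zone.
MAX_PORE_MM = 2.0
MAX_PORE_COUNT = 4

def classify_zone(pore_sizes_mm: list[float]) -> str:
    """Collapse continuous pore measurements into a binary accept/scrap label."""
    if len(pore_sizes_mm) > MAX_PORE_COUNT:
        return "scrap"
    if any(size > MAX_PORE_MM for size in pore_sizes_mm):
        return "scrap"
    return "acceptable"

# Nearly identical castings receive opposite labels, and the measured sizes
# are discarded once the binary result is recorded.
print(classify_zone([1.9]))   # acceptable
print(classify_zone([2.1]))   # scrap
```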

Stochastic Formation of Defects

The injection and solidification of castings follow known physical rules that are modeled in casting simulation software. Fluid flow, heat transfer, feeding, cooling, and many other physical calculations are factored into the simulations. As a result, simulations have proven to be good at predicting the locations or zones where porosity defects will occur during the casting process. Figure 5 shows an example of this predicted porosity zone produced by simulation software.

In production, this predicted porosity zone does not produce the same porosity from casting to casting. There is a stochastic, or random, nature to porosity formation within a casting. Theory holds that this stochastic formation arises from the random formation of dendrites as the metal begins to solidify, which causes shrink porosity, and from heterogeneous nucleation sites for pores, which can cause gas porosity. Oxides and inclusions are examples of these heterogeneous nucleation sites, randomly distributed through the liquid metal.

This stochastic theory was shown in a recent industrial experiment in which 100 castings were produced with no process changes. These castings were serialized and inspected using a Bosello SRE Max X-ray unit with a 225 kV maximum power rating. Results showed significantly different porosity formation between castings. Figure 6 shows two castings from this experiment that were produced sequentially. The porosity formed in the zone predicted by simulation, as seen in Figure 5. However, the porosity was random and differed between sequential castings even with no process changes. The experiment also showed no statistical difference in the critical process parameters between the best nine castings and the worst nine castings.

If the predicted porosity zone is away from any machined surface, the randomness associated with the porosity formation will have no impact on the classification of the final part; the porosity will not be uncovered or seen during visual inspection. If a hole or machined surface is cut into a zone predicted to hold porosity, the machining could expose the porosity, depending on its random formation. Figure 7 provides a visual example of how different stochastic porosity formations can alter classifications of the castings.

The randomness of porosity formation connects directly to the ML problem of overlap. If two castings have identical input parameters, their points occupy the same location in the data space. However, the random formation of the porosity can cause one casting to be scrap and the other acceptable.

The overlap will cause the ML algorithm to struggle, and possibly fail, at providing meaningful insight. The collected data amounts to noise in which the ML cannot find a pattern.
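A simple Monte Carlo sketch can illustrate this overlap; it is an illustration, not the author's model. Every simulated shot has identical process inputs, and only the pore location within the predicted porosity zone is random, so the recorded label varies even though the location in the data space does not. All geometry and dimensions are assumed.

```python
# Illustrative Monte Carlo sketch (not the author's model): the pore forms at
# a random location within the simulation-predicted porosity zone, and the
# part is scrapped only if the pore intersects the machined hole cut into
# that zone. All geometry and dimensions are assumed.
import random

ZONE = (0.0, 25.0)        # 1-D slice through the predicted porosity zone, mm
HOLE = (10.0, 14.0)       # extent of the machined hole within that slice, mm
PORE_RADIUS_MM = 1.0

def casting_label(rng: random.Random) -> str:
    """Identical process inputs every shot; only the pore location is random."""
    pore_center = rng.uniform(*ZONE)
    exposed = (pore_center + PORE_RADIUS_MM > HOLE[0]
               and pore_center - PORE_RADIUS_MM < HOLE[1])
    return "scrap" if exposed else "acceptable"

rng = random.Random(7)
labels = [casting_label(rng) for _ in range(100)]
print(labels.count("scrap"), "of 100 identical shots labeled scrap")
```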

Secondary Process Variation

Selecting machining tolerances for a casting is a critical part of the product design. Machining tolerances are selected based on process capability, manufacturing costs, quality, life-cycle impacts, and functional requirements. There are various methods for optimizing the tolerance selection that have been studied and published.

The natural variation that occurs in machining processes and the tolerances associated with each feature create allowable part-to-part differences. Geometric dimensioning and tolerancing (GD&T) is used on manufacturing prints to control the measurement of features; the ASME Y14.5 standard is often referenced for GD&T requirements on prints. The variation allowed in the manufacturing process by the tolerancing affects the classification of casting defects. Figure 8 shows an example of the effect tolerancing and machining variation can have on defect classifications for machined surfaces and a drilled hole.

Like the random formation of defects, the variability within the processing of the part has the potential to create data space overlap. This overlap could potentially be avoided if every machined dimension were also collected and included in the algorithm. However, this additional inspection would be extremely costly in a production environment.

Additionally, collecting those dimensions would still not guarantee the quality of the ML prediction, since the true condition of the part below the machined surface remains unknown. The ground truth needed to train ML algorithms for accurate predictions is not collected by traditional means within manufacturing.
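The following sketch, with assumed dimensions, adds this secondary variation: the machined hole's position drifts within its tolerance from part to part, so a pore in the same location is exposed on some parts and hidden on others. It also suggests why collecting the measured hole position as an extra input could help separate otherwise identical data points.

```python
# Sketch of secondary process variation (assumed values only): the machined
# hole's center shifts within its position tolerance from part to part, so a
# pore fixed in the same location is exposed on some parts and hidden on others.
import random

NOMINAL_HOLE_CENTER = 12.0   # mm, hypothetical
POSITION_TOLERANCE = 0.5     # mm, +/- allowed by the print (hypothetical)
HOLE_HALF_WIDTH = 2.0        # mm
PORE_CENTER = 14.9           # mm, fixed pore location near the hole edge
PORE_RADIUS = 1.0            # mm

def label_with_machining_variation(rng: random.Random) -> str:
    hole_center = NOMINAL_HOLE_CENTER + rng.uniform(-POSITION_TOLERANCE,
                                                    POSITION_TOLERANCE)
    exposed = abs(PORE_CENTER - hole_center) < HOLE_HALF_WIDTH + PORE_RADIUS
    return "scrap" if exposed else "acceptable"

rng = random.Random(3)
print([label_with_machining_variation(rng) for _ in range(10)])
# Identical casting inputs yield different labels; without the measured hole
# position as an additional feature, these points overlap in the data space.
```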

Visual Defect Inspection

The last element that affects the classification of castings is the visual inspection process. Although the technology for computer vision inspection exists in certain applications, such as two-dimensional surfaces, the cost and product mix within most manufacturing plants prevent it from being widely applied. Humans often complete the inspection and classification of defects.

Much research has been published on a human's ability to complete visual inspections of machined products. A person can identify 50%–80% of defective machined products with 100% visual inspection. Considerable training is needed to achieve the high end of this range, including a comprehensive knowledge of defects, planned eye-scanning paths on parts, and appropriate environmental conditions such as lighting. Many manufacturing companies do not undertake this considerable effort, even though there are large costs associated with poor quality being passed as acceptable. As a result, classification rates will be at the lower end of the range.

The ramifications of poor classification practices because of visual inspection cannot be overlooked for supervised machine learning. Most manufacturers stay in business because they can develop a process that is capable of a high yield. As a result, the data sets generated are highly unbalanced. Acceptable results greatly outnumber scrap results. This issue is compounded when potentially half the defective product is labeled as acceptable instead of its true scrap classification.

The adage ‘‘garbage in, garbage out’’ can easily be applied to supervised ML algorithms based on visual inspections. Poor visual inspection leads to incorrect classification of results. These incorrect results create data space with overlap. Overlap will cause ML algorithms to struggle to find a pattern in the noise collected within manufacturing data sets.
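A small simulation, using assumed rates, shows how the published 50%–80% human detection range interacts with a high-yield process: a meaningful fraction of true scrap enters the training data labeled as acceptable, which is exactly the overlap-producing noise described above.

```python
# Illustrative sketch of label noise from imperfect visual inspection.
# The scrap rate, detection rate, and part count are assumed, not measured.
import random

rng = random.Random(11)
TRUE_SCRAP_RATE = 0.02    # hypothetical 98% yield
DETECTION_RATE = 0.65     # midpoint of the published 50%-80% range
N_PARTS = 100_000

true_labels = ["scrap" if rng.random() < TRUE_SCRAP_RATE else "acceptable"
               for _ in range(N_PARTS)]
# A missed defect is recorded as acceptable, contaminating the training labels.
recorded_labels = [lbl if lbl == "acceptable" or rng.random() < DETECTION_RATE
                   else "acceptable"
                   for lbl in true_labels]

missed = sum(t == "scrap" and r == "acceptable"
             for t, r in zip(true_labels, recorded_labels))
print(f"true scrap: {true_labels.count('scrap')}, "
      f"recorded scrap: {recorded_labels.count('scrap')}, "
      f"mislabeled as acceptable: {missed}")
```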

Combination and Summary

The four elements described can individually contribute to misclassification of defects. Unfortunately, these elements do not act independently of each other. Instead, they combine and change through time to create more classification confusion for supervised machine learning.

In particular, the Stochastic Formation of Defects and Secondary Process Variation combine to create misclassifications. In some combinations, depending on where the porosity forms and how it is machined, the results are classified as acceptable; in others, the casting may be classified as scrap. Figure 9 visualizes these combinations. The first column of the figure shows different levels of random porosity in the predicted porosity zone. In the first row, the parts are classified as acceptable based on a theoretical specification; in the second row, the parts are classified as scrap. In the second column, the random porosity formations are exchanged between the top and bottom rows.

The figure also shows how changes to the hole location caused by machining variation make previously acceptable parts scrap and previously scrap parts acceptable. The true classification is never known without 100% X-ray inspection of each casting.

These two elements can also combine with Visual Defect Inspection to further complicate results. Visual inspection depends on the person performing the task, and operator-to-operator performance can vary considerably. Proper classification becomes a matter of probability: the chance that porosity forms, that machining opens it up, and that the inspection catches the defect.
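As a purely hypothetical illustration of how these chances combine (the probabilities below are assumed, not measured), the likelihood that a casting with porosity in the predicted zone is ever correctly recorded as scrap is the product of the three events, treated here as independent:

```python
# Hypothetical illustration of how the three chances combine; none of these
# probabilities come from the article, and independence is assumed.
p_pore_forms = 0.30         # porosity forms where it could matter
p_machining_exposes = 0.60  # machining variation opens the pore
p_inspector_catches = 0.65  # human visual detection rate

p_correctly_labeled_scrap = p_pore_forms * p_machining_exposes * p_inspector_catches
print(f"{p_correctly_labeled_scrap:.3f}")   # roughly 0.12
```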

All four of these elements change over time. As previously mentioned, operators who perform the inspection task change shift to shift and will also likely change with turnover. New inspectors face a learning curve in defect identification while simultaneously fighting the repetitive nature of the work. The probability of detecting defective castings therefore changes through time. Unfortunately, the validity of the classification results is unknown when one looks at historical data.

Specification thresholds can also change based on new suppliers or additional testing. Perhaps a different vendor can allow a slightly larger porosity specification because its o-ring is improved, or a new part failure shows the product is used in ways for which it was not designed, so a maximum porosity size previously accepted is now rejected. To build large data sets for ML, this data would need to be consistent through time. That knowledge is lost with a binary scrap classification, and the previous data become useless for supervised ML.

Finally, manufacturing processes vary over time. Both the casting process and the machining process change with equipment maintenance, tool wear, tool replacement, die changes, or process setting improvements. To provide a detailed example, consider tool changes in machining. Part-to-part variation in machined dimensions is often very small, given the repeatability of modern machining equipment. However, once a tool breaks and is replaced, there is a functional change in the manufacturing system. Provided the new machined dimension falls within the designed tolerances, manufacturing will proceed without a second thought, even if the new dimension is a step change from the previous tool. This changes which porosity is exposed, and therefore the classification of parts, through time.

These four elements of Binary Acceptance Specifications, Stochastic Formation of Defects, Secondary Process Variation, and Visual Defect Inspection all influence the final classification of a part. As discussed, many of these elements can create an overlap within the data space, and ML will struggle to produce meaningful predictions. This struggle can be compounded by the highly unbalanced data sets that often exist in manufacturing.

A cursory review of current operational practices would suggest manufacturers are collecting the data needed to apply ML in production manufacturing settings. However, without understanding how the ML algorithm interprets the results, the user will struggle to gain the promised value from ML technology. The industry may have very clearly defined specifications, but if it records only acceptable/scrap and not the actual size or clustering of the void, valuable information for ML is lost. A surface is machined and then visually inspected, yet the actual truth remains unknown because an operator cannot see below the surface to know whether a void still exists in the casting. Dimensions and locations of features are not captured on 100% of the product, so that data does not exist for ML to utilize. Manufacturers may feel good that they inspect 100% of castings to ensure only good parts reach their customer but fail to realize the fallacy of human inspection rates in repetitive visual tasks.

In the end, it is not the complexity of ML technology and implementation that fails the manufacturer. Instead, it is the data produced by the manufacturer's considerable effort to specify, create, and inspect parts that prevents ML from being successful.

Part 2, which will be published in the October 2023 issue of Modern Casting, will provide insight into how the bias-variance tradeoff within ML is influenced by these misclassifications and the importance of the critical error threshold given the highly unbalanced data that exists in manufacturing. This will help a user understand the financial impact and the ML accuracy levels required for successful implementation, and why ML applications often fall short in providing value in manufacturing.
