The structural integrity of large-diameter underground steel pipelines is a critical determinant of the long-term viability and operational resilience of bulk water distribution networks, particularly in regions grappling with aging infrastructure and constrained resources. This study applies factor analysis, a multivariate statistical technique grounded in machine learning principles, to condition assessment data gathered from such pipelines managed by a major South African bulk water utility.
A multi-method condition assessment campaign was undertaken, integrating diagnostic methodologies including External Corrosion Direct Assessment (ECDA), Direct Current Voltage Gradient (DCVG) surveys, Guided Ultrasonic Testing, and acoustic leak detection technologies (SmartBall and Sahara). These were complemented by asset registry attributes such as pipeline age, wall thickness, joint configuration, and coating condition, yielding a combined dataset for machine learning analysis.
Employing Principal Component Analysis (PCA) followed by Exploratory Factor Analysis (EFA), the study identified four latent constructs that encapsulate the underlying dimensions of pipeline degradation: (1) Structural Aging and Joint Vulnerability, (2) Cathodic Protection Efficacy, (3) Leak Incidence and Defect Density, and (4) Material Durability in Corrosive Contexts. The factor structure was refined through orthogonal (Varimax) rotation, yielding a statistically robust model.
The final factor model was delineated as follows:
- Factor 1: Age and Integrity – encapsulating variables such as pipeline age, wall thickness, and historical leak frequency, indicative of cumulative structural fatigue.
- Factor 2: Construction and Design – emphasising design-centric attributes, including joint type and wall configuration, which modulate mechanical resilience.
- Factor 3: Material and Protection – reflecting the interplay between protective coatings, environmental corrosivity, and cathodic protection performance.
- Factor 4: Performance and Risk – aggregating operational metrics such as defect frequency, pipeline length, and fault incidence, offering a predictive lens on system reliability.
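The extraction and rotation steps behind a four-factor solution of this kind can be sketched as follows. This is a minimal illustration and not the study's actual pipeline: it runs on synthetic data, uses principal-component loadings of the correlation matrix as a stand-in for the EFA extraction step, and applies a standard Varimax iteration. The function names `pc_loadings` and `varimax`, the data dimensions, and the random seed are all assumptions made for the example.

```python
import numpy as np

def pc_loadings(X, n_factors):
    """Principal-component loadings from the correlation matrix of X."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]      # largest eigenvalues first
    return eigvecs[:, top] * np.sqrt(eigvals[top])

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal Varimax rotation; returns rotated loadings and rotation matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        grad = loadings.T @ (L**3 - L @ np.diag(np.sum(L**2, axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        d_new = s.sum()
        if d != 0 and d_new / d < 1 + tol:
            break
        d = d_new
    return loadings @ R, R

# Synthetic stand-in data (hypothetical): 200 pipe segments, 8 indicators, 4 latent drivers
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 4))
weights = rng.normal(size=(4, 8))
X = latent @ weights + 0.5 * rng.normal(size=(200, 8))

loadings = pc_loadings(X, n_factors=4)
rotated, R = varimax(loadings)
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the distribution of variance across the four factors shifts, which is what makes the rotated loadings easier to label.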
Model fit indices affirm the solution's adequacy: χ² = 14.5, p = 0.63; RMSEA = 0.00; TLI = 1.043; RMSR = 0.03, indicating excellent model-data congruence despite a modest Kaiser-Meyer-Olkin (KMO) measure. This research contributes to the literature an integrative analytical framework for diagnosing large-diameter underground steel pipeline infrastructure, enabling data-driven prioritisation of maintenance and rehabilitation in resource-constrained water utility environments.
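As a reading aid for the fit indices reported above, the sketch below shows the common textbook definitions of RMSEA and TLI. Whenever the model χ² falls below its degrees of freedom, RMSEA is truncated to zero and TLI exceeds 1, which is consistent with the reported pattern. The abstract does not state the degrees of freedom, sample size, or null-model χ², so every input here is a hypothetical placeholder, and the `N - 1` denominator is one convention among several (some software uses `N`).

```python
import math

def rmsea(chi2, df, n_obs):
    """RMSEA with truncation at zero (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))

def tli(chi2, df, chi2_null, df_null):
    """Tucker-Lewis index relative to the null (independence) model."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2 / df
    return (ratio_null - ratio_model) / (ratio_null - 1.0)

# Hypothetical inputs for illustration only; none of these come from the study.
example_rmsea = rmsea(chi2=12.0, df=15, n_obs=100)
example_tli = tli(chi2=12.0, df=15, chi2_null=200.0, df_null=28)
```

A TLI above 1 is therefore not an error in itself; it indicates that the model χ² is smaller than its degrees of freedom.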