Stochastic gradients are used in the training and optimization of many algorithms and techniques in modern artificial intelligence (AI) and machine learning (ML), playing a central role, for example, in the stochastic gradient descent used to train deep neural networks. These techniques, however, predate such AI/ML applications, having been around for over half a century and arising in diverse industrial engineering (IE) and operations research (OR) areas such as queueing networks, production/inventory and preventive maintenance systems, and quality control, as well as in financial applications. This talk will review the history of stochastic gradients and survey the various approaches in both the IE/OR and AI/ML contexts. More recent research has focused on discontinuous sample performance measures, for which conditional Monte Carlo and the generalized likelihood ratio method are two of the most successful approaches. IE/OR applications requiring these approaches include problems that can be modeled as Markov decision processes (MDPs), for which optimal policies often take the form of a threshold policy. One healthcare example that will be presented concerns the optimal acceptance of a donated organ, e.g., a kidney for a patient with end-stage renal disease in need of a transplant, modeled as an optimal-stopping MDP. Another example that will be presented is the estimation of higher-order derivatives for queueing systems, using the well-known single-server queue as an illustrative example.
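
To give a concrete flavor of the kind of stochastic gradient estimator that arises in the IE/OR setting, the sketch below applies classical infinitesimal perturbation analysis (IPA) to the Lindley recursion for customer waiting times in a single-server queue, differentiating the average waiting time with respect to the mean service time. This is a minimal illustrative sketch, not material from the talk itself; the M/M/1 setup, the exponential distributions, and the parameter values are assumptions chosen purely for simplicity.

```python
import numpy as np


def ipa_waiting_time_gradient(theta, lam, n_customers, rng):
    """Simulate a single-server (M/M/1) queue via the Lindley recursion and
    return (average waiting time, IPA estimate of its derivative w.r.t. theta),
    where theta is the mean service time and lam the arrival rate.
    Illustrative sketch only; distributions and parameters are assumed."""
    w, dw = 0.0, 0.0            # current waiting time and its sample-path derivative
    w_sum, dw_sum = 0.0, 0.0
    for _ in range(n_customers):
        s = theta * rng.exponential(1.0)   # service time with scale parameter theta
        ds = s / theta                     # d(service time)/d(theta) for a scale parameter
        a = rng.exponential(1.0 / lam)     # interarrival time to the next customer
        w_sum += w
        dw_sum += dw
        # Lindley recursion: next waiting time = max(W + S - A, 0);
        # IPA propagates the derivative only along the busy (positive) branch.
        x = w + s - a
        if x > 0:
            w, dw = x, dw + ds
        else:
            w, dw = 0.0, 0.0
    return w_sum / n_customers, dw_sum / n_customers


rng = np.random.default_rng(0)
mean_wait, grad_estimate = ipa_waiting_time_gradient(
    theta=0.5, lam=1.0, n_customers=100_000, rng=rng
)
print(f"avg waiting time ~ {mean_wait:.3f}, "
      f"IPA derivative w.r.t. mean service time ~ {grad_estimate:.3f}")
```

For this assumed M/M/1 configuration (arrival rate 1, mean service time 0.5), the steady-state mean waiting time is 0.5 and its derivative with respect to the mean service time is 3, so the simulation output can be checked against these closed-form values. IPA works here because the waiting time is a continuous function of the service times; for the discontinuous performance measures and higher-order derivatives discussed in the talk, methods such as conditional Monte Carlo and the generalized likelihood ratio method are needed instead.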