Subsurface fluid flow and solute transport take place in a multiscale heterogeneous environment. Neither these phenomena nor their host environment can be observed or described with certainty at all scales and locations of relevance. The resulting ambiguity has led to alternative conceptualizations of flow and transport and to multiple ways of addressing their scale and space-time dependencies. We focus on four approaches that have led to nonlocal representations of advective and dispersive transport of nonreactive tracers in randomly heterogeneous porous or fractured continua. We compare these approaches theoretically on the basis of their underlying premises and the mathematical forms of the corresponding nonlocal advective-dispersive terms. One of the four approaches describes transport at some reference support scale by a classical (Fickian) advection-dispersion equation (ADE) in which velocity is a spatially (and possibly temporally) correlated random field. The randomness of the velocity, which is given by Darcy’s law, stems from random fluctuations in hydraulic conductivity (and in advective porosity, though the latter is often disregarded). Averaging the stochastic ADE over an ensemble of velocity fields results in a space-time nonlocal representation of mean advective-dispersive flux, an approach we designate as stnADE. A closely related space-time nonlocal representation of ensemble mean transport is obtained upon averaging the motion of solute “particles” through a random velocity field within a Lagrangian framework, an approach we designate as stnL. The concept of continuous time random walk (CTRW) has yielded a representation of advective-dispersive flux that is nonlocal in time but local in space.
Closely related to the latter are forms of the ADE entailing fractional derivatives (fADE), which have led to representations of advective-dispersive flux that are nonlocal in space but local in time; nonlocality in time arises in the context of multirate mass transfer models, which we exclude from consideration in this paper. We briefly describe each of these four nonlocal approaches and offer a perspective on their differences, commonalities, and relative merits as analytical and predictive tools.