Even in numpy array representations of rasters using focal calculations (moving/strided windows), the median takes about 4X longer on small arrays
/blogs/dan_patterson/2018/08/19/focal-and-local-statistics-for-rasters-without-the-spatial-analyst and
https://www.arcgis.com/home/item.html?id=564707aec4f144c083e57656dae98022
This would scale if the window size is larger since there is more 'sorting' needed to arrange the values for median determination. Here is an example for a small floating point array with a 3x3 moving window and a median and mean calculation. If you have nodata values the np.mean and np.median would have to be replaced with np.nanmean and np.nanmedian respectively. It would take longer still to mask out the nodata values prior to the calculations
import arraytools as art
a = np.random.randint(0, 10, (1000,1000))*1.
arr = art.stride(a)
av = np.mean(arr, axis=(1,2))
md = np.median(arr, axis=(1,2))
%timeit np.mean(arr, axis=(1,2))
34.5 ms ± 263 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.median(arr, axis=(1,2))
146 ms ± 3.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)