The performance of statistical downscaling methods is critically reassessed with respect to their robust applicability in climate change studies. To this end, we cross-validate the different techniques focusing on three aspects: (a) accuracy scores (correlation, RMSE, etc.), (b) distributional similarity scores (the KS statistic and the PDF-score), and (c) stationarity/robustness under climate change conditions. For the latter, we consider anomalously cold/warm periods of the observational record (which serve as surrogates of possible climate alterations) and compare the performance of the methods in these periods against that obtained for randomly chosen periods. We find that a near-perfect association between observed and downscaled series (e.g., in terms of the Pearson correlation) may hide important shortcomings in terms of robustness and distributional similarity. This adverse effect can be alleviated by including near-surface temperature information (e.g., 2 m temperature) in the predictor field, instead of relying solely on data from the free troposphere. Free-tropospheric absolute humidity should not be used as a predictor, since it leads to non-robust results in all cases. To determine whether or not near-surface temperature is a suitable predictor for climate change downscaling applications, we tested GCM outputs under control scenarios against the reanalysis predictors used to train the statistical downscaling methods. Moreover, we analyze both the standard perfect-prognosis approach (applied to GCM predictors and used in this field over the last two decades) and some recent MOS-like variants suitable for application to RCM data. Examples are provided using methods and data from the ENSEMBLES project.
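The distinction the abstract draws between accuracy and distributional similarity can be illustrated with a minimal sketch (not the study's actual code): a downscaled series that is perfectly correlated with the observations can still have a compressed variance, which an accuracy score misses but a two-sample KS statistic detects. The function name and the synthetic data below are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def validation_scores(obs, pred):
    """Illustrative cross-validation scores for a downscaled series.

    obs, pred: 1-D arrays of observed and downscaled values.
    Returns accuracy scores (Pearson r, RMSE) and a distributional
    similarity score (two-sample KS statistic).
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    r, _ = stats.pearsonr(obs, pred)
    rmse = float(np.sqrt(np.mean((obs - pred) ** 2)))
    ks, _ = stats.ks_2samp(obs, pred)
    return {"pearson_r": r, "rmse": rmse, "ks": ks}


# Synthetic example: a "downscaled" series that tracks the observations
# exactly up to a linear rescaling (so Pearson r = 1), but with only
# half the observed spread, so the two distributions differ and the
# KS statistic picks up the discrepancy.
rng = np.random.default_rng(0)
obs = rng.normal(10.0, 5.0, size=2000)   # surrogate observed temperatures
pred = 10.0 + 0.5 * (obs - 10.0)         # perfect correlation, half the spread
scores = validation_scores(obs, pred)
```

In this toy case the near-perfect correlation would pass an accuracy-only evaluation, while the KS statistic reveals the distributional mismatch, mirroring the point made above.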