Modelers typically test alternative conceptualizations of a groundwater system because the ‘true’ hydrogeological model of the system is unknown and the observed data are limited. This study elaborates on how to rank different conceptualizations of a groundwater system using selection techniques and criteria that balance model complexity against accuracy. For the Dehloran groundwater system in Iran, 30 models differing in parameterization method, number of spatial zones in the parameterization, and calibration approach were built and compared against 12 discriminating selection criteria. The validation results show that the best model calibrated by a two-step approach is far superior to the one calibrated by a one-step approach. Nevertheless, the low-ranked models of the first step should not be disregarded, because the high-ranked models selected in the first and second steps of calibration are not necessarily the same. In addition, although the ratio of the total number of observations to the number of parameters strongly affects the performance of a selection criterion, the complexity term of some selection criteria is not continuous with respect to the number of parameters, which causes these criteria to perform poorly at certain points.
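The last point, a complexity term that is discontinuous in the number of parameters, can be illustrated with a minimal sketch. The abstract does not name the 12 criteria, so the sketch assumes, purely for illustration, the least-squares forms of AIC and its small-sample correction AICc, whose penalty 2k(k+1)/(n-k-1) has a pole at k = n-1 and turns negative beyond it. The function names, zone counts, and error sums below are hypothetical and are not taken from the Dehloran application.

```python
import math


def aic(n, k, sse):
    """AIC for a least-squares fit, up to an additive constant.

    n: number of observations, k: number of estimated parameters,
    sse: sum of squared residuals.
    """
    return n * math.log(sse / n) + 2 * k


def aicc_penalty(n, k):
    """Small-sample correction added to AIC to obtain AICc.

    The term 2k(k+1)/(n-k-1) is discontinuous at k = n - 1;
    for k > n - 1 it becomes negative and rewards extra parameters.
    """
    denom = n - k - 1
    if denom == 0:
        return math.inf  # pole: the criterion is undefined here
    return 2 * k * (k + 1) / denom


def aicc(n, k, sse):
    """AICc = AIC + small-sample correction."""
    return aic(n, k, sse) + aicc_penalty(n, k)


def rank_models(models, criterion):
    """Return model names sorted from best (lowest score) to worst."""
    return sorted(models, key=lambda name: criterion(*models[name]))


# Hypothetical candidates: name -> (n, k, sse). Illustrative values only.
models = {
    "2 zones": (20, 2, 12.0),
    "5 zones": (20, 5, 8.0),
    "18 zones": (20, 18, 0.5),
}

ranking_aic = rank_models(models, aic)
ranking_aicc = rank_models(models, aicc)

# Penalty just below, at, and just above the pole for n = 20.
penalties = {k: aicc_penalty(20, k) for k in (18, 19, 20)}
```

With these made-up numbers, AIC prefers the 18-zone model while AICc prefers the 2-zone model, because the observation-to-parameter ratio is close to one. Around k = n-1 the correction jumps from a large positive value to infinity and then to a negative value, which is one way a criterion can behave inappropriately at particular parameter counts.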