I'm very new to cluster analysis. In papers such as Richette et al. [1] (which tries to see which concomitant diseases cluster together), the authors first cluster the variables and then the observations (i.e., patients). (Bevis et al. [2] did the same thing.) They used SAS's PROC VARCLUS and factor analysis (others have used PCA) to cluster the variables, and cluster analysis for the patients. I don't understand why they would (need to) do both. In the first paper, all of their discussion centered on the latter.
From a mathematical point of view, a standard dataset is just a matrix of numbers organized into rows and columns. We attach meanings to these, and think of the rows as pertaining to patients and the columns as representing variables, but they're just numbers and you can perform mathematical operations on them. The question is whether any given operation is meaningful.
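To make the "it's just a matrix" point concrete, here is a minimal sketch in Python (NumPy only) on made-up data. The same plain k-means routine clusters patients when it is fed the rows of X, and clusters variables when it is fed the rows of the transposed, column-standardized matrix. The `kmeans` helper, the simulated data, and the choice of k-means are all illustrative assumptions. This is not what the cited papers did (they used SAS procedures such as PROC VARCLUS).

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means; each ROW of `points` is one object to cluster."""
    points = np.asarray(points, dtype=float)
    # Deterministic farthest-point initialization, starting from the first row.
    centers = [points[0]]
    for _ in range(1, k):
        nearest = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(nearest)])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        # Assign each row to its nearest center (squared Euclidean distance).
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


# Hypothetical data: 40 patients, 6 variables. Variables 0-2 measure latent
# factor 1 and variables 3-5 measure latent factor 2, plus a little noise.
rng = np.random.default_rng(0)
f1 = np.tile([1.0, 1.0, -1.0, -1.0], 10)
f2 = np.tile([1.0, -1.0, 1.0, -1.0], 10)
X = np.column_stack([f1, f1, f1, f2, f2, f2]) + 0.1 * rng.standard_normal((40, 6))

# Clustering patients: the objects are the rows of X.
patient_labels = kmeans(X, k=4)

# Clustering variables: standardize the columns, then cluster the rows of the
# transpose, so each variable is a point with one coordinate per patient.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
variable_labels = kmeans(Z.T, k=2)

print("patients: ", patient_labels)
print("variables:", variable_labels)
```

Standardizing the columns is what makes the second call meaningful. For z-scored columns (using the population standard deviation, NumPy's default), the squared Euclidean distance between two variables is 2n(1 - r), where r is their correlation, so k-means on the transpose groups variables by correlation. This version treats strongly negatively correlated variables as far apart; a distance based on 1 - |r| would ignore the sign.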
Variables can be understood as manifestations of some underlying truth that we don't have direct access to; those unobserved quantities are called latent variables. In such a case, people often combine the observed variables to get a better picture of the underlying reality. The standard way to estimate latent variables is factor analysis, but PCA will typically yield almost the same results, and clustering algorithms can be applied to the columns (variables) to do something similar. Clustering the variables guarantees that the result will have simple structure (each variable belongs to exactly one cluster), at the cost of a worse empirical fit. That's presumably what they were after. This step comes first because there's no point in clustering patients on the wrong variables; doing so would bias the results.
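The "simple structure at the cost of fit" trade-off can be checked numerically. Below is a minimal Python/NumPy sketch on simulated data; the loadings, the noise level, and the hard-coded variable clusters are all hypothetical. It compares how much of the total variance of six standardized variables is reproduced by two unrestricted principal components versus two cluster components, each computed only from its own cluster's variables. Summarizing a cluster by its first principal component is one common choice, assumed here. It is not a reproduction of PROC VARCLUS.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
f = rng.standard_normal((n, 2))  # two hypothetical latent variables
# Hypothetical loadings: variables 0-2 mainly reflect factor 1, 3-5 factor 2,
# but variable 2 also picks up some of factor 2 (a cross-loading).
loadings = np.array([
    [0.9, 0.0], [0.8, 0.0], [0.6, 0.5],
    [0.0, 0.9], [0.0, 0.8], [0.0, 0.7],
])
X = f @ loadings.T + 0.4 * rng.standard_normal((n, 6))
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized, column-centered


def explained(Z, scores):
    """Share of the total variance in Z reproduced by regressing Z on `scores`."""
    coef, *_ = np.linalg.lstsq(scores, Z, rcond=None)
    return float(((scores @ coef) ** 2).sum() / (Z ** 2).sum())


# Unrestricted PCA: the first two principal component scores of all six variables.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pca_scores = Z @ Vt[:2].T

# Simple structure: a hard, hypothetical assignment of variables to two clusters
# (as a variable-clustering step might produce). Each cluster is summarized by the
# first principal component of its own variables only.
clusters = [[0, 1, 2], [3, 4, 5]]
cluster_scores = np.column_stack([
    Z[:, c] @ np.linalg.svd(Z[:, c], full_matrices=False)[2][0] for c in clusters
])

print(f"PCA, 2 components:      {explained(Z, pca_scores):.3f}")
print(f"Cluster components (2): {explained(Z, cluster_scores):.3f}")
# The patients would then be clustered on the n x 2 matrix `cluster_scores`.
```

The first k principal components span the k-dimensional subspace that captures the most variance, so the cluster components can never explain more. What they buy in exchange is interpretability: each one summarizes a single, nameable set of variables, and that is what the second stage clusters the patients on.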