I have a school rating instrument (ratings from $1$ to $4$) that consists of $9$ subscales (e.g., classroom instruction, school management, etc.). Each subscale contains $6$ items, and every item is rated on that $1$ to $4$ scale.
I have hired $6$ raters, each of whom will rate the same $5$ schools. That is, every rater will visit every school and rate it independently.
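To make the layout concrete, here is a minimal sketch of the data I expect to collect, assuming every rater scores every item at every school with nothing missing. The field names and integer labels are placeholders of my own, not part of the instrument; note that items are nested within subscales, so item 1 of one subscale is a different item from item 1 of another.

```python
from itertools import product

# Counts taken from the description above; the labels are hypothetical.
N_SCHOOLS, N_RATERS, N_SUBSCALES, N_ITEMS = 5, 6, 9, 6

def crossed_layout():
    """Return one record per (school, rater, subscale, item) rating."""
    return [
        {"school": s, "rater": r, "subscale": k, "item": i}
        for s, r, k, i in product(
            range(1, N_SCHOOLS + 1),
            range(1, N_RATERS + 1),
            range(1, N_SUBSCALES + 1),
            range(1, N_ITEMS + 1),  # item index within its subscale
        )
    ]

rows = crossed_layout()
print(len(rows))  # 5 * 6 * 9 * 6 = 1620 ratings in total
```

So each school-by-rater cell holds $9 \times 6 = 54$ item ratings, and there are $30$ such cells.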
My goal is to do a G (Generalizability) study to establish the reliability of this rating instrument across all schools and all raters.
Q: First, should I run a separate G study for each subscale?
Second, what is a possible design for my G study?