# Likert scale questions divided into different groups: how to calculate the mean of each group?

by Rajib   Last Updated September 12, 2019 03:19 AM

I want to run a survey to determine how satisfied the company's 16 employees are with its training program. The survey has 30 questions, each answered on a 5-point Likert scale. The questions are divided into groups: 9 on the utility of the program, 6 on the trainer, 7 on the balance of the program, 2 on training content, 4 on the training facilities, and 2 on the implementation of training. I have calculated the mean, mode, frequency, percentage, and score of each question, but I need a result for each group of questions, specifically for the 9 questions on the utility of the program. I am currently thinking of using the mean or score of the group of questions.

What else can I do with this data? Please give suggestions.


The mean would be totally acceptable. Some people argue that it doesn't make sense because the distances between adjacent values (e.g., from 2 to 3 vs. from 4 to 5) are not necessarily equal. Those aren't the kinds of issues that keep me awake at night. Another option is to consider proportions of top-box responses. In survey research, you can treat an answer of 4 or above (or only a 5) as a binary yes/no indicator of achieving a desired level of satisfaction. You can then easily summarize a set of questions by averaging these indicators, i.e., taking the proportion of top-box responses. For the 6 trainer questions, 100% would indicate that the trainer received top-box responses on all questions, whereas about 67% would indicate that 4 of the 6 reached a satisfactory level.
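A minimal Python sketch of the top-box idea, assuming the answers for one question group are stored as a list of rows (one row per respondent, one 1-5 answer per question) and that "top box" means 4 or above (the `threshold` parameter). The function names and ratings are hypothetical. For each respondent it computes the share of their answers in the group that are top box, then averages those shares.

```python
def top_box_share(answers, threshold=4):
    """Share of one respondent's answers in a group that are top box (>= threshold)."""
    return sum(a >= threshold for a in answers) / len(answers)


def group_top_box(matrix, threshold=4):
    """Per-respondent top-box shares and their average for one question group.
    matrix: one row per respondent, one column per question in the group."""
    shares = [top_box_share(row, threshold) for row in matrix]
    return shares, sum(shares) / len(shares)


# Hypothetical answers of 3 respondents to the 6 trainer questions.
trainer = [
    [5, 4, 4, 3, 5, 2],  # 4 of 6 answers are 4 or 5
    [5, 5, 4, 4, 5, 4],  # 6 of 6
    [3, 2, 4, 3, 3, 2],  # 1 of 6
]
shares, average = group_top_box(trainer)
print([round(s, 3) for s in shares], round(average, 3))
```

With complete data, averaging the per-respondent shares gives the same number as the pooled proportion of top-box answers in the group; computing it per respondent also lets you see who is driving the result.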

Questions are often grouped into domains (your groups are like domains), and comparisons are made on the domains by summing the individual question scores (equivalent to averaging within a domain, since each domain has a fixed number of questions). This makes sense if the scoring is consistent from one question to another, i.e., a higher value always points in the same direction of opinion.
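A small Python sketch of per-respondent domain scores, assuming a 1-5 scale and no missing answers. The item names, and the idea that one of them is negatively worded and must be reverse-scored (6 minus the answer) to keep scoring consistent, are hypothetical illustrations.

```python
SCALE_MIN, SCALE_MAX = 1, 5


def reverse(score):
    """Map 1<->5, 2<->4, 3->3 so a higher value always means more satisfied."""
    return SCALE_MIN + SCALE_MAX - score


def domain_scores(answers, reversed_items=()):
    """answers: dict item -> score for one respondent within one domain.
    Returns (sum, mean) after reverse-scoring the listed items."""
    aligned = [reverse(s) if item in reversed_items else s
               for item, s in answers.items()]
    total = sum(aligned)
    return total, total / len(aligned)  # mean = sum / number of items


# Hypothetical respondent; "facilities_q3" is assumed to be negatively worded.
facilities = {"facilities_q1": 4, "facilities_q2": 5,
              "facilities_q3": 2, "facilities_q4": 3}
total, average = domain_scores(facilities, reversed_items={"facilities_q3"})
print(total, average)
```

Within one domain the sum and the mean rank respondents identically; the mean is easier to compare across domains of different sizes (9 items vs. 2).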

See this question: Analyzing Likert scales

Agresti covers much of this kind of ordinal data analysis (e.g., in "Analysis of Ordinal Categorical Data").

For your particular problem, I would suggest looking at three methods: multiple hypothesis testing (http://en.wikipedia.org/wiki/Multiple_comparisons), mixed-effects models (http://en.wikipedia.org/wiki/Mixed_model; function lmer() in the R package lme4), and cumulative link mixed models (http://cran.r-project.org/web/packages/ordinal/vignettes/clmm2_tutorial.pdf; function clmm() in the R package ordinal).

In general, I wouldn't recommend traditional multiple testing, since it assumes the data are interval (rather than ordinal, as yours are). If you want to make that assumption anyway, you can test which questions have an average response different from the center of the Likert scale, and then apply a correction (e.g., Bonferroni) to account for having done 9 + 6 + 7 + 2 + 4 + 2 = 30 tests.
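If you do take the testing route, here is a hedged Python sketch. It deliberately swaps in an exact two-sided sign test, which only asks whether answers fall above or below the midpoint 3 and so does not need the interval assumption, instead of the t-type tests the answer alludes to. It then applies a Bonferroni correction for 30 tests. The response data are hypothetical.

```python
from math import comb


def sign_test_p(responses, center=3):
    """Exact two-sided sign test: are answers more often above or below center?
    Answers equal to center are dropped, as is usual for the sign test."""
    above = sum(r > center for r in responses)
    below = sum(r < center for r in responses)
    n = above + below
    if n == 0:
        return 1.0
    k = min(above, below)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni(p_values, m=None):
    """Multiply each p-value by the number of tests m (by default the number
    of p-values supplied), capping the result at 1."""
    m = len(p_values) if m is None else m
    return [min(1.0, p * m) for p in p_values]


# Hypothetical answers of 16 respondents to two of the questions.
q1 = [4, 5, 4, 4, 3, 5, 4, 4, 5, 4, 4, 3, 5, 4, 4, 5]
q2 = [3, 2, 4, 3, 3, 4, 2, 3, 4, 3, 3, 2, 4, 3, 3, 4]
raw = [sign_test_p(q1), sign_test_p(q2)]
# The survey has 30 questions; only 2 are shown, so m=30 is passed explicitly.
adjusted = bonferroni(raw, m=30)
print(raw, adjusted)
```

Bonferroni is the simplest correction and is conservative; with 30 tests and only 16 respondents, only questions with very lopsided answers will remain significant.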

For the mixed-effects models, analyze each group of questions ("utility of the program", etc.) separately, using random effects. Treat each question as a random effect (there is a population of possible questions you could have chosen, and you happened to pick these 9 questions about utility), and treat the respondent as a random effect (there is a population of people whose opinions you want to gather, and you happened to sample this group). Hence, the model is $y_{ij}=\mu + a_i + b_j + e_{ij}$, where $y_{ij}$ is the response of person $i$ to question $j$, $\mu$ is the overall mean response, $a_i$ is the random effect due to person $i$ (you have 16 people), $b_j$ is the random effect due to question $j$ (you have 9 questions in the "utility" group), and $e_{ij}$ is the residual, i.e., how much person $i$'s response to question $j$ differs from the model.

Using the lme4 package, you can estimate $\mu$ and test if it is significantly different from the center of the Likert scale.
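lmer() estimates this model by REML. As a library-free illustration of what the pieces mean, the Python sketch below instead uses the classical two-way ANOVA (method-of-moments) estimators, which assume a complete person × question table with no missing answers. In that balanced case the estimate of $\mu$ is simply the grand mean (lmer gives the same point estimate for such data), and its standard error combines all three variance components, $\operatorname{Var}(\hat\mu) = \sigma_a^2/I + \sigma_b^2/J + \sigma_e^2/(IJ)$ for $I$ people and $J$ questions. Negative variance estimates are truncated at zero, and the final ratio is only a rough guide, not the test an lme4-based workflow would report. The data are hypothetical.

```python
from math import sqrt


def crossed_random_effects(y):
    """Moment (two-way ANOVA) estimates for y[i][j] = mu + a_i + b_j + e_ij.
    y: complete table, one row per person (I rows), one column per question (J).
    Returns mu_hat, (s2_a, s2_b, s2_e), and the standard error of mu_hat."""
    I, J = len(y), len(y[0])
    grand = sum(map(sum, y)) / (I * J)
    row_means = [sum(row) / J for row in y]
    col_means = [sum(y[i][j] for i in range(I)) / I for j in range(J)]

    ss_person = J * sum((m - grand) ** 2 for m in row_means)
    ss_question = I * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_error = ss_total - ss_person - ss_question

    ms_person = ss_person / (I - 1)
    ms_question = ss_question / (J - 1)
    ms_error = ss_error / ((I - 1) * (J - 1))

    # E[MS_person] = s2_e + J*s2_a, E[MS_question] = s2_e + I*s2_b,
    # E[MS_error] = s2_e; negative estimates are truncated at zero.
    s2_e = max(ms_error, 0.0)
    s2_a = max((ms_person - ms_error) / J, 0.0)
    s2_b = max((ms_question - ms_error) / I, 0.0)

    # The grand mean equals mu + mean(a) + mean(b) + mean(e).
    se = sqrt(s2_a / I + s2_b / J + s2_e / (I * J))
    return grand, (s2_a, s2_b, s2_e), se


# Hypothetical 4 people x 3 utility questions (the real table is 16 x 9).
y = [[4, 5, 4],
     [3, 4, 3],
     [5, 5, 4],
     [4, 4, 3]]
mu_hat, components, se = crossed_random_effects(y)
ratio = (mu_hat - 3) / se  # rough comparison with the scale midpoint
print(round(mu_hat, 3), round(se, 3), round(ratio, 2))
```

The point of the design choice is visible in the standard error: because questions are treated as a random sample, the $\sigma_b^2/J$ term does not shrink as you add respondents, only as you add questions.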

Using the ordinal package, you can do this more carefully, taking into account that your data are ordinal rather than interval, but you lose some of the interpretability of the linear mixed-effects model.
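For intuition about what the ordinal model estimates: a cumulative logit model describes the answers through thresholds $\theta_k$ with $\operatorname{logit} P(Y \le k) = \theta_k$ for $k = 1, \dots, 4$, and clmm() additionally shifts these by person and question random effects. Without the random effects, the maximum-likelihood thresholds just reproduce the observed cumulative proportions, so they can be computed by hand. The Python sketch below does only that simpler, no-random-effects version, on hypothetical pooled answers.

```python
from math import log


def cumulative_logits(responses, levels=(1, 2, 3, 4, 5)):
    """Empirical thresholds theta_k = logit(P(Y <= k)) for every level below the top.
    These equal the ML thresholds of an intercept-only cumulative logit model
    without random effects. Returns None where the proportion is 0 or 1."""
    n = len(responses)
    thetas = {}
    for k in levels[:-1]:
        p = sum(r <= k for r in responses) / n
        thetas[k] = log(p / (1 - p)) if 0 < p < 1 else None
    return thetas


# Hypothetical pooled answers to one group of questions.
answers = [2, 3, 3, 4, 4, 4, 4, 5, 5, 5]
thetas = cumulative_logits(answers)
print(thetas)
```

The thresholds live on the logit scale rather than the 1 to 5 scale, which is part of the interpretability cost mentioned above.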

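These packages use a formula notation in which `(1 | question)` means a random intercept for each question. Suppose your data are in a data frame called dat with columns response, question, and person, with one row per answer. Then you can fit both models as follows:

```r
library(lme4)     # lmer(): linear mixed-effects models
library(ordinal)  # clmm(): cumulative link mixed models
dat$question <- factor(dat$question)  # question and person are grouping factors
dat$person   <- factor(dat$person)
fit_lmm  <- lmer(response ~ 1 + (1 | question) + (1 | person), data = dat)
fit_clmm <- clmm(ordered(response) ~ 1 + (1 | question) + (1 | person), data = dat)
summary(fit_lmm)  # the fixed intercept is the estimate of mu
```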


With 16 respondents, you may be better served just by reading each person's survey responses individually, and following up with them to understand any particularly high or low scores :-) But, since this is a statistical Q&A site, I'll discuss the statistical approach.

Ideally, the survey would ask an overall question about the utility of the program and use that as your "success" metric, and the 9 individual questions as your "diagnostic" metrics, and so on for each overall topic.

Lacking this, I would recommend using the same measures you use for the individual questions: mean, median, top-box score, etc., computed on the pooled set of responses to all 9 questions. The drawback of this approach is that it assumes each of the 9 questions is independent and equally important. In reality, with 9 questions about the utility of the program, you probably have several questions asking basically the same thing, placing undue weight on those topics in your analysis. You also probably have a couple of questions about things the respondent doesn't actually associate with overall utility.
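A short Python sketch of the pooled approach, assuming one row per respondent and one 1-5 answer per question, with "top box" taken as 4 or above. The data are hypothetical. Every answer enters the pool once, so every question implicitly gets equal weight, which is exactly the assumption the drawback above refers to.

```python
from statistics import mean, median


def pooled_summary(matrix, threshold=4):
    """Mean, median, and top-box proportion over all answers to all questions
    in a group, pooled together (each question weighted equally)."""
    pooled = [a for row in matrix for a in row]
    top_box = sum(a >= threshold for a in pooled) / len(pooled)
    return mean(pooled), median(pooled), top_box


# Hypothetical answers: 3 respondents x 3 utility questions.
utility = [[4, 5, 3],
           [4, 4, 2],
           [5, 3, 4]]
avg, center, top = pooled_summary(utility)
print(round(avg, 2), center, round(top, 3))
```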

Given enough time, you could partly correct for overlapping topics through factor analysis (identifying groups of questions that can be combined into a single measure), although even this approach is limited because it can be misled by coincidental correlations. You can't correct for the relative importance of different topics, because nothing in the data tells you which is more important.
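Factor analysis needs more machinery than fits here. A much cruder first step toward spotting questions that ask basically the same thing is to list pairs of items whose answers are highly correlated across respondents. This is a Python sketch, not factor analysis; the 0.8 cutoff, the item names, and the data are hypothetical, and with only 16 respondents such correlations are noisy.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns None if either item has no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def overlapping_items(items, cutoff=0.8):
    """items: dict question -> list of answers (same respondent order).
    Returns (question, question, r) for pairs correlating at or above cutoff."""
    pairs = []
    for (q1, x), (q2, y) in combinations(items.items(), 2):
        r = pearson(x, y)
        if r is not None and r >= cutoff:
            pairs.append((q1, q2, round(r, 2)))
    return pairs


# Hypothetical answers of 6 respondents to three utility questions.
items = {
    "utility_q1": [4, 5, 3, 4, 2, 5],
    "utility_q2": [4, 5, 3, 4, 2, 4],  # nearly a copy of q1
    "utility_q3": [3, 3, 4, 2, 4, 3],
}
flagged = overlapping_items(items)
print(flagged)
```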

You are getting some good responses here. I will try to organize some of this information and add a few other bits to create a fuller picture for you. Your project appears to be entirely descriptive rather than inferential, so I think there is less to worry about. For the most part, as several others suggest, I think you can simply average the ratings from the 9 utility questions (since that's what you're interested in) for the 16 participants. People are often concerned that Likert items are ordinal rather than interval, but when many Likert items are combined into a scale, it is often reasonable to treat the scale as roughly interval. Is yours truly interval? We'll never really know, but it probably doesn't matter in this context. (Here's a great discussion of the issues on CV.)

Furthermore, you can average each participant's 9 ratings, compute the standard deviation of those 16 averages, and from that the standard error (i.e., $SE = SD/\sqrt{16}$). An approximate 95% confidence interval is then the mean plus or minus $2 \times SE$. This interval gives a sense of how much the mean might bounce around if you were to repeat the survey. Of course, be sure all your items are scored the same way (e.g., reverse-score any negatively worded items) before you do all of this.
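The recipe above can be sketched in Python as follows, assuming one row per participant and all items already scored in the same direction. The ratings are hypothetical and deliberately small (4 participants × 3 items rather than 16 × 9); the `multiplier` parameter is the "2" from the rule of thumb.

```python
from math import sqrt
from statistics import mean, stdev


def scale_mean_ci(matrix, multiplier=2.0):
    """matrix: one row per participant, one column per item (same scoring direction).
    Averages each participant's items, then returns the mean of those averages,
    the standard error SD / sqrt(n), and the interval mean +/- multiplier * SE."""
    person_means = [sum(row) / len(row) for row in matrix]
    n = len(person_means)
    m = mean(person_means)
    se = stdev(person_means) / sqrt(n)  # stdev uses the n - 1 denominator
    return m, se, (m - multiplier * se, m + multiplier * se)


# Hypothetical ratings: 4 participants x 3 items (the real data would be 16 x 9).
ratings = [[4, 5, 3],
           [3, 2, 4],
           [5, 5, 5],
           [4, 3, 5]]
m, se, (low, high) = scale_mean_ci(ratings)
print(round(m, 2), round(se, 3), (round(low, 2), round(high, 2)))
```

With 16 participants (15 degrees of freedom), passing `multiplier=2.131`, the 97.5th percentile of the t distribution, gives a slightly wider and more exact interval than the rule of thumb of 2.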

As for the rest of your topics, it sounds like you are less interested in them. You could always follow this procedure with them anyway, but you really need several items for this approach to be reasonable. Four seems like a bare minimum, and I definitely would not combine just two items into a scale. If you haven't administered the survey yet, consider writing additional items to probe people's assessments of those topics; I would generally prefer 5 to 8 per topic. In the long run, if you want a more sophisticated approach to developing an instrument that measures satisfaction with the program, you should look into factor analysis and related methods.