
Photo by Grianghraf on Unsplash
MANOVA stands for Multivariate ANOVA or Multivariate Analysis Of Variance. It’s an extension of regular ANOVA. The general idea is the same, but the MANOVA test has to include at least two dependent variables to analyze differences between multiple groups (factors) of the independent variable (Radečić 2022).
I personally find this website explains the concept quite well.
Multivariate ANOVA can take the correlations between the dependent variables into account.
The method also provides the following benefits (Frost):
Greater statistical power: When the dependent variables are correlated, MANOVA can identify effects that are smaller than those that regular ANOVA can find.
Assess patterns between multiple dependent variables: The factors in the model can affect the relationship between dependent variables instead of influencing a single dependent variable. As the example in this post shows, ANOVA tests with a single dependent variable can fail completely to detect these patterns.
Limits the joint error rate: When you perform a series of ANOVA tests because you have multiple dependent variables, the joint probability of rejecting a true null hypothesis increases with each additional test. Instead, if you perform one MANOVA test, the error rate equals the significance level.
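The joint error rate point can be made concrete: for k independent tests at significance level alpha, the probability of at least one false rejection is 1 - (1 - alpha)^k. A quick sketch in R:

```r
# Familywise error rate when running k independent ANOVA tests at level alpha
alpha <- 0.05
k <- 3
1 - (1 - alpha)^k
#> [1] 0.142625
```

With three separate ANOVAs, the chance of at least one false positive is roughly 14%, whereas a single MANOVA keeps the error rate at the 5% significance level.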
Below are some of the materials I refer to while exploring MANOVA:
https://statisticseasily.com/manova-assumptions/
https://statisticsbyjim.com/anova/multivariate-anova-manova-benefits-use/
https://online.stat.psu.edu/stat505/lesson/8
In this demonstration, I will be using several methods to conduct a MANOVA test.
pacman::p_load(tidyverse, tidymodels, car, palmerpenguins, rstatix, micompr, MASS)
I will be using the penguins dataset for this demonstration.
gen_recipe <-
  recipe(species ~ .
         ,data = penguins) %>%
  step_impute_bag(all_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  prep()
df_imputed <- bake(gen_recipe, penguins)
Before conducting the MANOVA test, I will first check the assumptions.
df_imputed %>%
  group_by(species) %>%
  tally()
# A tibble: 3 × 2
species n
<fct> <int>
1 Adelie 152
2 Chinstrap 68
3 Gentoo 124
We could use the identify_outliers function to find potential outliers in the dataset.
In the rstatix package, an outlier is defined as any value above the third quartile + 1.5 times the interquartile range, or below the first quartile - 1.5 times the interquartile range.
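That definition can be reproduced with base R; a minimal sketch of the same 1.5 * IQR rule on toy data (not the penguins):

```r
# The 1.5 * IQR rule behind rstatix::identify_outliers(), on toy data
x <- c(1, 2, 3, 4, 100)
q <- quantile(x, c(0.25, 0.75))
iqr <- q[2] - q[1]
x[x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr]
#> [1] 100
```

The is.extreme flag in the output below uses the same rule with a factor of 3 instead of 1.5.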
# bill length
df_imputed %>%
  group_by(species) %>%
  identify_outliers(bill_length_mm)
# A tibble: 2 × 10
species island bill_length_mm bill_depth_mm flipper_length_mm
<fct> <fct> <dbl> <dbl> <dbl>
1 Gentoo Biscoe 2.88 -0.0761 2.07
2 Gentoo Biscoe 2.20 -0.0761 1.92
# ℹ 5 more variables: body_mass_g <dbl>, sex <fct>, year <dbl>,
# is.outlier <lgl>, is.extreme <lgl>
# bill depth
df_imputed %>%
  group_by(species) %>%
  identify_outliers(bill_depth_mm)
# A tibble: 1 × 10
species island bill_length_mm bill_depth_mm flipper_length_mm
<fct> <fct> <dbl> <dbl> <dbl>
1 Adelie Torgersen 0.383 2.20 -0.492
# ℹ 5 more variables: body_mass_g <dbl>, sex <fct>, year <dbl>,
# is.outlier <lgl>, is.extreme <lgl>
# flipper
df_imputed %>%
  group_by(species) %>%
  identify_outliers(flipper_length_mm)
# A tibble: 2 × 10
species island bill_length_mm bill_depth_mm flipper_length_mm
<fct> <fct> <dbl> <dbl> <dbl>
1 Adelie Biscoe -1.10 0.735 -2.06
2 Adelie Torgersen 0.0345 0.431 0.645
# ℹ 5 more variables: body_mass_g <dbl>, sex <fct>, year <dbl>,
# is.outlier <lgl>, is.extreme <lgl>
From the results shown above, we can see that there are some outliers, but none of them are extreme points.
To find multivariate outliers, we could use the mahalanobis_distance function from the rstatix package.
df_imputed %>%
  group_by(species) %>%
  mahalanobis_distance(c(bill_length_mm, bill_depth_mm, flipper_length_mm)) %>%
  filter(is.outlier == TRUE)
# A tibble: 1 × 7
bill_length_mm bill_depth_mm flipper_length_mm body_mass_g year
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2.58 0.329 -1.42 -0.628 -1.26
# ℹ 2 more variables: mahal.dist <dbl>, is.outlier <lgl>
For sample sizes larger than 50, QQ-plots are recommended over the Shapiro-Wilk test, which becomes overly sensitive as the sample size grows (Bonarota, Farnisa, and Zauner 2021).
df_imputed %>%
  group_by(species) %>%
  shapiro_test(bill_length_mm, bill_depth_mm, flipper_length_mm) %>%
  arrange(variable)
# A tibble: 9 × 4
species variable statistic p
<fct> <chr> <dbl> <dbl>
1 Adelie bill_depth_mm 0.985 0.0926
2 Chinstrap bill_depth_mm 0.973 0.142
3 Gentoo bill_depth_mm 0.977 0.0325
4 Adelie bill_length_mm 0.994 0.733
5 Chinstrap bill_length_mm 0.975 0.194
6 Gentoo bill_length_mm 0.972 0.0117
7 Adelie flipper_length_mm 0.993 0.712
8 Chinstrap flipper_length_mm 0.989 0.811
9 Gentoo flipper_length_mm 0.963 0.00178
From the results above, at a p-value cutoff of 0.05, we could see that the data is normally distributed, except for the following:
Gentoo’s bill_depth_mm
Gentoo’s bill_length_mm
Gentoo’s flipper_length_mm
var_list <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm")
for(i in var_list){
  print(
    df_imputed %>%
      ggplot(aes(sample = get(i))) +
      stat_qq() +
      stat_qq_line() +
      labs(title = i)
  )
}



We will use mshapiro_test function to check for the overall multivariate normality assumption.
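The call itself isn't shown; a sketch of how it would look with rstatix, spelling out dplyr::select because MASS (loaded earlier) also exports a select function:

```r
# Multivariate Shapiro-Wilk test on the three measurements.
# dplyr::select is qualified explicitly since MASS masks select().
df_imputed %>%
  dplyr::select(bill_length_mm, bill_depth_mm, flipper_length_mm) %>%
  mshapiro_test()
```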
# A tibble: 1 × 2
statistic p.value
<dbl> <dbl>
1 0.972 0.00000334
The p-value is less than 0.05, so we reject the null hypothesis: there is statistical evidence that the multivariate normality assumption does not hold.
In this test, we want to ensure that the correlations between the dependent variables are not too high.
The rule of thumb is to use 0.9 as the cutoff.
df_imputed %>%
  cor_test(bill_length_mm, bill_depth_mm, flipper_length_mm) %>%
  filter(var1 != var2)
# A tibble: 6 × 8
var1 var2 cor statistic p conf.low conf.high method
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 bill_lengt… bill… -0.24 -4.53 8.27e- 6 -0.335 -0.135 Pears…
2 bill_lengt… flip… 0.66 16.1 6.78e-44 0.593 0.713 Pears…
3 bill_depth… bill… -0.24 -4.53 8.27e- 6 -0.335 -0.135 Pears…
4 bill_depth… flip… -0.59 -13.4 3.92e-33 -0.652 -0.512 Pears…
5 flipper_le… bill… 0.66 16.1 6.78e-44 0.593 0.713 Pears…
6 flipper_le… bill… -0.59 -13.4 3.92e-33 -0.652 -0.512 Pears…
Based on the results, none of the correlations exceeds 0.9, so extreme multicollinearity is not a concern. All the p-values are also below 0.05 (i.e., we reject the null hypothesis of zero correlation), so the dependent variables are genuinely correlated, which MANOVA can take advantage of.
MANOVAs are best conducted when the dependent variables are highly negatively correlated, and are also acceptable when the dependent variables are correlated at around 0.60, either positively or negatively (Bonarota, Farnisa, and Zauner 2021).
for(i in var_list){
  for(j in var_list){
    if(i != j){
      print(
        ggplot(df_imputed
               ,aes(x = get(i)
                    ,y = get(j)
                    ,color = species))+
          geom_point(size = 4)+
          geom_smooth(method = lm, se = FALSE) +
          xlab(i) +
          ylab(j)
      )
    }
  }
}






In this test, we will check whether the assumption of multivariate homogeneity of variance holds in this dataset.
This test is highly sensitive and significance for this test is determined at alpha = 0.001 (Bonarota, Farnisa, and Zauner 2021).
box_m(df_imputed[, c("bill_length_mm", "bill_depth_mm", "flipper_length_mm")], df_imputed$species)
# A tibble: 1 × 4
statistic p.value parameter method
<dbl> <dbl> <dbl> <chr>
1 60.3 0.0000000195 12 Box's M-test for Homogeneity of Co…
According to the author, with a balanced design of groups with similar n, violating the homogeneity of variance-covariance matrices assumption is not a problem, and we could continue the analysis.
However, an unbalanced design will present problems.
In this test, we will check the equality of variances between the different groups.
df_imputed %>%
  pivot_longer(c(bill_length_mm, bill_depth_mm, flipper_length_mm), names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  levene_test(value ~ species)
# A tibble: 3 × 5
variable df1 df2 statistic p
<chr> <int> <int> <dbl> <dbl>
1 bill_depth_mm 2 341 2.11 0.123
2 bill_length_mm 2 341 2.32 0.0998
3 flipper_length_mm 2 341 0.348 0.706
All the p-values are greater than 0.05, so we fail to reject the null hypothesis: there is no statistical evidence that the variances differ between the groups, so the homogeneity of variance assumption holds.
manova function from base R
The null hypothesis is that the mean vector is equal for each group.
First, we will create a MANOVA object.
Then, we will pass the MANOVA object into summary function.
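The code that creates manova_fit isn't shown above; with base R's manova() it would look like this (a sketch, assuming the df_imputed data frame from earlier):

```r
# Sketch: fit the MANOVA with base R's manova().
# cbind() bundles the three dependent variables into one response matrix.
manova_fit <- manova(
  cbind(bill_length_mm, bill_depth_mm, flipper_length_mm) ~ species,
  data = df_imputed
)
```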
Pillai test is more robust against violations of assumptions and recommended for unbalanced design (Bonarota, Farnisa, and Zauner 2021).
summary(manova_fit, test = "Pillai")
Df Pillai approx F num Df den Df Pr(>F)
species 2 1.527 365.84 6 680 < 2.2e-16 ***
Residuals 341
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on the result above, we can conclude that there is statistical evidence that at least one species is different from the rest.
To see how the groups differ on each individual dependent variable, we could use the summary.aov function, which runs a separate univariate ANOVA for each response.
summary.aov(manova_fit)
Response bill_length_mm :
Df Sum Sq Mean Sq F value Pr(>F)
species 2 243.087 121.544 414.83 < 2.2e-16 ***
Residuals 341 99.913 0.293
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Response bill_depth_mm :
Df Sum Sq Mean Sq F value Pr(>F)
species 2 233.63 116.815 364.21 < 2.2e-16 ***
Residuals 341 109.37 0.321
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Response flipper_length_mm :
Df Sum Sq Mean Sq F value Pr(>F)
species 2 267.429 133.715 603.36 < 2.2e-16 ***
Residuals 341 75.571 0.222
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the results above, we could see that the species differ from each other on each of the measurements.
We could also visualize the results using boxplots to see how the different species differ from one another.
df_imputed %>%
  pivot_longer(c(bill_length_mm, bill_depth_mm, flipper_length_mm), names_to = "variable", values_to = "value") %>%
  ggplot(aes(species, value)) +
  geom_boxplot() +
  facet_wrap(~variable)

Manova function from car package
It seems that the Manova function from the car package does not accept a formula directly, so we will need to create a fitted model first.
Then, we will pass it into the Manova function as shown below.
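The lm_fit object isn't shown above; it would be a multivariate linear model fitted with lm() (a sketch, assuming df_imputed from earlier):

```r
# Sketch: multivariate linear model to pass into car::Manova().
lm_fit <- lm(
  cbind(bill_length_mm, bill_depth_mm, flipper_length_mm) ~ species,
  data = df_imputed
)
```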
Manova(lm_fit)
Type II MANOVA Tests: Pillai test statistic
Df test stat approx F num Df den Df Pr(>F)
species 2 1.527 365.84 6 680 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To extract the results, I will pass the Manova object into summary function.
summary(Manova(lm_fit))
Type II MANOVA Tests:
Sum of squares and products for error:
bill_length_mm bill_depth_mm flipper_length_mm
bill_length_mm 99.91284 55.10102 41.91315
bill_depth_mm 55.10102 109.37067 44.26359
flipper_length_mm 41.91315 44.26359 75.57077
------------------------------------------
Term: species
Sum of squares and products for the hypothesis:
bill_length_mm bill_depth_mm flipper_length_mm
bill_length_mm 243.0872 -136.6574 183.4852
bill_depth_mm -136.6574 233.6293 -245.3376
flipper_length_mm 183.4852 -245.3376 267.4292
Multivariate Tests: species
Df test stat approx F num Df den Df Pr(>F)
Pillai 2 1.526966 365.8430 6 680 < 2.22e-16 ***
Wilks 2 0.029138 548.9816 6 678 < 2.22e-16 ***
Hotelling-Lawley 2 14.234065 801.8523 6 676 < 2.22e-16 ***
Roy 2 12.735500 1443.3567 3 340 < 2.22e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As shown in the results above, the p-value is less than 0.05, so we reject the null hypothesis.
In other words, there is statistical evidence that there are differences between the species across the different measurements.
We could extract a particular test statistic by passing its name to the test.statistic argument.
Manova(lm_fit, test.statistic = "Wilks")
Type II MANOVA Tests: Wilks test statistic
Df test stat approx F num Df den Df Pr(>F)
species 2 0.029138 548.98 6 678 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We could perform a further analysis on the dataset to understand how the different groups relate to one another.
I have made reference to this website on how to conduct such analysis.
dependent_var <- cbind(df_imputed$bill_length_mm, df_imputed$bill_depth_mm, df_imputed$flipper_length_mm)
independent_var <- df_imputed$species
lda_fit <- lda(independent_var ~ dependent_var)
lda_fit
Call:
lda(independent_var ~ dependent_var)
Prior probabilities of groups:
Adelie Chinstrap Gentoo
0.4418605 0.1976744 0.3604651
Group means:
dependent_var1 dependent_var2 dependent_var3
Adelie -0.9396198 0.6074278 -0.7810514
Chinstrap 0.9024536 0.6436504 -0.3625097
Gentoo 0.6568981 -1.0975585 1.1562134
Coefficients of linear discriminants:
LD1 LD2
dependent_var1 0.9338293 -2.0582842
dependent_var2 -1.8493811 -0.0907716
dependent_var3 1.6822012 1.4274989
Proportion of trace:
LD1 LD2
0.8947 0.1053
Once the LDA model is fitted, I will use it to generate the discriminant scores for each data point.
lda_pred <-
as.data.frame(predict(lda_fit, newdata = df_imputed))
Then, I will plot it to see how the different species are different from one another.
ggplot(lda_pred) +
geom_point(aes(x = x.LD1, y = x.LD2, color = class))

As shown in the chart above, Adelie and Chinstrap are more similar to one another, as compared to Gentoo.
This is consistent with what we observed in the statistical tests earlier.
That’s all for the day!
Thanks for reading the post until the end.
Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.
Refer to this link for the blog disclaimer.
Till next time, happy learning!

Photo by Cornelius Ventures on Unsplash