1, 2, 3, Go!

Photo by Anna Tukhfatullina Food Photographer/Stylist on Unsplash
Recently, while doing some analysis, I found out that there are different types of ANOVA (ANalysis Of VAriance) tests, which made me curious about how they differ.
In this post, I will be exploring the different types of ANOVA.
Type I ANOVA is also known as the “sequential” sum of squares.
This is the ANOVA we typically learn in a simple regression class.
Type II ANOVA measures the marginal effect of each variable, i.e., its effect after adjusting for the other variables in the model.
This method also follows the principle of marginality.
Under the principle of marginality, the lower-order terms need to be in the model for the higher-order terms to be included.
For example, if the interaction term is found to be significant, then both main effects need to be included in the model even if a main effect is found to be insignificant.
More generally, if a higher-order term is found to be significant and we would like to keep it in the fitted model, all the related lower-order terms should be included even if some of them are found to be insignificant.
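R's formula syntax makes it easy to respect the principle of marginality. As a small sketch (the names y, a and b are placeholders for illustration, not columns from any dataset in this post), the * operator expands to both main effects plus their interaction, whereas the : operator adds the interaction alone:

```r
# `a * b` is shorthand for `a + b + a:b`, so the main effects
# are always included alongside the interaction term.
expanded_terms <- attr(terms(y ~ a * b), "term.labels")
print(expanded_terms)

# Writing only `a:b` leaves the main effects out of the model,
# which goes against the principle of marginality.
interaction_only <- attr(terms(y ~ a:b), "term.labels")
print(interaction_only)
```

Using * for interactions is therefore a simple way to avoid dropping a main effect by accident.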
In theory, the result from Type III ANOVA should be the same as Type II when there is no interaction term.
Type III ANOVA violates the principle of marginality.
The author of the car package (which provides the Anova function used later in this post) also discourages users from using Type III ANOVA unless they understand its issues. Refer to the package documentation for more info.
Nevertheless, below is the comparison of different types of ANOVA (Cross Validated 2016):

All three types of ANOVA tests would produce the same results if the data is balanced and factors are orthogonal (i.e., the independent variables are uncorrelated) (nzcoops 2011).
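To illustrate the balanced case, here is a minimal sketch on simulated data (the data frame balanced, its columns and the seed are made up for this example). With the same number of observations in every cell, the sequential sum of squares of each factor does not depend on the order in which the factors enter the model:

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Balanced design: 2 x 3 factor levels with 5 replicates per cell
balanced <- expand.grid(
  a = factor(c("a1", "a2")),
  b = factor(c("b1", "b2", "b3")),
  rep = 1:5
)
balanced$y <- rnorm(nrow(balanced))

# Fit the same additive model with the factors in both orders
fit_ab <- lm(y ~ a + b, data = balanced)
fit_ba <- lm(y ~ b + a, data = balanced)

# Sequential (Type I) sums of squares for each factor
ss_ab <- anova(fit_ab)[c("a", "b"), "Sum Sq"]
ss_ba <- anova(fit_ba)[c("a", "b"), "Sum Sq"]

print(ss_ab)
print(ss_ba)
print(all.equal(ss_ab, ss_ba))
```

In an unbalanced design the two vectors would generally differ, which is where the choice of ANOVA type starts to matter.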
In this demonstration, I will fit a logistic regression and compare the results of the different types of ANOVA tests on it.
pacman::p_load(tidyverse, janitor, car)
I will be using this travel insurance dataset I found on Kaggle for this demonstration.
df <- read_csv("https://raw.githubusercontent.com/jasperlok/my-blog/master/_posts/2021-08-31-naive-bayes/data/travel%20insurance.csv") %>%
  clean_names() %>% # clean up the column naming
  select(-c(gender, product_name, destination)) %>%
  filter(net_sales > 0
         ,duration > 0
         ,age < 100) %>%
  mutate(claim = factor(claim)
         ,rand_noise = rnorm(nrow(.), 0, 2)) # random noise column
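Note that I have also added a random noise column (rand_noise) to the dataset. Since it is generated independently of the claims, it should have no real effect on the outcome.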
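Because the noise column is drawn at random, its values (and therefore its ANOVA results) change on every run. A minimal sketch of how a seed makes the draws reproducible is shown below; the seed value 2021 is arbitrary and is not taken from the original analysis:

```r
# Setting the same seed before each draw gives identical random numbers
set.seed(2021)  # arbitrary seed, chosen only for illustration
noise_first <- rnorm(5, mean = 0, sd = 2)

set.seed(2021)
noise_second <- rnorm(5, mean = 0, sd = 2)

print(identical(noise_first, noise_second))
```

Calling set.seed() once before the data preparation step would make the rand_noise column repeatable in the same way.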
First, I build a simple model without any interaction term.
logit_fit <-
  glm(claim ~
        agency_type
      + distribution_channel
      + net_sales
      + age
      + rand_noise
      ,data = df
      ,family = "binomial")
logit_fit
Call: glm(formula = claim ~ agency_type + distribution_channel + net_sales +
age + rand_noise, family = "binomial", data = df)
Coefficients:
(Intercept) agency_typeTravel Agency
-2.45374 -1.40250
distribution_channelOnline net_sales
-0.68169 0.00701
age rand_noise
-0.01774 -0.01024
Degrees of Freedom: 59775 Total (i.e. Null); 59770 Residual
Null Deviance: 9456
Residual Deviance: 8524 AIC: 8536
First, we will perform the Type I ANOVA test, which is also known as the sequential sum of squares.
In this ANOVA test, the effect of each variable is measured in sequence.
In other words, the function first measures the effect of agency_type on its own, then measures the additional effect of distribution_channel given that agency_type is already in the model, and so on.
anova(logit_fit, test = "Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: claim
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 59775 9456.0
agency_type 1 552.42 59774 8903.5 < 2.2e-16 ***
distribution_channel 1 2.25 59773 8901.3 0.1332
net_sales 1 342.90 59772 8558.4 < 2.2e-16 ***
age 1 33.55 59771 8524.8 6.941e-09 ***
rand_noise 1 0.36 59770 8524.5 0.5459
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If we re-arrange the variables in the formula, the measured effects of the variables change.
logit_fit_different_order <-
  glm(claim ~
        rand_noise
      + distribution_channel
      + net_sales
      + age
      + agency_type
      ,data = df
      ,family = "binomial")
anova(logit_fit_different_order, test = "Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: claim
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 59775 9456.0
rand_noise 1 0.34 59774 9455.6 0.5581
distribution_channel 1 0.68 59773 9454.9 0.4105
net_sales 1 546.64 59772 8908.3 < 2.2e-16 ***
age 1 16.21 59771 8892.1 5.658e-05 ***
agency_type 1 367.62 59770 8524.5 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This is because the ANOVA result under Type I is being computed sequentially. Hence, the result will change once the order of the variables changes.
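To make the sequential logic concrete, the sketch below rebuilds a Type I table by hand. It uses simulated data (the data frame sim, the variables x1 and x2 and the seed are made up for this example) rather than the travel insurance dataset. Each Type I deviance is simply the drop in residual deviance when the next variable is added to the model:

```r
set.seed(123)  # arbitrary seed, only for reproducibility
n <- 500

# Simulated data with two correlated explanatory variables
sim <- data.frame(x1 = rnorm(n))
sim$x2 <- 0.5 * sim$x1 + rnorm(n)
sim$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * sim$x1 + 0.3 * sim$x2))

# Nested models, adding one variable at a time
fit_null <- glm(y ~ 1, data = sim, family = binomial)
fit_x1 <- glm(y ~ x1, data = sim, family = binomial)
fit_full <- glm(y ~ x1 + x2, data = sim, family = binomial)

# Drop in residual deviance at each step
manual_deviance <- c(
  x1 = deviance(fit_null) - deviance(fit_x1),
  x2 = deviance(fit_x1) - deviance(fit_full)
)

# Type I (sequential) analysis of deviance from base R
type1_table <- anova(fit_full, test = "Chisq")

print(manual_deviance)
print(type1_table$Deviance[-1])  # first entry is NA for the NULL row
```

Swapping x1 and x2 in the formula changes which nested models are compared, which is why the Type I results depend on the order of the variables.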
Unfortunately, the anova function from base R does not support Type II ANOVA.
To perform Type II ANOVA, we will use the Anova function from the car package.
Anova(logit_fit, type = 2)
Analysis of Deviance Table (Type II tests)
Response: claim
LR Chisq Df Pr(>Chisq)
agency_type 367.62 1 < 2.2e-16 ***
distribution_channel 4.34 1 0.03732 *
net_sales 339.44 1 < 2.2e-16 ***
age 33.64 1 6.632e-09 ***
rand_noise 0.36 1 0.54592
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
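As shown in the result, the ANOVA results differ between Type I and Type II. For example, distribution_channel is not significant under Type I (p = 0.1332) but is significant at the 5% level under Type II (p = 0.03732).
This is because Type II ANOVA computes the marginal effect of each variable, i.e., its effect after adjusting for all the other variables in the model, whereas Type I ANOVA computes the effects in sequential order. Note that the last variable in a Type I table is already adjusted for everything before it, which is why rand_noise has the same value (0.36) in both tables.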
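For a model without interaction terms, the marginal test of a variable can be sketched with base R alone: drop that variable from the full model and compare the deviances. The example below uses simulated data (the data frame sim, the variables x1 and x2 and the seed are assumptions made for this illustration) and the drop1 function from the stats package. For a main-effects-only model, I would expect these likelihood ratio statistics to agree with the Type II table from car, although that comparison is not run here:

```r
set.seed(123)  # arbitrary seed, only for reproducibility
n <- 500

# Simulated data with two correlated explanatory variables
sim <- data.frame(x1 = rnorm(n))
sim$x2 <- 0.5 * sim$x1 + rnorm(n)
sim$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * sim$x1 + 0.3 * sim$x2))

fit_full <- glm(y ~ x1 + x2, data = sim, family = binomial)

# Marginal effect of x1: compare the full model with the model without x1
fit_without_x1 <- update(fit_full, . ~ . - x1)
manual_lr_x1 <- deviance(fit_without_x1) - deviance(fit_full)

# drop1() performs this single-term deletion for every variable at once
single_term_tests <- drop1(fit_full, test = "Chisq")

print(manual_lr_x1)
print(single_term_tests)
```

Unlike the sequential tests, these statistics stay the same no matter how the variables are ordered in the formula.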
Lastly, I will run the Type III ANOVA test.
Anova(logit_fit, type = 3)
Analysis of Deviance Table (Type III tests)
Response: claim
LR Chisq Df Pr(>Chisq)
agency_type 367.62 1 < 2.2e-16 ***
distribution_channel 4.34 1 0.03732 *
net_sales 339.44 1 < 2.2e-16 ***
age 33.64 1 6.632e-09 ***
rand_noise 0.36 1 0.54592
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As shown above, the ANOVA results under Type II and Type III are the same. This is expected, since the model does not contain any interaction term.
Next, I will build a model with an interaction term and compare the results under different ANOVA tests.
logit_fit_interact <-
  glm(claim ~
        rand_noise
      + agency_type
      + distribution_channel
      + net_sales
      + age
      + age * distribution_channel
      ,data = df
      ,family = "binomial")
logit_fit_interact
Call: glm(formula = claim ~ rand_noise + agency_type + distribution_channel +
net_sales + age + age * distribution_channel, family = "binomial",
data = df)
Coefficients:
(Intercept) rand_noise
-2.001411 -0.010267
agency_typeTravel Agency distribution_channelOnline
-1.401896 -1.152408
net_sales age
0.007009 -0.028546
distribution_channelOnline:age
0.011289
Degrees of Freedom: 59775 Total (i.e. Null); 59769 Residual
Null Deviance: 9456
Residual Deviance: 8524 AIC: 8538
Next, I will perform different types of ANOVA tests on the fitted model.
# type I
anova(logit_fit_interact, test = "Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: claim
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 59775 9456.0
rand_noise 1 0.34 59774 9455.6 0.5581
agency_type 1 552.32 59773 8903.3 < 2.2e-16 ***
distribution_channel 1 2.25 59772 8901.0 0.1334
net_sales 1 342.93 59771 8558.1 < 2.2e-16 ***
age 1 33.64 59770 8524.5 6.632e-09 ***
distribution_channel:age 1 0.53 59769 8523.9 0.4662
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# type II
Anova(logit_fit_interact, type = 2)
Analysis of Deviance Table (Type II tests)
Response: claim
LR Chisq Df Pr(>Chisq)
rand_noise 0.37 1 0.54488
agency_type 367.21 1 < 2.2e-16 ***
distribution_channel 4.34 1 0.03732 *
net_sales 339.45 1 < 2.2e-16 ***
age 33.64 1 6.632e-09 ***
distribution_channel:age 0.53 1 0.46619
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# type III
Anova(logit_fit_interact, type = 3)
Analysis of Deviance Table (Type III tests)
Response: claim
LR Chisq Df Pr(>Chisq)
rand_noise 0.37 1 0.54488
agency_type 367.21 1 < 2e-16 ***
distribution_channel 2.44 1 0.11861
net_sales 339.45 1 < 2e-16 ***
age 3.56 1 0.05907 .
distribution_channel:age 0.53 1 0.46619
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the results above, we can see that all three ANOVA tests produce different results.
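The interaction term has the same test statistic under Type II and Type III (LR Chisq of 0.53). However, the two main effects involved in the interaction are tested differently under Type III, which tests them with the interaction term kept in the model: the LR Chisq of age drops from 33.64 to 3.56 and that of distribution_channel drops from 4.34 to 2.44, so neither is significant at the 5% level under Type III.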
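One way to see why the main effects change is to reproduce such a test by hand. The sketch below rests on my understanding that a Type III likelihood ratio test of a main effect removes only that term's column from the model matrix while keeping the interaction. It uses simulated data, and the data frame sim, the variables x and g, the seed and the helper type3_lr_for_x are all hypothetical names for this example. With the default treatment coding, the tested quantity is the slope of x in the reference level of g, so the result changes when the reference level changes:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 400

# Simulated data: x has no effect in group "a" and a strong effect in group "b"
sim <- data.frame(
  x = rnorm(n),
  g = factor(sample(c("a", "b"), n, replace = TRUE))
)
slope <- ifelse(sim$g == "a", 0, 1.5)
sim$y <- rbinom(n, 1, plogis(-0.3 + slope * sim$x))

# Hypothetical helper: LR test of the x main effect, keeping x:g in the model
type3_lr_for_x <- function(data) {
  full <- glm(y ~ x * g, data = data, family = binomial)
  design <- model.matrix(full)
  # Remove only the main-effect column of x; the interaction column stays
  design_no_x <- design[, colnames(design) != "x", drop = FALSE]
  reduced <- glm(data$y ~ design_no_x - 1, family = binomial)
  deviance(reduced) - deviance(full)
}

# Reference level "a": tests the slope of x within group "a"
lr_ref_a <- type3_lr_for_x(sim)

# Reference level "b": tests the slope of x within group "b"
sim_ref_b <- transform(sim, g = relevel(g, ref = "b"))
lr_ref_b <- type3_lr_for_x(sim_ref_b)

print(c(reference_a = lr_ref_a, reference_b = lr_ref_b))
```

The same fitted model gives two very different "main effect" tests depending on the coding, which is one reason Type III results need to be interpreted with care when an interaction is present.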
That’s all for the day!
Thanks for reading the post until the end.
Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.
Refer to this link for the blog disclaimer.
Till next time, happy learning!

Photo by Diane Helentjaris on Unsplash