Different Types of ANOVA

Machine Learning Supervised Learning

1, 2, 3, Go!

Jasper Lok https://jasperlok.netlify.app/
05-01-2024

Photo by Anna Tukhfatullina Food Photographer/Stylist on Unsplash

Recently, while doing some analysis, I found out that there are different types of ANOVA (ANalysis Of VAriance) tests, and it made me curious about how they differ.

In this post, I will be exploring the different types of ANOVA.

Different types of ANOVA

Type I ANOVA

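This is also known as the “sequential” sum of squares: each term is tested after adjusting only for the terms that come before it in the model formula.

This is the ANOVA we usually learn in an introductory regression class.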
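To make the sequential idea concrete, here is a small sketch for a model with two factors A and B and their interaction, entered in that order. The notation SS(X | Y), meaning the sum of squares explained by X once Y is already in the model, is shorthand I am assuming for illustration.

```latex
\begin{aligned}
\text{A:}   &\quad SS(A) \\
\text{B:}   &\quad SS(B \mid A) \\
\text{A:B:} &\quad SS(AB \mid A, B)
\end{aligned}
```

Swapping the order of A and B would give SS(B) and SS(A | B) instead, which is why the order of the terms matters.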

Type II ANOVA

Type II measures the marginal effect of each variable, i.e., its effect after adjusting for all the other terms that do not contain it.

This method also follows the principle of marginality.

What is the principle of marginality?

Under the principle of marginality, the lower-order terms of a variable need to be in the model for the higher-order terms to be included.

For example, if the interaction term is found to be significant, then both main effects need to be included in the model even if a main effect is itself found to be insignificant.

Another example: if a higher-order term is found to be significant and we would like to keep it in the fitted model, all the lower-order terms should be included as well, even if some of them are found to be insignificant.
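R's formula syntax makes it easy to respect this principle. As a minimal sketch (y, a and b are made-up variable names, not columns from any dataset in this post), the * operator expands into both main effects plus the interaction:

```r
# Hypothetical variable names; no data is needed to inspect a formula
f <- y ~ a * b

# terms() expands the formula; "term.labels" lists the terms in the model
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# Both main effects are present alongside the interaction
has_main_effects <- all(c("a", "b") %in% term_labels)
print(has_main_effects)
```

Writing y ~ a:b instead would keep only the interaction term, which is the kind of model the principle of marginality advises against.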

Type III ANOVA

In theory, the result from Type III ANOVA should be the same as Type II when there is no interaction term.

Type III ANOVA violates the principle of marginality.
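A compact way to see the difference is to look at how the main effect of a factor A is tested in a model with factors A, B and their interaction AB. The notation SS(X | Y), meaning the sum of squares explained by X once Y is already in the model, is shorthand I am assuming for illustration.

```latex
\begin{aligned}
\text{Type II:}  &\quad SS(A \mid B) \\
\text{Type III:} &\quad SS(A \mid B, AB)
\end{aligned}
```

Type III adjusts the main effect of A for the interaction AB that contains it, which is why it conflicts with the principle of marginality.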

The authors of the car package, which I use in the demonstration below, also discourage users from using Type III ANOVA without understanding its issues. Refer to the documentation of the Anova function for more info.

Nevertheless, a comparison of the different types of ANOVA can be found on Cross Validated (Cross Validated 2016).

All three types of ANOVA tests would produce the same results if the data is balanced and factors are orthogonal (i.e., the independent variables are uncorrelated) (nzcoops 2011).
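The balanced case can be checked with a small simulation. The sketch below uses made-up data (two factors a and b with four observations in every cell) and only base R, and it checks one consequence of the statement above: the sequential sum of squares for a no longer depends on whether a is entered first or last.

```r
set.seed(42)

# Simulated balanced design: every a-by-b cell has exactly 4 observations
balanced <- expand.grid(a = factor(1:2), b = factor(1:3), rep = 1:4)
balanced$y <- rnorm(nrow(balanced))

# Type I (sequential) sum of squares for `a`, entered first vs entered last
ss_a_first <- anova(lm(y ~ a + b, data = balanced))["a", "Sum Sq"]
ss_a_last  <- anova(lm(y ~ b + a, data = balanced))["a", "Sum Sq"]

print(c(first = ss_a_first, last = ss_a_last))
print(isTRUE(all.equal(ss_a_first, ss_a_last)))
```

Removing a few rows from the data frame to unbalance the design should make the two numbers drift apart.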

Demonstration

In this demonstration, I will fit a binary logistic regression and run the different types of ANOVA tests on the fitted model.

pacman::p_load(tidyverse, janitor, car)

Import Data

I will be using this travel insurance dataset I found on Kaggle for this demonstration.

df <- read_csv("https://raw.githubusercontent.com/jasperlok/my-blog/master/_posts/2021-08-31-naive-bayes/data/travel%20insurance.csv") %>%
  clean_names() %>%  # clean up the column naming
  select(-c(gender, product_name, destination)) %>%
  filter(net_sales > 0
         ,duration > 0
         ,age < 100) %>% 
  mutate(claim = factor(claim)
         ,rand_noise = rnorm(nrow(.), 0, 2))

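I also add a column of random noise (rand_noise) as one of the variables. It is unrelated to the claim outcome, so it should not come out as significant.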
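One caveat: rnorm() draws new values on every run and the code above does not set a seed, so the rand_noise figures will differ slightly if you rerun it. A minimal sketch of how a seed could pin the noise down (the seed value is arbitrary and was not used for the outputs in this post):

```r
# Fixing the seed makes the simulated noise identical on every run
set.seed(2024)  # arbitrary seed, chosen only for illustration
noise_a <- rnorm(5, mean = 0, sd = 2)

set.seed(2024)
noise_b <- rnorm(5, mean = 0, sd = 2)

print(identical(noise_a, noise_b))
```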

Model Building

First, I build a simple model without any interaction term.

logit_fit <-
  glm(claim ~ 
        agency_type
      + distribution_channel
      + net_sales
      + age
      + rand_noise
      ,data = df
      ,family = "binomial")

logit_fit

Call:  glm(formula = claim ~ agency_type + distribution_channel + net_sales + 
    age + rand_noise, family = "binomial", data = df)

Coefficients:
               (Intercept)    agency_typeTravel Agency  
                  -2.45374                    -1.40250  
distribution_channelOnline                   net_sales  
                  -0.68169                     0.00701  
                       age                  rand_noise  
                  -0.01774                    -0.01024  

Degrees of Freedom: 59775 Total (i.e. Null);  59770 Residual
Null Deviance:      9456 
Residual Deviance: 8524     AIC: 8536

Type I ANOVA

First, we will perform the Type I ANOVA test, also known as the sequential test. Because this is a GLM, the table reports the sequential reduction in deviance rather than a sum of squares.

In this ANOVA test, the effect is measured in sequence.

In other words, the function first measures the effect of agency_type on its own, then the additional effect of distribution_channel once agency_type is already in the model, and so on.

anova(logit_fit, test = "Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: claim

Terms added sequentially (first to last)

                     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                                 59775     9456.0              
agency_type           1   552.42     59774     8903.5 < 2.2e-16 ***
distribution_channel  1     2.25     59773     8901.3    0.1332    
net_sales             1   342.90     59772     8558.4 < 2.2e-16 ***
age                   1    33.55     59771     8524.8 6.941e-09 ***
rand_noise            1     0.36     59770     8524.5    0.5459    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If we re-arrange the variables, the measured effects of the variables change.

logit_fit_different_order <-
  glm(claim ~
        rand_noise
      + distribution_channel
      + net_sales
      + age
      + agency_type
      ,data = df
      ,family = "binomial")

anova(logit_fit_different_order, test = "Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: claim

Terms added sequentially (first to last)

                     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                                 59775     9456.0              
rand_noise            1     0.34     59774     9455.6    0.5581    
distribution_channel  1     0.68     59773     9454.9    0.4105    
net_sales             1   546.64     59772     8908.3 < 2.2e-16 ***
age                   1    16.21     59771     8892.1 5.658e-05 ***
agency_type           1   367.62     59770     8524.5 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This is because the Type I result is computed sequentially, so it changes whenever the order of the variables changes. For example, agency_type accounts for a deviance of 552.42 when it is entered first, but only 367.62 when it is entered last.
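To see what "sequentially" means in practice, the sketch below rebuilds the Deviance column by hand. It uses simulated data with made-up variables x1 and x2 rather than the travel insurance data, and it assumes each row is simply the drop in residual deviance between two nested fits.

```r
set.seed(1)

# Simulated data with two correlated predictors (not the travel insurance data)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + x1 + 0.5 * x2))
sim <- data.frame(y, x1, x2)

# Nested models, adding one term at a time
fit_null <- glm(y ~ 1,       data = sim, family = "binomial")
fit_x1   <- glm(y ~ x1,      data = sim, family = "binomial")
fit_both <- glm(y ~ x1 + x2, data = sim, family = "binomial")

# Drop in residual deviance at each step
manual <- c(x1 = deviance(fit_null) - deviance(fit_x1),
            x2 = deviance(fit_x1)   - deviance(fit_both))

# Deviance column from the sequential table (first row is NULL, so skip it)
from_anova <- anova(fit_both, test = "Chisq")$Deviance[-1]

print(manual)
print(from_anova)
```

The two printed vectors should agree, which is also why reordering the formula changes the table: the chain of nested models is different.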

Type II ANOVA

Unfortunately, the anova function from base R does not support Type II ANOVA.

To perform Type II ANOVA, we will use the Anova function from the car package.

Anova(logit_fit, type = 2)
Analysis of Deviance Table (Type II tests)

Response: claim
                     LR Chisq Df Pr(>Chisq)    
agency_type            367.62  1  < 2.2e-16 ***
distribution_channel     4.34  1    0.03732 *  
net_sales              339.44  1  < 2.2e-16 ***
age                     33.64  1  6.632e-09 ***
rand_noise               0.36  1    0.54592    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

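As shown in the result, the ANOVA results differ between Type I and Type II.

This is because Type II computes the marginal effect of each variable, whereas Type I computes the effects in sequential order. Only rand_noise, the last variable in the formula, gets the same result (0.36) under both, and the Type II result for agency_type (367.62) matches its Type I result when it was entered last.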
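For a model with main effects only, I believe base R's drop1() gives the same kind of marginal likelihood-ratio test, which offers a cross-check without the car package. The sketch below uses simulated data with made-up variables x1 and x2; it is not how car computes its table internally.

```r
set.seed(1)

# Simulated data (not the travel insurance data)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + x1 + 0.5 * x2))
sim <- data.frame(y, x1, x2)

fit <- glm(y ~ x1 + x2, data = sim, family = "binomial")

# drop1() removes each term in turn from the full model and runs an LR test
marginal <- drop1(fit, test = "Chisq")
print(marginal)

# The test for x1 compares the full model against the model without x1,
# regardless of where x1 sits in the formula
lr_x1 <- deviance(glm(y ~ x2, data = sim, family = "binomial")) - deviance(fit)
print(lr_x1)
```

Because every term is compared against the same full model, rewriting the formula as y ~ x2 + x1 should leave these tests unchanged.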

Type III ANOVA

Lastly, I will run the Type III ANOVA test.

Anova(logit_fit, type = 3)
Analysis of Deviance Table (Type III tests)

Response: claim
                     LR Chisq Df Pr(>Chisq)    
agency_type            367.62  1  < 2.2e-16 ***
distribution_channel     4.34  1    0.03732 *  
net_sales              339.44  1  < 2.2e-16 ***
age                     33.64  1  6.632e-09 ***
rand_noise               0.36  1    0.54592    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA results under Type II and Type III are the same, as expected for a model without an interaction term.

What if there is an interaction term?

Next, I will build a model with an interaction term and compare the results under different ANOVA tests.

logit_fit_interact <-
  glm(claim ~ 
        rand_noise
      + agency_type
      + distribution_channel
      + net_sales
      + age
      + age * distribution_channel
      ,data = df
      ,family = "binomial")

logit_fit_interact

Call:  glm(formula = claim ~ rand_noise + agency_type + distribution_channel + 
    net_sales + age + age * distribution_channel, family = "binomial", 
    data = df)

Coefficients:
                   (Intercept)                      rand_noise  
                     -2.001411                       -0.010267  
      agency_typeTravel Agency      distribution_channelOnline  
                     -1.401896                       -1.152408  
                     net_sales                             age  
                      0.007009                       -0.028546  
distribution_channelOnline:age  
                      0.011289  

Degrees of Freedom: 59775 Total (i.e. Null);  59769 Residual
Null Deviance:      9456 
Residual Deviance: 8524     AIC: 8538

Next, I will perform different types of ANOVA tests on the fitted model.

# type I
anova(logit_fit_interact, test = "Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: claim

Terms added sequentially (first to last)

                         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                                     59775     9456.0              
rand_noise                1     0.34     59774     9455.6    0.5581    
agency_type               1   552.32     59773     8903.3 < 2.2e-16 ***
distribution_channel      1     2.25     59772     8901.0    0.1334    
net_sales                 1   342.93     59771     8558.1 < 2.2e-16 ***
age                       1    33.64     59770     8524.5 6.632e-09 ***
distribution_channel:age  1     0.53     59769     8523.9    0.4662    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# type II
Anova(logit_fit_interact, type = 2)
Analysis of Deviance Table (Type II tests)

Response: claim
                         LR Chisq Df Pr(>Chisq)    
rand_noise                   0.37  1    0.54488    
agency_type                367.21  1  < 2.2e-16 ***
distribution_channel         4.34  1    0.03732 *  
net_sales                  339.45  1  < 2.2e-16 ***
age                         33.64  1  6.632e-09 ***
distribution_channel:age     0.53  1    0.46619    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# type III
Anova(logit_fit_interact, type = 3)
Analysis of Deviance Table (Type III tests)

Response: claim
                         LR Chisq Df Pr(>Chisq)    
rand_noise                   0.37  1    0.54488    
agency_type                367.21  1    < 2e-16 ***
distribution_channel         2.44  1    0.11861    
net_sales                  339.45  1    < 2e-16 ***
age                          3.56  1    0.05907 .  
distribution_channel:age     0.53  1    0.46619    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

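From the results above, we can see that the three ANOVA tests produce different results.

The interaction term has the same LR chi-square (0.53) under Type II and Type III. However, the main effects involved in the interaction are adjusted under Type III: distribution_channel drops from 4.34 to 2.44 and age from 33.64 to 3.56, because Type III tests each main effect with the interaction term still in the model.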
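The two behaviours can be contrasted in base R with drop1(). This is a sketch on simulated data (x1 and g are made-up variables), and it only mirrors the idea of the two test types rather than reproducing what car does: by default drop1() respects marginality, while scope = ~ . forces it to drop main effects even though the interaction stays in the model.

```r
set.seed(1)

# Simulated data: a numeric predictor, a two-level factor, a binary response
n  <- 500
x1 <- rnorm(n)
g  <- factor(sample(c("a", "b"), n, replace = TRUE))
y  <- rbinom(n, 1, plogis(-0.5 + x1 + 0.5 * (g == "b")))
sim <- data.frame(y, x1, g)

fit_int <- glm(y ~ x1 * g, data = sim, family = "binomial")

# Default scope respects marginality: only the interaction can be dropped
respecting <- drop1(fit_int, test = "Chisq")
print(rownames(respecting))

# scope = ~ . forces every term to be dropped in turn, so each main effect
# is tested while the interaction stays in the model
ignoring <- drop1(fit_int, scope = ~ ., test = "Chisq")
print(rownames(ignoring))
```

The main-effect rows in the second table depend on how the factor is coded (its contrasts), which is one of the reasons Type III results need careful interpretation.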

Conclusion

That’s all for the day!

Thanks for reading the post until the end.

Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.

Refer to this link for the blog disclaimer.

Till next time, happy learning!

Photo by Diane Helentjaris on Unsplash

References

Cross Validated. 2016. “How to Interpret Type I, Type II, and Type III ANOVA and MANOVA?” https://stats.stackexchange.com/questions/20452/how-to-interpret-type-i-type-ii-and-type-iii-anova-and-manova.
nzcoops. 2011. “Anova - Type I/II/III SS Explained.” https://www.r-bloggers.com/2011/03/anova-%E2%80%93-type-iiiiii-ss-explained/.