Question 1

library(vegan)    # CCorA
library(dplyr)    # filter, arrange, mutate, case_when
library(mvabund)  # manylm

bellData <- read.csv("bellinfo.csv")
attributes <- bellData[, c(12:16)]
tones <- bellData[, c(2:11)]

bellDataCor <- CCorA(Y=tones, X=attributes, stand.X = TRUE, stand.Y = TRUE, permutations=5000)
bellDataCor$p.perm
## [1] 0.00119976
round(bellDataCor$RDA.adj.Rsq, 2)
## [1] 0.61 0.68
bellDataCor$corr.X.Cx[, c(1,2)]
##          CanAxis1   CanAxis2
## Year    0.4961876 0.81140580
## WEIGHT -0.6836890 0.12586341
## DIAM   -0.9099574 0.06642241
## HEIGHT -0.9195910 0.07178239
## THICK  -0.9120074 0.10433196
bellDataCor$corr.Y.Cy[, c(1,2)]
##               CanAxis1   CanAxis2
## HUMFreq      0.9609400  0.1383602
## HUMDev       0.1805417  0.7074803
## PrimeFreq    0.9625757  0.1297891
## PRIMEDev     0.3073124  0.7895891
## TierceFreq   0.9629003  0.1266125
## TierceDev   -0.6310998 -0.1506503
## QuintFreq    0.9610243  0.1417073
## QuintDev     0.2425238  0.4504014
## NominalFreq  0.9596619  0.1492734
## NominalDev   0.2000615  0.6469457
biplot(bellDataCor)

To start, we’ve performed a canonical correlation analysis of the bell attributes against the bell tones. The first two outputs shown are the “p.perm” and “RDA.adj.Rsq” values. The highly significant permutation-test p-value (≈ 0.0012) gives strong evidence of a relationship between the two data sets. This is backed up by the RDA adjusted R-squared values of 0.61 and 0.68, which tell us that each data set predicts the other reasonably well.

After this we can see the correlations of both data sets with the first two canonical pairs. From the first set of correlations, DIAM, HEIGHT, and THICK are the attributes most strongly correlated with canonical axis 1, while Year is largely represented in canonical axis 2. From the second set, the various frequencies of each bell are very strongly represented in canonical axis 1, while the deviations are heavily represented in canonical axis 2.

Finally, our biplot shows two very clear outliers (bells #38 and #39). These bells are much younger than the others, cast in 1954 while the rest date from 1929, 1925, and 1905, which likely explains their separation from the rest.
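As a quick sanity check (assuming the biplot labels bells by row number, so that rows 38 and 39 of `bellData` are the two outliers), we could inspect their years directly. This snippet is illustrative and not part of the original output:

```r
# Assumes rows 38 and 39 of bellData correspond to bells #38 and #39
bellData[c(38, 39), "Year"]

# Year distribution of the full set of bells, for comparison
table(bellData$Year)
```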

Question 2

bells1925 <- bellData %>%
  filter(Year == "1925") %>%
  arrange(WEIGHT) %>%
  mutate(size = case_when(WEIGHT < 35 ~ "small",
                          WEIGHT < 200 ~ "med",
                          TRUE ~ "large"))

bellsMatrix <- as.matrix(bells1925[, c(3, 5, 7, 9, 11)])
manovaResult <- manova(bellsMatrix~factor(bells1925$size))

bellsResid <- manovaResult$resid
absBellsResid <- abs(bellsResid)  # absolute residuals for the Levene-style test

bellsPerm <- manylm(absBellsResid ~ factor(bells1925$size), cor.type = "R", test = "F")
anova(bellsPerm, resamp='perm.resid')
## Analysis of Variance Table
## 
## Model: manylm(formula = absBellsResid ~ factor(bells1925$size), test = "F", 
## Model:     cor.type = "R")
## 
## Overall test for all response variables
## Test statistics:
##                        Res.Df Df.diff val(F) Pr(>F)  
## (Intercept)                30                        
## factor(bells1925$size)     28       2  10.77  0.038 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Arguments:
##  Test statistics calculated assuming unconstrained correlation response 
##  P-value calculated using 999 iterations via residual (without replacement) resampling.
boxplot(bellsResid~factor(bells1925$size))

Above, we have performed a Levene’s-style test for equal covariance (regressing the absolute residuals on group) and made box plots of the residuals for each group. The test gives a marginally significant p-value (0.038) for a difference between groups, but the box plots suggest the spreads are roughly comparable. Equal group sizes also strengthen the robustness of our MANOVA against mildly unequal covariances.

Question 3

# Pooled residual covariance matrix (n - 3 df for the three size groups)
sigma <- summary(manovaResult)$SS$Resid / (nrow(bells1925) - 3)

bellsMah <- mahalanobis(manovaResult$residuals, 0, sigma)
ks.test(bellsMah, "pchisq", 5)  # compare to chi-squared with df = 5 responses
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  bellsMah
## D = 0.24341, p-value = 0.04211
## alternative hypothesis: two-sided
qqplot(qchisq(ppoints(nrow(bells1925)), ncol(manovaResult$resid)), y = bellsMah,
       xlab = "Chi-Sq quantiles", ylab = "Bells Mahalanobis distances")
qqline(bellsMah, distribution = function(p) qchisq(p, df = ncol(manovaResult$resid)))

Here we have used the Mahalanobis distances of the residuals to assess multivariate normality by comparing them to a chi-squared distribution (df = 5, the number of responses). We have also plotted our data against the theoretical chi-squared quantiles. The Kolmogorov-Smirnov test gives a mildly significant p-value (0.042), suggesting some departure from multivariate normality, so a permutation test may be the safest choice for our MANOVA.

Question 4

manovaPerm<-manylm(bellsMatrix~factor(bells1925$size), cor.type='R', test="F")
anova(manovaPerm, resamp="perm.resid")
## Analysis of Variance Table
## 
## Model: manylm(formula = bellsMatrix ~ factor(bells1925$size), test = "F", 
## Model:     cor.type = "R")
## 
## Overall test for all response variables
## Test statistics:
##                        Res.Df Df.diff val(F) Pr(>F)   
## (Intercept)                30                         
## factor(bells1925$size)     28       2  14.13  0.006 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Arguments:
##  Test statistics calculated assuming unconstrained correlation response 
##  P-value calculated using 999 iterations via residual (without replacement) resampling.
boxplot(bellsMatrix~factor(bells1925$size))

Finally, we have performed the MANOVA using a permutation test because of the doubts about multivariate normality. The significant p-value (0.006) suggests that we do have differences between groups. We should treat this result with some caution, however, because of the mildly unequal covariance we found in Question 2. Also displayed are box plots of the original deviation variables for our three sizes. From this plot the groups appear to have very similar averages, which is further reason to question the conclusion of our MANOVA as evidence of a difference in means.
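If we wanted to probe which individual deviation variables contribute to the multivariate difference, one common exploratory follow-up is a univariate ANOVA per response on the same fit; this is a sketch and not part of the assignment output:

```r
# Univariate follow-up ANOVAs, one per deviation variable
# (exploratory only; p-values are not adjusted for multiple testing)
summary.aov(manovaResult)

# Group means of each response, to compare against the box plots
aggregate(bellsMatrix, by = list(size = bells1925$size), FUN = mean)
```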