Automated home cage experiments have been proposed as an alternative to the classical tests used for behavioural phenotyping. As the name implies, automated home cage experiments are conducted in home cage environments and the behaviour is recorded automatically. The experiments can thus be conducted without human interference and can last for several days.

All data incorporated in this thesis was collected using a PhenoTyper® system (Noldus Information Technology, Wageningen, The Netherlands). The PhenoTyper is a home cage environment with an integrated top-view camera. The exact location of the rat or mouse is determined for every frame in the video. Behavioural response variables such as Distance Moved or Duration Progressing are extracted from the location.
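
As an illustration of how a location-based response variable can be derived from per-frame positions, the sketch below computes a total distance moved as the summed Euclidean step length between consecutive frames. The function name, the example track, and the units are assumptions made for this sketch; the tracking software's own definition may differ, for instance through smoothing or movement thresholds.

```python
import numpy as np

def distance_moved(x, y):
    """Total path length from per-frame (x, y) positions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Euclidean step length between consecutive frames, summed over the track.
    return float(np.sum(np.hypot(np.diff(x), np.diff(y))))

# Hypothetical track: positions (in cm) for five consecutive frames.
x = [0.0, 3.0, 3.0, 6.0, 6.0]
y = [0.0, 4.0, 4.0, 8.0, 8.0]
total = distance_moved(x, y)
```

Frames in which the animal does not move contribute a step length of zero, so the total depends only on the displacement between frames.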

Data from automated home cage experiments typically consists of multiple response variables that can be highly correlated. In addition to the location-based activity response variables, automated home cage environments have the potential to incorporate data from other sources such as biometric parameters.

The aim of this thesis is to expand the methodology available to analyse these data.

In Chapter 2, the use of multivariate statistics for data from automated home cage experiments is demonstrated in two case studies. Data from automated home cage experiments is predominantly analysed using univariate statistics, in which the significance and magnitude of the effect of a treatment on a single response variable is tested. Analysing single response variables does not fully exploit the ability of automated home cage experiments to collect numerous response variables simultaneously. The use of multivariate statistics allows for simultaneous analysis of multiple response variables. The multivariate methods described in Chapter 2 are Redundancy Analysis (RDA) and Principal Response Curves (PRC). Both of these methods are frequently used in (aquatic) ecology, toxicology, and microbiology. RDA is a constrained form of Principal Components Analysis (PCA). RDA describes the underlying structure of a data set in terms of the explanatory variables (such as experimental treatment). It quantifies the proportion of variance in the data set that can be described using these explanatory variables. PRC is a special case of RDA used to describe experimental multivariate longitudinal data. It estimates differences among treatments on a collection of response variables over time and the extent to which the response of each individual response variable resembles the overall response. In both case studies, the multivariate analyses led to the same main conclusions as the univariate analyses they were compared with. The advantages of using a multivariate analysis rather than a univariate analysis on a single response variable are that the multivariate methods provide a graphical representation of the data set, are easy to interpret, and allow for estimation of the relation between response variables.
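
The description of RDA as a constrained form of PCA can be made concrete with a minimal sketch: the response variables are first regressed on the explanatory variables, and a PCA is then applied to the fitted values. The simulated data, the effect sizes, and all variable names below are assumptions made for illustration; this is not the analysis code used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 40 animals, 2 treatment groups, 5 response variables.
n, p = 40, 5
treatment = np.repeat([0.0, 1.0], n // 2)
effect = np.array([1.5, 1.0, 0.0, 0.0, -1.0])  # assumed effect per response variable
Y = np.outer(treatment, effect) + rng.normal(size=(n, p))

# Centre the responses and the explanatory variable.
Yc = Y - Y.mean(axis=0)
X = (treatment - treatment.mean()).reshape(-1, 1)

# Step 1: multivariate linear regression of Y on X gives the fitted values.
B, *_ = np.linalg.lstsq(X, Yc, rcond=None)
Y_fit = X @ B

# Step 2: PCA (via SVD) of the fitted values gives the constrained axes.
U, s, Vt = np.linalg.svd(Y_fit, full_matrices=False)

# Proportion of total variance described by the explanatory variable.
explained = np.sum(Y_fit**2) / np.sum(Yc**2)
sample_scores = U[:, 0] * s[0]  # animal scores on the first constrained axis
loadings = Vt[0]                # weights of the response variables on that axis
```

With a single explanatory variable there is only one constrained axis, so all singular values after the first are numerically zero. The loadings indicate how strongly each response variable follows the overall treatment response.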

In Chapter 3, a novel extension to PRC is presented that allows for response variable selection using permutation testing. Often, not all of the response variables included in PRC are affected by the treatment, which can make response variable selection desirable. One approach is to use a straightforward cut-off value for coefficient size. Because the coefficient size of a response variable is affected by more factors than effect size alone, the results of this approach can vary between data sets. A backward selection approach was expected to give a more robust result. Four backward selection approaches based on permutation testing are presented. The approaches differ in whether or not coefficient size is used to rank the response variables to test. The performance of these approaches was demonstrated in a simulation study using a well-known data set from the field of aquatic ecology. The permutation testing approach that uses information on the coefficient size of the response variables sped up the algorithm without affecting its performance. This most successful permutation testing approach removed roughly 95% of the response variables that are unaffected by the treatment, irrespective of the characteristics of the data set (a desirable property of a statistical test), and, in the simulations, correctly identified up to 97% of the response variables affected by the treatment.
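
The principle behind permutation testing can be sketched for a single response variable: the treatment labels are shuffled many times, and the observed test statistic is compared with its distribution under shuffling. This sketch is not the backward selection procedure of Chapter 3, which operates within PRC. The function name, the test statistic (absolute difference in group means), and the simulated data are assumptions chosen for illustration.

```python
import numpy as np

def permutation_p_value(y, group, n_perm=999, seed=0):
    """Permutation p-value for a difference in means between two groups."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)

    def stat(g):
        return abs(y[g == 1].mean() - y[g == 0].mean())

    observed = stat(group)
    # Count permutations whose statistic is at least as extreme as observed.
    exceed = sum(stat(rng.permutation(group)) >= observed for _ in range(n_perm))
    # Include the observed labelling itself, so the p-value is never zero.
    return (exceed + 1) / (n_perm + 1)

# Hypothetical data: one affected and one unaffected response variable.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 20)
affected = rng.normal(size=40) + 3.0 * group
unaffected = rng.normal(size=40)

p_affected = permutation_p_value(affected, group)
p_unaffected = permutation_p_value(unaffected, group)
```

At a 5% significance level, a valid test of this kind flags an unaffected response variable in about 5% of cases. This is in line with the roughly 95% removal rate for unaffected response variables reported above.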

In Chapter 4, a case study is used to illustrate the power of combining mechanistic and statistical modelling, and the benefits of simulation studies. The case study is an integrated analysis of two streams of information: activity response variables per rat and Ultrasonic Vocalisations (USVs) per cage (containing a pair of rats). USVs are crucial in the social behaviour of rats. The aim of the first part of the chapter was to develop methodology to predict the USV-rate of the pair of rats as a function of the activity of the individuals. One mechanistic model is that the USV-rate of the pair of rats is the sum of the USV-rates of the two individuals, each depending on that individual’s own behaviour (“sum-of-rates” model). It turns out that this “sum-of-rates” model can be fitted to data using a Composite Link Model (CLM) approach. In a generalized linear model (GLM), by contrast, the individuals’ USV-rates are multiplied rather than summed. A simulation study verified that CLM gave a better fit (lower Poisson deviance) than GLM. The second part of the chapter analyses data from an experiment in which half of the cages allowed the rats of the pair to interact (Pair Housing) and the other half did not (Individual Housing). A number of models were fitted to investigate whether there is evidence that interaction between rats affects their behaviour. The “sum-of-rates” model fitted best for Individual Housing and the GLM for Pair Housing. This difference in fit supports the hypothesis that interaction between rats affects their behaviour. An additional simulation study strongly suggested that this difference was not due to chance and that the underlying mechanism that links activity and USVs differed structurally between Pair Housing and Individual Housing.
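
The contrast between the two model structures can be written out explicitly. The notation below is an assumption made for illustration and is not taken from the thesis: it supposes log-linear individual rates with shared coefficients and a GLM with a log link, the usual choice for count data. The Poisson deviance used to compare fits is given in its standard form.

```latex
% Assumed notation (not from the thesis): x_1, x_2 are the activity covariates
% of the two rats in a cage, \beta the coefficients, y_i the observed USV
% counts and \hat\mu_i the fitted means.
\begin{align*}
\text{sum-of-rates:}\quad
  \mu_{\text{pair}} &= \exp(x_1^\top \beta) + \exp(x_2^\top \beta) \\
\text{log-link GLM:}\quad
  \mu_{\text{pair}} &= \exp(x_1^\top \beta + x_2^\top \beta)
                     = \exp(x_1^\top \beta)\,\exp(x_2^\top \beta) \\
\text{Poisson deviance:}\quad
  D &= 2 \sum_i \left[ y_i \log\frac{y_i}{\hat\mu_i} - (y_i - \hat\mu_i) \right]
\end{align*}
```

Under the log link, adding the two rats' linear predictors multiplies their rates, which is why the additive mechanistic model falls outside the standard GLM framework. By convention, the term $y_i \log(y_i/\hat\mu_i)$ is taken as zero when $y_i = 0$.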

In Chapter 5, a simulation study is described that evaluates the performance of a new and promising statistical learning method under circumstances relevant for automated home cage experiments. Targeted Maximum Likelihood Estimation (TMLE) is a statistical method for causal effect estimation, even in observational studies, that can use machine learning methods to increase performance. The intended role of TMLE in the analysis of home cage experiments was to account for inter-individual variation in behaviour when testing specific treatment effects. TMLE is a doubly robust method, which means that it is robust to misspecification of either the treatment outcome model or the treatment assignment model. A treatment outcome model predicts the effect of a treatment on the response variable given the covariates. A treatment assignment model predicts the probability that an individual is in a treatment group given the covariates. In theory, when all assumptions are met, TMLE should thus provide unbiased causal effect estimators even when either the treatment outcome model or the treatment assignment model is misspecified. When TMLE is applied in practice, however, required theoretical assumptions such as positivity and the absence of unobserved confounders may be violated. The simulation study in Chapter 5 illustrates the effects of unobserved (non-)confounding covariates and noise covariates on the bias, mean square error, and coverage of TMLE on near-balanced data sets (with low risk of positivity violations) and unbalanced data sets (with higher risk of positivity violations). The conclusion was that TMLE is able to estimate average causal effects with low bias and mean square error, compared to the gold standard of linear regression, provided that the sample size is large, the data set is near-balanced, and the assignment model is specified correctly. In unbalanced data sets TMLE did not live up to expectations, even in data sets in which the positivity assumption was not violated. The conclusion from the simulation study is that TMLE is as yet not suited for the intended use in home cage experiments.
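
The idea of double robustness can be illustrated with a simpler doubly robust estimator, Augmented Inverse Probability Weighting (AIPW). Note that this sketch does not implement TMLE and is not the simulation design of Chapter 5. It only shows that a correct treatment assignment model can compensate for a misspecified treatment outcome model. The data-generating process, effect size, and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
true_effect = 2.0

# Hypothetical data: one covariate w drives both assignment and outcome.
w = rng.normal(size=n)
g = 1.0 / (1.0 + np.exp(-w))   # true assignment probability P(A = 1 | w)
a = rng.binomial(1, g)
y = true_effect * a + 3.0 * w + rng.normal(size=n)

# Deliberately misspecified outcome model: ignores w, uses group means only.
q1 = np.full(n, y[a == 1].mean())
q0 = np.full(n, y[a == 0].mean())

# Plug-in estimate from the misspecified outcome model (confounded by w).
naive = q1.mean() - q0.mean()

# AIPW: outcome-model prediction plus inverse-probability-weighted residuals,
# here using the correct assignment probabilities g.
aipw = np.mean(
    q1 - q0
    + a * (y - q1) / g
    - (1 - a) * (y - q0) / (1 - g)
)
```

In this setting the naive difference in group means is biased upwards, because animals with high `w` are both more likely to be treated and have higher outcomes. The AIPW estimate should lie close to the assumed true effect. If the assignment probabilities approach 0 or 1, the weights `1 / g` and `1 / (1 - g)` become large, which is one way in which near-violations of positivity can destabilise such estimators.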

In Chapter 6, the General Discussion, the main findings of the thesis are summarised and discussed in relation to the aim of the thesis. In addition, several hot topics in biostatistics for automated home cage experiments are discussed.