Multiple imputation is widely used to handle missing data, but standard implementations assume that observations are independent. Recent developments enable imputation of multilevel (clustered) data, such as data from multi-centre studies and individual participant data meta-analyses. This course describes the difficulties in handling missing values in such data, notably the challenge of systematically missing data (where a variable is missing for all individuals in a cluster) and the importance of respecting the hierarchical structure of the data. We will give some theoretical background and show how the imputation model must be tailored to the intended form of analysis. We will then describe the two main families of imputation methods for multilevel data that are available in statistical software packages, joint modelling and chained equations (fully conditional specification), and summarise their strengths and weaknesses. The course will end with a practical session in which participants may apply the methods in R to data that we provide, take part in further discussion, or both.
By the end of the course, participants should understand the difficulties of multiply imputing multilevel data, know the strengths and weaknesses of the two main families of imputation methods, and be able to apply these methods to their own data.