Clinical prediction models are an important tool in contemporary medical decision-making and are abundant in the medical literature. These models estimate the probability (risk) that a certain condition is present or will occur in the future by combining information from multiple variables (predictors) for an individual, e.g. predictors from patient history, physical examination or medical testing. Unfortunately, many prediction models perform much worse in practice than was anticipated during their development. A major reason for this unsatisfactory performance, and for their limited use in clinical practice, is that models are typically developed from relatively small datasets and subsequently applied in populations or settings that differ substantially from the original development population or setting, without proper validation and adaptation to the new situation. In this talk, I will discuss how we can investigate, quantify and improve the generalizability of prediction models by adopting formal strategies for evidence synthesis. I will highlight the potential advantages of undertaking a systematic review, and present statistical methods to build upon published evidence or multiple sources of individual participant data when developing or validating a prediction model.