Risk factor analysis
To determine if each risk factor was associated with increased risk of a first fall, risk factors were assessed using bivariate logistic regression models stratified by place of residence (as risk factors may differ for those living in the community and those in residential aged care).
Risk factors found to have a significant association with an increased risk of falls were also assessed and reported in multivariable logistic regression models, again stratified by place of residence. This regression produces adjusted odds ratios to determine the association of each risk factor with falls while accounting for other influencing risk factors which may have confounding effects.
Further analysis was undertaken to determine if there were particular groups of risk factors which resulted in severe outcomes following a first hospitalised fall.
Logistic regression
All logistic regression was performed in SAS Enterprise Guide 8.3. The study population for this analysis was the total cohort outlined in Defining dementia and fall cohort – those with a dementia record but no fall record prior to 2019. The modelling outcome was a first hospitalised fall in 2019.
Bivariate logistic regression analysis
All risk factors identified from the literature were converted to binary (present/absent) variables and first assessed using bivariate logistic regression for a statistically significant association (that is, p < 0.05) with an increased likelihood of experiencing a fall in the study cohort. This analysis was run separately for individuals in residential aged care and in the community, with risk factors associated with a significantly increased likelihood of falls retained for further analysis.
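The screening step can be illustrated for a single binary risk factor. The sketch below is written in Python rather than the SAS used for the report, and all counts and names are hypothetical. It relies on the fact that, with one binary predictor, the odds ratio from a bivariate logistic regression equals the cross-product ratio of the 2×2 table, and the Wald standard error of the log odds ratio is the square root of the summed reciprocal cell counts.

```python
import math

def wald_screen(a, b, c, d, alpha=0.05):
    """Screen one binary risk factor from a 2x2 table (hypothetical helper).

    a = factor present, fell      b = factor present, no fall
    c = factor absent, fell       d = factor absent, no fall
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Retain only factors significantly associated with INCREASED odds.
    retain = p_value < alpha and log_or > 0
    return {"odds_ratio": math.exp(log_or), "p_value": p_value, "retain": retain}

# Made-up counts for two illustrative factors (not data from the report).
factor_x = wald_screen(60, 440, 90, 1410)
factor_y = wald_screen(30, 470, 90, 1410)
print(factor_x)
print(factor_y)
```

In this made-up example `factor_x` would be retained and `factor_y` would not, because its odds ratio is exactly 1.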
Stratified multivariable logistic regression
Risk factors found to be significantly associated with falls in the bivariate logistic regression were analysed using multivariable logistic regression models, which examined the association between each risk factor and fall likelihood whilst controlling for other risk factors, as well as for age. These regression models were stratified by sex and by place of residence at reference date (residential aged care or community).
To assess the association between risk factors and the likelihood of a person experiencing their first hospitalised fall, multivariable logistic regression (also known as multiple logistic regression) was used. This was done separately for people living in the community and people living in residential aged care. Only risk factors found to be statistically significant in their association with experiencing a first hospitalised fall were retained for the regression models presented in this report.
A multivariable logistic regression model assesses the change in likelihood of an event occurring – in this case, the first hospitalised fall of an individual – due to a risk factor, whilst controlling for the confounding effect of other risk factors included in the model. The results for logistic regression analysis are generally presented as odds ratios.
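In general form, such a model can be written as follows. The notation is illustrative and not taken from the report: p_i is the probability that person i has a first hospitalised fall, x_ij are the covariates in the model (the binary risk factor indicators, plus age), and the β terms are the fitted coefficients. The adjusted odds ratio for covariate j is the exponential of its coefficient.

```latex
\operatorname{logit}(p_i) = \ln\!\left(\frac{p_i}{1 - p_i}\right)
  = \beta_0 + \sum_{j=1}^{J} \beta_j x_{ij},
\qquad
\mathrm{OR}_j = e^{\beta_j}
```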
The risk factors that were found to have a significant association with falls were used to create risk factor clusters. Since fall-associated risk factors were identified separately for people living in the community and those in residential aged care, cluster analysis was also conducted separately by place of residence. Cluster analysis groups people together who have similar sets of risk factors. Each cluster has a unique risk factor profile that can be used to understand which groups may be at higher risk for falls, or at higher risk for specific outcomes following a fall hospitalisation. The characteristics of falls and outcomes following a fall hospitalisation are presented by clusters and can be used to inform targeted intervention strategies to prevent falls and severe outcomes.
‘Odds’ is the numerical expression for the likelihood of an event occurring. The odds of an event occurring are defined as the ratio of the probability that the event will occur over the probability that the event will not occur.
Here, the odds ratio (OR) compares the odds of experiencing a first hospitalised fall among people who have a particular risk factor with the odds among people who do not have that risk factor.
- An odds ratio of 1 means that the presence of the risk factor does not affect the odds of experiencing a first hospitalised fall.
- An odds ratio of greater than 1 means that the presence of the risk factor is associated with higher odds (or higher likelihood) of experiencing a first hospitalised fall.
- An odds ratio of less than 1 means that the presence of the risk factor is associated with lower odds (or lower likelihood) of experiencing a first hospitalised fall.
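The definitions above can be sketched in a few lines of Python. The probabilities are hypothetical and the function names are for illustration only.

```python
def odds(probability):
    """Odds = P(event occurs) / P(event does not occur)."""
    return probability / (1.0 - probability)

def odds_ratio(p_with_factor, p_without_factor):
    """Odds of the event with the risk factor, relative to without it."""
    return odds(p_with_factor) / odds(p_without_factor)

def interpret(or_value, tolerance=1e-9):
    """Map an odds ratio onto the three cases listed above."""
    if abs(or_value - 1.0) <= tolerance:
        return "no association"
    return "higher odds" if or_value > 1.0 else "lower odds"

# Hypothetical: 20% fall with the risk factor, 10% fall without it.
example_or = odds_ratio(0.20, 0.10)
print(round(example_or, 2), interpret(example_or))
```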
Both unadjusted and adjusted odds ratios were included in analysis but only the adjusted odds ratios are presented in the report. Unadjusted odds ratios represent the magnitude of the association between the risk factor and a first hospitalised fall in isolation (produced by the bivariate regression model), while the adjusted odds ratios represent the magnitude of the association between the risk factor and a first hospitalised fall while controlling for all other risk factors (produced by the multivariable logistic regressions).
It is important to note that the odds ratio is not the same as the relative risk. The relative risk compares the probability of an event occurring in a given group of individuals with the probability of it occurring in the reference group, whereas the odds ratio compares the odds of the event in one group with the odds in another.
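A small numerical sketch (Python, with made-up probabilities) shows how the two measures diverge. When the event is rare the odds ratio is close to the relative risk, but when the event is common the odds ratio is further from 1 than the relative risk.

```python
def relative_risk(p_group, p_reference):
    """Ratio of event probabilities."""
    return p_group / p_reference

def odds_ratio(p_group, p_reference):
    """Ratio of event odds."""
    return (p_group / (1 - p_group)) / (p_reference / (1 - p_reference))

# Hypothetical event probabilities: (group of interest, reference group).
scenarios = {"rare event": (0.02, 0.01), "common event": (0.60, 0.30)}

comparison = {
    name: (relative_risk(p1, p0), odds_ratio(p1, p0))
    for name, (p1, p0) in scenarios.items()
}
for name, (rr, or_value) in comparison.items():
    print(f"{name}: RR = {rr:.2f}, OR = {or_value:.2f}")
```

Both scenarios have a relative risk of 2, yet their odds ratios differ.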
Confidence intervals are also presented with the odds ratios. A 95% confidence interval is a statistical term describing the range of values within which there is 95% ‘confidence’ that the true value lies.
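As an illustration, a Wald-type confidence interval for an unadjusted odds ratio can be computed from a 2×2 table as sketched below. This is an assumption for demonstration: the report does not state which interval method was used, the counts are hypothetical, and intervals for adjusted odds ratios would instead use the standard errors from the fitted multivariable model.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Wald confidence interval for the odds ratio of a 2x2 table.

    a = factor present, fell      b = factor present, no fall
    c = factor absent, fell       d = factor absent, no fall
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    # The interval is built on the log scale, then transformed back.
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts, not data from the report.
estimate, lower, upper = odds_ratio_ci(60, 440, 90, 1410)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1, as in this made-up example, corresponds to an association that is statistically significant at the matching level.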
Cluster analysis
What is cluster analysis?
Cluster analysis aims to uncover structure in data, such as underlying subpopulations, by assessing similarity across a set of characteristics. This similarity is measurable and can be used to group the data in such a way that those in a group are most similar to each other and least similar to those in other groups. In this study, people with dementia were grouped based on risk factors that were found to increase their likelihood of falling. Each cluster is formed by grouping together people with similar risk factor profiles.
Using the risk factors identified from bivariate logistic regression modelling, a probabilistic clustering method was applied to the aged care and community sub-cohorts separately, to identify groups of individuals with different risk factor profiles. All cluster analyses were performed in R (Version 4.1.3) using the FlexMix package (Leisch 2004).
All risk factors for assessment were first coerced to a matrix as binary variables (1=risk factor present, 0=risk factor absent), with the de-identified personal identifier retained as the row name.
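A minimal sketch of this preparation step is given below in Python (the report itself used R). The identifiers, risk factor names and the helper `to_binary_matrix` are all hypothetical.

```python
def to_binary_matrix(records, factors):
    """Build a 0/1 matrix with one row per person and one column per factor.

    records: dict mapping a de-identified person ID to the set of risk
             factors recorded for that person.
    Returns the row IDs (kept so results can be rejoined later) and the matrix.
    """
    row_ids = sorted(records)
    matrix = [
        [1 if factor in records[person_id] else 0 for factor in factors]
        for person_id in row_ids
    ]
    return row_ids, matrix

# Hypothetical input data.
records = {
    "P003": {"hypertension", "polypharmacy"},
    "P001": {"diabetes"},
    "P002": set(),
}
factors = ["diabetes", "hypertension", "polypharmacy"]

row_ids, matrix = to_binary_matrix(records, factors)
for person_id, row in zip(row_ids, matrix):
    print(person_id, row)
```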
This matrix was then subjected to probabilistic clustering (using Bernoulli mixture modelling methods) in order to allocate each individual to a cluster based upon their combination of risk factor presence. The clustering model ran this cluster allocation 10 times with the maximum likelihood solution retained. This number of iterations was chosen to provide sufficient runs to ensure stability of output without exceeding computational capabilities.
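The idea of repeated runs with the maximum likelihood solution retained can be sketched as follows. This is a plain-Python expectation-maximisation (EM) routine for a Bernoulli mixture, written for illustration only: it is not the FlexMix implementation used in the report, it runs a fixed number of iterations rather than checking convergence, and the data, function names and settings are assumptions.

```python
import math
import random

EPS = 1e-6  # keeps probabilities away from exactly 0 or 1

def e_step(data, weights, thetas):
    """Return membership probabilities per person and the log-likelihood."""
    resp, loglik = [], 0.0
    for row in data:
        logs = [
            math.log(w) + sum(math.log(t if x else 1.0 - t) for x, t in zip(row, theta))
            for w, theta in zip(weights, thetas)
        ]
        top = max(logs)  # log-sum-exp trick for numerical stability
        total = top + math.log(sum(math.exp(v - top) for v in logs))
        loglik += total
        resp.append([math.exp(v - total) for v in logs])
    return resp, loglik

def m_step(data, resp, k):
    """Re-estimate cluster sizes and per-cluster risk factor prevalences."""
    n, d = len(data), len(data[0])
    weights, thetas = [], []
    for c in range(k):
        size = sum(r[c] for r in resp) + 1e-9  # guard against empty clusters
        weights.append(size / (n + k * 1e-9))
        thetas.append([
            min(max(sum(r[c] * row[j] for r, row in zip(resp, data)) / size, EPS), 1.0 - EPS)
            for j in range(d)
        ])
    return weights, thetas

def fit_once(data, k, rng, n_iter=100):
    """One EM run from a random starting point."""
    d = len(data[0])
    weights = [1.0 / k] * k
    thetas = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    for _ in range(n_iter):
        resp, _unused = e_step(data, weights, thetas)
        weights, thetas = m_step(data, resp, k)
    resp, loglik = e_step(data, weights, thetas)
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]  # hard assignment
    return {"loglik": loglik, "labels": labels, "weights": weights, "thetas": thetas}

def fit_best(data, k, n_starts=10, seed=0):
    """Repeat the fit from several random starts; keep the best log-likelihood."""
    rng = random.Random(seed)
    fits = [fit_once(data, k, rng) for _ in range(n_starts)]
    return max(fits, key=lambda fit: fit["loglik"])

# Hypothetical data: two groups with opposite risk factor patterns.
data = [[1, 1, 0, 0]] * 20 + [[0, 0, 1, 1]] * 20
best = fit_best(data, k=2)
print(best["labels"])
```

Repeating the fit guards against a single run settling on a poor local solution, which is the usual motivation for keeping the best of several random starts.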
During the clustering process, the analyst defined the number of clusters to be formed and identified the optimum number of clusters. The cluster model was run with the maximum number of clusters set to 5, 6, 7 or 8, and the optimum number of clusters was found to be 5 for both the residential aged care and community populations. Details on selecting the optimum number are outlined in ‘How is the ‘right’ number of clusters chosen?’ below.
As de-identified personal identifiers were retained during the clustering process, the cluster group number each individual was assigned to (either 1, 2, 3, 4 or 5) was then rejoined to the original dataset for further analysis of the features and outcomes by cluster group.
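The rejoining step amounts to a lookup on the retained identifier. A hedged Python sketch follows, in which the field names `person_id`, `cluster` and `long_stay`, the helper functions and all values are hypothetical.

```python
from collections import defaultdict

def attach_clusters(dataset, row_ids, labels):
    """Add each person's cluster number to their record via the shared ID."""
    cluster_of = dict(zip(row_ids, labels))
    return [{**row, "cluster": cluster_of[row["person_id"]]} for row in dataset]

def outcome_rate_by_cluster(rows, outcome):
    """Proportion of people in each cluster with a binary (0/1) outcome."""
    totals, events = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row["cluster"]] += 1
        events[row["cluster"]] += row[outcome]
    return {cluster: events[cluster] / totals[cluster] for cluster in totals}

# Hypothetical dataset and cluster assignments.
dataset = [
    {"person_id": "P001", "long_stay": 1},
    {"person_id": "P002", "long_stay": 0},
    {"person_id": "P003", "long_stay": 1},
    {"person_id": "P004", "long_stay": 0},
]
joined = attach_clusters(dataset, ["P001", "P002", "P003", "P004"], [1, 1, 2, 2])
rates = outcome_rate_by_cluster(joined, "long_stay")
print(rates)
```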
Mixture modelling is a probabilistic clustering technique that accounts for a level of uncertainty (that is, assigns a probability of membership to a cluster rather than a hard assignment). Like other iterative clustering techniques (for example, K-means), it randomly assigns a chosen number of starting points, sorts records into clusters based on those starting points, and then iteratively refines the groupings until the clusters stabilise (Stahl and Sallis 2012). In Bernoulli mixture modelling, it is assumed that the data have an underlying Bernoulli distribution, which is a special case of the binomial distribution suited to a set of independent binary events (the risk factors).
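Written out, a Bernoulli mixture model takes the following general form. The symbols are illustrative rather than taken from the report: x_ij is 1 if person i has risk factor j, π_k is the proportion of people in cluster k, θ_kj is the probability that a person in cluster k has risk factor j, and τ_ik is the probability that person i belongs to cluster k.

```latex
P(x_i) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J}
  \theta_{kj}^{x_{ij}} (1 - \theta_{kj})^{1 - x_{ij}},
\qquad
\tau_{ik} = \frac{\pi_k \prod_{j=1}^{J} \theta_{kj}^{x_{ij}} (1 - \theta_{kj})^{1 - x_{ij}}}
  {\sum_{l=1}^{K} \pi_l \prod_{j=1}^{J} \theta_{lj}^{x_{ij}} (1 - \theta_{lj})^{1 - x_{ij}}}
```

The product over risk factors reflects the assumption that, within a cluster, the risk factors are independent binary events.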
Cluster analysis techniques are useful for analysing relationships among more than 2 variables because they allow us to identify groups of individuals based on the joint pattern of all variables (risk factors including health conditions and medications).
Cluster analysis allows groups of individuals with similar sets of risk factors to be identified – groups are made up of individuals who are most similar to each other in which risk factors they have, and least similar to those in other groups. This can assist in identifying underlying patterns and complex relationships in the data that might be missed if only simple relationships between one risk factor and another were examined.
For example, if a distinct hypertension/cardiovascular disease cluster and a diabetes/polypharmacy cluster are identified, the complexity of the data to be compared is reduced, and outcomes can be compared by cluster, or risk factor profile. The resulting clusters can also be used to develop hypotheses about the relationships between the variables, to guide further analysis and to inform targets for fall intervention strategies.
How is the ‘right’ number of clusters chosen?
Choosing the ‘right’ number of clusters is not always straightforward. It is often a matter of balancing the number of clusters indicated by various measures, such as the integrated completed likelihood (ICL) and the information criteria (AIC and BIC), with the aims and desired outcomes of the analysis. For example, the aim of this analysis is to identify profiles of people within the dementia population who are at risk of falls and poor outcomes, with the purpose of using such information to heighten awareness and inform the development of tailored preventative measures. This means that the clusters need to be numerous enough to uncover specific and meaningful profiles, but not so numerous that it is impractical to apply the profiles to policy and prevention. Models with the maximum number of clusters set to 5, 6, 7 or 8 were assessed, and the output from each was used to determine the minimum number of clusters that appeared to be stable whilst also having clear cluster identities for reporting.
The BIC (Bayesian Information Criterion) indicated that 8 clusters would be suitable (noting that the number of clusters was capped at 8), but the majority of the drop-off in information gain occurred around 4 or 5 clusters for both the community and residential aged care populations (Figures 1 and 2, respectively). After further exploration of the information gain, 5 clusters were used because this provided a better distinction between medium and high-risk groups in both the community and residential aged care populations. This means that the needs of different risk levels can be more accurately identified and addressed.
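The trade-off described here can be illustrated with a short Python sketch. It uses the standard definition of BIC (minus twice the log-likelihood plus the number of free parameters times the log of the sample size, where lower is better) and counts the parameters of a Bernoulli mixture as one prevalence per cluster per risk factor plus the free mixing proportions. All log-likelihoods, the sample size and the number of risk factors are made up and are not results from the report.

```python
import math

def bernoulli_mixture_bic(loglik, k, n_factors, n_people):
    """BIC for a k-cluster Bernoulli mixture (lower is better)."""
    n_params = k * n_factors + (k - 1)  # prevalences + free mixing proportions
    return -2.0 * loglik + n_params * math.log(n_people)

# Hypothetical maximised log-likelihoods for each number of clusters.
logliks = {4: -52000.0, 5: -51200.0, 6: -51000.0, 7: -50900.0, 8: -50810.0}

bic = {
    k: bernoulli_mixture_bic(ll, k, n_factors=12, n_people=10000)
    for k, ll in logliks.items()
}
best_k = min(bic, key=bic.get)

# Improvement in BIC gained by adding one more cluster.
gains = {k: bic[k - 1] - bic[k] for k in sorted(bic) if k - 1 in bic}

print("lowest BIC at k =", best_k)
for k, gain in gains.items():
    print(f"{k - 1} -> {k} clusters: BIC improves by {gain:.1f}")
```

In this made-up example the BIC keeps improving up to the cap, but most of the improvement comes from the step to 5 clusters, which is the kind of pattern that can justify choosing fewer clusters than the BIC minimum.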
Leisch F (2004) ‘FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R’, Journal of Statistical Software, 11(8), 1–18, doi:10.18637/jss.v011.i08.
Stahl D and Sallis H (2012) ‘Model-based cluster analysis’, WIREs Computational Statistics, 4(4), 341–358, doi:10.1002/wics.1204.