**Statistical Analyses**

All analyses outlined here were conducted in R (ref. 58). We used an exploratory approach to examine the correlation between each health response and potential predictors (outlined in detail in Table S1), including socio-demographic variables, BMI, physical activity (where it was not also the response variable), and the three nature experience measures. We used binomial generalized linear models for depression and high blood pressure, linear regression models for social cohesion, and negative binomial generalized linear models for physical activity. The three measures of nature dose were correlated (significant Spearman’s rank correlations of 0.50–0.57), so to avoid multicollinearity we generated four predictor model sets for each health response: (i) all socio-demographic variables (excluding the frequency, duration and intensity of nature experiences); (ii) socio-demographic variables plus duration of nature experiences; (iii) socio-demographic variables plus frequency of nature experiences; (iv) socio-demographic variables plus intensity of nature experiences. Neighbourhood socioeconomic disadvantage (IRSD) was reverse square-root transformed and BMI was log transformed to ensure models met assumptions of normality. We calculated model-averaged coefficient estimates for each predictor variable by generating models with all possible variable combinations and averaging each coefficient across all models in which it was present (using the R package MuMIn).
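
The averaging step can be illustrated with a minimal sketch, written here in Python rather than R for self-containment. It fits an ordinary least-squares model for every non-empty subset of predictors and averages each coefficient over the models that contain it. Several points are assumptions made for illustration and are not taken from the analysis itself: the function names `ols` and `conditional_average`, the toy data, the use of plain linear regression for every model, and the equal weighting of models (MuMIn normally weights models by information-criterion weights).

```python
from itertools import combinations


def ols(rows, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for c in range(k):
        # Partial pivoting: bring the largest remaining entry onto the diagonal
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def conditional_average(data, y, names):
    """Average each coefficient over all subset models in which it appears.

    Assumption: every model gets equal weight (a simplification).
    """
    sums = {n: 0.0 for n in names}
    counts = {n: 0 for n in names}
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # Design matrix: intercept column followed by the chosen predictors
            rows = [[1.0] + [d[n] for n in subset] for d in data]
            beta = ols(rows, y)
            for n, b in zip(subset, beta[1:]):
                sums[n] += b
                counts[n] += 1
    return {n: sums[n] / counts[n] for n in names}


# Hypothetical toy data: two uncorrelated predictors, y = 1 + 2*x1 + 3*x2
data = [{"x1": -1, "x2": -1}, {"x1": 1, "x2": -1},
        {"x1": -1, "x2": 1}, {"x1": 1, "x2": 1}]
y = [-4, 0, 2, 6]
averaged = conditional_average(data, y, ["x1", "x2"])
```

Because the two toy predictors are uncorrelated, each coefficient is the same in every model that contains it. With correlated predictors, such as the three nature dose measures, the estimates would differ between models, which is the situation the separate model sets are meant to avoid.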

To further explore any relationships that became evident from the analyses above, we conducted dose-response modelling for the two binary health measures (depression and high blood pressure) where there was evidence for an effect of any one of the three nature dose variables. Dose-response modelling is readily achieved for binary response variables40; social cohesion and physical activity did not lend themselves to this analytical approach because there is no threshold at which a score is ‘good’ or ‘bad’. To carry out this approach we first built a logistic regression model in which the predictor variables were treated as ‘risk factors’, an established practice in population epidemiology59,60. The relative odds of occurrence of either depression or high blood pressure in an individual were calculated given a person’s specific risk factors (e.g. age) or duration, frequency or intensity of nature experiences. We used only the predictor variables that were statistically significant in the analysis in Table 1, and transformed each into a binary risk factor using existing evidence where possible. For example, the risk of being diagnosed with hypertension begins to increase steeply at age 45 years61, and the prevalence of affective mood disorders such as depression begins to decline in Australia at about 45 years (ref. 62). We therefore used 45 years as the cut-off for a binary age risk factor: for depression, respondents above this age were coded 0 and those below it 1, and vice versa for high blood pressure. Similarly, Australian guidelines recommend physical activity on most, if not all, days per week63, and we therefore coded people who exercised for 30 minutes on 5 or more days as 0 and those who did not as 1. Respondents who were ‘overweight’ (BMI ≥ 25; ref. 64) were assigned a risk factor of 1, and those below this value 0. Where no definitive information was available we used the results from Table 1 to guide the direction of the risk categorization; this applied to whether children were present in the home, whether a person works (treated as a binary work or no-work), and income and neighbourhood disadvantage (IRSD; with the binary categorization reflecting whether the respondent fell into the top half or bottom half of the population values). Variables for which no threshold could be estimated were omitted from these analyses (as was the case for social cohesion and nature relatedness).
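
As an illustration of the evidence-based codings described above, the following Python sketch converts one respondent's age, activity and BMI into binary risk factors. The function name `code_risk_factors` and the field names `age`, `active_days` and `bmi` are hypothetical, and the handling of a respondent aged exactly 45 (grouped with the younger category) is an assumption, since the text does not specify it.

```python
def code_risk_factors(person, outcome):
    """Return 0/1 risk factors for one respondent.

    `outcome` is "depression" or "high_blood_pressure"; the age coding
    flips between the two, as described in the text.
    """
    if outcome not in ("depression", "high_blood_pressure"):
        raise ValueError("unknown outcome: " + outcome)

    # Assumption: a respondent aged exactly 45 counts as 'not above 45'
    above_45 = person["age"] > 45
    age_risk = above_45 if outcome == "high_blood_pressure" else not above_45

    return {
        "age": int(age_risk),
        # Fewer than 5 days per week with 30 minutes of exercise -> risk factor 1
        "inactive": int(person["active_days"] < 5),
        # BMI of 25 or more ('overweight') -> risk factor 1
        "overweight": int(person["bmi"] >= 25),
    }


# Hypothetical respondent
respondent = {"age": 52, "active_days": 3, "bmi": 27.4}
for_depression = code_risk_factors(respondent, "depression")
for_blood_pressure = code_risk_factors(respondent, "high_blood_pressure")
```

The variables whose direction was guided by Table 1 (children in the home, work status, income, IRSD) are left out of the sketch because their coding direction is not stated in the text.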

To create a dose-response curve, we ran the logistic regression models described above with incrementally increased thresholds of nature experiences (e.g. for duration, a person’s risk factor was varied based on whether they met incremental thresholds including >0 minutes; ≥15 minutes; ≥30 minutes; ≥45 minutes; ≥1 hour and so forth until the maximum time of 4 hours), and determined the odds ratio for a person in that category having the condition. On plots of nature dose versus odds ratio, we identified the point at which health gains were first recorded as better than the null model, and used this point in the analysis described below.
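
The threshold sweep can be sketched as follows in Python. This is a deliberate simplification: it computes an unadjusted odds ratio from a 2×2 table at each threshold, whereas the analysis described above obtains odds ratios from logistic regression models that also include the other risk factors. The function names, the toy data, and the convention that a threshold of 0 stands for ">0 minutes" are assumptions; only the thresholds explicitly listed in the text are used.

```python
def odds_ratio(met, cases):
    """Unadjusted odds ratio of being a case for those meeting a threshold.

    `met` and `cases` are parallel sequences of truthy/falsy values.
    Returns None when the ratio is undefined (zero denominator).
    """
    pairs = list(zip(met, cases))
    a = sum(1 for m, c in pairs if m and c)          # met threshold, case
    b = sum(1 for m, c in pairs if m and not c)      # met threshold, non-case
    c_ = sum(1 for m, c in pairs if not m and c)     # below threshold, case
    d = sum(1 for m, c in pairs if not m and not c)  # below threshold, non-case
    if b * c_ == 0:
        return None
    return (a * d) / (b * c_)


def dose_response(durations, cases, thresholds):
    """Odds ratio at each threshold; a threshold of 0 means '>0 minutes'."""
    curve = []
    for t in thresholds:
        met = [(d > 0) if t == 0 else (d >= t) for d in durations]
        curve.append((t, odds_ratio(met, cases)))
    return curve


# Hypothetical data: minutes of nature experience and a 0/1 health outcome
durations = [0, 0, 10, 20, 30, 40, 60, 90]
cases = [1, 1, 1, 0, 1, 0, 0, 0]
curve = dose_response(durations, cases, thresholds=[0, 15, 30, 45, 60])
```

An odds ratio below 1 at a given threshold indicates lower odds of the condition among people who meet that threshold than among those who do not.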

A population average attributable fraction analysis was used to estimate the proportion of depression and high blood pressure cases in the population attributable to each of the predictor variables or ‘risk factors’60. Within a multivariate logistic regression environment, each risk factor was removed sequentially from the population by classifying every individual as unexposed (i.e. risk factor of 0). The probability of each person having the disease was then calculated, where the sum of all probabilities across the population was the adjusted number of disease cases expected if the risk factor was not present. The attributable fraction was calculated by subtracting this adjusted number of cases from the observed number of cases. The risk factors were removed in every possible order, and an average attributable fraction from all analyses was obtained.
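
The sequential-removal procedure can be sketched in Python as below. The sketch assumes a logistic model that has already been fitted; the coefficients, intercept and population shown are hypothetical. Two further assumptions are made for illustration: the baseline is the sum of predicted probabilities with all risk factors left as observed (used in place of the observed case count), and each reduction in expected cases is divided by that baseline to express it as a fraction.

```python
from itertools import permutations
from math import exp


def expected_cases(people, coef, intercept, removed):
    """Sum of predicted probabilities with the `removed` factors set to 0."""
    total = 0.0
    for p in people:
        z = intercept + sum(b * p[k] for k, b in coef.items() if k not in removed)
        total += 1.0 / (1.0 + exp(-z))
    return total


def average_attributable_fraction(people, coef, intercept):
    """Average each factor's attributable fraction over all removal orders."""
    baseline = expected_cases(people, coef, intercept, removed=set())
    totals = {k: 0.0 for k in coef}
    orders = list(permutations(coef))
    for order in orders:
        removed, previous = set(), baseline
        for k in order:
            removed.add(k)
            current = expected_cases(people, coef, intercept, removed)
            # Share of baseline cases that disappears when factor k is removed
            totals[k] += (previous - current) / baseline
            previous = current
    return {k: totals[k] / len(orders) for k in totals}


# Hypothetical fitted model and population of binary risk factors
coef = {"inactive": 0.7, "overweight": 0.4}
intercept = -1.0
people = [
    {"inactive": 1, "overweight": 1},
    {"inactive": 1, "overweight": 0},
    {"inactive": 0, "overweight": 1},
    {"inactive": 0, "overweight": 0},
]
aaf = average_attributable_fraction(people, coef, intercept)
```

Averaging over every removal order makes the fractions sum to the total share of cases explained by all risk factors together, regardless of order. The number of orders grows factorially with the number of risk factors, so this exhaustive version is only practical for small sets.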

How to cite this article: Shanahan, D. F. et al. Health Benefits from Nature Experiences Depend on Dose. Sci. Rep. 6, 28551; doi: 10.1038/srep28551 (2016).