Module Exercise 2: Public health (large area) epidemiology


The exercise:

The Australian Government Department of Health (federal) produces reports each year containing data on notifiable diseases. These are of great use to those studying changes in disease distributions over space or time with the aim of planning country-wide control initiatives. To facilitate similar regional operations, states and territories produce annual Public Health Bulletins, which present the data at a finer level of resolution.

Part 1:  Access a table for NSW showing disease incidence for the years 2003 to 2012, and produce labelled, computer-generated time trend graphs for giardiasis and HIV infections using an application such as Excel®.

Part 2: Briefly discuss two possible reasons why each of these diseases might have increased or decreased over this period. Reference this discussion.

Aims of the exercise:

  1. To acquire skills in the extraction, presentation, analysis and use of quantitative information from a large-area epidemiological report.
  2. To develop early perspectives on risk factors for specific diseases, and insight as to how and why these might change with time.


  1. Public Health Bulletins usually include data up to the year before they were published (eg: a 2012 bulletin usually contains data up to 2011).
  2. Departments are sometimes a few years behind with their bulletins, so a bulletin for the year 2013 might not be available until 2015.
  3. For comparison of disease incidence by place or by year, rates (not absolute numbers) are always used in epidemiology. Disease notification rates are usually given per 100,000 population.
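The conversion from a raw count to a rate per 100,000 population can be sketched as below. This is a minimal illustration only: the function name `rate_per_100k` and the case and population figures are hypothetical and are not taken from any bulletin.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Return the notification rate per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


# Hypothetical figures, for illustration only (not real notification data).
region_a = rate_per_100k(cases=120, population=2_000_000)
region_b = rate_per_100k(cases=90, population=600_000)

# Region A has more cases in absolute terms, but Region B has the higher rate.
print(f"Region A: {region_a:.1f} per 100,000")
print(f"Region B: {region_b:.1f} per 100,000")
```

The example shows why absolute numbers can mislead: the region with fewer notifications has the higher rate once population size is taken into account.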

Module Exercise 3: Bivariate linear regression analysis (correlation)

Background to the exercise:

As a preliminary step in a large-scale study of asthma in Armidale, New South Wales, you are asked to carry out a study to identify the impact of ambient atmospheric general particulate pollution (PM10) on the incidence of asthmatic wheeze in primary school children. Thermal inversions can occur periodically in the Armidale basin, trapping pollutants from point and diffuse sources in the lower atmosphere.

To ensure an accurate medical diagnosis you select all primary school children attending a day clinic over a 30-day period in April.  In this month, other “confounding” risk factors (such as rainfall) are at relatively low levels, and therefore to some extent controlled.

From trained clinical staff you obtain a daily record of asthmatic wheeze incidence in children presenting for all medical conditions at the clinic during the study period. The daily air quality record is obtained from the Department of the Environment and a short latency period (minutes to hours) between exposure to ambient air particulates and production of symptoms is assumed. You produce the tabulated data shown on the next page.

The exercise:

Part 1: Plot a graph showing the relationship between asthmatic wheeze and ambient atmospheric particulate matter (PM10) using a recognised computer application such as Excel®. Add a computer-generated line of best fit, assuming a linear relationship. Present the graph for assessment with a comment on the type of correlation (direct or inverse), its electronically computed strength in terms of Pearson’s Product Moment Correlation Coefficient, r (some versions of Excel® can also display this on the graph), and a qualitative interpretation of this result (eg: “low correlation”, “moderate correlation”, etc.).
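For readers curious about what a computer-generated line of best fit involves, the following is a minimal sketch of an ordinary least-squares fit. It uses only days 1 to 5 of the exercise table, so it does not give the answer for the full data set. Expressing wheeze as a percentage of clinic attendance is an assumption made here for illustration, and the function name `least_squares_line` is hypothetical.

```python
# Rows for days 1 to 5 of the exercise table:
# (children with wheeze, children attending, PM10 in µg/m³)
rows = [(11, 420, 40), (8, 230, 45), (11, 190, 90), (24, 550, 60), (31, 643, 50)]

# Assumption: the outcome is wheeze cases as a percentage of clinic attendance.
pm10 = [p for _, _, p in rows]
wheeze_pct = [100 * w / a for w, a, _ in rows]


def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(pm10, wheeze_pct)
# A positive slope indicates a direct relationship, a negative slope an inverse one.
direction = "direct" if slope > 0 else "inverse" if slope < 0 else "none"
print(f"wheeze % = {intercept:.3f} + {slope:.4f} * PM10 ({direction} relationship)")
```

The sign of the slope gives the type of correlation; the strength still has to be read from r, not from the slope.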

Part 2: Using the formula and table given in the module notes, hand-calculate Pearson’s Product Moment Correlation Coefficient, r. Submit the tabulation used to generate values for the algebraic formula, along with your calculated value for r. Comment on the possible reason for any differences noted between the results obtained in Parts 1 and 2.
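The tabulation step can be sketched in code as follows. This is an illustration under stated assumptions: it uses a common computational form of Pearson's r, which may be laid out differently from the formula in the module notes, it takes wheeze as a percentage of clinic attendance, and it uses only days 1 to 5 of the table. The function name `pearson_r_tabulated` is hypothetical.

```python
import math

# Rows for days 1 to 5 of the exercise table:
# (children with wheeze, children attending, PM10 in µg/m³)
rows = [(11, 420, 40), (8, 230, 45), (11, 190, 90), (24, 550, 60), (31, 643, 50)]

# Assumption: x is PM10 and y is wheeze as a percentage of clinic attendance.
x = [p for _, _, p in rows]
y = [100 * w / a for w, a, _ in rows]


def pearson_r_tabulated(x, y):
    """Return (table, r), where table holds x, y, x^2, y^2 and xy for each pair."""
    n = len(x)
    table = [(xi, yi, xi * xi, yi * yi, xi * yi) for xi, yi in zip(x, y)]
    # Column totals of the tabulation.
    sum_x = sum(row[0] for row in table)
    sum_y = sum(row[1] for row in table)
    sum_x2 = sum(row[2] for row in table)
    sum_y2 = sum(row[3] for row in table)
    sum_xy = sum(row[4] for row in table)
    # r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return table, numerator / denominator


table, r = pearson_r_tabulated(x, y)
print("     x        y      x^2      y^2       xy")
for xi, yi, x2, y2, xy in table:
    print(f"{xi:6.0f} {yi:8.3f} {x2:8.0f} {y2:8.3f} {xy:8.3f}")
print(f"r = {r:.3f}")
```

Each column of the printed table corresponds to one column of a hand tabulation, and the column totals are the values substituted into the formula.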

Aims of the exercise:

  1. To gain an understanding of the use of bivariate linear regression analysis as a fundamental but powerful epidemiological analytical tool.
  2. To gain a conceptual idea of an industrially generated, environmental risk factor for an important health condition.


Day | Total number of children with asthmatic wheeze | Total number of children attending the clinic that day | Ambient atmospheric particulates (PM10 in µg/m³) | Blank column for calculated values
1 | 11 | 420 | 40 |
2 | 8 | 230 | 45 |
3 | 11 | 190 | 90 |
4 | 24 | 550 | 60 |
5 | 31 | 643 | 50 |
6 | 39 | 710 | 60 |
7 | 39 | 560 | 360 |
8 | 26 | 302 | 320 |
9 | 19 | 200 | 110 |
10 | 31 | 587 | 70 |
11 | 22 | 589 | 80 |
12 | 21 | 632 | 64 |
13 | 14 | 585 | 50 |
14 | 27 | 602 | 50 |
15 | 22 | 320 | 130 |
16 | 16 | 245 | 220 |
17 | 24 | 558 | 100 |
18 | 26 | 570 | 60 |
19 | 42 | 603 | 40 |
20 | 36 | 555 | 40 |
21 | 46 | 599 | 100 |
22 | 17 | 197 | 160 |
23 | 16 | 197 | 190 |
24 | 26 | 520 | 80 |
25 | 22 | 476 | 50 |
26 | 19 | 600 | 40 |
27 | 14 | 557 | 30 |
28 | 17 | 481 | 40 |
29 | 10 | 225 | 50 |
30 | 10 | 190 | 40 |



  1. If the question looks confusing, go back to the module notes, where the approach is clearly explained, and work through an example.
  2. When finished, check your calculations thoroughly, as marks are awarded for both the method and the correct answer. With care it is relatively easy to score 100%.
  3. The first step when working with raw data is always to classify (ie: to construct a table). When in doubt, tabulate: masses of numbers become clearer once they are set out in a table.
  4. Ensure accuracy by using one more decimal place in your calculations than you intend to give in your answer.
  5. Use the formula in the module notes rather than the one given in textbooks, which is primarily for statisticians.
  6. When comparing health states (diseases and fitness) always use rates.
  7. Excel® does not do as much as SPSS and Minitab, but is probably the most user-friendly program to use, and links well with Word®. For example, values in the Word® table can be cut and pasted into Excel®. To add the line of best fit in Excel®, first select the graph by clicking on it; the menu tab for this function will then appear.