Project Information
Swim advisories are issued by beach managers on the basis of standards
for concentrations of bacterial indicators—Escherichia coli (E.
coli) or enterococci for freshwaters and enterococci for marine
waters. The analytical methods for these organisms, however, take
at least 18–24 hours to complete. Recreational water-quality
conditions may change during this time, leading to erroneous
assessments of public-health risk. As a result, some agencies have
turned to predictive modeling to obtain near-real-time estimates of
recreational water quality.
In an earlier study (Francy and others, 2006), predictive models were developed for five Ohio Lake Erie beaches—Huntington (Bay Village), Edgewater and Villa Angela (Cleveland), Lakeview (Lorain), and Lakeshore (Ashtabula). The best model for each beach was based on a unique combination of variables that explained changes in E. coli concentrations. The “variables” included turbidity (water clarity), rainfall, wave height, water temperature, day of the year, and lake level. Although daily monitoring still continues at Lakeview, predictive model development was suspended in 2006.
Work on predictive model development and testing continued during 2007 at four Ohio Lake Erie beaches. At Huntington, predictions based on the Huntington model were available to the public through an Internet-based nowcasting system that began operating in 2006. At Edgewater, Villa Angela, and Lakeshore, models were tested in 2007 for possible use in future nowcasts. The Edgewater model provided more correct responses and fewer false positive and false negative predictions than the previous day’s E. coli concentration (the current method). Edgewater was added to the nowcast in 2008. At Villa Angela and Lakeshore, the models provided fewer correct responses than the current method; work on predictive model development at these two beaches was therefore suspended in 2007. A full report on the 2007 testing is available.
Quick links to the following topics:
- Why is this work being done?
- Who are the partners in these studies?
- How are data collected for the nowcast?
- How are samples analyzed for the nowcast?
- How is the nowcast determined?
- Next steps
Next steps
Why is this work being done?
People may risk illness from exposure to disease-causing microorganisms (pathogens) at
recreational beaches. Because monitoring for pathogens is difficult and
expensive, beach advisories are issued on the basis of standards for
concentrations of indicator bacteria, such as Escherichia coli (E.
coli). Indicator bacteria do not necessarily cause disease, but they are
present in feces and therefore indicate the possible presence of disease-causing
organisms. For Ohio, the standard used to assess bathing-water quality is the
single-sample maximum level for E. coli of 235 colony-forming units per
100 milliliters (CFU/100 mL). To determine concentrations of E. coli,
however, the bacteria must be cultured, and this takes at least 18 hours. That
means that beach managers are using the previous day’s E. coli results to
evaluate current beach conditions. This delay may result in an erroneous
evaluation of current conditions, because water quality may change overnight.
Agencies that monitor beaches need tools
that can provide quick, reliable indicators of recreational water quality.
Real-time forecasting using mathematical models may help resolve the delayed
notification problems inherent in the present approach. Mathematical models
use easily measured environmental and water-quality variables (“explanatory
variables”), such as wave height and rainfall, to estimate the E. coli
concentration or the probability of exceeding 235 CFU/100 mL of E. coli.
This method provides a “nowcast” of recreational water quality; it is similar to
a weather forecast, except that current conditions, rather than future
conditions, are estimated.
Predictive modeling is a dynamic process: that is, models should be continuously validated and refined to improve predictions and better protect public health.
Who are the partners in these studies?
Nowcasts are being developed as part of
several projects that are partnerships between the U.S. Geological Survey (USGS)
and other federal, state, and local agencies. Partners include Ashtabula
Township Park Commission, Cuyahoga County Board of Health (CCBH), Northeast Ohio
Regional Sewer District (NEORSD), Ohio Department of Health, Ohio Department of Natural Resources (ODNR), and the U.S.
Environmental Protection Agency. Funding and (or) services are provided by the
partners, the Ohio Lake Erie Office, and the Ohio Water Development
Authority.
How are data collected for the nowcast?
Daily data are collected during the recreational season in Ohio (mid-May
through early September) and include analysis of water samples for E. coli
and measurement of explanatory variables for model development and testing.
During 2008, data will be collected at Huntington and Edgewater 7 days/week for the nowcast.
Samples are collected between 7 and 9 a.m. where the water is 2–3 feet deep
in areas of the beach used for swimming. All water-sample bottles are filled
about 1 foot below the water surface using a grab-sampling technique (Myers and
Wilde, 2003). Water samples are kept on ice and analyzed for E. coli
concentrations and turbidity at local laboratories within 6 hours of collection.
During the data-collection period, field
personnel collect or compile data for environmental and water-quality variables
expected to affect E. coli concentrations.
- Bird counts. Manual counts are made of
the number of birds on the beach upon arrival.
- Wave heights. Wave-height data are collected by several methods; the most
precise measurement available is used for predictive models. The methods, from
lowest to highest precision, are as follows:
  - At the time of sample collection, wave heights are estimated by the field technician and placed into four categories based on minimum and maximum heights in each wave train: (1) 0 to 2 feet, (2) 1 to 3 feet, (3) 2 to 4 feet, (4) > 3 to 5 feet.
  - At the time of sample collection, wave heights are measured by use of a graduated stick. The stick is placed at the sampling location, and minimum and maximum water heights are noted over the course of one minute.
  - Hourly wave heights are obtained from a wave-height buoy placed just outside the swimming area at Edgewater, Cleveland, Ohio. The buoy is equipped with instrumentation to measure wave heights and to store and transmit the data.

- Water temperatures. Water temperature is
measured at the sampling location using an alcohol-filled thermometer.
- Lake levels. Lake-level data are obtained from the National Oceanic and
Atmospheric Administration (NOAA) station in Cleveland (NOAA ID 9063053).
- Site-specific rainfall data. For Huntington and Edgewater, site-specific
rainfall data are obtained from the National Weather Service (NWS) station at
Hopkins International Airport.
- Radar rainfall data. To obtain rainfall data over a wider area, radar data
are obtained from the NWS. The data are provided for 4-kilometer grid cells for
each hour of the day. For the area around Edgewater, data from 2 grid cells are
used to estimate rainfall amounts in the past 48 hours. For the area around
Huntington, data from 6 grid cells are used to estimate rainfall amounts in the
past 24 hours (“radar6cell”).
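The text does not say how the hourly values from several grid cells are combined into one rainfall amount. As a hypothetical sketch, assume each hour's cell values are averaged and the hourly averages are summed over the look-back window (24 hours for radar6cell at Huntington, 48 hours for the 2-cell Edgewater variable). The function name `radar_rainfall` and the list-of-hours input layout are assumptions, not part of the NWS data format.

```python
from typing import Sequence


def radar_rainfall(hourly_cells: Sequence[Sequence[float]], hours: int) -> float:
    """Estimate rainfall over the most recent `hours` hours (same units as input).

    hourly_cells: one entry per hour, oldest first; each entry holds the
    rainfall for every grid cell in that hour.
    ASSUMPTION: cells are averaged within each hour, then the hourly
    averages are summed; the source does not state the aggregation rule.
    """
    if hours <= 0 or hours > len(hourly_cells):
        raise ValueError("hours must be between 1 and the number of hourly records")
    recent = hourly_cells[-hours:]  # keep only the look-back window
    return sum(sum(cells) / len(cells) for cells in recent)
```

For Huntington, the call would be `radar_rainfall(last_24_hours_of_6_cells, 24)`; for Edgewater, `radar_rainfall(last_48_hours_of_2_cells, 48)`.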

A weighted rainfall variable was calculated and used in predictive model
development. Rainfall weighted 48 hours (Rw48) is 48 hours of cumulative
rainfall and gives more weight to the most recent rainfall amount as follows:

$$\mathrm{Rw48} = 2R_{d-1} + R_{d-2}$$

where $R_{d-1}$ is the amount of rain, in inches, that fell in the 24-hour
period (9 a.m. to 9 a.m.) preceding the morning sampling and $R_{d-2}$ is the
amount of rain that fell in the 24-hour period 2 days preceding the morning
sampling.
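The Rw48 formula translates directly into code. A minimal sketch; the function and argument names are illustrative, not from the source:

```python
def rw48(rain_day_minus_1: float, rain_day_minus_2: float) -> float:
    """Weighted 48-hour rainfall, in inches: Rw48 = 2*Rd-1 + Rd-2.

    rain_day_minus_1: rain in the 24 hours (9 a.m. to 9 a.m.) before sampling.
    rain_day_minus_2: rain in the 24-hour period 2 days before sampling.
    """
    if rain_day_minus_1 < 0 or rain_day_minus_2 < 0:
        raise ValueError("rainfall amounts cannot be negative")
    # The most recent day counts twice, the day before that once.
    return 2.0 * rain_day_minus_1 + rain_day_minus_2


# 0.5 in. yesterday and 0.2 in. the day before: 2*0.5 + 0.2
print(rw48(0.5, 0.2))
```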
How are samples analyzed for the nowcast?
Samples are analyzed by use of the modified mTEC membrane-filtration method (U.S. Environmental Protection Agency, 2002). Membrane-filtration equipment and supplies include a manifold and vacuum pump, filter funnel and base, membrane filters, a graduated cylinder and pipets to measure sample volumes, forceps to handle the filter, an alcohol lamp to flame forceps, sterile buffered water to rinse the filter funnel, and agar plates to grow the bacteria.

Several volumes of sample water are each filtered through a separate membrane
filter, with the goal of obtaining 20–80 colonies on at least one of the agar
plates. Usually, sample volumes of 100, 30, 10, 3, and 1 mL are plated. If the
water is suspected to have E. coli concentrations in the thousands of CFU/100
mL, serial dilutions of the sample are made.

The bacteria are concentrated on a filter using the manifold, vacuum pump,
filter funnel, and base. The filter is placed on an agar plate and incubated at
35°C for 2 hours and then at 44.5°C for an additional 20–22 hours. After the
prescribed incubation time, the plates are removed, and magenta colonies on the
membranes are counted as E. coli. Results are reported as colony-forming units
per 100 milliliters (CFU/100 mL).
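Converting plate counts to CFU/100 mL is simple arithmetic: colonies counted divided by the volume filtered, times 100. The sketch below is a hypothetical helper (`cfu_per_100ml` is not from the source). It keeps only plates in the 20–80 colony target range and, as an assumed convention, pools them by summing colonies and volumes; Method 1603 itself should be consulted for the exact reporting rules, including what to report when no plate falls in the range. For serial dilutions, pass the equivalent volume of original sample.

```python
from typing import Sequence, Tuple

TARGET_MIN, TARGET_MAX = 20, 80  # target colonies per plate, from the text


def cfu_per_100ml(plates: Sequence[Tuple[float, int]]) -> float:
    """Compute E. coli CFU/100 mL from (volume_ml, colony_count) pairs.

    ASSUMPTION: all in-range plates are pooled (total colonies divided by
    total volume); check Method 1603 for its exact rules.
    """
    countable = [(v, c) for v, c in plates if TARGET_MIN <= c <= TARGET_MAX]
    if not countable:
        raise ValueError("no plate in the 20-80 colony range")
    total_colonies = sum(c for _, c in countable)
    total_volume = sum(v for v, _ in countable)
    return total_colonies / total_volume * 100.0
```

For the usual 100, 30, 10, 3, and 1 mL series, if only the 30-mL plate were countable with 45 colonies, the result would be 45 / 30 × 100 = 150 CFU/100 mL.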

Turbidity measures the scattering effect that suspended
solids have on light: the higher the intensity of scattered light, the higher
the turbidity. Turbidity can make water look cloudy or muddy. Contributors to
turbidity include clay, silt, finely divided organic matter, plankton,
microscopic organisms, and dyes (Anderson, 2005). Turbidity is determined in
water samples with a turbidimeter. Turbidity is reported in nephelometric
turbidity units (NTUs). A second type of turbidity measurement, using an in situ probe attached to the Edgewater buoy, is being tested at Edgewater during 2008.
How is the nowcast determined?
Using data collected in 2000–07 at
Huntington, the steps in development of the nowcast are shown below. The steps
are defining the performance of the model in the current year, identifying
explanatory variables, developing a predictive model, and providing output from
the model and determining the threshold. More information on developing
predictive models can be found in a USGS online how-to report (Francy and
Darner, 2006).
Defining the performance of the model in the current year
How did the nowcast system perform in 2006 at Huntington? Nowcasts were provided to the public for 85 days during the recreational season of 2006. The nowcast provided a correct response 80 percent of the time. False positive responses were provided 10 percent of the time; that is, the nowcast incorrectly predicted that the standard was exceeded on 6 of the 59 days when the standard was actually NOT exceeded. False negative responses were higher: 42 percent. That is, the nowcast incorrectly predicted that the standard would NOT be exceeded on 11 of the 26 days when the standard was actually exceeded.
Although the false negative rate for the nowcast is higher than we would like, the nowcast still provides more accurate information and better estimates of public-health risk than the previous day’s E. coli concentration (the current method used by most beach managers). During 2006, the previous day’s E. coli concentration provided only 57 percent correct responses; false positives occurred 30 percent and false negatives 72 percent of the time.
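The percentages above follow directly from daily counts. The hypothetical `performance` function below assumes, as the "6 of 59" and "11 of 26" wording implies, that the false positive rate is computed over days when the standard was not exceeded and the false negative rate over days when it was. The example plugs in the 2006 Huntington counts from the text.

```python
from typing import Dict


def performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Percent correct, false-positive rate, and false-negative rate.

    tp: exceedance predicted and observed; tn: neither predicted nor observed;
    fp: exceedance predicted but not observed; fn: observed but not predicted.
    """
    total = tp + fp + tn + fn
    return {
        "correct": 100.0 * (tp + tn) / total,
        "false_positive": 100.0 * fp / (fp + tn),  # share of non-exceedance days
        "false_negative": 100.0 * fn / (fn + tp),  # share of exceedance days
    }


# Huntington 2006: 59 non-exceedance days (6 false positives) and
# 26 exceedance days (11 false negatives).
rates = performance(tp=26 - 11, fp=6, tn=59 - 6, fn=11)
print({name: round(value) for name, value in rates.items()})
# {'correct': 80, 'false_positive': 10, 'false_negative': 42}
```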
Identifying explanatory variables
The first step in development of predictive
models is to identify explanatory variables related to E. coli. This was
done by making scatterplots of the continuous data with the measured variable on
the x axis and the E. coli concentration on the y axis. Continuous data are
measured data that can take any value within a range. A statistical test,
correlation analysis, was done to provide a quantitative measure of the
relation between each variable and E. coli. The result from correlation
analysis is Pearson’s correlation coefficient (r), which measures the linear
(straight-line) association between the variable and E. coli
concentrations. If the data lie exactly along a straight line with positive
slope, then r is equal to 1; if they lie exactly along a straight line with
negative slope, r is equal to -1 (Helsel and Hirsch, 2002, p. 209). The closer
r is to zero, the weaker the relation. Correlation coefficients were considered
statistically significant if the p-value (level of significance) was less than
0.05. A p-value less than 0.05 means that, if there were actually no relation
between the variable and E. coli, there would be less than a 5 percent chance
of obtaining a correlation as strong as the one observed.
An example of one of the
related variables at Huntington is turbidity. During 2000–06, as E. coli
concentrations increased, turbidity also increased (r=0.48, p<0.0001). However,
the scatter of E. coli concentrations at a given level of turbidity was
considerable. One reason for the scatter is that turbidity does not explain all
of the variability in E. coli concentrations; other explanatory variables
are needed to explain this variability more fully. Rw48 (r=0.39, p<0.0001),
wave height (r=0.45, p<0.0001), radar6cell (r=0.60, p<0.0001), and day of the
year (r=0.15, p=0.0017) were also significantly related to E. coli at
Huntington during 2000–06, although day of the year was only weakly related to
E. coli.
  
Developing a predictive model
Different combinations of variables related
to E. coli were tested by use of a statistical approach called multiple
linear regression (MLR). In MLR, a unique set of variables is used to develop a
model that best explains the variation in E. coli concentrations, leaving
as little variation as possible to unexplained “noise”. At Huntington, using
data collected during 2000–06, the best MLR model included the variables wave
height, turbidity, Rw48, radar6cell, and day of the year. The model explained
43 percent of the variability in E. coli concentrations; this value is called
the R² or coefficient of determination of the model.
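Fitting an MLR model and computing its R² can be sketched with ordinary least squares. The hypothetical `fit_mlr` below is dependency-free and solves the normal equations directly, which is adequate for a handful of variables but less numerically stable than the QR or SVD solvers in a statistics package. The columns of `X` would be the explanatory variables (for Huntington: wave height, turbidity, Rw48, radar6cell, and day of the year) and `y` the E. coli values; log10-transforming E. coli first is an assumption, since the text does not state the transformation. The sketch shows how an R² such as 43 percent is computed; it does not reproduce the Huntington model.

```python
from typing import List, Sequence, Tuple


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Ordinary least squares with an intercept; returns (coefficients, R^2).

    coefficients[0] is the intercept; the rest follow the columns of X.
    Solves (A^T A) b = A^T y by Gauss-Jordan elimination with partial pivoting.
    """
    A = [[1.0, *row] for row in X]  # prepend a column of ones for the intercept
    k = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):  # eliminate this column from every other row
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [m - factor * c for m, c in zip(M[r], M[col])]
    coef = [M[i][k] / M[i][i] for i in range(k)]
    fitted = [sum(b * a for b, a in zip(coef, row)) for row in A]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained "noise"
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # total variability
    return coef, 1.0 - ss_res / ss_tot
```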
Providing output from the model and determining the threshold
Two types of output values were produced by the models. The first, and simpler,
output was the predicted E. coli concentration. Because the uncertainty in the
predicted E. coli concentration was shown to be fairly large in earlier studies
(Francy and Darner, 1998; Francy and Darner, 2002; Francy and others, 2003), a
second output variable was developed to provide a more accurate prediction of
recreational water quality: the probability of exceeding the Ohio single-sample
bathing-water standard for E. coli of 235 CFU/100 mL. This approach results in
estimated probabilities similar to those in a weather forecast.
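The text does not describe how the exceedance probability is computed. One possible approach, shown here only as a hypothetical sketch, assumes the model predicts log10 E. coli and that the actual value is normally distributed around that prediction with the model's standard error; the probability of exceeding 235 CFU/100 mL is then the upper tail of that normal distribution. Logistic regression, which models exceedance directly, is a common alternative. The function name and both assumptions are illustrative.

```python
import math

STANDARD_CFU = 235.0  # Ohio single-sample maximum, CFU/100 mL


def exceedance_probability(predicted_log10: float, std_error: float) -> float:
    """Probability that E. coli exceeds 235 CFU/100 mL.

    ASSUMPTIONS: the model predicts log10 E. coli, and actual values are
    normally distributed around the prediction with the given standard error.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = (math.log10(STANDARD_CFU) - predicted_log10) / std_error
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z) for a standard normal
```

A prediction exactly at log10(235) gives a probability of 0.5; higher predictions push the probability toward 1.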
The results from the model can be used daily
by beach managers and the public. For the model to be useful, the probability
that is associated with too great a risk to allow swimming needs to be
determined. This is the “threshold probability.” Probabilities that are less
than a threshold probability indicate that bacterial water quality that day is
most likely acceptable; the beach manager would not issue an advisory and beachgoers would feel fairly confident that the water is safe for swimming.
Probabilities equal to or above the threshold probability indicate that the
water quality is most likely not acceptable and that a water-quality advisory
may be needed.

The threshold probability for the Huntington model was established by determining the lowest probability that produced the most correct responses and fewest false negative responses. This was done by plotting the Huntington 2000–06 data used to develop the model. Because these data have been examined retrospectively, each point on the graph represents the actual E. coli concentration determined by culturing the sample (x-axis) and the associated computed probability based on the model (y-axis). The plot is divided into four quadrants by a vertical line through 235 CFU/100 mL on the x-axis and a horizontal line through the threshold probability on the y-axis. The four quadrants are:
- Correct below the standard. E. coli concentrations were less than 235 CFU/100 mL, and the predicted probabilities were below the threshold.
- False positive. E. coli concentrations were less than 235 CFU/100 mL, but the predicted probabilities were equal to or above the threshold.
- Correct above the standard. E. coli concentrations were equal to or greater than 235 CFU/100 mL, and the predicted probabilities were equal to or above the threshold.
- False negative. E. coli concentrations were equal to or greater than 235 CFU/100 mL, but the predicted probabilities were below the threshold.
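The four quadrants amount to a simple rule for each day. A minimal sketch, using a hypothetical `classify` function, that follows the definitions above: exceeding the standard means 235 CFU/100 mL or more, and a predicted exceedance means a probability equal to or above the threshold. Probabilities and thresholds are written as fractions here (0.30 for 30 percent).

```python
def classify(ecoli_cfu: float, probability: float, threshold: float,
             standard: float = 235.0) -> str:
    """Assign one day's result to a quadrant of the threshold plot."""
    exceeded = ecoli_cfu >= standard      # measured at or above the standard
    predicted = probability >= threshold  # model probability at or above threshold
    if exceeded and predicted:
        return "correct above"
    if exceeded:
        return "false negative"
    if predicted:
        return "false positive"
    return "correct below"
```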
If one were to raise or lower the horizontal
line, the numbers of correct and incorrect responses would change. For
example, a threshold probability of 33 percent would have produced the highest
number of correct responses (334) but also a high number of false negatives
(36). False negative responses should be reduced whenever possible, because on
those days the recreational water quality was judged acceptable when in fact the
standard was exceeded. By contrast, selecting a threshold of 30 percent maintains
a high number of correct responses (332), reduces the false negatives (33), and
represents a compromise between false negative and false positive responses.
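Choosing a threshold amounts to moving the horizontal line and recounting. The hypothetical `sweep_thresholds` function below tallies correct responses, false positives, and false negatives for each candidate threshold, given measured E. coli concentrations and model probabilities. The demo data are made up solely to show the trade-off and are not Huntington observations.

```python
from typing import Dict, Sequence, Tuple


def sweep_thresholds(data: Sequence[Tuple[float, float]],
                     thresholds: Sequence[float],
                     standard: float = 235.0) -> Dict[float, Tuple[int, int, int]]:
    """For each threshold, count (correct, false positives, false negatives).

    data: (measured E. coli in CFU/100 mL, model probability) pairs.
    An exceedance is predicted when probability >= threshold.
    """
    results = {}
    for t in thresholds:
        correct = fp = fn = 0
        for ecoli, prob in data:
            exceeded, predicted = ecoli >= standard, prob >= t
            if exceeded == predicted:
                correct += 1
            elif predicted:
                fp += 1
            else:
                fn += 1
        results[t] = (correct, fp, fn)
    return results


# Made-up illustrative data, not Huntington observations.
demo = [(100, 0.10), (100, 0.32), (300, 0.31), (300, 0.50), (50, 0.20)]
print(sweep_thresholds(demo, [0.30, 0.33]))
# {0.3: (4, 1, 0), 0.33: (4, 0, 1)}
```

Raising the threshold from 0.30 to 0.33 trades a false positive for a false negative in this toy case. A manager could then pick, from such a table, the threshold that keeps correct responses high while holding false negatives down.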
Next steps
The model development and testing steps described above were refined and used to develop models for Huntington and Edgewater for use in the nowcast during 2008. A full report on development and testing is available.
Predictive modeling is a dynamic
process meant to augment existing beach-monitoring programs, not replace
them. Models should be continuously validated and refined to improve
predictions.
References
Anderson, C.W., and Wilde, F.D., eds., September 2005, Turbidity (version 2.1): U.S. Geological Survey Techniques of Water-Resources Investigations, book 9, chap. A6, section 6.7, accessed March 2006 at http://water.usgs.gov/owq/FieldManual/Chapter6/6.7_contents.html.
Francy, D.S., and Darner, R.A., 1998, Factors affecting Escherichia coli concentrations at Lake Erie public bathing beaches: U.S. Geological Survey Water-Resources Investigations Report 98-4241, 41 p.
Francy, D.S., and Darner, R.A., 2002, Forecasting bacteria levels at bathing beaches in Ohio: U.S. Geological Survey Fact Sheet FS-132-02, 4 p., available at http://oh.water.usgs.gov/reports/fs-132-02.pdf.
Francy, D.S., Gifford, A.M., and Darner, R.A., 2003, Escherichia coli at Ohio bathing beaches—distribution, sources, wastewater indicators, and predictive modeling: U.S. Geological Survey Water-Resources Investigations Report 02-4285, 120 p., available at http://oh.water.usgs.gov/reports/Abstracts/wrir02-4285.html.
Francy, D.S., and Darner, R.A., 2006,
Procedures for developing models to predict exceedance of recreational
water-quality standards at coastal beaches: U.S. Geological Survey Techniques
and Methods 6-B5, 34 p., available at
http://pubs.usgs.gov/tm/2006/tm6b5/
Francy, D.S., Darner, R.A., and Bertke, E.E.,
2006, Models for predicting recreational water quality at Lake Erie beaches:
U.S. Geological Survey Scientific Investigations Report 2006-5192, 13 p.,
available at http://pubs.usgs.gov/sir/2006/5192/
Helsel, D.R., and Hirsch, R.M., 2002, Statistical methods in water resources: U.S. Geological Survey Techniques of Water-Resources Investigations, book 4, chap. A3, accessed March 2006 at http://pubs.er.usgs.gov/pubs/twri/twri04A3.
Myers, D.N., and Wilde, F.D., eds., 2003, Biological indicators (3d ed.): U.S. Geological Survey Techniques of Water-Resources Investigations, book 9, chap. A7, accessed March 2006 at http://pubs.water.usgs.gov/twri9A7/.
U.S. Environmental Protection Agency, 2002, Method 1603—Escherichia
coli in water by membrane filtration using modified membrane-thermotolerant
Escherichia coli agar: Washington, D.C., EPA 821-R-02-23, 9 p.