Project Information

Swim advisories are issued by beach managers on the basis of standards for concentrations of bacterial indicators: Escherichia coli (E. coli) or enterococci for fresh waters and enterococci for marine waters. The analytical methods for these organisms, however, take at least 18–24 hours to complete. Recreational water-quality conditions may change during this time, leading to erroneous assessments of public-health risk. As a result, some agencies have turned to predictive modeling to obtain near-real-time estimates of recreational water quality.

In an earlier study (Francy and others, 2006), predictive models were developed for five Ohio Lake Erie beaches: Huntington (Bay Village), Edgewater and Villa Angela (Cleveland), Lakeview (Lorain), and Lakeshore (Ashtabula). The best model for each beach was based on a unique combination of variables that explained changes in E. coli concentrations. These variables included turbidity (water clarity), rainfall, wave height, water temperature, day of the year, and lake level. Although daily monitoring continues at Lakeview, predictive model development there was suspended in 2006.

Work on predictive model development and testing continued during 2007 at four Ohio Lake Erie beaches. At Huntington, predictions based on the Huntington model were available to the public through an Internet-based nowcasting system that began operating in 2006. At Edgewater, Villa Angela, and Lakeshore, models were tested in 2007 for possible use in future nowcasts. The Edgewater model provided more correct responses and fewer false positive and false negative predictions than did the previous day's E. coli concentration (the current method). Edgewater was added to the nowcast in 2008. At Villa Angela and Lakeshore, the model provided fewer correct responses than the current method; work on predictive model development at these two beaches was, therefore, suspended in 2007. A full report on 2007 testing is available.


Quick links to the following topics...

Why is this work being done?
Who are the partners in these studies?
How are data collected for the nowcast?
How are samples analyzed for the nowcast?
How is the nowcast determined?
Next steps


Why is this work being done?

People may risk illness from exposure to disease-causing microorganisms (pathogens) at recreational beaches. Because monitoring for pathogens is difficult and expensive, beach advisories are issued on the basis of standards for concentrations of indicator bacteria, such as Escherichia coli (E. coli). Indicator bacteria do not necessarily cause disease, but they are present in feces and therefore indicate the possible presence of disease-causing organisms. For Ohio, the standard used to assess bathing-water quality is the single-sample maximum for E. coli of 235 colony-forming units per 100 milliliters (CFU/100 mL). To determine concentrations of E. coli, however, the bacteria must be cultured, which takes at least 18 hours. As a result, beach managers use the previous day's E. coli results to evaluate current beach conditions. This delay may lead to an erroneous evaluation of current conditions, because water quality may change overnight.
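The previous-day approach described above amounts to a persistence rule: today's advisory decision uses yesterday's culture result. The following is a minimal sketch of that rule. The function names and the daily concentrations are hypothetical and serve only as an illustration, not as any agency's actual procedure.

```python
# Ohio single-sample maximum for E. coli, in CFU/100 mL (from the text above).
STANDARD_CFU_PER_100ML = 235


def exceeds_standard(concentration):
    """True if a measured E. coli concentration is at or above the standard."""
    return concentration >= STANDARD_CFU_PER_100ML


def previous_day_decisions(daily_concentrations):
    """Advisory decision for each day, based only on the prior day's result.

    Because culturing takes at least 18 hours, the decision for day i can
    use only the concentration measured on day i-1. The first day has no
    prior result, so its decision is None.
    """
    decisions = [None]
    for yesterday in daily_concentrations[:-1]:
        decisions.append(exceeds_standard(yesterday))
    return decisions


# Hypothetical measured concentrations (CFU/100 mL) on four consecutive days.
print(previous_day_decisions([120, 410, 90, 60]))
```

With this made-up series, the rule misses the exceedance on day 2 (a false negative). It also posts an advisory on day 3, when the water actually met the standard (a false positive). These are the kinds of errors the delay can cause.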

Agencies that monitor beaches need tools that can provide quick, reliable indicators of recreational water quality. Real-time forecasting using mathematical models may help resolve the notification delays inherent in the present approach. Mathematical models use easily measured environmental and water-quality variables (“explanatory variables”), such as wave height and rainfall, to estimate E. coli concentrations or the probability of exceeding 235 CFU/100 mL of E. coli. This method provides a “nowcast” of recreational water quality. A nowcast is similar to a weather forecast, except that current conditions, rather than future conditions, are estimated.

Predictive modeling is a dynamic process: that is, models should be continuously validated and refined to improve predictions and better protect public health.


Who are the partners in these studies?

Nowcasts are being developed as part of several projects that are partnerships between the U.S. Geological Survey (USGS) and other federal, state, and local agencies. Partners include Ashtabula Township Park Commission, Cuyahoga County Board of Health (CCBH), Northeast Ohio Regional Sewer District (NEORSD), Ohio Department of Health, Ohio Department of Natural Resources (ODNR), and the U.S. Environmental Protection Agency. Funding and (or) services are provided by the partners, the Ohio Lake Erie Office, and the Ohio Water Development Authority.


How are data collected for the nowcast?

Daily data are collected during the recreational season in Ohio (mid-May through early September). These data include analysis of water samples for E. coli and measurement of explanatory variables for model development and testing. During 2008, data will be collected at Huntington and Edgewater 7 days per week for the nowcast.

Samples are collected between 7 and 9 a.m. where the water is 2–3 feet deep in areas of the beach used for swimming. All water-sample bottles are filled about 1 foot below the water surface using a grab-sampling technique (Myers and Wilde, 2003). Water samples are kept on ice and analyzed for concentrations of E. coli and turbidity at local laboratories within 6 hours of collection.

During the data-collection period, field personnel collect or compile data for environmental and water-quality variables expected to affect E. coli concentrations.

  • Bird counts. Manual counts are made of the number of birds on the beach upon arrival.
  • Wave heights. Wave-height data are collected by several methods, and the most precise measurement available is used for predictive models. The methods, from lowest to highest precision, are as follows:
    • At the time of sample collection, wave heights are estimated by the field technician and placed into four categories based on minimum and maximum heights in each wave train: (1) 0 to 2 feet, (2) 1 to 3 feet, (3) 2 to 4 feet, (4) > 3 to 5 feet.
    • At the time of sample collection, wave heights are measured by use of a graduated stick. The stick is placed at the sampling location, and minimum and maximum water heights are noted over the course of one minute.
    • Hourly wave heights are obtained from a wave-height buoy placed just outside the swimming area at Edgewater, Cleveland, Ohio. The buoy is equipped with instrumentation to measure wave heights and to store and transmit the data.


  • Water temperatures. Water temperature is measured at the sampling location using an alcohol-filled thermometer.
  • Lake levels. Lake-level data are obtained from the National Oceanic and Atmospheric Administration (NOAA) station in Cleveland (NOAA ID 9063053). 
  • Site-specific rainfall data. Site-specific rainfall data for Huntington and Edgewater are obtained from the National Weather Service (NWS) station at Hopkins International Airport.
  • Radar rainfall data. To obtain rainfall data over a wider area, radar data are obtained from the NWS. The data are provided for 4-kilometer grid cells for each hour of the day. For the area around Edgewater, data from 2 grid cells are used to estimate rainfall amounts in the past 48 hours. For the area around Huntington, data from 6 grid cells are used to estimate rainfall amounts in the past 24 hours (“radar6cell”).
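The text does not say how the hourly values from several grid cells are combined into one rainfall estimate. The sketch below assumes that each cell's hourly amounts are summed over the window and that the cell totals are then averaged. That averaging step is an assumption, and the function name is hypothetical.

```python
def radar_rainfall(hourly_by_cell, hours):
    """Estimate area rainfall (inches) over the most recent `hours` hours.

    hourly_by_cell: one list per 4-km grid cell, each holding hourly rainfall
    amounts in chronological order (most recent last).
    ASSUMPTION: cell totals are averaged; the text says only that 2 or 6
    cells are "used".
    """
    if not hourly_by_cell:
        raise ValueError("need at least one grid cell")
    totals = []
    for cell in hourly_by_cell:
        if len(cell) < hours:
            raise ValueError("not enough hourly values for the requested window")
        totals.append(sum(cell[-hours:]))  # total for this cell over the window
    return sum(totals) / len(totals)       # mean across cells (assumed)


# Hypothetical example: two cells, 3 hourly values each, 2-hour window.
print(radar_rainfall([[0.1, 0.2, 0.3], [0.0, 0.1, 0.1]], hours=2))
```

For a Huntington-style "radar6cell" value, one would pass six cell lists and `hours=24`. For Edgewater, one would pass two cells and `hours=48`.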


A weighted rainfall variable was calculated and used in predictive model development. Rainfall weighted 48 hours (Rw48) combines the rainfall from the 48 hours before sampling but gives twice as much weight to the most recent 24 hours, as follows:

$$R_{w48} = 2R_{d-1} + R_{d-2}$$

where Rd-1 is the amount of rain, in inches, that fell in the 24-hour period (9 a.m. to 9 a.m.) preceding the morning sampling, and Rd-2 is the amount of rain that fell in the 24-hour period immediately before that.
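The weighting can be checked with a one-line function. The following is a minimal sketch, and the function and argument names are hypothetical. It shows that the same 0.75 inch of rain produces a larger Rw48 when it falls in the most recent 24 hours.

```python
def rainfall_weighted_48(rain_day_minus_1, rain_day_minus_2):
    """Rw48 = 2 * Rd-1 + Rd-2, with rainfall amounts in inches.

    rain_day_minus_1: rain in the 24 hours (9 a.m. to 9 a.m.) before sampling.
    rain_day_minus_2: rain in the 24-hour period immediately before that.
    """
    return 2 * rain_day_minus_1 + rain_day_minus_2


print(rainfall_weighted_48(0.75, 0.0))  # all rain in the most recent 24 hours
print(rainfall_weighted_48(0.0, 0.75))  # same rain, one day earlier
```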


How are samples analyzed for the nowcast?

Samples are analyzed by use of the modified mTEC membrane-filtration method (U.S. Environmental Protection Agency, 2002). Membrane-filtration equipment and supplies include a manifold and vacuum pump, filter funnel and base, membrane filters, a graduated cylinder and pipets to measure sample volumes, forceps to handle the filter, an alcohol lamp to flame forceps, sterile buffered water to rinse the filter funnel, and agar plates to grow the bacteria.

Several volumes of sample water are each filtered through a separate membrane filter, with the goal of obtaining 20–80 colonies on at least one of the agar plates. Usually, sample volumes of 100, 30, 10, 3, and 1 mL are plated. If the water is suspected to have E. coli concentrations in the thousands of colony-forming units per 100 milliliters, serial dilutions of the sample are made.
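This sketch shows why several volumes are plated. It assumes the 20–80 colony target above and the usual membrane-filtration conversion (colonies × 100 ÷ volume filtered). Under those assumptions, each plated volume can quantify only a limited band of concentrations. The helper names are hypothetical.

```python
# Target colony range per plate, from the text above.
MIN_COLONIES, MAX_COLONIES = 20, 80


def quantifiable_range(sample_volume_ml):
    """Concentrations (CFU/100 mL) that would put 20-80 colonies on one plate."""
    low = MIN_COLONIES * 100 / sample_volume_ml
    high = MAX_COLONIES * 100 / sample_volume_ml
    return low, high


def sample_volume_from_dilution(volume_plated_ml, dilution_factor):
    """Volume of original sample on a plate made from a diluted aliquot.

    For example, 1 mL of a 1:10 dilution contains 0.1 mL of sample.
    """
    return volume_plated_ml / dilution_factor


for volume in (100, 30, 10, 3, 1):
    low, high = quantifiable_range(volume)
    print(f"{volume:>5} mL plated: about {low:,.0f} to {high:,.0f} CFU/100 mL")

# A 1:10 dilution extends the upper end by another factor of 10.
low, high = quantifiable_range(sample_volume_from_dilution(1, 10))
print(f"1 mL of a 1:10 dilution: about {low:,.0f} to {high:,.0f} CFU/100 mL")
```

A 100-mL plate covers roughly 20 to 80 CFU/100 mL, and a 1-mL plate covers roughly 2,000 to 8,000 CFU/100 mL. Because the bands overlap, the standard set of volumes spans a continuous range. Serial dilutions are needed only when concentrations may be higher than that range.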


The bacteria are concentrated on a filter by use of the manifold, vacuum pump, and filter funnel and base. The filter is placed on an agar plate and incubated at 35°C for 2 hours and then at 44.5°C for an additional 20–22 hours. After incubation, the plates are removed, and the magenta colonies on each membrane are counted as E. coli. Results are reported as colony-forming units per 100 milliliters (CFU/100 mL).
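Converting a colony count to CFU/100 mL uses the ratio colonies ÷ volume filtered × 100, computed from a plate in the 20–80 colony range. The following is a minimal sketch. Pooling several in-range plates (total colonies over total volume) is an assumption made for this sketch, and the EPA method's own reporting rules for out-of-range plates are not reproduced here. The function name is hypothetical.

```python
def cfu_per_100ml(plates, min_colonies=20, max_colonies=80):
    """Estimate E. coli concentration (CFU/100 mL) from membrane-filter plates.

    plates: list of (sample_volume_ml, colony_count) pairs, where the volume
    is the original sample volume on that filter (after any dilution).
    ASSUMPTION: when more than one plate is in range, plates are pooled as
    total colonies * 100 / total volume.
    """
    in_range = [(v, c) for v, c in plates if min_colonies <= c <= max_colonies]
    if not in_range:
        raise ValueError("no plate in the 20-80 colony range")
    total_volume = sum(v for v, _ in in_range)
    total_colonies = sum(c for _, c in in_range)
    return total_colonies * 100 / total_volume


# Hypothetical counts for the usual 100, 30, 10, 3, and 1 mL volumes:
# only the 10-mL plate (31 colonies) falls in the target range.
print(cfu_per_100ml([(100, 310), (30, 95), (10, 31), (3, 8), (1, 2)]))
```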


Turbidity measures the scattering effect that suspended solids have on light: the higher the intensity of scattered light, the higher the turbidity. Turbidity can make water look cloudy or muddy. Contributors to turbidity include clay, silt, finely divided organic matter, plankton, microscopic organisms, and dyes (Anderson, 2005). Turbidity is measured in water samples with a turbidimeter and reported in nephelometric turbidity units (NTU). A second type of turbidity measurement is being tested at Edgewater during 2008, using an in situ probe attached to the Edgewater buoy.


How is the nowcast determined?

The steps in developing the nowcast are shown below, using data collected at Huntington in 2000–07. The steps are defining the performance of the model in the current year, identifying explanatory variables, developing a predictive model, and providing output from the model and determining the threshold. More information on developing predictive models can be found in a USGS online how-to report (Francy and Darner, 2006).

Defining the performance of the model in the current year

How did the nowcast system perform in 2006 at Huntington? Nowcasts were provided to the public for 85 days during the 2006 recreational season. The nowcast provided a correct response 80 percent of the time. False positive responses were provided 10 percent of the time. That is, the nowcast incorrectly predicted that the standard was exceeded on 6 of the 59 days when the standard was actually NOT exceeded. The false negative rate was higher: 42 percent. That is, the nowcast incorrectly predicted that the standard would NOT be exceeded on 11 of the 26 days when the standard was actually exceeded.
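The three percentages above follow directly from the day counts in the paragraph. In this minimal sketch, the false positive rate is the fraction of non-exceedance days predicted as exceedances. The false negative rate is the fraction of exceedance days predicted as non-exceedances, matching the paragraph's definitions. The function name and result keys are hypothetical.

```python
def performance(correct_below, false_positive, correct_above, false_negative):
    """Summarize nowcast performance (percent) from counts of days per outcome."""
    not_exceeded = correct_below + false_positive   # days the standard was NOT exceeded
    exceeded = correct_above + false_negative       # days the standard was exceeded
    total = not_exceeded + exceeded
    return {
        "percent_correct": 100 * (correct_below + correct_above) / total,
        "false_positive_rate": 100 * false_positive / not_exceeded,
        "false_negative_rate": 100 * false_negative / exceeded,
    }


# Huntington, 2006 (counts from the text above): 6 of 59 non-exceedance days
# and 11 of 26 exceedance days were predicted incorrectly.
result = performance(correct_below=59 - 6, false_positive=6,
                     correct_above=26 - 11, false_negative=11)
for name, value in result.items():
    print(f"{name}: {value:.0f} percent")
```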

Although the false negative rate for the nowcast is higher than we would like, the nowcast still provides more accurate information and better estimates of public-health risk than the previous day's E. coli concentration (the current method used by most beach managers). During 2006, the previous day's E. coli concentration provided only 57 percent correct responses, with false positive and false negative rates of 30 and 72 percent, respectively.

Identifying explanatory variables

The first step in developing predictive models is to identify explanatory variables related to E. coli. This was done by making scatterplots of the continuous data, with the measured variable on the x-axis and the E. coli concentration on the y-axis. Continuous data are measured data that can take any value within a range. A statistical test, correlation analysis, was done to provide a quantitative measure of the relation between each variable and E. coli. The result from correlation analysis is Pearson's correlation coefficient (r), which measures the linear (straight-line) association between the variable and E. coli concentrations. If the data lie exactly along a straight line with positive slope, r is equal to 1 (Helsel and Hirsch, 2002, p. 209). If the data lie exactly along a straight line with negative slope, r is equal to -1. The closer r is to zero, the weaker the linear relation. Correlation coefficients were considered statistically significant if the p-value (level of significance) was less than 0.05. Suppose there were no true relation between the variable and E. coli. A p-value below 0.05 means there would then be less than a 5-percent chance of obtaining a correlation at least as strong as the one observed.
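Pearson's r can be computed directly from its definition. The following is a minimal sketch with hypothetical function names and made-up data. It also returns the t statistic commonly used to test r against zero, with n - 2 degrees of freedom. Turning that statistic into a p-value needs the t distribution, which the Python standard library does not provide. A package such as SciPy (`scipy.stats.pearsonr`) returns r and the p-value together.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (n >= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom)."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical paired observations (e.g., a variable and E. coli values).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.1, 4.8, 5.5]
r = pearson_r(x, y)
print(f"r = {r:.2f}, t = {t_statistic(r, len(x)):.2f}")
```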

One example of a related variable at Huntington is turbidity. During 2000–06, as E. coli concentrations increased, turbidity also increased (r = 0.48, p < 0.0001). However, E. coli concentrations varied considerably at any given level of turbidity. One reason for this scatter is that turbidity does not explain all of the variability in E. coli concentrations; other explanatory variables are needed to explain that variability more fully. Rainfall weighted 48 hours (Rw48; r = 0.39, p < 0.0001), wave height (r = 0.45, p < 0.0001), radar6cell (r = 0.60, p < 0.0001), and day of the year (r = 0.15, p = 0.0017) were also significantly related to E. coli at Huntington during 2000–06, although day of the year was only weakly related.

Figure: Scatterplot of turbidity versus E. coli, Huntington, Bay Village, Ohio, 2000–2006.
Figure: Scatterplot of rainfall weighted 48 versus E. coli, Huntington, Bay Village, Ohio, 2000–2006.
Figure: Scatterplot of day of the year versus E. coli, Huntington, Bay Village, Ohio, 2000–2005.

Developing a predictive model

Different combinations of variables related to E. coli were tested by use of a statistical approach called multiple linear regression (MLR). In MLR, a unique set of variables is used to develop a model that best explains the variation in E. coli concentrations, leaving as little variation as possible to unexplained “noise.” At Huntington, using data collected during 2000–06, the best MLR model included wave height, turbidity, Rw48, radar6cell, and day of the year. The model explained 43 percent of the variability in E. coli concentrations. This value is called the R² (coefficient of determination) of the model.
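The following is a minimal sketch of fitting an MLR model by ordinary least squares and computing its R² as 1 minus (residual sum of squares ÷ total sum of squares). It uses only the Python standard library. It solves the normal equations by Gaussian elimination, a deliberately simple choice. In practice, a tested routine such as `numpy.linalg.lstsq` or a statistics package would be preferable. The function names and the toy data are hypothetical. Models like these often use the base-10 logarithm of E. coli concentration as the response, but that is an assumption here, not something stated in the text above.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; check for collinear variables")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(rows, y):
    """Least-squares fit of y on the columns of `rows`, with an intercept.

    rows: list of observations, each a list of explanatory-variable values.
    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # must be > 0
    return b, 1 - ss_res / ss_tot


# Toy data generated from y = 1 + 2*x1 + 3*x2, so the fit is exact (R² = 1).
coefficients, r_squared = fit_mlr([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]],
                                  [1, 3, 4, 6, 8])
print(coefficients, r_squared)
```

With real beach data, the rows would hold values such as wave height, turbidity, Rw48, radar6cell, and day of the year. R² would then fall well below 1, as with the 43 percent reported above.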

Providing output from the model and determining the threshold

Two types of output values were produced by the models. The first and simpler output was the predicted E. coli concentration. Earlier studies showed that the potential error in the predicted E. coli concentration was fairly wide (Francy and Darner, 1998; Francy and Darner, 2002; Francy and others, 2003). Therefore, a second output variable was developed to provide a more accurate prediction of recreational water quality: the probability of exceeding the Ohio single-sample bathing-water standard for E. coli of 235 CFU/100 mL. This approach produces estimated probabilities similar to those in a weather forecast.
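The text does not give the formula used to convert a model prediction into a probability of exceedance. One common approach assumes that the model's errors in log10 concentration are normally distributed. Under that assumption, it computes the chance that the true value lies at or above log10(235). The sketch below illustrates that assumed approach; it is not necessarily the method used for these models. Logistic regression is another way to produce such probabilities. The function name is hypothetical.

```python
import math

STANDARD = 235  # Ohio single-sample maximum, CFU/100 mL


def probability_of_exceeding(predicted_log10, residual_sd, standard=STANDARD):
    """P(concentration >= standard), ASSUMING normal errors in log10 space.

    predicted_log10: model prediction of log10 E. coli concentration.
    residual_sd: standard deviation of model residuals, in log10 units.
    Returns a fraction between 0 and 1 (multiply by 100 for percent).
    """
    z = (math.log10(standard) - predicted_log10) / residual_sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # equals 1 - standard normal CDF(z)


# Hypothetical prediction of log10 E. coli = 2.6 (about 400 CFU/100 mL)
# with a residual standard deviation of 0.5 log10 units.
print(f"{100 * probability_of_exceeding(2.6, 0.5):.0f} percent")
```

A prediction exactly at the standard gives a probability of 0.5. The wider the residual spread, the closer every probability moves toward 0.5, which is one reason a fairly wide prediction error makes a single predicted concentration hard to act on.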

The results from the model can be used daily by beach managers and the public. For the model to be useful, the probability associated with too great a risk to allow swimming must be determined. This is the “threshold probability.” Probabilities less than the threshold probability indicate that bacterial water quality that day is most likely acceptable. In that case, the beach manager would not issue an advisory, and beachgoers could be fairly confident that the water is safe for swimming. Probabilities equal to or above the threshold probability indicate that water quality is most likely not acceptable and that a water-quality advisory may be needed.


The threshold probability for the Huntington model was established by determining the lowest probability that produced the most correct responses and fewest false negative responses. This was done by plotting the Huntington 2000–06 data used to develop the model. Because these data have been examined retrospectively, each point on the graph represents the actual E. coli concentration determined by culturing the sample (x-axis) and the associated computed probability based on the model (y-axis). The plot is divided into four quadrants by a vertical line through 235 CFU/100 mL on the x-axis and a horizontal line through the threshold probability on the y-axis. The four quadrants are:

  1. Correct below the standard. E. coli concentrations were less than 235 CFU/100 mL, and the predicted probabilities were below the threshold.
  2. False positive. E. coli concentrations were less than 235 CFU/100 mL, but the predicted probabilities were at or above the threshold.
  3. Correct above the standard. E. coli concentrations were equal to or greater than 235 CFU/100 mL, and the predicted probabilities were at or above the threshold.
  4. False negative. E. coli concentrations were equal to or greater than 235 CFU/100 mL, but the predicted probabilities were below the threshold.
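The four quadrants translate into a simple classification of each day. In this minimal sketch, probabilities and thresholds are expressed in percent, like the thresholds of 30 and 33 discussed in the next paragraph. A probability equal to the threshold counts as a prediction of exceedance, matching the decision rule stated earlier. The function name is hypothetical.

```python
STANDARD = 235  # Ohio single-sample maximum, CFU/100 mL


def classify(concentration, probability, threshold):
    """Place one day in one of the four quadrants.

    concentration: measured E. coli, CFU/100 mL (x-axis of the plot).
    probability, threshold: computed probability and threshold, in percent.
    """
    exceeded = concentration >= STANDARD
    predicted = probability >= threshold
    if predicted and exceeded:
        return "correct above the standard"
    if predicted:
        return "false positive"
    if exceeded:
        return "false negative"
    return "correct below the standard"


print(classify(concentration=500, probability=12, threshold=30))
```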

Raising or lowering the horizontal line changes the number of correct and incorrect responses. For example, a threshold of 33 percent would have produced the highest number of correct responses (334) but also a high number of false negatives (36). False negative responses should be reduced whenever possible, because they indicate that recreational water quality is acceptable when in fact the standard was exceeded. Selecting a threshold of 30 percent instead maintains a high number of correct responses (332) yet reduces the false negatives (33). This choice represents a compromise between false negative and false positive responses.
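The trade-off can be explored by recounting outcomes at several candidate thresholds. This minimal sketch uses six made-up days, not the Huntington data. Probabilities and thresholds are in percent, and the function names are hypothetical.

```python
from collections import Counter


def tally(days, threshold, standard=235):
    """Count outcomes for (concentration, probability_percent) pairs."""
    counts = Counter()
    for concentration, probability in days:
        exceeded = concentration >= standard
        predicted = probability >= threshold
        if exceeded == predicted:
            counts["correct"] += 1
        elif predicted:
            counts["false positive"] += 1
        else:
            counts["false negative"] += 1
    return counts


def sweep(days, thresholds):
    """Outcome counts at each candidate threshold."""
    return {t: tally(days, t) for t in thresholds}


# Hypothetical (measured CFU/100 mL, computed probability in percent) pairs.
days = [(100, 10), (150, 31), (300, 32), (400, 60), (80, 5), (250, 29)]
for threshold, counts in sweep(days, [30, 33]).items():
    print(threshold, dict(counts))
```

On these hypothetical days, thresholds of 30 and 33 give the same number of correct responses. However, 33 produces one more false negative, a small-scale version of the compromise described above.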


Next steps

The model development and testing steps described above were refined and used to develop models for Huntington and Edgewater for the 2008 nowcast. A full report on model development and testing is available.

Predictive modeling is a dynamic process meant to augment existing beach-monitoring programs, not replace them. Models should be continuously validated and refined to improve predictions.


References

Anderson, C.W., and Wilde, F.D., eds., September 2005, Turbidity (version 2.1): U.S. Geological Survey Techniques of Water-Resources Investigations, book 9, chap. A6, sec. 6.7, accessed March 2006 at http://water.usgs.gov/owq/FieldManual/Chapter6/6.7_contents.html.

Francy, D.S., and Darner, R.A., 1998, Factors affecting Escherichia coli concentrations at Lake Erie public bathing beaches: U.S. Geological Survey Water-Resources Investigations Report 98-4241, 41 p.

Francy, D.S., and Darner, R.A., 2002, Forecasting bacteria levels at bathing beaches in Ohio: U.S. Geological Survey Fact Sheet FS-132-02, 4 p., available at http://oh.water.usgs.gov/reports/fs-132-02.pdf.

Francy, D.S., Gifford, A.M., and Darner, R.A., 2003, Escherichia coli at Ohio bathing beaches: distribution, sources, wastewater indicators, and predictive modeling: U.S. Geological Survey Water-Resources Investigations Report 02-4285, 120 p., available at http://oh.water.usgs.gov/reports/Abstracts/wrir02-4285.html.

Francy, D.S., and Darner, R.A., 2006, Procedures for developing models to predict exceedance of recreational water-quality standards at coastal beaches: U.S. Geological Survey Techniques and Methods 6-B5, 34 p., available at http://pubs.usgs.gov/tm/2006/tm6b5/.

Francy, D.S., Darner, R.A., and Bertke, E.E., 2006, Models for predicting recreational water quality at Lake Erie beaches: U.S. Geological Survey Scientific Investigations Report 2006-5192, 13 p., available at http://pubs.usgs.gov/sir/2006/5192/.

Helsel, D.R., and Hirsch, R.M., 2002, Statistical methods in water resources: U.S. Geological Survey Techniques of Water-Resources Investigations, book 4, chap. A3, accessed March 2006 at http://pubs.er.usgs.gov/pubs/twri/twri04A3.

Myers, D.N., and Wilde, F.D., eds., 2003, Biological indicators (3d ed.): U.S. Geological Survey Techniques of Water-Resources Investigations, book 9, chap. A7, accessed March 2006 at http://pubs.water.usgs.gov/twri9A7/.

U.S. Environmental Protection Agency, 2002, Method 1603: Escherichia coli in water by membrane filtration using modified membrane-thermotolerant Escherichia coli agar: Washington, D.C., EPA 821-R-02-23, 9 p.

Partners

  • Cuyahoga County Board of Health
  • Ohio Department of Health
  • Cleveland Metroparks
  • U.S. Geological Survey
  • Ohio Water Development Authority
  • Northeast Ohio Regional Sewer District
  • Cuyahoga County Sanitary Engineer
  • Cleveland Lakefront State Park
  • Ohio Lake Erie Commission

We would like to thank Steve Lawrence with the USGS Georgia Water Science Center for help with design suggestions and images.

The URL for this page is http://www.ohionowcast.info/ohionowcastmore.htm

For comments or changes regarding this Web page, please contact

Donna Francy
USGS Ohio Water Science Center
6480 Doubletree Avenue
Columbus, OH 43229
(614) 430-7769

Nowcast data are updated daily; last Web page update: July 2, 2008.