Water quality advisories issued with 3 times more accuracy than previous model

Advisories protect beach visitors from contracting illnesses from waterborne pathogens.

All 26 miles of Chicago's shoreline along Lake Michigan belong to the public. Over twenty public beaches are open to 60 million annual visitors and residents. During the summer months, the cool waters of Lake Michigan provide a respite from the heat.

Most of the time, beach water quality is acceptable. But sometimes bacteria levels rise to the point where swimmers risk contracting an infection and developing flu-like symptoms.

Traditional testing methods do not return results quickly enough to provide real-time water quality notifications, or "beach advisories." Rapid DNA testing methods are available, but are costly when used daily at all beaches. Predictive modeling techniques have been developed, but do not yet reliably provide accurate results.

The City partnered with the Chicago Park District, volunteer data scientists, and students at local universities to build a better predictive model for forecasting beach water quality. In the process, the team developed an innovative new approach to water quality modeling.

Innovative Science

Prior efforts to forecast beach water quality in Chicago and elsewhere have adopted similar approaches. Researchers would collect meteorological data near a swimming site, and then predict contamination levels in the water. However, this methodology can be unreliable and often does not identify days with high E. coli levels.

This project developed a new approach that takes advantage of rapid DNA testing and the lessons learned from data exploration. That approach begins by acknowledging that just five beaches account for about 56% of poor water quality beach days. These beaches, which are some of the hardest to predict, should be routinely rapid tested due to their volatility. Water quality patterns at the remaining beaches can then be separated into clusters. In the new approach, one beach from each of these clusters would be rapid tested to get an immediate result. Water quality at the remaining beaches would be predicted by the model.
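
The split between tested and predicted beaches can be sketched in a few lines of Python. This is a minimal illustration, not the project's R code: the function name `plan_testing`, the placeholder beach names, the cluster labels, and the rule of testing the alphabetically first beach in each cluster are all assumptions made for the example.

```python
from collections import defaultdict


def plan_testing(always_test, clusters):
    """Split beaches into rapid-tested and model-predicted groups.

    always_test: beaches that are rapid tested every day (the volatile ones).
    clusters: mapping of beach name -> cluster label for the remaining beaches.
    """
    # Group the remaining beaches by their cluster label.
    members = defaultdict(list)
    for beach, label in clusters.items():
        members[label].append(beach)

    tested = list(always_test)
    predicted = []
    for label in sorted(members):
        group = sorted(members[label])
        # Hypothetical rule: rapid test the alphabetically first beach in
        # each cluster and let the model predict the rest.
        tested.append(group[0])
        predicted.extend(group[1:])
    return tested, predicted


# Placeholder names only; these are not real beaches or real clusters.
always_test = ["V1", "V2", "V3", "V4", "V5"]
clusters = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 2, "G": 2, "H": 2}

tested, predicted = plan_testing(always_test, clusters)
print("Rapid tested:", tested)
print("Predicted by the model:", predicted)
```

With these placeholder inputs, eight beaches are rapid tested and five are left to the model. Which beach represents each cluster is the choice the approach leaves open, so the selection rule here is only a stand-in.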

A key feature of the new approach is its cost effectiveness. The increased cost of rapid testing would be offset because only half the beaches would be tested. Yet the approach performs better and provides more accurate notifications to the public.

Collaboration & Open Science

Clear Water is a collaborative, open source project that was developed by the City of Chicago, civic tech volunteers, and graduate students. The City of Chicago adopted approaches from open science to allow "citizen scientists" to contribute to research projects. As a result, the project can also be shared with others and improved by a network of scientists and researchers.

Volunteers donated over 1,000 hours to the project. Volunteers at Chi Hack Night developed the initial statistical model. Interns from DePaul University's Masters in Predictive Analytics helped evaluate the effectiveness of the statistical model. In addition, students from DePaul's Data Visualization course visualized how the predictive model operates.

Clear Water is also made available as an open source project. The code is written in R, an open source programming language widely known among statisticians. There is no need for expensive software licenses to view and run this code. Open sourcing the code also allows for collective advancement. The open source code contains the necessary data to let others test and try to improve upon the current analytic method.

Evaluating the Effectiveness of the Model

Data from 2006 through 2016 were used to conduct simulations. The 2017 pilot was evaluated with actual results, collected after the model was developed. The model used measurements from Rainbow, Calumet, South Shore, 63rd, and Montrose beaches to predict water quality at the 15 other beaches.

By mid-summer 2017, poor water quality conditions were detected 121 times at different beaches. The piloted model, paired with selective rapid testing, would have issued advisories for 69 of those occurrences.

Typically, the prior model would have issued advisories for only 9 out of 112 occurrences by mid-summer. The new approach so far has detected 60 more occurrences than the prior model. Further, as more data on rapid testing is accumulated, the new model should perform even better.

The new model also has better overall accuracy in predicting poor water quality. In 2017, the piloted model so far has an accuracy rate of 12%, which is 3 times higher than the prior model's accuracy rate of 4% under similar conditions. When clustering algorithms are used to optimize the selection of predictor beaches, simulations consistently show accuracy rates of over 20 percent.
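
The headline comparisons follow from simple arithmetic on the figures quoted above. The short Python sketch below reproduces them. The variable names are illustrative, and the numbers are taken directly from the text.

```python
# Poor water quality occurrences flagged by mid-summer 2017, as quoted in the text.
flagged_new = 69
flagged_prior = 9
additional_flagged = flagged_new - flagged_prior

# Accuracy rates in percent, as quoted in the text.
accuracy_new_pct = 12
accuracy_prior_pct = 4
accuracy_ratio = accuracy_new_pct / accuracy_prior_pct

print(f"Additional occurrences flagged: {additional_flagged}")
print(f"Accuracy ratio (new / prior): {accuracy_ratio:.0f}x")
```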

This new approach to beach water quality forecasting requires a choice to be made: which beaches are tested and which are predicted. You can try making your own model to see if it can outperform ours.

Get the data

The Beach Water Quality data is available on Chicago's Data Portal and is updated daily during the beach season.