Clear Water

Innovative Science

Prior efforts to forecast beach water quality in Chicago and elsewhere have taken similar approaches: researchers collect meteorological data near a swimming site and use it to predict contamination levels in the water. However, this method can be unreliable and often fails to identify days with high E. coli levels.

This project developed a new approach that takes advantage of rapid DNA testing and lessons learned from exploring the historical data. It starts from the observation that just five beaches account for about 56% of poor water quality beach days. Because these beaches are volatile and among the hardest to predict, they should be rapid tested routinely. The remaining beaches can then be grouped into clusters with similar water quality patterns. One beach from each cluster is rapid tested to get an immediate result, and the model predicts water quality at the rest.
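The selection step described above can be sketched in Python. This is a minimal illustration under assumptions that are not taken from the project: every beach has a list of historical daily E. coli readings on the same dates, a "poor" day is any reading above a hypothetical `threshold`, and the clustering is a simple greedy grouping by Pearson correlation (cutoff `min_corr`), swapped in for whatever algorithm Clear Water actually uses. The names `choose_tested_beaches`, `pearson`, and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Readings = Dict[str, List[float]]  # beach -> daily readings, same dates for every beach


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def choose_tested_beaches(
    readings: Readings,
    threshold: float,       # hypothetical exceedance level for a "poor" day
    n_volatile: int = 5,    # beaches that are always rapid tested
    min_corr: float = 0.7,  # hypothetical cutoff for joining a cluster
) -> Tuple[List[str], List[List[str]], List[str]]:
    # 1. Always test the beaches with the most poor-quality days (ties broken by name).
    poor_days = {b: sum(v > threshold for v in vals) for b, vals in readings.items()}
    volatile = sorted(readings, key=lambda b: (-poor_days[b], b))[:n_volatile]

    # 2. Greedy clustering: each unassigned beach seeds a cluster and absorbs every
    #    other unassigned beach whose correlation with the seed is >= min_corr.
    remaining = sorted(b for b in readings if b not in volatile)
    clusters: List[List[str]] = []
    while remaining:
        seed = remaining.pop(0)
        members = [seed] + [
            b for b in remaining if pearson(readings[seed], readings[b]) >= min_corr
        ]
        remaining = [b for b in remaining if b not in members]
        clusters.append(members)

    # 3. Rapid test the member with the highest mean correlation to the rest of its cluster.
    def mean_corr(b: str, members: List[str]) -> float:
        others = [m for m in members if m != b]
        if not others:
            return 0.0
        return sum(pearson(readings[b], readings[m]) for m in others) / len(others)

    representatives = [max(c, key=lambda b: mean_corr(b, c)) for c in clusters]
    return volatile, clusters, representatives


toy = {
    "A": [10, 500, 20, 800],  # frequently poor: always tested
    "B": [1, 2, 3, 4],
    "C": [2, 4, 6, 8],        # moves with B
    "D": [4, 3, 2, 1],        # moves opposite to B
}
volatile, clusters, reps = choose_tested_beaches(toy, threshold=235, n_volatile=1)
print(volatile, clusters, reps)  # ['A'] [['B', 'C'], ['D']] ['B', 'D']
```

In this sketch the rapid-tested set is `volatile + reps` and every other beach is left to the model. Greedy seeding depends on beach order, so a fuller implementation might prefer hierarchical clustering or k-medoids on a correlation distance.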

A key feature of the new approach is its cost-effectiveness. The higher cost of rapid testing would be offset because only half the beaches would be tested. Even so, the new approach performs better than the prior model and gives the public more accurate notifications.

Collaboration & Open Science

Clear Water is a collaborative, open source project that was developed by the City of Chicago, civic tech volunteers, and graduate students. The City of Chicago adopted approaches from open science to allow "citizen scientists" to contribute to research projects. As a result, the project can also be shared with others and improved by a network of scientists and researchers.

Volunteers donated over 1,000 hours to the project. Volunteers at Chi Hack Night developed the initial statistical model, interns from DePaul University's Master's in Predictive Analytics program helped evaluate its effectiveness, and students from DePaul's Data Visualization course visualized how the predictive model operates.

Clear Water's code is openly available. It is written in R, an open source programming language widely known among statisticians, so no expensive software licenses are needed to view and run it. Open sourcing the code also allows for collective advancement: the code includes the data others need to test the current analytic method and try to improve on it.

Evaluating the Effectiveness of the Model

Data from 2006 through 2016 were used to run simulations. The 2017 pilot was evaluated against actual results collected after the model was developed. The model used measurements from Rainbow, Calumet, South Shore, 63rd, and Montrose beaches to predict water quality at the 15 other beaches.
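The "test some beaches, predict the rest" step can be illustrated with a deliberately simple stand-in, not the project's actual model. For each beach that is not tested, the sketch below fits an ordinary least-squares line between that beach's log10 readings and the average log10 reading of the predictor beaches on the same historical days, then flags an advisory when today's prediction exceeds a hypothetical `threshold`. The function names, data layout, and the `threshold=235` used in the example are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


def log10_count(v: float) -> float:
    """log10 of a bacteria count, clamped at 1 so zero counts don't break the log."""
    return math.log10(max(v, 1.0))


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y ~ a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def predict_advisories(
    history: Dict[str, List[float]],  # beach -> past daily readings, same dates for all
    predictors: List[str],            # beaches rapid tested today
    today: Dict[str, float],          # predictor beach -> today's rapid-test result
    threshold: float,                 # hypothetical advisory level
) -> Dict[str, bool]:
    def mean_predictor_log(values: Dict[str, float]) -> float:
        return sum(log10_count(values[p]) for p in predictors) / len(predictors)

    n_days = len(next(iter(history.values())))
    # Average predictor signal on each historical day, and today.
    x_hist = [mean_predictor_log({p: history[p][d] for p in predictors}) for d in range(n_days)]
    x_today = mean_predictor_log(today)

    advisories: Dict[str, bool] = {}
    for beach, values in history.items():
        if beach in predictors:
            # Tested beaches use their own same-day result directly.
            advisories[beach] = today[beach] > threshold
        else:
            a, b = fit_line(x_hist, [log10_count(v) for v in values])
            advisories[beach] = a + b * x_today > math.log10(threshold)
    return advisories


history = {"P": [10, 100, 1000], "Q": [20, 200, 2000]}
print(predict_advisories(history, ["P"], {"P": 150}, threshold=235))
# {'P': False, 'Q': True}
```

Here beach Q historically reads about twice beach P, so a reading of 150 at P predicts roughly 300 at Q, which crosses the illustrative threshold even though P itself does not.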

By mid-summer 2017, poor water quality conditions had been detected 121 times at different beaches. The piloted model, paired with selective rapid testing, would have issued advisories for 69 of those occurrences.

Typically, the prior model would have issued advisories for only 9 out of 112 occurrences by mid-summer. So far, the new approach has detected 60 more occurrences than the prior model. As more rapid testing data accumulates, the new model should perform even better.
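The detection figures above can be checked with a few lines of arithmetic. The text gives different totals for the two models (121 and 112 occurrences), so each rate below is computed against its own stated total; the variable names are illustrative.

```python
# Mid-summer 2017 figures as stated in the text.
new_flagged, new_total = 69, 121     # piloted model + selective rapid testing
prior_flagged, prior_total = 9, 112  # prior model

new_rate = new_flagged / new_total      # share of poor-quality occurrences flagged
prior_rate = prior_flagged / prior_total
extra = new_flagged - prior_flagged     # additional occurrences flagged

print(f"new approach: {new_rate:.1%}")     # new approach: 57.0%
print(f"prior model:  {prior_rate:.1%}")   # prior model:  8.0%
print(f"additional occurrences: {extra}")  # additional occurrences: 60
```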

The new model also has better overall accuracy in predicting poor water quality. So far in 2017, the piloted model has an accuracy rate of 12%, three times the prior model's accuracy rate of 4% under similar conditions. Simulations that use clustering algorithms to optimize the selection of predictor beaches consistently show accuracy rates of over 20%.

This new approach to beach water quality forecasting requires a choice: which beaches to test and which to predict. You can try building your own model to see if it outperforms ours.
