The Department of Public Health and Department of Innovation and Technology have partnered to explore a combination of datasets to prioritize which establishments are more likely to yield a critical violation during an inspection. Staff from Allstate Insurance have also assisted with the research project.
After conducting interviews with the Department of Public Health's food inspection team, several data sources--ranging from 311 data to food inspections and weather--were explored. A dozen variables had substantial relationships with the likelihood of an establishment failing a food inspection.
Information about the food establishment, such as its CDPH-assigned risk level and whether the establishment had failed previous inspections, served as important predictors. Information about the establishment's community, such as its location and nearby sanitation complaints made through 311, was also related to the most severe violations.
When all of these factors were combined, the research team was able to estimate a likelihood of critical violations for each establishment, which was then used to prioritize which establishments should be inspected first.
The city's open data portal was an effective tool for this kind of collaborative research. This project was able to leverage Chicago's key data assets: its large volume of data, the transparency and size of its open data portal, and its ability and willingness to conduct research to improve city services, introduce savings, and increase engagement with Chicago-area businesses.
During this time, the Department of Public Health visited 1,637 food establishments. Almost 16 percent of them (258 establishments) yielded at least one critical violation during the experiment. Over half of those, 141 establishments or 55 percent, were found during the first month of the evaluation, whereas 117 establishments (45 percent) were found during the second month.
After all of the inspections were completed, the Department of Innovation and Technology used data to estimate the likelihood of each establishment having a critical violation. Researchers assigned a probability to each establishment using historical data, and then investigated whether these probabilities could be used to make the inspection process more efficient.
The simulation would show whether ordering inspections by these probabilities would have led inspectors to the riskier establishments first.
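The idea behind such a simulation can be illustrated with a minimal Python sketch. It is not the project's actual code: the `Inspection` record, the `found_in_first_half` helper, and the six sample establishments are all hypothetical and exist only to show how the same set of inspections can be compared under two orderings, the order in which they actually happened and the order suggested by a risk score.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Inspection:
    """Hypothetical record for one completed inspection."""
    establishment: str
    actual_order: int         # position in which it was actually inspected
    risk_score: float         # assumed model probability of a critical violation
    critical_violation: bool  # what the inspector actually found


def found_in_first_half(inspections: List[Inspection],
                        key: Callable[[Inspection], float]) -> int:
    """Count critical violations found in the first half of a given ordering."""
    ordered = sorted(inspections, key=key)
    half = len(ordered) // 2
    return sum(i.critical_violation for i in ordered[:half])


# Made-up sample data, for illustration only.
sample = [
    Inspection("A", 1, 0.10, False),
    Inspection("B", 2, 0.80, True),
    Inspection("C", 3, 0.30, False),
    Inspection("D", 4, 0.90, True),
    Inspection("E", 5, 0.20, False),
    Inspection("F", 6, 0.70, True),
]

# Business as usual: inspections in the order they actually occurred.
actual = found_in_first_half(sample, key=lambda i: i.actual_order)

# Simulated: the same inspections, highest risk score first.
simulated = found_in_first_half(sample, key=lambda i: -i.risk_score)

print(f"Found in first half, actual order:    {actual}")
print(f"Found in first half, simulated order: {simulated}")
```

Because both orderings cover exactly the same inspections, any difference in the first-half count reflects only the change in ordering, not a change in what inspectors found.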
Researchers found that food inspectors could be allocated more efficiently using the computer algorithm. During the simulation, 69 percent of the establishments with critical violations (178 establishments) were found during the first half of work, compared to 55 percent during normal operations. Over the two-month pilot, establishments with violations were found, on average, seven and a half days earlier. That is, an additional 37 establishments would have been cited for violations in the first month, rather than being discovered later, potentially after patrons became ill.
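The percentages reported for the pilot follow directly from the counts given in the text. The short Python check below reproduces that arithmetic; the variable names are chosen here for illustration, and only the four counts come from the figures above.

```python
# Counts taken from the text above.
total_inspected = 1637       # establishments visited during the pilot
with_critical = 258          # establishments with at least one critical violation
first_half_actual = 141      # found in the first half under normal operations
first_half_simulated = 178   # found in the first half in the simulation

# Share of visited establishments with a critical violation.
share_critical = with_critical / total_inspected

# Share of those violations found early, under each ordering.
share_first_actual = first_half_actual / with_critical
share_first_simulated = first_half_simulated / with_critical

# Establishments that would have been found earlier in the simulation.
additional_early = first_half_simulated - first_half_actual

print(f"Critical violation rate: {share_critical:.1%}")
print(f"Found early (actual):    {share_first_actual:.1%}")
print(f"Found early (simulated): {share_first_simulated:.1%}")
print(f"Additional found early:  {additional_early}")
```

Rounded to whole percentages, these work out to the 16, 55, and 69 percent figures quoted, and the difference between the two first-half counts is the additional 37 establishments.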
Open sourcing the code also allows for collective advancement. The open source code contains the necessary data to let others test and try to improve upon the current analytic method. The project is also documented in a reproducible format, allowing researchers to read a description of the research and view its underlying calculations. The repository contains everything that is needed to let a community of researchers refine and derive better models.