Data and code to reproduce the analysis underlying the stories and interactive map analyzing the city of Chicago's problems with lead water service lines published on Aug. 28, 2025 by Inside Climate News, Grist, and WBEZ.
acs.R This pulls socioeconomic and race/ethnicity data from the 2023 5-year American Community Survey (ACS) for census tracts in the Chicago area.
process_addresses.R This parses addresses from the 2025 Chicago water service line inventory, stored in the file 2025_inventory.xlsx in the data folder, then geocodes each address and associates it with the corresponding census tract. (Note: the pipeline also involved inspection and editing of results in OpenRefine and QGIS, spatial joins conducted in QGIS, and some manual geocoding, so it is not fully reproducible from this code.)
tract-to-cca-aggregation-income.ipynb tract-to-cca-aggregation-race.ipynb tract-to-cca-aggregation-poverty.ipynb These aggregate ACS data to the level of Chicago's 77 community areas.
process_inventory.R This combines the outputs of the previous scripts, notebooks and manual geocoding to create the map layers and the service line data used in the interactive app produced using the code in this GitHub repository.
consolidate_addresses.R This consolidates addresses from the inventory with overlapping street number ranges so that, for example, separate service lines located by the city at 11-13 E ILLINOIS ST and at 11 E ILLINOIS ST both appear in our interactive under the consolidated address 11-13 E ILLINOIS ST.
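The consolidation idea can be illustrated with a minimal Python sketch. This is not the logic in consolidate_addresses.R: the regular expression, the function names `parse_range` and `consolidate`, and the rule that any two overlapping number ranges on the same street are merged are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "<num>[-<num>] <rest of address>", e.g. "11-13 E ILLINOIS ST".
ADDRESS_RE = re.compile(r"^(\d+)(?:-(\d+))?\s+(.+)$")


def parse_range(address):
    """Return (low, high, street), or None if there is no leading street number."""
    m = ADDRESS_RE.match(address.strip())
    if m is None:
        return None
    low = int(m.group(1))
    high = int(m.group(2)) if m.group(2) else low
    return min(low, high), max(low, high), m.group(3)


def consolidate(addresses):
    """Map each input address to a consolidated address.

    Addresses on the same street whose number ranges overlap are merged
    into one label spanning the combined range.
    """
    by_street = defaultdict(list)
    result = {}
    for addr in addresses:
        parsed = parse_range(addr)
        if parsed is None:
            result[addr] = addr  # e.g. intersections: leave unchanged
            continue
        low, high, street = parsed
        by_street[street].append((low, high, addr))

    for street, items in by_street.items():
        items.sort()  # sort by low number, then high number
        groups = []  # each group is [low, high, [member addresses]]
        for low, high, addr in items:
            if groups and low <= groups[-1][1]:
                # Overlaps the current group: extend its upper bound.
                groups[-1][1] = max(groups[-1][1], high)
                groups[-1][2].append(addr)
            else:
                groups.append([low, high, [addr]])
        for low, high, members in groups:
            label = f"{low} {street}" if low == high else f"{low}-{high} {street}"
            for addr in members:
                result[addr] = label
    return result
```

Note that this sketch ignores odd/even street-number parity, so an address on the opposite side of the street that falls inside a range would also be merged. A real pipeline would need to handle that and other edge cases.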
static_maps.R Code to generate a panel of static maps, saved to the maps folder, which the partners' graphics desks subsequently edited to produce the versions used for publication.
majority-race-analysis.ipynb Code to reproduce the analysis of the percentage of service lines requiring replacement in majority Black, Latino, Asian and white census tracts.
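As a rough illustration of this kind of comparison, the Python sketch below pools service lines across tracts where a group's population share exceeds 50 percent. It is not taken from the notebook: the function name, the 50 percent threshold, and the choice to pool lines rather than average tract percentages are assumptions. The caller supplies the share columns through `share_columns`.

```python
def pct_requiring_replacement_by_majority(tracts, share_columns, threshold=50.0):
    """Pooled percentage of service lines requiring replacement per majority group.

    tracts: iterable of dicts, each with "requires_replacement", "total",
            and the population-share columns named in share_columns.
    share_columns: dict mapping a group label to its share column name.
    threshold: assumed cutoff; a tract counts as majority for a group
               when that group's share is above this value.
    """
    sums = {label: [0, 0] for label in share_columns}
    for tract in tracts:
        for label, column in share_columns.items():
            share = tract.get(column)
            if share is not None and share > threshold:
                sums[label][0] += tract["requires_replacement"]
                sums[label][1] += tract["total"]
    # None marks groups with no majority tracts (avoids dividing by zero).
    return {
        label: (100 * needs / total if total else None)
        for label, (needs, total) in sums.items()
    }
```

For example, passing `{"Black": "pct_black_nonhispanic", "white": "pct_white_nonhispanic"}` as `share_columns` would compare majority Black and majority white tracts using two of the columns in chicago_tracts_filled.geojson.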
These are in the folder processed_data:
chicago_tracts_filled.geojson Geodata for census tracts for the city of Chicago plus tracts that are not part of the city but are contained within its outer perimeter. Contains the following variables:
- geoid: Census Bureau identifier for the tract.
- pct_poverty: 2023 5-year ACS estimate for the percentage of the population below the federal poverty level.
- median_household_income: 2023 5-year ACS estimate for median household income.
- pct_black_nonhispanic: 2023 5-year ACS estimate for the percentage of the population identifying as Black alone, not Hispanic/Latino.
- pct_white_nonhispanic: 2023 5-year ACS estimate for the percentage of the population identifying as white alone, not Hispanic/Latino.
- pct_asian_nonhispanic: 2023 5-year ACS estimate for the percentage of the population identifying as Asian alone, not Hispanic/Latino.
- pct_minority: 2023 5-year ACS estimate for the percentage of the population identifying as anything other than white alone, not Hispanic/Latino.
- L: Number of service lines classified as lead; that is, at least one of the gooseneck, public water system line, or customer-side line is lead.
- GRR: Number of service lines classified as galvanized requiring replacement (GRR); that is, at least one of the gooseneck, public water system line, or customer-side line is GRR, and there is no lead component.
- U: Number of service lines classified as unknown, suspected lead; that is, at least one of the gooseneck, public water system line, or customer-side line is unknown, suspected lead, and there is no lead or GRR component.
- NL: Number of service lines classified as not lead; that is, no component is lead, GRR, or unknown, suspected lead.
- total: Total number of service lines located to the tract.
- flag: TRUE if total is less than 25. Used to apply transparency or a gray color to tracts with very few service lines on the interactive and static maps respectively.
- lead_plus_suspected: The sum of L and U.
- requires_replacement: The sum of L, U and GRR.
- pct_lead: Percentage of service lines classified as L.
- pct_grr: Percentage of service lines classified as GRR.
- pct_suspected_lead: Percentage of service lines classified as U.
- pct_lead_suspected: Percentage of service lines classified as lead_plus_suspected.
- pct_requires_replacement: Percentage of service lines classified as requires_replacement.
- pct_not_lead: Percentage of service lines classified as NL.
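The derived variables in this list follow from the four counts. The Python sketch below shows the arithmetic under two assumptions that are not confirmed by the scripts: that total equals L + GRR + U + NL, and that every percentage uses total as its denominator. The function name `summarize_counts` is hypothetical.

```python
def summarize_counts(L, GRR, U, NL, min_total=25):
    """Compute the derived tract variables from the four classification counts.

    Assumes total = L + GRR + U + NL and that percentages are taken
    over total. Percentages are None when total is zero.
    """
    total = L + GRR + U + NL
    lead_plus_suspected = L + U
    requires_replacement = L + U + GRR

    def pct(count):
        return 100 * count / total if total else None

    return {
        "total": total,
        "flag": total < min_total,  # TRUE if total is less than 25
        "lead_plus_suspected": lead_plus_suspected,
        "requires_replacement": requires_replacement,
        "pct_lead": pct(L),
        "pct_grr": pct(GRR),
        "pct_suspected_lead": pct(U),
        "pct_lead_suspected": pct(lead_plus_suspected),
        "pct_requires_replacement": pct(requires_replacement),
        "pct_not_lead": pct(NL),
    }
```

Under these assumptions, pct_requires_replacement and pct_not_lead sum to 100 for any tract with at least one service line.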
chicago_community_areas.geojson chicago_community_areas.csv Data from the service line inventory aggregated to Chicago community areas, rather than census tracts. Contains the following variables:
- community: Name of the community area.
- area_num_1: Numerical code for the community area.
- pct_poverty: 2023 5-year ACS estimate for the percentage of the population below the federal poverty level.
- median_household_income: 2023 5-year ACS estimate for median household income.
- pct_black_nonhispanic: 2023 5-year ACS estimate for the percentage of the population identifying as Black alone, not Hispanic/Latino.
- pct_white_nonhispanic: 2023 5-year ACS estimate for the percentage of the population identifying as white alone, not Hispanic/Latino.
- pct_asian_nonhispanic: 2023 5-year ACS estimate for the percentage of the population identifying as Asian alone, not Hispanic/Latino.
- pct_minority: 2023 5-year ACS estimate for the percentage of the population identifying as anything other than white alone, not Hispanic/Latino.
- L: Number of service lines classified as lead; that is, at least one of the gooseneck, public water system line, or customer-side line is lead.
- GRR: Number of service lines classified as galvanized requiring replacement (GRR); that is, at least one of the gooseneck, public water system line, or customer-side line is GRR, and there is no lead component.
- U: Number of service lines classified as unknown, suspected lead; that is, at least one of the gooseneck, public water system line, or customer-side line is unknown, suspected lead, and there is no lead or GRR component.
- NL: Number of service lines classified as not lead; that is, no component is lead, GRR, or unknown, suspected lead.
- total: Total number of service lines located to the community area.
- flag: TRUE if total is less than 25.
- lead_plus_suspected: The sum of L and U.
- requires_replacement: The sum of L, U and GRR.
- pct_lead: Percentage of service lines classified as L.
- pct_grr: Percentage of service lines classified as GRR.
- pct_suspected_lead: Percentage of service lines classified as U.
- pct_lead_plus_suspected: Percentage of service lines classified as lead_plus_suspected.
- pct_requires_replacement: Percentage of service lines classified as requires_replacement.
- pct_not_lead: Percentage of service lines classified as NL.
service_lines1.csv service_lines2.csv Parsed and geocoded data for addresses from the 2025 Chicago water service lines inventory, split into two files. They contain the following variables:
- row: Index from the rows in the service line inventory, range 1:491,705.
- gooseneck_pigtail: Composition of the gooseneck. Codes:
  - U: Unknown (suspected lead)
  - L: Lead
  - UNL: Unknown but not lead
  - C: Copper
  - GRR: Galvanized requiring replacement
  - O: Cast/ductile iron or transite
- pws_owned_service_line_material: Composition of the public water system-owned service line, codes as above.
- customer_side_service_line_material: Composition of the customer-side service line, codes as above.
- classification_for_entire_service_line: This is the variable used to aggregate results to tracts and community areas. Codes:
  - L: At least one of the gooseneck, public water system line, or customer-side line is lead.
  - GRR: At least one of the gooseneck, public water system line, or customer-side line is GRR, and there is no lead component.
  - U: At least one of the gooseneck, public water system line, or customer-side line is unknown, suspected lead, and there is no lead or GRR component.
  - NL: No component is lead, GRR, or unknown, suspected lead. These are the only service lines that do not require replacement.
- full_address: Parsed address, after cleaning and consolidation with overlapping addresses, if necessary, followed by a series of variables parsed from it.
- is_intersection: TRUE or FALSE. Where TRUE, the series of variables beginning with st are all null.
- stnum1: The street number, or in the case of a building with a range of numbers in the service line inventory, the lowest number in the range.
- stnum2: The street number, or in the case of a building with a range of numbers in the service line inventory, the highest number in the range.
- stdir: N, S, E, or W. Not present in all addresses.
- stname: Street name, minus the street type or suffix. Note there are edge cases that made name and type/suffix hard to parse.
- sttype: Street type or suffix: ST, RD, AVE etc. Same caveat as for street name.
- zip: 5-digit Zip code.
- geocoder: Geocoding service used to obtain the result that passed quality screening. Includes a small number of manually geocoded addresses/intersections. There are just 21 addresses that we were unable to geocode with acceptable accuracy. Note that a few hundred of the geocoded addresses fall outside the city of Chicago, many very near its perimeter. They are included for the interactive lookup tool because residents may search for them.
- lat: Latitude returned by the geocoding service.
- long: Longitude returned by the geocoding service.
- geoid: Census Bureau identifier for the tract containing the geocoded address. Obtained directly from the geocoding results for the Census Bureau service, and by spatial join to Census Bureau tract boundary data for the other geocoding services.
- matched_address: Address returned by the geocoding service, converted into a standard format.
- m_stnum1, m_stnum2, m_stdir, m_stname, m_sttype, m_zip: Address elements as above, but parsed from matched_address.
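The precedence among the codes for classification_for_entire_service_line (lead first, then GRR, then unknown suspected lead, otherwise not lead) can be written as a short Python sketch. The function name `classify_service_line` is hypothetical, and this only illustrates the rule as described above, not the code used to build the files.

```python
def classify_service_line(gooseneck, pws_side, customer_side):
    """Classify a whole service line from its three component codes.

    Component codes are those listed for gooseneck_pigtail:
    U, L, UNL, C, GRR, O. Precedence: L, then GRR, then U, else NL.
    """
    components = {gooseneck, pws_side, customer_side}
    if "L" in components:
        return "L"    # at least one lead component
    if "GRR" in components:
        return "GRR"  # galvanized requiring replacement, no lead
    if "U" in components:
        return "U"    # unknown, suspected lead; no lead or GRR
    return "NL"       # no component requires replacement
```

For instance, a line with a copper gooseneck, a lead public-side line and a GRR customer-side line is classified as L, because lead takes precedence over GRR.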
Email Peter Aldhous for questions on the R scripts or Amy Qin for questions on the Jupyter notebooks.