From 38cf181f40d9be34f22dc8949ebd4df558ea8d10 Mon Sep 17 00:00:00 2001 From: dadams Date: Sun, 8 Mar 2026 20:25:58 -0700 Subject: [PATCH] Initial commit: database schema, data source docs, chapter variable references --- .gitignore | 42 +++++++ README.md | 152 +++++++++++++++++++++++ data/ra-input/.gitkeep | 0 docs/chapter4-variables.md | 151 +++++++++++++++++++++++ docs/chapter5-variables.md | 162 ++++++++++++++++++++++++ docs/data-sources.md | 202 ++++++++++++++++++++++++++++++ docs/database-schema.md | 246 +++++++++++++++++++++++++++++++++++++ 7 files changed, 955 insertions(+) create mode 100644 .gitignore create mode 100644 README.md create mode 100644 data/ra-input/.gitkeep create mode 100644 docs/chapter4-variables.md create mode 100644 docs/chapter5-variables.md create mode 100644 docs/data-sources.md create mode 100644 docs/database-schema.md diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..21ff3e2 --- /dev/null +++ b/.gitignore @@ -0,0 +1,42 @@ +# Large source data files — not tracked +data/raw/ +*.shp +*.dbf +*.shx +*.prj +*.sbn +*.sbx +*.sbx +*.cpg +*.shp.xml +*.shp.ea.iso.xml +*.shp.iso.xml + +# CSV data files (large) +*.csv + +# Jupyter +.ipynb_checkpoints/ +**/.ipynb_checkpoints/ + +# Python +__pycache__/ +*.py[cod] +*.egg-info/ +.env +.venv/ +venv/ + +# Figures (generated outputs — commit selectively) +figures/**/*.png +figures/**/*.pdf +figures/**/*.svg + +# OS +.DS_Store +Thumbs.db + +# Editor +.vscode/ +*.swp +*~ diff --git a/README.md b/README.md new file mode 100644 index 0000000..7a12867 --- /dev/null +++ b/README.md @@ -0,0 +1,152 @@ +# The Hydrocarbon Horizon: Orphaned Wells Analysis + +**Project:** *The Hydrocarbon Horizon: The Politics of Oil and Gas Site Closure* +**Author:** Dr. David P. Adams, California State University Fullerton +**Publisher:** Routledge (forthcoming) +**Collaborators:** Dr. Jon Fisk, Dr. 
Nurun Nahar + +--- + +## Overview + +This repository contains the data infrastructure, analysis notebooks, and SQL code supporting the empirical chapters of *The Hydrocarbon Horizon*. The book examines the politics of fossil fuel site closure in the United States, with particular attention to orphaned oil and gas wells as physical indicators of stranded assets and the unresolved liabilities of the fossil fuel regime. + +The core database (`orphaned_wells`, PostgreSQL/PostGIS) integrates: + +- **117,672 documented unplugged orphaned wells** across 27 states (USGS DOW dataset, 2022) +- **85,230 U.S. census tracts** (2021 TIGER/Line cartographic boundary file) +- **State governance framework data** — just transition offices and plugging prioritization schemes +- **Financial liability estimates** — per-well plugging costs and IIJA/BIL funding allocations +- **FGDC and ScienceBase metadata** — full provenance for the primary dataset + +--- + +## Repository Structure + +``` +new-orphan-wells/ +├── README.md +├── docs/ +│ ├── database-schema.md # Full schema reference: all tables, columns, indexes +│ ├── data-sources.md # Provenance, citations, and data quality notes +│ ├── chapter4-variables.md # Variable definitions for Chapter 4 (governance coding) +│ └── chapter5-variables.md # Variable definitions for Chapter 5 (financial liability) +├── notebooks/ +│ ├── ch4_geography/ # Chapter 4: Geography of Transition +│ ├── ch5_costs/ # Chapter 5: Costs of Transition +│ ├── ej_analysis/ # Environmental justice / ACS demographic joins +│ └── spatial/ # Mapping and spatial analysis +├── sql/ +│ ├── schema/ # Table and index DDL +│ ├── views/ # View definitions +│ └── queries/ # Analysis queries by chapter +├── data/ +│ ├── raw/ # Source files (gitignored — see below) +│ ├── processed/ # Derived/exported datasets +│ └── ra-input/ # RA Excel workbooks (Transition_Offices, Prioritization) +├── figures/ +│ ├── ch4/ # Maps and charts for Chapter 4 +│ └── ch5/ # Charts and tables for 
Chapter 5 +└── scripts/ # Shell and Python utility scripts +``` + +--- + +## Database + +**Name:** `orphaned_wells` +**Host:** localhost (PostgreSQL 18, PostGIS enabled) +**Connection:** `psql -U postgres -h localhost -d orphaned_wells` + +See [`docs/database-schema.md`](docs/database-schema.md) for the full schema reference. + +### Quick start + +```sql +-- Well count by state with liability estimates +SELECT state, state_name, well_count_dow, + est_mid_liability, iija_phase1, unfunded_mid +FROM v_ch5_liability_summary; + +-- State governance framework (populated after RA data loaded) +SELECT state, state_name, well_count_dow, framework_type, office_language_type +FROM v_ch4_state_analysis +ORDER BY well_count_dow DESC; + +-- Highest-density tracts (environmental justice targeting) +SELECT tract_geoid, tract_name, county_name, state_usps, well_count, wells_per_km2 +FROM v_highest_density_tracts +LIMIT 20; +``` + +--- + +## Data Sources + +| Dataset | Source | Citation | +|---|---|---| +| U.S. Orphaned Wells (DOW) | USGS ScienceBase | Grove & Merrill (2022), DOI: 10.5066/P91PJETI | +| Census Tracts | U.S. Census Bureau | cb_2021_us_tract_500k, TIGER/Line | +| Plugging Cost Estimates | Raimi et al. 
(2021) | DOI: 10.1021/acs.est.1c02234 | +| IIJA Funding | DOI/OSMRE | Phase 1 formula grants, Nov 2022 | +| Transition Offices | Climate Policy Dashboard | climatepolicydashboard.org | +| Prioritization Schemes | IOGCC | Prioritization Report, July 2023 | + +Full citations and data quality notes: [`docs/data-sources.md`](docs/data-sources.md) + +--- + +## RA Data Integration + +When the RA returns the Excel workbook (`Transition_Offices` + `Prioritization` tabs), save each tab as CSV and load both into the database: + +```bash +# Run from the repository root after saving each tab as CSV in data/ra-input/ +# Note: psql meta-commands are case-sensitive, so use \copy (lowercase) +psql -U postgres -h localhost -d orphaned_wells \ + -c "\copy state_transition_offices (state,state_name,office_name,year_established,target_text,code_fossil,code_equity,source_url,date_collected,collected_by,notes) FROM 'data/ra-input/Transition_Offices.csv' CSV HEADER;" + +psql -U postgres -h localhost -d orphaned_wells \ + -c "\copy state_prioritization (state,state_name,system_type,tech_factors,code_rural_urban,code_vuln,code_surface,pdf_page,source_quote,source_url,date_collected,collected_by,notes) FROM 'data/ra-input/Prioritization.csv' CSV HEADER;" +``` + +Once both tables are loaded, the `v_state_governance` and `v_ch4_state_analysis` views populate automatically. + +--- + +## Notebooks + +| Notebook | Chapter | Description | +|---|---|---| +| `ch4_geography/01_state_distribution.ipynb` | 4 | Well counts, maps, fossil dependence by state | +| `ch4_geography/02_governance_framework.ipynb` | 4 | Engineering vs. justice state typology | +| `ch4_geography/03_ej_concentration.ipynb` | 4 | Tract-level EJ analysis | +| `ch5_costs/01_liability_estimates.ipynb` | 5 | State-level cost and funding gap | +| `ch5_costs/02_iija_adequacy.ipynb` | 5 | IIJA Phase 1 coverage analysis | +| `ej_analysis/01_acs_join.ipynb` | 4–5 | ACS demographic join via tract_geoid | +| `spatial/01_national_map.ipynb` | 4 | National well distribution map | + +--- + +## Gitignore Notes + +Large source files are not tracked. 
Add to `.gitignore`: + +``` +data/raw/ +*.shp +*.dbf +*.shx +*.prj +*.sbn +*.sbx +*.cpg +*.csv +figures/**/*.png +figures/**/*.pdf +``` + +--- + +## Citation + +Adams, D.P. (forthcoming). *The Hydrocarbon Horizon: The Politics of Oil and Gas Site Closure*. Routledge. diff --git a/data/ra-input/.gitkeep b/data/ra-input/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/docs/chapter4-variables.md b/docs/chapter4-variables.md new file mode 100644 index 0000000..b3e4e2a --- /dev/null +++ b/docs/chapter4-variables.md @@ -0,0 +1,151 @@ +# Chapter 4 Variable Reference +## The Geography of Transition: Distribution and Consequences of Orphaned Wells + +--- + +## Research Questions + +1. How are orphaned wells spatially distributed and do they concentrate in historically fossil-dependent communities? +2. Do states with strong fossil industry dependence have formal energy transition governance mechanisms? +3. Do states frame orphaned well remediation as an engineering problem or a justice problem? +4. How does spatial distribution relate to political tensions over responsibility and funding? 
+ +--- + +## Core Analytical Variables + +### Well Distribution + +| Variable | Source | DB Location | Notes | +|---|---|---|---| +| Well count by state | USGS DOW | `v_wells_by_state.well_count` | 27 states | +| Well count by county | USGS DOW | `v_wells_by_county.well_count` | 5-digit GEOID | +| Well count by tract | USGS DOW | `v_wells_by_tract.well_count` | 11-digit GEOID | +| Well density (wells/km²) | Calculated | `v_wells_by_tract.wells_per_km2` | Land area only | +| Well type (normalized) | USGS DOW | `wells.well_type_normalized` | 12 categories | +| Well status | USGS DOW | `wells.status` | State-specific terminology | + +### State Governance Framework (RA-coded) + +| Variable | Source | DB Location | Values | +|---|---|---|---| +| Transition office count | Climate Policy Dashboard | `v_state_governance.transition_office_count` | 0, 1, 2+ | +| Office fossil language | Climate Policy Dashboard | `state_transition_offices.code_fossil` | 0/1 | +| Office equity language | Climate Policy Dashboard | `state_transition_offices.code_equity` | 0/1 | +| Prioritization system type | IOGCC 2023 | `state_prioritization.system_type` | Text description | +| Technical factors used | IOGCC 2023 | `state_prioritization.tech_factors` | Semicolon list | +| Rural/urban in scoring | IOGCC 2023 | `state_prioritization.code_rural_urban` | 0/1 | +| Vulnerability/EJ in scoring | IOGCC 2023 | `state_prioritization.code_vuln` | 0/1 | +| Surface land use in scoring | IOGCC 2023 | `state_prioritization.code_surface` | 0/1 | + +### Derived Classification + +| Variable | DB Location | Logic | +|---|---|---| +| `framework_type` | `v_state_governance` | Justice if `code_vuln=1`; Mixed if `code_rural_urban=1`; Engineering if system documented but no EJ/density; Unclassified otherwise | +| `office_language_type` | `v_state_governance` | Fossil + Equity / Fossil only / Equity only / Office exists no language / No transition office | + +### Environmental Justice Indicators (requires ACS join) 
+ +Join on `wells.tract_geoid` = ACS `geoid`: + +| Variable | ACS Table | Description | +|---|---|---| +| Median household income | B19013 | Tract-level; proxy for economic vulnerability | +| % Non-white | B03002 | Calculated from race/ethnicity totals | +| % Below poverty line | B17001 | Federal poverty threshold | +| Median housing age | B25035 | Proxy for legacy industrial neighborhood | +| % Unemployed | B23025 | Labor market conditions | + +--- + +## Key Queries + +### State governance summary (activate after RA data loaded) +```sql +SELECT state, state_name, well_count_dow, + framework_type, office_language_type, + code_vuln, code_rural_urban, code_fossil, code_equity, + est_liability_mid_usd +FROM v_ch4_state_analysis +ORDER BY well_count_dow DESC; +``` + +### Engineering vs. justice states, well count comparison +```sql +SELECT framework_type, + count(DISTINCT state) AS state_count, + sum(well_count_dow) AS total_wells, + round(avg(well_count_dow)) AS avg_wells_per_state +FROM v_ch4_state_analysis +GROUP BY framework_type +ORDER BY total_wells DESC; +``` + +### Highest-density tracts (for mapping) +```sql +SELECT tract_geoid, tract_name, county_name, state_usps, + well_count, wells_per_km2, tract_land_km2 +FROM v_highest_density_tracts +LIMIT 50; +``` + +### Wells in tracts below median income (EJ analysis — requires ACS) +```sql +SELECT w.state, count(*) AS wells_in_low_income_tracts +FROM wells w +JOIN acs_b19013 a ON w.tract_geoid = a.geoid +WHERE a.median_hh_income < 50000 +GROUP BY w.state +ORDER BY wells_in_low_income_tracts DESC; +``` + +### State transition office presence vs. 
well burden +```sql +SELECT sg.framework_type, + sg.office_language_type, + count(DISTINCT sg.state) AS states, + sum(sg.well_count_dow) AS total_wells, + avg(sg.well_count_dow) AS avg_wells +FROM v_state_governance sg +GROUP BY sg.framework_type, sg.office_language_type +ORDER BY total_wells DESC; +``` + +--- + +## Analytical Strategy (Chapter 4) + +### Section 1: Mapping the Distribution +- National map: well locations by `well_type_normalized` +- State-level choropleth: well count and well density +- County-level choropleth: `v_wells_by_county` joined to TIGER county boundaries +- Key finding to highlight: OH + PA + OK = 47% of all documented orphaned wells + +### Section 2: Fossil Dependence and Governance +- Crosstab: states by framework_type × well count +- Test: Do high-burden states have transition offices? (office_count > 0 vs. well_count) +- Key contrast: PA (no transition office, engineering frame) vs. CO (Just Transition Office, equity language) + +### Section 3: The Justice Dimension +- Map: well density by tract overlaid with % non-white or % below poverty +- Identify tracts where both are elevated — the "double burden" +- Use `v_highest_density_tracts` for case study selection + +### Section 4: Political Tensions +- Connect framework_type to state political context (add `state_politics` table if needed) +- Argument: justice framing is not randomly distributed — correlates with state political economy + +--- + +## Data Limitations for Chapter 4 + +1. **DOW dataset is documented wells only.** True orphaned well count is almost certainly higher. API estimates 2 million+ undocumented orphaned wells nationally. + +2. **Definitional inconsistency.** California "idle" wells differ legally from other states' "orphaned" definition. Flagged in `data_file_notes`. + +3. **Type field missingness (59.3% Unknown).** Major states (OH, PA, KY) did not classify type. Limit type-based analysis to states with complete type data or use normalized categories cautiously. 
+ +4. **Snapshot data.** Data collected 2019–2022; plugging programs have been active since, so current counts are lower. + +5. **Spatial precision.** No formal accuracy tests. Some coordinates converted from PLSS — precision is lower for KS and MT wells. diff --git a/docs/chapter5-variables.md b/docs/chapter5-variables.md new file mode 100644 index 0000000..f292fff --- /dev/null +++ b/docs/chapter5-variables.md @@ -0,0 +1,162 @@ +# Chapter 5 Variable Reference +## The Costs of Transition: Financial and Climate Liabilities of Orphaned Wells + +--- + +## Research Questions + +1. What are the estimated plugging and remediation costs, and how do they vary by state? +2. How large is the unfunded liability gap, and which states carry the greatest exposure? +3. How does IIJA/BIL funding compare to estimated need? +4. How do financial liabilities translate into political debates over who bears transition costs? +5. How does financial exposure connect to climate and equity concerns? + +--- + +## Core Financial Variables + +### Per-Well Cost Estimates + +All cost estimates are in nominal USD. The Raimi et al. (2021) figures are the primary reference for this project. + +| Estimate | Per Well (Low) | Per Well (Mid) | Per Well (High) | Source | +|---|---|---|---|---| +| EPA OLEM (2018) | $5,000 | $25,000 | $85,000 | Older; widely considered low | +| **Raimi et al. (2021)** | **$9,000** | **$76,000** | **$280,000** | **Primary reference** | +| Carbon Tracker (2020) | $20,000 | $82,000 | $300,000 | Investor/financial risk focus | +| IOGCC (2023) | $5,000 | $33,000 | $150,000 | State-reported; inconsistent method | +| PA DEP (2022) | $10,000 | $68,000 | $220,000 | Actual contract data; PA-specific | + +Full citations in `docs/data-sources.md` and `plugging_cost_references` table. 
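The per-well ranges above translate into national totals by simple multiplication. The following is a minimal Python sketch of that arithmetic, assuming the 117,672 documented wells and the approximate $310 million IIJA Phase 1 total reported elsewhere in this documentation. The function and constant names are illustrative only and are not part of the database or repository.

```python
# Illustrative check of the national liability arithmetic.
# Assumptions: well count and IIJA Phase 1 total are taken from this
# documentation; the IIJA figure is approximate. Names are hypothetical.
WELL_COUNT = 117_672
IIJA_PHASE1_USD = 310_000_000  # approximate Phase 1 total

# Raimi et al. (2021) per-well estimates in nominal USD
RAIMI_PER_WELL_USD = {"low": 9_000, "mid": 76_000, "high": 280_000}


def national_liability(well_count: int, per_well_usd: dict) -> dict:
    """Multiply the well count by each per-well cost scenario."""
    return {scenario: well_count * cost for scenario, cost in per_well_usd.items()}


def coverage_pct(funding_usd: float, liability_usd: float) -> float:
    """Funding as a percentage of estimated liability, rounded to 2 places."""
    return round(funding_usd / liability_usd * 100, 2)


totals = national_liability(WELL_COUNT, RAIMI_PER_WELL_USD)
mid_coverage = coverage_pct(IIJA_PHASE1_USD, totals["mid"])

for scenario, usd in totals.items():
    print(f"{scenario}: ${usd / 1e9:.2f} billion")
print(f"IIJA Phase 1 coverage of mid estimate: {mid_coverage}%")
```

Swapping in another row of the cost table (for example the PA DEP figures) gives a quick sensitivity check without touching the database.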
+ +### State Liability Variables + +| Variable | DB Location | Description | +|---|---|---| +| `well_count_dow` | `state_liability` | USGS DOW well count per state | +| `est_liability_low_usd` | `state_liability` (generated) | `well_count × $9,000` | +| `est_liability_mid_usd` | `state_liability` (generated) | `well_count × $76,000` | +| `est_liability_high_usd` | `state_liability` (generated) | `well_count × $280,000` | +| `iija_phase1_formula_usd` | `state_liability` | Phase 1 formula grant allocation | +| `iija_phase2_perf_usd` | `state_liability` | Phase 2 performance grant (populate as awarded) | +| Unfunded mid | Calculated in view | `est_liability_mid_usd - iija_phase1_formula_usd` | +| `iija_covers_pct_mid` | `v_ch5_liability_summary` | IIJA Phase 1 as % of mid-range liability | + +### National Totals (from `v_ch5_national_totals`) + +| Metric | Value | +|---|---| +| Total documented wells | 117,672 | +| National liability (low) | ~$1.06 billion | +| National liability (mid) | ~$8.94 billion | +| National liability (high) | ~$32.95 billion | +| IIJA Phase 1 total | ~$310 million | +| IIJA Phase 1 coverage of mid estimate | **~3.47%** | +| Unfunded gap (mid) | ~$8.63 billion | + +--- + +## Key Queries + +### Full state liability table +```sql +SELECT * FROM v_ch5_liability_summary; +``` + +### National totals +```sql +SELECT * FROM v_ch5_national_totals; +``` + +### States where IIJA Phase 1 covers more than 5% of mid-range liability +```sql +SELECT state, state_name, well_count_dow, + iija_covers_pct_mid, unfunded_mid +FROM v_ch5_liability_summary +WHERE iija_covers_pct_mid > 5 +ORDER BY iija_covers_pct_mid DESC; +``` + +### Liability per well (for cross-state comparison) +```sql +SELECT state, state_name, well_count_dow, + est_liability_mid_usd / well_count_dow AS mid_per_well_usd, + iija_phase1_formula_usd / well_count_dow AS iija_per_well_usd +FROM state_liability +ORDER BY well_count_dow DESC; +``` + +### Cost estimate sensitivity analysis +```sql 
+SELECT r.source_name, r.source_year, + r.cost_mid_usd AS per_well_mid, + 117672 * r.cost_mid_usd AS national_mid_estimate +FROM plugging_cost_references r +ORDER BY r.cost_mid_usd; +``` + +### Combine governance framework with liability +```sql +SELECT sg.framework_type, + count(DISTINCT sg.state) AS states, + sum(sg.well_count_dow) AS total_wells, + sum(sl.est_liability_mid_usd) AS total_mid_liability, + sum(sl.iija_phase1_formula_usd) AS total_iija, + round( + sum(sl.iija_phase1_formula_usd)::numeric / + sum(sl.est_liability_mid_usd) * 100, 2 + ) AS iija_coverage_pct +FROM v_state_governance sg +JOIN state_liability sl ON sg.state = sl.state +WHERE sg.framework_type != 'Unclassified' +GROUP BY sg.framework_type +ORDER BY total_wells DESC; +``` + +--- + +## Analytical Strategy (Chapter 5) + +### Section 1: The Scale of the Problem +- Table: state-level liability (low/mid/high) — `v_ch5_liability_summary` +- National headline: $8.94B mid-range vs. $310M funded = 3.47% coverage +- Sensitivity analysis: show how the number changes across the 5 cost references +- Note: DOW dataset is documented wells only. Undocumented wells could multiply national liability 3–10× + +### Section 2: The IIJA and Its Limits +- Bar chart: IIJA Phase 1 allocation vs. estimated liability by state +- Key argument: even the largest recipients (OH, PA, OK, KY, WV, TX at $25M each) cover only 1.6–5.6% of their mid-range liability +- Discuss Phase 2 performance grants as the main mechanism — but competitive, not guaranteed +- Note the perverse incentive: states must document and begin plugging to qualify for Phase 2 + +### Section 3: Stranded Asset Framing +- Connect to `plugging_cost_references` — use Carbon Tracker framing for investor risk angle +- Argument: inadequate bonding requirements made this liability invisible until it became public +- `state_liability.bonding_required` and `bonding_adequacy` fields (to be populated from literature) + +### Section 4: Who Bears the Cost? 
+- Cross-tab: `framework_type` (from Chapter 4) × funding coverage percentage +- Do engineering-frame states receive proportionally more or less IIJA funding? +- EJ angle: are high-density tracts in low-income communities concentrated in states with largest unfunded gaps? + +### Section 5: Climate Liability +- Methane emissions from unplugged wells — reference EPA Greenhouse Gas Inventory +- Each orphaned well emits estimated 0.1–10 tonnes CO₂e/year (wide variance by well type and age) +- Can connect `well_type_normalized` to EPA emission factor ranges +- Spatial overlap: high-density tracts × Census tract air quality data (EJScreen API if available) + +--- + +## Data Limitations for Chapter 5 + +1. **Liability estimates are for documented wells only.** The USGS DOW dataset is the most comprehensive national compilation but excludes millions of undocumented pre-regulatory wells (estimated 2M+ nationally per API). National liability could be 10–20× higher. + +2. **Raimi et al. cost estimates are national averages.** Per-well costs vary enormously: shallow, simple onshore wells may cost $5,000–$15,000; deep, complex, or offshore wells can exceed $500,000. State-level averages mask this heterogeneity. + +3. **IIJA Phase 1 allocations are approximate.** Exact per-state grant amounts should be verified against official OSMRE grant award letters before publication. The Phase 2 allocation process is ongoing; check OSMRE for updates. + +4. **No bonding data yet.** The `state_liability.bonding_required` and `bonding_adequacy` fields are not yet populated. These are essential for the stranded asset argument and should be populated from IOGCC bonding reports or state regulatory research. + +5. **Inflation not applied.** Raimi et al. (2021) costs are in approximately 2020 dollars. Post-2021 inflation (particularly construction and materials costs) has likely increased per-well costs substantially. Consider applying BLS PPI for construction to update estimates. + +6. 
**Phase 2 grants not yet final.** As of March 2026, Phase 2 performance grant awards are ongoing. Query OSMRE for current state of awards before the manuscript goes to press. diff --git a/docs/data-sources.md b/docs/data-sources.md new file mode 100644 index 0000000..b1329ed --- /dev/null +++ b/docs/data-sources.md @@ -0,0 +1,202 @@ +# Data Sources and Provenance + +--- + +## Primary Dataset: USGS Documented Unplugged Orphaned Oil and Gas Wells (DOW) + +**Citation:** +Grove, C.A., and Merrill, M.D., 2022, United States Documented Unplugged Orphaned Oil and Gas Well Dataset: U.S. Geological Survey data release, https://doi.org/10.5066/P91PJETI. + +**Related report:** +Merrill, M.D., Grove, C.A., Gianoutsos, N.J., and Freeman, P.A., 2023, Analysis of the United States documented unplugged orphaned oil and gas well dataset: U.S. Geological Survey Data Report 1167, 10 p., https://doi.org/10.3133/dr1167. + +**ScienceBase item:** https://www.sciencebase.gov/catalog/item/62ebd67bd34eacf539724c56 +**DOI:** https://doi.org/10.5066/P91PJETI +**Interactive map:** https://energy.usgs.gov/usdowdb +**Published:** August 22, 2022 +**Data currency:** July 1, 2019 – June 2, 2022 + +### Coverage +- **117,672 wells** in **27 states** +- States: AL, AK, AR, CA, CO, IL, IN, KS, KY, LA, MI, MS, MO, MT, NE, NV, NM, NY, ND, OH, OK, PA, TN, TX, UT, WV, WY + +### Orphaned Well Definition +Varies by state. Included if state designates as orphaned, or if ALL of the following apply: +1. No production for average of 12 months (6–24 months depending on state) +2. Well is unplugged +3. No responsible party for future use or plugging +4. 
Location is documented + +### Data Collection Method +- Direct requests to state oil and gas regulatory agencies (email, phone, or website download) +- Location format conversion performed using BLM Township Decoder and KGS LEO 7.0 (Kansas and Montana only) +- No other manipulations beyond reformatting and explanatory notes + +### State Agencies (27 sources) + +| State | Agency | Data Description | +|---|---|---| +| AL | Alabama Oil and Gas Board | Abandoned wells | +| AK | Alaska Oil and Gas Conservation Commission | Orphan wells | +| AR | Arkansas Dept. of Transformation and Shared Services GIS | Abandoned orphan wells | +| CA | CA Dept. of Conservation, Geologic Energy Management Division (CalGEM) | Idle wells | +| CO | Colorado Oil and Gas Conservation Commission | Orphan wells | +| IL | Illinois Dept. of Natural Resources | Temporarily abandoned wells | +| IN | Indiana Dept. of Natural Resources | Orphan abandoned wells | +| KS | Kansas Corporation Commission | Abandoned wells | +| KY | Kentucky Energy and Environment Cabinet | Orphan wells | +| LA | Louisiana Dept. of Natural Resources | Orphan wells | +| MI | Michigan Dept. of Environment, Great Lakes, and Energy (EGLE) | Orphan wells | +| MS | Mississippi State Oil and Gas Board | Orphan and potentially orphan wells | +| MO | Missouri Dept. of Natural Resources | Orphan and abandoned wells | +| MT | Montana Board of Oil & Gas Conservation | Orphan wells | +| NE | Nebraska Oil & Gas Conservation Commission | Abandoned and shut-in wells | +| NV | Nevada Bureau of Mines and Geology | Abandoned and shut-in wells | +| NM | New Mexico Oil Conservation Division | Orphan wells | +| NY | NY State Dept. of Environmental Conservation | Unknown status wells | +| ND | North Dakota Dept. of Mineral Resources | Abandoned wells | +| OH | Ohio Dept. of Natural Resources | Orphan and potential orphan wells | +| OK | Oklahoma Corporation Commission, Oil and Gas Conservation | Orphan wells | +| PA | Pennsylvania Dept. 
of Environmental Protection | Orphan wells | +| TN | Tennessee Dept. of Environment and Conservation | Forfeited wells | +| TX | Texas Railroad Commission | Orphan wells | +| UT | Utah Division of Oil, Gas and Mining | Orphan wells | +| WV | West Virginia Dept. of Environmental Protection | Abandoned wells | +| WY | Wyoming Oil & Gas Conservation Commission | Orphan wells | + +### Data Quality Notes + +**Coordinate accuracy:** No formal positional accuracy tests were conducted. Coordinates are state-provided. Some were converted from PLSS descriptions using BLM/KGS tools (KS, MT). + +**Type field completeness:** Nine states submitted rows with a blank type field: OH, PA, KY, KS, IN, NM, TN, and AK, plus OK, where ~1,081 rows are blank. These rows are coded `Unknown/Unspecified` in `well_type_normalized`. This is a source-data limitation, not a processing error. + +**Status terminology:** Status language is not standardized across states. It ranges from "Abandoned Orphaned Well" (explicit) to "AB" (code), "Idle" (CA usage), or state-specific terms. Do not compare status values cross-state without normalization. + +**Alaska wells (12):** Very small count; Alaska data may underrepresent actual orphaned well inventory. + +**California wells (3,338):** Classified as "Idle" per CalGEM definition — California's statutory definition of idle wells differs from other states' orphan definitions. May warrant separate treatment in analysis. + +**File checksums (MD5):** +- `US_orphaned_wells.csv`: `52539416efe461884034fb8d9bb184b2` +- `US_orphaned_wells.zip`: `5a454abeae6d11bd837e3c5c29cb1ea0` +- `US_orphaned_wells.xml`: `1122b28bb82aea35c880f643c3570335` + +--- + +## Census Tracts: 2021 TIGER/Line Cartographic Boundary File + +**Source:** U.S. 
Census Bureau +**File:** `cb_2021_us_tract_500k` (1:500,000 scale) +**Coverage:** 85,230 tracts, all 50 states + DC + territories +**CRS:** NAD83 (EPSG:4269), reprojected to WGS84 (EPSG:4326) for database storage +**Vintage:** 2021 (aligns with 2017–2021 ACS 5-year estimates) +**Download:** https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html + +### Spatial Join Notes +- 117,156 of 117,672 wells (99.6%) matched via `ST_Within` +- 516 wells on tract boundaries resolved via `ST_DWithin` (5km) then KNN (`<->`) +- 4 wells on state borders were misassigned to neighboring state tracts and manually corrected to match USGS state attribution + +### ACS Join Key +Use `wells.tract_geoid` (= `census_tracts.geoid`, 11-digit FIPS) to join to any ACS table. The 2021 5-year estimates are the recommended vintage. + +**Suggested ACS tables for EJ analysis:** + +| Table | Content | +|---|---| +| B19013 | Median household income | +| B03002 | Race and Hispanic/Latino origin | +| B17001 | Poverty status | +| B25035 | Median year structure built (housing age proxy) | +| B23025 | Employment status | +| B15003 | Educational attainment | + +--- + +## Plugging Cost Estimates + +### Raimi et al. (2021) — Primary Reference + +**Citation:** +Raimi, D., Krupnick, A., Shah, J.S., and Thompson, A., 2021, Decommissioning Orphaned and Abandoned Oil and Gas Wells: New Estimates and Cost Drivers. *Environmental Science & Technology*, 55(15), 10224–10230. https://doi.org/10.1021/acs.est.1c02234 + +**Organization:** Resources for the Future (RFF) +**Estimates:** Low $9,000 / Median $76,000 / High $280,000 per well +**Method:** Bottom-up engineering cost model using 2.1 million documented wells; variables include depth, casing, age, location, regulatory requirements. +**Scope:** National (onshore U.S.) +**Use in this project:** Primary basis for `state_liability` calculated fields. + +### EPA OLEM (2018) + +Older EPA estimate widely cited in policy documents. 
Central estimate $25,000. Considered low by most recent literature due to pre-inflation data and exclusion of complex well types. Use with caution. + +### Carbon Tracker (2020) + +**Citation:** Carbon Tracker Initiative, 2020. *Fault Lines: How Diverging Oil and Gas Company Strategies Link to Stranded Asset Risk.* +Emphasizes investor/financial risk framing; useful for Chapter 5 stranded asset discussion. + +### IOGCC (2023) + +State-reported figures aggregated by the Interstate Oil and Gas Compact Commission. Methodology varies significantly by state; use for within-state comparisons, not cross-state. + +### Pennsylvania DEP (2022) + +Actual program expenditure data from PA DEP plugging contracts 2016–2022. Mid estimate $68,000. PA is one of the most data-rich state programs and can serve as a benchmark for high-documentation states. + +--- + +## IIJA / Bipartisan Infrastructure Law Funding + +**Legislation:** Infrastructure Investment and Jobs Act (IIJA), Public Law 117-58, signed November 15, 2021 +**Program:** Orphaned Well Site Plugging, Remediation, and Restoration Program +**Administering agency:** Office of Surface Mining Reclamation and Enforcement (OSMRE), Dept. of the Interior +**Total appropriation:** $4.7 billion over 5 years + +### Program Structure + +| Phase | Amount | Mechanism | Status | +|---|---|---|---| +| Initial grants | $25 million | Formula to states with existing programs | Announced 2021 | +| Phase 1 formula grants | $150 million | Formula based on documented well counts | Announced Nov 2022 | +| Phase 2 performance grants | $4.275 billion | Competitive, based on state plugging performance | Ongoing | +| Federal lands | $115 million | OSMRE direct plugging on federal land | Ongoing | + +**Phase 1 per-state allocations in `state_liability.iija_phase1_formula_usd`** are approximate figures from DOI press releases (Nov 2022). Verify exact amounts from official OSMRE grant letters before publication. 
+**Source:** https://www.doi.gov/pressreleases + +### Coverage Gap +Using Raimi et al. (2021) median estimates ($76,000/well × 117,672 wells): +- **Estimated national liability:** ~$8.94 billion +- **IIJA Phase 1 total:** ~$310 million +- **Coverage:** ~3.5% of median estimated liability + +This gap is the central financial argument of Chapter 5. + +--- + +## State Governance Data (RA-collected) + +### Transition Offices + +**Source:** Climate Policy Dashboard — Just Transition Offices and Staff +https://www.climatepolicydashboard.org/policies/climate-governance-equity/just-transition-offices-and-staff +**Coding protocol:** See `Undergrad Student Instructions.md` in project research files +**Coded by:** Julian Tong, RA +**PI supervision:** Dr. David P. Adams + +### Plugging Prioritization Schemes + +**Source:** IOGCC Prioritization Report, July 10, 2023 +https://oklahoma.gov/content/dam/ok/en/iogcc/documents/publications/prioritization_report_7.10.23.pdf +**Coding protocol:** See `Undergrad Student Instructions.md` +**Coded by:** Julian Tong, RA + +### Theoretical Framework + +The engineering/justice typology (Adams) classifies state prioritization approaches: + +- **Engineering:** Prioritizes technical risk factors (methane, groundwater, pressure) without explicit equity/density dimensions +- **Justice:** Explicitly incorporates DAC scores, EJ indexes, or disadvantaged community status into scoring +- **Mixed (density-aware):** Uses population density or urban/rural classification but not explicit EJ language + +The `v_state_governance.framework_type` column implements this classification automatically from RA-coded variables. 
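The classification rule behind `framework_type` can be sketched in a few lines of Python, which may help when checking RA-coded rows by hand before they are loaded. This is a sketch only, not the view definition: it follows the precedence documented for Chapter 4 (Justice if `code_vuln=1`, then Mixed if `code_rural_urban=1`, then Engineering if a system is documented, otherwise Unclassified). Treating a non-empty `system_type` as "system documented" is an assumption, as are the function name and the exact label strings.

```python
from typing import Optional


def classify_framework(system_type: Optional[str],
                       code_vuln: Optional[int],
                       code_rural_urban: Optional[int]) -> str:
    """Hypothetical helper mirroring the documented framework_type logic.

    Precedence: Justice > Mixed > Engineering > Unclassified.
    """
    # Explicit vulnerability/EJ scoring takes priority over everything else.
    if code_vuln == 1:
        return "Justice"
    # Density or rural/urban scoring without explicit EJ language.
    if code_rural_urban == 1:
        return "Mixed"
    # Assumption: a non-empty system_type means a system is documented.
    if system_type is not None and system_type.strip():
        return "Engineering"
    # No documented system, or the state has not been coded yet.
    return "Unclassified"


# Made-up example rows (not real state codings).
examples = [
    ("Point-based scoring", 1, 0),
    ("Ranked list", 0, 1),
    ("Ranked list", 0, 0),
    (None, None, None),
]
for row in examples:
    print(row, "->", classify_framework(*row))
```

Because the Justice check comes first, a state coded with both `code_vuln=1` and `code_rural_urban=1` is classified as Justice rather than Mixed.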
diff --git a/docs/database-schema.md b/docs/database-schema.md new file mode 100644 index 0000000..1796d78 --- /dev/null +++ b/docs/database-schema.md @@ -0,0 +1,246 @@ +# Database Schema Reference + +**Database:** `orphaned_wells` +**Engine:** PostgreSQL 18 with PostGIS +**Connection:** `psql -U postgres -h localhost -d orphaned_wells` +**Last updated:** March 2026 + +--- + +## Tables + +### `wells` + +Primary data table. 117,672 documented unplugged orphaned wells across 27 U.S. states. +Source: USGS DOW Dataset (Grove & Merrill 2022). Geometry loaded from shapefile, reprojected to EPSG:4326. + +| Column | Type | Description | +|---|---|---| +| `gid` | integer | Primary key (auto) | +| `api_number` | varchar | 14-digit API well number (stripped of `API:` prefix) | +| `state` | varchar | Full state name (from USGS source data) | +| `county` | varchar | County name (from USGS source data) | +| `well_name` | varchar | Well name — typically operator + lease name | +| `well_number` | varchar | Order within lease/operator permit sequence | +| `type` | varchar | Raw well type as reported by state agency (119 distinct values) | +| `well_type_normalized` | text | Canonical type: Oil, Gas, Oil & Gas, Injection/Disposal, Dry/Exploratory, Water/Brine, Coalbed Methane, Enhanced Recovery, Gas Storage, Observation/Monitor, Other/Administrative, Unknown/Unspecified | +| `status` | varchar | Well status as reported by state agency | +| `latitude` | numeric | State-provided latitude (decimal degrees, NAD83/WGS84) | +| `longitude` | numeric | State-provided longitude (decimal degrees, NAD83/WGS84) | +| `principal_meridian` | varchar | PLSS principal meridian | +| `township` | numeric | PLSS township number | +| `t_dir` | varchar | Township direction (N/S) | +| `range` | numeric | PLSS range number | +| `r_dir` | varchar | Range direction (E/W) | +| `section` | numeric | PLSS section (1–36; some LA wells non-standard) | +| `qtr` | varchar | PLSS quarter section (¼ sq mi) | +| 
`qtr_qtr` | varchar | PLSS quarter-quarter section (1/16 sq mi) | +| `qtr_qtr_qtr` | varchar | PLSS quarter-quarter-quarter (1/64 sq mi) | +| `source` | varchar | State agency that provided the data | +| `data_file_date` | date | Date data was last updated by source agency | +| `well_info_notes` | varchar | Additional well information from source | +| `location_notes` | varchar | Location/coordinate methodology notes | +| `other_notes` | varchar | Other notes (often includes status date) | +| `geom` | geometry(Point, 4326) | PostGIS point in WGS84 | +| `state_fips` | char(2) | 2-digit Census FIPS state code | +| `tract_geoid` | char(11) | 11-digit Census tract FIPS (join key to ACS) | +| `tract_name` | text | Census tract label (e.g., "Census Tract 2048") | +| `county_fips` | char(3) | 3-digit Census county FIPS | +| `county_name` | text | County name from 2021 TIGER/Line | +| `state_usps` | char(2) | 2-letter USPS state abbreviation (from spatial join) | +| `tract_aland_m2` | bigint | Tract land area in square meters | +| `tract_awater_m2` | bigint | Tract water area in square meters | + +**Indexes:** `api_number`, `state`, `state_fips`, `state_usps`, `county_fips`, `tract_geoid`, `well_type_normalized`, `geom` (GIST) + +**Note on `well_type_normalized`:** 59.3% of wells have `Unknown/Unspecified` type because Ohio (20,557), Pennsylvania (19,160), Kentucky (12,695), Kansas (5,477), and several other states submitted data without type classification. See `data-sources.md` for details. + +--- + +### `census_tracts` + +2021 TIGER/Line cartographic boundary file (1:500k). 85,230 tracts covering all 50 states, DC, and territories. 
+ +| Column | Type | Description | +|---|---|---| +| `gid` | integer | Primary key | +| `statefp` | varchar(2) | 2-digit state FIPS | +| `countyfp` | varchar(3) | 3-digit county FIPS | +| `tractce` | varchar(6) | 6-digit tract code | +| `affgeoid` | varchar(20) | American FactFinder GEOID (summary-level prefix + `US` + `geoid`) | +| `geoid` | varchar(11) | 11-digit FIPS (state + county + tract) — ACS join key | +| `name` | varchar(100) | Tract number | +| `namelsad` | varchar(100) | Full tract name (e.g., "Census Tract 1042.01") | +| `stusps` | varchar(2) | 2-letter state postal abbreviation | +| `namelsadco` | varchar(100) | County name | +| `state_name` | varchar(100) | Full state name | +| `lsad` | varchar(2) | Legal/statistical area description code | +| `aland` | numeric | Land area in square meters | +| `awater` | numeric | Water area in square meters | +| `geom` | geometry(MultiPolygon, 4326) | PostGIS polygon in WGS84 | + +**Indexes:** `geoid`, `statefp`, `stusps`, `geom` (GIST) + +--- + +### `state_transition_offices` + +State-level just transition offices and programs. **One row per office/program.** +Coded by RA from [Climate Policy Dashboard](https://www.climatepolicydashboard.org/policies/climate-governance-equity/just-transition-offices-and-staff). +Populates the `v_state_governance` and `v_ch4_state_analysis` views. 
+ +| Column | Type | Description | +|---|---|---| +| `id` | serial | Primary key | +| `state` | char(2) | USPS state abbreviation | +| `state_name` | text | Full state name | +| `office_name` | text | Exact name of office/program; `NA` if none | +| `year_established` | integer | Year established; NULL = not stated | +| `target_text` | text | Verbatim description of who the office serves | +| `code_fossil` | smallint | 1 = explicitly mentions oil/gas/coal/mining/power plant workers; 0 = absent | +| `code_equity` | smallint | 1 = explicitly mentions equity/justice/disadvantaged/low-income/EJ; 0 = absent | +| `source_url` | text | URL to Climate Policy Dashboard entry | +| `date_collected` | date | Date RA retrieved data | +| `collected_by` | text | RA name | +| `notes` | text | Ambiguity, edge cases, interpretation notes | + +**Coding rules:** +- `code_fossil = 1` ONLY with explicit language about fossil fuel workers, not merely "economic transition." +- `code_equity = 1` ONLY with explicit EJ/disadvantaged/low-income language, not merely "communities." +- States not on the dashboard: do not create a row unless instructed. +- States listed but with no program: one row with `office_name = NA`, codes = 0. + +--- + +### `state_prioritization` + +State orphaned well plugging prioritization schemes. **One row per state.** +Coded by RA from IOGCC Prioritization Report (July 2023). 
+ +| Column | Type | Description | +|---|---|---| +| `id` | serial | Primary key | +| `state` | char(2) | USPS state abbreviation (unique) | +| `state_name` | text | Full state name | +| `system_type` | text | Brief description of scoring system (e.g., `1–100 score`); `NA` if not stated | +| `tech_factors` | text | Semicolon-separated list of technical/physical factors | +| `code_rural_urban` | smallint | 1 = explicitly uses population density or urban/rural classification as scoring factor | +| `code_vuln` | smallint | 1 = explicitly uses EJ/disadvantaged/low-income/DAC as prioritization factor | +| `code_surface` | smallint | 1 = explicitly uses surface land use as prioritization factor | +| `pdf_page` | text | Page(s) in IOGCC report; `NA` if uncertain | +| `source_quote` | text | 1–2 key sentences justifying coding decisions | +| `source_url` | text | IOGCC PDF URL | +| `date_collected` | date | Date RA retrieved data | +| `collected_by` | text | RA name | +| `notes` | text | Missing-state issues, interpretation risks | + +**Critical coding distinctions:** +- `code_rural_urban`: "distance to nearest building" is NOT sufficient — requires explicit urban/rural or density language. +- `code_vuln`: "near homes/schools" is NOT sufficient — requires explicit EJ/disadvantaged/DAC language about community *status*. +- States not in IOGCC report: one row, `system_type = NA`, all codes = 0, explain in notes. + +--- + +### `state_liability` + +State-level financial liability estimates. Auto-populated from `wells` table; IIJA funding entered manually from DOI/OSMRE announcements. 
+ +| Column | Type | Description | +|---|---|---| +| `state` | char(2) | USPS abbreviation (unique) | +| `state_name` | text | Full state name | +| `well_count_dow` | integer | Wells in USGS DOW dataset | +| `iija_phase1_formula_usd` | bigint | IIJA/BIL Phase 1 formula grant (Nov 2022) | +| `iija_phase2_perf_usd` | bigint | Phase 2 performance grant (to be populated) | +| `iija_source` | text | Citation for funding figures | +| `est_liability_low_usd` | numeric | Generated: `well_count × $9,000` (Raimi et al. low) | +| `est_liability_mid_usd` | numeric | Generated: `well_count × $76,000` (Raimi et al. median) | +| `est_liability_high_usd` | numeric | Generated: `well_count × $280,000` (Raimi et al. high) | +| `bonding_required` | boolean | Does state require bonds for active wells? | +| `bonding_adequacy` | text | `adequate` / `inadequate` / `unknown` / `NA` | +| `bonding_notes` | text | Bonding context | +| `cost_estimate_source` | text | Most applicable `plugging_cost_references` entry | +| `notes` | text | State-specific context | + +**National totals (Raimi et al. mid-range):** $8.94 billion estimated liability; $310 million IIJA Phase 1 coverage = **3.47% funded.** + +--- + +### `plugging_cost_references` + +Five key published cost estimate sources for footnoting and methodology. + +| Source | Year | Low/Well | Mid/Well | High/Well | Scope | +|---|---|---|---|---|---| +| EPA OLEM | 2018 | $5,000 | $25,000 | $85,000 | National | +| Raimi et al. (RFF) | 2021 | $9,000 | $76,000 | $280,000 | National | +| Carbon Tracker | 2020 | $20,000 | $82,000 | $300,000 | National | +| IOGCC | 2023 | $5,000 | $33,000 | $150,000 | State-reported | +| PA DEP | 2022 | $10,000 | $68,000 | $220,000 | Pennsylvania | + +--- + +### `data_sources` + +The 27 state agencies and 3 ancillary sources that contributed to the USGS DOW dataset. 
+ +| Column | Type | Description | +|---|---|---| +| `source_name` | text | Full agency name | +| `source_type` | text | `state_agency` / `federal` / `ngo` / `software` | +| `state` | text | State served | +| `description` | text | Data type provided | + +--- + +### `dataset_metadata` + +Full ScienceBase JSON and FGDC XML provenance for the primary USGS dataset. One row. + +Key fields: `doi`, `citation`, `summary`, `purpose`, `publication_date`, `data_start_date`, `data_end_date`, `bounding_box`, `sciencebase_url`, `related_report_doi`, `mapping_app_url`, `file_csv_md5`, `file_zip_md5`. + +--- + +### `dataset_contacts`, `dataset_tags`, `processing_steps` + +Supporting metadata tables from ScienceBase and FGDC XML. See `data-sources.md` for details. + +--- + +## Views + +| View | Chapter | Description | +|---|---|---| +| `v_wells_by_state` | 4 | Well counts, type breakdown, avg coordinates, date range per state | +| `v_wells_by_type` | 4–5 | Raw type distribution (119 values) across states | +| `v_wells_by_status` | 4 | Status classification distribution | +| `v_state_type_summary` | 4 | State × normalized type cross-tabulation | +| `v_wells_by_tract` | 4–5 | Well counts, type breakdown, density (wells/km²) per census tract | +| `v_wells_by_county` | 4–5 | County-level rollup with 5-digit GEOID | +| `v_highest_density_tracts` | 4 | Tracts ranked by well density (≥1 km² only) | +| `v_data_completeness` | — | Non-null/non-empty counts per column | +| `v_state_governance` | 4 | Combines transition offices + prioritization → `framework_type` classification | +| `v_ch4_state_analysis` | 4 | Governance framework × spatial × financial per state | +| `v_ch5_liability_summary` | 5 | State-level liability estimates vs. 
IIJA funding | +| `v_ch5_national_totals` | 5 | National aggregate: total liability, IIJA coverage percentage | + +--- + +## Key Joins + +```sql +-- Wells → ACS median household income (via tract_geoid) +-- Note: acs_2021_b19013 is not among the tables documented above +SELECT w.*, acs.* +FROM wells w +JOIN acs_2021_b19013 acs ON w.tract_geoid = acs.geoid; + +-- Wells → state governance framework +SELECT w.api_number, w.state, sg.framework_type +FROM wells w +JOIN v_state_governance sg ON w.state_usps = sg.state; + +-- County rollup with state-level liability attached +-- est_liability_mid_usd is a state total repeated on every county row; do not sum it across counties +SELECT v.*, sl.est_liability_mid_usd +FROM v_wells_by_county v +JOIN state_liability sl ON v.state_usps = sl.state; +```
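
The generated liability columns in `state_liability` and the national coverage figure reduce to simple arithmetic, sketched below in Python. This only illustrates the calculation the schema describes and is not the database's actual DDL: the names `estimate_liability` and `coverage_pct` are hypothetical, and the $310 million Phase 1 total is the approximate figure quoted in this documentation.

```python
# Per-well cost tiers from Raimi et al. (2021), as listed in plugging_cost_references.
RAIMI_COST_PER_WELL_USD = {"low": 9_000, "mid": 76_000, "high": 280_000}


def estimate_liability(well_count: int) -> dict[str, int]:
    """Return low/mid/high liability in USD: well_count x per-well cost."""
    return {tier: well_count * cost for tier, cost in RAIMI_COST_PER_WELL_USD.items()}


def coverage_pct(funding_usd: float, liability_usd: float) -> float:
    """Share of estimated liability covered by funding, as a percentage."""
    return 100 * funding_usd / liability_usd


# National figures: 117,672 wells in the USGS DOW dataset.
national = estimate_liability(117_672)

# Approximate IIJA Phase 1 total from this documentation (an unverified figure).
phase1_pct = coverage_pct(310_000_000, national["mid"])

print(f"Mid liability: ${national['mid']:,}")
print(f"Phase 1 coverage: {phase1_pct:.2f}%")
```

The same two functions can be applied per state by passing `well_count_dow` and `iija_phase1_formula_usd` from a `state_liability` row.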