# Database Schema Reference

**Database:** `orphaned_wells`

**Engine:** PostgreSQL 18 with PostGIS

**Connection:** `psql -U postgres -h localhost -d orphaned_wells`

**Last updated:** March 2026

---

## Tables

### `wells`

Primary data table. 117,672 documented unplugged orphaned wells across 27 U.S. states. Source: USGS DOW Dataset (Grove & Merrill 2022). Geometry loaded from shapefile, reprojected to EPSG:4326.

| Column | Type | Description |
|---|---|---|
| `gid` | integer | Primary key (auto) |
| `api_number` | varchar | 14-digit API well number (stripped of `API:` prefix) |
| `state` | varchar | Full state name (from USGS source data) |
| `county` | varchar | County name (from USGS source data) |
| `well_name` | varchar | Well name, typically operator + lease name |
| `well_number` | varchar | Order within lease/operator permit sequence |
| `type` | varchar | Raw well type as reported by state agency (119 distinct values) |
| `well_type_normalized` | text | Canonical type: Oil, Gas, Oil & Gas, Injection/Disposal, Dry/Exploratory, Water/Brine, Coalbed Methane, Enhanced Recovery, Gas Storage, Observation/Monitor, Other/Administrative, Unknown/Unspecified |
| `status` | varchar | Well status as reported by state agency |
| `latitude` | numeric | State-provided latitude (decimal degrees, NAD83/WGS84) |
| `longitude` | numeric | State-provided longitude (decimal degrees, NAD83/WGS84) |
| `principal_meridian` | varchar | PLSS principal meridian |
| `township` | numeric | PLSS township number |
| `t_dir` | varchar | Township direction (N/S) |
| `range` | numeric | PLSS range number |
| `r_dir` | varchar | Range direction (E/W) |
| `section` | numeric | PLSS section (1–36; some LA wells non-standard) |
| `qtr` | varchar | PLSS quarter section (¼ sq mi) |
| `qtr_qtr` | varchar | PLSS quarter-quarter section (1/16 sq mi) |
| `qtr_qtr_qtr` | varchar | PLSS quarter-quarter-quarter (1/64 sq mi) |
| `source` | varchar | State agency that provided the data |
| `data_file_date` | date | Date data was last updated by source agency |
| `well_info_notes` | varchar | Additional well information from source |
| `location_notes` | varchar | Location/coordinate methodology notes |
| `other_notes` | varchar | Other notes (often includes status date) |
| `geom` | geometry(Point, 4326) | PostGIS point in WGS84 |
| `state_fips` | char(2) | 2-digit Census FIPS state code |
| `tract_geoid` | char(11) | 11-digit Census tract FIPS (join key to ACS) |
| `tract_name` | text | Census tract label (e.g., "Census Tract 2048") |
| `county_fips` | char(3) | 3-digit Census county FIPS |
| `county_name` | text | County name from 2021 TIGER/Line |
| `state_usps` | char(2) | 2-letter USPS state abbreviation (from spatial join) |
| `tract_aland_m2` | bigint | Tract land area in square meters |
| `tract_awater_m2` | bigint | Tract water area in square meters |

**Indexes:** `api_number`, `state`, `state_fips`, `state_usps`, `county_fips`, `tract_geoid`, `well_type_normalized`, `geom` (GIST)

**Note on `well_type_normalized`:** 59.3% of wells have `Unknown/Unspecified` type because Ohio (20,557), Pennsylvania (19,160), Kentucky (12,695), Kansas (5,477), and several other states submitted data without type classification. See `data-sources.md` for details.

---

### `census_tracts`

2021 TIGER/Line cartographic boundary file (1:500k). 85,230 tracts covering all 50 states, DC, and territories.

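The `tract_geoid` stored on `wells` should agree with the tract polygon that contains each well's `geom`. The query below is a read-only sketch for spot-checking that, assuming the tract columns were assigned by a point-in-polygon match against this table; the actual load script is not documented here, and the output aliases `stored_geoid` and `spatial_geoid` are illustrative.

```sql
-- Sketch only: list wells whose stored tract differs from the tract
-- polygon that spatially contains them. Assumes both geom columns are
-- in EPSG:4326, as documented.
SELECT w.gid,
       w.api_number,
       w.tract_geoid AS stored_geoid,   -- value saved on wells
       t.geoid       AS spatial_geoid   -- value implied by geometry
FROM wells AS w
JOIN census_tracts AS t
  ON ST_Intersects(t.geom, w.geom)      -- can use the GIST index on geom
WHERE w.tract_geoid IS DISTINCT FROM t.geoid
LIMIT 100;
```

A well sitting exactly on a shared tract boundary intersects both neighbors, so it can appear here even when its stored value is one of the two valid tracts.
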
| Column | Type | Description |
|---|---|---|
| `gid` | integer | Primary key |
| `statefp` | varchar(2) | 2-digit state FIPS |
| `countyfp` | varchar(3) | 3-digit county FIPS |
| `tractce` | varchar(6) | 6-digit tract code |
| `affgeoid` | varchar(20) | American FactFinder GEOID (summary-level prefix + `US` + `geoid`) |
| `geoid` | varchar(11) | 11-digit FIPS (state + county + tract); ACS join key |
| `name` | varchar(100) | Tract number |
| `namelsad` | varchar(100) | Full tract name (e.g., "Census Tract 1042.01") |
| `stusps` | varchar(2) | 2-letter state postal abbreviation |
| `namelsadco` | varchar(100) | County name |
| `state_name` | varchar(100) | Full state name |
| `lsad` | varchar(2) | Legal/statistical area description code |
| `aland` | numeric | Land area in square meters |
| `awater` | numeric | Water area in square meters |
| `geom` | geometry(MultiPolygon, 4326) | PostGIS polygon in WGS84 |

**Indexes:** `geoid`, `statefp`, `stusps`, `geom` (GIST)

---

### `state_transition_offices`

State-level just transition offices and programs. **One row per office/program.** Coded by RA from [Climate Policy Dashboard](https://www.climatepolicydashboard.org/policies/climate-governance-equity/just-transition-offices-and-staff). Populates the `v_state_governance` and `v_ch4_state_analysis` views.

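Because this table can hold several rows per state, joining it directly to `wells` would repeat each well once per office. The query below is a minimal sketch of a per-state rollup. It assumes the no-program rows store the literal string `'NA'` in `office_name`, and that a state counts as coded 1 when any of its offices is coded 1. The logic actually used inside `v_state_governance` is not documented here, and the output column names are illustrative.

```sql
-- Sketch only: collapse offices to one row per state before joining.
SELECT
    state,
    count(*) FILTER (WHERE office_name <> 'NA') AS n_offices,      -- real offices only
    max(code_fossil)                            AS any_code_fossil, -- 1 if any office coded 1
    max(code_equity)                            AS any_code_equity
FROM state_transition_offices
GROUP BY state
ORDER BY state;
```
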
| Column | Type | Description |
|---|---|---|
| `id` | serial | Primary key |
| `state` | char(2) | USPS state abbreviation |
| `state_name` | text | Full state name |
| `office_name` | text | Exact name of office/program; `NA` if none |
| `year_established` | integer | Year established; NULL = not stated |
| `target_text` | text | Verbatim description of who the office serves |
| `code_fossil` | smallint | 1 = explicitly mentions oil/gas/coal/mining/power plant workers; 0 = absent |
| `code_equity` | smallint | 1 = explicitly mentions equity/justice/disadvantaged/low-income/EJ; 0 = absent |
| `source_url` | text | URL to Climate Policy Dashboard entry |
| `date_collected` | date | Date RA retrieved data |
| `collected_by` | text | RA name |
| `notes` | text | Ambiguity, edge cases, interpretation notes |

**Coding rules:**

- `code_fossil = 1` ONLY with explicit language about fossil fuel workers, not merely "economic transition."
- `code_equity = 1` ONLY with explicit EJ/disadvantaged/low-income language, not merely "communities."
- States not on the dashboard: do not create a row unless instructed.
- States listed but with no program: one row with `office_name = NA`, codes = 0.

---

### `state_prioritization`

State orphaned well plugging prioritization schemes. **One row per state.** Coded by RA from IOGCC Prioritization Report (July 2023).

| Column | Type | Description |
|---|---|---|
| `id` | serial | Primary key |
| `state` | char(2) | USPS state abbreviation (unique) |
| `state_name` | text | Full state name |
| `system_type` | text | Brief description of scoring system (e.g., `1–100 score`); `NA` if not stated |
| `tech_factors` | text | Semicolon-separated list of technical/physical factors |
| `code_rural_urban` | smallint | 1 = explicitly uses population density or urban/rural classification as scoring factor |
| `code_vuln` | smallint | 1 = explicitly uses EJ/disadvantaged/low-income/DAC as prioritization factor |
| `code_surface` | smallint | 1 = explicitly uses surface land use as prioritization factor |
| `pdf_page` | text | Page(s) in IOGCC report; `NA` if uncertain |
| `source_quote` | text | 1–2 key sentences justifying coding decisions |
| `source_url` | text | IOGCC PDF URL |
| `date_collected` | date | Date RA retrieved data |
| `collected_by` | text | RA name |
| `notes` | text | Missing-state issues, interpretation risks |

**Critical coding distinctions:**

- `code_rural_urban`: "distance to nearest building" is NOT sufficient; the code requires explicit urban/rural or density language.
- `code_vuln`: "near homes/schools" is NOT sufficient; the code requires explicit EJ/disadvantaged/DAC language about community *status*.
- States not in IOGCC report: one row, `system_type = NA`, all codes = 0, explain in notes.

---

### `state_liability`

State-level financial liability estimates. Auto-populated from `wells` table; IIJA funding entered manually from DOI/OSMRE announcements.

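Since `well_count_dow` is derived from `wells`, it can drift if `wells` is reloaded. The query below is a read-only sketch for detecting that. It assumes `wells.state_usps` is the key that matches `state_liability.state`, as the joins elsewhere in this reference suggest; the population script itself is not documented here, and the alias `wells_in_table` is illustrative.

```sql
-- Sketch only: report states whose stored well count no longer matches
-- the number of rows in wells.
SELECT sl.state,
       sl.well_count_dow,
       count(w.gid) AS wells_in_table   -- 0 when no wells match the state
FROM state_liability AS sl
LEFT JOIN wells AS w
       ON w.state_usps = sl.state
GROUP BY sl.state, sl.well_count_dow
HAVING sl.well_count_dow IS DISTINCT FROM count(w.gid);
```

An empty result means the counts agree. Wells with a NULL `state_usps` (for example, points that matched no tract in the spatial join) are not counted toward any state by this check.
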
| Column | Type | Description |
|---|---|---|
| `state` | char(2) | USPS abbreviation (unique) |
| `state_name` | text | Full state name |
| `well_count_dow` | integer | Wells in USGS DOW dataset |
| `iija_phase1_formula_usd` | bigint | IIJA/BIL Phase 1 formula grant (Nov 2022) |
| `iija_phase2_perf_usd` | bigint | Phase 2 performance grant (to be populated) |
| `iija_source` | text | Citation for funding figures |
| `est_liability_low_usd` | numeric | Generated: `well_count_dow × $9,000` (Raimi et al. low) |
| `est_liability_mid_usd` | numeric | Generated: `well_count_dow × $76,000` (Raimi et al. median) |
| `est_liability_high_usd` | numeric | Generated: `well_count_dow × $280,000` (Raimi et al. high) |
| `bonding_required` | boolean | Does state require bonds for active wells? |
| `bonding_adequacy` | text | `adequate` / `inadequate` / `unknown` / `NA` |
| `bonding_notes` | text | Bonding context |
| `cost_estimate_source` | text | Most applicable `plugging_cost_references` entry |
| `notes` | text | State-specific context |

**National totals (Raimi et al. mid-range):** $8.94 billion estimated liability (117,672 wells × $76,000); $310 million IIJA Phase 1 coverage = **3.47% funded.**

---

### `plugging_cost_references`

Five key published cost estimate sources for footnoting and methodology.

| Source | Year | Low/Well | Mid/Well | High/Well | Scope |
|---|---|---|---|---|---|
| EPA OLEM | 2018 | $5,000 | $25,000 | $85,000 | National |
| Raimi et al. (RFF) | 2021 | $9,000 | $76,000 | $280,000 | National |
| Carbon Tracker | 2020 | $20,000 | $82,000 | $300,000 | National |
| IOGCC | 2023 | $5,000 | $33,000 | $150,000 | State-reported |
| PA DEP | 2022 | $10,000 | $68,000 | $220,000 | Pennsylvania |

---

### `data_sources`

The 27 state agencies and 3 ancillary sources that contributed to the USGS DOW dataset.

| Column | Type | Description |
|---|---|---|
| `source_name` | text | Full agency name |
| `source_type` | text | `state_agency` / `federal` / `ngo` / `software` |
| `state` | text | State served |
| `description` | text | Data type provided |

---

### `dataset_metadata`

Full ScienceBase JSON and FGDC XML provenance for the primary USGS dataset. One row.

Key fields: `doi`, `citation`, `summary`, `purpose`, `publication_date`, `data_start_date`, `data_end_date`, `bounding_box`, `sciencebase_url`, `related_report_doi`, `mapping_app_url`, `file_csv_md5`, `file_zip_md5`.

---

### `dataset_contacts`, `dataset_tags`, `processing_steps`

Supporting metadata tables from ScienceBase and FGDC XML. See `data-sources.md` for details.

---

## Views

| View | Chapter | Description |
|---|---|---|
| `v_wells_by_state` | 4 | Well counts, type breakdown, avg coordinates, date range per state |
| `v_wells_by_type` | 4–5 | Raw type distribution (119 values) across states |
| `v_wells_by_status` | 4 | Status classification distribution |
| `v_state_type_summary` | 4 | State × normalized type cross-tabulation |
| `v_wells_by_tract` | 4–5 | Well counts, type breakdown, density (wells/km²) per census tract |
| `v_wells_by_county` | 4–5 | County-level rollup with 5-digit GEOID |
| `v_highest_density_tracts` | 4 | Tracts ranked by well density (≥1 km² only) |
| `v_data_completeness` | n/a | Non-null/non-empty counts per column |
| `v_state_governance` | 4 | Combines transition offices + prioritization → `framework_type` classification |
| `v_ch4_state_analysis` | 4 | Governance framework × spatial × financial per state |
| `v_ch5_liability_summary` | 5 | State-level liability estimates vs. IIJA funding |
| `v_ch5_national_totals` | 5 | National aggregate: total liability, IIJA coverage percentage |

---

## Key Joins

```sql
-- Wells → ACS (via tract_geoid)
SELECT w.*, acs.*
FROM wells w
JOIN acs_2021_b19013 acs ON w.tract_geoid = acs.geoid;

-- Wells → state governance framework
SELECT w.api_number, w.state, sg.framework_type
FROM wells w
JOIN v_state_governance sg ON w.state_usps = sg.state;

-- County rollup with state-level liability attached.
-- est_liability_mid_usd is a state total, so it repeats on every county
-- row of that state; do not sum it across counties.
SELECT v.*, sl.est_liability_mid_usd
FROM v_wells_by_county v
JOIN state_liability sl ON v.state_usps = sl.state;
```
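
The national mid-range totals quoted under `state_liability` can be cross-checked in two ways. The sketch below first recomputes them from the figures stated in this reference (117,672 wells, $76,000 per well, $310 million Phase 1), which needs no tables. It then runs the equivalent aggregate over `state_liability`, assuming that table is fully populated. The definition of `v_ch5_national_totals` is not shown here, so this is not necessarily how the view computes its values, and the output aliases are illustrative.

```sql
-- 1) Arithmetic check using only the numbers stated in this document.
SELECT
    117672::bigint * 76000 AS est_liability_mid_usd,                 -- 8943072000
    round(100.0 * 310000000 / (117672::bigint * 76000), 2) AS pct_funded_phase1;  -- 3.47

-- 2) Same totals from the table (sketch; assumes all states are populated).
SELECT
    sum(est_liability_mid_usd)   AS est_liability_mid_usd,
    sum(iija_phase1_formula_usd) AS iija_phase1_usd,
    round(100.0 * sum(iija_phase1_formula_usd)
          / nullif(sum(est_liability_mid_usd), 0), 2) AS pct_funded_phase1
FROM state_liability;
```

If the second query disagrees with the first, the likely causes are a `well_count_dow` total that differs from 117,672 or Phase 1 amounts that have not been entered for every state.
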