Files
new-orphan-wells/docs/database-schema.md

12 KiB
Raw Permalink Blame History

Database Schema Reference

Database: orphaned_wells Engine: PostgreSQL 18 with PostGIS Connection: psql -U postgres -h localhost -d orphaned_wells Last updated: March 2026


Tables

wells

Primary data table. 117,672 documented unplugged orphaned wells across 27 U.S. states. Source: USGS DOW Dataset (Grove & Merrill 2022). Geometry loaded from shapefile, reprojected to EPSG:4326.

Column Type Description
gid integer Primary key (auto)
api_number varchar 14-digit API well number (stripped of API: prefix)
state varchar Full state name (from USGS source data)
county varchar County name (from USGS source data)
well_name varchar Well name — typically operator + lease name
well_number varchar Order within lease/operator permit sequence
type varchar Raw well type as reported by state agency (119 distinct values)
well_type_normalized text Canonical type: Oil, Gas, Oil & Gas, Injection/Disposal, Dry/Exploratory, Water/Brine, Coalbed Methane, Enhanced Recovery, Gas Storage, Observation/Monitor, Other/Administrative, Unknown/Unspecified
status varchar Well status as reported by state agency
latitude numeric State-provided latitude (decimal degrees, NAD83/WGS84)
longitude numeric State-provided longitude (decimal degrees, NAD83/WGS84)
principal_meridian varchar PLSS principal meridian
township numeric PLSS township number
t_dir varchar Township direction (N/S)
range numeric PLSS range number
r_dir varchar Range direction (E/W)
section numeric PLSS section (136; some LA wells non-standard)
qtr varchar PLSS quarter section (¼ sq mi)
qtr_qtr varchar PLSS quarter-quarter section (1/16 sq mi)
qtr_qtr_qtr varchar PLSS quarter-quarter-quarter (1/64 sq mi)
source varchar State agency that provided the data
data_file_date date Date data was last updated by source agency
well_info_notes varchar Additional well information from source
location_notes varchar Location/coordinate methodology notes
other_notes varchar Other notes (often includes status date)
geom geometry(Point, 4326) PostGIS point in WGS84
state_fips char(2) 2-digit Census FIPS state code
tract_geoid char(11) 11-digit Census tract FIPS (join key to ACS)
tract_name text Census tract label (e.g., "Census Tract 2048")
county_fips char(3) 3-digit Census county FIPS
county_name text County name from 2021 TIGER/Line
state_usps char(2) 2-letter USPS state abbreviation (from spatial join)
tract_aland_m2 bigint Tract land area in square meters
tract_awater_m2 bigint Tract water area in square meters

Indexes: api_number, state, state_fips, state_usps, county_fips, tract_geoid, well_type_normalized, geom (GIST)

Note on well_type_normalized: 59.3% of wells have Unknown/Unspecified type because Ohio (20,557), Pennsylvania (19,160), Kentucky (12,695), Kansas (5,477), and several other states submitted data without type classification. See data-sources.md for details.


census_tracts

2021 TIGER/Line cartographic boundary file (1:500k). 85,230 tracts covering all 50 states, DC, and territories.

Column Type Description
gid integer Primary key
statefp varchar(2) 2-digit state FIPS
countyfp varchar(3) 3-digit county FIPS
tractce varchar(6) 6-digit tract code
affgeoid varchar(20) Affinity GEOID
geoid varchar(11) 11-digit FIPS (state + county + tract) — ACS join key
name varchar(100) Tract number
namelsad varchar(100) Full tract name (e.g., "Census Tract 1042.01")
stusps varchar(2) 2-letter state postal abbreviation
namelsadco varchar(100) County name
state_name varchar(100) Full state name
lsad varchar(2) Legal/statistical area description code
aland numeric Land area in square meters
awater numeric Water area in square meters
geom geometry(MultiPolygon, 4326) PostGIS polygon in WGS84

Indexes: geoid, statefp, stusps, geom (GIST)


state_transition_offices

State-level just transition offices and programs. One row per office/program. Coded by RA from Climate Policy Dashboard. Populates the v_state_governance and v_ch4_state_analysis views.

Column Type Description
id serial Primary key
state char(2) USPS state abbreviation
state_name text Full state name
office_name text Exact name of office/program; NA if none
year_established integer Year established; NULL = not stated
target_text text Verbatim description of who the office serves
code_fossil smallint 1 = explicitly mentions oil/gas/coal/mining/power plant workers; 0 = absent
code_equity smallint 1 = explicitly mentions equity/justice/disadvantaged/low-income/EJ; 0 = absent
source_url text URL to Climate Policy Dashboard entry
date_collected date Date RA retrieved data
collected_by text RA name
notes text Ambiguity, edge cases, interpretation notes

Coding rules:

  • code_fossil = 1 ONLY with explicit language about fossil fuel workers, not merely "economic transition."
  • code_equity = 1 ONLY with explicit EJ/disadvantaged/low-income language, not merely "communities."
  • States not on the dashboard: do not create a row unless instructed.
  • States listed but with no program: one row with office_name = NA, codes = 0.

state_prioritization

State orphaned well plugging prioritization schemes. One row per state. Coded by RA from IOGCC Prioritization Report (July 2023).

Column Type Description
id serial Primary key
state char(2) USPS state abbreviation (unique)
state_name text Full state name
system_type text Brief description of scoring system (e.g., 1100 score); NA if not stated
tech_factors text Semicolon-separated list of technical/physical factors
code_rural_urban smallint 1 = explicitly uses population density or urban/rural classification as scoring factor
code_vuln smallint 1 = explicitly uses EJ/disadvantaged/low-income/DAC as prioritization factor
code_surface smallint 1 = explicitly uses surface land use as prioritization factor
pdf_page text Page(s) in IOGCC report; NA if uncertain
source_quote text 12 key sentences justifying coding decisions
source_url text IOGCC PDF URL
date_collected date Date RA retrieved data
collected_by text RA name
notes text Missing-state issues, interpretation risks

Critical coding distinctions:

  • code_rural_urban: "distance to nearest building" is NOT sufficient — requires explicit urban/rural or density language.
  • code_vuln: "near homes/schools" is NOT sufficient — requires explicit EJ/disadvantaged/DAC language about community status.
  • States not in IOGCC report: one row, system_type = NA, all codes = 0, explain in notes.

state_liability

State-level financial liability estimates. Auto-populated from wells table; IIJA funding entered manually from DOI/OSMRE announcements.

Column Type Description
state char(2) USPS abbreviation (unique)
state_name text Full state name
well_count_dow integer Wells in USGS DOW dataset
iija_phase1_formula_usd bigint IIJA/BIL Phase 1 formula grant (Nov 2022)
iija_phase2_perf_usd bigint Phase 2 performance grant (to be populated)
iija_source text Citation for funding figures
est_liability_low_usd numeric Generated: well_count × $9,000 (Raimi et al. low)
est_liability_mid_usd numeric Generated: well_count × $76,000 (Raimi et al. median)
est_liability_high_usd numeric Generated: well_count × $280,000 (Raimi et al. high)
bonding_required boolean Does state require bonds for active wells?
bonding_adequacy text adequate / inadequate / unknown / NA
bonding_notes text Bonding context
cost_estimate_source text Most applicable plugging_cost_references entry
notes text State-specific context

National totals (Raimi et al. mid-range): $8.94 billion estimated liability; $310 million IIJA Phase 1 coverage = 3.47% funded.


plugging_cost_references

Five key published cost estimate sources for footnoting and methodology.

Source Year Low/Well Mid/Well High/Well Scope
EPA OLEM 2018 $5,000 $25,000 $85,000 National
Raimi et al. (RFF) 2021 $9,000 $76,000 $280,000 National
Carbon Tracker 2020 $20,000 $82,000 $300,000 National
IOGCC 2023 $5,000 $33,000 $150,000 State-reported
PA DEP 2022 $10,000 $68,000 $220,000 Pennsylvania

data_sources

The 27 state agencies and 3 ancillary sources that contributed to the USGS DOW dataset.

Column Type Description
source_name text Full agency name
source_type text state_agency / federal / ngo / software
state text State served
description text Data type provided

dataset_metadata

Full ScienceBase JSON and FGDC XML provenance for the primary USGS dataset. One row.

Key fields: doi, citation, summary, purpose, publication_date, data_start_date, data_end_date, bounding_box, sciencebase_url, related_report_doi, mapping_app_url, file_csv_md5, file_zip_md5.


dataset_contacts, dataset_tags, processing_steps

Supporting metadata tables from ScienceBase and FGDC XML. See data-sources.md for details.


Views

View Chapter Description
v_wells_by_state 4 Well counts, type breakdown, avg coordinates, date range per state
v_wells_by_type 45 Raw type distribution (119 values) across states
v_wells_by_status 4 Status classification distribution
v_state_type_summary 4 State × normalized type cross-tabulation
v_wells_by_tract 45 Well counts, type breakdown, density (wells/km²) per census tract
v_wells_by_county 45 County-level rollup with 5-digit GEOID
v_highest_density_tracts 4 Tracts ranked by well density (≥1 km² only)
v_data_completeness Non-null/non-empty counts per column
v_state_governance 4 Combines transition offices + prioritization → framework_type classification
v_ch4_state_analysis 4 Governance framework × spatial × financial per state
v_ch5_liability_summary 5 State-level liability estimates vs. IIJA funding
v_ch5_national_totals 5 National aggregate: total liability, IIJA coverage percentage

Key Joins

-- Wells → ACS (via tract_geoid)
SELECT w.*, acs.*
FROM wells w
JOIN acs_2021_b19013 acs ON w.tract_geoid = acs.geoid;

-- Wells → state governance framework
SELECT w.api_number, w.state, sg.framework_type
FROM wells w
JOIN v_state_governance sg ON w.state_usps = sg.state;

-- County liability summary
SELECT v.*, sl.est_liability_mid_usd
FROM v_wells_by_county v
JOIN state_liability sl ON v.state_usps = sl.state;