Files
new-orphan-wells/docs/database-schema.md

247 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Database Schema Reference
**Database:** `orphaned_wells`
**Engine:** PostgreSQL 18 with PostGIS
**Connection:** `psql -U postgres -h localhost -d orphaned_wells`
**Last updated:** March 2026
---
## Tables
### `wells`
Primary data table. 117,672 documented unplugged orphaned wells across 27 U.S. states.
Source: USGS DOW Dataset (Grove & Merrill 2022). Geometry loaded from shapefile, reprojected to EPSG:4326.
| Column | Type | Description |
|---|---|---|
| `gid` | integer | Primary key (auto) |
| `api_number` | varchar | 14-digit API well number (stripped of `API:` prefix) |
| `state` | varchar | Full state name (from USGS source data) |
| `county` | varchar | County name (from USGS source data) |
| `well_name` | varchar | Well name — typically operator + lease name |
| `well_number` | varchar | Order within lease/operator permit sequence |
| `type` | varchar | Raw well type as reported by state agency (119 distinct values) |
| `well_type_normalized` | text | Canonical type: Oil, Gas, Oil & Gas, Injection/Disposal, Dry/Exploratory, Water/Brine, Coalbed Methane, Enhanced Recovery, Gas Storage, Observation/Monitor, Other/Administrative, Unknown/Unspecified |
| `status` | varchar | Well status as reported by state agency |
| `latitude` | numeric | State-provided latitude (decimal degrees, NAD83/WGS84) |
| `longitude` | numeric | State-provided longitude (decimal degrees, NAD83/WGS84) |
| `principal_meridian` | varchar | PLSS principal meridian |
| `township` | numeric | PLSS township number |
| `t_dir` | varchar | Township direction (N/S) |
| `range` | numeric | PLSS range number |
| `r_dir` | varchar | Range direction (E/W) |
| `section` | numeric | PLSS section (136; some LA wells non-standard) |
| `qtr` | varchar | PLSS quarter section (¼ sq mi) |
| `qtr_qtr` | varchar | PLSS quarter-quarter section (1/16 sq mi) |
| `qtr_qtr_qtr` | varchar | PLSS quarter-quarter-quarter (1/64 sq mi) |
| `source` | varchar | State agency that provided the data |
| `data_file_date` | date | Date data was last updated by source agency |
| `well_info_notes` | varchar | Additional well information from source |
| `location_notes` | varchar | Location/coordinate methodology notes |
| `other_notes` | varchar | Other notes (often includes status date) |
| `geom` | geometry(Point, 4326) | PostGIS point in WGS84 |
| `state_fips` | char(2) | 2-digit Census FIPS state code |
| `tract_geoid` | char(11) | 11-digit Census tract FIPS (join key to ACS) |
| `tract_name` | text | Census tract label (e.g., "Census Tract 2048") |
| `county_fips` | char(3) | 3-digit Census county FIPS |
| `county_name` | text | County name from 2021 TIGER/Line |
| `state_usps` | char(2) | 2-letter USPS state abbreviation (from spatial join) |
| `tract_aland_m2` | bigint | Tract land area in square meters |
| `tract_awater_m2` | bigint | Tract water area in square meters |
**Indexes:** `api_number`, `state`, `state_fips`, `state_usps`, `county_fips`, `tract_geoid`, `well_type_normalized`, `geom` (GIST)
**Note on `well_type_normalized`:** 59.3% of wells have `Unknown/Unspecified` type because Ohio (20,557), Pennsylvania (19,160), Kentucky (12,695), Kansas (5,477), and several other states submitted data without type classification. See `data-sources.md` for details.
---
### `census_tracts`
2021 TIGER/Line cartographic boundary file (1:500k). 85,230 tracts covering all 50 states, DC, and territories.
| Column | Type | Description |
|---|---|---|
| `gid` | integer | Primary key |
| `statefp` | varchar(2) | 2-digit state FIPS |
| `countyfp` | varchar(3) | 3-digit county FIPS |
| `tractce` | varchar(6) | 6-digit tract code |
| `affgeoid` | varchar(20) | Affinity GEOID |
| `geoid` | varchar(11) | 11-digit FIPS (state + county + tract) — ACS join key |
| `name` | varchar(100) | Tract number |
| `namelsad` | varchar(100) | Full tract name (e.g., "Census Tract 1042.01") |
| `stusps` | varchar(2) | 2-letter state postal abbreviation |
| `namelsadco` | varchar(100) | County name |
| `state_name` | varchar(100) | Full state name |
| `lsad` | varchar(2) | Legal/statistical area description code |
| `aland` | numeric | Land area in square meters |
| `awater` | numeric | Water area in square meters |
| `geom` | geometry(MultiPolygon, 4326) | PostGIS polygon in WGS84 |
**Indexes:** `geoid`, `statefp`, `stusps`, `geom` (GIST)
---
### `state_transition_offices`
State-level just transition offices and programs. **One row per office/program.**
Coded by RA from [Climate Policy Dashboard](https://www.climatepolicydashboard.org/policies/climate-governance-equity/just-transition-offices-and-staff).
Populates the `v_state_governance` and `v_ch4_state_analysis` views.
| Column | Type | Description |
|---|---|---|
| `id` | serial | Primary key |
| `state` | char(2) | USPS state abbreviation |
| `state_name` | text | Full state name |
| `office_name` | text | Exact name of office/program; `NA` if none |
| `year_established` | integer | Year established; NULL = not stated |
| `target_text` | text | Verbatim description of who the office serves |
| `code_fossil` | smallint | 1 = explicitly mentions oil/gas/coal/mining/power plant workers; 0 = absent |
| `code_equity` | smallint | 1 = explicitly mentions equity/justice/disadvantaged/low-income/EJ; 0 = absent |
| `source_url` | text | URL to Climate Policy Dashboard entry |
| `date_collected` | date | Date RA retrieved data |
| `collected_by` | text | RA name |
| `notes` | text | Ambiguity, edge cases, interpretation notes |
**Coding rules:**
- `code_fossil = 1` ONLY with explicit language about fossil fuel workers, not merely "economic transition."
- `code_equity = 1` ONLY with explicit EJ/disadvantaged/low-income language, not merely "communities."
- States not on the dashboard: do not create a row unless instructed.
- States listed but with no program: one row with `office_name = NA`, codes = 0.
---
### `state_prioritization`
State orphaned well plugging prioritization schemes. **One row per state.**
Coded by RA from IOGCC Prioritization Report (July 2023).
| Column | Type | Description |
|---|---|---|
| `id` | serial | Primary key |
| `state` | char(2) | USPS state abbreviation (unique) |
| `state_name` | text | Full state name |
| `system_type` | text | Brief description of scoring system (e.g., `1100 score`); `NA` if not stated |
| `tech_factors` | text | Semicolon-separated list of technical/physical factors |
| `code_rural_urban` | smallint | 1 = explicitly uses population density or urban/rural classification as scoring factor |
| `code_vuln` | smallint | 1 = explicitly uses EJ/disadvantaged/low-income/DAC as prioritization factor |
| `code_surface` | smallint | 1 = explicitly uses surface land use as prioritization factor |
| `pdf_page` | text | Page(s) in IOGCC report; `NA` if uncertain |
| `source_quote` | text | 12 key sentences justifying coding decisions |
| `source_url` | text | IOGCC PDF URL |
| `date_collected` | date | Date RA retrieved data |
| `collected_by` | text | RA name |
| `notes` | text | Missing-state issues, interpretation risks |
**Critical coding distinctions:**
- `code_rural_urban`: "distance to nearest building" is NOT sufficient — requires explicit urban/rural or density language.
- `code_vuln`: "near homes/schools" is NOT sufficient — requires explicit EJ/disadvantaged/DAC language about community *status*.
- States not in IOGCC report: one row, `system_type = NA`, all codes = 0, explain in notes.
---
### `state_liability`
State-level financial liability estimates. Auto-populated from `wells` table; IIJA funding entered manually from DOI/OSMRE announcements.
| Column | Type | Description |
|---|---|---|
| `state` | char(2) | USPS abbreviation (unique) |
| `state_name` | text | Full state name |
| `well_count_dow` | integer | Wells in USGS DOW dataset |
| `iija_phase1_formula_usd` | bigint | IIJA/BIL Phase 1 formula grant (Nov 2022) |
| `iija_phase2_perf_usd` | bigint | Phase 2 performance grant (to be populated) |
| `iija_source` | text | Citation for funding figures |
| `est_liability_low_usd` | numeric | Generated: `well_count × $9,000` (Raimi et al. low) |
| `est_liability_mid_usd` | numeric | Generated: `well_count × $76,000` (Raimi et al. median) |
| `est_liability_high_usd` | numeric | Generated: `well_count × $280,000` (Raimi et al. high) |
| `bonding_required` | boolean | Does state require bonds for active wells? |
| `bonding_adequacy` | text | `adequate` / `inadequate` / `unknown` / `NA` |
| `bonding_notes` | text | Bonding context |
| `cost_estimate_source` | text | Most applicable `plugging_cost_references` entry |
| `notes` | text | State-specific context |
**National totals (Raimi et al. mid-range):** $8.94 billion estimated liability; $310 million IIJA Phase 1 coverage = **3.47% funded.**
---
### `plugging_cost_references`
Five key published cost estimate sources for footnoting and methodology.
| Source | Year | Low/Well | Mid/Well | High/Well | Scope |
|---|---|---|---|---|---|
| EPA OLEM | 2018 | $5,000 | $25,000 | $85,000 | National |
| Raimi et al. (RFF) | 2021 | $9,000 | $76,000 | $280,000 | National |
| Carbon Tracker | 2020 | $20,000 | $82,000 | $300,000 | National |
| IOGCC | 2023 | $5,000 | $33,000 | $150,000 | State-reported |
| PA DEP | 2022 | $10,000 | $68,000 | $220,000 | Pennsylvania |
---
### `data_sources`
The 27 state agencies and 3 ancillary sources that contributed to the USGS DOW dataset.
| Column | Type | Description |
|---|---|---|
| `source_name` | text | Full agency name |
| `source_type` | text | `state_agency` / `federal` / `ngo` / `software` |
| `state` | text | State served |
| `description` | text | Data type provided |
---
### `dataset_metadata`
Full ScienceBase JSON and FGDC XML provenance for the primary USGS dataset. One row.
Key fields: `doi`, `citation`, `summary`, `purpose`, `publication_date`, `data_start_date`, `data_end_date`, `bounding_box`, `sciencebase_url`, `related_report_doi`, `mapping_app_url`, `file_csv_md5`, `file_zip_md5`.
---
### `dataset_contacts`, `dataset_tags`, `processing_steps`
Supporting metadata tables from ScienceBase and FGDC XML. See `data-sources.md` for details.
---
## Views
| View | Chapter | Description |
|---|---|---|
| `v_wells_by_state` | 4 | Well counts, type breakdown, avg coordinates, date range per state |
| `v_wells_by_type` | 45 | Raw type distribution (119 values) across states |
| `v_wells_by_status` | 4 | Status classification distribution |
| `v_state_type_summary` | 4 | State × normalized type cross-tabulation |
| `v_wells_by_tract` | 45 | Well counts, type breakdown, density (wells/km²) per census tract |
| `v_wells_by_county` | 45 | County-level rollup with 5-digit GEOID |
| `v_highest_density_tracts` | 4 | Tracts ranked by well density (≥1 km² only) |
| `v_data_completeness` | — | Non-null/non-empty counts per column |
| `v_state_governance` | 4 | Combines transition offices + prioritization → `framework_type` classification |
| `v_ch4_state_analysis` | 4 | Governance framework × spatial × financial per state |
| `v_ch5_liability_summary` | 5 | State-level liability estimates vs. IIJA funding |
| `v_ch5_national_totals` | 5 | National aggregate: total liability, IIJA coverage percentage |
---
## Key Joins
```sql
-- Wells → ACS (via tract_geoid)
SELECT w.*, acs.*
FROM wells w
JOIN acs_2021_b19013 acs ON w.tract_geoid = acs.geoid;
-- Wells → state governance framework
SELECT w.api_number, w.state, sg.framework_type
FROM wells w
JOIN v_state_governance sg ON w.state_usps = sg.state;
-- County liability summary
SELECT v.*, sl.est_liability_mid_usd
FROM v_wells_by_county v
JOIN state_liability sl ON v.state_usps = sl.state;
```