Initial commit: database schema, data source docs, chapter variable references

dadams
2026-03-08 20:25:58 -07:00
commit 38cf181f40
7 changed files with 955 additions and 0 deletions

docs/database-schema.md Normal file

@@ -0,0 +1,246 @@
# Database Schema Reference
**Database:** `orphaned_wells`
**Engine:** PostgreSQL 18 with PostGIS
**Connection:** `psql -U postgres -h localhost -d orphaned_wells`
**Last updated:** March 2026
---
## Tables
### `wells`
Primary data table. 117,672 documented unplugged orphaned wells across 27 U.S. states.
Source: USGS DOW Dataset (Grove & Merrill 2022). Geometry loaded from shapefile, reprojected to EPSG:4326.
| Column | Type | Description |
|---|---|---|
| `gid` | integer | Primary key (auto) |
| `api_number` | varchar | 14-digit API well number (stripped of `API:` prefix) |
| `state` | varchar | Full state name (from USGS source data) |
| `county` | varchar | County name (from USGS source data) |
| `well_name` | varchar | Well name — typically operator + lease name |
| `well_number` | varchar | Order within lease/operator permit sequence |
| `type` | varchar | Raw well type as reported by state agency (119 distinct values) |
| `well_type_normalized` | text | Canonical type: Oil, Gas, Oil & Gas, Injection/Disposal, Dry/Exploratory, Water/Brine, Coalbed Methane, Enhanced Recovery, Gas Storage, Observation/Monitor, Other/Administrative, Unknown/Unspecified |
| `status` | varchar | Well status as reported by state agency |
| `latitude` | numeric | State-provided latitude (decimal degrees, NAD83/WGS84) |
| `longitude` | numeric | State-provided longitude (decimal degrees, NAD83/WGS84) |
| `principal_meridian` | varchar | PLSS principal meridian |
| `township` | numeric | PLSS township number |
| `t_dir` | varchar | Township direction (N/S) |
| `range` | numeric | PLSS range number |
| `r_dir` | varchar | Range direction (E/W) |
| `section` | numeric | PLSS section (1–36; some LA wells non-standard) |
| `qtr` | varchar | PLSS quarter section (¼ sq mi) |
| `qtr_qtr` | varchar | PLSS quarter-quarter section (1/16 sq mi) |
| `qtr_qtr_qtr` | varchar | PLSS quarter-quarter-quarter (1/64 sq mi) |
| `source` | varchar | State agency that provided the data |
| `data_file_date` | date | Date data was last updated by source agency |
| `well_info_notes` | varchar | Additional well information from source |
| `location_notes` | varchar | Location/coordinate methodology notes |
| `other_notes` | varchar | Other notes (often includes status date) |
| `geom` | geometry(Point, 4326) | PostGIS point in WGS84 |
| `state_fips` | char(2) | 2-digit Census FIPS state code |
| `tract_geoid` | char(11) | 11-digit Census tract FIPS (join key to ACS) |
| `tract_name` | text | Census tract label (e.g., "Census Tract 2048") |
| `county_fips` | char(3) | 3-digit Census county FIPS |
| `county_name` | text | County name from 2021 TIGER/Line |
| `state_usps` | char(2) | 2-letter USPS state abbreviation (from spatial join) |
| `tract_aland_m2` | bigint | Tract land area in square meters |
| `tract_awater_m2` | bigint | Tract water area in square meters |
**Indexes:** `api_number`, `state`, `state_fips`, `state_usps`, `county_fips`, `tract_geoid`, `well_type_normalized`, `geom` (GIST)
**Note on `well_type_normalized`:** 59.3% of wells have `Unknown/Unspecified` type because Ohio (20,557), Pennsylvania (19,160), Kentucky (12,695), Kansas (5,477), and several other states submitted data without type classification. See `data-sources.md` for details.
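
The `api_number` column is described above as a 14-digit value with the `API:` prefix removed. A minimal sketch of that cleanup follows, assuming raw values look like `API:34123456780000`; the function names and the exact raw format are illustrative assumptions, not the loader actually used for this database.

```python
def strip_api_prefix(raw: str) -> str:
    """Remove a leading 'API:' label (any case) and surrounding whitespace."""
    value = raw.strip()
    if value.upper().startswith("API:"):
        value = value[len("API:"):].strip()
    return value


def looks_like_api14(value: str) -> bool:
    """True when the cleaned value is exactly 14 ASCII digits."""
    return len(value) == 14 and value.isascii() and value.isdigit()


# Hypothetical raw value, for illustration only.
cleaned = strip_api_prefix("API:34123456780000")
print(cleaned, looks_like_api14(cleaned))
```

Keeping the result as a string preserves leading zeros, which is why the column is `varchar` rather than a numeric type.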
---
### `census_tracts`
2021 TIGER/Line cartographic boundary file (1:500k). 85,230 tracts covering all 50 states, DC, and territories.
| Column | Type | Description |
|---|---|---|
| `gid` | integer | Primary key |
| `statefp` | varchar(2) | 2-digit state FIPS |
| `countyfp` | varchar(3) | 3-digit county FIPS |
| `tractce` | varchar(6) | 6-digit tract code |
| `affgeoid` | varchar(20) | American FactFinder GEOID (summary-level prefix + `US` + `geoid`) |
| `geoid` | varchar(11) | 11-digit FIPS (state + county + tract) — ACS join key |
| `name` | varchar(100) | Tract number |
| `namelsad` | varchar(100) | Full tract name (e.g., "Census Tract 1042.01") |
| `stusps` | varchar(2) | 2-letter state postal abbreviation |
| `namelsadco` | varchar(100) | County name |
| `state_name` | varchar(100) | Full state name |
| `lsad` | varchar(2) | Legal/statistical area description code |
| `aland` | numeric | Land area in square meters |
| `awater` | numeric | Water area in square meters |
| `geom` | geometry(MultiPolygon, 4326) | PostGIS polygon in WGS84 |
**Indexes:** `geoid`, `statefp`, `stusps`, `geom` (GIST)
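
The 11-digit `geoid` is the concatenation of `statefp` (2 digits), `countyfp` (3 digits), and `tractce` (6 digits). The sketch below splits a GEOID into those parts and derives the 5-digit county GEOID; the helper names and the sample value are illustrative, not part of the schema.

```python
from typing import NamedTuple


class TractGeoid(NamedTuple):
    statefp: str   # 2-digit state FIPS
    countyfp: str  # 3-digit county FIPS
    tractce: str   # 6-digit tract code


def split_tract_geoid(geoid: str) -> TractGeoid:
    """Split an 11-digit tract GEOID into state, county, and tract parts."""
    if len(geoid) != 11 or not (geoid.isascii() and geoid.isdigit()):
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return TractGeoid(geoid[:2], geoid[2:5], geoid[5:])


def county_geoid(geoid: str) -> str:
    """Return the 5-digit county GEOID (state + county) for a tract GEOID."""
    parts = split_tract_geoid(geoid)
    return parts.statefp + parts.countyfp


# Illustrative value only; treat GEOIDs as text so leading zeros survive.
print(split_tract_geoid("42003020100"))
print(county_geoid("42003020100"))
```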
---
### `state_transition_offices`
State-level just transition offices and programs. **One row per office/program.**
Coded by RA from [Climate Policy Dashboard](https://www.climatepolicydashboard.org/policies/climate-governance-equity/just-transition-offices-and-staff).
Populates the `v_state_governance` and `v_ch4_state_analysis` views.
| Column | Type | Description |
|---|---|---|
| `id` | serial | Primary key |
| `state` | char(2) | USPS state abbreviation |
| `state_name` | text | Full state name |
| `office_name` | text | Exact name of office/program; `NA` if none |
| `year_established` | integer | Year established; NULL = not stated |
| `target_text` | text | Verbatim description of who the office serves |
| `code_fossil` | smallint | 1 = explicitly mentions oil/gas/coal/mining/power plant workers; 0 = absent |
| `code_equity` | smallint | 1 = explicitly mentions equity/justice/disadvantaged/low-income/EJ; 0 = absent |
| `source_url` | text | URL to Climate Policy Dashboard entry |
| `date_collected` | date | Date RA retrieved data |
| `collected_by` | text | RA name |
| `notes` | text | Ambiguity, edge cases, interpretation notes |
**Coding rules:**
- `code_fossil = 1` ONLY with explicit language about fossil fuel workers, not merely "economic transition."
- `code_equity = 1` ONLY with explicit EJ/disadvantaged/low-income language, not merely "communities."
- States not on the dashboard: do not create a row unless instructed.
- States listed but with no program: one row with `office_name = NA`, codes = 0.
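
The rules above can be checked mechanically before rows are loaded. This is a sketch only, assuming each row arrives as a dictionary keyed by the column names in the table; `check_office_row` is a hypothetical helper, and it covers only the structural rules (binary codes, `NA` rows coded 0), not the judgment about what counts as explicit language.

```python
def check_office_row(row: dict) -> list[str]:
    """Return a list of rule violations for one state_transition_offices row."""
    problems = []
    # Both codes are binary flags.
    for col in ("code_fossil", "code_equity"):
        if row.get(col) not in (0, 1):
            problems.append(f"{col} must be 0 or 1")
    # A state listed without a program gets office_name = NA and codes = 0.
    if row.get("office_name") == "NA":
        if row.get("code_fossil") != 0 or row.get("code_equity") != 0:
            problems.append("office_name is NA, so both codes must be 0")
    return problems


# Hypothetical rows, for illustration only.
ok_row = {"state": "XX", "office_name": "NA", "code_fossil": 0, "code_equity": 0}
bad_row = {"state": "XX", "office_name": "NA", "code_fossil": 0, "code_equity": 1}
print(check_office_row(ok_row))
print(check_office_row(bad_row))
```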
---
### `state_prioritization`
State orphaned well plugging prioritization schemes. **One row per state.**
Coded by RA from IOGCC Prioritization Report (July 2023).
| Column | Type | Description |
|---|---|---|
| `id` | serial | Primary key |
| `state` | char(2) | USPS state abbreviation (unique) |
| `state_name` | text | Full state name |
| `system_type` | text | Brief description of scoring system (e.g., `1–100 score`); `NA` if not stated |
| `tech_factors` | text | Semicolon-separated list of technical/physical factors |
| `code_rural_urban` | smallint | 1 = explicitly uses population density or urban/rural classification as scoring factor |
| `code_vuln` | smallint | 1 = explicitly uses EJ/disadvantaged/low-income/DAC as prioritization factor |
| `code_surface` | smallint | 1 = explicitly uses surface land use as prioritization factor |
| `pdf_page` | text | Page(s) in IOGCC report; `NA` if uncertain |
| `source_quote` | text | 1–2 key sentences justifying coding decisions |
| `source_url` | text | IOGCC PDF URL |
| `date_collected` | date | Date RA retrieved data |
| `collected_by` | text | RA name |
| `notes` | text | Missing-state issues, interpretation risks |
**Critical coding distinctions:**
- `code_rural_urban`: "distance to nearest building" is NOT sufficient — requires explicit urban/rural or density language.
- `code_vuln`: "near homes/schools" is NOT sufficient — requires explicit EJ/disadvantaged/DAC language about community *status*.
- States not in IOGCC report: one row, `system_type = NA`, all codes = 0, explain in notes.
---
### `state_liability`
State-level financial liability estimates. Auto-populated from `wells` table; IIJA funding entered manually from DOI/OSMRE announcements.
| Column | Type | Description |
|---|---|---|
| `state` | char(2) | USPS abbreviation (unique) |
| `state_name` | text | Full state name |
| `well_count_dow` | integer | Wells in USGS DOW dataset |
| `iija_phase1_formula_usd` | bigint | IIJA/BIL Phase 1 formula grant (Nov 2022) |
| `iija_phase2_perf_usd` | bigint | Phase 2 performance grant (to be populated) |
| `iija_source` | text | Citation for funding figures |
| `est_liability_low_usd` | numeric | Generated: `well_count_dow × $9,000` (Raimi et al. low) |
| `est_liability_mid_usd` | numeric | Generated: `well_count_dow × $76,000` (Raimi et al. median) |
| `est_liability_high_usd` | numeric | Generated: `well_count_dow × $280,000` (Raimi et al. high) |
| `bonding_required` | boolean | Does state require bonds for active wells? |
| `bonding_adequacy` | text | `adequate` / `inadequate` / `unknown` / `NA` |
| `bonding_notes` | text | Bonding context |
| `cost_estimate_source` | text | Most applicable `plugging_cost_references` entry |
| `notes` | text | State-specific context |
**National totals (Raimi et al. mid-range):** $8.94 billion estimated liability; $310 million IIJA Phase 1 coverage = **3.47% funded.**
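
The national figures follow directly from the per-well costs in the column definitions. The sketch below reproduces the arithmetic using the 117,672-well total from the `wells` table and the rounded $310 million Phase 1 figure quoted above; the function name is illustrative, and the database may compute the percentage from unrounded state-level amounts.

```python
WELL_COUNT = 117_672                      # wells in the USGS DOW dataset
COST_PER_WELL_USD = {"low": 9_000, "mid": 76_000, "high": 280_000}  # Raimi et al.
IIJA_PHASE1_USD = 310_000_000             # rounded national total from the text


def estimate_liability(well_count: int) -> dict:
    """Multiply a well count by each per-well cost scenario."""
    return {name: well_count * cost for name, cost in COST_PER_WELL_USD.items()}


national = estimate_liability(WELL_COUNT)
funded_pct = 100 * IIJA_PHASE1_USD / national["mid"]

print(f"mid-range liability: ${national['mid'] / 1e9:.2f} billion")
print(f"IIJA Phase 1 coverage: {funded_pct:.2f}%")
```

The same function applies per state by passing `well_count_dow` instead of the national total.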
---
### `plugging_cost_references`
Five key published cost estimate sources for footnoting and methodology.
| Source | Year | Low/Well | Mid/Well | High/Well | Scope |
|---|---|---|---|---|---|
| EPA OLEM | 2018 | $5,000 | $25,000 | $85,000 | National |
| Raimi et al. (RFF) | 2021 | $9,000 | $76,000 | $280,000 | National |
| Carbon Tracker | 2020 | $20,000 | $82,000 | $300,000 | National |
| IOGCC | 2023 | $5,000 | $33,000 | $150,000 | State-reported |
| PA DEP | 2022 | $10,000 | $68,000 | $220,000 | Pennsylvania |
---
### `data_sources`
The 27 state agencies and 3 ancillary sources that contributed to the USGS DOW dataset.
| Column | Type | Description |
|---|---|---|
| `source_name` | text | Full agency name |
| `source_type` | text | `state_agency` / `federal` / `ngo` / `software` |
| `state` | text | State served |
| `description` | text | Data type provided |
---
### `dataset_metadata`
Full ScienceBase JSON and FGDC XML provenance for the primary USGS dataset. One row.
Key fields: `doi`, `citation`, `summary`, `purpose`, `publication_date`, `data_start_date`, `data_end_date`, `bounding_box`, `sciencebase_url`, `related_report_doi`, `mapping_app_url`, `file_csv_md5`, `file_zip_md5`.
---
### `dataset_contacts`, `dataset_tags`, `processing_steps`
Supporting metadata tables from ScienceBase and FGDC XML. See `data-sources.md` for details.
---
## Views
| View | Chapter | Description |
|---|---|---|
| `v_wells_by_state` | 4 | Well counts, type breakdown, avg coordinates, date range per state |
| `v_wells_by_type` | 4–5 | Raw type distribution (119 values) across states |
| `v_wells_by_status` | 4 | Status classification distribution |
| `v_state_type_summary` | 4 | State × normalized type cross-tabulation |
| `v_wells_by_tract` | 4–5 | Well counts, type breakdown, density (wells/km²) per census tract |
| `v_wells_by_county` | 4–5 | County-level rollup with 5-digit GEOID |
| `v_highest_density_tracts` | 4 | Tracts ranked by well density (≥1 km² only) |
| `v_data_completeness` | — | Non-null/non-empty counts per column |
| `v_state_governance` | 4 | Combines transition offices + prioritization → `framework_type` classification |
| `v_ch4_state_analysis` | 4 | Governance framework × spatial × financial per state |
| `v_ch5_liability_summary` | 5 | State-level liability estimates vs. IIJA funding |
| `v_ch5_national_totals` | 5 | National aggregate: total liability, IIJA coverage percentage |
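
The density columns in `v_wells_by_tract` and `v_highest_density_tracts` are expressed in wells/km², while tract areas are stored in square meters. The view definitions are not shown here, so the following is only a sketch of the unit conversion and the ≥1 km² cutoff; it assumes density is computed on land area (`tract_aland_m2`), and the function name is hypothetical.

```python
from typing import Optional


def wells_per_km2(well_count: int, aland_m2: int,
                  min_area_km2: float = 0.0) -> Optional[float]:
    """Return wells per square kilometer of land, or None if the tract is excluded.

    A tract is excluded when it has no land area or when its land area is
    below min_area_km2 (use 1.0 to mimic the >= 1 km² filter).
    """
    area_km2 = aland_m2 / 1_000_000  # square meters to square kilometers
    if area_km2 <= 0 or area_km2 < min_area_km2:
        return None
    return well_count / area_km2


# Illustrative numbers only.
print(wells_per_km2(50, 2_500_000))                    # 2.5 km² tract
print(wells_per_km2(5, 500_000, min_area_km2=1.0))     # below the cutoff
```

Filtering out very small tracts keeps a handful of wells in a tiny tract from dominating the density ranking.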
---
## Key Joins
```sql
-- Wells → ACS 2021 table B19013, median household income (via tract_geoid)
SELECT w.*, acs.*
FROM wells w
JOIN acs_2021_b19013 acs ON w.tract_geoid = acs.geoid;

-- Wells → state governance framework
SELECT w.api_number, w.state, sg.framework_type
FROM wells w
JOIN v_state_governance sg ON w.state_usps = sg.state;

-- County rollup with state-level liability.
-- est_liability_mid_usd is a per-state figure, so it repeats on every
-- county row of the same state; do not sum it across counties.
SELECT v.*, sl.est_liability_mid_usd
FROM v_wells_by_county v
JOIN state_liability sl ON v.state_usps = sl.state;
```