247 lines
12 KiB
Markdown
247 lines
12 KiB
Markdown
# Database Schema Reference
|
||
|
||
**Database:** `orphaned_wells`
|
||
**Engine:** PostgreSQL 18 with PostGIS
|
||
**Connection:** `psql -U postgres -h localhost -d orphaned_wells`
|
||
**Last updated:** March 2026
|
||
|
||
---
|
||
|
||
## Tables
|
||
|
||
### `wells`
|
||
|
||
Primary data table. 117,672 documented unplugged orphaned wells across 27 U.S. states.
|
||
Source: USGS DOW Dataset (Grove & Merrill 2022). Geometry loaded from shapefile, reprojected to EPSG:4326.
|
||
|
||
| Column | Type | Description |
|
||
|---|---|---|
|
||
| `gid` | integer | Primary key (auto) |
|
||
| `api_number` | varchar | 14-digit API well number (stripped of `API:` prefix) |
|
||
| `state` | varchar | Full state name (from USGS source data) |
|
||
| `county` | varchar | County name (from USGS source data) |
|
||
| `well_name` | varchar | Well name — typically operator + lease name |
|
||
| `well_number` | varchar | Order within lease/operator permit sequence |
|
||
| `type` | varchar | Raw well type as reported by state agency (119 distinct values) |
|
||
| `well_type_normalized` | text | Canonical type: Oil, Gas, Oil & Gas, Injection/Disposal, Dry/Exploratory, Water/Brine, Coalbed Methane, Enhanced Recovery, Gas Storage, Observation/Monitor, Other/Administrative, Unknown/Unspecified |
|
||
| `status` | varchar | Well status as reported by state agency |
|
||
| `latitude` | numeric | State-provided latitude (decimal degrees, NAD83/WGS84) |
|
||
| `longitude` | numeric | State-provided longitude (decimal degrees, NAD83/WGS84) |
|
||
| `principal_meridian` | varchar | PLSS principal meridian |
|
||
| `township` | numeric | PLSS township number |
|
||
| `t_dir` | varchar | Township direction (N/S) |
|
||
| `range` | numeric | PLSS range number |
|
||
| `r_dir` | varchar | Range direction (E/W) |
|
||
| `section` | numeric | PLSS section (1–36; some LA wells non-standard) |
|
||
| `qtr` | varchar | PLSS quarter section (¼ sq mi) |
|
||
| `qtr_qtr` | varchar | PLSS quarter-quarter section (1/16 sq mi) |
|
||
| `qtr_qtr_qtr` | varchar | PLSS quarter-quarter-quarter (1/64 sq mi) |
|
||
| `source` | varchar | State agency that provided the data |
|
||
| `data_file_date` | date | Date data was last updated by source agency |
|
||
| `well_info_notes` | varchar | Additional well information from source |
|
||
| `location_notes` | varchar | Location/coordinate methodology notes |
|
||
| `other_notes` | varchar | Other notes (often includes status date) |
|
||
| `geom` | geometry(Point, 4326) | PostGIS point in WGS84 |
|
||
| `state_fips` | char(2) | 2-digit Census FIPS state code |
|
||
| `tract_geoid` | char(11) | 11-digit Census tract FIPS (join key to ACS) |
|
||
| `tract_name` | text | Census tract label (e.g., "Census Tract 2048") |
|
||
| `county_fips` | char(3) | 3-digit Census county FIPS |
|
||
| `county_name` | text | County name from 2021 TIGER/Line |
|
||
| `state_usps` | char(2) | 2-letter USPS state abbreviation (from spatial join) |
|
||
| `tract_aland_m2` | bigint | Tract land area in square meters |
|
||
| `tract_awater_m2` | bigint | Tract water area in square meters |
|
||
|
||
**Indexes:** `api_number`, `state`, `state_fips`, `state_usps`, `county_fips`, `tract_geoid`, `well_type_normalized`, `geom` (GIST)
|
||
|
||
**Note on `well_type_normalized`:** 59.3% of wells have `Unknown/Unspecified` type because Ohio (20,557), Pennsylvania (19,160), Kentucky (12,695), Kansas (5,477), and several other states submitted data without type classification. See `data-sources.md` for details.
|
||
|
||
---
|
||
|
||
### `census_tracts`
|
||
|
||
2021 TIGER/Line cartographic boundary file (1:500k). 85,230 tracts covering all 50 states, DC, and territories.
|
||
|
||
| Column | Type | Description |
|
||
|---|---|---|
|
||
| `gid` | integer | Primary key |
|
||
| `statefp` | varchar(2) | 2-digit state FIPS |
|
||
| `countyfp` | varchar(3) | 3-digit county FIPS |
|
||
| `tractce` | varchar(6) | 6-digit tract code |
|
||
| `affgeoid` | varchar(20) | Affinity GEOID |
|
||
| `geoid` | varchar(11) | 11-digit FIPS (state + county + tract) — ACS join key |
|
||
| `name` | varchar(100) | Tract number |
|
||
| `namelsad` | varchar(100) | Full tract name (e.g., "Census Tract 1042.01") |
|
||
| `stusps` | varchar(2) | 2-letter state postal abbreviation |
|
||
| `namelsadco` | varchar(100) | County name |
|
||
| `state_name` | varchar(100) | Full state name |
|
||
| `lsad` | varchar(2) | Legal/statistical area description code |
|
||
| `aland` | numeric | Land area in square meters |
|
||
| `awater` | numeric | Water area in square meters |
|
||
| `geom` | geometry(MultiPolygon, 4326) | PostGIS polygon in WGS84 |
|
||
|
||
**Indexes:** `geoid`, `statefp`, `stusps`, `geom` (GIST)
|
||
|
||
---
|
||
|
||
### `state_transition_offices`
|
||
|
||
State-level just transition offices and programs. **One row per office/program.**
|
||
Coded by RA from [Climate Policy Dashboard](https://www.climatepolicydashboard.org/policies/climate-governance-equity/just-transition-offices-and-staff).
|
||
Populates the `v_state_governance` and `v_ch4_state_analysis` views.
|
||
|
||
| Column | Type | Description |
|
||
|---|---|---|
|
||
| `id` | serial | Primary key |
|
||
| `state` | char(2) | USPS state abbreviation |
|
||
| `state_name` | text | Full state name |
|
||
| `office_name` | text | Exact name of office/program; `NA` if none |
|
||
| `year_established` | integer | Year established; NULL = not stated |
|
||
| `target_text` | text | Verbatim description of who the office serves |
|
||
| `code_fossil` | smallint | 1 = explicitly mentions oil/gas/coal/mining/power plant workers; 0 = absent |
|
||
| `code_equity` | smallint | 1 = explicitly mentions equity/justice/disadvantaged/low-income/EJ; 0 = absent |
|
||
| `source_url` | text | URL to Climate Policy Dashboard entry |
|
||
| `date_collected` | date | Date RA retrieved data |
|
||
| `collected_by` | text | RA name |
|
||
| `notes` | text | Ambiguity, edge cases, interpretation notes |
|
||
|
||
**Coding rules:**
|
||
- `code_fossil = 1` ONLY with explicit language about fossil fuel workers, not merely "economic transition."
|
||
- `code_equity = 1` ONLY with explicit EJ/disadvantaged/low-income language, not merely "communities."
|
||
- States not on the dashboard: do not create a row unless instructed.
|
||
- States listed but with no program: one row with `office_name = NA`, codes = 0.
|
||
|
||
---
|
||
|
||
### `state_prioritization`
|
||
|
||
State orphaned well plugging prioritization schemes. **One row per state.**
|
||
Coded by RA from IOGCC Prioritization Report (July 2023).
|
||
|
||
| Column | Type | Description |
|
||
|---|---|---|
|
||
| `id` | serial | Primary key |
|
||
| `state` | char(2) | USPS state abbreviation (unique) |
|
||
| `state_name` | text | Full state name |
|
||
| `system_type` | text | Brief description of scoring system (e.g., `1–100 score`); `NA` if not stated |
|
||
| `tech_factors` | text | Semicolon-separated list of technical/physical factors |
|
||
| `code_rural_urban` | smallint | 1 = explicitly uses population density or urban/rural classification as scoring factor |
|
||
| `code_vuln` | smallint | 1 = explicitly uses EJ/disadvantaged/low-income/DAC as prioritization factor |
|
||
| `code_surface` | smallint | 1 = explicitly uses surface land use as prioritization factor |
|
||
| `pdf_page` | text | Page(s) in IOGCC report; `NA` if uncertain |
|
||
| `source_quote` | text | 1–2 key sentences justifying coding decisions |
|
||
| `source_url` | text | IOGCC PDF URL |
|
||
| `date_collected` | date | Date RA retrieved data |
|
||
| `collected_by` | text | RA name |
|
||
| `notes` | text | Missing-state issues, interpretation risks |
|
||
|
||
**Critical coding distinctions:**
|
||
- `code_rural_urban`: "distance to nearest building" is NOT sufficient — requires explicit urban/rural or density language.
|
||
- `code_vuln`: "near homes/schools" is NOT sufficient — requires explicit EJ/disadvantaged/DAC language about community *status*.
|
||
- States not in IOGCC report: one row, `system_type = NA`, all codes = 0, explain in notes.
|
||
|
||
---
|
||
|
||
### `state_liability`
|
||
|
||
State-level financial liability estimates. Auto-populated from `wells` table; IIJA funding entered manually from DOI/OSMRE announcements.
|
||
|
||
| Column | Type | Description |
|
||
|---|---|---|
|
||
| `state` | char(2) | USPS abbreviation (unique) |
|
||
| `state_name` | text | Full state name |
|
||
| `well_count_dow` | integer | Wells in USGS DOW dataset |
|
||
| `iija_phase1_formula_usd` | bigint | IIJA/BIL Phase 1 formula grant (Nov 2022) |
|
||
| `iija_phase2_perf_usd` | bigint | Phase 2 performance grant (to be populated) |
|
||
| `iija_source` | text | Citation for funding figures |
|
||
| `est_liability_low_usd` | numeric | Generated: `well_count × $9,000` (Raimi et al. low) |
|
||
| `est_liability_mid_usd` | numeric | Generated: `well_count × $76,000` (Raimi et al. median) |
|
||
| `est_liability_high_usd` | numeric | Generated: `well_count × $280,000` (Raimi et al. high) |
|
||
| `bonding_required` | boolean | Does state require bonds for active wells? |
|
||
| `bonding_adequacy` | text | `adequate` / `inadequate` / `unknown` / `NA` |
|
||
| `bonding_notes` | text | Bonding context |
|
||
| `cost_estimate_source` | text | Most applicable `plugging_cost_references` entry |
|
||
| `notes` | text | State-specific context |
|
||
|
||
**National totals (Raimi et al. mid-range):** $8.94 billion estimated liability; $310 million IIJA Phase 1 coverage = **3.47% funded.**
|
||
|
||
---
|
||
|
||
### `plugging_cost_references`
|
||
|
||
Five key published cost estimate sources for footnoting and methodology.
|
||
|
||
| Source | Year | Low/Well | Mid/Well | High/Well | Scope |
|
||
|---|---|---|---|---|---|
|
||
| EPA OLEM | 2018 | $5,000 | $25,000 | $85,000 | National |
|
||
| Raimi et al. (RFF) | 2021 | $9,000 | $76,000 | $280,000 | National |
|
||
| Carbon Tracker | 2020 | $20,000 | $82,000 | $300,000 | National |
|
||
| IOGCC | 2023 | $5,000 | $33,000 | $150,000 | State-reported |
|
||
| PA DEP | 2022 | $10,000 | $68,000 | $220,000 | Pennsylvania |
|
||
|
||
---
|
||
|
||
### `data_sources`
|
||
|
||
The 27 state agencies and 3 ancillary sources that contributed to the USGS DOW dataset.
|
||
|
||
| Column | Type | Description |
|
||
|---|---|---|
|
||
| `source_name` | text | Full agency name |
|
||
| `source_type` | text | `state_agency` / `federal` / `ngo` / `software` |
|
||
| `state` | text | State served |
|
||
| `description` | text | Data type provided |
|
||
|
||
---
|
||
|
||
### `dataset_metadata`
|
||
|
||
Full ScienceBase JSON and FGDC XML provenance for the primary USGS dataset. One row.
|
||
|
||
Key fields: `doi`, `citation`, `summary`, `purpose`, `publication_date`, `data_start_date`, `data_end_date`, `bounding_box`, `sciencebase_url`, `related_report_doi`, `mapping_app_url`, `file_csv_md5`, `file_zip_md5`.
|
||
|
||
---
|
||
|
||
### `dataset_contacts`, `dataset_tags`, `processing_steps`
|
||
|
||
Supporting metadata tables from ScienceBase and FGDC XML. See `data-sources.md` for details.
|
||
|
||
---
|
||
|
||
## Views
|
||
|
||
| View | Chapter | Description |
|
||
|---|---|---|
|
||
| `v_wells_by_state` | 4 | Well counts, type breakdown, avg coordinates, date range per state |
|
||
| `v_wells_by_type` | 4–5 | Raw type distribution (119 values) across states |
|
||
| `v_wells_by_status` | 4 | Status classification distribution |
|
||
| `v_state_type_summary` | 4 | State × normalized type cross-tabulation |
|
||
| `v_wells_by_tract` | 4–5 | Well counts, type breakdown, density (wells/km²) per census tract |
|
||
| `v_wells_by_county` | 4–5 | County-level rollup with 5-digit GEOID |
|
||
| `v_highest_density_tracts` | 4 | Tracts ranked by well density (≥1 km² only) |
|
||
| `v_data_completeness` | — | Non-null/non-empty counts per column |
|
||
| `v_state_governance` | 4 | Combines transition offices + prioritization → `framework_type` classification |
|
||
| `v_ch4_state_analysis` | 4 | Governance framework × spatial × financial per state |
|
||
| `v_ch5_liability_summary` | 5 | State-level liability estimates vs. IIJA funding |
|
||
| `v_ch5_national_totals` | 5 | National aggregate: total liability, IIJA coverage percentage |
|
||
|
||
---
|
||
|
||
## Key Joins
|
||
|
||
```sql
|
||
-- Wells → ACS (via tract_geoid)
|
||
SELECT w.*, acs.*
|
||
FROM wells w
|
||
JOIN acs_2021_b19013 acs ON w.tract_geoid = acs.geoid;
|
||
|
||
-- Wells → state governance framework
|
||
SELECT w.api_number, w.state, sg.framework_type
|
||
FROM wells w
|
||
JOIN v_state_governance sg ON w.state_usps = sg.state;
|
||
|
||
-- County liability summary
|
||
SELECT v.*, sl.est_liability_mid_usd
|
||
FROM v_wells_by_county v
|
||
JOIN state_liability sl ON v.state_usps = sl.state;
|
||
```
|