Texas District Analysis: Regulatory Transparency and Enforcement in the Oil & Gas Industry

A research project examining how transparency disclosure reforms affect enforcement behavior in the Texas Railroad Commission (RRC), with a focus on district-level heterogeneity across 13 RRC regulatory districts from 2015–2025.

Research Overview

Core question: Does making well-level violation data publicly searchable change how quickly the RRC acts on violations?

The January 2019 RRC policy change, which made well violation data publicly searchable, serves as the exogenous policy shock. The analysis tests whether and how this disclosure reform altered enforcement timing and compliance outcomes across districts, with particular attention to offshore-regulating districts (02, 03, 04) and to structural moderators such as basin composition, enforcement capacity, and environmental justice dimensions.

Key findings:

  • No immediate post-2019 level shift in enforcement timing (coef=0.1514, p=0.33)
  • Significant post-2019 trend acceleration: enforcement speed improves gradually over time (coef=0.3603, p=0.001)
  • Offshore-regulating districts show differential post-policy response (coef=0.3819, p<0.001), strongest in 2023–2024
  • Basin composition is the clearest structural correlate of district-level heterogeneity
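
The first two findings map onto the two post-policy terms of a standard interrupted time-series (segmented regression) specification. As a sketch only, assuming an annual district panel and a policy date of $t_0 = 2019$ (all notation here is assumed; the exact specification, including any controls and fixed effects, is in the technical appendix and may differ):

```latex
Y_{dt} = \beta_0 + \beta_1\, t + \beta_2\, \mathrm{Post}_t
       + \beta_3\, (t - t_0)\,\mathrm{Post}_t + \varepsilon_{dt},
\qquad \mathrm{Post}_t = \mathbf{1}[\, t \ge t_0 \,]
```

Here $Y_{dt}$ stands for the enforcement-timing outcome for district $d$ in year $t$, $\beta_2$ captures an immediate level shift, and $\beta_3$ captures the change in trend after the policy. Under this reading, the offshore result would correspond to an additional interaction between the post-policy terms and an indicator for districts 02, 03, and 04.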

Data

All raw data originates from the Texas Railroad Commission and supplementary government sources:

| Source | Description | Size |
|---|---|---|
| Texas RRC | ~3.6M inspection records (pipe-delimited) | 424 MB |
| Texas RRC | ~368K violation records (pipe-delimited) | 66 MB |
| U.S. Census | Poverty rates and demographics by census tract | |
| USDA RUCA (2020) | Rural-Urban Commuting Area classifications | 25 MB |
| U.S. EIA | Shale basin and play shapefiles | |
| Texas county shapefiles | County boundaries for spatial visualization | |

Data covers approximately 1.01 million wells, 1.87 million inspections, and 191K violations within the 2015–2025 study window.
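
Because the RRC files are pipe-delimited, a standard CSV parser can read them once the delimiter is set. The following is a minimal sketch on an in-memory sample; the column names are hypothetical, since the actual RRC headers are not listed in this README.

```python
import csv
import io

# Hypothetical sample; the real INSPECTIONS.txt columns may differ.
SAMPLE = (
    "INSPECTION_ID|API_NO|DISTRICT|INSPECTION_DATE|COMPLIANT\n"
    "1001|4212345678|03|2019-02-14|Y\n"
    "1002|4298765432|8A|2021-07-01|N\n"
)


def read_pipe_delimited(handle):
    """Return a list of dict rows from a pipe-delimited text stream."""
    reader = csv.DictReader(handle, delimiter="|")
    return [row for row in reader]


rows = read_pipe_delimited(io.StringIO(SAMPLE))
print(len(rows), rows[0]["DISTRICT"])
```

For the full files, the same function could take `open("data/INSPECTIONS.txt", newline="")`, although at several hundred MB it is probably better to iterate over the reader than to build a list. With pandas, `pandas.read_csv(path, sep="|")` is the equivalent call.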

Note: Raw data files are large (up to several hundred MB each) and are excluded from version control via .gitignore. The data pipeline is fully documented in the rebuild/ notebooks.

Repository Structure

texas-district-analysis/
├── analysis/
│   ├── well_analyzer.py                          # Core analysis engine (PostgreSQL → metrics)
│   ├── updated_district_level_analysis_2015-2025_offshore_controls.ipynb  # Main analysis notebook
│   ├── draft.md                                  # Manuscript draft
│   ├── draft_appendix.md                         # Technical appendix with model specifications
│   ├── *.png                                     # Figures (event study, district maps, etc.)
│   └── archive/                                  # Earlier notebook versions and alternate specs
├── data/
│   ├── INSPECTIONS.txt                           # Raw inspection records (pipe-delimited)
│   ├── VIOLATIONS.txt                            # Raw violation records (pipe-delimited)
│   ├── RUCA-codes-2020-tract.csv                 # RUCA classification by census tract
│   ├── district_by_county.csv                    # District–county crosswalk
│   └── {oil_gas_basin,shale_play,texas_county,texmex}_shape/  # ESRI shapefiles
├── rebuild/
│   ├── rrc_api_data.ipynb                        # Step 1: Fetch and process RRC API data
│   ├── create_violations_inspections.ipynb       # Step 2: Build cleaned inspection/violation files
│   ├── add_census_data.ipynb                     # Step 3: Link census demographics
│   ├── add_shape_layers.ipynb                    # Step 4: Spatial feature engineering
│   ├── well_shape.ipynb                          # Step 5: Well geometry and shapefile creation
│   └── well-api-manual.pdf                       # RRC API technical documentation
├── papers/                                       # Manuscript versions (DOCX + PDF)
├── analysis_output.json                          # Pre-computed summary statistics
└── requirements.txt                              # Python dependencies

Analysis Pipeline

Raw RRC Data (API)
    ↓  rebuild/rrc_api_data.ipynb
Cleaned Inspections & Violations CSVs
    ↓  rebuild/create_violations_inspections.ipynb
Link Census Demographics
    ↓  rebuild/add_census_data.ipynb
Add Geographic Layers
    ↓  rebuild/add_shape_layers.ipynb
PostgreSQL Data Warehouse
    ↓  analysis/well_analyzer.py
District-Year Panel
    ↓  analysis/updated_district_level_analysis_2015-2025_offshore_controls.ipynb
Econometric Models → Figures → Manuscript
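
The step from record-level data to the district-year panel is an aggregation. The sketch below illustrates the idea with the standard library only. The record layout and the choice of median days to enforcement as the panel metric are assumptions for illustration, not the logic in well_analyzer.py.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical violation records: (district, discovery date, enforcement date).
violations = [
    ("03", date(2018, 3, 1), date(2018, 3, 15)),
    ("03", date(2018, 6, 1), date(2018, 12, 1)),
    ("03", date(2019, 1, 10), date(2019, 1, 20)),
    ("8A", date(2019, 5, 5), date(2019, 5, 12)),
]


def build_panel(records):
    """Aggregate records into {(district, year): median days to enforcement}."""
    groups = defaultdict(list)
    for district, discovered, enforced in records:
        # Assign each violation to the year in which it was discovered.
        groups[(district, discovered.year)].append((enforced - discovered).days)
    return {key: median(days) for key, days in sorted(groups.items())}


panel = build_panel(violations)
for (district, year), days in panel.items():
    print(district, year, days)
```

The median is used here because the enforcement-delay distribution is strongly right-skewed (mean 127 days against a median of 14 in the statistics reported in this README), so a mean would be dominated by a small number of long delays.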

Econometric Models

| Model | Description |
|---|---|
| 1 | Interrupted time-series (all districts pooled) |
| 2 | District-specific post-policy fixed effects |
| 3 | Offshore jurisdiction moderator (districts 02/03/04) |
| 4 | Spatial autocorrelation diagnostics (Moran's I, 5,000 permutations) |
| 5 | Structural moderators: capacity, baseline compliance, EJ, geology, rurality, border proximity |
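
To make Model 4 concrete, the sketch below computes global Moran's I and a one-sided permutation pseudo p-value in plain Python. It is a dependency-free illustration rather than the project's code: the dependency list includes esda and libpysal, which provide this statistic, and the toy weights matrix and district values here are invented.

```python
import random


def morans_i(values, weights):
    """Global Moran's I for a list of values and a dense weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


def permutation_p_value(values, weights, permutations=5000, seed=0):
    """Return (observed I, share of shuffles with I >= observed)."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign values to locations at random
        if morans_i(shuffled, weights) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (extreme + 1) / (permutations + 1)


# Toy example: four hypothetical districts in a row; neighbours share a border.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
effects = [1.0, 2.0, 3.0, 4.0]
observed, p_value = permutation_p_value(effects, W, permutations=999)
print(round(observed, 4), round(p_value, 4))
```

The p-value here is an upper-tail test for positive spatial autocorrelation. With only 13 districts, a permutation approach is a natural choice because the normal approximation for Moran's I is unreliable at that sample size.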

Setup

Prerequisites

  • Python 3.9+
  • PostgreSQL (with PostGIS for spatial queries)
  • The well_analyzer.py module reads database credentials from environment variables

Install dependencies

pip install -r requirements.txt

Database configuration

Set the following environment variables before running analysis:

export PGHOST=localhost
export PGPORT=5432
export PGUSER=your_user
export PGPASSWORD=your_password
export PGDATABASE=texas_data
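
One way these variables could be turned into a connection string is sketched below. The helper name `database_url` is hypothetical, and how well_analyzer.py actually reads the credentials is not shown in this README. The password is URL-escaped so that special characters do not break the URL.

```python
import os
from urllib.parse import quote_plus


def database_url(env=os.environ):
    """Build a SQLAlchemy-style PostgreSQL URL from the PG* variables."""
    user = quote_plus(env["PGUSER"])
    password = quote_plus(env["PGPASSWORD"])
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    name = env["PGDATABASE"]
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{name}"


# Example values only; a real run would call database_url() with no arguments.
example = {
    "PGHOST": "localhost",
    "PGPORT": "5432",
    "PGUSER": "your_user",
    "PGPASSWORD": "p@ss word",
    "PGDATABASE": "texas_data",
}
url = database_url(example)
print(url)
```

The resulting string can be passed to `sqlalchemy.create_engine(url)`. Missing required variables raise `KeyError`, which surfaces a configuration problem before any query runs.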

Run the data pipeline

Execute the notebooks in rebuild/ in order (steps 1–5) to populate the PostgreSQL database, then open analysis/updated_district_level_analysis_2015-2025_offshore_controls.ipynb for the main analysis.

Key Statistics (2015–2025)

  • 1,878,764 inspections across 420,185 unique wells
  • Overall compliance rate: 89.9% (rising from 88.4% in 2015 to 92.9% in 2024)
  • 193,338 violations across 81,670 unique wells
  • Mean days from violation discovery to enforcement action: 127 (median: 14)
  • Compliance on re-inspection: 57.2%
  • District compliance range: 81.2% (District 09) to 94.4% (District 8A)

Dependencies

pandas
numpy
sqlalchemy
psycopg2
scipy
statsmodels
matplotlib
seaborn
geopandas
shapely
libpysal
esda