2026-03-09 08:25:38 -07:00
# Texas District Analysis: Regulatory Transparency and Enforcement in the Oil & Gas Industry
A research project examining how transparency disclosure reforms affect enforcement behavior in the Texas Railroad Commission (RRC), with a focus on district-level heterogeneity across 13 RRC regulatory districts from 2015–2025.
## Research Overview
**Core question**: Does making well-level violation data publicly searchable change how quickly the RRC acts on violations?
The January 2019 RRC policy change, which made well-level violation data publicly searchable, serves as the exogenous policy shock. The analysis tests whether and how this disclosure reform altered enforcement timing and compliance outcomes across districts, with particular attention to the offshore-regulating districts (02, 03, 04) and to structural moderators such as basin composition, enforcement capacity, and environmental justice dimensions.
**Key findings:**
- No immediate post-2019 level shift in enforcement timing (coef=0.1514, p=0.33)
- Significant post-2019 trend acceleration: enforcement speed improves gradually over time (coef=0.3603, p=0.001)
- Offshore-regulating districts show differential post-policy response (coef=0.3819, p<0.001), strongest in 2023–2024
- Basin composition is the clearest structural correlate of district-level heterogeneity
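The level-shift and trend estimates above correspond to the two post-policy terms of an interrupted time-series design. The snippet below is a minimal sketch of how those regressors are commonly constructed for a yearly panel, assuming a standard segmented-regression parameterization. The names `ItsRow` and `its_terms` are hypothetical, and the project's exact specification (outcome variable, time unit, controls) is documented in `analysis/draft_appendix.md` and may differ.

```python
from dataclasses import dataclass

POLICY_YEAR = 2019  # January 2019 disclosure reform
START_YEAR = 2015   # first year of the study window


@dataclass
class ItsRow:
    year: int
    time: int        # years since the start of the window (baseline trend)
    post: int        # 1 from the policy year onward (level shift)
    time_since: int  # years elapsed since the policy (post-policy trend change)


def its_terms(year: int) -> ItsRow:
    """Build the three segmented-regression terms for one calendar year."""
    post = int(year >= POLICY_YEAR)
    return ItsRow(year, year - START_YEAR, post, post * (year - POLICY_YEAR))


# One row per year of the 2015-2025 window.
design = [its_terms(y) for y in range(2015, 2026)]
for row in design:
    print(row)
```

In a regression of the outcome on these three columns, the coefficient on `post` captures an immediate level shift and the coefficient on `time_since` captures a change in slope after the policy.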
## Data
All raw data originates from the Texas Railroad Commission and supplementary government sources:
| Source | Description | Size |
|--------|-------------|------|
| Texas RRC | ~3.6M inspection records (pipe-delimited) | 424 MB |
| Texas RRC | ~368K violation records (pipe-delimited) | 66 MB |
| U.S. Census | Poverty rates and demographics by census tract | — |
| USDA RUCA (2020) | Rural-Urban Commuting Area classifications | 25 MB |
| U.S. EIA | Shale basin and play shapefiles | — |
| Texas county shapefiles | County boundaries for spatial visualization | — |
Data covers approximately 1.01 million wells, 1.87 million inspections, and 191K violations within the 2015–2025 study window.
**Note**: Raw data files are large (several hundred MB each) and are excluded from version control via `.gitignore`. The data pipeline is fully documented in the `rebuild/` notebooks.
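As a minimal sketch of reading the pipe-delimited raw files, the snippet below streams records with Python's standard `csv` module. It assumes the first line of each file is a header row, which is not confirmed here. The function name `read_pipe_delimited` and the sample column names (`INSPECTION_ID`, `DISTRICT`, `COMPLIANT`) are hypothetical placeholders, not the actual RRC schema.

```python
import csv
import io
from typing import Iterator, TextIO


def read_pipe_delimited(handle: TextIO) -> Iterator[dict]:
    """Yield one dict per record from a pipe-delimited text stream.

    Assumes the first line is a header row (an assumption, not taken
    from the RRC file documentation).
    """
    reader = csv.DictReader(handle, delimiter="|")
    for row in reader:
        # Drop overflow fields (key None) and strip stray whitespace.
        yield {
            key: (value or "").strip()
            for key, value in row.items()
            if key is not None
        }


# Hypothetical sample; real column names will differ.
sample = io.StringIO(
    "INSPECTION_ID|DISTRICT|COMPLIANT\n"
    "1001|08|Y\n"
    "1002|8A|N\n"
)
rows = list(read_pipe_delimited(sample))
print(rows[1]["DISTRICT"])
```

With the real files, pass an open handle such as `open("data/INSPECTIONS.txt", newline="")` and iterate over the generator instead of calling `list()`, since the files run to several hundred MB.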
## Repository Structure
```
texas-district-analysis/
├── analysis/
│ ├── well_analyzer.py # Core analysis engine (PostgreSQL → metrics)
│ ├── updated_district_level_analysis_2015-2025_offshore_controls.ipynb # Main analysis notebook
│ ├── draft.md # Manuscript draft
│ ├── draft_appendix.md # Technical appendix with model specifications
│ ├── *.png # Figures (event study, district maps, etc.)
│ └── archive/ # Earlier notebook versions and alternate specs
├── data/
│ ├── INSPECTIONS.txt # Raw inspection records (pipe-delimited)
│ ├── VIOLATIONS.txt # Raw violation records (pipe-delimited)
│ ├── RUCA-codes-2020-tract.csv # RUCA classification by census tract
│ ├── district_by_county.csv # District–county crosswalk
│ └── {oil_gas_basin,shale_play,texas_county,texmex}_shape/ # ESRI shapefiles
├── rebuild/
│ ├── rrc_api_data.ipynb # Step 1: Fetch and process RRC API data
│ ├── create_violations_inspections.ipynb # Step 2: Build cleaned inspection/violation files
│ ├── add_census_data.ipynb # Step 3: Link census demographics
│ ├── add_shape_layers.ipynb # Step 4: Spatial feature engineering
│ ├── well_shape.ipynb # Step 5: Well geometry and shapefile creation
│ └── well-api-manual.pdf # RRC API technical documentation
├── papers/ # Manuscript versions (DOCX + PDF)
├── analysis_output.json # Pre-computed summary statistics
└── requirements.txt # Python dependencies
```
## Analysis Pipeline
```
Raw RRC Data (API)
↓ rebuild/rrc_api_data.ipynb
Cleaned Inspections & Violations CSVs
↓ rebuild/create_violations_inspections.ipynb
Link Census Demographics
↓ rebuild/add_census_data.ipynb
Add Geographic Layers
↓ rebuild/add_shape_layers.ipynb
PostgreSQL Data Warehouse
↓ analysis/well_analyzer.py
District-Year Panel
↓ analysis/updated_district_level_analysis_2015-2025_offshore_controls.ipynb
Econometric Models → Figures → Manuscript
```
### Econometric Models
| Model | Description |
|-------|-------------|
| 1 | Interrupted time-series (all districts pooled) |
| 2 | District-specific post-policy fixed effects |
| 3 | Offshore jurisdiction moderator (districts 02/03/04) |
| 4 | Spatial autocorrelation diagnostics (Moran's I, 5,000 permutations) |
| 5 | Structural moderators: capacity, baseline compliance, EJ, geology, rurality, border proximity |
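To make the permutation test in Model 4 concrete, here is a pure-Python sketch of global Moran's I with a permutation-based pseudo p-value. It is an illustration only: the dependency list includes `esda` and `libpysal`, which provide this statistic, and the project's weights matrix and test direction are not stated here. The function names, the one-sided test, and the toy data are all assumptions.

```python
import random


def morans_i(values, weights):
    """Global Moran's I for a list of values and a dense weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_p_value(values, weights, permutations=5000, seed=0):
    """One-sided pseudo p-value: share of shuffles with I >= observed."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign values to locations at random
        if morans_i(shuffled, weights) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (permutations + 1)


# Toy example: four units on a line, adjacent units are neighbors.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
observed, p = permutation_p_value([1.0, 2.0, 3.0, 4.0], w, permutations=999)
print(round(observed, 3), p)
```

Shuffling values across locations keeps the spatial structure fixed while breaking any link between location and value, so the share of shuffles that match or exceed the observed statistic approximates how unusual the observed clustering is.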
## Setup
### Prerequisites
- Python 3.9+
- PostgreSQL (with PostGIS for spatial queries)
- The `well_analyzer.py` module reads database credentials from environment variables
### Install dependencies
```bash
pip install -r requirements.txt
```
### Database configuration
Set the following environment variables before running analysis:
```bash
export PGHOST=localhost
export PGPORT=5432
export PGUSER=your_user
export PGPASSWORD=your_password
export PGDATABASE=texas_data
```
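As a sketch of how these variables could be turned into a connection string, the snippet below builds a SQLAlchemy-style PostgreSQL URL using only the standard library. The helper name `postgres_url_from_env` is hypothetical, and whether `well_analyzer.py` assembles its connection this way is an assumption. The username and password are percent-encoded so that characters such as `@` or `/` do not break URL parsing.

```python
import os
from urllib.parse import quote


def postgres_url_from_env(env=os.environ) -> str:
    """Build a SQLAlchemy-style PostgreSQL URL from the PG* variables."""
    user = quote(env["PGUSER"], safe="")
    password = quote(env["PGPASSWORD"], safe="")
    host = env.get("PGHOST", "localhost")  # fall back to the usual defaults
    port = env.get("PGPORT", "5432")
    database = env["PGDATABASE"]
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"


# Hypothetical values for illustration; real code would use os.environ.
example_env = {
    "PGUSER": "your_user",
    "PGPASSWORD": "p@ss/word",
    "PGDATABASE": "texas_data",
}
print(postgres_url_from_env(example_env))
```

The resulting string can be passed to `sqlalchemy.create_engine`. A missing `PGUSER`, `PGPASSWORD`, or `PGDATABASE` raises `KeyError`, which surfaces configuration mistakes early.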
### Run the data pipeline
Execute the notebooks in `rebuild/` in order (steps 1–5) to populate the PostgreSQL database, then open `analysis/updated_district_level_analysis_2015-2025_offshore_controls.ipynb` for the main analysis.
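If you prefer to run the rebuild steps non-interactively, the sketch below assembles one `jupyter nbconvert` command per notebook in step order. It assumes Jupyter and nbconvert are installed (they are not in the dependency list) and that the notebooks need no manual input. The helper name `nbconvert_commands` is hypothetical.

```python
from pathlib import Path

# Order taken from the "Step N" comments in the repository tree.
REBUILD_NOTEBOOKS = [
    "rrc_api_data.ipynb",
    "create_violations_inspections.ipynb",
    "add_census_data.ipynb",
    "add_shape_layers.ipynb",
    "well_shape.ipynb",
]


def nbconvert_commands(rebuild_dir="rebuild"):
    """Return one `jupyter nbconvert` command per notebook, in step order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(Path(rebuild_dir) / name)]
        for name in REBUILD_NOTEBOOKS
    ]


# Print the commands rather than running them.
for command in nbconvert_commands():
    print(" ".join(command))
```

To execute the steps, pass each command to `subprocess.run(command, check=True)` so that a failing notebook stops the sequence before later steps run on incomplete data.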
## Key Statistics (2015–2025)
- **1,878,764** inspections across **420,185** unique wells
- Overall compliance rate: **89.9%** (rising from 88.4% in 2015 to 92.9% in 2024)
- **193,338** violations across **81,670** unique wells
- Mean days from violation discovery to enforcement action: **127** (median: 14)
- Compliance on re-inspection: **57.2%**
- District compliance range: 81.2% (District 09) to 94.4% (District 8A)
## Dependencies
```
pandas
numpy
sqlalchemy
psycopg2
scipy
statsmodels
matplotlib
seaborn
geopandas
shapely
libpysal
esda
```