diff --git a/README.md b/README.md
new file mode 100644
index 0000000..c4bb190
--- /dev/null
+++ b/README.md
@@ -0,0 +1,145 @@
+# Texas District Analysis: Regulatory Transparency and Enforcement in the Oil & Gas Industry
+
+A research project examining how transparency disclosure reforms affect enforcement behavior at the Texas Railroad Commission (RRC), with a focus on district-level heterogeneity across 13 RRC regulatory districts from 2015 to 2025.
+
+## Research Overview
+
+**Core question**: Does making well-level violation data publicly searchable change how quickly the RRC acts on violations?
+
+The January 2019 RRC policy change (making well violation data publicly searchable) serves as the exogenous policy shock. The analysis tests whether and how this disclosure reform altered enforcement timing and compliance outcomes across districts, with particular attention to offshore-regulating districts (02, 03, 04) and structural moderators such as basin composition, enforcement capacity, and environmental justice dimensions.
+
+**Key findings:**
+- No immediate post-2019 level shift in enforcement timing (coef=0.1514, p=0.33)
+- Significant post-2019 trend acceleration: enforcement speed improves gradually over time (coef=−0.3603, p=0.001)
+- Offshore-regulating districts show differential post-policy response (coef=0.3819, p<0.001), strongest in 2023–2024
+- Basin composition is the clearest structural correlate of district-level heterogeneity
+
+## Data
+
+All raw data originates from the Texas Railroad Commission and supplementary government sources:
+
+| Source | Description | Size |
+|--------|-------------|------|
+| Texas RRC | ~3.6M inspection records (pipe-delimited) | 424 MB |
+| Texas RRC | ~368K violation records (pipe-delimited) | 66 MB |
+| U.S. Census | Poverty rates and demographics by census tract | n/a |
+| USDA RUCA (2020) | Rural-Urban Commuting Area classifications | 25 MB |
+| U.S. EIA | Shale basin and play shapefiles | n/a |
+| Texas county shapefiles | County boundaries for spatial visualization | n/a |
+
+Data covers approximately 1.01 million wells, 1.87 million inspections, and 191K violations within the 2015–2025 study window.
+
+**Note**: Raw data files are large (up to several hundred MB each) and are excluded from version control via `.gitignore`. The data pipeline is fully documented in the `rebuild/` notebooks.
+
+## Repository Structure
+
+```
+texas-district-analysis/
+├── analysis/
+│   ├── well_analyzer.py                  # Core analysis engine (PostgreSQL → metrics)
+│   ├── updated_district_level_analysis_2015-2025_offshore_controls.ipynb  # Main analysis notebook
+│   ├── draft.md                          # Manuscript draft
+│   ├── draft_appendix.md                 # Technical appendix with model specifications
+│   ├── *.png                             # Figures (event study, district maps, etc.)
+│   └── archive/                          # Earlier notebook versions and alternate specs
+├── data/
+│   ├── INSPECTIONS.txt                   # Raw inspection records (pipe-delimited)
+│   ├── VIOLATIONS.txt                    # Raw violation records (pipe-delimited)
+│   ├── RUCA-codes-2020-tract.csv         # RUCA classification by census tract
+│   ├── district_by_county.csv            # District–county crosswalk
+│   └── {oil_gas_basin,shale_play,texas_county,texmex}_shape/  # ESRI shapefiles
+├── rebuild/
+│   ├── rrc_api_data.ipynb                # Step 1: Fetch and process RRC API data
+│   ├── create_violations_inspections.ipynb  # Step 2: Build cleaned inspection/violation files
+│   ├── add_census_data.ipynb             # Step 3: Link census demographics
+│   ├── add_shape_layers.ipynb            # Step 4: Spatial feature engineering
+│   ├── well_shape.ipynb                  # Step 5: Well geometry and shapefile creation
+│   └── well-api-manual.pdf               # RRC API technical documentation
+├── papers/                               # Manuscript versions (DOCX + PDF)
+├── analysis_output.json                  # Pre-computed summary statistics
+└── requirements.txt                      # Python dependencies
+```
+
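The raw inspection and violation files listed under Data are pipe-delimited and large, so streaming them row by row is one reasonable way to inspect them before loading anything into PostgreSQL. The sketch below is illustrative only: it uses the standard-library `csv` module, and the column names (`DISTRICT`, `API_NO`, `VIOLATION_DATE`) are hypothetical placeholders, since the README does not document the actual file schema.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(handle: TextIO, column: str) -> Counter:
    """Stream a pipe-delimited file and count rows per value of `column`."""
    reader = csv.DictReader(handle, delimiter="|")
    counts: Counter = Counter()
    for row in reader:
        # Missing or empty values are grouped under the empty string.
        counts[(row.get(column) or "").strip()] += 1
    return counts


# Tiny in-memory stand-in for data/VIOLATIONS.txt.
# The header names below are assumptions, not the real RRC schema.
sample = io.StringIO(
    "DISTRICT|API_NO|VIOLATION_DATE\n"
    "02|0000001|2019-03-01\n"
    "8A|0000002|2020-07-15\n"
    "02|0000003|2021-11-30\n"
)
violations_per_district = count_by_column(sample, "DISTRICT")
print(violations_per_district)
```

For a real file, the same function could be called on `open("data/VIOLATIONS.txt", newline="")`. With pandas, `read_csv(..., sep="|", chunksize=...)` offers a comparable streaming approach.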
+## Analysis Pipeline
+
+```
+Raw RRC Data (API)
+    ↓ rebuild/rrc_api_data.ipynb
+Cleaned Inspections & Violations CSVs
+    ↓ rebuild/create_violations_inspections.ipynb
+Link Census Demographics
+    ↓ rebuild/add_census_data.ipynb
+Add Geographic Layers
+    ↓ rebuild/add_shape_layers.ipynb
+PostgreSQL Data Warehouse
+    ↓ analysis/well_analyzer.py
+District-Year Panel
+    ↓ analysis/updated_district_level_analysis_2015-2025_offshore_controls.ipynb
+Econometric Models → Figures → Manuscript
+```
+
+### Econometric Models
+
+| Model | Description |
+|-------|-------------|
+| 1 | Interrupted time-series (all districts pooled) |
+| 2 | District-specific post-policy fixed effects |
+| 3 | Offshore jurisdiction moderator (districts 02/03/04) |
+| 4 | Spatial autocorrelation diagnostics (Moran's I, 5,000 permutations) |
+| 5 | Structural moderators: capacity, baseline compliance, EJ, geology, rurality, border proximity |
+
+## Setup
+
+### Prerequisites
+
+- Python 3.9+
+- PostgreSQL (with PostGIS for spatial queries)
+- The `well_analyzer.py` module reads database credentials from environment variables
+
+### Install dependencies
+
+```bash
+pip install -r requirements.txt
+```
+
+### Database configuration
+
+Set the following environment variables before running analysis:
+
+```bash
+export PGHOST=localhost
+export PGPORT=5432
+export PGUSER=your_user
+export PGPASSWORD=your_password
+export PGDATABASE=texas_data
+```
+
+### Run the data pipeline
+
+Execute the notebooks in `rebuild/` in order (steps 1–5) to populate the PostgreSQL database, then open `analysis/updated_district_level_analysis_2015-2025_offshore_controls.ipynb` for the main analysis.
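
Before running the notebooks, it can help to confirm that the five `PG*` variables from the Database configuration step are set and combine into a usable connection string. The following is a minimal sketch and not the logic in `well_analyzer.py`, which this README does not show. The helper name `postgres_url_from_env` is hypothetical, and the URL layout follows the `postgresql+psycopg2://user:password@host:port/dbname` form that SQLAlchemy accepts.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus

REQUIRED_VARS = ("PGHOST", "PGPORT", "PGUSER", "PGPASSWORD", "PGDATABASE")


def postgres_url_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a SQLAlchemy-style PostgreSQL URL from PG* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Escape characters such as '@' or '/' that would otherwise break the URL.
    user = quote_plus(env["PGUSER"])
    password = quote_plus(env["PGPASSWORD"])
    return (
        f"postgresql+psycopg2://{user}:{password}"
        f"@{env['PGHOST']}:{env['PGPORT']}/{env['PGDATABASE']}"
    )


# Example with placeholder values passed explicitly instead of the real environment.
example_url = postgres_url_from_env(
    {
        "PGHOST": "localhost",
        "PGPORT": "5432",
        "PGUSER": "your_user",
        "PGPASSWORD": "p@ss word",
        "PGDATABASE": "texas_data",
    }
)
print(example_url)
```

The resulting string could then be passed to `sqlalchemy.create_engine`. Failing early on a missing variable tends to produce a clearer error than a connection failure partway through a notebook.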
+
+## Key Statistics (2015–2025)
+
+- **1,878,764** inspections across **420,185** unique wells
+- Overall compliance rate: **89.9%** (rising from 88.4% in 2015 to 92.9% in 2024)
+- **193,338** violations across **81,670** unique wells
+- Mean days from violation discovery to enforcement action: **127** (median: 14)
+- Compliance on re-inspection: **57.2%**
+- District compliance range: 81.2% (District 09) to 94.4% (District 8A)
+
+## Dependencies
+
+```
+pandas
+numpy
+sqlalchemy
+psycopg2
+scipy
+statsmodels
+matplotlib
+seaborn
+geopandas
+shapely
+libpysal
+esda
+```
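
Model 1 (the pooled interrupted time-series) reports two separate effects: a level shift at the policy date and a change in trend afterwards. One common way to code the regressors for such a model on a yearly panel is sketched below. This is a generic segmented-regression setup under stated assumptions, not the specification in `draft_appendix.md`: the variable names (`time`, `post`, `time_since_policy`) are hypothetical, and 2019 is treated as the first post-policy year because the reform took effect in January 2019.

```python
from dataclasses import dataclass
from typing import Iterable, List

START_YEAR = 2015   # first year of the study window
POLICY_YEAR = 2019  # January 2019 disclosure reform


@dataclass(frozen=True)
class ItsRow:
    year: int
    time: int               # years since the start of the window (baseline trend)
    post: int               # 1 from the policy year onward (level shift)
    time_since_policy: int  # years elapsed since the policy (trend change)


def its_regressors(years: Iterable[int]) -> List[ItsRow]:
    """Code the three standard interrupted time-series regressors for each year."""
    rows = []
    for year in years:
        post = 1 if year >= POLICY_YEAR else 0
        rows.append(
            ItsRow(
                year=year,
                time=year - START_YEAR,
                post=post,
                time_since_policy=post * (year - POLICY_YEAR),
            )
        )
    return rows


design = its_regressors(range(2015, 2026))
for row in design:
    print(row)
```

In a regression of an enforcement-timing outcome on these three columns, the coefficient on `post` would correspond to the immediate level shift and the coefficient on `time_since_policy` to the post-policy trend change. Fitting could be done with, for example, `statsmodels.formula.api.ols`.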