add README with project overview, pipeline, and setup instructions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
145
README.md
Normal file
@@ -0,0 +1,145 @@
# Texas District Analysis: Regulatory Transparency and Enforcement in the Oil & Gas Industry

A research project examining how transparency disclosure reforms affect enforcement behavior in the Texas Railroad Commission (RRC), with a focus on district-level heterogeneity across 13 RRC regulatory districts from 2015–2025.

## Research Overview

**Core question**: Does making well-level violation data publicly searchable change how quickly the RRC acts on violations?

The January 2019 RRC policy change, which made well violation data publicly searchable, serves as the exogenous policy shock. The analysis tests whether and how this disclosure reform altered enforcement timing and compliance outcomes across districts, with particular attention to offshore-regulating districts (02, 03, 04) and to structural moderators such as basin composition, enforcement capacity, and environmental justice dimensions.

**Key findings:**

- No immediate post-2019 level shift in enforcement timing (coef=0.1514, p=0.33)
- Significant post-2019 trend acceleration: enforcement speed improves gradually over time (coef=−0.3603, p=0.001)
- Offshore-regulating districts show a differential post-policy response (coef=0.3819, p<0.001), strongest in 2023–2024
- Basin composition is the clearest structural correlate of district-level heterogeneity
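
The distinction between a level shift and a trend change in the findings above is the one a segmented (interrupted time-series) regression draws. The following is a minimal sketch of the time regressors such a model typically uses, assuming annual periods and a 2019 policy start. The names `ItsRow`, `its_design`, `time`, `post`, and `time_since` are illustrative and not taken from the project; the notebook's actual specification (period length, controls, district terms) may differ.

```python
from dataclasses import dataclass

POLICY_YEAR = 2019  # January 2019 disclosure change described in this README


@dataclass(frozen=True)
class ItsRow:
    year: int
    time: int        # periods elapsed since the first study year (baseline trend)
    post: int        # 1 from the policy year onward, else 0 (level shift)
    time_since: int  # periods elapsed since the policy year, 0 before it (trend change)


def its_design(first_year: int, last_year: int, policy_year: int = POLICY_YEAR) -> list[ItsRow]:
    """Build the three time regressors of a basic segmented regression."""
    rows = []
    for year in range(first_year, last_year + 1):
        post = 1 if year >= policy_year else 0
        rows.append(ItsRow(year, year - first_year, post, max(0, year - policy_year)))
    return rows


design = its_design(2015, 2025)
for row in design:
    print(row.year, row.time, row.post, row.time_since)
```

In a model of the form `outcome ~ time + post + time_since`, the coefficient on `post` would capture an immediate level shift and the coefficient on `time_since` a change in slope after the policy date.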

## Data

All raw data originates from the Texas Railroad Commission and supplementary government sources:

| Source | Description | Size |
|--------|-------------|------|
| Texas RRC | ~3.6M inspection records (pipe-delimited) | 424 MB |
| Texas RRC | ~368K violation records (pipe-delimited) | 66 MB |
| U.S. Census | Poverty rates and demographics by census tract | not listed |
| USDA RUCA (2020) | Rural-Urban Commuting Area classifications | 25 MB |
| U.S. EIA | Shale basin and play shapefiles | not listed |
| Texas county shapefiles | County boundaries for spatial visualization | not listed |

The data covers approximately 1.01 million wells, 1.87 million inspections, and 191K violations within the 2015–2025 study window.

**Note**: Raw data files are large (up to several hundred MB each) and are excluded from version control via `.gitignore`. The data pipeline is fully documented in the `rebuild/` notebooks.
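
Because the RRC inspection and violation files are pipe-delimited and large, it can help to stream them row by row rather than load them whole. The sketch below uses only the standard library and assumes the files have a header row; the column names in the sample (`INSPECTION_ID`, `DISTRICT`, `INSPECTION_DATE`) and the helper name `iter_pipe_rows` are hypothetical, since the real schema is not shown in this README.

```python
import csv
import io
from typing import Iterator, TextIO


def iter_pipe_rows(handle: TextIO) -> Iterator[dict[str, str]]:
    """Yield one dict per record from a pipe-delimited file with a header row."""
    reader = csv.DictReader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Strip stray padding around names and values; skip overflow fields
        # (DictReader stores those under the key None).
        yield {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }


# Hypothetical two-record sample standing in for a real export.
sample = io.StringIO(
    "INSPECTION_ID|DISTRICT|INSPECTION_DATE\n"
    "1001|8A|2019-02-14\n"
    "1002|09 |2021-07-30\n"
)
rows = list(iter_pipe_rows(sample))
print(rows)
```

For a real file, the same function can be given a handle opened with `newline=""` and whatever encoding the export uses, so that memory use stays flat regardless of file size.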

## Repository Structure

```
texas-district-analysis/
├── analysis/
│   ├── well_analyzer.py  # Core analysis engine (PostgreSQL → metrics)
│   ├── updated_district_level_analysis_2015-2025_offshore_controls.ipynb  # Main analysis notebook
│   ├── draft.md  # Manuscript draft
│   ├── draft_appendix.md  # Technical appendix with model specifications
│   ├── *.png  # Figures (event study, district maps, etc.)
│   └── archive/  # Earlier notebook versions and alternate specs
├── data/
│   ├── INSPECTIONS.txt  # Raw inspection records (pipe-delimited)
│   ├── VIOLATIONS.txt  # Raw violation records (pipe-delimited)
│   ├── RUCA-codes-2020-tract.csv  # RUCA classification by census tract
│   ├── district_by_county.csv  # District–county crosswalk
│   └── {oil_gas_basin,shale_play,texas_county,texmex}_shape/  # ESRI shapefiles
├── rebuild/
│   ├── rrc_api_data.ipynb  # Step 1: Fetch and process RRC API data
│   ├── create_violations_inspections.ipynb  # Step 2: Build cleaned inspection/violation files
│   ├── add_census_data.ipynb  # Step 3: Link census demographics
│   ├── add_shape_layers.ipynb  # Step 4: Spatial feature engineering
│   ├── well_shape.ipynb  # Step 5: Well geometry and shapefile creation
│   └── well-api-manual.pdf  # RRC API technical documentation
├── papers/  # Manuscript versions (DOCX + PDF)
├── analysis_output.json  # Pre-computed summary statistics
└── requirements.txt  # Python dependencies
```

## Analysis Pipeline

```
Raw RRC Data (API)
    ↓  rebuild/rrc_api_data.ipynb
Cleaned Inspections & Violations CSVs
    ↓  rebuild/create_violations_inspections.ipynb
Link Census Demographics
    ↓  rebuild/add_census_data.ipynb
Add Geographic Layers
    ↓  rebuild/add_shape_layers.ipynb
PostgreSQL Data Warehouse
    ↓  analysis/well_analyzer.py
District-Year Panel
    ↓  analysis/updated_district_level_analysis_2015-2025_offshore_controls.ipynb
Econometric Models → Figures → Manuscript
```

### Econometric Models

| Model | Description |
|-------|-------------|
| 1 | Interrupted time-series (all districts pooled) |
| 2 | District-specific post-policy fixed effects |
| 3 | Offshore jurisdiction moderator (districts 02/03/04) |
| 4 | Spatial autocorrelation diagnostics (Moran's I, 5,000 permutations) |
| 5 | Structural moderators: capacity, baseline compliance, EJ, geology, rurality, border proximity |
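
To make the Model 4 diagnostic concrete, here is a small pure-Python sketch of global Moran's I with a permutation-based pseudo p-value. It is an illustration of the statistic on a toy adjacency matrix, not the project's code: the dependency list suggests the analysis relies on `esda` and `libpysal`, and the function names, the two-sided test, and the four-unit example below are all assumptions made for this example.

```python
import random


def morans_i(values: list[float], weights: list[list[float]]) -> float:
    """Global Moran's I for values observed on units linked by a weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_p_value(values, weights, permutations=5000, seed=0):
    """Two-sided pseudo p-value: share of shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    expected = -1.0 / (len(values) - 1)  # E[I] under no spatial autocorrelation
    shuffled = list(values)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if abs(morans_i(shuffled, weights) - expected) >= abs(observed - expected):
            extreme += 1
    return (extreme + 1) / (permutations + 1)


# Toy example: four units on a line, each adjacent to its immediate neighbours.
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
x = [1.0, 2.0, 3.0, 4.0]
moran = morans_i(x, W)
p_value = permutation_p_value(x, W, permutations=999)
print(round(moran, 4), p_value)
```

Values that rise steadily along the line give a positive statistic, which is the pattern one would look for if neighbouring districts had similar enforcement outcomes.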

## Setup

### Prerequisites

- Python 3.9+
- PostgreSQL (with PostGIS for spatial queries)
- The `well_analyzer.py` module reads database credentials from environment variables

### Install dependencies

```bash
pip install -r requirements.txt
```

### Database configuration

Set the following environment variables before running analysis:

```bash
export PGHOST=localhost
export PGPORT=5432
export PGUSER=your_user
export PGPASSWORD=your_password
export PGDATABASE=texas_data
```
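
As an illustration of how these variables could be consumed from Python, the sketch below assembles a SQLAlchemy-style connection URL from them. How `well_analyzer.py` actually reads its credentials is not shown in this README, so the function name `postgres_url`, the defaults for host and port, and the `postgresql+psycopg2` driver prefix are assumptions.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def postgres_url(env: Mapping[str, str] = os.environ) -> str:
    """Assemble a SQLAlchemy-style PostgreSQL URL from the PG* variables."""
    missing = [name for name in ("PGUSER", "PGPASSWORD", "PGDATABASE") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    user = quote_plus(env["PGUSER"])
    password = quote_plus(env["PGPASSWORD"])  # escape characters such as '@' or ':'
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{env['PGDATABASE']}"


# Example with placeholder values instead of the real environment.
example_env = {"PGUSER": "your_user", "PGPASSWORD": "p@ss:word", "PGDATABASE": "texas_data"}
url = postgres_url(example_env)
print(url)
```

The resulting string can be passed to `sqlalchemy.create_engine`. Failing early on missing variables tends to give a clearer error than a refused connection later in a notebook.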

### Run the data pipeline

Execute the notebooks in `rebuild/` in order (steps 1–5) to populate the PostgreSQL database, then open `analysis/updated_district_level_analysis_2015-2025_offshore_controls.ipynb` for the main analysis.
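
If running the five notebooks by hand is inconvenient, they could be executed in sequence from a script. This is a sketch only: it assumes `jupyter` and `nbconvert` are installed (they are not in the dependency list), that the script is run from the repository root, and that the step order matches the comments in the repository tree. The helper names are hypothetical.

```python
import subprocess

# Order taken from the "Step 1" to "Step 5" comments in the repository tree.
REBUILD_NOTEBOOKS = [
    "rebuild/rrc_api_data.ipynb",
    "rebuild/create_violations_inspections.ipynb",
    "rebuild/add_census_data.ipynb",
    "rebuild/add_shape_layers.ipynb",
    "rebuild/well_shape.ipynb",
]


def nbconvert_command(notebook: str) -> list[str]:
    """Command that executes one notebook in place via jupyter nbconvert."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_pipeline(dry_run: bool = True) -> list[list[str]]:
    """Print each command and, unless dry_run is set, run it."""
    commands = [nbconvert_command(nb) for nb in REBUILD_NOTEBOOKS]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failing step
    return commands


commands = run_pipeline(dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a cheap way to confirm the order before starting a run against the full data.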

## Key Statistics (2015–2025)

- **1,878,764** inspections across **420,185** unique wells
- Overall compliance rate: **89.9%** (up from 88.4% in 2015 to 92.9% in 2024)
- **193,338** violations across **81,670** unique wells
- Mean days from violation discovery to enforcement action: **127** (median: 14)
- Compliance on re-inspection: **57.2%**
- District compliance range: 81.2% (District 09) to 94.4% (District 8A)
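
A few simple ratios can be derived from the counts above. The snippet is plain arithmetic on the listed figures, not additional results from the analysis, and it assumes that the wells with violations are a subset of the inspected wells.

```python
# Counts copied from the Key Statistics list.
inspections = 1_878_764
inspected_wells = 420_185
violations = 193_338
wells_with_violations = 81_670

inspections_per_well = inspections / inspected_wells
share_wells_with_violation = wells_with_violations / inspected_wells  # assumes a subset
violations_per_violating_well = violations / wells_with_violations

print(f"Inspections per inspected well: {inspections_per_well:.2f}")
print(f"Share of inspected wells with a violation: {share_wells_with_violation:.1%}")
print(f"Violations per well with any violation: {violations_per_violating_well:.2f}")
```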

## Dependencies

```
pandas
numpy
sqlalchemy
psycopg2
scipy
statsmodels
matplotlib
seaborn
geopandas
shapely
libpysal
esda
```