diff --git a/README.md b/README.md
new file mode 100644
index 0000000..2431cc6
--- /dev/null
+++ b/README.md
@@ -0,0 +1,166 @@
+# Texas Borderlands: Regulatory Enforcement Disparities
+
+An empirical research project examining whether oil and gas regulatory enforcement in Texas differs systematically between border-proximate and interior districts — and whether a 2019 disclosure reform produced heterogeneous effects across those regions.
+
+## Research Questions
+
+- **RQ1:** Do border-exposed Texas Railroad Commission (RRC) districts differ from non-border districts in inspection intensity, violation detection, enforcement timing, and resolution rates?
+- **RQ2:** Did the 2019 disclosure reform change enforcement outcomes differently in border districts versus non-border districts?
+
+## Key Findings
+
+| Outcome | Border Districts | Non-Border Districts |
+|--------|-----------------|---------------------|
+| Inspections per well | 1.329 | 1.515 |
+| Violations per inspection | 0.130 | 0.098 |
+| Days to enforcement | 145.2 | 122.8 |
+| Resolution rate | 0.543 | 0.596 |
+
+**Post-2019 reform effect:** Enforcement processing time in border districts improved by **~75 days** (p=0.016) relative to non-border districts — but inspection reach and resolution rates did not converge. Conclusion: *faster pipeline, not wider pipeline*.
+
+## Project Structure
+
+```
+texas-borderlands/
+├── analysis/
+│   ├── borderlands.ipynb              # Main analysis notebook
+│   ├── well_analyzer.py               # WellAnalyzer class (data loading + metrics)
+│   └── output_borderlands/
+│       ├── rq1_results.csv            # RQ1 regression results
+│       ├── rq2_results.csv            # RQ2 FE interaction results
+│       ├── district_year_panel_borderlands.csv
+│       ├── border_vs_nonborder_trends.png
+│       ├── money_plot_timing_border_prepost2019.png
+│       ├── well_border_exposure_map.png
+│       ├── continuous_exposure_results.csv
+│       ├── cutoff_sensitivity_results.csv
+│       └── border_type_split_results.csv
+│
+├── data/
+│   ├── oil_gas_basin_shape/           # EIA TX shale basin boundaries
+│   ├── shale_play_shape/              # EIA TX shale play delineations
+│   ├── texas_county_shape/            # US Census TX county subdivisions (2025)
+│   ├── texmex_shape/                  # US Census TX-MX international boundary (2023)
+│   ├── competition_panel.csv
+│   └── district_competitor_links.csv
+│
+├── intro_thoery_methods_analysis_results_discussion.md   # Full paper draft
+├── appendix.md                        # Supplementary tables and robustness checks
+└── requirements.txt
+```
+
+## Tech Stack
+
+- **Python 3**
+- **pandas / numpy** — data manipulation and panel construction
+- **sqlalchemy / psycopg2** — PostgreSQL database access
+- **geopandas / shapely** — geospatial analysis and border proximity measurement
+- **scipy / statsmodels** — regression models (OLS, fixed effects)
+- **libpysal / esda** — spatial econometrics
+- **matplotlib / seaborn** — visualization
+- **python-dotenv** — environment configuration
+
+## Setup
+
+### 1. Install dependencies
+
+```bash
+pip install -r requirements.txt
+```
+
+### 2. Configure the database connection
+
+Create a `.env` file in the project root (or set environment variables directly):
+
+```env
+PGHOST=localhost
+PGPORT=5432
+PGUSER=postgres
+PGPASSWORD=your_password
+PGDATABASE=texas_data
+```
+
+The database should have PostGIS enabled and contain the following tables:
+
+| Table | Description |
+|-------|-------------|
+| `well_shape_tract` (or similar) | Wells with location and demographic enrichment |
+| `inspections` | Inspection records with dates and district info |
+| `violations` | Violation records with enforcement and resolution dates |
+
+The `WellAnalyzer` class auto-detects the wells table name from a set of known aliases.
+
+### 3. Run the analysis
+
+Open and run the Jupyter notebook:
+
+```bash
+jupyter notebook analysis/borderlands.ipynb
+```
+
+Or use the `WellAnalyzer` class directly:
+
+```python
+from analysis.well_analyzer import WellAnalyzer
+
+analyzer = WellAnalyzer()
+analyzer.print_analysis()
+analyzer.export_analysis("output.json")
+```
+
+## Data
+
+### Primary Data (PostgreSQL)
+
+- **~1.01M wells** with geospatial coordinates and demographic/census tract enrichment
+- **~1.87M inspections** (2015–2025)
+- **~191.7K violations** (2015–2025)
+- **District-year panel:** 143 observations (13 RRC districts × 11 years)
+
+### Shapefiles
+
+| File | Source | Purpose |
+|------|--------|---------|
+| `texmex_shape/` | US Census Bureau (2023) | TX-MX border geometry for proximity calculations |
+| `texas_county_shape/` | US Census Bureau (2025) | State and county boundaries |
+| `oil_gas_basin_shape/` | US EIA | Texas shale basin delineations |
+| `shale_play_shape/` | US EIA | Texas shale play delineations |
+
+### Border Exposure Definitions
+
+- **District-level:** Binary — district centroid or wells within 50 km of any state/international border
+- **Well-level:** Binary flags at 25 km and 50 km buffers from TX-Mexico border
+- **Border subtypes:** TX-MX, TX-NM, TX-OK, TX-LA
+
+Border-exposed wells (50 km buffer): **169,520** of 1,010,432 total.
+
+## Empirical Design
+
+**Unit of analysis:** Texas RRC district × year (2015–2025)
+
+**Outcome variables:**
+- Inspection intensity (inspections per well)
+- Violation rate (violations per inspection)
+- Days to enforcement action
+- Resolution rate (compliance on reinspection)
+
+**RQ1 — Levels model:**
+```
+Y_{dt} = α + β·Border_d + γ·X_{dt} + ε_{dt}
+```
+
+**RQ2 — Fixed effects interaction model:**
+```
+Y_{dt} = α_d + δ_t + β·(Post2019_t × Border_d) + γ·X_{dt} + ε_{dt}
+```
+
+**Robustness checks:** Border-type splitting, continuous exposure shares, cutoff sensitivity (25/75/100 km thresholds).
+
+## Documentation
+
+- `intro_thoery_methods_analysis_results_discussion.md` — full paper draft covering theory, methods, results, and discussion
+- `appendix.md` — supplementary regression tables, robustness checks, and district profiles
+
+## License
+
+This project is for academic research purposes. Underlying data sources are public records from the Texas Railroad Commission and US federal agencies.
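
Reviewer note on the RQ2 specification in the Empirical Design section: the sketch below illustrates how the interaction coefficient β can be recovered from a balanced district × year panel by two-way demeaning (removing district and year means), which is equivalent to including α_d and δ_t when there is a single regressor. It is a dependency-free illustration on synthetic data, not the notebook's actual code. The function names (`make_synthetic_panel`, `two_way_demean`, `fe_interaction_beta`), the set of border district IDs, the `true_beta` value, and the choice to treat 2019 itself as post-reform are all assumptions made for the example. Covariates X_{dt} and standard errors are omitted.

```python
from collections import defaultdict


def make_synthetic_panel(true_beta=-10.0):
    """Build a balanced 13-district x 11-year panel (143 rows) of synthetic data."""
    border_districts = {1, 2, 3, 4}  # hypothetical border assignment
    panel = []
    for district in range(1, 14):
        for year in range(2015, 2026):
            post = 1 if year >= 2019 else 0  # assumption: 2019 counts as post-reform
            border = 1 if district in border_districts else 0
            interaction = post * border
            # Outcome = district effect + year effect + beta * interaction (no noise).
            y = 100.0 + 2.0 * district + 1.5 * (year - 2015) + true_beta * interaction
            panel.append(
                {"district": district, "year": year,
                 "post_x_border": interaction, "y": y}
            )
    return panel


def two_way_demean(panel, key):
    """Subtract district and year means and add back the grand mean.

    This absorbs both sets of fixed effects exactly only when the panel is balanced.
    """
    by_district, by_year = defaultdict(list), defaultdict(list)
    for row in panel:
        by_district[row["district"]].append(row[key])
        by_year[row["year"]].append(row[key])
    mean_d = {d: sum(v) / len(v) for d, v in by_district.items()}
    mean_t = {t: sum(v) / len(v) for t, v in by_year.items()}
    grand = sum(row[key] for row in panel) / len(panel)
    return [
        row[key] - mean_d[row["district"]] - mean_t[row["year"]] + grand
        for row in panel
    ]


def fe_interaction_beta(panel):
    """OLS slope of demeaned y on the demeaned Post2019 x Border interaction."""
    y = two_way_demean(panel, "y")
    x = two_way_demean(panel, "post_x_border")
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


panel = make_synthetic_panel()
beta_hat = fe_interaction_beta(panel)
print(f"rows={len(panel)}, estimated beta={beta_hat:.4f}")
```

Because the synthetic outcome contains no noise, the estimate should match `true_beta` up to floating-point error. On real data, a regression library such as statsmodels (listed in the Tech Stack) would normally be used instead, so that covariates and clustered standard errors can be included.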