texas-borderlands/README.md
2026-03-09 08:21:50 -07:00

# Texas Borderlands: Regulatory Enforcement Disparities
An empirical research project examining whether oil and gas regulatory enforcement in Texas differs systematically between border-proximate and interior districts — and whether a 2019 disclosure reform produced heterogeneous effects across those regions.
## Research Questions
- **RQ1:** Do border-exposed Texas Railroad Commission (RRC) districts differ from non-border districts in inspection intensity, violation detection, enforcement timing, and resolution rates?
- **RQ2:** Did the 2019 disclosure reform change enforcement outcomes differently in border districts versus non-border districts?
## Key Findings
| Outcome | Border Districts | Non-Border Districts |
|--------|-----------------|---------------------|
| Inspections per well | 1.329 | 1.515 |
| Violations per inspection | 0.130 | 0.098 |
| Days to enforcement | 145.2 | 122.8 |
| Resolution rate | 0.543 | 0.596 |
**Post-2019 reform effect:** Time to enforcement in border districts fell by **~75 days** (p=0.016) relative to non-border districts, but inspection reach and resolution rates did not converge. Conclusion: *faster pipeline, not wider pipeline*.
## Project Structure
```
texas-borderlands/
├── analysis/
│ ├── borderlands.ipynb # Main analysis notebook
│ ├── well_analyzer.py # WellAnalyzer class (data loading + metrics)
│ └── output_borderlands/
│ ├── rq1_results.csv # RQ1 regression results
│ ├── rq2_results.csv # RQ2 FE interaction results
│ ├── district_year_panel_borderlands.csv
│ ├── border_vs_nonborder_trends.png
│ ├── money_plot_timing_border_prepost2019.png
│ ├── well_border_exposure_map.png
│ ├── continuous_exposure_results.csv
│ ├── cutoff_sensitivity_results.csv
│ └── border_type_split_results.csv
├── data/
│ ├── oil_gas_basin_shape/ # EIA TX shale basin boundaries
│ ├── shale_play_shape/ # EIA TX shale play delineations
│ ├── texas_county_shape/ # US Census TX county subdivisions (2025)
│ ├── texmex_shape/ # US Census TX-MX international boundary (2023)
│ ├── competition_panel.csv
│ └── district_competitor_links.csv
├── intro_thoery_methods_analysis_results_discussion.md # Full paper draft
├── appendix.md # Supplementary tables and robustness checks
└── requirements.txt
```
## Tech Stack
- **Python 3**
- **pandas / numpy** — data manipulation and panel construction
- **sqlalchemy / psycopg2** — PostgreSQL database access
- **geopandas / shapely** — geospatial analysis and border proximity measurement
- **scipy / statsmodels** — regression models (OLS, fixed effects)
- **libpysal / esda** — spatial econometrics
- **matplotlib / seaborn** — visualization
- **python-dotenv** — environment configuration
## Setup
### 1. Install dependencies
```bash
pip install -r requirements.txt
```
### 2. Configure the database connection
Create a `.env` file in the project root (or set environment variables directly):
```env
PGHOST=localhost
PGPORT=5432
PGUSER=postgres
PGPASSWORD=your_password
PGDATABASE=texas_data
```
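As a rough sketch of how these variables could be turned into a connection string, the helper below assembles a SQLAlchemy-style PostgreSQL URL from the environment. `build_pg_url` is a hypothetical name and is not taken from `well_analyzer.py`; the `psycopg2` driver suffix and the fallback defaults are assumptions based on the example values above, and the project's own loading code may differ.

```python
import os
from urllib.parse import quote_plus


def build_pg_url(env=None):
    """Assemble a postgresql+psycopg2 URL from PG* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or '/' do not break the URL.
    user = quote_plus(env.get("PGUSER", "postgres"))
    password = quote_plus(env.get("PGPASSWORD", ""))
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env.get("PGDATABASE", "texas_data")
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"
```

The resulting string can be passed to `sqlalchemy.create_engine`. If a `.env` file is used, calling `dotenv.load_dotenv()` first populates `os.environ`.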
The database should have PostGIS enabled and contain the following tables:
| Table | Description |
|-------|-------------|
| `well_shape_tract` (or similar) | Wells with location and demographic enrichment |
| `inspections` | Inspection records with dates and district info |
| `violations` | Violation records with enforcement and resolution dates |
The `WellAnalyzer` class auto-detects the wells table name from a set of known aliases.
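The alias lookup could work roughly as follows. This is a minimal sketch, not the class's actual logic: only `well_shape_tract` comes from the table above, and the other alias names, the function name, and the first-match rule are placeholders.

```python
# Only "well_shape_tract" is documented; the other aliases are hypothetical.
KNOWN_ALIASES = ("well_shape_tract", "well_shape", "wells")


def detect_wells_table(existing_tables, aliases=KNOWN_ALIASES):
    """Return the first known alias that exists in the database."""
    available = {name.lower() for name in existing_tables}
    for alias in aliases:
        if alias.lower() in available:
            return alias
    raise LookupError(f"No wells table found; tried: {', '.join(aliases)}")
```

The list of existing tables can be obtained with `sqlalchemy.inspect(engine).get_table_names()`.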
### 3. Run the analysis
Open and run the Jupyter notebook:
```bash
jupyter notebook analysis/borderlands.ipynb
```
Or use the `WellAnalyzer` class directly:
```python
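# Run from the project root so the `analysis` package is importable.
from analysis.well_analyzer import WellAnalyzer

# Connection settings are expected to come from the PG* variables configured above.
analyzer = WellAnalyzer()
analyzer.print_analysis()                # print the computed metrics
analyzer.export_analysis("output.json")  # write the results to output.json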
```
## Data
### Primary Data (PostgreSQL)
- **~1.01M wells** with geospatial coordinates and demographic/census tract enrichment
- **~1.87M inspections** (2015–2025)
- **~191.7K violations** (2015–2025)
- **District-year panel:** 143 observations (13 RRC districts × 11 years)
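The panel dimensions can be checked with a short sketch that builds the full district-year index and reports any missing cells. The integer district IDs are placeholders rather than the RRC's actual district labels, and the function names are hypothetical.

```python
from itertools import product

YEARS = range(2015, 2026)     # 2015 through 2025 inclusive: 11 years
DISTRICT_IDS = range(1, 14)   # placeholder IDs standing in for the 13 RRC districts


def panel_index(districts=DISTRICT_IDS, years=YEARS):
    """All (district, year) cells of a balanced panel."""
    return list(product(districts, years))


def missing_cells(observed, districts=DISTRICT_IDS, years=YEARS):
    """Cells of the balanced panel that are absent from the observed data."""
    return sorted(set(panel_index(districts, years)) - set(observed))
```

A balanced panel returns an empty list from `missing_cells`, and `len(panel_index())` gives the 143 observations stated above.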
### Shapefiles
| File | Source | Purpose |
|------|--------|---------|
| `texmex_shape/` | US Census Bureau (2023) | TX-MX border geometry for proximity calculations |
| `texas_county_shape/` | US Census Bureau (2025) | State and county boundaries |
| `oil_gas_basin_shape/` | US EIA | Texas shale basin delineations |
| `shale_play_shape/` | US EIA | Texas shale play delineations |
### Border Exposure Definitions
- **District-level:** Binary — district centroid or wells within 50 km of any state/international border
- **Well-level:** Binary flags at 25 km and 50 km buffers from the TX-MX border
- **Border subtypes:** TX-MX, TX-NM, TX-OK, TX-LA
Border-exposed wells (50 km buffer): **169,520** of 1,010,432 total.
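To illustrate the well-level flags, the sketch below computes the great-circle distance from a well to the nearest border vertex and compares it to each buffer. This is a dependency-free approximation, not the project's method: the actual calculation presumably uses `geopandas`/`shapely` geometries, and measuring to vertices rather than line segments overstates distance when vertices are sparse. The function names and coordinate layout are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def border_flags(well, border_vertices, cutoffs_km=(25, 50)):
    """Binary exposure flags for one well given (lat, lon) border vertices."""
    lat, lon = well
    nearest = min(haversine_km(lat, lon, blat, blon) for blat, blon in border_vertices)
    return {f"within_{c}km": nearest <= c for c in cutoffs_km}
```

For scale, the reported 169,520 of 1,010,432 wells corresponds to roughly 16.8% of all wells.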
## Empirical Design
**Unit of analysis:** Texas RRC district × year (2015–2025)
**Outcome variables:**
- Inspection intensity (inspections per well)
- Violation rate (violations per inspection)
- Days to enforcement action
- Resolution rate (compliance on reinspection)
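The four outcomes could be computed per district-year along these lines. The record layout and the exact definitions are assumptions: days to enforcement is taken as the mean lag between violation and enforcement dates, and the resolution rate as resolved violations over all violations. The paper draft may define them differently.

```python
from datetime import date
from statistics import mean


def district_year_outcomes(n_wells, n_inspections, violations):
    """Outcome metrics for one district-year.

    `violations` is a list of dicts with keys 'violation_date',
    'enforcement_date' (a date or None) and 'resolved' (bool); this
    layout is hypothetical.
    """
    n_violations = len(violations)
    # Only violations that reached an enforcement action contribute to the lag.
    lags = [
        (v["enforcement_date"] - v["violation_date"]).days
        for v in violations
        if v["enforcement_date"] is not None
    ]
    resolved = sum(1 for v in violations if v["resolved"])
    return {
        "inspections_per_well": n_inspections / n_wells if n_wells else None,
        "violations_per_inspection": n_violations / n_inspections if n_inspections else None,
        "days_to_enforcement": mean(lags) if lags else None,
        "resolution_rate": resolved / n_violations if n_violations else None,
    }
```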
**RQ1 — Levels model:**
```
Y_{dt} = α + β·Border_d + γ·X_{dt} + ε_{dt}
```
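As a check on what β measures: if the controls X are dropped, the OLS slope on a binary `Border` indicator equals the border-minus-non-border difference in mean outcomes. The sketch below verifies this on toy values that are illustrative only and are not project data. The full model with controls would normally be estimated with `statsmodels`.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope from a one-regressor OLS fit: cov(x, y) / var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var = sum((xi - x_bar) ** 2 for xi in x)
    return cov / var


# Toy district-year outcomes (illustrative only, not project data).
border = [1, 1, 0, 0, 0]
days = [150.0, 140.0, 120.0, 125.0, 115.0]

beta = ols_slope(border, days)
gap = (
    mean(d for b, d in zip(border, days) if b)
    - mean(d for b, d in zip(border, days) if not b)
)
# beta and gap coincide: the slope is the difference in group means.
```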
**RQ2 — Fixed effects interaction model:**
```
Y_{dt} = α_d + δ_t + β·(Post2019_t × Border_d) + γ·X_{dt} + ε_{dt}
```
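In a balanced panel with no covariates, the coefficient β on `Post2019 × Border` in this two-way fixed effects model equals the 2×2 difference-in-differences of group means. The sketch below computes that quantity directly. It assumes 2019 onward counts as the post period, which the README does not specify, and the row layout and function name are hypothetical.

```python
from statistics import mean


def did_estimate(rows, reform_year=2019):
    """2x2 difference-in-differences from (district, year, border, outcome) rows.

    Assumes every border/period cell has at least one observation.
    """
    cells = {(b, p): [] for b in (0, 1) for p in (0, 1)}
    for _district, year, border, outcome in rows:
        post = int(year >= reform_year)  # assumption: reform year itself is "post"
        cells[(int(border), post)].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    # (border change over time) minus (non-border change over time)
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])
```

A negative estimate for days to enforcement means processing time fell more in border districts than in non-border districts. With covariates or an unbalanced panel, the regression coefficient will generally differ from this simple contrast.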
**Robustness checks:** Border-type splitting, continuous exposure shares, cutoff sensitivity (25/75/100 km thresholds).
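One way to picture the continuous-exposure and cutoff-sensitivity checks is to compute, for each district, the share of wells within each distance cutoff. This is a sketch under assumptions: the input layout (a mapping from district to well-to-border distances in km) and the function name are hypothetical, and the appendix may construct the exposure shares differently.

```python
def exposure_shares(distances_by_district, cutoffs_km=(25, 50, 75, 100)):
    """Share of each district's wells lying within each cutoff of the border."""
    shares = {}
    for district, distances in distances_by_district.items():
        n = len(distances)
        shares[district] = {
            cutoff: (sum(1 for d in distances if d <= cutoff) / n if n else None)
            for cutoff in cutoffs_km
        }
    return shares
```

Re-estimating the models with each cutoff's share (or the binary flag derived from it) shows whether the results hinge on the 50 km baseline.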
## Documentation
- `intro_thoery_methods_analysis_results_discussion.md` — full paper draft covering theory, methods, results, and discussion
- `appendix.md` — supplementary regression tables, robustness checks, and district profiles
## License
This project is for academic research purposes. Underlying data sources are public records from the Texas Railroad Commission and US federal agencies.