added README.md

dadams
2026-03-09 08:21:50 -07:00
parent 8a7e82cd03
commit 5127e187b9

README.md Normal file

@@ -0,0 +1,166 @@
# Texas Borderlands: Regulatory Enforcement Disparities
An empirical research project examining whether oil and gas regulatory enforcement in Texas differs systematically between border-proximate and interior districts — and whether a 2019 disclosure reform produced heterogeneous effects across those regions.
## Research Questions
- **RQ1:** Do border-exposed Texas Railroad Commission (RRC) districts differ from non-border districts in inspection intensity, violation detection, enforcement timing, and resolution rates?
- **RQ2:** Did the 2019 disclosure reform change enforcement outcomes differently in border districts versus non-border districts?
## Key Findings
| Outcome | Border Districts | Non-Border Districts |
|--------|-----------------|---------------------|
| Inspections per well | 1.329 | 1.515 |
| Violations per inspection | 0.130 | 0.098 |
| Days to enforcement | 145.2 | 122.8 |
| Resolution rate | 0.543 | 0.596 |
**Post-2019 reform effect:** Enforcement processing time in border districts fell by **~75 days** (p=0.016) relative to non-border districts, but inspection reach and resolution rates did not converge. Conclusion: *faster pipeline, not wider pipeline*.
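
For orientation, the raw border versus non-border gaps implied by the table can be computed directly. This is only arithmetic on the four published rows; the gaps are unadjusted differences, not the regression coefficients in `rq1_results.csv`, and the names `KEY_FINDINGS` and `raw_gaps` are illustrative.

```python
# Values copied from the Key Findings table: (border, non-border).
KEY_FINDINGS = {
    "inspections_per_well": (1.329, 1.515),
    "violations_per_inspection": (0.130, 0.098),
    "days_to_enforcement": (145.2, 122.8),
    "resolution_rate": (0.543, 0.596),
}


def raw_gaps(table):
    """Return absolute and relative (vs. non-border) gaps for each outcome."""
    gaps = {}
    for outcome, (border, non_border) in table.items():
        diff = border - non_border
        gaps[outcome] = {
            "abs": round(diff, 3),
            "rel_pct": round(100 * diff / non_border, 1),
        }
    return gaps


if __name__ == "__main__":
    for outcome, gap in raw_gaps(KEY_FINDINGS).items():
        print(f"{outcome}: {gap['abs']:+.3f} ({gap['rel_pct']:+.1f}%)")
```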
## Project Structure
```
texas-borderlands/
├── analysis/
│ ├── borderlands.ipynb # Main analysis notebook
│ ├── well_analyzer.py # WellAnalyzer class (data loading + metrics)
│ └── output_borderlands/
│ ├── rq1_results.csv # RQ1 regression results
│ ├── rq2_results.csv # RQ2 FE interaction results
│ ├── district_year_panel_borderlands.csv
│ ├── border_vs_nonborder_trends.png
│ ├── money_plot_timing_border_prepost2019.png
│ ├── well_border_exposure_map.png
│ ├── continuous_exposure_results.csv
│ ├── cutoff_sensitivity_results.csv
│ └── border_type_split_results.csv
├── data/
│ ├── oil_gas_basin_shape/ # EIA TX shale basin boundaries
│ ├── shale_play_shape/ # EIA TX shale play delineations
│ ├── texas_county_shape/ # US Census TX county subdivisions (2025)
│ ├── texmex_shape/ # US Census TX-MX international boundary (2023)
│ ├── competition_panel.csv
│ └── district_competitor_links.csv
├── intro_thoery_methods_analysis_results_discussion.md # Full paper draft
├── appendix.md # Supplementary tables and robustness checks
└── requirements.txt
```
## Tech Stack
- **Python 3**
- **pandas / numpy** — data manipulation and panel construction
- **sqlalchemy / psycopg2** — PostgreSQL database access
- **geopandas / shapely** — geospatial analysis and border proximity measurement
- **scipy / statsmodels** — regression models (OLS, fixed effects)
- **libpysal / esda** — spatial econometrics
- **matplotlib / seaborn** — visualization
- **python-dotenv** — environment configuration
## Setup
### 1. Install dependencies
```bash
pip install -r requirements.txt
```
### 2. Configure the database connection
Create a `.env` file in the project root (or set environment variables directly):
```env
PGHOST=localhost
PGPORT=5432
PGUSER=postgres
PGPASSWORD=your_password
PGDATABASE=texas_data
```
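As a rough sketch of how these variables could be turned into a connection string, the helper below assembles a SQLAlchemy-style URL using only the standard library. The function name `build_pg_url` and its defaults are assumptions for illustration, not necessarily what `well_analyzer.py` does. In practice you would call `load_dotenv()` from python-dotenv first and pass the result to `sqlalchemy.create_engine`.

```python
import os
from urllib.parse import quote_plus


def build_pg_url(env=None):
    """Assemble a SQLAlchemy-style PostgreSQL URL from the PG* variables."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote_plus(env.get("PGUSER", "postgres"))
    password = quote_plus(env.get("PGPASSWORD", ""))
    # The defaults below mirror the example .env values (an assumption).
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env.get("PGDATABASE", "texas_data")
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"
```
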
The database should have PostGIS enabled and contain the following tables:
| Table | Description |
|-------|-------------|
| `well_shape_tract` (or similar) | Wells with location and demographic enrichment |
| `inspections` | Inspection records with dates and district info |
| `violations` | Violation records with enforcement and resolution dates |
The `WellAnalyzer` class auto-detects the wells table name from a set of known aliases.
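
The auto-detection step might look roughly like the sketch below. The alias tuple and the function name `detect_wells_table` are hypothetical; the actual list lives in `well_analyzer.py`. The table names could come from `sqlalchemy.inspect(engine).get_table_names()`.

```python
# Hypothetical alias list; the real one is defined in well_analyzer.py.
WELL_TABLE_ALIASES = ("well_shape_tract", "wells", "well_shape")


def detect_wells_table(existing_tables, aliases=WELL_TABLE_ALIASES):
    """Return the first alias present in the database, case-insensitively."""
    lookup = {name.lower(): name for name in existing_tables}
    for alias in aliases:
        if alias.lower() in lookup:
            # Return the name exactly as the database reports it.
            return lookup[alias.lower()]
    raise LookupError(f"No wells table found; tried {', '.join(aliases)}")
```

Failing loudly when no alias matches tends to be easier to debug than silently querying an empty or wrong table.
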
### 3. Run the analysis
Open and run the Jupyter notebook:
```bash
jupyter notebook analysis/borderlands.ipynb
```
Or use the `WellAnalyzer` class directly:
```python
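# Run from the project root so the `analysis` package is importable.
from analysis.well_analyzer import WellAnalyzer

analyzer = WellAnalyzer()  # expects the database settings from step 2
analyzer.print_analysis()  # print the analysis to the console
analyzer.export_analysis("output.json")  # write the analysis to output.json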
```
## Data
### Primary Data (PostgreSQL)
- **~1.01M wells** with geospatial coordinates and demographic/census tract enrichment
- **~1.87M inspections** (2015–2025)
- **~191.7K violations** (2015–2025)
- **District-year panel:** 143 observations (13 RRC districts × 11 years)
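
The 143-row figure follows from a balanced panel of 13 districts over 11 years. A minimal sketch of the panel skeleton is below; the district labels are placeholders because this README does not list the district codes, and `missing_cells` is an illustrative helper for spotting gaps before modelling.

```python
from itertools import product

# Placeholder labels; the README does not list the 13 district codes.
DISTRICTS = [f"D{i:02d}" for i in range(1, 14)]
YEARS = list(range(2015, 2026))  # 2015 through 2025 inclusive


def panel_index(districts=DISTRICTS, years=YEARS):
    """Return every (district, year) pair of a balanced panel."""
    return list(product(districts, years))


def missing_cells(observed, districts=DISTRICTS, years=YEARS):
    """List (district, year) pairs absent from the observed rows."""
    seen = set(observed)
    return [cell for cell in panel_index(districts, years) if cell not in seen]
```
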
### Shapefiles
| File | Source | Purpose |
|------|--------|---------|
| `texmex_shape/` | US Census Bureau (2023) | TX-MX border geometry for proximity calculations |
| `texas_county_shape/` | US Census Bureau (2025) | State and county boundaries |
| `oil_gas_basin_shape/` | US EIA | Texas shale basin delineations |
| `shale_play_shape/` | US EIA | Texas shale play delineations |
### Border Exposure Definitions
- **District-level:** Binary — district centroid or wells within 50 km of any state/international border
- **Well-level:** Binary flags at 25 km and 50 km buffers from TX-Mexico border
- **Border subtypes:** TX-MX, TX-NM, TX-OK, TX-LA
Border-exposed wells (50 km buffer): **169,520** of 1,010,432 total (about 16.8%).
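
To make the well-level flags concrete, here is a simplified stand-in that uses great-circle distance to sampled border vertices. The project itself relies on geopandas/shapely and the `texmex_shape/` geometry, so treat this as an approximation under stated assumptions: `border_points` is a hypothetical list of `(lat, lon)` vertices, and distance to the nearest vertex slightly overstates distance to the border line between vertices.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def border_flags(well, border_points, cutoffs_km=(25, 50)):
    """Flag a well as border-exposed at each cutoff (nearest-vertex distance)."""
    nearest = min(haversine_km(well[0], well[1], b[0], b[1]) for b in border_points)
    return {f"border_{c}km": nearest <= c for c in cutoffs_km}
```
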
## Empirical Design
**Unit of analysis:** Texas RRC district × year (2015–2025)
**Outcome variables:**
- Inspection intensity (inspections per well)
- Violation rate (violations per inspection)
- Days to enforcement action
- Resolution rate (compliance on reinspection)
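
A sketch of how the four outcomes could be computed for one district-year is below. The record layout (keys `found`, `enforced`, `resolved`) is hypothetical, and the choices to average enforcement lags only over violations with an enforcement date and to use all violations as the resolution-rate denominator are assumptions, not confirmed project definitions.

```python
from datetime import date


def district_year_outcomes(n_wells, n_inspections, violations):
    """Compute the four outcomes for one district-year.

    `violations` is a list of dicts with hypothetical keys:
    'found' (date), 'enforced' (date or None), 'resolved' (bool).
    """
    n_viol = len(violations)
    # Enforcement lag in days, only where an enforcement date exists.
    lags = [(v["enforced"] - v["found"]).days for v in violations if v["enforced"]]
    return {
        "inspections_per_well": n_inspections / n_wells if n_wells else None,
        "violations_per_inspection": n_viol / n_inspections if n_inspections else None,
        "days_to_enforcement": sum(lags) / len(lags) if lags else None,
        "resolution_rate": sum(v["resolved"] for v in violations) / n_viol if n_viol else None,
    }


# Toy records (made-up values, not project data).
example = district_year_outcomes(
    n_wells=100,
    n_inspections=150,
    violations=[
        {"found": date(2020, 1, 1), "enforced": date(2020, 1, 31), "resolved": True},
        {"found": date(2020, 2, 1), "enforced": None, "resolved": False},
    ],
)
```
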
**RQ1 — Levels model:**
```
Y_{dt} = α + β·Border_d + γ·X_{dt} + ε_{dt}
```
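To build intuition for β in the levels model: if the controls X are dropped, OLS on a binary `Border` indicator reduces to the difference in group means. The standard-library sketch below shows that special case on made-up numbers. The full specification with controls would normally be fit with statsmodels, and `ols_simple` is an illustrative name.

```python
def ols_simple(x, y):
    """Closed-form OLS of y on a single regressor x; returns (alpha, beta)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


# Toy district-year rows (made-up numbers, not project data).
border = [1, 1, 1, 0, 0, 0]
days = [150.0, 140.0, 145.0, 120.0, 125.0, 124.0]
alpha, beta = ols_simple(border, days)
# alpha is the non-border mean; beta is the border minus non-border gap.
```
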
**RQ2 — Fixed effects interaction model:**
```
Y_{dt} = α_d + δ_t + β·(Post2019_t × Border_d) + γ·X_{dt} + ε_{dt}
```
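On a balanced panel with no controls, the β on `Post2019 × Border` in this two-way fixed effects model coincides with a simple difference-in-differences of group means. The sketch below computes that quantity on toy data. The `first_post_year=2020` default is an assumption about how `Post2019` is coded (it may instead include 2019), and the project's actual estimates include the controls X.

```python
from statistics import mean


def did_estimate(rows, first_post_year=2020):
    """Difference-in-differences on rows of (district, year, border, y)."""
    cells = {(b, p): [] for b in (0, 1) for p in (0, 1)}
    for _district, year, border, y in rows:
        cells[(border, int(year >= first_post_year))].append(y)
    change_border = mean(cells[(1, 1)]) - mean(cells[(1, 0)])
    change_other = mean(cells[(0, 1)]) - mean(cells[(0, 0)])
    return change_border - change_other


# Toy balanced panel (made-up numbers, not project data).
toy_rows = [
    ("B1", 2018, 1, 160.0), ("B1", 2019, 1, 160.0),
    ("B1", 2020, 1, 140.0), ("B1", 2021, 1, 140.0),
    ("N1", 2018, 0, 120.0), ("N1", 2019, 0, 120.0),
    ("N1", 2020, 0, 115.0), ("N1", 2021, 0, 115.0),
]
toy_effect = did_estimate(toy_rows)
```
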
**Robustness checks:** Border-type splitting, continuous exposure shares, and cutoff sensitivity (25, 75, and 100 km thresholds in place of the 50 km buffer).
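
One way to read the continuous-exposure and cutoff checks is as the share of each district's wells within a given distance of the border, recomputed at several thresholds. The sketch below takes that reading, which is an assumption about the definition rather than a statement of what `continuous_exposure_results.csv` contains. The input layout and the name `exposure_shares` are illustrative.

```python
from collections import defaultdict


def exposure_shares(wells, cutoffs_km=(25, 50, 75, 100)):
    """Share of each district's wells within each cutoff of the border.

    `wells` is an iterable of (district, distance_to_border_km) pairs.
    """
    counts = defaultdict(lambda: {"n": 0, **{c: 0 for c in cutoffs_km}})
    for district, dist_km in wells:
        row = counts[district]
        row["n"] += 1
        for c in cutoffs_km:
            row[c] += dist_km <= c  # True counts as 1
    return {d: {c: row[c] / row["n"] for c in cutoffs_km} for d, row in counts.items()}
```

Replacing the binary district flag with these shares lets the same regressions be rerun at each threshold.
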
## Documentation
- `intro_thoery_methods_analysis_results_discussion.md` — full paper draft covering theory, methods, results, and discussion
- `appendix.md` — supplementary regression tables, robustness checks, and district profiles
## License
This project is for academic research purposes. Underlying data sources are public records from the Texas Railroad Commission and US federal agencies.