# Texas Borderlands: Regulatory Enforcement Disparities

An empirical research project examining whether oil and gas regulatory enforcement in Texas differs systematically between border-proximate and interior districts, and whether a 2019 disclosure reform produced heterogeneous effects across those regions.

## Research Questions

- **RQ1:** Do border-exposed Texas Railroad Commission (RRC) districts differ from non-border districts in inspection intensity, violation detection, enforcement timing, and resolution rates?
- **RQ2:** Did the 2019 disclosure reform change enforcement outcomes differently in border districts versus non-border districts?

## Key Findings

| Outcome | Border Districts | Non-Border Districts |
|--------|-----------------|---------------------|
| Inspections per well | 1.329 | 1.515 |
| Violations per inspection | 0.130 | 0.098 |
| Days to enforcement | 145.2 | 122.8 |
| Resolution rate | 0.543 | 0.596 |

**Post-2019 reform effect:** Enforcement processing time in border districts improved by **~75 days** (p=0.016) relative to non-border districts, but inspection reach and resolution rates did not converge. Conclusion: *faster pipeline, not wider pipeline*.

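To put the raw gaps in the table on a common scale, the short sketch below converts each border/non-border pair into a relative difference. The numbers are copied from the table; the dictionary and function names are illustrative only and are not part of the project code.

```python
# Relative border vs. non-border gaps, using the values from the table above.
# Names here are illustrative; they do not come from the project code.
KEY_FINDINGS = {
    "inspections_per_well": (1.329, 1.515),
    "violations_per_inspection": (0.130, 0.098),
    "days_to_enforcement": (145.2, 122.8),
    "resolution_rate": (0.543, 0.596),
}


def relative_gap(border: float, non_border: float) -> float:
    """Return the border value's percentage difference from the non-border value."""
    return (border - non_border) / non_border * 100.0


if __name__ == "__main__":
    for outcome, (border, non_border) in KEY_FINDINGS.items():
        print(f"{outcome}: {relative_gap(border, non_border):+.1f}%")
```

On these figures, border districts see about 12% fewer inspections per well, about 33% more violations per inspection, enforcement that takes about 18% longer, and a resolution rate about 9% lower. These are descriptive gaps from the table, not regression estimates.
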
## Project Structure

```
texas-borderlands/
├── analysis/
│   ├── borderlands.ipynb          # Main analysis notebook
│   ├── well_analyzer.py           # WellAnalyzer class (data loading + metrics)
│   └── output_borderlands/
│       ├── rq1_results.csv        # RQ1 regression results
│       ├── rq2_results.csv        # RQ2 FE interaction results
│       ├── district_year_panel_borderlands.csv
│       ├── border_vs_nonborder_trends.png
│       ├── money_plot_timing_border_prepost2019.png
│       ├── well_border_exposure_map.png
│       ├── continuous_exposure_results.csv
│       ├── cutoff_sensitivity_results.csv
│       └── border_type_split_results.csv
│
├── data/
│   ├── oil_gas_basin_shape/       # EIA TX shale basin boundaries
│   ├── shale_play_shape/          # EIA TX shale play delineations
│   ├── texas_county_shape/        # US Census TX county subdivisions (2025)
│   ├── texmex_shape/              # US Census TX-MX international boundary (2023)
│   ├── competition_panel.csv
│   └── district_competitor_links.csv
│
├── intro_thoery_methods_analysis_results_discussion.md   # Full paper draft
├── appendix.md                    # Supplementary tables and robustness checks
└── requirements.txt
```

## Tech Stack

- **Python 3**
- **pandas / numpy**: data manipulation and panel construction
- **sqlalchemy / psycopg2**: PostgreSQL database access
- **geopandas / shapely**: geospatial analysis and border proximity measurement
- **scipy / statsmodels**: regression models (OLS, fixed effects)
- **libpysal / esda**: spatial econometrics
- **matplotlib / seaborn**: visualization
- **python-dotenv**: environment configuration

## Setup

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

### 2. Configure the database connection

Create a `.env` file in the project root (or set environment variables directly):

```env
PGHOST=localhost
PGPORT=5432
PGUSER=postgres
PGPASSWORD=your_password
PGDATABASE=texas_data
```

The database should have PostGIS enabled and contain the following tables:

| Table | Description |
|-------|-------------|
| `well_shape_tract` (or similar) | Wells with location and demographic enrichment |
| `inspections` | Inspection records with dates and district info |
| `violations` | Violation records with enforcement and resolution dates |

The `WellAnalyzer` class auto-detects the wells table name from a set of known aliases.

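As a rough illustration of how the `PG*` variables above can be turned into a connection string, the sketch below assembles a PostgreSQL URL using only the standard library. The helper name `build_pg_url` and its fallback defaults are assumptions made for this example; `WellAnalyzer` may construct its connection differently.

```python
import os
from urllib.parse import quote_plus


def build_pg_url(env=None) -> str:
    """Assemble a SQLAlchemy-style PostgreSQL URL from PG* variables.

    Hypothetical helper for illustration; not taken from well_analyzer.py.
    """
    env = os.environ if env is None else env
    user = quote_plus(env.get("PGUSER", "postgres"))
    # Percent-encode the password so characters such as '@' or '/' stay valid.
    password = quote_plus(env.get("PGPASSWORD", ""))
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env.get("PGDATABASE", "texas_data")
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"


if __name__ == "__main__":
    example = {"PGUSER": "postgres", "PGPASSWORD": "p@ss/word", "PGDATABASE": "texas_data"}
    print(build_pg_url(example))
```

With python-dotenv installed, calling `load_dotenv()` first populates `os.environ` from the `.env` file, and the resulting URL can be passed to `sqlalchemy.create_engine`.
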
### 3. Run the analysis

Open and run the Jupyter notebook:

```bash
jupyter notebook analysis/borderlands.ipynb
```

Or use the `WellAnalyzer` class directly:

```python
from analysis.well_analyzer import WellAnalyzer

analyzer = WellAnalyzer()
analyzer.print_analysis()
analyzer.export_analysis("output.json")
```

## Data

### Primary Data (PostgreSQL)

- **~1.01M wells** with geospatial coordinates and demographic/census tract enrichment
- **~1.87M inspections** (2015–2025)
- **~191.7K violations** (2015–2025)
- **District-year panel:** 143 observations (13 RRC districts × 11 years)

### Shapefiles

| File | Source | Purpose |
|------|--------|---------|
| `texmex_shape/` | US Census Bureau (2023) | TX-MX border geometry for proximity calculations |
| `texas_county_shape/` | US Census Bureau (2025) | State and county boundaries |
| `oil_gas_basin_shape/` | US EIA | Texas shale basin delineations |
| `shale_play_shape/` | US EIA | Texas shale play delineations |

### Border Exposure Definitions

- **District-level:** Binary flag, set when the district centroid or wells fall within 50 km of any state or international border
- **Well-level:** Binary flags at 25 km and 50 km buffers from the TX-Mexico border
- **Border subtypes:** TX-MX, TX-NM, TX-OK, TX-LA

Border-exposed wells (50 km buffer): **169,520** of 1,010,432 total (about 16.8%).

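The well-level flags can be sketched as a distance test against the border geometry. The example below is a simplified, dependency-free stand-in: it measures great-circle distance to the nearest of a few hypothetical border vertices, whereas the project lists geopandas and shapely for this step and would presumably measure distance to the full line geometry in `texmex_shape/`. The coordinates and function names are made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def border_flags(well, border_vertices, cutoffs_km=(25, 50)):
    """Return {cutoff: bool} flags for a well given border vertex coordinates.

    Approximation: uses the nearest vertex rather than the nearest point on
    the border line, so it can overstate distance between sparse vertices.
    """
    lat, lon = well
    nearest = min(haversine_km(lat, lon, blat, blon) for blat, blon in border_vertices)
    return {cutoff: nearest <= cutoff for cutoff in cutoffs_km}


if __name__ == "__main__":
    # Hypothetical border vertices (illustrative only, not from texmex_shape/).
    border = [(25.90, -97.50), (26.05, -97.80), (26.20, -98.20)]
    # This well sits about 28 km north of the middle vertex.
    print(border_flags((26.30, -97.80), border))
```

A production version would reproject to a metric coordinate system and use the actual boundary line, which avoids the nearest-vertex approximation noted in the docstring.
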
## Empirical Design

**Unit of analysis:** Texas RRC district × year (2015–2025)

**Outcome variables:**
- Inspection intensity (inspections per well)
- Violation rate (violations per inspection)
- Days to enforcement action
- Resolution rate (compliance on reinspection)

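A minimal sketch of how the first three outcomes could be derived for one district-year cell. The field names and example counts are hypothetical, and measuring days to enforcement from the violation date is an assumption; the notebook may define these quantities differently.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class DistrictYearCounts:
    """Raw counts for one district-year cell (hypothetical field names)."""
    wells: int
    inspections: int
    violations: int


def inspection_intensity(c: DistrictYearCounts) -> float:
    """Inspections per well."""
    return c.inspections / c.wells


def violation_rate(c: DistrictYearCounts) -> float:
    """Violations per inspection."""
    return c.violations / c.inspections


def mean_days_to_enforcement(pairs) -> float:
    """Mean days between violation date and enforcement date.

    `pairs` is an iterable of (violation_date, enforcement_date) tuples.
    """
    return mean((enforced - found).days for found, enforced in pairs)


if __name__ == "__main__":
    # Made-up counts and dates, for illustration only.
    cell = DistrictYearCounts(wells=1000, inspections=1500, violations=150)
    print(inspection_intensity(cell), violation_rate(cell))
    print(mean_days_to_enforcement([
        (date(2020, 1, 1), date(2020, 4, 10)),
        (date(2020, 2, 1), date(2020, 7, 10)),
    ]))
```
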
**RQ1: Levels model**
```
Y_{dt} = α + β·Border_d + γ·X_{dt} + ε_{dt}
```

**RQ2: Fixed effects interaction model**
```
Y_{dt} = α_d + δ_t + β·(Post2019_t × Border_d) + γ·X_{dt} + ε_{dt}
```

In the RQ2 model, the district fixed effects `α_d` absorb the time-invariant `Border_d` term and the year fixed effects `δ_t` absorb `Post2019_t`, so only the interaction coefficient β is separately identified.

**Robustness checks:** Border-type splitting, continuous exposure shares, cutoff sensitivity (25/75/100 km thresholds).

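For intuition about what β in the RQ2 model captures: in a balanced panel with no covariates, the two-way fixed effects coefficient on the Post2019 × Border interaction equals the simple difference-in-differences of group means. The sketch below computes that quantity on a tiny synthetic panel. The data are invented, treating 2019 as the first post-reform year is an assumption, and this is not the estimation code used in the notebook.

```python
from statistics import mean


def diff_in_diff(rows, first_post_year):
    """Difference-in-differences of group means.

    rows: iterable of (district, year, is_border, outcome) tuples.
    first_post_year: first year treated as post-reform (an assumption here).
    """
    cells = {(b, p): [] for b in (False, True) for p in (False, True)}
    for _district, year, is_border, outcome in rows:
        cells[(is_border, year >= first_post_year)].append(outcome)
    border_change = mean(cells[(True, True)]) - mean(cells[(True, False)])
    interior_change = mean(cells[(False, True)]) - mean(cells[(False, False)])
    return border_change - interior_change


if __name__ == "__main__":
    # Synthetic days-to-enforcement values, invented for illustration only.
    panel = [
        ("A", 2018, True, 200.0), ("A", 2019, True, 130.0),
        ("B", 2018, True, 180.0), ("B", 2019, True, 120.0),
        ("C", 2018, False, 125.0), ("C", 2019, False, 120.0),
        ("D", 2018, False, 115.0), ("D", 2019, False, 110.0),
    ]
    print(diff_in_diff(panel, first_post_year=2019))
```

A negative value means the outcome fell more in border districts than in interior ones, which is the direction of the timing effect reported under Key Findings.
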
## Documentation

- `intro_thoery_methods_analysis_results_discussion.md`: full paper draft covering theory, methods, results, and discussion
- `appendix.md`: supplementary regression tables, robustness checks, and district profiles

## License

This project is for academic research purposes. Underlying data sources are public records from the Texas Railroad Commission and US federal agencies.