diff --git a/appendix.docx b/appendix.docx
new file mode 100644
index 0000000..5f6436b
Binary files /dev/null and b/appendix.docx differ
diff --git a/appendix.md b/appendix.md
new file mode 100644
index 0000000..4e826d9
--- /dev/null
+++ b/appendix.md
@@ -0,0 +1,127 @@
+# Appendix
+
+## A1. Data Construction and Scope
+
+- Analysis window: 2015-2025.
+- District-year panel size: 143 rows (13 districts x 11 years).
+- Primary identifiers: `api_norm`, `district`, `year`.
+- Border geometry includes TX-MX plus TX-NM, TX-OK, TX-LA boundary proximity.
+
+### Build validation counts
+
+| Metric | Value |
+|---|---:|
+| Wells loaded | 1,010,432 |
+| Inspections (2015-2025) | 1,867,859 |
+| Violations (2015-2025) | 191,762 |
+| Border-exposed wells (any 50 km border) | 169,520 |
+| Panel observations | 143 |
+| Districts | 13 |
+
+### Border subtype counts (50 km)
+
+| Border subtype | Wells |
+|---|---:|
+| TX-MX | 40,339 |
+| TX-NM | 81,567 |
+| TX-OK | 19,643 |
+| TX-LA | 29,675 |
+
+## A2. Supplementary Equations
+
+### Border-type FE interaction model
+
+$$
+Y_{dt} = \alpha_d + \gamma_t + \sum_k \lambda_k(Post2019_t \times Type_{kd}) + \sum_k \phi_k(PostTrend_t \times Type_{kd}) + \varepsilon_{dt}
+$$
+where $k \in \{\text{TX-MX}, \text{TX-NM}, \text{TX-OK}, \text{TX-LA}\}$.
+
+### Continuous exposure FE interaction model
+
+$$
+Y_{dt} = \alpha_d + \gamma_t + \eta_1(Post2019_t \times ShareBorder_{dt}) + \eta_2(PostTrend_t \times ShareBorder_{dt}) + \varepsilon_{dt}
+$$
+
+### Cutoff-specific exposure
+
+$$
+ShareBorder^{(c)}_{dt}, \quad c \in \{25,75,100\}
+$$
+substituted into the same FE interaction framework.
+
+## A3. Main FE Interaction Table (RQ2)
+
+| Outcome | post_2019 x border | p-value | post_trend x border | p-value | N |
+|---|---:|---:|---:|---:|---:|
+| inspection_intensity | -0.1191 | 0.0753 | -0.0052 | 0.8181 | 143 |
+| violations_per_inspection | 0.0040 | 0.8881 | -0.0012 | 0.8350 | 143 |
+| avg_days_to_enforcement | -74.5893 | 0.0156 | -1.1587 | 0.9252 | 143 |
+| resolution_rate | 0.0404 | 0.4520 | -0.0186 | 0.3404 | 143 |
+
+## A4. Border-Type Timing Interactions (Money Plot Companion)
+
+| Term | Coefficient | p-value |
+|---|---:|---:|
+| post_2019:has_tx_mex | 4.0900 | 0.9062 |
+| post_2019:has_tx_nm | -18.7442 | 0.6013 |
+| post_2019:has_tx_ok | -14.2446 | 0.8134 |
+| post_2019:has_tx_la | -43.6598 | 0.6415 |
+| post_trend:has_tx_mex | -0.0148 | 0.9991 |
+| post_trend:has_tx_nm | 22.9067 | 0.0189 |
+| post_trend:has_tx_ok | -16.7188 | 0.0794 |
+| post_trend:has_tx_la | 0.6415 | 0.9551 |
+
+## A5. Continuous Exposure Results
+
+| Family | Outcome | Term | Coef | p-value | N |
+|---|---|---|---:|---:|---:|
+| RQ1 levels continuous | inspection_intensity | share_border_exposed_insp | 0.2095 | 0.4757 | 143 |
+| RQ1 levels continuous | avg_days_to_enforcement | share_border_exposed_insp | 103.4683 | 0.5710 | 143 |
+| RQ1 levels continuous | violations_per_inspection | share_border_exposed_insp | -0.1585 | 0.0144 | 143 |
+| RQ1 levels continuous | resolution_rate | share_border_exposed_insp | -0.0420 | 0.8619 | 143 |
+| RQ2 FE continuous | avg_days_to_enforcement | post_2019:share_border_exposed_insp | -109.4067 | 0.4449 | 143 |
+| RQ2 FE continuous | avg_days_to_enforcement | post_trend:share_border_exposed_insp | 13.9623 | 0.7415 | 143 |
+| RQ2 FE continuous | resolution_rate | post_2019:share_border_exposed_insp | -0.0322 | 0.8163 | 143 |
+| RQ2 FE continuous | resolution_rate | post_trend:share_border_exposed_insp | -0.0979 | 0.0423 | 143 |
+
+## A6. Cutoff Sensitivity (Timing-Focused Terms)
+
+| Cutoff km | Term | Coef | p-value | N |
+|---:|---|---:|---:|---:|
+| 25 | post_2019:share_border_25km | -101.9283 | 0.7010 | 143 |
+| 75 | post_2019:share_border_75km | -75.6591 | 0.5116 | 143 |
+| 100 | post_2019:share_border_100km | -4.4795 | 0.9474 | 143 |
+
+## A7. District Border-Type Profile
+
+| District | Wells | Dominant type | TX-MX share | TX-NM share | TX-OK share | TX-LA share |
+|---|---:|---|---:|---:|---:|---:|
+| 01 | 31,898 | TX-MX | 0.2313 | 0.0000 | 0.0000 | 0.0000 |
+| 02 | 17,099 | NONE | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
+| 03 | 16,700 | TX-LA | 0.0000 | 0.0000 | 0.0000 | 0.1166 |
+| 04 | 20,973 | TX-MX | 0.6384 | 0.0000 | 0.0000 | 0.0000 |
+| 05 | 9,938 | TX-OK | 0.0000 | 0.0000 | 0.0022 | 0.0000 |
+| 06 | 24,422 | TX-LA | 0.0000 | 0.0000 | 0.0293 | 0.5235 |
+| 08 | 105,931 | TX-NM | 0.0001 | 0.1905 | 0.0000 | 0.0000 |
+| 09 | 46,485 | NONE | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
+| 10 | 29,621 | TX-OK | 0.0000 | 0.0009 | 0.3020 | 0.0000 |
+| 6E | 6,235 | NONE | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
+| 7B | 21,230 | NONE | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
+| 7C | 43,061 | NONE | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
+| 8A | 42,005 | TX-NM | 0.0000 | 0.4182 | 0.0000 | 0.0000 |
+
+## A8. Figures and Output Artifacts
+
+- Border exposure map: `analysis/output_borderlands/well_border_exposure_map.png`
+- Border vs non-border trends: `analysis/output_borderlands/border_vs_nonborder_trends.png`
+- Main timing figure: `analysis/output_borderlands/money_plot_timing_border_prepost2019.png`
+- Timing CI table: `analysis/output_borderlands/money_plot_timing_ci_by_year.csv`
+
+## A9. Optional Prior Artifact (Not Estimated in Current Causal Scope)
+
+`analysis/output_borderlands/competition_asymmetry_results.csv` contains:
+
+- `gap_pos` = 0.5837 (p < 0.001)
+- `gap_neg` = 0.4993 (p = 0.0027)
+
+These estimates are not part of the current notebook's identified model scope and should not be interpreted as a completed reaction-function test in this manuscript version.
diff --git a/intro_thoery_methods_analysis_results_discussion.docx b/intro_thoery_methods_analysis_results_discussion.docx
new file mode 100644
index 0000000..cff8125
Binary files /dev/null and b/intro_thoery_methods_analysis_results_discussion.docx differ
diff --git a/intro_thoery_methods_analysis_results_discussion.md b/intro_thoery_methods_analysis_results_discussion.md
new file mode 100644
index 0000000..fbd567a
--- /dev/null
+++ b/intro_thoery_methods_analysis_results_discussion.md
@@ -0,0 +1,180 @@
+# Introduction
+
+Regulatory enforcement in borderlands jurisdictions is often expected to differ from interior jurisdictions due to administrative constraints, multi-jurisdictional exposure, and monitoring frictions. This manuscript analyzes Texas Railroad Commission district-year outcomes (2015-2025) to assess whether border-exposed districts show systematic enforcement gaps and whether those gaps changed after the 2019 disclosure reform.
+
+The empirical design centers on two research questions from the notebook:
+
+1. RQ1 (Border gaps): Do border-exposed Texas districts differ from non-border districts in enforcement intensity and pipeline outcomes?
+2. RQ2 (Disclosure heterogeneity): Did the 2019 disclosure reform change enforcement outcomes differently in border districts versus non-border districts (level shift and post-policy trend differential)?
+
+# Theory
+
+We use a borderlands governance framing with two linked mechanisms: capacity asymmetry and transparency-throughput effects. The corresponding hypotheses are:
+
+1. H1 (Border inspection gap): Border districts have lower inspection intensity than non-border districts.
+2. H2 (Border pipeline disadvantage): Border districts show weaker enforcement pipeline outcomes (higher violations per inspection and/or slower timing and/or lower resolution rates).
+3. H3 (Disclosure heterogeneity in levels): Post-2019 level shifts differ between border and non-border districts (`post_2019:border`).
+4. H4 (Disclosure heterogeneity in trends): Post-2019 trend shifts differ between border and non-border districts (`post_trend:border`).
+
+This yields a core empirical claim: post-2019 border effects should be strongest in enforcement timing rather than in inspection coverage or resolution outcomes.
+
+# Methods
+
+## Data and Unit of Analysis
+
+- Unit: district-year.
+- Coverage: 13 Texas RRC districts, 2015-2025.
+- Source tables: `well_shape_tract`, `inspections`, `violations`.
+- Sample in current run: 1,010,432 wells; 1,867,859 inspections; 191,762 violations; 143 district-year observations.
+
+## Border Measurement: District Coding and Well Proximity
+
+We use two complementary border constructions.
+
+1. District-level baseline treatment (`border_district`): districts in the predefined border-adjacent set (`01`, `02`, `06`, `08`, `8A`, `09`, `10`) are coded 1; others are coded 0.
+2. Well-level proximity treatment: each well is classified by spatial proximity to border segments, then rolled up to district-year exposure shares.
+
+Well-level proximity was constructed from latitude/longitude and shapefiles as follows:
+
+1. Texas-Mexico distance/flags from `WellAnalyzer` (`within_25km_texmex`, `within_50km_texmex`).
+2. Additional state-border segments (TX-NM, TX-OK, TX-LA) built from Texas county boundary geometry and seed lines.
+3. Distances computed in projected CRS (EPSG:5070), then threshold flags generated at 25 km and 50 km.
+4. Composite exposure indicators created:
+   - `within_50km_state_border_any`
+   - `well_border_exposed` (1 if within 50 km of TX-MX or any TX-state border segment).
+
+District-year well-proximity exposure is measured as:
+$$
+ShareBorder_{dt} = \frac{BorderExposedInspections_{dt}}{Inspections_{dt}}
+$$
+and an alternative district treatment is defined as `border_exposure_district = 1` when $ShareBorder_{dt} \ge 0.25$.
+
+## Outcomes
+
+$$
+InspectionIntensity_{dt} = \frac{Inspections_{dt}}{UniqueWells_{dt}}
+$$
+$$
+ViolPerInsp_{dt} = \frac{Violations_{dt}}{Inspections_{dt}}
+$$
+$$
+DaysToEnf_{dt} = \frac{1}{N_{dt}} \sum_{i=1}^{N_{dt}} (EnforcementDate_i - ViolationDiscoveryDate_i)
+$$
+$$
+ResolutionRate_{dt} = \frac{CompliantOnReinspection_{dt}}{Violations_{dt}}
+$$
+
+## Exposure Definitions
+
+- Baseline treatment: `border_district` (binary district border status).
+- Additional robustness exposures:
+
+1. Border-type indicators (`TX-MX`, `TX-NM`, `TX-OK`, `TX-LA`)
+2. Continuous exposure share:
+
+$$
+ShareBorder_{dt} = \frac{BorderExposedInspections_{dt}}{Inspections_{dt}}
+$$
+3. Cutoff sensitivity with 25/75/100 km thresholds.
+
+## Estimating Equations
+
+RQ1 levels:
+$$
+Y_{dt} = \alpha + \beta_1 Border_d + \beta_2 \log(UniqueWells_{dt}) + \gamma_t + \varepsilon_{dt}
+$$
+
+RQ2 FE interaction:
+$$
+Y_{dt} = \alpha_d + \gamma_t + \theta_1(Post2019_t \times Border_d) + \theta_2(PostTrend_t \times Border_d) + \varepsilon_{dt}
+$$
+$$
+Post2019_t = \mathbb{1}[t \ge 2019], \quad PostTrend_t = \max(0, t-2019)
+$$
+
+Inference uses district-clustered standard errors (13 clusters), with emphasis on effect size and consistency across specifications.
+
+## Tests Run in Notebook
+
+The notebook estimated the following test families:
+
+1. Descriptive border-gap tests:
+   - Border vs non-border means for inspection intensity, violations per inspection, days to enforcement, and resolution rate.
+2. RQ1 levels regressions (border gaps):
+   - Outcomes: `inspection_intensity`, `violations_per_inspection`.
+   - Specification: `border_district + log_unique_wells + C(year)`.
+3. RQ2 FE interaction regressions (post-2019 heterogeneity):
+   - Outcomes: `inspection_intensity`, `violations_per_inspection`, `avg_days_to_enforcement`, `resolution_rate`.
+   - Specification: `C(district) + C(year) + post_2019:border_district + post_trend:border_district`.
+4. Border-type robustness tests:
+   - District profiles for `TX-MX`, `TX-NM`, `TX-OK`, `TX-LA` exposure.
+   - RQ1-style levels with `has_tx_*` indicators.
+   - RQ2-style FE interactions with `post_2019:has_tx_*` and `post_trend:has_tx_*`.
+5. Continuous-exposure robustness tests:
+   - Replace binary border indicator with `share_border_exposed_insp` in both RQ1-style and RQ2-style specifications.
+6. Cutoff-sensitivity tests:
+   - Recompute proximity exposure from minimum distance to any border at 25 km, 75 km, and 100 km.
+   - Estimate RQ1-style models for inspection intensity and RQ2-style timing interaction models.
+7. Visualization and reporting tests:
+   - Border/non-border trend plots.
+   - Main timing figure with district-year group means and 95% confidence intervals.
+8. Competition/reaction-function scaffolding (not estimated as causal model):
+   - District-to-competitor jurisdiction link table and template generated for future interstate stringency integration.
+
+# Analysis
+
+## Descriptive Border Gaps
+
+| Outcome | Non-border | Border |
+|---|---:|---:|
+| Inspection intensity | 1.515 | 1.329 |
+| Violations per inspection | 0.098 | 0.130 |
+| Mean days to enforcement | 122.8 | 145.2 |
+| Mean resolution rate | 0.596 | 0.543 |
+
+Descriptively, border districts show weaker enforcement conditions across coverage, detection conditional on inspection, timing, and follow-through.
+
+## Main Regression Evidence
+
+| Model | Coefficient | p-value | N |
+|---|---:|---:|---:|
+| RQ1: `border_district` on `inspection_intensity` | -0.1755 | 0.0999 | 143 |
+| RQ1: `border_district` on `violations_per_inspection` | 0.0434 | 0.0949 | 143 |
+| RQ2: `post_2019:border` on `inspection_intensity` | -0.1191 | 0.0753 | 143 |
+| RQ2: `post_2019:border` on `violations_per_inspection` | 0.0040 | 0.8881 | 143 |
+| RQ2: `post_2019:border` on `avg_days_to_enforcement` | -74.5893 | 0.0156 | 143 |
+| RQ2: `post_2019:border` on `resolution_rate` | 0.0404 | 0.4520 | 143 |
+
+The most stable differential post-2019 effect is a border-specific improvement in enforcement timing.
+
+# Results
+
+## Hypothesis Tests
+
+| Hypothesis | Test evidence | Decision (current run) |
+|---|---|---|
+| H1: Border districts have lower inspection intensity | RQ1: `border_district -> inspection_intensity` = -0.1755, p = 0.0999; descriptives 1.329 (border) vs 1.515 (non-border) | Partial support |
+| H2: Border districts have weaker pipeline outcomes | Descriptives: 0.130 vs 0.098 violations/inspection, 145.2 vs 122.8 days, 0.543 vs 0.596 resolution; RQ1 `border_district -> violations_per_inspection` = 0.0434, p = 0.0949 | Supported descriptively, mixed regression support |
+| H3: Border-specific post-2019 level shift | RQ2 `post_2019:border -> avg_days_to_enforcement` = -74.5893, p = 0.0156; other outcomes null | Supported for timing only |
+| H4: Border-specific post-2019 trend shift | RQ2 `post_trend:border` terms: inspection p = 0.8181, violations p = 0.8350, timing p = 0.9252, resolution p = 0.3404 | Not supported in baseline model |
+
+The hypothesis tests indicate the clearest inferential signal is a border-specific post-2019 timing level shift, consistent with "faster pipeline, not wider pipeline."
+
+## Figure Callouts
+
+Figure 1 (group trends): `analysis/output_borderlands/border_vs_nonborder_trends.png`
+Figure 2 (main timing figure with CI): `analysis/output_borderlands/money_plot_timing_border_prepost2019.png`
+
+Figure 2 uses district-year means with equal district weighting:
+$$
+\bar{Y}_{gt} = \frac{1}{n_{gt}} \sum_{d \in g} Y_{dt}, \quad
+CI_{95\%} = \bar{Y}_{gt} \pm 1.96 \cdot \frac{s_{gt}}{\sqrt{n_{gt}}}
+$$
+
+# Discussion
+
+The findings are consistent with a transparency-throughput mechanism: disclosure-era pressure appears to accelerate processing where baseline constraints are stronger, but this does not map cleanly to expansion of enforcement reach or follow-through. The strongest claim supported by this design is "faster pipeline, not wider pipeline."
+
+The contribution is a boundary-condition argument: transparency reforms can produce uneven administrative effects across territorial governance contexts, with timing responsiveness exceeding capacity expansion.
+
+The design does not identify interstate strategic competition. A full Neal Woods-style test requires district-year competitor stringency series and explicit enforcement-gap dynamics. That test is the next step in the research agenda; the current analysis is a necessary precursor, documenting border-specific enforcement gaps and their heterogeneous response to disclosure reform.
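
The Figure 2 interval formula above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the function name `group_mean_ci` and the sample values are hypothetical, and it assumes $s_{gt}$ is the sample standard deviation (n - 1 denominator), which the text does not specify.

```python
import math
import statistics


def group_mean_ci(values, z=1.96):
    """Return (mean, lower, upper) for one group-year.

    `values` holds one outcome per district in the group, so every
    district gets equal weight, matching the Figure 2 definition.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two districts to estimate a standard deviation")
    mean = statistics.fmean(values)
    # Assumption: s_gt is the sample SD (n - 1 denominator).
    s = statistics.stdev(values)
    half_width = z * s / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical days-to-enforcement values for three districts (not from the data).
example = [100.0, 120.0, 140.0]
mean, lower, upper = group_mean_ci(example)
print(f"mean={mean:.1f}, CI=({lower:.1f}, {upper:.1f})")
```

The `z` parameter is exposed so that a different critical value can be substituted if the fixed 1.96 is judged too narrow for the small number of districts in each group.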