diff --git a/analysis/draft.docx b/analysis/draft.docx new file mode 100644 index 0000000..28515ad Binary files /dev/null and b/analysis/draft.docx differ diff --git a/analysis/draft.md b/analysis/draft.md index 52e2719..00241aa 100644 --- a/analysis/draft.md +++ b/analysis/draft.md @@ -4,7 +4,9 @@ How does transparency alter regulatory enforcement in high-capacity but locally discretionary bureaucracies? We study the January 2019 Texas Railroad Commission (RRC) disclosure change that made well-level violation information publicly searchable. The policy constitutes a statewide transparency shock, but implementation and enforcement remain district-administered. This setting allows us to test both system-wide effects and district-level heterogeneity in policy response. -Our core empirical finding is a two-part pattern. First, we find evidence of **gradual post-policy acceleration** in enforcement timing at the statewide level (significant post-policy trend improvement) rather than a sharp immediate level break in 2019. Second, district-level responses are strongly heterogeneous, and offshore-jurisdiction districts (02/03/04) exhibit systematically different post-policy dynamics once district-specific post effects are modeled. +While targeted transparency is increasingly utilized as a regulatory tool to improve accountability, its actual impact is mediated by the bureaucratic discretion of local field offices. Because policy implementation often experiences a lag, we utilize an Interrupted Time Series design to capture gradual enforcement acceleration, while explicitly modeling the structural, spatial, and demographic factors that drive street-level bureaucratic heterogeneity. + +Our core empirical finding is a two-part pattern. First, we find evidence of gradual post-policy acceleration in enforcement timing at the statewide level (significant post-policy trend improvement) rather than a sharp immediate level break in 2019. 
Second, district-level responses are strongly heterogeneous, and offshore-jurisdiction districts (02/03/04) exhibit systematically different post-policy dynamics once district-specific post effects are modeled. ## Theory and Hypotheses @@ -22,42 +24,49 @@ We test: ## Data and Measures -We construct a district-year panel (2015-2025, 13 RRC districts) from administrative inspection and violation records. Well-level integration uses `api_norm` as the normalized identifier across sources. +We construct a district-year panel (2015-2025, 13 RRC districts) from administrative inspection and violation records. Well-level records are linked across sources prior to district-year aggregation. Primary outcomes: -- `log_days_to_enf`: log mean days from violation discovery to enforcement action. -- `resolution_rate`: percent compliant on re-inspection. -- `compliance_rate`: percent compliant at inspection. -- `violations_per_inspection`. +- Enforcement delay: the logged district-year mean number of days from violation discovery to enforcement action. +- Resolution on re-inspection: the district-year share of violations marked compliant at re-inspection. +- Inspection compliance rate: the district-year share of inspections marked compliant. +- Violations per inspection: total violations divided by total inspections in each district-year. ## Empirical Strategy -We estimate policy effects in three layers. +To evaluate the January 2019 transparency reform, we pair an all-district interrupted panel design with district-specific heterogeneity models and a spatial dependence diagnostic. This sequence matches the hypotheses: H1 tests system-wide timing change, H2 tests district divergence, H5 tests offshore moderation, and H4 tests whether estimated district effects are spatially clustered. Administrative records are extracted from a PostgreSQL backend, linked across inspection and violation files at the well level, and aggregated to district-year panels in Python (`pandas`, `numpy`). 
Estimation is conducted with `statsmodels` (with `scipy` for auxiliary tests); figures are produced with `matplotlib`/`seaborn` and district map joins use `geopandas`. The H4 spatial test uses a permutation-based global Moran's I computed from district contiguity weights. ### Model 1: All-district policy-year shift (H1) -\[ -Y_{dt}=\alpha_d + \beta_1 \text{YearNum}_t + \beta_2 \text{Post2019}_t + \beta_3 \text{PostTrend}_t + \varepsilon_{dt} -\] +$$ +Y_{dt} = \alpha_d + \beta_1 \mathrm{YearNum}_t + \beta_2 \mathrm{Post2019}_t + \beta_3 \mathrm{PostTrend}_t + \varepsilon_{dt} +$$ -Where \(\text{PostTrend}_t = \max(0, t-2018)\). This distinguishes an immediate post-2019 level shift (\(\beta_2\)) from post-policy slope change (\(\beta_3\)). +Where $\mathrm{PostTrend}_t = \max(0, t-2018)$. This distinguishes an immediate post-2019 level shift ($\beta_2$) from a post-policy slope change ($\beta_3$). +This follows interrupted time-series logic for a common policy shock, separating immediate and gradual responses (Biglan, Ary, & Wagenaar, 2000; Bernal, Cummins, & Gasparrini, 2017; Linden, 2015). ### Model 2: District heterogeneity (H2) -\[ -Y_{dt}=\alpha_d + \gamma_t + \sum_d \theta_d (\text{District}_d\times \text{Post2019}_t) + \varepsilon_{dt} -\] +$$ +Y_{dt} = \alpha_d + \gamma_t + \sum_{d} \theta_d \bigl(\mathrm{District}_d \times \mathrm{Post2019}_t\bigr) + \varepsilon_{dt} +$$ This yields district-specific post-policy effects and a joint heterogeneity test. +Because all districts are exposed in the same year, this is not a staggered-adoption DiD problem. Still, recent DiD work highlights that pooled average effects can mask meaningful treatment-effect heterogeneity, so we estimate district-specific post effects directly rather than rely on a single pooled interaction (de Chaisemartin & D'Haultfœuille, 2020; Goodman-Bacon, 2021; Sun & Abraham, 2021). 
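As a rough illustration of the Model 1 interrupted-panel specification, the sketch below builds the `post_2019` and `post_trend` terms and fits district fixed effects with district-clustered standard errors in `statsmodels`. It is not the authors' estimation code: the panel is synthetic, the "true" coefficients in `TRUE` are arbitrary values chosen for the simulation, and the integer district codes and the `year_num` column name are assumptions made for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic district-year panel (illustrative only; NOT the RRC data).
rng = np.random.default_rng(0)
districts = np.arange(1, 14)   # 13 districts, integer-coded (assumption)
years = np.arange(2015, 2026)  # 2015-2025
panel = pd.DataFrame(
    [(d, y) for d in districts for y in years],
    columns=["district", "year"],
)

# Design terms from Model 1.
panel["year_num"] = panel["year"] - 2015                    # linear time index
panel["post_2019"] = (panel["year"] >= 2019).astype(int)    # level-shift indicator
panel["post_trend"] = np.maximum(0, panel["year"] - 2018)   # max(0, t - 2018)

# Simulated outcome: district intercepts plus arbitrary "true" effects and noise.
TRUE = {"year_num": 0.05, "post_2019": 0.10, "post_trend": -0.30}
alpha = pd.Series(rng.normal(4.0, 0.5, size=len(districts)), index=districts)
panel["log_days_to_enf"] = (
    panel["district"].map(alpha)
    + sum(TRUE[k] * panel[k] for k in TRUE)
    + rng.normal(0.0, 0.05, size=len(panel))
)

# District fixed effects via C(district); standard errors clustered by district.
model1 = smf.ols(
    "log_days_to_enf ~ C(district) + year_num + post_2019 + post_trend",
    data=panel,
).fit(cov_type="cluster", cov_kwds={"groups": panel["district"].to_numpy()})

# beta_2 (level shift) and beta_3 (slope change) are read off separately.
print(model1.params[["year_num", "post_2019", "post_trend"]])
```

In this setup the coefficient on `post_2019` plays the role of the immediate level shift and the coefficient on `post_trend` the post-policy slope change. With only 13 clusters, cluster-robust inference can be fragile, so a small-cluster correction may be worth considering in real applications.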
### Model 3: Offshore moderation (H5) -\[ -Y_{dt}=\alpha_d + \gamma_t + \sum_d \theta_d (\text{District}_d\times \text{Post2019}_t) + \phi(\text{Post2019}_t\times \text{Offshore}_d) + \varepsilon_{dt} -\] +$$ +Y_{dt} = \alpha_d + \gamma_t + \sum_{d} \theta_d \bigl(\mathrm{District}_d \times \mathrm{Post2019}_t\bigr) + \phi\bigl(\mathrm{Post2019}_t \times \mathrm{Offshore}_d\bigr) + \varepsilon_{dt} +$$ Where `Offshore_d = 1` for districts 02/03/04. +This specification tests whether offshore-regulating districts differ systematically from other districts after controlling for district-specific post-policy shifts. + +### Spatial diagnostic (H4) + +After estimating district treatment effects, we test for global spatial autocorrelation using permutation-based Moran's I (Anselin, 1995). This assesses whether high- and low-response districts are geographically clustered in ways consistent with diffusion or regional administrative spillovers. All models use district-clustered standard errors. @@ -76,18 +85,18 @@ Figure 1 visualizes these system-level changes across the regulatory pipeline. T **Model 1 (timing outcome):** -- `post_2019` level shift: **0.1514**, p=0.3294. -- `post_trend` slope shift: **-0.3603**, p=0.0010. +- Immediate post-2019 level shift: **0.1514**, p=0.3294. +- Post-2019 slope shift: **-0.3603**, p=0.0010. Interpretation: no statistically significant immediate level break in 2019, but a significant post-policy acceleration trend in enforcement timing. **Table 1. 
Core policy-year and moderator estimates** -| Model | Parameter | Coefficient | P-value | Interpretation | +| Model | Effect term | Coefficient | P-value | Interpretation | | :--- | :--- | ---: | ---: | :--- | -| Model 1 (All districts, interrupted panel) | `post_2019` | 0.1514 | 0.3294 | No immediate level break | -| Model 1 (All districts, interrupted panel) | `post_trend` | -0.3603 | 0.0010 | Significant post-policy acceleration trend | -| Model 3 (District heterogeneity + offshore) | `post_2019:offshore_jurisdiction` | 0.3819 | <0.001 | Offshore districts relatively slower post-policy timing | +| Model 1 (All districts, interrupted panel) | Immediate post-2019 level shift | 0.1514 | 0.3294 | No immediate level break | +| Model 1 (All districts, interrupted panel) | Post-2019 annual trend shift | -0.3603 | 0.0010 | Significant post-policy acceleration trend | +| Model 3 (District heterogeneity + offshore) | Offshore-by-post-policy differential | 0.3819 | <0.001 | Offshore districts relatively slower post-policy timing | Table 1 provides the baseline inferential results for the article’s identification strategy. The table shows that the main all-district effect appears in the post-policy slope term rather than a one-time post-2019 level break, and it also shows that offshore jurisdiction remains a statistically important differential once district heterogeneity is modeled. @@ -136,7 +145,7 @@ The map indicates that large positive and negative effects coexist across region In the conditional heterogeneity model (Model 3): -- `post_2019:offshore_jurisdiction = 0.3819`, p<0.001. +- Offshore-by-post-policy differential = **0.3819**, p<0.001. This indicates that, net of district-specific post effects, offshore-jurisdiction districts experience relatively slower post-policy enforcement timing. @@ -159,14 +168,14 @@ Overall, H3 receives limited support except partial geology effects. **Table 3. 
Structural moderator tests** -| Hypothesis | Term | Coefficient | P-value | Result | +| Hypothesis | Moderator term | Coefficient | P-value | Result | | :--- | :--- | ---: | ---: | :--- | -| H3a Capacity | `post_2019:high_capacity` | -0.0188 | 0.9415 | Not supported | -| H3b Baseline performance | `post_2019:low_baseline_compliance` | -0.0884 | 0.7144 | Not supported | -| H3c EJ context | `post_2019:high_eji` | 0.1818 | 0.4866 | Not supported | -| H3e Border proximity | `post_2019:border_competition` | -0.3626 | 0.1669 | Not supported | -| H3f Rurality | `post_2019:high_rural` | 0.2213 | 0.4649 | Not supported | -| H3d Geology | `C(primary_basin):post_2019` | Mixed | Mixed | Partial support | +| H3a Capacity | High-capacity district x post-policy | -0.0188 | 0.9415 | Not supported | +| H3b Baseline performance | Low-baseline-compliance district x post-policy | -0.0884 | 0.7144 | Not supported | +| H3c EJ context | High-EJ district x post-policy | 0.1818 | 0.4866 | Not supported | +| H3e Border proximity | Border-proximity district x post-policy | -0.3626 | 0.1669 | Not supported | +| H3f Rurality | High-rurality district x post-policy | 0.2213 | 0.4649 | Not supported | +| H3d Geology | Basin category x post-policy interactions | Mixed | Mixed | Partial support | Table 3 summarizes why structural accounts are only partially successful in this run: most moderators are imprecisely estimated, while geology shows selective basin-specific effects. Figure 5 and Figure 6 then provide visual context for these moderator patterns. @@ -218,7 +227,7 @@ Across variants, the post-policy **slope** result is more stable than the immedi **Table 4. 
Robustness summary (interrupted panel framework)** -| Check | `post_2019` (p) | `post_trend` (p) | Read | +| Check | Immediate post-policy level effect (p) | Post-policy trend effect (p) | Read | | :--- | :--- | :--- | :--- | | Full sample | 0.1514 (0.3294) | -0.3603 (0.0010) | Slope effect robust; level break weak | | Exclude extreme districts | 0.1917 (0.1930) | -0.2972 (0.0133) | Slope remains significant | @@ -234,3 +243,21 @@ Table 4 consolidates robustness evidence in one place: level-shift estimates are The transparency reform is associated with a gradual statewide acceleration in enforcement timing rather than a single immediate break at implementation. At the same time, district responses diverge sharply, confirming bureaucratic heterogeneity. Offshore jurisdiction explains a meaningful share of that heterogeneity once district-specific post effects are included, while most other structural moderators are weak or inconsistent in this run. Spatial diffusion across neighboring districts is not supported by global autocorrelation tests. These findings suggest that transparency reforms in decentralized regulatory systems should be evaluated as dynamic, district-conditioned processes, not monolithic statewide shocks. + +### References + +Anselin, L. (1995). Local Indicators of Spatial Association—LISA. *Geographical Analysis*, 27(2), 93-115. + +Bernal, J. L., Cummins, S., & Gasparrini, A. (2017). Interrupted time series regression for the evaluation of public health interventions: A tutorial. *International Journal of Epidemiology*, 46(1), 348-355. + +Biglan, A., Ary, D., & Wagenaar, A. C. (2000). The Value of Interrupted Time-Series Experiments for Community Intervention Research. *Prevention Science*, 1(1), 31-49. + +de Chaisemartin, C., & D'Haultfœuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects. *American Economic Review*, 110(9), 2964-2996. + +Goodman-Bacon, A. (2021). 
Difference-in-differences with variation in treatment timing. *Journal of Econometrics*, 225(2), 254-277. + +Linden, A. (2015). Conducting interrupted time-series analysis for single- and multiple-group comparisons. *The Stata Journal*, 15(2), 480-500. + +Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. *Proceedings of the 9th Python in Science Conference*, 57-61. + +Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. *Journal of Econometrics*, 225(2), 175-199. diff --git a/analysis/draft_appendix.md b/analysis/draft_appendix.md index c9889a2..c1230ec 100644 --- a/analysis/draft_appendix.md +++ b/analysis/draft_appendix.md @@ -9,24 +9,26 @@ The analysis combines inspection and violation administrative records (2015-2025 ### A2. Core variables -| Variable | Definition | +| Measure | Definition | | :--- | :--- | -| `log_days_to_enf` | Log of district-year mean days from violation discovery to enforcement action | -| `resolution_rate` | Share of violations compliant on re-inspection | -| `compliance_rate` | Share of inspections marked compliant | -| `violations_per_inspection` | Total violations divided by inspections | -| `post_2019` | Indicator for years >= 2019 | -| `post_trend` | Piecewise linear trend after policy (`max(year-2018,0)`) | -| `offshore_jurisdiction` | Indicator for districts 02/03/04 | -| `high_capacity` | District above median pre-policy inspection volume | -| `low_baseline_compliance` | District below median pre-policy compliance | -| `high_eji` | District above median EJ score | -| `high_rural` | District above median RUCA | -| `border_competition` | Operationalized border-proximity indicator | -| `primary_basin` | Dominant basin category | +| Enforcement delay (logged) | Log of district-year mean days from violation discovery to enforcement action | +| Resolution on re-inspection | Share of violations compliant on 
re-inspection | +| Inspection compliance rate | Share of inspections marked compliant | +| Violations per inspection | Total violations divided by inspections | +| Post-policy period indicator | Indicator for years >= 2019 | +| Post-policy trend term | Piecewise linear trend after policy (`max(year-2018,0)`) | +| Offshore jurisdiction indicator | Indicator for districts 02/03/04 | +| High-capacity indicator | District above median pre-policy inspection volume | +| Low-baseline-compliance indicator | District below median pre-policy compliance | +| High-EJ indicator | District above median EJ score | +| High-rurality indicator | District above median RUCA | +| Border-proximity indicator | Operationalized border-proximity indicator | +| Dominant basin category | Dominant basin category | ## Appendix B. Econometric Specifications +The specification sequence follows the main text: a common-shock interrupted panel for H1, district-specific post-policy heterogeneity for H2, an offshore moderator for H5, and a global spatial autocorrelation diagnostic for H4. Because all districts are exposed in the same policy year, heterogeneity is modeled through district-by-post interactions rather than staggered-adoption treatment-timing estimators. + ### B1. Interrupted panel (all districts; H1) \[ @@ -47,14 +49,18 @@ Y_{dt}=\alpha_d + \gamma_t + \sum_d \theta_d (\text{District}_d\times \text{Post All models report district-clustered standard errors. +### B4. Spatial diagnostic (H4) + +H4 is tested using permutation-based global Moran's I on estimated district treatment effects. + ## Appendix C. Main Run Outputs ### C1. 
H1 (all-district timing outcome) -| Parameter | Coefficient | P-value | +| Effect term | Coefficient | P-value | | :--- | ---: | ---: | -| `post_2019` | 0.1514 | 0.3294 | -| `post_trend` | -0.3603 | 0.0010 | +| Immediate post-2019 level shift | 0.1514 | 0.3294 | +| Post-2019 annual trend shift | -0.3603 | 0.0010 | Interpretation: no significant immediate level shift; significant post-policy acceleration slope. Substantively, this table supports the main-text conclusion that the policy effect is best characterized as gradual acceleration through the enforcement pipeline rather than a single break at policy adoption. @@ -93,9 +99,9 @@ These estimates indicate that offshore jurisdictions diverge from non-offshore d ### C4. H5 offshore moderator (conditional model) -| Parameter | Coefficient | P-value | +| Effect term | Coefficient | P-value | | :--- | ---: | ---: | -| `post_2019:offshore_jurisdiction` | 0.3819 | <0.001 | +| Offshore-by-post-policy differential | 0.3819 | <0.001 | See **Figure 4** in the main text (`district_treatment_effects_map_psj.png`) for the geographic distribution of district treatment effects. Read alongside C3, this pooled interaction should be interpreted as an average offshore differential in the post period after district heterogeneity is already modeled, not as a claim that offshore status is the dominant driver of all district variation. @@ -132,7 +138,7 @@ The sign and magnitude of Moran’s I are both small, indicating no evidence tha ### E1. Placebo policy years (all-district interrupted model) -| Placebo year | Coefficient (`post`) | P-value | +| Placebo year | Estimated level shift | P-value | | :--- | ---: | ---: | | 2017 | 0.6565 | 0.0020 | | 2021 | -0.0245 | 0.9191 | @@ -141,7 +147,7 @@ The significant 2017 placebo estimate suggests that single-cut timing designs ca ### E2. 
Alternative outcomes (all-district interrupted model) -| Outcome | `post` coef (p) | `post_trend` coef (p) | +| Outcome | Immediate post-policy level effect (p) | Post-policy trend effect (p) | | :--- | :--- | :--- | | Resolution rate | 4.3721 (0.2104) | -2.9371 (0.1424) | | Compliance rate | -0.1311 (0.9316) | -0.5562 (0.1870) | @@ -151,7 +157,7 @@ This table shows that timing acceleration does not mechanically translate into i ### E3. Sample restrictions (all-district interrupted model) -| Restriction | `post_2019` coef (p) | `post_trend` coef (p) | +| Restriction | Immediate post-policy level effect (p) | Post-policy trend effect (p) | | :--- | :--- | :--- | | Full sample | 0.1514 (0.3294) | -0.3603 (0.0010) | | Exclude extreme districts | 0.1917 (0.1930) | -0.2972 (0.0133) | @@ -162,7 +168,7 @@ Across restrictions, the post-trend estimate remains negative and generally sign ### E4. Specification sensitivity -| Specification | `post` effect | `post_trend` effect | +| Specification | Immediate post-policy level effect | Post-policy trend effect | | :--- | :--- | :--- | | Linear interrupted | -41.9298 (p=0.3104) | -67.0420 (p=0.0100) | | Winsorized interrupted | 0.2137 (p=0.1021) | -0.3147 (p=0.0016) | diff --git a/analysis/archive/well_analyzer.py b/analysis/well_analyzer.py similarity index 100% rename from analysis/archive/well_analyzer.py rename to analysis/well_analyzer.py