Appendix: Heterogeneous Enforcement of Transparency

Evidence from the Texas Railroad Commission

  • Appendix A: Data Definitions and Summary Statistics
  • Appendix B: Event Study and Parallel Trends
  • Appendix C: Robustness Checks
  • Appendix D: Spatial Analysis Details


Appendix A: Data Definitions and Summary Statistics

To account for potential confounders driving enforcement heterogeneity, we constructed district-level aggregates of demographic and geographic variables. Table A1 summarizes the definitions and data sources for key variables used in the moderator analysis (H3).

Table A1: Variable Definitions

| Variable | Definition | Source |
| --- | --- | --- |
| Days to Enforcement | Number of days between `violation_disc_date` and `last_enf_action_date`. | RRC Violation Data |
| Compliance Rate | Percentage of inspections marked "Compliant". | RRC Inspection Data |
| EJI Score | Environmental Justice Index. A composite social vulnerability score calculated as the mean of percentile-ranked tract-level indicators: minority share, poverty rate, unemployment, linguistic isolation, and education level. Aggregated to district level by averaging scores of tracts containing active wells. | ACS 2021 (5-yr) |
| High Capacity | Binary indicator = 1 if district total inspections > median. | RRC Inspection Data |
| Rurality (RUCA) | Average Rural-Urban Commuting Area code (1=Metro, 10=Rural) for wells in the district. | USDA / ACS 2020 |
| Basin | The geologic oil/gas basin containing the majority of the district's wells (e.g., Permian, Eagle Ford). | RRC Well Geography |
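The definitions of Days to Enforcement and the EJI Score in Table A1 can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the tract records, the indicator names `poverty` and `minority`, and the percentile-rank convention (share of tracts at or below a value) are all assumptions, and indicators are assumed to be oriented so that higher values mean greater vulnerability.

```python
from collections import defaultdict
from datetime import date


def days_to_enforcement(violation_disc_date, last_enf_action_date):
    """Days elapsed between discovery of a violation and the last enforcement action."""
    return (last_enf_action_date - violation_disc_date).days


def percentile_ranks(values):
    """Percentile rank in (0, 1]: share of observations less than or equal to each value."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def district_eji(tracts, indicators):
    """Mean of tract-level composite scores within each district.

    tracts: list of dicts with a 'district' key and one key per indicator.
    Assumes every indicator is oriented so that higher = more vulnerable.
    """
    ranked = {ind: percentile_ranks([t[ind] for t in tracts]) for ind in indicators}
    by_district = defaultdict(list)
    for i, tract in enumerate(tracts):
        composite = sum(ranked[ind][i] for ind in indicators) / len(indicators)
        by_district[tract["district"]].append(composite)
    return {d: sum(s) / len(s) for d, s in by_district.items()}


# Hypothetical tracts (not taken from the ACS data)
tracts = [
    {"district": "A", "poverty": 0.10, "minority": 0.20},
    {"district": "A", "poverty": 0.30, "minority": 0.40},
    {"district": "B", "poverty": 0.20, "minority": 0.60},
    {"district": "B", "poverty": 0.40, "minority": 0.80},
]
scores = district_eji(tracts, ["poverty", "minority"])
delay = days_to_enforcement(date(2018, 1, 1), date(2018, 3, 2))
```

In practice only tracts containing active wells would enter the district average, as the table specifies; that filter is omitted here for brevity.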

Table A2: Baseline District Characteristics (Pre-Policy 2015-2018)

| District | Total Inspections | Compliance Rate (%) | Avg Days to Enforcement | EJI Score | Primary Basin |
| --- | --- | --- | --- | --- | --- |
| 01 | 29,612 | 85.0% | 242.8 | 0.497 | Permian |
| 02 | 15,348 | 83.9% | 234.4 | 0.451 | Permian |
| 03 | 32,975 | 94.1% | 61.9 | 0.492 | Permian |
| 04 | 32,081 | 92.7% | 62.8 | 0.592 | Permian |
| 05 | 16,329 | 92.1% | 275.7 | 0.461 | Barnett |
| 06 | 37,386 | 89.0% | 475.0 | 0.493 | Barnett |
| 08 | 60,999 | 88.2% | 135.3 | 0.496 | Permian |
| 09 | 62,196 | 82.3% | 238.5 | 0.376 | Fort Worth |
| 10 | 39,620 | 88.9% | 49.4 | 0.457 | Anadarko |
| 6E | 13,326 | 78.4% | 301.2 | 0.534 | Barnett |
| 7B | 35,929 | 82.7% | 48.1 | 0.421 | Fort Worth |
| 7C | 40,631 | 85.2% | 63.0 | 0.537 | Permian |
| 8A | 40,261 | 90.5% | 77.5 | 0.516 | Permian |

Appendix B: Event Study and Parallel Trends

To validate the parallel trends assumption underlying our Difference-in-Differences (DiD) strategy, we estimated an event study specification in which the treatment effect is allowed to vary by year. Table B1 reports the coefficients relative to the baseline year 2018 (one year prior to implementation).
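One common way to write such a specification is shown here as a sketch. The district and year fixed effects and the treatment-group indicator are assumptions, since this appendix does not spell out the estimating equation; the range of relative years and the omitted reference year follow Table B1.

```latex
\log(\text{Days}_{dt}) = \alpha_d + \lambda_t
  + \sum_{\substack{k=-4 \\ k \neq -1}}^{6}
    \beta_k \,\bigl(\text{Treat}_d \times \mathbf{1}[\,t - 2019 = k\,]\bigr)
  + \varepsilon_{dt}
```

Here $d$ indexes districts and $t$ indexes years, $\alpha_d$ and $\lambda_t$ are the assumed district and year fixed effects, $\text{Treat}_d$ is a hypothetical treatment-group indicator, and each $\beta_k$ is the effect $k$ years from the 2019 implementation, measured relative to $k = -1$ (2018).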

Table B1: Event Study Estimates (Dependent Variable: Log Days to Enforcement)

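| Year | Year Relative to Policy | Coefficient | Std. Error | P-Value |
| --- | --- | --- | --- | --- |
| 2015 | -4 | -0.457 | 0.247 | 0.064 |
| 2016 | -3 | -0.334 | 0.238 | 0.160 |
| 2017 | -2 | -0.040 | 0.119 | 0.741 |
| 2018 | -1 (Ref) | 0.000 | - | - |
| 2019 | 0 | -0.114 | 0.107 | 0.284 |
| 2020 | 1 | -0.167 | 0.191 | 0.384 |
| 2021 | 2 | -0.417 | 0.258 | 0.107 |
| 2022 | 3 | -0.582* | 0.273 | 0.033 |
| 2023 | 4 | -0.487 | 0.309 | 0.115 |
| 2024 | 5 | -0.776* | 0.281 | 0.006 |
| 2025 | 6 | -1.457* | 0.255 | <0.001 |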

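Note: Standard errors are clustered at the district level; * denotes p < 0.05. None of the 2015-2017 coefficients is statistically distinguishable from zero at the 5% level, which supports the parallel trends assumption, although the 2015 estimate is marginally significant at the 10% level (p = 0.064). The treatment effect generally grows in magnitude over time, suggesting a gradual adaptation to the transparency regime.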
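The p-values in Table B1 can be cross-checked from the coefficients and standard errors. The sketch below assumes a two-sided test based on a normal approximation; the source does not state which reference distribution was used, so that choice is an inference from the reported numbers rather than a documented detail.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value for coef / se under a standard normal approximation."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2))


# (year, coefficient, standard error, reported p-value) copied from Table B1
rows = [
    (2015, -0.457, 0.247, 0.064),
    (2016, -0.334, 0.238, 0.160),
    (2017, -0.040, 0.119, 0.741),
    (2019, -0.114, 0.107, 0.284),
    (2020, -0.167, 0.191, 0.384),
    (2021, -0.417, 0.258, 0.107),
    (2022, -0.582, 0.273, 0.033),
    (2023, -0.487, 0.309, 0.115),
    (2024, -0.776, 0.281, 0.006),
]

recomputed = {year: two_sided_p(b, se) for year, b, se, _ in rows}
for year, _, _, reported in rows:
    print(year, f"reported={reported:.3f}", f"recomputed={recomputed[year]:.3f}")
```

Small discrepancies in the third decimal are to be expected, because the table reports coefficients and standard errors rounded to three places.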


Appendix C: Robustness Checks

We performed a series of robustness checks to ensure our main findings were not driven by spurious trends, outliers, or specific sample selections.

C1. Placebo Tests

We re-estimated the model using "fake" policy implementation dates. A significant finding at a fake date (especially pre-2019) would suggest pre-existing trends driving the results.

  • 2017 Placebo (Pre-treatment): Coefficient = -0.056 (p=0.725). Result: PASS. No significant effect was found two years prior to the actual policy.
  • 2021 Placebo (Post-treatment): Coefficient = -0.566 (p<0.01). Result: EXPECTED. Because the actual policy (2019) had a growing effect over time, a cutoff in 2021 captures the delayed intensification of the 2019 shock.
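The logic of the two placebo tests can be illustrated with a simple two-by-two comparison of means. This is a sketch on invented data, not the regression used in the analysis: the `records` panel, the treated/control split, and the choice to restrict the 2017 placebo to pre-policy years are all assumptions.

```python
from statistics import mean


def did_2x2(records, cutoff_year, last_year=None):
    """Difference-in-differences of mean outcomes around cutoff_year.

    records: (year, is_treated, outcome) tuples.
    last_year: if given, drop observations after this year.
    """
    sample = [r for r in records if last_year is None or r[0] <= last_year]

    def cell(treated, post):
        return mean(
            y for year, is_treated, y in sample
            if is_treated == treated and (year >= cutoff_year) == post
        )

    return (cell(True, True) - cell(True, False)) - (cell(False, True) - cell(False, False))


# Hypothetical panel: the outcome drops by 0.5 for the treated group from 2019 on
records = []
for year in range(2015, 2023):
    records.append((year, False, 5.0))
    records.append((year, True, 5.0 if year < 2019 else 4.5))

actual = did_2x2(records, cutoff_year=2019)
placebo_2017 = did_2x2(records, cutoff_year=2017, last_year=2018)
placebo_2021 = did_2x2(records, cutoff_year=2021)
```

In this toy panel the pre-period placebo returns zero, while a cutoff placed after the true policy date still returns a negative estimate because part of the real effect falls on the "post" side of the fake date.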

C2. Sample Restrictions

To rule out the influence of outliers or external shocks (e.g., COVID-19), we re-estimated the main DiD model on restricted subsamples.

Table C1: Sensitivity to Sample Restrictions

| Restriction | Coefficient | P-Value | Implied Effect (%) |
| --- | --- | --- | --- |
| Baseline (Full Sample) | -0.369 | 0.019 | -30.8% |
| Exclude Extreme Outliers (Top/Bottom 2 districts) | -0.420 | <0.001 | -34.3% |
| Exclude Early Years (Drop 2015-2016) | -0.558 | 0.005 | -42.8% |
| Exclude Pandemic Years (Drop 2020-2021) | -0.482 | 0.003 | -38.3% |

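Conclusion: The finding of faster enforcement is robust to excluding outlier districts, the early sample years, and the COVID-19 period. The coefficient remains negative and significant at the 5% level in every subsample.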
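Because the outcome is in logs, the Implied Effect column of Table C1 is consistent with the usual conversion exp(coefficient) - 1. The sketch below applies that conversion to the reported coefficients; differences of about 0.1 percentage point from the table presumably come from the coefficients being rounded to three decimals.

```python
import math


def implied_pct_effect(log_coef):
    """Percentage change in the outcome implied by a coefficient on a log outcome."""
    return (math.exp(log_coef) - 1.0) * 100.0


# (coefficient, implied effect as reported) copied from Table C1
table_c1 = {
    "Baseline (Full Sample)": (-0.369, -30.8),
    "Exclude Extreme Outliers": (-0.420, -34.3),
    "Exclude Early Years": (-0.558, -42.8),
    "Exclude Pandemic Years": (-0.482, -38.3),
}

for label, (coef, reported) in table_c1.items():
    print(f"{label}: reported {reported:.1f}%, recomputed {implied_pct_effect(coef):.1f}%")
```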

C3. Alternative Specifications

We tested sensitivity to functional form and control structures.

  • Linear Model (No Log): Coefficient = -62.0 days (p=0.023). Confirms the direction of the effect without log-transformation.
  • Winsorized Outcome (5%): Coefficient = -0.313 (p=0.036). Confirms results are not driven by extreme enforcement delay values.
  • District-Specific Time Trends: Including district-specific linear time trends reverses the sign (coefficient = +0.392). This is common in short panels, where the trend term absorbs the dynamic treatment effect. Because the event study shows the effect building gradually after 2019, a linear district trend can capture much of that post-policy path, so this specification likely over-controls for the policy response itself.
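The winsorization step above can be sketched as follows. Whether "5%" refers to each tail or to both tails combined is not stated in the source; this example assumes 5% in each tail, and the sample values are invented.

```python
def winsorize(values, tail=0.05):
    """Clamp the lowest and highest `tail` share of observations to the nearest retained value."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(tail * n + 1e-9)  # observations clamped in each tail (epsilon guards float rounding)
    lo, hi = ordered[k], ordered[n - 1 - k]
    return [min(max(v, lo), hi) for v in values]


# Hypothetical enforcement delays in days, with one extreme value
delays = list(range(1, 20)) + [1000]
clamped = winsorize(delays, tail=0.05)
```

With 20 observations and a 5% tail, one value is clamped at each end, so the extreme delay of 1000 days is replaced by the next-largest value instead of being dropped.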

Appendix D: Spatial Analysis Details

To test Hypothesis 4 (Spatial Spillovers), we calculated the spatial autocorrelation of the district-level treatment effects.

  • Global Moran's I Statistic: -0.549
  • P-value: < 0.05

The negative Moran's I indicates negative spatial autocorrelation. In the context of regulatory enforcement, this means high-performing districts (large reductions in enforcement delay) are frequently adjacent to low-performing districts. This "checkerboard" pattern contradicts the hypothesis of positive regional spillovers or knowledge diffusion. Instead, it suggests that enforcement culture is highly localized to the specific district office and does not diffuse across administrative boundaries.
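To make the "checkerboard" interpretation concrete, the following sketch computes global Moran's I for a toy set of four districts arranged in a row. The binary contiguity weights and the effect values are hypothetical; the source does not describe the weights matrix used for the district-level statistic.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean_value = sum(values) / n
    dev = [v - mean_value for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (cross / denom)


# Four hypothetical districts in a row; each borders only its immediate neighbors
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

checkerboard = [-0.8, -0.1, -0.8, -0.1]  # strong and weak effects alternate
clustered = [-0.8, -0.8, -0.1, -0.1]     # similar effects sit next to each other

i_checkerboard = morans_i(checkerboard, weights)
i_clustered = morans_i(clustered, weights)
```

The alternating pattern yields a negative statistic and the clustered pattern a positive one, which is the contrast the interpretation above relies on.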

Figure D1: Spatial Spillover Scatterplot (see Figure 6 in the main text for the map)

The regression of a district's own treatment effect on the average effect of its neighbors yields a negative slope, confirming that proximity to a district with large improvements does not predict improvement in the focal district.