Texas RRC Inspection Expenses Analysis¶
Research question: Does organizational capacity (budget, staffing) predict better regulatory outputs (inspections, compliance, enforcement), and how is that relationship moderated by goal ambiguity, district-level heterogeneity, and spatial/geographic factors?
Hypotheses¶
- H1 — Capacity → Outputs: Higher OGI budget and FTE predict more inspections, higher compliance rates, and faster violation resolution.
- H2 — Goal Ambiguity: When a larger share of RRC budget goes to the more ambiguous "Energy Resource Development" goal, the capacity → output relationship weakens.
- H3 — Multilevel / District Effects: The capacity → output relationship varies across RRC districts (budget slope heterogeneity).
- H4 — Spatial & Geographic: Offshore-jurisdiction and border districts moderate the capacity → output relationship; spatial autocorrelation in residuals is tested via Moran's I.
Data:
- PostgreSQL warehouse (`texas_data`): `inspections`, `violations`, `well_shape_tract`
- `RRC Budget Data.xlsx`: statewide RRC budget by strategy, 2016–2024
- Analysis panel: 2016–2025 (N = 130 district-years); regression sample: 2016–2023 (N = 104)
import os
import warnings
from pathlib import Path
from urllib.parse import quote_plus
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import statsmodels.formula.api as smf
from dotenv import load_dotenv
from sqlalchemy import create_engine, text
from scipy.spatial.distance import cdist
warnings.filterwarnings("ignore", category=UserWarning)
pd.set_option("display.float_format", "{:,.2f}".format)
load_dotenv(override=False)
host = os.getenv("PGHOST", "localhost")
port = os.getenv("PGPORT", "5433")
user = os.getenv("PGUSER", "postgres")
password = quote_plus(os.getenv("PGPASSWORD", ""))
database = os.getenv("PGDATABASE", "texas_data")
engine = create_engine(
f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"
)
print(f"Connected → {database} on {host}:{port}")
Connected → texas_data on localhost:5433
1. Data Loading¶
# District-year inspection metrics aggregated in SQL.
# LAG() computes days since the previous inspection for the same well (api_norm).
insp_sql = """
WITH lagged AS (
SELECT
district,
EXTRACT(year FROM inspection_date)::int AS year,
api_norm,
inspection_date,
CASE WHEN UPPER(compliance::text) IN ('YES', 'Y') THEN 1.0 ELSE 0.0 END AS is_compliant,
EXTRACT(EPOCH FROM (
inspection_date
- LAG(inspection_date) OVER (PARTITION BY api_norm ORDER BY inspection_date)
)) / 86400.0 AS days_since_prev
FROM inspections
WHERE inspection_date IS NOT NULL
AND district IS NOT NULL
AND EXTRACT(year FROM inspection_date) BETWEEN 2016 AND 2025
)
SELECT
district,
year,
COUNT(*) AS total_inspections,
COUNT(DISTINCT api_norm) AS unique_wells,
ROUND(AVG(is_compliant)::numeric * 100, 2) AS compliance_rate,
ROUND(AVG(days_since_prev)::numeric, 1) AS avg_days_between_inspections
FROM lagged
GROUP BY district, year
ORDER BY district, year
"""
insp = pd.read_sql(text(insp_sql), engine)
print(f"Inspections panel: {len(insp):,} district-year rows | {insp['district'].nunique()} districts")
insp.head()
Inspections panel: 130 district-year rows | 13 districts
| | district | year | total_inspections | unique_wells | compliance_rate | avg_days_between_inspections |
|---|---|---|---|---|---|---|
| 0 | 01 | 2016 | 13975 | 4055 | 69.42 | 18.90 |
| 1 | 01 | 2017 | 18022 | 6153 | 83.52 | 56.80 |
| 2 | 01 | 2018 | 23826 | 9109 | 85.61 | 53.50 |
| 3 | 01 | 2019 | 19790 | 6447 | 84.97 | 79.80 |
| 4 | 01 | 2020 | 26006 | 8716 | 85.52 | 122.90 |
# District-year violation metrics. Blank last_enf_action strings treated as no action.
viol_sql = """
SELECT
district,
EXTRACT(year FROM violation_disc_date)::int AS year,
COUNT(*) AS total_violations,
COUNT(DISTINCT api_norm) AS unique_wells_with_violations,
SUM(CASE WHEN major_viol_ind = 'Y' THEN 1 ELSE 0 END) AS major_violations,
ROUND(AVG(CASE WHEN compliant_on_reinsp = 'Y' THEN 1.0 ELSE 0.0 END)::numeric * 100, 2)
AS resolution_rate,
ROUND(AVG(CASE WHEN last_enf_action IS NOT NULL AND last_enf_action <> ''
THEN 1.0 ELSE 0.0 END)::numeric * 100, 2) AS enforcement_rate,
ROUND(AVG(
CASE WHEN last_enf_action_date IS NOT NULL
THEN EXTRACT(EPOCH FROM (last_enf_action_date - violation_disc_date)) / 86400.0
END
)::numeric, 1) AS avg_days_to_enforcement
FROM violations
WHERE violation_disc_date IS NOT NULL
AND district IS NOT NULL
AND EXTRACT(year FROM violation_disc_date) BETWEEN 2016 AND 2025
GROUP BY district, year
ORDER BY district, year
"""
viol = pd.read_sql(text(viol_sql), engine)
print(f"Violations panel: {len(viol):,} district-year rows")
viol.head()
Violations panel: 130 district-year rows
| | district | year | total_violations | unique_wells_with_violations | major_violations | resolution_rate | enforcement_rate | avg_days_to_enforcement |
|---|---|---|---|---|---|---|---|---|
| 0 | 01 | 2016 | 5720 | 1009 | 0 | 21.42 | 100.00 | 198.60 |
| 1 | 01 | 2017 | 4380 | 767 | 0 | 44.36 | 100.00 | 269.50 |
| 2 | 01 | 2018 | 5766 | 997 | 0 | 64.46 | 100.00 | 229.00 |
| 3 | 01 | 2019 | 3593 | 902 | 4 | 49.37 | 100.00 | 239.00 |
| 4 | 01 | 2020 | 4838 | 1019 | 5 | 27.43 | 100.00 | 402.90 |
BUDGET_PATH = Path("RRC Budget Data.xlsx")
raw = pd.read_excel(BUDGET_PATH, header=None)
YEARS = [2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]
COLS = slice(1, 10) # spreadsheet columns 1-9 map to years 2016-2024
# ── Section 1: Energy Resource Development (rows 7-18) ──────────────────────
erd = pd.DataFrame({
"year": YEARS,
"strategy": "Energy Resource Development",
"total_budget": raw.iloc[1, COLS].values.astype(float),
"salaries": raw.iloc[7, COLS].values.astype(float),
"other_personnel": raw.iloc[8, COLS].values.astype(float),
"professional_fees": raw.iloc[9, COLS].values.astype(float),
"travel": raw.iloc[13, COLS].values.astype(float),
"other_operating": raw.iloc[16, COLS].values.astype(float),
"capital_exp": raw.iloc[17, COLS].values.astype(float),
"fte": raw.iloc[18, COLS].values.astype(float),
})
# ── Section 2: Oil/Gas Monitoring & Inspections (rows 20-31) ────────────────
ogi = pd.DataFrame({
"year": YEARS,
"strategy": "Oil/Gas Monitoring & Inspections",
"total_budget": raw.iloc[2, COLS].values.astype(float),
"salaries": raw.iloc[20, COLS].values.astype(float),
"other_personnel": raw.iloc[21, COLS].values.astype(float),
"professional_fees": raw.iloc[22, COLS].values.astype(float),
"travel": raw.iloc[26, COLS].values.astype(float),
"other_operating": raw.iloc[29, COLS].values.astype(float),
"capital_exp": raw.iloc[30, COLS].values.astype(float),
"fte": raw.iloc[31, COLS].values.astype(float),
})
budget_long = pd.concat([erd, ogi], ignore_index=True)
print(f"Budget long: {len(budget_long)} rows (2 strategies × {len(YEARS)} years)")
budget_long
Budget long: 18 rows (2 strategies × 9 years)
| | year | strategy | total_budget | salaries | other_personnel | professional_fees | travel | other_operating | capital_exp | fte |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016 | Energy Resource Development | 11,708,475.00 | 7,669,719.00 | 398,589.00 | 3,366,389.00 | 16,477.00 | 210,293.00 | 0.00 | 130.60 |
| 1 | 2017 | Energy Resource Development | 10,911,094.00 | 7,273,775.00 | 389,348.00 | 3,118,066.00 | 6,792.00 | 77,855.00 | 0.00 | 120.30 |
| 2 | 2018 | Energy Resource Development | 9,846,886.00 | 7,292,933.00 | 282,337.00 | 977,645.00 | 28,694.00 | 1,045,727.00 | 0.00 | 131.00 |
| 3 | 2019 | Energy Resource Development | 11,123,757.00 | 8,068,497.00 | 217,988.00 | 1,493,755.00 | 73,651.00 | 988,740.00 | 13,232.00 | 137.40 |
| 4 | 2020 | Energy Resource Development | 17,280,569.00 | 9,707,894.00 | 236,356.00 | 5,989,236.00 | 41,752.00 | 1,165,481.00 | 54,037.00 | 153.40 |
| 5 | 2021 | Energy Resource Development | 16,237,704.00 | 10,887,561.00 | 237,777.00 | 3,562,816.00 | 5,614.00 | 1,446,301.00 | 10,140.00 | 168.10 |
| 6 | 2022 | Energy Resource Development | 25,583,205.00 | 11,166,309.00 | 246,340.00 | 12,560,550.00 | 37,731.00 | 1,246,443.00 | 19,985.00 | 157.10 |
| 7 | 2023 | Energy Resource Development | 26,903,564.00 | 11,056,060.00 | 252,933.00 | 12,846,821.00 | 56,650.00 | 2,287,481.00 | 48,344.00 | 151.30 |
| 8 | 2024 | Energy Resource Development | 35,533,565.00 | 13,183,578.00 | 229,161.00 | 15,140,585.00 | 144,641.00 | 6,425,653.00 | 0.00 | 186.00 |
| 9 | 2016 | Oil/Gas Monitoring & Inspections | 18,471,666.00 | 15,080,122.00 | 685,768.00 | 1,546,321.00 | 22,630.00 | 208,311.00 | 121,363.00 | 256.70 |
| 10 | 2017 | Oil/Gas Monitoring & Inspections | 17,204,058.00 | 15,086,262.00 | 686,194.00 | 176,786.00 | 19,654.00 | 230,525.00 | 272,461.00 | 249.50 |
| 11 | 2018 | Oil/Gas Monitoring & Inspections | 17,562,431.00 | 13,083,406.00 | 430,429.00 | 1,147,080.00 | 57,312.00 | 1,040,639.00 | 649,172.00 | 229.90 |
| 12 | 2019 | Oil/Gas Monitoring & Inspections | 21,951,747.00 | 14,878,875.00 | 340,135.00 | 2,895,436.00 | 187,048.00 | 1,185,772.00 | 1,255,930.00 | 255.60 |
| 13 | 2020 | Oil/Gas Monitoring & Inspections | 26,057,560.00 | 17,228,302.00 | 417,683.00 | 4,822,351.00 | 106,428.00 | 1,398,705.00 | 896,846.00 | 284.00 |
| 14 | 2021 | Oil/Gas Monitoring & Inspections | 28,756,689.00 | 17,155,864.00 | 426,139.00 | 8,212,873.00 | 34,762.00 | 1,394,783.00 | 230,439.00 | 277.80 |
| 15 | 2022 | Oil/Gas Monitoring & Inspections | 25,914,265.00 | 17,834,460.00 | 391,138.00 | 4,007,178.00 | 154,334.00 | 1,255,945.00 | 694,706.00 | 264.00 |
| 16 | 2023 | Oil/Gas Monitoring & Inspections | 34,330,858.00 | 18,622,389.00 | 457,753.00 | 8,945,350.00 | 149,418.00 | 2,428,330.00 | 2,234,623.00 | 271.20 |
| 17 | 2024 | Oil/Gas Monitoring & Inspections | 38,506,556.00 | 20,834,721.00 | 361,687.00 | 8,851,915.00 | 316,806.00 | 4,112,998.00 | 2,659,208.00 | 280.80 |
# ── Wide budget: one row per year with ogi_ / erd_ prefixed columns ──────────
ogi_wide = ogi.drop(columns="strategy").add_prefix("ogi_")
erd_wide = erd.drop(columns="strategy").add_prefix("erd_")
budget_wide = (
ogi_wide
.merge(erd_wide, left_on="ogi_year", right_on="erd_year")
.rename(columns={"ogi_year": "year"})
.drop(columns="erd_year")
)
# ── Merge inspections + violations, then join statewide budget on year ────────
panel = (
insp
.merge(viol, on=["district", "year"], how="left")
.merge(budget_wide, on="year", how="left")
)
# ── Derived columns ───────────────────────────────────────────────────────────
panel["violations_per_inspection"] = panel["total_violations"] / panel["total_inspections"]
panel["ogi_budget_m"] = panel["ogi_total_budget"] / 1_000_000 # dollars → millions
panel["erd_budget_m"] = panel["erd_total_budget"] / 1_000_000
panel["post_2019"] = (panel["year"] >= 2019).astype(int)
# 2024 = budget estimate; 2025 = no budget data — exclude both from regressions
panel["is_budget_year"] = (panel["year"] >= 2024).astype(int)
# Goal ambiguity: share of combined budget going to the inspection mission.
# Higher share = clearer mission focus; lower share = more goal ambiguity.
panel["inspection_budget_share"] = (
panel["ogi_total_budget"] / (panel["ogi_total_budget"] + panel["erd_total_budget"])
)
# Fill violation NaNs for districts with zero violations in a given year
fill_cols = [
"total_violations", "unique_wells_with_violations", "major_violations",
"resolution_rate", "enforcement_rate", "avg_days_to_enforcement",
"violations_per_inspection",
]
panel[fill_cols] = panel[fill_cols].fillna(0)
print(f"Analysis panel: {len(panel):,} rows | "
f"{panel['district'].nunique()} districts | "
f"{panel['year'].nunique()} years")
panel.head()
Analysis panel: 130 rows | 13 districts | 10 years
| | district | year | total_inspections | unique_wells | compliance_rate | avg_days_between_inspections | total_violations | unique_wells_with_violations | major_violations | resolution_rate | ... | erd_travel | erd_other_operating | erd_capital_exp | erd_fte | violations_per_inspection | ogi_budget_m | erd_budget_m | post_2019 | is_budget_year | inspection_budget_share |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01 | 2016 | 13975 | 4055 | 69.42 | 18.90 | 5720 | 1009 | 0 | 21.42 | ... | 16,477.00 | 210,293.00 | 0.00 | 130.60 | 0.41 | 18.47 | 11.71 | 0 | 0 | 0.61 |
| 1 | 01 | 2017 | 18022 | 6153 | 83.52 | 56.80 | 4380 | 767 | 0 | 44.36 | ... | 6,792.00 | 77,855.00 | 0.00 | 120.30 | 0.24 | 17.20 | 10.91 | 0 | 0 | 0.61 |
| 2 | 01 | 2018 | 23826 | 9109 | 85.61 | 53.50 | 5766 | 997 | 0 | 64.46 | ... | 28,694.00 | 1,045,727.00 | 0.00 | 131.00 | 0.24 | 17.56 | 9.85 | 0 | 0 | 0.64 |
| 3 | 01 | 2019 | 19790 | 6447 | 84.97 | 79.80 | 3593 | 902 | 4 | 49.37 | ... | 73,651.00 | 988,740.00 | 13,232.00 | 137.40 | 0.18 | 21.95 | 11.12 | 1 | 0 | 0.66 |
| 4 | 01 | 2020 | 26006 | 8716 | 85.52 | 122.90 | 4838 | 1019 | 5 | 27.43 | ... | 41,752.00 | 1,165,481.00 | 54,037.00 | 153.40 | 0.19 | 26.06 | 17.28 | 1 | 0 | 0.60 |
5 rows × 34 columns
2. Exploratory Overview¶
# Year-level means across districts
yearly = panel.groupby("year").agg(
ogi_budget_m = ("ogi_budget_m", "first"),
ogi_fte = ("ogi_fte", "first"),
total_inspections = ("total_inspections", "mean"),
compliance_rate = ("compliance_rate", "mean"),
total_violations = ("total_violations", "mean"),
resolution_rate = ("resolution_rate", "mean"),
avg_days_to_enf = ("avg_days_to_enforcement","mean"),
).round(2)
print(yearly.to_string())
fig, axes = plt.subplots(2, 3, figsize=(16, 8))
axes = axes.flatten()
yearly["ogi_budget_m"].plot(ax=axes[0], marker="o", title="OGI Budget ($M)")
axes[0].yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f"${x:.0f}M"))
yearly["ogi_fte"].plot(ax=axes[1], marker="o", title="OGI FTE Positions")
yearly["total_inspections"].plot(ax=axes[2], marker="o", title="Avg Inspections / District")
yearly["compliance_rate"].plot(ax=axes[3], marker="o", title="Avg Compliance Rate (%)")
yearly["resolution_rate"].plot(ax=axes[4], marker="o", title="Avg Resolution Rate (%)")
yearly["avg_days_to_enf"].plot(ax=axes[5], marker="o", title="Avg Days to Enforcement")
for ax in axes:
ax.axvline(2019, color="red", linestyle="--", alpha=0.5, label="2019 policy")
ax.set_xlabel("Year")
plt.tight_layout()
plt.show()
| year | ogi_budget_m | ogi_fte | total_inspections | compliance_rate | total_violations | resolution_rate | avg_days_to_enf |
|---|---|---|---|---|---|---|---|
| 2016 | 18.47 | 256.70 | 18,277.85 | 83.11 | 3,398.15 | 36.78 | 131.86 |
| 2017 | 17.20 | 249.50 | 20,138.54 | 86.52 | 2,915.69 | 59.02 | 185.01 |
| 2018 | 17.56 | 229.90 | 25,703.54 | 90.17 | 3,197.62 | 59.46 | 207.25 |
| 2019 | 21.95 | 255.60 | 25,058.46 | 89.85 | 2,550.77 | 61.44 | 170.36 |
| 2020 | 26.06 | 284.00 | 27,669.46 | 89.57 | 2,750.92 | 56.81 | 154.66 |
| 2021 | 28.76 | 277.80 | 24,115.54 | 88.76 | 2,556.38 | 66.18 | 118.82 |
| 2022 | 25.91 | 264.00 | 32,023.54 | 89.82 | 2,819.92 | 67.85 | 91.50 |
| 2023 | 34.33 | 271.20 | 33,805.69 | 91.62 | 2,598.62 | 69.65 | 105.15 |
| 2024 | 38.51 | 280.80 | 36,552.77 | 92.58 | 2,221.15 | 65.13 | 76.93 |
| 2025 | NaN | NaN | 34,082.08 | 90.52 | 2,530.38 | 52.06 | 36.62 |
corr_cols = [
"ogi_budget_m", "ogi_fte", "inspection_budget_share",
"total_inspections", "compliance_rate",
"total_violations", "resolution_rate", "avg_days_to_enforcement",
]
corr = panel[corr_cols].corr().round(2)
fig, ax = plt.subplots(figsize=(9, 7))
im = ax.imshow(corr, cmap="RdBu_r", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr_cols)))
ax.set_yticks(range(len(corr_cols)))
ax.set_xticklabels(corr_cols, rotation=45, ha="right", fontsize=9)
ax.set_yticklabels(corr_cols, fontsize=9)
for i in range(len(corr_cols)):
for j in range(len(corr_cols)):
ax.text(j, i, corr.iloc[i, j], ha="center", va="center", fontsize=8)
plt.colorbar(im, ax=ax)
ax.set_title("Correlation Matrix — Key Variables")
plt.tight_layout()
plt.show()
Data and Methods¶
Data Sources¶
This study draws on two primary data sources. The first is the Texas Railroad Commission
(RRC) Oil and Gas Division administrative database. Inspection records span fiscal years 2016–2025 and encompass approximately
1.9 million inspection events distributed across 13 RRC administrative districts;
violation records include approximately 193,000 enforcement actions. From the inspections
table, district-year aggregates are constructed for three regulatory output measures:
(1) compliance rate — the share of annual inspections in a district that did not result
in a compliance failure; (2) total inspections — the count of field inspection events;
and (3) average days between successive inspections of the same well, computed via a
SQL window function (LAG) over ordered inspection timestamps. From the violations table,
district-year aggregates include the violation resolution rate (share of violations
for which the operator was found compliant on re-inspection), enforcement rate, and average
days from violation discovery to enforcement action.
The second source is RRC budget data drawn from Legislative Appropriations Requests, covering fiscal years 2016–2024. Budget appropriations are reported at the statewide level disaggregated by goal and strategy. Two strategies are central to this analysis: (1) Oil and Gas Monitoring and Inspections (OGI), which directly funds field inspection operations; and (2) Energy Resource Development (ERD), encompassing the broader mandate to promote oil and gas resource opportunities. For each strategy, the data include total appropriations, salaries, professional fees, travel, other operating expenditures, capital outlays, and authorized full-time equivalent (FTE) positions. Fiscal year 2024 represents a budget estimate rather than expenditure actuals and is excluded from all regression models.
Sample and Panel Construction¶
The unit of analysis is the district-year. The analytic panel contains N = 130 observations (13 districts × 10 years, 2016–2025), of which 104 observations (2016–2023) constitute the regression sample. Fiscal years 2024 (budget estimate) and 2025 (no budget data available) are retained in descriptive analyses but excluded from all regression models. Because inspection and enforcement activity in 2025 represents a partial year as of the data extract, enforcement-timing metrics for that year are subject to right-censoring: violations discovered in late 2024 and 2025 may not yet have received a recorded enforcement action, compressing observed days-to-enforcement. Because RRC budget appropriations are reported at the statewide level, budget and FTE variables enter the panel as year-varying but district-invariant covariates. Identification of budget effects therefore relies on year-to-year variation in statewide appropriations rather than cross-district budget contrasts.
Measures¶
Dependent variables. Three measures capture distinct dimensions of regulatory output: total inspections (inspection volume), compliance rate (%), and violation resolution rate (%). Compliance rate and resolution rate capture quality of enforcement rather than quantity and represent different points in the regulatory pipeline: compliance is measured at the point of inspection while resolution is measured after a violation has been discovered and acted upon.
Organizational capacity. The primary capacity measure is OGI total appropriations in millions of dollars ($\text{Budget}_t$), reflecting the statewide resource envelope available for inspection activities in year $t$. An auxiliary measure — OGI authorized FTE positions — is included in descriptive analyses.
Goal ambiguity. Following Chun and Rainey (2005), goal ambiguity is operationalized via the relative concentration of resources across missions. The inspection budget share ($\text{Share}_t$) captures the fraction of combined OGI and ERD appropriations directed toward the inspection mandate:
$$\text{Share}_t = \frac{\text{OGI Budget}_t}{\text{OGI Budget}_t + \text{ERD Budget}_t}$$
Higher values indicate greater mission clarity (resources more concentrated on inspections); lower values indicate greater goal ambiguity (resources spread across competing mandates). Over the regression period (2016–2023), $\text{Share}_t$ ranged from roughly 0.50 (2022) to 0.66 (2019), reflecting meaningful year-to-year variation in budgetary prioritization.
Geographic moderators. Two binary district-level indicators capture geographic context: $\text{Offshore}_d = 1$ for districts 02, 03, and 04, which hold dual onshore and offshore oversight jurisdiction, and $\text{Border}_d = 1$ for districts 01–04, which are proximate to the Texas Gulf Coast and the US–Mexico border corridor.
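These two indicators can be built directly on the panel; the sketch below is illustrative (the helper name is not from the notebook, but the district groupings follow the definitions above):

```python
import pandas as pd

# District groupings as defined in the Measures section.
OFFSHORE_DISTRICTS = {"02", "03", "04"}       # dual onshore/offshore jurisdiction
BORDER_DISTRICTS = {"01", "02", "03", "04"}   # Gulf Coast / border corridor

def add_geo_moderators(panel: pd.DataFrame) -> pd.DataFrame:
    """Attach Offshore_d and Border_d dummies keyed on the zero-padded
    district code used throughout the panel."""
    out = panel.copy()
    out["offshore"] = out["district"].isin(OFFSHORE_DISTRICTS).astype(int)
    out["border"] = out["district"].isin(BORDER_DISTRICTS).astype(int)
    return out
```

Because both dummies are time-invariant, their level effects are absorbed by the district fixed effects in the H4 model; only the budget × geography interactions remain separately identified.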
Estimation Strategy¶
All models are estimated via ordinary least squares (OLS) with standard errors clustered at the district level ($G = 13$) to account for within-district serial correlation. District fixed effects absorb time-invariant heterogeneity across offices — including differences in geographic complexity, historical enforcement culture, and staffing composition — and ensure that budget effects are identified from within-district, year-to-year variation.
H1 — Baseline capacity model:
$$Y_{dt} = \alpha + \beta_1 \, \text{Budget}_t + \sum_{d} \gamma_d \, \mathbf{1}[\text{district} = d] + \varepsilon_{dt}$$
where $Y_{dt}$ is the regulatory output for district $d$ in year $t$, $\gamma_d$ are district fixed effects, and $\varepsilon_{dt}$ is the idiosyncratic error.
H2 — Goal ambiguity moderation:
$$Y_{dt} = \alpha + \beta_1 \, \text{Budget}_t + \beta_2 \, \text{Share}_t + \beta_3 \left( \text{Budget}_t \times \text{Share}_t \right) + \sum_{d} \gamma_d + \varepsilon_{dt}$$
The coefficient $\beta_3$ tests whether goal clarity conditions the capacity–output relationship. A positive $\hat{\beta}_3$ would indicate that clearer mission focus amplifies budget effects; a negative value would suggest diminishing returns or cross-strategy resource substitution.
H3 — District slope heterogeneity:
$$Y_{dt} = \alpha + \beta_1 \, \text{Budget}_t + \sum_{d=2}^{D} \delta_d \left( \text{Budget}_t \times \mathbf{1}[d] \right) + \sum_{d} \gamma_d + \varepsilon_{dt}$$
District-specific budget slopes are recovered as $\hat{\beta}_1 + \hat{\delta}_d$. Because budget varies only along the time dimension and district fixed effects are included, interaction term standard errors are inflated by near-perfect multicollinearity; these estimates are treated as descriptive indicators of heterogeneity only.
H4 — Geographic moderation and spatial autocorrelation:
$$Y_{dt} = \alpha + \beta_1 \, \text{Budget}_t + \beta_2 \, \text{Offshore}_d + \beta_3 \, \text{Border}_d + \beta_4 \left( \text{Budget}_t \times \text{Offshore}_d \right) + \beta_5 \left( \text{Budget}_t \times \text{Border}_d \right) + \sum_{d} \gamma_d + \varepsilon_{dt}$$
Robustness checks. Two supplementary tests address limitations of the baseline models. First, wild cluster bootstrap inference (Rademacher weights, $B = 999$ draws; Cameron, Gelbach & Miller 2008) is used to re-test H1 coefficients, providing valid p-values with the small number of clusters ($G = 13$). Second, a distributed lag specification replaces the contemporaneous budget measure with its one-year lag ($\text{Budget}_{t-1}$), and also estimates a model including both, to test whether budget effects operate with a delay consistent with a hiring-and-deployment mechanism. The distributed lag regression sample covers 2017–2023 ($N = 91$).
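The bootstrap-t procedure can be sketched as follows. This is an illustrative implementation of the Cameron–Gelbach–Miller recipe, not the notebook's own code; the function name and the simplified formula handling (additive right-hand-side terms only) are assumptions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def wild_cluster_boot_p(df, formula, term, cluster_col, B=999, seed=0):
    """Wild cluster bootstrap p-value for H0: coefficient on `term` == 0.

    Residuals from the null-imposed (restricted) model are flipped with
    cluster-level Rademacher weights, and the cluster-robust t-statistic
    for `term` is recomputed on each synthetic sample.
    """
    rng = np.random.default_rng(seed)
    dv = formula.split("~")[0].strip()

    # Observed t-statistic from the unrestricted, cluster-robust fit.
    full = smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df[cluster_col]})
    t_obs = full.tvalues[term]

    # Restricted fit: same formula with `term` removed (H0 imposed).
    rhs = [x.strip() for x in formula.split("~")[1].split("+")
           if x.strip() != term]
    restricted = smf.ols(f"{dv} ~ {' + '.join(rhs)}", data=df).fit()
    fitted, resid = restricted.fittedvalues, restricted.resid

    clusters = df[cluster_col].values
    uniq = np.unique(clusters)
    t_boot = np.empty(B)
    for b in range(B):
        w = rng.choice([-1.0, 1.0], size=len(uniq))  # Rademacher weights
        wmap = dict(zip(uniq, w))
        y_star = fitted + resid * np.vectorize(wmap.get)(clusters)
        fit_b = smf.ols(formula, data=df.assign(**{dv: y_star})).fit(
            cov_type="cluster", cov_kwds={"groups": df[cluster_col]})
        t_boot[b] = fit_b.tvalues[term]
    return float((np.abs(t_boot) >= abs(t_obs)).mean())
```

With G = 13 clusters, the bootstrap distribution of t replaces the asymptotic normal reference, which is the source of the more conservative p-values reported in the Robustness Checks section.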
Spatial autocorrelation in H1 model residuals is assessed via Moran's $I$ computed on a row-normalized inverse-distance spatial weights matrix constructed from district centroids derived by averaging well-level geographic coordinates within each district.
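Moran's $I$ on a row-normalized inverse-distance weights matrix reduces to a few NumPy operations; a minimal sketch (the function name is illustrative, and the inputs are assumed to be district-level residual means and centroid coordinates):

```python
import numpy as np
from scipy.spatial.distance import cdist

def morans_i(values, coords):
    """Moran's I with a row-normalized inverse-distance weights matrix.

    values : (n,) district-level quantities (e.g. mean H1 residuals)
    coords : (n, 2) district centroid coordinates
    """
    x = np.asarray(values, dtype=float)
    n = len(x)
    d = cdist(coords, coords)
    with np.errstate(divide="ignore"):
        w = 1.0 / d                      # inverse distance
    np.fill_diagonal(w, 0.0)             # no self-neighbors
    w = w / w.sum(axis=1, keepdims=True) # row-normalize (rows sum to 1)
    z = x - x.mean()
    # With row-normalized W, the n / sum(W) scaling factor equals 1.
    return float((n / w.sum()) * (z @ w @ z) / (z @ z))
```

Values near zero indicate no spatial clustering of residuals; significantly positive values would suggest omitted spatial processes or cross-district spillovers.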
Analysis¶
This study employs a fixed-effects panel regression framework to examine whether year-to-year changes in RRC organizational capacity — as measured by statewide budget appropriations — translate into improvements in regulatory outputs across Texas oil and gas inspection districts. The analytic panel spans 13 RRC districts over ten fiscal years (2016–2025), yielding 130 district-year observations. Regression analyses are restricted to 2016–2023 (N = 104), excluding FY2024 (budget estimate only) and FY2025 (no budget data available). The identification strategy leverages within-district variation in outcomes as a function of year-to-year shifts in statewide OGI appropriations, net of persistent inter-district differences absorbed by district fixed effects.
The choice of a district-year panel rather than a well-level panel is motivated by the structure of the budget data, which is available only at the statewide level. Because the key independent variable — OGI appropriations — varies along the time dimension only, it functions as a common, year-specific exposure applied uniformly to all districts. District fixed effects then absorb unobservable office-level characteristics that remain stable over the study period, such as geographic complexity, historical enforcement intensity, and local administrative capacity. Causal identification is thus predicated on the assumption that, absent changes in budget, within-district outcome trajectories would have followed parallel trends across years — an assumption that cannot be directly tested but is partially supported by the pre-period stability visible in the descriptive trends.
H1 tests the core capacity hypothesis using the baseline specification. Each of the three dependent variables — total inspections, compliance rate, and violation resolution rate — is regressed separately on OGI budget (in millions of dollars) and district fixed effects. Cluster-robust standard errors are used throughout given the modest number of clusters ($G = 13$).
H2 extends the baseline by interacting OGI budget with the inspection budget share, operationalizing goal ambiguity as the degree to which RRC appropriations are concentrated on the inspection mandate versus the broader energy development mission. The sign and significance of the interaction term $\beta_3$ determines whether goal clarity amplifies or attenuates the capacity–output relationship.
H3 tests for heterogeneity in budget–outcome slopes across districts by including budget $\times$ district interaction terms. Given only eight years of data per district, the saturated interaction model is estimated with approximately zero residual degrees of freedom for the fixed-effects component; as a result, interaction-term standard errors are unreliable and these estimates are reported as exploratory indicators of cross-district variation rather than inferential tests. The accompanying bar chart (below) summarizes district-specific slopes as point estimates.
H4 assesses whether offshore-jurisdiction and border-proximate districts — which face distinct operational environments — exhibit different budget sensitivity. The model adds geographic level effects and budget $\times$ geography interaction terms to the baseline specification. A complementary spatial diagnostic — Moran's $I$ applied to the residuals from the H1 compliance model — tests for geographic clustering of unexplained outcome variation that could indicate omitted spatial processes or spillovers across district boundaries.
All regressions exclude fiscal years 2024 (budget estimate) and 2025 (no budget data), retaining 2016–2023 as the regression sample (N = 104). The extended panel through 2025 is used for descriptive trend analysis only. Enforcement-timing metrics for 2025 should be interpreted cautiously: because the data extract covers a partial year, violations discovered in late 2024 and 2025 may not yet have a recorded enforcement action, artificially compressing observed days-to-enforcement and resolution rates for that year.
Two supplementary robustness checks address key inferential limitations. First, wild cluster bootstrap inference (Rademacher, B = 999) re-tests H1 with valid small-sample p-values given G = 13 clusters. Second, a distributed lag specification tests whether budget effects operate with a one-year delay, consistent with a hiring-and-deployment implementation timeline. Results from both checks are reported following the main hypothesis tests.
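Because the budget varies only by year, the one-year lag can be built from the year-level series and merged back, rather than shifted within districts; a hedged sketch assuming the `panel` DataFrame from Section 1 (the helper name is illustrative):

```python
import pandas as pd

def add_budget_lag(panel: pd.DataFrame) -> pd.DataFrame:
    """Attach Budget_{t-1} (lagged statewide OGI budget, $M) to each
    district-year row. The lag is constructed on the deduplicated
    year-level series because ogi_budget_m is district-invariant."""
    by_year = (panel[["year", "ogi_budget_m"]]
               .drop_duplicates()
               .sort_values("year"))
    by_year["ogi_budget_m_lag1"] = by_year["ogi_budget_m"].shift(1)
    return panel.merge(by_year[["year", "ogi_budget_m_lag1"]],
                       on="year", how="left")
```

The first sample year has no lag, which is why the distributed lag regression sample shrinks to 2017–2023 (N = 91).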
H1: Organizational Capacity → Policy Outputs¶
Prediction: Higher OGI budget predicts more inspections, higher compliance rates, and faster violation resolution.
Model: OLS with district fixed effects, 2016–2023 (N = 104). Budget varies only over time, identifying effects via year-to-year changes in statewide OGI appropriations; district fixed effects absorb persistent cross-district differences. Standard errors clustered at the district level (G = 13).
Finding (preview): All three outcomes show positive, statistically significant budget coefficients under asymptotic inference. Wild cluster bootstrap results (reported in the Robustness Checks section) indicate these asymptotic p-values overstate precision; results should be interpreted as suggestive rather than definitive.
actuals = panel[panel["is_budget_year"] == 0].copy()
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
actuals.plot.scatter(x="ogi_budget_m", y="total_inspections",
alpha=0.4, ax=axes[0], title="Budget → Inspections")
actuals.plot.scatter(x="ogi_budget_m", y="compliance_rate",
alpha=0.4, ax=axes[1], title="Budget → Compliance Rate (%)")
actuals.plot.scatter(x="ogi_budget_m", y="resolution_rate",
alpha=0.4, ax=axes[2], title="Budget → Resolution Rate (%)")
for ax in axes:
ax.set_xlabel("OGI Budget ($M)")
plt.tight_layout()
plt.show()
m_inspections = smf.ols(
"total_inspections ~ ogi_budget_m + C(district)",
data=actuals,
).fit(cov_type="cluster", cov_kwds={"groups": actuals["district"]})
m_compliance = smf.ols(
"compliance_rate ~ ogi_budget_m + C(district)",
data=actuals,
).fit(cov_type="cluster", cov_kwds={"groups": actuals["district"]})
m_resolution = smf.ols(
"resolution_rate ~ ogi_budget_m + C(district)",
data=actuals,
).fit(cov_type="cluster", cov_kwds={"groups": actuals["district"]})
# Detect actual column names — statsmodels uses z/P>|z| with robust SEs in some versions
_tbl = m_inspections.summary2().tables[1]
_t = "t" if "t" in _tbl.columns else "z"
_p = "P>|t|" if "P>|t|" in _tbl.columns else "P>|z|"
display_cols = ["Coef.", "Std.Err.", _t, _p]
print("H1a — OGI Budget ($M) → Total Inspections")
print(m_inspections.summary2().tables[1][display_cols].loc[["ogi_budget_m"]])
print(f" R² = {m_inspections.rsquared:.3f} Adj. R² = {m_inspections.rsquared_adj:.3f}\n")
print("H1b — OGI Budget ($M) → Compliance Rate (%)")
print(m_compliance.summary2().tables[1][display_cols].loc[["ogi_budget_m"]])
print(f" R² = {m_compliance.rsquared:.3f} Adj. R² = {m_compliance.rsquared_adj:.3f}\n")
print("H1c — OGI Budget ($M) → Resolution Rate (%)")
print(m_resolution.summary2().tables[1][display_cols].loc[["ogi_budget_m"]])
print(f" R² = {m_resolution.rsquared:.3f} Adj. R² = {m_resolution.rsquared_adj:.3f}")
H1a — OGI Budget ($M) → Total Inspections
Coef. Std.Err. z P>|z|
ogi_budget_m 666.30 212.98 3.13 0.00
R² = 0.769 Adj. R² = 0.736
H1b — OGI Budget ($M) → Compliance Rate (%)
Coef. Std.Err. z P>|z|
ogi_budget_m 0.26 0.11 2.31 0.02
R² = 0.538 Adj. R² = 0.471
H1c — OGI Budget ($M) → Resolution Rate (%)
Coef. Std.Err. z P>|z|
ogi_budget_m 1.05 0.32 3.28 0.00
R² = 0.624 Adj. R² = 0.569
H2: Goal Ambiguity Moderates Capacity Effects¶
Prediction: When a larger share of combined RRC budget flows to the broader
"Energy Resource Development" goal (lower inspection_budget_share), the capacity →
output link weakens. A positive interaction coefficient would support H2.
Operationalization:
inspection_budget_share = ogi_budget / (ogi_budget + erd_budget)
Identification note: Like the budget measure itself, inspection_budget_share
varies only over time, not across districts. The interaction term therefore exploits
the same narrow temporal variation as the main effect — budget share ranged from roughly
0.50 (2022) to 0.66 (2019) over 2016–2023, a span of about 16 percentage points across
eight years. This limits the strength of inference that can be drawn from the moderation test.
Finding (preview): The interaction is significant and negative ($\hat{\beta}_3 = -6.53$, $p < .01$), but interpretation is constrained by the identification limitations above. Results are discussed in the Results section.
m_h2 = smf.ols(
"compliance_rate ~ ogi_budget_m * inspection_budget_share + C(district)",
data=actuals,
).fit(cov_type="cluster", cov_kwds={"groups": actuals["district"]})
key_rows = ["ogi_budget_m", "inspection_budget_share", "ogi_budget_m:inspection_budget_share"]
print("H2 — Goal Ambiguity Moderation (DV: compliance_rate)")
print(m_h2.summary2().tables[1][display_cols].loc[key_rows])
print(f"\nR² = {m_h2.rsquared:.3f} Adj. R² = {m_h2.rsquared_adj:.3f}")
# ── Same model with resolution rate as DV ────────────────────────────────────
m_h2_res = smf.ols(
"resolution_rate ~ ogi_budget_m * inspection_budget_share + C(district)",
data=actuals,
).fit(cov_type="cluster", cov_kwds={"groups": actuals["district"]})
print("\nH2 — Goal Ambiguity Moderation (DV: resolution_rate)")
print(m_h2_res.summary2().tables[1][display_cols].loc[key_rows])
print(f"\nR² = {m_h2_res.rsquared:.3f} Adj. R² = {m_h2_res.rsquared_adj:.3f}")
H2 — Goal Ambiguity Moderation (DV: compliance_rate)
Coef. Std.Err. z P>|z|
ogi_budget_m 4.20 1.09 3.86 0.00
inspection_budget_share 170.18 44.79 3.80 0.00
ogi_budget_m:inspection_budget_share -6.53 1.84 -3.55 0.00
R² = 0.567 Adj. R² = 0.493
H2 — Goal Ambiguity Moderation (DV: resolution_rate)
Coef. Std.Err. z P>|z|
ogi_budget_m 6.68 4.67 1.43 0.15
inspection_budget_share 230.67 204.30 1.13 0.26
ogi_budget_m:inspection_budget_share -9.42 7.99 -1.18 0.24
R² = 0.629 Adj. R² = 0.566
H3: District Multilevel Effects¶
Prediction: The budget → output slope varies across RRC districts — some districts translate budget increases into better outputs more effectively than others.
Model: Interaction ogi_budget_m × C(district) — the reference district captures
the baseline budget slope; interaction terms show how each other district's slope
differs. Standard errors are unreliable due to near-perfect multicollinearity in the
saturated model (budget varies only over time while district FE absorb cross-sectional
variation); results are treated as descriptive point estimates only.
Finding (preview): District slopes for compliance rate range from −0.34 pp per \$1M (District 03, Houston/Coastal) to +1.36 pp per \$1M (District 6E, East Texas Piney Woods), with most districts showing small positive slopes. The bar chart below plots district-specific slope estimates.
m_h3 = smf.ols(
"compliance_rate ~ ogi_budget_m * C(district)",
data=actuals,
).fit(cov_type="cluster", cov_kwds={"groups": actuals["district"]})
coef_table = m_h3.summary2().tables[1]
# Baseline budget slope (reference district)
baseline_row = coef_table.loc[["ogi_budget_m"]]
print("H3 — District-Heterogeneous Budget Effect (DV: compliance_rate)")
print(f"Baseline (reference district) budget slope:")
print(baseline_row[display_cols])
# District-specific deviations from baseline
interaction_rows = coef_table[coef_table.index.str.contains("ogi_budget_m:C")]
print("\nDistrict interaction terms (deviation from reference slope):")
print(interaction_rows[display_cols].round(4))
print(f"\nR² = {m_h3.rsquared:.3f} Adj. R² = {m_h3.rsquared_adj:.3f}")
# ── Plot district-specific budget slopes ─────────────────────────────────────
districts = actuals["district"].unique()
slopes = {}
for d in districts:
    key = f"ogi_budget_m:C(district)[T.{d}]"
    base = m_h3.params.get("ogi_budget_m", 0)
    delta = m_h3.params.get(key, 0)
    slopes[str(d)] = base + delta
slope_df = pd.Series(slopes).sort_values()
fig, ax = plt.subplots(figsize=(10, 4))
slope_df.plot.barh(ax=ax, color=["#d62728" if v < 0 else "#1f77b4" for v in slope_df])
ax.axvline(0, color="black", linewidth=0.8)
ax.set_xlabel("Budget slope (compliance rate pp per $M)")
ax.set_title("H3 — District-Specific Budget → Compliance Slopes")
plt.tight_layout()
plt.show()
H3 — District-Heterogeneous Budget Effect (DV: compliance_rate)
Baseline (reference district) budget slope:
Coef. Std.Err. z P>|z|
ogi_budget_m 0.09 0.00 56,876,193,472,228.37 0.00
District interaction terms (deviation from reference slope):
Coef. Std.Err. z P>|z|
ogi_budget_m:C(district)[T.02] 0.15 0.00 22,633,237,551,336.32 0.00
ogi_budget_m:C(district)[T.03] -0.43 0.00 -59,804,100,493,329.36 0.00
ogi_budget_m:C(district)[T.04] 0.19 0.00 78,131,153,896,367.78 0.00
ogi_budget_m:C(district)[T.05] -0.04 0.00 -23,701,820,832,698.50 0.00
ogi_budget_m:C(district)[T.06] 0.34 0.00 60,365,540,001,288.30 0.00
ogi_budget_m:C(district)[T.08] 0.19 0.00 10,356,376,563,126.46 0.00
ogi_budget_m:C(district)[T.09] -0.09 0.00 -14,544,886,315,847.22 0.00
ogi_budget_m:C(district)[T.10] 0.04 0.00 5,748,033,218,673.02 0.00
ogi_budget_m:C(district)[T.6E] 1.27 0.00 64,743,648,722,385.09 0.00
ogi_budget_m:C(district)[T.7B] 0.18 0.00 27,978,802,690,136.84 0.00
ogi_budget_m:C(district)[T.7C] 0.31 0.00 24,243,474,173,332.52 0.00
ogi_budget_m:C(district)[T.8A] 0.10 0.00 59,702,739,775,453.20 0.00
R² = 0.662 Adj. R² = 0.554
H4: Spatial and Geographic Factors¶
Predictions:
- Offshore-jurisdiction districts (02, 03, 04) show a different budget → output relationship due to dual onshore/offshore oversight burden.
- Border-proximate districts show a different relationship due to cross-jurisdiction enforcement complexity.
- Spatial autocorrelation in H1 residuals (Moran's I) would indicate unmodeled geographic spillovers.
Finding (preview): Offshore and border districts show significantly higher baseline compliance rates (+7.6 pp and +6.0 pp respectively, both $p < .05$) but not different budget sensitivity. Moran's $I = -0.051$ indicates slight spatial dispersion and no significant geographic clustering of residuals. Results are discussed in the Results section.
# Texas RRC district geography flags (based on known RRC district locations)
OFFSHORE_DISTRICTS = {"02", "03", "04"} # dual onshore + offshore jurisdiction
BORDER_DISTRICTS = {"01", "02", "03", "04"} # south / gulf coast proximity
actuals = actuals.copy()
actuals["district_str"] = actuals["district"].astype(str).str.strip()
actuals["offshore"] = actuals["district_str"].isin(OFFSHORE_DISTRICTS).astype(int)
actuals["border"] = actuals["district_str"].isin(BORDER_DISTRICTS).astype(int)
print("District classification:")
print(
actuals.groupby(["district_str", "offshore", "border"])
.size()
.reset_index(name="district_year_obs")
.to_string(index=False)
)
District classification:
district_str offshore border district_year_obs
01 0 1 8
02 1 1 8
03 1 1 8
04 1 1 8
05 0 0 8
06 0 0 8
08 0 0 8
09 0 0 8
10 0 0 8
6E 0 0 8
7B 0 0 8
7C 0 0 8
8A 0 0 8
# ── Spatial regression: offshore and border interactions ─────────────────────
m_h4 = smf.ols(
"compliance_rate ~ ogi_budget_m + offshore + border "
"+ ogi_budget_m:offshore + ogi_budget_m:border + C(district)",
data=actuals,
).fit(cov_type="cluster", cov_kwds={"groups": actuals["district"]})
spatial_rows = [
"ogi_budget_m", "offshore", "border",
"ogi_budget_m:offshore", "ogi_budget_m:border",
]
available = [r for r in spatial_rows if r in m_h4.params.index]
print("H4 — Spatial Moderators (DV: compliance_rate)")
print(m_h4.summary2().tables[1][display_cols].loc[available])
print(f"\nR² = {m_h4.rsquared:.3f} Adj. R² = {m_h4.rsquared_adj:.3f}")
# ── Moran's I on H1 residuals ─────────────────────────────────────────────────
# Compute district centroids from well lat/lon joined via inspections
centroids_sql = """
SELECT
i.district,
AVG(w.latitude) AS lat,
AVG(w.longitude) AS lon
FROM inspections i
JOIN well_shape_tract w USING (api_norm)
WHERE w.latitude IS NOT NULL
AND w.longitude IS NOT NULL
AND i.district IS NOT NULL
GROUP BY i.district
"""
try:
    centroids = pd.read_sql(text(centroids_sql), engine)
    # Average H1 compliance residuals to district level
    resid_df = actuals[["district", "compliance_rate"]].copy()
    resid_df["resid"] = m_compliance.resid.reindex(actuals.index).values
    resid_by_district = resid_df.groupby("district")["resid"].mean().reset_index()
    centroids = centroids.merge(resid_by_district, on="district").dropna()
    # Row-normalised inverse-distance weights matrix
    coords = centroids[["lon", "lat"]].values
    D = cdist(coords, coords)
    np.fill_diagonal(D, np.inf)
    W = 1 / D
    W = W / W.sum(axis=1, keepdims=True)
    z = centroids["resid"].values
    z = z - z.mean()
    n = len(z)
    morans_i = (n / W.sum()) * (z @ W @ z) / (z @ z)
    print(f"\nMoran's I on H1 compliance residuals = {morans_i:.4f}")
    print(" > 0 → residuals cluster spatially (similar neighbours)")
    print(" ≈ 0 → no spatial pattern")
    print(" < 0 → spatial dispersion (dissimilar neighbours)")
    print("\nDistrict centroids used:")
    print(centroids[["district", "lat", "lon"]].round(2).to_string(index=False))
except Exception as e:
    print(f"Moran's I skipped: {e}")
H4 — Spatial Moderators (DV: compliance_rate)
Coef. Std.Err. z P>|z|
ogi_budget_m 0.35 0.15 2.39 0.02
offshore 7.61 3.29 2.31 0.02
border 6.03 2.84 2.12 0.03
ogi_budget_m:offshore -0.03 0.18 -0.16 0.87
ogi_budget_m:border -0.25 0.15 -1.74 0.08
R² = 0.553 Adj. R² = 0.476
Moran's I on H1 compliance residuals = -0.0512
> 0 → residuals cluster spatially (similar neighbours)
≈ 0 → no spatial pattern
< 0 → spatial dispersion (dissimilar neighbours)
District centroids used:
district lat lon
01 29.15 -98.62
02 28.85 -97.41
03 30.12 -95.43
04 27.44 -98.36
05 31.85 -96.15
06 32.29 -94.67
08 31.84 -102.30
09 33.42 -98.22
10 35.77 -101.02
6E 32.40 -94.89
7B 32.75 -99.40
7C 31.11 -101.26
8A 33.12 -102.06
Results¶
Descriptive Trends¶
Table 1 summarizes year-level means for the key variables across 2016–2025, with regression analyses restricted to 2016–2023. OGI appropriations grew from $18.47 million in 2016 to $34.33 million in 2023 — an 86 percent nominal increase — with the FY2024 budget estimate reaching $38.51 million. Authorized FTE positions rose modestly from 256.7 to 271.2 over the same period. Inspection volume per district increased from a mean of 18,278 in 2016 to a peak of 36,553 in 2024, with a partial-year figure of 34,082 recorded for 2025. Mean district compliance rate improved from 83.1 percent in 2016 to a peak of 92.6 percent in 2024, with a slight moderation to 90.5 percent in the 2025 partial-year extract. Violation resolution rate rose from 36.8 percent in 2016 to 69.7 percent in 2023 before declining to 52.1 percent in 2025; this decline almost certainly reflects right-censoring rather than a genuine deterioration in enforcement outcomes, as recently discovered violations will not yet have received a recorded resolution on re-inspection. Similarly, the 2025 days-to-enforcement figure of 36.6 days should be interpreted as a lower bound on the true enforcement timeline for that cohort of violations. These trends are broadly consistent with the organizational capacity hypothesis, though they are also consistent with secular improvements in industry compliance independent of budget growth.
Table 1. Year-Level Panel Means, 2016–2025
| Year | OGI Budget ($M) | OGI FTE | Inspections/District | Compliance Rate (%) | Resolution Rate (%) | Days to Enforcement |
|---|---|---|---|---|---|---|
| 2016 | 18.47 | 256.7 | 18,278 | 83.1 | 36.8 | 131.9 |
| 2017 | 17.20 | 249.5 | 20,139 | 86.5 | 59.0 | 185.0 |
| 2018 | 17.56 | 229.9 | 25,704 | 90.2 | 59.5 | 207.3 |
| 2019 | 21.95 | 255.6 | 25,058 | 89.9 | 61.4 | 170.4 |
| 2020 | 26.06 | 284.0 | 27,669 | 89.6 | 56.8 | 154.7 |
| 2021 | 28.76 | 277.8 | 24,116 | 88.8 | 66.2 | 118.8 |
| 2022 | 25.91 | 264.0 | 32,024 | 89.8 | 67.9 | 91.5 |
| 2023 | 34.33 | 271.2 | 33,806 | 91.6 | 69.7 | 105.2 |
| 2024† | 38.51 | 280.8 | 36,553 | 92.6 | 65.1 | 76.9 |
| 2025‡ | — | — | 34,082 | 90.5 | 52.1 | 36.6‡ |
Note: Budget figures are nominal. FTE = authorized full-time equivalent positions. Inspections/District = mean district-level annual inspection count. † 2024 budget is an appropriations estimate, not expenditure actuals; excluded from regression models. ‡ 2025 data is partial-year as of the data extract. Resolution rate and days-to-enforcement are right-censored: violations discovered in late 2024–2025 may not yet have a recorded enforcement action, compressing these metrics.
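The right-censoring mechanic noted above can be illustrated with a toy cohort (hypothetical dates, not RRC records): violations discovered shortly before the extract date have had little follow-up time, so the naive cohort resolution rate is mechanically depressed.

```python
import pandas as pd

# Hypothetical violations: discovery date and resolution date (NaT = unresolved
# as of the extract). Dates are invented for illustration only.
v = pd.DataFrame({
    "found": pd.to_datetime(["2023-03-01", "2023-07-15", "2025-04-10", "2025-05-20"]),
    "resolved": pd.to_datetime(["2023-09-01", "2024-01-10", None, None]),
})
v["year"] = v["found"].dt.year

# Naive resolution rate by discovery year: share with a recorded resolution.
rate = v.groupby("year")["resolved"].apply(lambda s: s.notna().mean())
print(rate)
# The recent cohort looks worse purely because it is right-censored, not
# because enforcement deteriorated.
```

The 2025-style cohort here shows zero resolutions only because it has had months, not years, of follow-up — the same compression affecting the bottom rows of Table 1.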
H1: Organizational Capacity and Regulatory Outputs¶
The baseline fixed-effects models provide consistent support for H1 across all three dependent variables (Table 2). Each additional million dollars in OGI appropriations is associated with approximately 666 additional district-level inspections per year ($\hat{\beta} = 666.30$, SE = 212.98, $z = 3.13$, $p < .01$; $R^2 = .769$). The budget coefficient is also positive and significant for compliance rate ($\hat{\beta} = 0.26$ percentage points per \$1M, SE = 0.11, $z = 2.31$, $p = .02$; $R^2 = .538$) and violation resolution rate ($\hat{\beta} = 1.05$ percentage points per \$1M, SE = 0.32, $z = 3.28$, $p < .01$; $R^2 = .624$). These associations are estimated net of district fixed effects and therefore reflect within-district covariation between annual budget changes and outcome changes rather than cross-sectional differences between better- and worse-funded districts.
Table 2. H1 Regression Results: OGI Budget → Regulatory Outputs
| Dependent Variable | $\hat{\beta}$ (Budget $M) | SE | $z$ | $p$ | $R^2$ | Adj. $R^2$ |
|---|---|---|---|---|---|---|
| Total inspections | 666.30 | 212.98 | 3.13 | <.01 | .769 | .736 |
| Compliance rate (%) | 0.26 | 0.11 | 2.31 | .02 | .538 | .471 |
| Resolution rate (%) | 1.05 | 0.32 | 3.28 | <.01 | .624 | .569 |
Note: All models include district fixed effects ($D = 13$). Standard errors clustered at the district level. $N = 104$.
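The within-district interpretation can be verified mechanically: by the Frisch–Waugh–Lovell theorem, the budget slope from a dummy-variable fixed-effects regression equals the slope from OLS on district-demeaned variables. A sketch on synthetic data (not the RRC panel; the district count and coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_dist, n_years = 4, 8
district = np.repeat(np.arange(n_dist), n_years)
budget = rng.normal(25.0, 5.0, n_dist * n_years)
alpha = np.array([80.0, 85.0, 90.0, 75.0])[district]     # district intercepts
compliance = alpha + 0.3 * budget + rng.normal(0.0, 1.0, n_dist * n_years)

# (1) Dummy-variable fixed effects: regress on budget + district dummies.
D = np.eye(n_dist)[district]                  # one-hot district matrix
X = np.column_stack([budget, D])
beta_fe = np.linalg.lstsq(X, compliance, rcond=None)[0][0]

# (2) Within transformation: demean outcome and budget by district.
def demean(v):
    means = np.bincount(district, weights=v) / np.bincount(district)
    return v - means[district]

num = demean(compliance) @ demean(budget)
den = demean(budget) @ demean(budget)
beta_within = num / den

print(beta_fe, beta_within)   # identical up to floating-point error
```

Both estimators recover the same within-district slope, which is why the H1 coefficients cannot be driven by level differences between districts.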
H2: Goal Ambiguity as a Moderator¶
The goal ambiguity moderation model for compliance rate (Table 3) yields a statistically significant and negative interaction between OGI budget and inspection budget share ($\hat{\beta}_3 = -6.53$, SE = 1.84, $z = -3.55$, $p < .01$). However, this result requires careful qualification before any mechanism is claimed.
The key issue is that inspection_budget_share — like the budget measure itself —
varies only over time, not across districts. All 13 districts experience the same
budget share in any given year, ranging from 0.59 (FY2022) to 0.67 (FY2018) across
the study period — a span of 8 percentage points over 8 observations. The interaction
term is therefore identified from the same narrow temporal variation as the main budget
effect, not from cross-district differences in mission structure. This makes it
difficult to distinguish a genuine moderation relationship from a spurious correlation
with year-specific factors that independently affected both budget share and compliance
outcomes in the same years.
The negative sign is consistent with at least two interpretations. Under a resource saturation story, compliance gains from additional OGI investment diminish as the inspection mandate becomes better resourced relative to other RRC goals — a plausible ceiling effect if districts are already operating near full compliance in high-share years. Alternatively, the result may simply reflect that FY2018 — the highest-share year — saw particularly large compliance gains for reasons unrelated to budget concentration (e.g., post-2016 industry recovery, early implementation of regulatory changes). Evaluated at mean budget share ($\bar{s} \approx 0.62$), the implied marginal budget effect on compliance is $4.20 - 6.53 \times 0.62 \approx 0.15$ pp per \$1M — directionally consistent with H1 but smaller.
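The marginal-effect arithmetic can be made explicit (a sketch hard-coding the Table 3 point estimates; `marginal_budget_effect` is an illustrative helper, not part of the analysis code):

```python
# H2 point estimates for the compliance-rate model (Table 3).
b_budget = 4.20       # main OGI budget effect (pp per $1M)
b_interact = -6.53    # budget x inspection_budget_share interaction

def marginal_budget_effect(share):
    """Implied effect of +$1M OGI budget on compliance rate (pp) at a given share."""
    return b_budget + b_interact * share

print(round(marginal_budget_effect(0.62), 2))   # sample mean share → 0.15
print(round(marginal_budget_effect(0.59), 2))   # sample minimum (FY2022)
print(round(marginal_budget_effect(0.67), 2))   # sample maximum (FY2018) → sign flips
```

Note that at the observed maximum share (0.67) the implied effect turns slightly negative — a direct consequence of the narrow 0.59–0.67 range over which the moderator is identified.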
For violation resolution rate, no terms reach conventional significance (all $p > .15$). Given the identification constraints, the H2 compliance finding is best treated as an exploratory pattern consistent with goal ambiguity theory — one that motivates future research with district-level budget variation — rather than a robust confirmatory test.
Table 3. H2 Regression Results: Goal Ambiguity Moderation (DV: Compliance Rate)
| Term | $\hat{\beta}$ | SE | $z$ | $p$ |
|---|---|---|---|---|
| Budget ($M) | 4.20 | 1.09 | 3.86 | <.01 |
| Inspection budget share | 170.18 | 44.79 | 3.80 | <.01 |
| Budget × Share | −6.53 | 1.84 | −3.55 | <.01 |
Note: District fixed effects included. SE clustered at district. $R^2 = .567$, Adj. $R^2 = .493$. $N = 104$.
H3: District-Level Heterogeneity¶
District-specific budget slopes for compliance rate range from $-0.34$ percentage points per \$1 million (District 03, Coastal/Greater Houston) to $+1.36$ percentage points (District 6E, East Texas Piney Woods), with most districts showing small positive slopes (Table 4). The reference district (District 01, San Antonio) slope is 0.09 pp per \$1M. Positive slopes are most pronounced in District 6E (+1.36), District 06 (+0.43), and District 7C (+0.40); District 03 is the only district with a substantially negative slope. The model $R^2$ of .662 modestly exceeds the baseline H1 value (.538), consistent with meaningful cross-district slope heterogeneity. Standard errors for the interaction terms are not reported, as they are unreliable due to near-perfect multicollinearity in the saturated model (see Data and Methods); point estimates are presented as descriptive indicators only.
Table 4. H3 District-Specific Budget → Compliance Slopes (pp per $1M)
| District | Estimated Slope |
|---|---|
| 01 (San Antonio) | 0.09 |
| 02 (Corpus Christi) | 0.24 |
| 03 (Houston) | −0.34 |
| 04 (Laredo) | 0.28 |
| 05 (Midland/Abilene) | 0.05 |
| 06 (Kilgore) | 0.43 |
| 08 (Midland) | 0.28 |
| 09 (Wichita Falls) | 0.00 |
| 10 (Amarillo) | 0.13 |
| 6E (Kilgore East) | 1.36 |
| 7B (Abilene) | 0.27 |
| 7C (Big Spring) | 0.40 |
| 8A (Lubbock) | 0.19 |
Note: Slopes are $\hat{\beta}_1 + \hat{\delta}_d$ from the H3 interaction model.
H4: Spatial and Geographic Factors¶
The geographic moderation model (Table 5) reveals that offshore-jurisdiction districts (02, 03, 04) exhibit compliance rates approximately 7.6 percentage points higher than non-offshore districts on average, net of budget ($\hat{\beta} = 7.61$, SE = 3.29, $z = 2.31$, $p = .02$). Border-proximate districts similarly show elevated baseline compliance rates (+6.03 pp, SE = 2.84, $z = 2.12$, $p = .03$). These level effects may reflect the heightened external scrutiny — from federal regulators, environmental organizations, and media — that offshore and border districts attract, which could independently drive compliance investments by operators regardless of RRC budget levels.
The budget–compliance slope, however, does not differ significantly between offshore and non-offshore districts ($\hat{\beta}_4 = -0.03$, $p = .87$), nor between border and non-border districts at conventional thresholds ($\hat{\beta}_5 = -0.25$, $p = .08$), suggesting that geographic classification affects the level of compliance performance but not the degree to which additional budget translates into compliance gains.
Moran's $I$ computed on district-level residuals from the H1 compliance model is $I = -0.051$, indicating slight spatial dispersion but no statistically significant spatial autocorrelation. This finding is consistent with prior district-level analysis of this regulatory system and suggests that unmodeled geographic spillovers are not a material source of omitted variable bias in the panel models.
Table 5. H4 Regression Results: Geographic Moderation (DV: Compliance Rate)
| Term | $\hat{\beta}$ | SE | $z$ | $p$ |
|---|---|---|---|---|
| Budget ($M) | 0.35 | 0.15 | 2.39 | .02 |
| Offshore (= 1) | 7.61 | 3.29 | 2.31 | .02 |
| Border (= 1) | 6.03 | 2.84 | 2.12 | .03 |
| Budget × Offshore | −0.03 | 0.18 | −0.16 | .87 |
| Budget × Border | −0.25 | 0.15 | −1.74 | .08 |
Note: District fixed effects included. SE clustered at district. $R^2 = .553$, Adj. $R^2 = .476$. $N = 104$. Moran's $I$ on H1 compliance residuals = −0.051 (no significant spatial autocorrelation).
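The notebook reports only the Moran's $I$ point estimate; a permutation test is one standard way to attach a pseudo p-value to the "no significant autocorrelation" claim. A sketch on synthetic coordinates (not the RRC centroids), reusing the same inverse-distance weighting:

```python
import numpy as np
from scipy.spatial.distance import cdist

def morans_i(z, W):
    """Moran's I for values z under weight matrix W."""
    z = z - z.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

def morans_perm_pvalue(z, W, n_perm=999, seed=0):
    """Pseudo p-value: share of shuffled-label draws at least as extreme as observed."""
    rng = np.random.default_rng(seed)
    i_obs = morans_i(z, W)
    draws = np.array([morans_i(rng.permutation(z), W) for _ in range(n_perm)])
    return i_obs, (np.sum(np.abs(draws) >= np.abs(i_obs)) + 1) / (n_perm + 1)

# Synthetic stand-in: 13 random "centroids" and spatially unstructured residuals.
rng = np.random.default_rng(42)
coords = rng.uniform(0.0, 10.0, size=(13, 2))
D = cdist(coords, coords)
np.fill_diagonal(D, np.inf)
W = 1.0 / D
W = W / W.sum(axis=1, keepdims=True)   # row-normalised, as in the notebook
resid = rng.normal(size=13)

i_obs, p = morans_perm_pvalue(resid, W)
print(f"Moran's I = {i_obs:.3f}, permutation p = {p:.3f}")
```

With only 13 spatial units the permutation null is coarse, but it is still more defensible than eyeballing the statistic against zero (whose expectation under the null is $-1/(n-1) \approx -0.083$, not 0).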
Summary¶
Taken together, the results offer moderate support for a resource-capacity model of regulatory performance. Higher OGI appropriations are reliably associated with greater inspection volume, higher compliance rates, and faster violation resolution — though identification rests on temporal variation in statewide appropriations rather than quasi-experimental assignment, and the modest panel length limits statistical precision. Goal ambiguity moderation operates through a diminishing-returns mechanism: compliance gains from additional budget are smaller in years when the inspection mandate receives a larger share of combined appropriations, consistent with resource saturation rather than amplification. District heterogeneity in budget–outcome slopes is substantial in descriptive terms but cannot be precisely estimated with the available data. Finally, geographic context — offshore jurisdiction and border proximity — predicts compliance levels but not budget sensitivity, and spatial autocorrelation diagnostics provide no evidence of unmodeled geographic spillover processes.
Robustness Checks¶
Wild cluster bootstrap. With only $G = 13$ district clusters, asymptotic cluster-robust standard errors may substantially understate true uncertainty. Wild cluster bootstrap inference (Rademacher weights, $B = 999$ draws; Cameron, Gelbach & Miller 2008) yields bootstrap p-values near 0.49–0.51 for all three H1 outcomes: total inspections ($p_{boot} = 0.494$), compliance rate ($p_{boot} = 0.473$), and resolution rate ($p_{boot} = 0.509$). These are far from any conventional significance threshold, in stark contrast to the asymptotic p-values of 0.002, 0.021, and 0.001. The divergence indicates that with $G = 13$ clusters, asymptotic inference markedly overstates precision. The H1 point estimates remain positive and directionally consistent, but the results do not survive bootstrap-based inference. This is the principal inferential limitation of the study.
Table 7. Wild Cluster Bootstrap vs. Asymptotic p-values (H1 Models, B = 999)
| Outcome | $t$-statistic | $p$ (asymptotic) | $p$ (bootstrap) |
|---|---|---|---|
| Total inspections | 3.13 | .002 | .494 |
| Compliance rate | 2.31 | .021 | .473 |
| Resolution rate | 3.28 | .001 | .509 |
Note: Bootstrap p-values based on 999 Rademacher wild cluster bootstrap draws. Small number of clusters (G = 13) renders asymptotic inference unreliable.
Distributed lag model. The distributed lag models test whether budget effects operate with a one-year delay consistent with a hiring-and-deployment mechanism. For compliance rate, the lagged budget alone is not significant ($\hat{\beta}_{t-1} = 0.10$, $p = .44$; Model A, N = 91), and in the combined model the contemporaneous term remains significant at the 5 percent level ($\hat{\beta}_t = 0.24$, $p = .04$) while the lagged term is negative and non-significant ($\hat{\beta}_{t-1} = -0.14$, $p = .12$; Model B). For violation resolution rate, the lagged budget is marginally significant when estimated alone ($\hat{\beta}_{t-1} = 0.83$, $p = .09$; Model A), but neither term reaches conventional significance in the combined model ($p = .22$ and $p = .14$).
These findings provide little support for a delayed implementation mechanism. The persistence of contemporaneous effects alongside non-significant lagged terms is more consistent with an immediate budget–output relationship. However, the N = 91 sample offers limited power to disentangle contemporaneous and lagged effects that are highly collinear over an eight-year window.
Table 8. Distributed Lag Results (2017–2023, N = 91)
| Model | DV | $\hat{\beta}_t$ | $p$ | $\hat{\beta}_{t-1}$ | $p$ | $R^2$ |
|---|---|---|---|---|---|---|
| A — Lag only | Compliance rate | — | — | 0.10 | .44 | .543 |
| B — Both | Compliance rate | 0.24 | .04 | −0.14 | .12 | .569 |
| A — Lag only | Resolution rate | — | — | 0.83 | .09 | .696 |
| B — Both | Resolution rate | 0.24 | .22 | 0.59 | .14 | .698 |
Note: District fixed effects included; SE clustered at district.
Robustness Checks¶
Two checks address limitations of the baseline H1 models.
Wild cluster bootstrap re-tests H1 with valid small-sample inference rather than asymptotic cluster-robust standard errors. With $G = 13$ clusters, asymptotic results can overstate precision. Rademacher wild cluster bootstrap ($B = 999$ draws; Cameron, Gelbach & Miller 2008) yields p-values near 0.49–0.51 for all three H1 outcomes — far from any conventional threshold — indicating that the asymptotic H1 results do not survive this correction. Point estimates remain positive and substantively consistent in direction, but the study lacks the cluster count required to establish significance through bootstrap inference.
Distributed lag model relaxes the assumption that budget effects are instantaneous. A one-year lag of OGI budget is estimated alone (Model A) and jointly with the contemporaneous term (Model B), over the 2017–2023 sample (N = 91). The lagged budget is not independently significant for compliance rate ($p = .44$) and only marginally so for resolution rate ($p = .09$). In the combined models, contemporaneous effects persist while lagged terms do not attain significance — providing little evidence that a delayed mechanism dominates an immediate one.
# Wild cluster bootstrap (Rademacher weights, B=999)
# For each draw: multiply each cluster's residuals by ±1, re-fit, and record the
# centred statistic t* = (beta* - beta_hat) / se*.
# p-value = share of |t*| >= |t_observed|.
def wild_cluster_bootstrap(model, data, dv, cluster_col="district",
                           coef="ogi_budget_m", B=999, seed=42):
    rng = np.random.default_rng(seed)
    groups = data[cluster_col].values
    unique_groups = np.unique(groups)
    t_obs = model.tvalues[coef]
    b_obs = model.params[coef]
    yhat = model.fittedvalues.values
    ehat = model.resid.values
    t_boot = np.empty(B)
    for b in range(B):
        # One Rademacher weight per cluster, broadcast to observations
        cw = {g: rng.choice([-1.0, 1.0]) for g in unique_groups}
        w = np.array([cw[g] for g in groups])
        df_b = data.copy()
        df_b[dv] = yhat + ehat * w
        m_b = smf.ols(
            f"{dv} ~ {coef} + C({cluster_col})", data=df_b
        ).fit(cov_type="cluster", cov_kwds={"groups": df_b[cluster_col]})
        # Centre the bootstrap t on the observed coefficient: the bootstrap DGP
        # embeds beta_hat, so an uncentred t* would sit near t_obs by
        # construction and push every p-value towards 0.5.
        t_boot[b] = (m_b.params[coef] - b_obs) / m_b.bse[coef]
    p_boot = float((np.abs(t_boot) >= np.abs(t_obs)).mean())
    return t_obs, float(model.pvalues[coef]), p_boot
print("Wild Cluster Bootstrap — H1 Models (B = 999 Rademacher draws)")
print(f"{'Outcome':<28} {'t-stat':>7} {'p asymptotic':>13} {'p bootstrap':>12}")
print("─" * 65)
for dv, model in [
    ("total_inspections", m_inspections),
    ("compliance_rate", m_compliance),
    ("resolution_rate", m_resolution),
]:
    t, p_a, p_b = wild_cluster_bootstrap(model, actuals, dv)
    # Significance stars: * p<.10, ** p<.05, *** p<.01 (no star above .10)
    sig_a = "*" * ((p_a < .10) + (p_a < .05) + (p_a < .01))
    sig_b = "*" * ((p_b < .10) + (p_b < .05) + (p_b < .01))
    print(f"{dv:<28} {t:>7.3f} {p_a:>12.3f}{sig_a:<3} {p_b:>10.3f}{sig_b:<3}")
print("\n* p<.10 ** p<.05 *** p<.01")
Wild Cluster Bootstrap — H1 Models (B = 999 Rademacher draws)
Outcome                       t-stat  p asymptotic  p bootstrap
─────────────────────────────────────────────────────────────────
total_inspections              3.128        0.002***      0.494
compliance_rate                2.307        0.021**       0.473
resolution_rate                3.277        0.001***      0.509

* p<.10 ** p<.05 *** p<.01
# Distributed lag: 1-year lag of OGI budget (shift within district).
# Lag is NaN for 2016 (no 2015 data), so regression sample is 2017-2023 (N=91).
panel_lag = panel.copy()
panel_lag["ogi_budget_m_lag1"] = (
panel_lag.sort_values("year")
.groupby("district")["ogi_budget_m"]
.shift(1)
)
lag_actuals = panel_lag[
(panel_lag["is_budget_year"] == 0) &
(panel_lag["ogi_budget_m_lag1"].notna())
].copy()
print(f"Distributed lag sample: {len(lag_actuals)} obs | "
f"years {lag_actuals['year'].min()}–{lag_actuals['year'].max()}")
# ── Model A: lagged budget only ───────────────────────────────────────────────
m_lag_only = smf.ols(
"compliance_rate ~ ogi_budget_m_lag1 + C(district)", data=lag_actuals
).fit(cov_type="cluster", cov_kwds={"groups": lag_actuals["district"]})
# ── Model B: contemporaneous + 1-year lag ────────────────────────────────────
m_lag_both = smf.ols(
"compliance_rate ~ ogi_budget_m + ogi_budget_m_lag1 + C(district)",
data=lag_actuals
).fit(cov_type="cluster", cov_kwds={"groups": lag_actuals["district"]})
# ── Also run for resolution rate ──────────────────────────────────────────────
m_lag_res_only = smf.ols(
"resolution_rate ~ ogi_budget_m_lag1 + C(district)", data=lag_actuals
).fit(cov_type="cluster", cov_kwds={"groups": lag_actuals["district"]})
m_lag_res_both = smf.ols(
"resolution_rate ~ ogi_budget_m + ogi_budget_m_lag1 + C(district)",
data=lag_actuals
).fit(cov_type="cluster", cov_kwds={"groups": lag_actuals["district"]})
print("\n── Compliance Rate ───────────────────────────────────────────")
print("\nModel A — Lagged budget only (t−1):")
print(m_lag_only.summary2().tables[1][display_cols].loc[["ogi_budget_m_lag1"]])
print(f" R² = {m_lag_only.rsquared:.3f} Adj. R² = {m_lag_only.rsquared_adj:.3f}")
print("\nModel B — Contemporaneous + 1-year lag:")
print(m_lag_both.summary2().tables[1][display_cols].loc[
["ogi_budget_m", "ogi_budget_m_lag1"]
])
print(f" R² = {m_lag_both.rsquared:.3f} Adj. R² = {m_lag_both.rsquared_adj:.3f}")
print("\n── Resolution Rate ───────────────────────────────────────────")
print("\nModel A — Lagged budget only (t−1):")
print(m_lag_res_only.summary2().tables[1][display_cols].loc[["ogi_budget_m_lag1"]])
print(f" R² = {m_lag_res_only.rsquared:.3f} Adj. R² = {m_lag_res_only.rsquared_adj:.3f}")
print("\nModel B — Contemporaneous + 1-year lag:")
print(m_lag_res_both.summary2().tables[1][display_cols].loc[
["ogi_budget_m", "ogi_budget_m_lag1"]
])
print(f" R² = {m_lag_res_both.rsquared:.3f} Adj. R² = {m_lag_res_both.rsquared_adj:.3f}")
Distributed lag sample: 91 obs | years 2017–2023
── Compliance Rate ───────────────────────────────────────────
Model A — Lagged budget only (t−1):
Coef. Std.Err. z P>|z|
ogi_budget_m_lag1 0.10 0.13 0.77 0.44
R² = 0.543 Adj. R² = 0.466
Model B — Contemporaneous + 1-year lag:
Coef. Std.Err. z P>|z|
ogi_budget_m 0.24 0.11 2.08 0.04
ogi_budget_m_lag1 -0.14 0.09 -1.55 0.12
R² = 0.569 Adj. R² = 0.490
── Resolution Rate ───────────────────────────────────────────
Model A — Lagged budget only (t−1):
Coef. Std.Err. z P>|z|
ogi_budget_m_lag1 0.83 0.49 1.69 0.09
R² = 0.696 Adj. R² = 0.644
Model B — Contemporaneous + 1-year lag:
Coef. Std.Err. z P>|z|
ogi_budget_m 0.24 0.19 1.22 0.22
ogi_budget_m_lag1 0.59 0.40 1.46 0.14
R² = 0.698 Adj. R² = 0.642
Hypotheses Summary¶
Table 6. Summary of Hypotheses, Predictions, Findings, and Empirical Support
| # | Hypothesis | Prediction | Key Result | Support |
|---|---|---|---|---|
| H1a | Capacity → Inspection volume | Higher OGI budget predicts more inspections per district | β = 666.3/$1M (z = 3.13, p < .01); bootstrap p = .494 | ✓† |
| H1b | Capacity → Compliance | Higher OGI budget predicts higher district compliance rate | β = 0.26 pp/$1M (z = 2.31, p = .02); bootstrap p = .473 | ✓† |
| H1c | Capacity → Resolution | Higher OGI budget predicts higher violation resolution rate | β = 1.05 pp/$1M (z = 3.28, p < .01); bootstrap p = .509 | ✓† |
| H2a | Goal ambiguity moderates capacity → compliance | Clearer inspection focus amplifies budget effect | Significant but negative (β = −6.53, z = −3.55, p < .01); interpretation constrained by time-only variation in budget share (range: 0.59–0.67) | Exploratory‡ |
| H2b | Goal ambiguity moderates capacity → resolution | Clearer inspection focus amplifies budget effect | Interaction not significant (p = .24) | ✗ |
| H3 | District heterogeneity in budget slopes | Budget → compliance slope varies across districts | Slopes from −0.34 pp/\$1M (D03) to +1.36 pp/\$1M (D6E); inference unreliable | Descriptive§ |
| H4a | Offshore jurisdiction moderates budget effect | Offshore districts show different budget → compliance slope | Level effect: +7.6 pp (p = .02); slope interaction not significant (p = .87) | Partial¶ |
| H4b | Border proximity moderates budget effect | Border districts show different budget → compliance slope | Level effect: +6.0 pp (p = .03); slope interaction marginal (p = .08) | Partial¶ |
| H4c | Spatial autocorrelation in residuals | Geographic spillovers produce clustered residuals | Moran's I = −0.051; no significant spatial autocorrelation | ✗ |
Notes:
† H1 point estimates are positive and directionally consistent across all three outcomes, supporting the capacity hypothesis substantively. However, wild cluster bootstrap inference (B = 999 Rademacher draws) yields p-values near 0.49–0.51 for all outcomes, indicating that asymptotic cluster-robust standard errors substantially overstate precision with G = 13 clusters. H1 findings should be interpreted as suggestive rather than statistically definitive. Distributed lag models (2017–2023, N = 91) show contemporaneous effects persist while lagged terms do not reach significance, providing no clear evidence for a delayed implementation mechanism.
‡ H2a is statistically significant but the identification is weak: inspection budget share varies only over time (like the budget itself), with a range of just 0.59–0.67 across 8 years. The negative interaction is consistent with a resource saturation effect but cannot be distinguished from year-specific confounders. At mean share (≈ 0.62), the implied marginal budget effect is ≈ 0.15 pp per $1M. H2b not significant for resolution rate. Both H2 findings are best treated as exploratory patterns for future research.
§ H3 interaction standard errors are unreliable (near-perfect multicollinearity in the saturated model); budget slopes are reported as descriptive point estimates only.
¶ Geographic classification predicts compliance levels but not budget sensitivity. Offshore and border districts exhibit systematically higher compliance regardless of annual budget variation.
Regression sample: N = 104 (13 districts × 8 years, 2016–2023). All models include district fixed effects; standard errors clustered at the district level (G = 13). Robustness sample: N = 91 (2017–2023, distributed lag models).