diff --git a/analysis/analysis4.ipynb b/analysis/analysis4.ipynb
index f95d33a..8de18e0 100644
--- a/analysis/analysis4.ipynb
+++ b/analysis/analysis4.ipynb
@@ -355,6 +355,58 @@
 "print(recent_model.summary())"
 ]
 },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## **Notes on Regression Results: Historical vs. Recent Spills**\n",
+ "\n",
+ "### **Historical Spills Regression Results**\n",
+ "- **R² = 0.021** → The model explains only **2.1%** of the variance in reporting delay. \n",
+ "- **Key Findings:**\n",
+ "  - **Income is negatively associated with reporting delay**: \n",
+ "    - **(-0.0009, p < 0.001)** → Lower **median household income** is associated with **longer reporting delays**.\n",
+ "  - **Percent White has a strong negative effect**: \n",
+ "    - **(-5.55, p < 0.001)** → Spills in **whiter areas** are reported **faster**.\n",
+ "  - **Percent Hispanic also has a strong negative effect**: \n",
+ "    - **(-3.16, p < 0.001)** → Holding the other predictors fixed, spills in **more Hispanic areas** are also reported **faster**, not slower.\n",
+ "  - **Percent Poverty and Unemployment Rate are not significant**: \n",
+ "    - Poverty: **(-0.83, p = 0.21)** \n",
+ "    - Unemployment: **(3.31, p = 0.12)**\n",
+ "- **Implications:** \n",
+ "  - Spills discovered late (historical) tend to be in **lower-income areas**; the negative Percent Hispanic coefficient does not indicate longer delays in more Hispanic areas.\n",
+ "  - There may be **under-monitoring or regulatory neglect** in lower-income communities, although the model explains very little of the variance.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "### **Recent Spills Regression Results**\n",
+ "- **R² = 0.001** → The model explains almost **none** of the variance in reporting delay.\n",
+ "- **Key Findings:**\n",
+ "  - **Most variables are NOT statistically significant**: \n",
+ "    - Income: **(6.59e-06, p = 0.79)** \n",
+ "    - Poverty: **(-0.095, p = 0.29)** \n",
+ "    - Unemployment: **(0.082, p = 0.82)**\n",
+ "  - **Percent White has a weak positive effect**: \n",
+ "    - **(0.134, p = 0.046)** → In recent spills, whiter areas have **slightly longer delays**.\n",
+ "  - **Percent Hispanic is borderline significant**: \n",
+ "    - **(0.086, p = 0.072)** → Suggests **slightly longer delays** in more Hispanic areas, but the estimate is not significant at the 0.05 level and is opposite in sign to the historical coefficient.\n",
+ "- **Implications:** \n",
+ "  - **Socioeconomic factors do not strongly influence recent spill reporting.**\n",
+ "  - This suggests that **recent spills are reported more uniformly** across different demographic areas.\n",
+ "  - **Improvements in real-time monitoring may have reduced disparities** in reporting speed.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "### **Comparative Takeaways**\n",
+ "| Variable | Historical Spills: coefficient (p-value) | Recent Spills: coefficient (p-value) | Interpretation |\n",
+ "|-------------------------|---------------------------|-------------------------|---------------|\n",
+ "| **Median Income** | -0.0009 (**p < 0.001**) | 6.59e-06 (p = 0.79) | Income disparities matter **only for historical spills**. |\n",
+ "| **Percent Poverty** | -0.835 (p = 0.21) | -0.095 (p = 0.29) | Poverty does not significantly impact reporting delay. |\n",
+ "| **Unemployment Rate** | 3.307 (p = 0.12) | 0.082 (p = 0.82) | No meaningful relationship with reporting delay. |\n",
+ "| **Percent White** | -5.548 (**p < 0.001**) | 0.134 (*p = 0.046*) | Whiter areas had **faster historical reporting**, but slightly **slower** recent reporting. |\n",
+ "| **Percent Hispanic** | -3.158 (**p < 0.001**) | 0.086 (p = 0.072) | More Hispanic areas had **faster historical reporting**; the recent coefficient is small, positive, and not significant at 0.05. |\n"
+ ]
+ },
 {
 "cell_type": "code",
 "execution_count": 18,
diff --git a/analysis/analysis4.pdf b/analysis/analysis4.pdf
index df567fe..cdded5d 100644
Binary files a/analysis/analysis4.pdf and b/analysis/analysis4.pdf differ