california-equity-git/README.md
2026-03-09 08:21:16 -07:00


# California Climate Investments (CCI) Analysis
A data analysis project examining collaboration patterns in California's climate funding programs and their impact on greenhouse gas (GHG) reduction efficiency and equity outcomes — particularly in disadvantaged communities (DACs).
## Overview
This project analyzes the [California Climate Investments](https://www.caclimateinvestments.ca.gov/) dataset to answer questions like:
- How does multi-agency collaboration affect program effectiveness?
- What are the regional variations in climate investment patterns?
- How do EV voucher/rebate programs (CARB) compare to other programs?
- Where do GHG efficiency and equity goals align or conflict?
- How have collaboration and funding patterns changed over time?
The dataset covers **146,305 projects** across **21 agencies** and **39 programs**, representing **$11.59 billion** in total funding and **112.7 million metric tons** of GHG reductions.
## Project Structure
```
california-equity-git/
├── run_cci_analysis.py # Main entry point — orchestrates full workflow
├── cci_analyzer.py # CCIDataAnalyzer class (data loading/cleaning)
├── cci_collaboration_analysis.py # Collaboration pattern analysis
├── research_analysis_script.py # Research question analysis
├── regional_analysis_script.py # Regional distribution analysis
├── collaboration_detection_script.py # Collaboration pattern detection
├── data_cleaning_script.py # Data cleaning utilities
├── 01_analyzer.ipynb # Interactive Jupyter notebook
├── data/
│ └── cci_programs_data_reduced.csv # Processed dataset (~40MB)
├── data_raw/
│ └── cci_programs_data.csv # Original CCI dataset (~242MB)
├── california_enviroscreen/ # CalEnviroScreen 4.0 data (geodatabase + shapefiles)
├── assembly_district_shapefile/ # CA State Assembly Districts 2020 shapefiles
└── output/ # Generated analysis outputs and visualizations
```
## Tech Stack
- **Python 3.12+**
- **pandas / numpy** — data manipulation
- **matplotlib / seaborn** — visualization
- **geopandas / shapely / pyproj** — geospatial analysis
- **scipy** — statistical testing
- **scikit-learn** — data preprocessing
## Setup
```bash
# Clone the repo
git clone <repo-url>
cd california-equity-git
# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install pandas numpy matplotlib seaborn geopandas shapely pyproj scipy scikit-learn
```
## Usage
Run the full analysis pipeline:
```bash
python run_cci_analysis.py --data_path data/cci_programs_data_reduced.csv --output_dir output
```
**Optional flags:**
| Flag | Description |
|------|-------------|
| `--skip_cleaning` | Skip data cleaning step (use pre-cleaned data) |
| `--skip_analysis` | Skip collaboration analysis step |
| `--skip_research` | Skip research question analysis |
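
The flags above could be wired up roughly as follows. This is a minimal sketch, not the actual contents of `run_cci_analysis.py`: the function names `build_parser` and `stages_to_run`, the stage labels, the `output` default, and the choice to make `--data_path` required are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the flags documented in this README (hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Run the CCI analysis pipeline (illustrative sketch)."
    )
    # Assumption: the input path is mandatory and the output directory has a default.
    parser.add_argument("--data_path", required=True, help="Path to the input CSV")
    parser.add_argument("--output_dir", default="output", help="Directory for outputs")
    parser.add_argument("--skip_cleaning", action="store_true",
                        help="Skip data cleaning step (use pre-cleaned data)")
    parser.add_argument("--skip_analysis", action="store_true",
                        help="Skip collaboration analysis step")
    parser.add_argument("--skip_research", action="store_true",
                        help="Skip research question analysis")
    return parser


def stages_to_run(args: argparse.Namespace) -> list[str]:
    """Return the stage labels that remain after applying the skip flags."""
    stages = []
    if not args.skip_cleaning:
        stages.append("cleaning")
    if not args.skip_analysis:
        stages.append("collaboration")
    if not args.skip_research:
        stages.append("research")
    return stages


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
example_args = build_parser().parse_args(
    ["--data_path", "data/cci_programs_data_reduced.csv", "--skip_cleaning"]
)
print(stages_to_run(example_args))  # ['collaboration', 'research']
```

Each skip flag is a plain boolean switch (`action="store_true"`), so any combination can be passed and omitting all three runs every stage.
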
**Pipeline stages:**
1. **Data cleaning** — standardizes columns, parses dates, extracts coordinates, calculates derived metrics
2. **Collaboration analysis** — identifies multi-agency programs and collaboration patterns
3. **Research analysis** — answers core research questions with statistical tests and visualizations
Output files (charts, CSVs, summaries) are written to the directory passed as `--output_dir`.
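
To make the data cleaning stage more concrete, here is a hedged pandas sketch of the four operations it lists. It is not the code in `data_cleaning_script.py`: the function name `clean_projects` and every column name (`Date Operational`, `Lat Long`, `Funding`, `GHG Reductions`) are hypothetical stand-ins, and the real dataset's schema may differ.

```python
import pandas as pd


def clean_projects(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps to a copy of the input (illustrative only)."""
    df = raw.copy()

    # 1. Standardize columns: trim, lower-case, and replace whitespace with underscores.
    df.columns = (
        df.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    )

    # 2. Parse dates; values that cannot be parsed become NaT instead of raising.
    df["date_operational"] = pd.to_datetime(df["date_operational"], errors="coerce")

    # 3. Extract coordinates from a "lat, lon" string (assumed format).
    coords = df["lat_long"].str.extract(
        r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)"
    )
    df["latitude"] = pd.to_numeric(coords[0])
    df["longitude"] = pd.to_numeric(coords[1])

    # 4. Derived metric: dollars per metric ton, left missing when reductions are not positive.
    positive_ghg = df["ghg_reductions"].where(df["ghg_reductions"] > 0)
    df["cost_per_ton"] = df["funding"] / positive_ghg
    return df


# Tiny made-up frame, not real CCI records.
sample = pd.DataFrame(
    {
        "Date Operational": ["2020-01-15", "not a date"],
        "Lat Long": ["38.58, -121.49", None],
        "Funding": [1000.0, 500.0],
        "GHG Reductions": [4.0, 0.0],
    }
)
clean = clean_projects(sample)
print(clean[["date_operational", "latitude", "longitude", "cost_per_ton"]])
```

Masking non-positive reductions before dividing keeps projects with no reported GHG benefit from producing infinite cost-per-ton values, which would otherwise distort medians and plots.
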
## Data
### Primary Dataset
- **Source:** [California Climate Investments Open Data Portal](https://www.caclimateinvestments.ca.gov/cci-mapping)
- **File:** `data_raw/cci_programs_data.csv` (original, ~242 MB) / `data/cci_programs_data_reduced.csv` (processed, ~40 MB)
### Supplementary Data
- **CalEnviroScreen 4.0** — used to identify and score disadvantaged communities (DACs)
- **CA State Assembly Districts 2020** — shapefiles for regional geographic analysis
> **Note:** Large raw data files and shapefiles are tracked with Git LFS. Without Git LFS installed, a clone contains only small pointer files in their place.
## Key Findings (from data summary)
| Metric | Value |
|--------|-------|
| Total projects | 146,305 |
| Total funding | $11.59 billion |
| GHG reductions | 112.7 million metric tons |
| CARB projects | 125,581 (85.8%) |
| EV voucher projects | 109,270 (74.7%) |
| Median GHG efficiency | $312.5 / metric ton |
| Avg. DAC benefit multiplier | 1.3x |
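
The percentage shares in the table can be reproduced directly from the counts, and the same totals give an aggregate cost per ton to compare with the median. The short sketch below uses only figures from the table; the variable names are illustrative.

```python
# Figures copied from the summary table above.
total_projects = 146_305
carb_projects = 125_581
ev_voucher_projects = 109_270
total_funding_usd = 11.59e9   # $11.59 billion
total_ghg_tons = 112.7e6      # 112.7 million metric tons

# Share of all projects, in percent.
carb_share = 100 * carb_projects / total_projects
ev_share = 100 * ev_voucher_projects / total_projects

# Aggregate ratio: total dollars divided by total tons reduced.
aggregate_cost_per_ton = total_funding_usd / total_ghg_tons

print(f"CARB share of projects:       {carb_share:.1f}%")
print(f"EV voucher share of projects: {ev_share:.1f}%")
print(f"Aggregate cost per ton:       ${aggregate_cost_per_ton:.2f}")
```

The aggregate figure comes out near $103 per metric ton, well below the $312.5 median. The two are different statistics: the aggregate ratio is dominated by the projects with the largest reductions, while the median weights every project equally, so a portfolio made up mostly of small projects can show a much higher median than its overall ratio.
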
## License
This project is for research and educational purposes. The underlying CCI data is publicly available from the California Air Resources Board.