# California Climate Investments (CCI) Analysis

A data analysis project examining collaboration patterns in California's climate funding programs and their impact on greenhouse gas (GHG) reduction efficiency and equity outcomes, particularly in disadvantaged communities (DACs).

## Overview

This project analyzes the [California Climate Investments](https://www.caclimateinvestments.ca.gov/) dataset to answer questions like:

- How does multi-agency collaboration affect program effectiveness?
- What are the regional variations in climate investment patterns?
- How do EV voucher/rebate programs (CARB) compare to other programs?
- Where do GHG efficiency and equity goals align or conflict?
- How have collaboration and funding patterns changed over time?

The dataset covers **146,305 projects** across **21 agencies** and **39 programs**, representing **$11.59 billion** in total funding and **112.7 million metric tons** of GHG reductions.

## Project Structure

```
california-equity-git/
├── run_cci_analysis.py                 # Main entry point; orchestrates the full workflow
├── cci_analyzer.py                     # CCIDataAnalyzer class (data loading/cleaning)
├── cci_collaboration_analysis.py       # Collaboration pattern analysis
├── research_analysis_script.py         # Research question analysis
├── regional_analysis_script.py         # Regional distribution analysis
├── collaboration_detection_script.py   # Collaboration pattern detection
├── data_cleaning_script.py             # Data cleaning utilities
├── 01_analyzer.ipynb                   # Interactive Jupyter notebook
│
├── data/
│   └── cci_programs_data_reduced.csv   # Processed dataset (~40 MB)
├── data_raw/
│   └── cci_programs_data.csv           # Original CCI dataset (~242 MB)
│
├── california_enviroscreen/            # CalEnviroScreen 4.0 data (geodatabase + shapefiles)
├── assembly_district_shapefile/        # CA State Assembly Districts 2020 shapefiles
│
└── output/                             # Generated analysis outputs and visualizations
```

## Tech Stack

- **Python 3.12+**
- **pandas / numpy**: data manipulation
- **matplotlib / seaborn**: visualization
- **geopandas / shapely / pyproj**: geospatial analysis
- **scipy**: statistical testing
- **scikit-learn**: data preprocessing

## Setup

```bash
# Clone the repo
git clone <repo-url>
cd california-equity-git

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install pandas numpy matplotlib seaborn geopandas shapely pyproj scipy scikit-learn
```

## Usage

Run the full analysis pipeline:

```bash
python run_cci_analysis.py --data_path data/cci_programs_data_reduced.csv --output_dir output
```

**Optional flags:**

| Flag | Description |
|------|-------------|
| `--skip_cleaning` | Skip the data cleaning step (use pre-cleaned data) |
| `--skip_analysis` | Skip the collaboration analysis step |
| `--skip_research` | Skip the research question analysis step |

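As a rough illustration of how the flags above could map onto pipeline stages, here is a minimal `argparse` sketch. It is not taken from `run_cci_analysis.py`: the function names `build_parser` and `stages_to_run`, the stage labels, and the default values are assumptions chosen to mirror the usage example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented flags (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Run the CCI analysis pipeline")
    # The README does not state defaults; these mirror the usage example.
    parser.add_argument("--data_path", default="data/cci_programs_data_reduced.csv",
                        help="Path to the CCI dataset (CSV)")
    parser.add_argument("--output_dir", default="output",
                        help="Directory for charts, CSVs, and summaries")
    parser.add_argument("--skip_cleaning", action="store_true",
                        help="Skip the data cleaning step (use pre-cleaned data)")
    parser.add_argument("--skip_analysis", action="store_true",
                        help="Skip the collaboration analysis step")
    parser.add_argument("--skip_research", action="store_true",
                        help="Skip the research question analysis step")
    return parser


def stages_to_run(args: argparse.Namespace) -> list[str]:
    """Return the (hypothetical) stage labels left after applying the skip flags."""
    stages = []
    if not args.skip_cleaning:
        stages.append("cleaning")
    if not args.skip_analysis:
        stages.append("collaboration")
    if not args.skip_research:
        stages.append("research")
    return stages


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
example_args = build_parser().parse_args(["--skip_cleaning"])
print(stages_to_run(example_args))  # ['collaboration', 'research']
```

Each skip flag is a `store_true` option, so omitting every flag runs all three stages.
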

**Pipeline stages:**
1. **Data cleaning**: standardizes columns, parses dates, extracts coordinates, calculates derived metrics
2. **Collaboration analysis**: identifies multi-agency programs and collaboration patterns
3. **Research analysis**: answers core research questions with statistical tests and visualizations

Output files (charts, CSVs, summaries) are written to the directory given by `--output_dir`.

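To make the collaboration analysis stage more concrete, the sketch below shows one plausible way to identify multi-agency programs: group project records by program and keep the programs linked to more than one distinct agency. This is an assumption about the approach, not the project's actual code. The field names `program_name` and `agency_name` and the helper `find_multi_agency_programs` are hypothetical, and plain Python is used here to stay dependency-free even though the project itself relies on pandas.

```python
from collections import defaultdict


def find_multi_agency_programs(records, program_key="program_name", agency_key="agency_name"):
    """Map each program with more than one distinct agency to its sorted agency list."""
    agencies_by_program = defaultdict(set)
    for row in records:
        program = (row.get(program_key) or "").strip()
        agency = (row.get(agency_key) or "").strip()
        if program and agency:  # ignore rows missing either field
            agencies_by_program[program].add(agency)
    return {
        program: sorted(agencies)
        for program, agencies in agencies_by_program.items()
        if len(agencies) > 1
    }


# Made-up rows for illustration only; not taken from the CCI dataset.
sample = [
    {"program_name": "Program A", "agency_name": "Agency X"},
    {"program_name": "Program A", "agency_name": "Agency Y"},
    {"program_name": "Program B", "agency_name": "Agency X"},
    {"program_name": "Program B", "agency_name": "Agency X"},
]
print(find_multi_agency_programs(sample))
# {'Program A': ['Agency X', 'Agency Y']}
```

The function accepts any iterable of dictionaries, so rows read with `csv.DictReader` would work as input once the real column names are substituted.
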
## Data

### Primary Dataset
- **Source:** [California Climate Investments Open Data Portal](https://www.caclimateinvestments.ca.gov/cci-mapping)
- **Files:** `data_raw/cci_programs_data.csv` (raw, ~242 MB) and `data/cci_programs_data_reduced.csv` (processed, ~40 MB)

### Supplementary Data
- **CalEnviroScreen 4.0**: used to identify and score disadvantaged communities (DACs)
- **CA State Assembly Districts 2020**: shapefiles for regional geographic analysis

> **Note:** Large raw data files and shapefiles are tracked with Git LFS.

## Key Findings (from data summary)

| Metric | Value |
|--------|-------|
| Total projects | 146,305 |
| Total funding | $11.59 billion |
| GHG reductions | 112.7 million metric tons |
| CARB projects | 125,581 (85.8%) |
| EV voucher projects | 109,270 (74.7%) |
| Median GHG efficiency | $312.5 / metric ton |
| Avg. DAC benefit multiplier | 1.3x |

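The GHG efficiency figure above is expressed in dollars per metric ton. The sketch below shows one way such a metric could be computed, assuming it is the median of per-project funding divided by per-project GHG reductions. That definition is an assumption, and the helper names `cost_per_ton` and `median_cost_per_ton` are hypothetical rather than taken from the project code.

```python
from statistics import median


def cost_per_ton(funding_usd, ghg_tons):
    """Dollars spent per metric ton of GHG reduced, or None if undefined."""
    if funding_usd is None or ghg_tons is None or ghg_tons <= 0:
        return None
    return funding_usd / ghg_tons


def median_cost_per_ton(projects):
    """Median of per-project cost per ton, skipping projects with no defined ratio."""
    ratios = [cost_per_ton(funding, tons) for funding, tons in projects]
    valid = [ratio for ratio in ratios if ratio is not None]
    return median(valid) if valid else None


# Made-up (funding in USD, GHG reduction in metric tons) pairs for illustration.
sample = [(1000.0, 10.0), (5000.0, 10.0), (900.0, 3.0), (2000.0, 0.0)]
print(median_cost_per_ton(sample))  # 300.0
```

A median of per-project ratios generally differs from the aggregate ratio. Dividing the total funding in the table by the total reductions gives roughly $103 per metric ton ($11.59 billion / 112.7 million metric tons), well below the reported median.
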
## License

This project is for research and educational purposes. The underlying CCI data is publicly available from the California Air Resources Board.