California Climate Investments (CCI) Analysis
A data analysis project examining collaboration patterns in California's climate funding programs and their impact on greenhouse gas (GHG) reduction efficiency and equity outcomes — particularly in disadvantaged communities (DACs).
Overview
This project analyzes the California Climate Investments dataset to answer questions like:
- How does multi-agency collaboration affect program effectiveness?
- What are the regional variations in climate investment patterns?
- How do EV voucher/rebate programs (CARB) compare to other programs?
- Where do GHG efficiency and equity goals align or conflict?
- How have collaboration and funding patterns changed over time?
The dataset covers 146,305 projects across 21 agencies and 39 programs, representing $11.59 billion in total funding and 112.7 million metric tons of GHG reductions.
Project Structure
california-equity-git/
├── run_cci_analysis.py # Main entry point — orchestrates full workflow
├── cci_analyzer.py # CCIDataAnalyzer class (data loading/cleaning)
├── cci_collaboration_analysis.py # Collaboration pattern analysis
├── research_analysis_script.py # Research question analysis
├── regional_analysis_script.py # Regional distribution analysis
├── collaboration_detection_script.py # Collaboration pattern detection
├── data_cleaning_script.py # Data cleaning utilities
├── 01_analyzer.ipynb # Interactive Jupyter notebook
│
├── data/
│ └── cci_programs_data_reduced.csv # Processed dataset (~40MB)
├── data_raw/
│ └── cci_programs_data.csv # Original CCI dataset (~242MB)
│
├── california_enviroscreen/ # CalEnviroScreen 4.0 data (geodatabase + shapefiles)
├── assembly_district_shapefile/ # CA State Assembly Districts 2020 shapefiles
│
└── output/ # Generated analysis outputs and visualizations
Tech Stack
- Python 3.12+
- pandas / numpy — data manipulation
- matplotlib / seaborn — visualization
- geopandas / shapely / pyproj — geospatial analysis
- scipy — statistical testing
- scikit-learn — data preprocessing
Setup
# Clone the repo
git clone <repo-url>
cd california-equity-git
# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install pandas numpy matplotlib seaborn geopandas shapely pyproj scipy scikit-learn
Usage
Run the full analysis pipeline:
python run_cci_analysis.py --data_path data/cci_programs_data_reduced.csv --output_dir output
Optional flags:
| Flag | Description |
|---|---|
| --skip_cleaning | Skip data cleaning step (use pre-cleaned data) |
| --skip_analysis | Skip collaboration analysis step |
| --skip_research | Skip research question analysis |
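For readers who want to see how these flags could map onto the pipeline, here is a minimal sketch using argparse. The flag names come from the usage above. The function names (build_parser, stages_to_run), the stage labels, the required status of --data_path, and the default for --output_dir are all assumptions for illustration and may not match run_cci_analysis.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Run the CCI analysis pipeline")
    # Assumption: the real script may define a default instead of requiring this.
    parser.add_argument("--data_path", required=True, help="Path to the CCI CSV file")
    # Assumption: "output" as the default directory is a guess.
    parser.add_argument("--output_dir", default="output", help="Directory for generated outputs")
    parser.add_argument("--skip_cleaning", action="store_true", help="Skip data cleaning step")
    parser.add_argument("--skip_analysis", action="store_true", help="Skip collaboration analysis step")
    parser.add_argument("--skip_research", action="store_true", help="Skip research question analysis")
    return parser


def stages_to_run(args: argparse.Namespace) -> list[str]:
    """Return the stage labels that remain after applying the skip flags."""
    stages = []
    if not args.skip_cleaning:
        stages.append("cleaning")
    if not args.skip_analysis:
        stages.append("collaboration")
    if not args.skip_research:
        stages.append("research")
    return stages


# Example: reuse pre-cleaned data and run only the two analysis stages.
example_args = build_parser().parse_args(
    ["--data_path", "data/cci_programs_data_reduced.csv", "--skip_cleaning"]
)
print(stages_to_run(example_args))  # ['collaboration', 'research']
```

Passing an explicit argument list to parse_args keeps the example runnable outside the command line.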
Pipeline stages:
- Data cleaning — standardizes columns, parses dates, extracts coordinates, calculates derived metrics
- Collaboration analysis — identifies multi-agency programs and collaboration patterns
- Research analysis — answers core research questions with statistical tests and visualizations
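The first two stages can be illustrated with a short pandas sketch on a toy table. Every column name and value below is hypothetical, since the real CCI export may use different headers, and the logic is a simplified stand-in rather than the code in data_cleaning_script.py or collaboration_detection_script.py. Coordinate extraction is omitted.

```python
import pandas as pd

# Toy rows with hypothetical column names and values (not real CCI data).
raw = pd.DataFrame({
    "Program Name": ["Program A", "Program A", "Program B"],
    "Agency Name": ["Agency X", "Agency Y", "Agency X"],
    "Date Operational": ["2019-03-01", "not a date", "2021-07-15"],
    "Total Funding": [100000.0, 50000.0, 20000.0],
    "GHG Reductions": [400.0, 0.0, 50.0],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize column names, parse dates, and derive cost per ton."""
    out = df.copy()
    # Standardize headers to snake_case.
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    )
    # Unparseable dates become NaT instead of raising.
    out["date_operational"] = pd.to_datetime(out["date_operational"], errors="coerce")
    # Derived metric: dollars per metric ton; left as NaN when reductions are zero.
    positive_ghg = out["ghg_reductions"].where(out["ghg_reductions"] > 0)
    out["cost_per_ton"] = out["total_funding"] / positive_ghg
    return out


def flag_multi_agency(df: pd.DataFrame) -> pd.DataFrame:
    """Mark rows whose program is associated with more than one agency."""
    out = df.copy()
    agencies_per_program = out.groupby("program_name")["agency_name"].transform("nunique")
    out["multi_agency"] = agencies_per_program > 1
    return out


cleaned = flag_multi_agency(clean(raw))
print(cleaned[["program_name", "cost_per_ton", "multi_agency"]])
```

Treating zero-reduction projects as missing rather than infinite keeps them out of median and mean calculations; whether the project handles them this way is an assumption.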
Output files (charts, CSVs, summaries) are written to the --output_dir directory.
Data
Primary Dataset
- Source: California Climate Investments Open Data Portal
- File: data_raw/cci_programs_data.csv (242MB) / data/cci_programs_data_reduced.csv (40MB processed)
Supplementary Data
- CalEnviroScreen 4.0 — used to identify and score disadvantaged communities (DACs)
- CA State Assembly Districts 2020 — shapefiles for regional geographic analysis
Note: Large raw data files and shapefiles are tracked with Git LFS.
Key Findings (from data summary)
| Metric | Value |
|---|---|
| Total projects | 146,305 |
| Total funding | $11.59 billion |
| GHG reductions | 112.7 million metric tons |
| CARB projects | 125,581 (85.8%) |
| EV voucher projects | 109,270 (74.7%) |
| Median GHG efficiency | $312.5 / metric ton |
| Avg. DAC benefit multiplier | 1.3x |
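The percentages in the table can be reproduced from the counts with a few lines of arithmetic. The sketch below also computes an aggregate cost per ton (total funding divided by total reductions) for comparison. That aggregate figure is not stated in the summary; it is derived here only as an illustration. Assuming the median is taken over per-project ratios, the two numbers need not agree, because the aggregate is weighted toward the largest projects.

```python
# Figures copied from the summary table above.
total_projects = 146_305
carb_projects = 125_581
ev_voucher_projects = 109_270
total_funding_usd = 11.59e9
ghg_reductions_tons = 112.7e6


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


carb_share = share(carb_projects, total_projects)
ev_share = share(ev_voucher_projects, total_projects)

# Portfolio-wide dollars per metric ton (derived here, not a reported metric).
aggregate_cost_per_ton = total_funding_usd / ghg_reductions_tons

print(f"CARB share: {carb_share}%")        # CARB share: 85.8%
print(f"EV voucher share: {ev_share}%")    # EV voucher share: 74.7%
print(f"Aggregate cost per ton: ${aggregate_cost_per_ton:.0f}")
```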
License
This project is for research and educational purposes. The underlying CCI data is publicly available from the California Air Resources Board.