California Climate Investments (CCI) Analysis

A data analysis project examining collaboration patterns in California's climate funding programs and their impact on greenhouse gas (GHG) reduction efficiency and equity outcomes — particularly in disadvantaged communities (DACs).

Overview

This project analyzes the California Climate Investments dataset to answer questions like:

  • How does multi-agency collaboration affect program effectiveness?
  • What are the regional variations in climate investment patterns?
  • How do EV voucher/rebate programs (CARB) compare to other programs?
  • Where do GHG efficiency and equity goals align or conflict?
  • How have collaboration and funding patterns changed over time?

The dataset covers 146,305 projects across 21 agencies and 39 programs, representing $11.59 billion in total funding and 112.7 million metric tons of GHG reductions.

Project Structure

california-equity-git/
├── run_cci_analysis.py                 # Main entry point — orchestrates full workflow
├── cci_analyzer.py                     # CCIDataAnalyzer class (data loading/cleaning)
├── cci_collaboration_analysis.py       # Collaboration pattern analysis
├── research_analysis_script.py         # Research question analysis
├── regional_analysis_script.py         # Regional distribution analysis
├── collaboration_detection_script.py   # Collaboration pattern detection
├── data_cleaning_script.py             # Data cleaning utilities
├── 01_analyzer.ipynb                   # Interactive Jupyter notebook
│
├── data/
│   └── cci_programs_data_reduced.csv   # Processed dataset (~40MB)
├── data_raw/
│   └── cci_programs_data.csv           # Original CCI dataset (~242MB)
│
├── california_enviroscreen/            # CalEnviroScreen 4.0 data (geodatabase + shapefiles)
├── assembly_district_shapefile/        # CA State Assembly Districts 2020 shapefiles
│
└── output/                             # Generated analysis outputs and visualizations

Tech Stack

  • Python 3.12+
  • pandas / numpy — data manipulation
  • matplotlib / seaborn — visualization
  • geopandas / shapely / pyproj — geospatial analysis
  • scipy — statistical testing
  • scikit-learn — data preprocessing

Setup

# Clone the repo
git clone <repo-url>
cd california-equity-git

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install pandas numpy matplotlib seaborn geopandas shapely pyproj scipy scikit-learn

Usage

Run the full analysis pipeline:

python run_cci_analysis.py --data_path data/cci_programs_data_reduced.csv --output_dir output

Optional flags:

Flag              Description
--skip_cleaning   Skip data cleaning step (use pre-cleaned data)
--skip_analysis   Skip collaboration analysis step
--skip_research   Skip research question analysis
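
As a rough illustration of how these flags could map onto pipeline stages, here is a minimal argparse sketch. It is not the actual contents of run_cci_analysis.py: the helper names build_parser and planned_stages, the stage labels, the required --data_path, and the "output" default for --output_dir are all assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Run the CCI analysis pipeline")
    # Assumption: the input path is required and the output dir defaults to "output".
    parser.add_argument("--data_path", required=True, help="Path to the input CSV")
    parser.add_argument("--output_dir", default="output", help="Directory for outputs")
    parser.add_argument("--skip_cleaning", action="store_true",
                        help="Skip data cleaning step (use pre-cleaned data)")
    parser.add_argument("--skip_analysis", action="store_true",
                        help="Skip collaboration analysis step")
    parser.add_argument("--skip_research", action="store_true",
                        help="Skip research question analysis")
    return parser


def planned_stages(args: argparse.Namespace) -> list[str]:
    """Return the stages that would run for the given flags (hypothetical labels)."""
    stages = []
    if not args.skip_cleaning:
        stages.append("cleaning")
    if not args.skip_analysis:
        stages.append("collaboration")
    if not args.skip_research:
        stages.append("research")
    return stages


# Example: reuse pre-cleaned data but still run both analysis stages.
example_args = build_parser().parse_args(
    ["--data_path", "data/cci_programs_data_reduced.csv", "--skip_cleaning"]
)
example_stages = planned_stages(example_args)
```

With --skip_cleaning set, only the collaboration and research stages remain in the plan.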

Pipeline stages:

  1. Data cleaning — standardizes columns, parses dates, extracts coordinates, calculates derived metrics
  2. Collaboration analysis — identifies multi-agency programs and collaboration patterns
  3. Research analysis — answers core research questions with statistical tests and visualizations

Output files (charts, CSVs, summaries) are written to the --output_dir directory.
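
The first two stages can be sketched with pandas roughly as follows. This is a simplified illustration, not the project's implementation: the column names (Program, Agency, Funding, GHG Reductions, Date Operational), the toy rows, and the functions clean and multi_agency_programs are hypothetical, and the real dataset's schema may differ.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize columns, parse dates, and add a derived cost-per-ton metric."""
    out = df.copy()
    # Standardize column names: trimmed, lower-case, underscores for spaces.
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    # Parse dates; values that cannot be parsed become NaT instead of raising.
    out["date_operational"] = pd.to_datetime(out["date_operational"], errors="coerce")
    # Derived metric: dollars per metric ton of GHG reduced.
    # Non-positive reductions are masked to NaN to avoid dividing by zero.
    ghg = out["ghg_reductions"].where(out["ghg_reductions"] > 0)
    out["cost_per_ton"] = out["funding"] / ghg
    return out


def multi_agency_programs(df: pd.DataFrame) -> list[str]:
    """Return the programs that are associated with more than one agency."""
    agencies_per_program = df.groupby("program")["agency"].nunique()
    return sorted(agencies_per_program[agencies_per_program > 1].index)


# Toy data (made up for illustration; not taken from the CCI dataset).
raw = pd.DataFrame({
    "Program": ["A", "A", "B"],
    "Agency": ["CARB", "Agency B", "CARB"],
    "Funding": [1000.0, 500.0, 300.0],
    "GHG Reductions": [10.0, 0.0, 3.0],
    "Date Operational": ["2020-01-15", "not a date", "2021-06-30"],
})
cleaned = clean(raw)
collaborative = multi_agency_programs(cleaned)
```

Masking zero reductions before dividing keeps infinite values out of later statistics such as medians, which is one reasonable choice among several.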

Data

Primary Dataset

  • data/cci_programs_data_reduced.csv (processed dataset, ~40MB)
  • data_raw/cci_programs_data.csv (original CCI dataset, ~242MB)

Supplementary Data

  • CalEnviroScreen 4.0 — used to identify and score disadvantaged communities (DACs)
  • CA State Assembly Districts 2020 — shapefiles for regional geographic analysis

Note: Large raw data files and shapefiles are tracked with Git LFS.

Key Findings (from data summary)

Metric                        Value
Total projects                146,305
Total funding                 $11.59 billion
GHG reductions                112.7 million metric tons
CARB projects                 125,581 (85.8%)
EV voucher projects           109,270 (74.7%)
Median GHG efficiency         $312.5 / metric ton
Avg. DAC benefit multiplier   1.3x
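
The percentage shares in the table follow directly from the reported counts, as the short check below shows. It also computes an aggregate cost per ton (total funding divided by total reductions), which is not a figure reported above. Assuming the median is taken over per-project ratios, the two numbers need not agree, because a median of ratios is a different statistic from a ratio of totals. Variable names are illustrative.

```python
# Figures copied from the table above.
total_projects = 146_305
carb_projects = 125_581
ev_voucher_projects = 109_270
total_funding_usd = 11.59e9      # $11.59 billion
ghg_reductions_tons = 112.7e6    # 112.7 million metric tons


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


carb_share = share(carb_projects, total_projects)
ev_share = share(ev_voucher_projects, total_projects)

# Aggregate dollars per metric ton across the whole portfolio (derived here,
# not reported in the README).
aggregate_cost_per_ton = total_funding_usd / ghg_reductions_tons

print(f"CARB share: {carb_share}%")
print(f"EV voucher share: {ev_share}%")
print(f"Aggregate cost per ton: ${aggregate_cost_per_ton:.2f}")
```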

License

This project is for research and educational purposes. The underlying CCI data is publicly available from the California Air Resources Board.