# California Climate Investments (CCI) Analysis

A data analysis project examining collaboration patterns in California's climate funding programs and their impact on greenhouse gas (GHG) reduction efficiency and equity outcomes, particularly in disadvantaged communities (DACs).

## Overview

This project analyzes the [California Climate Investments](https://www.caclimateinvestments.ca.gov/) dataset to answer questions like:

- How does multi-agency collaboration affect program effectiveness?
- What are the regional variations in climate investment patterns?
- How do EV voucher/rebate programs (CARB) compare to other programs?
- Where do GHG efficiency and equity goals align or conflict?
- How have collaboration and funding patterns changed over time?

The dataset covers **146,305 projects** across **21 agencies** and **39 programs**, representing **$11.59 billion** in total funding and **112.7 million metric tons** of GHG reductions.

## Project Structure

```
california-equity-git/
├── run_cci_analysis.py                  # Main entry point; orchestrates full workflow
├── cci_analyzer.py                      # CCIDataAnalyzer class (data loading/cleaning)
├── cci_collaboration_analysis.py        # Collaboration pattern analysis
├── research_analysis_script.py          # Research question analysis
├── regional_analysis_script.py          # Regional distribution analysis
├── collaboration_detection_script.py    # Collaboration pattern detection
├── data_cleaning_script.py              # Data cleaning utilities
├── 01_analyzer.ipynb                    # Interactive Jupyter notebook
│
├── data/
│   └── cci_programs_data_reduced.csv    # Processed dataset (~40MB)
├── data_raw/
│   └── cci_programs_data.csv            # Original CCI dataset (~242MB)
│
├── california_enviroscreen/             # CalEnviroScreen 4.0 data (geodatabase + shapefiles)
├── assembly_district_shapefile/         # CA State Assembly Districts 2020 shapefiles
│
└── output/                              # Generated analysis outputs and visualizations
```

## Tech Stack

- **Python 3.12+**
- **pandas / numpy**: data manipulation
- **matplotlib / seaborn**:
visualization
- **geopandas / shapely / pyproj**: geospatial analysis
- **scipy**: statistical testing
- **scikit-learn**: data preprocessing

## Setup

```bash
# Clone the repo (substitute the repository URL)
git clone <repository-url>
cd california-equity-git

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install pandas numpy matplotlib seaborn geopandas shapely pyproj scipy scikit-learn
```

## Usage

Run the full analysis pipeline:

```bash
python run_cci_analysis.py --data_path data/cci_programs_data_reduced.csv --output_dir output
```

**Optional flags:**

| Flag | Description |
|------|-------------|
| `--skip_cleaning` | Skip data cleaning step (use pre-cleaned data) |
| `--skip_analysis` | Skip collaboration analysis step |
| `--skip_research` | Skip research question analysis |

**Pipeline stages:**

1. **Data cleaning**: standardizes columns, parses dates, extracts coordinates, calculates derived metrics
2. **Collaboration analysis**: identifies multi-agency programs and collaboration patterns
3. **Research analysis**: answers core research questions with statistical tests and visualizations

Output files (charts, CSVs, summaries) are written to the directory given by `--output_dir`.

## Data

### Primary Dataset

- **Source:** [California Climate Investments Open Data Portal](https://www.caclimateinvestments.ca.gov/cci-mapping)
- **File:** `data_raw/cci_programs_data.csv` (242MB) / `data/cci_programs_data_reduced.csv` (40MB processed)

### Supplementary Data

- **CalEnviroScreen 4.0**: used to identify and score disadvantaged communities (DACs)
- **CA State Assembly Districts 2020**: shapefiles for regional geographic analysis

> **Note:** Large raw data files and shapefiles are tracked with Git LFS.

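
The processed file can also be explored without running the full pipeline. The sketch below shows one way a cost-per-ton figure (dollars of funding per metric ton of GHG reduced) might be derived from a CSV. The column names `funding_usd` and `ghg_reductions_mt` are hypothetical placeholders, the sample rows are made up, and the project's own scripts may define GHG efficiency differently. It uses only the standard library so it runs outside the project environment.

```python
import csv
import io
from statistics import median

# Hypothetical column names; check the header of the real CSV before use.
FUNDING_COL = "funding_usd"
GHG_COL = "ghg_reductions_mt"


def cost_per_ton(rows):
    """Return dollars per metric ton for each row with positive GHG reductions."""
    values = []
    for row in rows:
        try:
            funding = float(row[FUNDING_COL])
            ghg = float(row[GHG_COL])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with missing or non-numeric fields
        if ghg > 0:  # skip projects with no reported reductions (avoids dividing by zero)
            values.append(funding / ghg)
    return values


# Tiny made-up sample standing in for the real file.
sample_csv = io.StringIO(
    "funding_usd,ghg_reductions_mt\n"
    "1000,10\n"
    "3000,10\n"
    "500,0\n"
    "2000,\n"
    "6000,10\n"
)
rows = list(csv.DictReader(sample_csv))
ratios = cost_per_ton(rows)
median_cost = median(ratios)
print(f"Median cost per metric ton: ${median_cost:,.2f}")
```

To try this on the real data, replace the in-memory sample with `open("data/cci_programs_data_reduced.csv", newline="")` and set the two column constants to the matching headers in that file.
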
## Key Findings (from data summary)

| Metric | Value |
|--------|-------|
| Total projects | 146,305 |
| Total funding | $11.59 billion |
| GHG reductions | 112.7 million metric tons |
| CARB projects | 125,581 (85.8%) |
| EV voucher projects | 109,270 (74.7%) |
| Median GHG efficiency | $312.5 / metric ton |
| Avg. DAC benefit multiplier | 1.3x |

## License

This project is for research and educational purposes. The underlying CCI data is publicly available from the California Air Resources Board.