From 4374df3f39ec02522f1aff806f5dae2054de4a42 Mon Sep 17 00:00:00 2001
From: dadams
Date: Mon, 9 Mar 2026 08:21:16 -0700
Subject: [PATCH] added README.md

---
 README.md | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 113 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 41a75a0a..60228b07 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,114 @@
-# california_equity_git
+# California Climate Investments (CCI) Analysis
 
+A data analysis project examining collaboration patterns in California's climate funding programs and their impact on greenhouse gas (GHG) reduction efficiency and equity outcomes — particularly in disadvantaged communities (DACs).
+
+## Overview
+
+This project analyzes the [California Climate Investments](https://www.caclimateinvestments.ca.gov/) dataset to answer questions like:
+
+- How does multi-agency collaboration affect program effectiveness?
+- What are the regional variations in climate investment patterns?
+- How do EV voucher/rebate programs (CARB) compare to other programs?
+- Where do GHG efficiency and equity goals align or conflict?
+- How have collaboration and funding patterns changed over time?
+
+The dataset covers **146,305 projects** across **21 agencies** and **39 programs**, representing **$11.59 billion** in total funding and **112.7 million metric tons** of GHG reductions.
+
+## Project Structure
+
+```
+california-equity-git/
+├── run_cci_analysis.py                # Main entry point — orchestrates full workflow
+├── cci_analyzer.py                    # CCIDataAnalyzer class (data loading/cleaning)
+├── cci_collaboration_analysis.py      # Collaboration pattern analysis
+├── research_analysis_script.py        # Research question analysis
+├── regional_analysis_script.py        # Regional distribution analysis
+├── collaboration_detection_script.py  # Collaboration pattern detection
+├── data_cleaning_script.py            # Data cleaning utilities
+├── 01_analyzer.ipynb                  # Interactive Jupyter notebook
+│
+├── data/
+│   └── cci_programs_data_reduced.csv  # Processed dataset (~40MB)
+├── data_raw/
+│   └── cci_programs_data.csv          # Original CCI dataset (~242MB)
+│
+├── california_enviroscreen/           # CalEnviroScreen 4.0 data (geodatabase + shapefiles)
+├── assembly_district_shapefile/       # CA State Assembly Districts 2020 shapefiles
+│
+└── output/                            # Generated analysis outputs and visualizations
+```
+
+## Tech Stack
+
+- **Python 3.12+**
+- **pandas / numpy** — data manipulation
+- **matplotlib / seaborn** — visualization
+- **geopandas / shapely / pyproj** — geospatial analysis
+- **scipy** — statistical testing
+- **scikit-learn** — data preprocessing
+
+## Setup
+
+```bash
+# Clone the repo
+git clone
+cd california-equity-git
+
+# Create and activate a virtual environment
+python3 -m venv .venv
+source .venv/bin/activate
+
+# Install dependencies
+pip install pandas numpy matplotlib seaborn geopandas shapely pyproj scipy scikit-learn
+```
+
+## Usage
+
+Run the full analysis pipeline:
+
+```bash
+python run_cci_analysis.py --data_path data/cci_programs_data_reduced.csv --output_dir output
+```
+
+**Optional flags:**
+
+| Flag | Description |
+|------|-------------|
+| `--skip_cleaning` | Skip data cleaning step (use pre-cleaned data) |
+| `--skip_analysis` | Skip collaboration analysis step |
+| `--skip_research` | Skip research question analysis |
+
+**Pipeline stages:**
+1. **Data cleaning** — standardizes columns, parses dates, extracts coordinates, calculates derived metrics
+2. **Collaboration analysis** — identifies multi-agency programs and collaboration patterns
+3. **Research analysis** — answers core research questions with statistical tests and visualizations
+
+Output files (charts, CSVs, summaries) are written to the `--output_dir` directory.
+
+## Data
+
+### Primary Dataset
+- **Source:** [California Climate Investments Open Data Portal](https://www.caclimateinvestments.ca.gov/cci-mapping)
+- **File:** `data_raw/cci_programs_data.csv` (242MB) / `data/cci_programs_data_reduced.csv` (40MB processed)
+
+### Supplementary Data
+- **CalEnviroScreen 4.0** — used to identify and score disadvantaged communities (DACs)
+- **CA State Assembly Districts 2020** — shapefiles for regional geographic analysis
+
+> **Note:** Large raw data files and shapefiles are tracked with Git LFS.
+
+## Key Findings (from data summary)
+
+| Metric | Value |
+|--------|-------|
+| Total projects | 146,305 |
+| Total funding | $11.59 billion |
+| GHG reductions | 112.7 million metric tons |
+| CARB projects | 125,581 (85.8%) |
+| EV voucher projects | 109,270 (74.7%) |
+| Median GHG efficiency | $312.5 / metric ton |
+| Avg. DAC benefit multiplier | 1.3x |
+
+## License
+
+This project is for research and educational purposes. The underlying CCI data is publicly available from the California Air Resources Board.