## Title: Evaluating Equity and Impact
### Author: David P. Adams

%pip install pandas numpy matplotlib seaborn geopandas


# Importing the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

# Set the working directory to the repository root (os is already imported above)
os.chdir('/home/dadams/Repos/california_equity_git')

# read in the data
data = pd.read_csv('data_raw/cci_programs_data.csv', low_memory=False)

columns_data = pd.DataFrame(data.columns)

# export the 'DataFrame' to a csv file
columns_data.to_csv('data_raw/columns.csv', index=False)

# check the data types
data.dtypes

# save the data types to a csv file
data.dtypes.to_csv('data_raw/data_types.csv', header = False)

# describe the data
data.describe()

import geopandas as gpd

# Load the shapefile
shapefile_path = '/home/dadams/Repos/california_equity_git/california_enviroscreen/calif_enviroscreen_shape/CES4 Final Shapefile.shp'
gdf = gpd.read_file(shapefile_path)

# Print the head of the GeoDataFrame
print(gdf.head())

          Tract    ZIP         County    ApproxLoc  TotPop19    CIscore  \
0  6.083002e+09  93454  Santa Barbara  Santa Maria      4495  36.019653   
1  6.083002e+09  93455  Santa Barbara  Santa Maria     13173  37.030667   
2  6.083002e+09  93454  Santa Barbara  Santa Maria      2398  31.213140   
3  6.083002e+09  93455  Santa Barbara       Orcutt      4496   6.639331   
4  6.083002e+09  93455  Santa Barbara       Orcutt      4008  14.022852   

    CIscoreP     Ozone     OzoneP     PM2_5  ...  Elderly65  Hispanic  \
0  69.162885  0.034190  10.566273  7.567724  ...    12.5028   68.9210   
1  70.637922  0.035217  11.561917  7.624775  ...     5.3519   78.6229   
2  61.069087  0.034190  10.566273  7.548835  ...    12.8857   65.7214   
3   5.988401  0.036244  13.615432  7.660570  ...    14.4128   22.9537   
4  23.121533  0.036244  13.615432  7.663210  ...    18.8872   33.4082   

     White  AfricanAm  NativeAm  OtherMult    Shape_Leng    Shape_Area  \
0  20.8899     0.4004    0.2670     1.3126   6999.357689  2.847611e+06   
1  13.2240     2.5051    0.0000     0.9489  19100.578232  1.635292e+07   
2  30.6088     0.9591    0.0000     2.1685   4970.985897  1.352329e+06   
3  69.1948     0.9342    0.7117     2.5356   6558.956012  2.417717e+06   
4  59.7804     0.6986    1.4721     1.3723   6570.368730  2.608422e+06   

     AAPI                                           geometry  
0  8.2091  POLYGON ((-39795.07 -341919.191, -38126.384 -3...  
1  4.6990  POLYGON ((-39795.07 -341919.191, -39803.632 -3...  
2  0.5421  POLYGON ((-38115.747 -341130.248, -38126.384 -...  
3  3.6699  POLYGON ((-37341.662 -348530.437, -37252.307 -...  
4  3.2685  POLYGON ((-39465.107 -348499.262, -38244.305 -...  

[5 rows x 67 columns]
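
The `Tract` column prints in scientific notation because it was read as a float. If tract identifiers are needed later as join keys, one option is to convert them to zero-padded strings. This is a minimal sketch on made-up values, assuming the IDs are 11-digit census tract GEOIDs whose leading zero was lost on import; `tract_to_geoid` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def tract_to_geoid(tract: pd.Series) -> pd.Series:
    """Convert float tract IDs to 11-character zero-padded strings."""
    # Cast through nullable Int64 so missing values stay missing
    as_int = tract.round().astype("Int64")
    # Restore the leading zero dropped by the numeric dtype (assumed 11-digit GEOID)
    return as_int.astype("string").str.zfill(11)

# Illustrative values only, not taken from the shapefile
example = pd.Series([6083002103.0, 6037101110.0])
geoids = tract_to_geoid(example)
print(geoids.tolist())
```

On the real data this would be applied as `tract_to_geoid(gdf['Tract'])`, provided the assumption about the ID format holds.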

# Check which columns in the dataset contain 'County'
print("Columns containing 'County':")
print([col for col in data.columns if 'County' in col])

# Aggregate cost, GHG reductions, and DAC funding by county.
# 'project_category' is only created in a later cell, so include it in the
# grouping only if that cell has already been run.
group_cols = ['County', 'project_category'] if 'project_category' in data.columns else ['County']
geographic_analysis = data.groupby(group_cols).agg({
    'Total Project Cost': 'sum',
    'Total Project GHGReductions': 'sum',
    'Total GGRFDisadvantaged Community Funding': 'sum'
}).reset_index()

# Rest of the analysis:
# 1. First look at program distributions
print("\nProgram Types:")
print(data['Program Name'].value_counts().head(10))
print("\nProject Types:")
print(data['Project Type'].value_counts())
print("\nFunding Recipient Types:")
print(data['PRIMARY_FUNDING_RECIPIENT_TYPE'].value_counts())

Columns containing 'County':
['County']

Program Types:
Program Name
Low Carbon Transportation                                                    105965
Funding Agricultural Replacement Measures for Emission Reductions Program      8554
Low-Income Weatherization Program                                              6468
Community Air Protection                                                       5514
Water-Energy Efficiency                                                        5111
Climate Smart Agriculture                                                      1688
Low Carbon Transit Operations Program                                          1003
Transformative Climate Communities                                              973
Urban and Community Forestry Program                                            967
Woodsmoke Reduction Program                                                     959
Name: count, dtype: int64

Project Types:
Project Type
Voucher                                                2808
Voucher Incentive                                      1730
Air Filtration (Air Filtration)                        1570
Water Use and Energy Efficiency                        1217
Off-Road Equipment - Mobile Agricultural (Tractors)     891
                                                       ... 
Procurement Requirement                                   1
Education and outreach and enforcement/inspections        1
Personnel, education and outreach, and procurement        1
Education & Outreach and Planning                         1
Equipment, Education and Outreach                         1
Name: count, Length: 1007, dtype: int64

Funding Recipient Types:
PRIMARY_FUNDING_RECIPIENT_TYPE
Business                            423
Government Agency                   285
Nonprofit                           113
Educational institution              21
California Native American Tribe      4
Government agency                     2
Name: count, dtype: int64
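
The recipient counts above list `Government Agency` and `Government agency` as separate values. One way to merge such variants before grouping is to normalize whitespace and case. The sketch below uses toy labels rather than the real column, and the name `recipient_clean` is only illustrative.

```python
import pandas as pd

# Toy labels mirroring the inconsistent casing seen in the counts above
recipient = pd.Series(
    ["Government Agency", "Government agency", "Nonprofit", None, " Business "]
)

# Strip surrounding whitespace and lowercase; missing values stay missing
recipient_clean = recipient.str.strip().str.lower()
print(recipient_clean.value_counts())
```

Comparisons against the cleaned column then need lowercase labels, for example `'government agency'`.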

# 1. Fix the classification function
def classify_project_type_v2(row):
    program = str(row['Program Name']).lower()
    project_type = str(row['Project Type']).lower()
    
    # Individual incentive programs
    if program == 'low carbon transportation':
        if 'voucher' in project_type:
            return 'Individual - Vehicle'
        return 'Transportation - Other'
    elif program in ['low-income weatherization program', 'woodsmoke reduction program']:
        return 'Individual - Home Energy'
    elif program == 'water-energy efficiency':
        return 'Water-Energy Projects'
    elif program == 'community air protection':
        return 'Air Quality Projects'
    elif program == 'climate smart agriculture':
        return 'Agricultural Projects'
    elif 'voucher' in project_type:
        return 'Individual - Other'
    elif program == 'transformative climate communities':
        return 'Community Projects'
    
    # Institutional by recipient type
    elif pd.notna(row['PRIMARY_FUNDING_RECIPIENT_TYPE']):
        recipient = str(row['PRIMARY_FUNDING_RECIPIENT_TYPE'])
        if recipient in ['Government Agency', 'Government agency', 'California Native American Tribe']:
            return 'Institutional - Government'
        elif recipient == 'Educational institution':
            return 'Institutional - Education'
        elif recipient == 'Nonprofit':
            return 'Institutional - Nonprofit'
        elif recipient == 'Business':
            if row['Total Project Cost'] > 100000:
                return 'Institutional - Large Business'
            return 'Business - Small'
    
    return 'Other'

# 2. Add safe division function for metrics
def safe_divide(a, b):
    if b == 0:
        return 0
    return a / b

# 3. Recalculate metrics
data['project_category'] = data.apply(classify_project_type_v2, axis=1)
data['dac_funding_ratio'] = data.apply(lambda x: safe_divide(x['Total GGRFDisadvantaged Community Funding'], x['Total Project Cost']), axis=1)
data['ghg_per_dollar'] = data.apply(lambda x: safe_divide(x['Total Project GHGReductions'], x['Total Project Cost']), axis=1)

# 4. Rerun analysis
updated_analysis = data.groupby('project_category').agg({
    'Total Project Cost': ['count', 'sum', 'mean'],
    'Total Project GHGReductions': ['sum', 'mean'],
    'dac_funding_ratio': ['mean', 'median'],
    'ghg_per_dollar': ['mean', 'median']
}).round(4)

print("\nUpdated Project Category Distribution:")
print(data['project_category'].value_counts())
print("\nUpdated Analysis:")
print(updated_analysis)

Updated Project Category Distribution:
project_category
Transportation - Other            101427
Other                              14340
Individual - Home Energy            7427
Air Quality Projects                5514
Water-Energy Projects               5111
Individual - Vehicle                4538
Agricultural Projects               1688
Community Projects                   973
Institutional - Government           269
Institutional - Nonprofit            100
Institutional - Large Business        23
Institutional - Education             17
Business - Small                       2
Name: count, dtype: int64

Updated Analysis:
                               Total Project Cost                              \
                                            count           sum          mean   
project_category                                                                
Agricultural Projects                        1688     886514974  5.251866e+05   
Air Quality Projects                         5514    1211608583  2.197331e+05   
Business - Small                                2         65376  3.268800e+04   
Community Projects                            973     508817727  5.229370e+05   
Individual - Home Energy                     7427     320008647  4.308720e+04   
Individual - Vehicle                         4538    1286418141  2.834769e+05   
Institutional - Education                      17       5561672  3.271572e+05   
Institutional - Government                    269   16939974420  6.297388e+07   
Institutional - Large Business                 23     513921827  2.234443e+07   
Institutional - Nonprofit                     100    1402247011  1.402247e+07   
Other                                       14340  104572013338  7.292330e+06   
Transportation - Other                     101427    2473090031  2.438296e+04   
Water-Energy Projects                        5111      85125916  1.665543e+04   

                               Total Project GHGReductions              \
                                                       sum        mean   
project_category                                                         
Agricultural Projects                             24249378  14365.7453   
Air Quality Projects                                250801     45.4844   
Business - Small                                         0      0.0000   
Community Projects                                  154010    158.2837   
Individual - Home Energy                            693914     93.4313   
Individual - Vehicle                                993992    219.0375   
Institutional - Education                              457     26.8824   
Institutional - Government                         6580289  24462.0409   
Institutional - Large Business                      388663  16898.3913   
Institutional - Nonprofit                          4791834  47918.3400   
Other                                             67313019   4694.0738   
Transportation - Other                             3312930     32.6632   
Water-Energy Projects                               425225     83.1980   

                               dac_funding_ratio         ghg_per_dollar  \
                                            mean  median           mean   
project_category                                                          
Agricultural Projects                     0.2388  0.0000         0.0106   
Air Quality Projects                         NaN     NaN         0.0002   
Business - Small                             NaN     NaN         0.0000   
Community Projects                           NaN     NaN         0.0004   
Individual - Home Energy                  0.6504  0.7405         0.0038   
Individual - Vehicle                      0.1660  0.0245         0.0017   
Institutional - Education                    NaN     NaN         0.0008   
Institutional - Government                0.0000  0.0000         0.0022   
Institutional - Large Business               NaN     NaN         0.0038   
Institutional - Nonprofit                 0.0000  0.0000         0.0182   
Other                                     0.1483  0.0000         0.0023   
Transportation - Other                    0.4190  0.0000         0.0027   
Water-Energy Projects                     0.4359  0.0000         0.1027   

                                        
                                median  
project_category                        
Agricultural Projects           0.0051  
Air Quality Projects            0.0000  
Business - Small                0.0000  
Community Projects              0.0000  
Individual - Home Energy        0.0026  
Individual - Vehicle            0.0010  
Institutional - Education       0.0000  
Institutional - Government      0.0001  
Institutional - Large Business  0.0005  
Institutional - Nonprofit       0.0004  
Other                           0.0001  
Transportation - Other          0.0032  
Water-Energy Projects           0.0096
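
The ratios above are computed row by row with `DataFrame.apply`, which can be slow on a table of this size. A vectorized version is sketched below on toy data. It is intended to mirror `safe_divide` (a zero denominator gives 0, missing inputs stay `NaN`), but `safe_divide_vec` is a hypothetical helper and has not been checked against the full dataset.

```python
import numpy as np
import pandas as pd

def safe_divide_vec(num: pd.Series, den: pd.Series) -> pd.Series:
    """Element-wise num / den, returning 0 where the denominator is 0."""
    # Swap zero denominators for NaN so the division never produces inf
    ratio = num / den.replace(0, np.nan)
    # Put back the 0 that safe_divide returns for a zero denominator
    return ratio.mask(den == 0, 0.0)

toy = pd.DataFrame({
    "dac_funding": [50.0, 10.0, np.nan, 5.0],
    "cost": [100.0, 0.0, 200.0, np.nan],
})
toy["ratio"] = safe_divide_vec(toy["dac_funding"], toy["cost"])
print(toy)
```

Rows with a missing numerator or denominator remain `NaN`, and pandas skips them when taking a group mean. A category whose ratios are all missing therefore shows `NaN` in the summary table.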

import matplotlib.pyplot as plt
import seaborn as sns

# Prepare data for plotting
plot_data = data.groupby('project_category').agg({
    'Total Project Cost': ['count', 'mean'],
    'dac_funding_ratio': 'mean',
    'ghg_per_dollar': 'mean'
}).reset_index()

# Flatten column names
plot_data.columns = ['project_category', 'count', 'avg_cost', 'dac_ratio', 'ghg_efficiency']

# Create figure with multiple subplots
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(15, 20))

# 1. Project Counts
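sns.barplot(data=plot_data, 
           x='count', 
           y='project_category', 
           hue='project_category',  # assign hue explicitly, as seaborn >= 0.13 expects with palette
           legend=False,
           ax=ax1,
           palette='viridis')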
ax1.set_title('Number of Projects by Category')
ax1.set_xlabel('Number of Projects')

# 2. DAC Funding Ratio
mask = plot_data['dac_ratio'].notna()  # Filter out NaN values
sns.barplot(data=plot_data[mask], 
           x='dac_ratio', 
           y='project_category', 
           hue='project_category',
           legend=False,
           ax=ax2,
           palette='viridis')
ax2.set_title('Disadvantaged Community Funding Ratio by Category')
ax2.set_xlabel('DAC Funding Ratio')

# 3. GHG Efficiency
mask = plot_data['ghg_efficiency'].notna()  # Filter out NaN values
sns.barplot(data=plot_data[mask], 
           x='ghg_efficiency', 
           y='project_category', 
           hue='project_category',
           legend=False,
           ax=ax3,
           palette='viridis')
ax3.set_title('GHG Reduction Efficiency by Category (tons CO2e per dollar)')
ax3.set_xlabel('GHG Reduction per Dollar')

plt.tight_layout()
plt.show()

# Create a scatter plot to show relationship between project size and efficiency
plt.figure(figsize=(12, 8))
plt.scatter(plot_data['avg_cost'], 
           plot_data['ghg_efficiency'], 
           alpha=0.6)

# Add labels for each point
for i, row in plot_data.iterrows():
    plt.annotate(row['project_category'], 
                (row['avg_cost'], row['ghg_efficiency']),
                xytext=(5, 5), 
                textcoords='offset points')

plt.xscale('log')  # Use log scale for cost due to wide range
plt.xlabel('Average Project Cost (log scale)')
plt.ylabel('GHG Reduction Efficiency (tons CO2e per dollar)')
plt.title('Project Cost vs GHG Reduction Efficiency')
plt.grid(True)
plt.show()


# First let's see how many counties we have data for and their distribution
county_analysis = data.groupby(['County', 'project_category']).agg({
    'Total Project Cost': ['count', 'sum'],
    'Total Project GHGReductions': 'sum',
    'dac_funding_ratio': 'mean'
}).round(2)

# Reset index for easier manipulation
county_analysis = county_analysis.reset_index()

# Flatten column names
county_analysis.columns = ['County', 'project_category', 'project_count', 'total_cost', 'total_ghg', 'dac_ratio']

print("\nTop 10 Counties by Project Count:")
print(data['County'].value_counts().head(10))

print("\nProject Categories by County (top 5 counties):")
top_counties = data['County'].value_counts().head(5).index
for county in top_counties:
    print(f"\n{county}:")
    county_cats = data[data['County'] == county]['project_category'].value_counts()
    print(county_cats)

# Calculate each category's share of total project cost within each county (rows sum to 100)
county_category_pcts = pd.crosstab(data['County'], 
                                 data['project_category'], 
                                 values=data['Total Project Cost'],
                                 aggfunc='sum',
                                 normalize='index') * 100

print("\nPercentage of Funding by Category in Top 5 Counties:")
print(county_category_pcts.loc[top_counties].round(2))

Top 10 Counties by Project Count:
County
Los Angeles       38032
Orange            10694
San Diego          8468
Fresno             7854
Santa Clara        7154
Alameda            6381
Riverside          6190
San Bernardino     5159
Sacramento         4781
San Joaquin        4202
Name: count, dtype: int64

Project Categories by County (top 5 counties):

Los Angeles:
project_category
Transportation - Other            30411
Individual - Home Energy           2233
Individual - Vehicle               2187
Water-Energy Projects              1945
Other                               794
Air Quality Projects                352
Community Projects                   79
Institutional - Government           19
Institutional - Nonprofit             6
Agricultural Projects                 3
Institutional - Large Business        3
Name: count, dtype: int64

Orange:
project_category
Transportation - Other        9866
Individual - Vehicle           290
Individual - Home Energy       184
Water-Energy Projects          169
Other                          129
Air Quality Projects            51
Institutional - Government       4
Community Projects               1
Name: count, dtype: int64

San Diego:
project_category
Transportation - Other        7571
Other                          375
Individual - Vehicle           294
Air Quality Projects           105
Individual - Home Energy        83
Agricultural Projects           30
Institutional - Government       6
Institutional - Nonprofit        3
Community Projects               1
Name: count, dtype: int64

Fresno:
project_category
Transportation - Other            2541
Other                             2144
Air Quality Projects              1747
Individual - Home Energy           769
Water-Energy Projects              338
Agricultural Projects              142
Individual - Vehicle                96
Community Projects                  64
Institutional - Government           7
Institutional - Education            3
Institutional - Large Business       2
Institutional - Nonprofit            1
Name: count, dtype: int64

Santa Clara:
project_category
Transportation - Other            6214
Water-Energy Projects              530
Individual - Vehicle               177
Other                               89
Individual - Home Energy            74
Air Quality Projects                50
Agricultural Projects               14
Institutional - Government           3
Institutional - Nonprofit            2
Institutional - Large Business       1
Name: count, dtype: int64

Percentage of Funding by Category in Top 5 Counties:
project_category  Agricultural Projects  Air Quality Projects  \
County                                                          
Los Angeles                        0.00                  0.36   
Orange                             0.00                  1.44   
San Diego                          0.05                  1.61   
Fresno                             6.12                 10.44   
Santa Clara                        0.03                  0.61   

project_category  Business - Small  Community Projects  \
County                                                   
Los Angeles                    0.0                0.14   
Orange                         0.0                0.15   
San Diego                      0.0                0.00   
Fresno                         0.0               17.55   
Santa Clara                    0.0                0.00   

project_category  Individual - Home Energy  Individual - Vehicle  \
County                                                             
Los Angeles                           0.06                  0.91   
Orange                                0.09                  2.90   
San Diego                             0.10                  1.45   
Fresno                                4.74                  2.50   
Santa Clara                           0.12                  2.07   

project_category  Institutional - Education  Institutional - Government  \
County                                                                    
Los Angeles                            0.00                        5.35   
Orange                                 0.00                       37.65   
San Diego                              0.00                        2.90   
Fresno                                 0.02                        0.73   
Santa Clara                            0.00                       15.35   

project_category  Institutional - Large Business  Institutional - Nonprofit  \
County                                                                        
Los Angeles                                 0.26                       0.71   
Orange                                      0.00                       0.00   
San Diego                                   0.00                       2.54   
Fresno                                      4.07                       0.01   
Santa Clara                                 3.88                       0.04   

project_category  Other  Transportation - Other  Water-Energy Projects  
County                                                                  
Los Angeles       91.48                    0.73                   0.00  
Orange            51.20                    6.46                   0.12  
San Diego         88.01                    3.33                   0.00  
Fresno            47.73                    5.73                   0.36  
Santa Clara       73.37                    4.43                   0.09
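
The percentages above come from `pd.crosstab` with `values`, `aggfunc='sum'`, and `normalize='index'`: costs are summed per county and category, then each row is divided by its county total. The toy example below shows the same call on made-up numbers so the arithmetic can be checked by hand.

```python
import numpy as np
import pandas as pd

# Made-up costs for two counties and three categories
toy = pd.DataFrame({
    "County": ["A", "A", "A", "B", "B"],
    "category": ["Vehicle", "Vehicle", "Other", "Other", "Home"],
    "cost": [100.0, 300.0, 600.0, 50.0, 150.0],
})

# Sum cost per (County, category), then divide each row by its county total
pct = pd.crosstab(
    toy["County"], toy["category"],
    values=toy["cost"], aggfunc="sum", normalize="index",
) * 100
print(pct.round(1))
```

County A has 400 of 1000 in `Vehicle`, so that cell is 40 percent, and every row sums to 100.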

# Create a stacked bar chart for project distribution across top counties.
# Pass figsize to DataFrame.plot directly; a separate plt.figure() call would
# leave an empty figure behind because DataFrame.plot creates its own.
ax = county_category_pcts.loc[top_counties].plot(kind='bar', stacked=True, figsize=(15, 8))
ax.set_title('Distribution of Project Funding by Category Across Top Counties')
ax.set_xlabel('County')
ax.set_ylabel('Percentage of Total Funding')
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

# Create a visualization for DAC funding ratio by county
plt.figure(figsize=(15, 8))
sns.boxplot(data=data[data['dac_funding_ratio'].notna()], 
            x='County', 
            y='dac_funding_ratio',
            order=top_counties)
plt.title('DAC Funding Ratio Distribution by County')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


# First, let's create a clearer categorization focused on our question
def classify_for_ej_analysis(row):
    program = str(row['Program Name']).lower()
    project_type = str(row['Project Type']).lower()
    
    # Individual incentives
    if program == 'low carbon transportation' and 'voucher' in project_type:
        return 'Individual - Vehicle'
    elif program == 'low-income weatherization program':
        return 'Individual - Solar/Energy'
    elif program == 'woodsmoke reduction program':
        return 'Individual - Solar/Energy'
    # Large projects
    elif row['Total Project Cost'] > 100000:  # threshold for "large" projects
        return 'Large Projects'
    else:
        return 'Other'

data['ej_category'] = data.apply(classify_for_ej_analysis, axis=1)

# Analyze environmental justice metrics by category
ej_analysis = data.groupby('ej_category').agg({
    'Total Project Cost': ['count', 'sum', 'mean'],
    'dac_funding_ratio': ['mean', 'median'],
    'Is Benefit Disadvantaged Communities': 'mean',
    'Is Low Income Communities': 'mean',
    'Total Project GHGReductions': 'sum'
}).round(4)

print("Environmental Justice Analysis by Category:")
print(ej_analysis)

# Calculate percentage of funds going to different income communities
ej_distribution = data.groupby('ej_category').agg({
    'Total Project Cost': 'sum',
    'Total GGRFDisadvantaged Community Funding': 'sum',
    'Low Income Amount': 'sum'
})

ej_distribution['DAC_percentage'] = (ej_distribution['Total GGRFDisadvantaged Community Funding'] / 
                                   ej_distribution['Total Project Cost'] * 100)
ej_distribution['LowIncome_percentage'] = (ej_distribution['Low Income Amount'] / 
                                         ej_distribution['Total Project Cost'] * 100)

print("\nPercentage of Funding to Disadvantaged and Low-Income Communities:")
print(ej_distribution[['DAC_percentage', 'LowIncome_percentage']].round(2))

# Visualize the distribution
plt.figure(figsize=(12, 6))
bar_width = 0.35
categories = ej_distribution.index

x = np.arange(len(categories))
plt.bar(x - bar_width/2, ej_distribution['DAC_percentage'], 
        bar_width, label='DAC Funding %', alpha=0.8)
plt.bar(x + bar_width/2, ej_distribution['LowIncome_percentage'], 
        bar_width, label='Low Income Funding %', alpha=0.8)

plt.xlabel('Project Category')
plt.ylabel('Percentage of Total Funding')
plt.title('Distribution of Funding to Disadvantaged and Low-Income Communities')
plt.xticks(x, categories, rotation=45)
plt.legend()
plt.tight_layout()
plt.show()

# Let's also look at the per-project impact
impact_analysis = data.groupby('ej_category').agg({
    'Total Project GHGReductions': 'mean',
    'Total Project Cost': 'mean'
}).round(2)

impact_analysis['GHG_reduction_per_dollar'] = (impact_analysis['Total Project GHGReductions'] / 
                                             impact_analysis['Total Project Cost']).round(4)

print("\nImpact Analysis:")
print(impact_analysis)

Environmental Justice Analysis by Category:
                          Total Project Cost                              \
                                       count           sum          mean   
ej_category                                                                
Individual - Solar/Energy               7427     320008647  4.308720e+04   
Individual - Vehicle                    4538    1286418141  2.834769e+05   
Large Projects                         10009  127060954384  1.269467e+07   
Other                                 119455    1537986491  1.287503e+04   

                          dac_funding_ratio          \
                                       mean  median   
ej_category                                           
Individual - Solar/Energy            0.6504  0.7405   
Individual - Vehicle                 0.1660  0.0245   
Large Projects                       0.2761  0.0000   
Other                                0.4129  0.0000   

                          Is Benefit Disadvantaged Communities  \
                                                          mean   
ej_category                                                      
Individual - Solar/Energy                               0.7824   
Individual - Vehicle                                    0.3455   
Large Projects                                          0.0664   
Other                                                   0.3275   

                          Is Low Income Communities  \
                                               mean   
ej_category                                           
Individual - Solar/Energy                    0.9338   
Individual - Vehicle                         0.4125   
Large Projects                               0.5602   
Other                                        0.5708   

                          Total Project GHGReductions  
                                                  sum  
ej_category                                            
Individual - Solar/Energy                      693914  
Individual - Vehicle                           993992  
Large Projects                              103527784  
Other                                         3938822  

Percentage of Funding to Disadvantaged and Low-Income Communities:
                           DAC_percentage  LowIncome_percentage
ej_category                                                    
Individual - Solar/Energy           41.46                  4.97
Individual - Vehicle                 5.13                  2.01
Large Projects                       0.91                  1.47
Other                               10.11                 16.38

Impact Analysis:
                           Total Project GHGReductions  Total Project Cost  \
ej_category                                                                  
Individual - Solar/Energy                        93.43            43087.20   
Individual - Vehicle                            219.04           283476.89   
Large Projects                                10343.47         12694670.24   
Other                                            32.97            12875.03   

                           GHG_reduction_per_dollar  
ej_category                                          
Individual - Solar/Energy                    0.0022  
Individual - Vehicle                         0.0008  
Large Projects                               0.0008  
Other                                        0.0026

# 1. First, let's analyze the overall spatial distribution of investments
county_analysis = data.groupby('County').agg({
    'Total Project Cost': ['count', 'sum'],
    'Total Project GHGReductions': 'sum',
    'dac_funding_ratio': 'mean',
    'Is Benefit Disadvantaged Communities': 'mean',
    'Is Low Income Communities': 'mean'
}).round(4)

# Add per capita metrics (we would need to merge with county population data)
# Add GHG reduction per dollar by county

# 2. Let's analyze vulnerable communities specifically
vulnerability_analysis = pd.DataFrame({
    'Total Projects': [
        len(data),
        len(data[data['Is Benefit Disadvantaged Communities'] == 1]),
        len(data[data['Is Low Income Communities'] == 1])
    ],
    'Total Investment': [
        data['Total Project Cost'].sum(),
        data[data['Is Benefit Disadvantaged Communities'] == 1]['Total Project Cost'].sum(),
        data[data['Is Low Income Communities'] == 1]['Total Project Cost'].sum()
    ],
    'GHG Reduction': [
        data['Total Project GHGReductions'].sum(),
        data[data['Is Benefit Disadvantaged Communities'] == 1]['Total Project GHGReductions'].sum(),
        data[data['Is Low Income Communities'] == 1]['Total Project GHGReductions'].sum()
    ]
}, index=['All Communities', 'Disadvantaged Communities', 'Low Income Communities'])

# 3. Project type distribution in vulnerable communities
project_vulnerability = data.groupby(['project_category', 'Is Benefit Disadvantaged Communities']).agg({
    'Total Project Cost': ['count', 'sum'],
    'Total Project GHGReductions': 'sum'
}).round(4)

print("Overall Project Distribution by County:")
print(county_analysis.sort_values(('Total Project Cost', 'sum'), ascending=False).head(10))

print("\nVulnerable Communities Analysis:")
print(vulnerability_analysis)

print("\nProject Types in Vulnerable Communities:")
print(project_vulnerability)

# Visualizations
plt.figure(figsize=(15, 8))
# Plot investment distribution...
plt.subplot(1, 2, 1)
county_analysis[('Total Project Cost', 'sum')].sort_values(ascending=False).head(10).plot(kind='bar')
plt.title('Top 10 Counties by Total Investment')
plt.ylabel('Total Investment')
plt.xlabel('County')

# ...and GHG reductions
plt.subplot(1, 2, 2)
county_analysis[('Total Project GHGReductions', 'sum')].sort_values(ascending=False).head(10).plot(kind='bar')
plt.title('Top 10 Counties by GHG Reductions')
plt.ylabel('Total GHG Reductions')
plt.xlabel('County')

plt.tight_layout()
plt.show()

Overall Project Distribution by County:
                                                   Total Project Cost  \
                                                                count   
County                                                                  
Los Angeles                                                     38032   
Alameda                                                          6381   
Alameda, Contra Costa, Fresno, Kern, Kings, Mad...                  9   
Sacramento                                                       4781   
San Diego                                                        8468   
San Francisco                                                    2748   
Santa Clara                                                      7154   
San Francisco, Santa Clara                                          6   
Orange                                                          10694   
San Diego, Santa Barbara, Ventura                                  23   

                                                                 \
                                                            sum   
County                                                            
Los Angeles                                         69936060480   
Alameda                                             10379456877   
Alameda, Contra Costa, Fresno, Kern, Kings, Mad...   7333200000   
Sacramento                                           5858784824   
San Diego                                            4870580185   
San Francisco                                        3813460158   
Santa Clara                                          3178166479   
San Francisco, Santa Clara                           2998442533   
Orange                                               2383485407   
San Diego, Santa Barbara, Ventura                    2235315130   

                                                   Total Project GHGReductions  \
                                                                           sum   
County                                                                           
Los Angeles                                                           11307708   
Alameda                                                               10518799   
Alameda, Contra Costa, Fresno, Kern, Kings, Mad...                      546406   
Sacramento                                                             1975903   
San Diego                                                              2233426   
San Francisco                                                          1850982   
Santa Clara                                                            1793848   
San Francisco, Santa Clara                                             1538067   
Orange                                                                 1180528   
San Diego, Santa Barbara, Ventura                                       791432   

                                                   dac_funding_ratio  \
                                                                mean   
County                                                                 
Los Angeles                                                   0.5751   
Alameda                                                       0.2983   
Alameda, Contra Costa, Fresno, Kern, Kings, Mad...               NaN   
Sacramento                                                    0.4194   
San Diego                                                     0.1004   
San Francisco                                                 0.1057   
Santa Clara                                                   0.2247   
San Francisco, Santa Clara                                    0.0101   
Orange                                                        0.3994   
San Diego, Santa Barbara, Ventura                                NaN   

                                                   Is Benefit Disadvantaged Communities  \
                                                                                   mean   
County                                                                                    
Los Angeles                                                                      0.5420   
Alameda                                                                          0.2612   
Alameda, Contra Costa, Fresno, Kern, Kings, Mad...                               0.0000   
Sacramento                                                                       0.3403   
San Diego                                                                        0.0993   
San Francisco                                                                    0.1121   
Santa Clara                                                                      0.2200   
San Francisco, Santa Clara                                                       0.1667   
Orange                                                                           0.3571   
San Diego, Santa Barbara, Ventura                                                0.0000   

                                                   Is Low Income Communities  
                                                                        mean  
County                                                                        
Los Angeles                                                           0.6534  
Alameda                                                               0.5557  
Alameda, Contra Costa, Fresno, Kern, Kings, Mad...                    1.0000  
Sacramento                                                            0.5162  
San Diego                                                             0.5265  
San Francisco                                                         0.5761  
Santa Clara                                                           0.4911  
San Francisco, Santa Clara                                            0.8333  
Orange                                                                0.6461  
San Diego, Santa Barbara, Ventura                                     1.0000  

Vulnerable Communities Analysis:
                           Total Projects  Total Investment  GHG Reduction
All Communities                    141429      130205367663      109154512
Disadvantaged Communities           47167       14111685888       12626682
Low Income Communities              82595      103691488875       55739585

Project Types in Vulnerable Communities:
                                                                    Total Project Cost  \
                                                                                 count   
project_category               Is Benefit Disadvantaged Communities                      
Agricultural Projects          False                                              1490   
                               True                                                198   
Air Quality Projects           False                                              5514   
Business - Small               False                                                 2   
Community Projects             False                                               973   
Individual - Home Energy       False                                              1616   
                               True                                               5811   
Individual - Vehicle           False                                              2970   
                               True                                               1568   
Institutional - Education      False                                                17   
Institutional - Government     False                                               269   
Institutional - Large Business False                                                23   
Institutional - Nonprofit      False                                               100   
Other                          False                                             13788   
                               True                                                552   
Transportation - Other         False                                             64674   
                               True                                              36753   
Water-Energy Projects          False                                              2826   
                               True                                               2285   

                                                                                  \
                                                                             sum   
project_category               Is Benefit Disadvantaged Communities                
Agricultural Projects          False                                   824640269   
                               True                                     61874705   
Air Quality Projects           False                                  1211608583   
Business - Small               False                                       65376   
Community Projects             False                                   508817727   
Individual - Home Energy       False                                   117426101   
                               True                                    202582546   
Individual - Vehicle           False                                   839347111   
                               True                                    447071030   
Institutional - Education      False                                     5561672   
Institutional - Government     False                                 16939974420   
Institutional - Large Business False                                   513921827   
Institutional - Nonprofit      False                                  1402247011   
Other                          False                                 91784138193   
                               True                                  12787875145   
Transportation - Other         False                                  1888149934   
                               True                                    584940097   
Water-Energy Projects          False                                    57783551   
                               True                                     27342365   

                                                                    Total Project GHGReductions  
                                                                                            sum  
project_category               Is Benefit Disadvantaged Communities                              
Agricultural Projects          False                                                   22903912  
                               True                                                     1345466  
Air Quality Projects           False                                                     250801  
Business - Small               False                                                          0  
Community Projects             False                                                     154010  
Individual - Home Energy       False                                                     203507  
                               True                                                      490407  
Individual - Vehicle           False                                                     699758  
                               True                                                      294234  
Institutional - Education      False                                                        457  
Institutional - Government     False                                                    6580289  
Institutional - Large Business False                                                     388663  
Institutional - Nonprofit      False                                                    4791834  
Other                          False                                                   57935224  
                               True                                                     9377795  
Transportation - Other         False                                                    2392628  
                               True                                                      920302  
Water-Energy Projects          False                                                     226747  
                               True                                                      198478

/tmp/ipykernel_738944/3838036285.py:63: UserWarning: Tight layout not applied. The bottom and top margins cannot be made large enough to accommodate all Axes decorations.
  plt.tight_layout()
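
The county table above mixes single counties with multi-county rows (for example "San Francisco, Santa Clara"), and the cell leaves GHG reduction per dollar by county as a comment. The following is a minimal sketch of one way to handle both. It assumes multi-county names are comma-separated and that cost and reductions are split evenly across the listed counties; that even split is an assumption, not something the data confirms. The helper name `county_efficiency` and the toy values are hypothetical.

```python
import pandas as pd

def county_efficiency(df: pd.DataFrame) -> pd.DataFrame:
    """Split comma-separated county lists, share totals evenly, then aggregate."""
    cols = ['County', 'Total Project Cost', 'Total Project GHGReductions']
    out = df[cols].copy()
    out['County'] = out['County'].str.split(',')
    n_counties = out['County'].str.len()
    # Assumption: each listed county receives an equal share of the project.
    out['Total Project Cost'] = out['Total Project Cost'] / n_counties
    out['Total Project GHGReductions'] = out['Total Project GHGReductions'] / n_counties
    out = out.explode('County')
    out['County'] = out['County'].str.strip()
    totals = out.groupby('County')[cols[1:]].sum()
    # Mask zero-cost counties so the ratio is NaN rather than inf.
    cost = totals['Total Project Cost'].where(totals['Total Project Cost'] > 0)
    totals['ghg_per_dollar'] = totals['Total Project GHGReductions'] / cost
    return totals

# Toy data with made-up numbers, only to show the mechanics.
toy = pd.DataFrame({
    'County': ['Los Angeles', 'San Francisco, Santa Clara', 'Santa Clara'],
    'Total Project Cost': [1000.0, 400.0, 100.0],
    'Total Project GHGReductions': [10.0, 4.0, 3.0],
})
result = county_efficiency(toy)
print(result)
```

Splitting evenly keeps statewide totals unchanged while letting each county appear once in a ranking. Whether an even split is appropriate depends on how the multi-county projects were actually funded.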

import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np

# First, summarize the data at tract level
tract_summary = data.groupby('Census Tract').agg({
    'Total Project Cost': ['count', 'sum'],
    'Total Project GHGReductions': 'sum',
    'Is Benefit Disadvantaged Communities': 'mean',
    'Is Low Income Communities': 'mean'
}).round(4)

# Reset the index to make Census Tract a column
tract_summary = tract_summary.reset_index()

# Create four panels showing different tract-level metrics.
# Note: these are scatter plots of each metric against the tract ID, not
# geographic maps; a choropleth would require tract geometries.
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(20, 20))

# Helper: scatter one tract-level metric, colored by its value, with a colorbar
def create_map(df, column, title, ax, cmap):
    im = ax.scatter(df['Census Tract'],
                    df[column],
                    c=df[column],
                    cmap=cmap)
    plt.colorbar(im, ax=ax, label=title)
    ax.set_title(title)
    ax.axis('off')

# 1. Project Count Map
create_map(tract_summary, 
          ('Total Project Cost', 'count'),
          'Number of CCI Projects by Census Tract',
          ax1,
          'YlOrRd')

# 2. Total Investment Map
create_map(tract_summary,
          ('Total Project Cost', 'sum'),
          'Total CCI Investment by Census Tract',
          ax2,
          'viridis')

# 3. DAC Benefits Map
create_map(tract_summary,
          ('Is Benefit Disadvantaged Communities', 'mean'),
          'Proportion of Projects Benefiting DACs',
          ax3,
          'RdYlBu')

# 4. GHG Reductions Map
create_map(tract_summary,
          ('Total Project GHGReductions', 'sum'),
          'Total GHG Reductions by Census Tract',
          ax4,
          'YlGn')

plt.tight_layout()
plt.show()

# Let's also look at the basic statistics
print("\nSpatial Distribution Statistics:")
print("\nTop 10 Census Tracts by Total Investment:")
print(tract_summary.nlargest(10, ('Total Project Cost', 'sum')))

print("\nTop 10 Census Tracts by Project Count:")
print(tract_summary.nlargest(10, ('Total Project Cost', 'count')))

print("\nTop 10 Census Tracts by GHG Reductions:")
print(tract_summary.nlargest(10, ('Total Project GHGReductions', 'sum')))

# Additional visualization of investment intensity
plt.figure(figsize=(15, 10))
tract_summary['investment_per_project'] = tract_summary[('Total Project Cost', 'sum')] / tract_summary[('Total Project Cost', 'count')]
plt.scatter(tract_summary[('Is Benefit Disadvantaged Communities', 'mean')],
           tract_summary['investment_per_project'],
           alpha=0.5)
plt.xlabel('Proportion of Projects Benefiting DACs')
plt.ylabel('Investment per Project ($)')
plt.title('Investment Intensity vs DAC Benefits')
plt.show()

Spatial Distribution Statistics:

Top 10 Census Tracts by Total Investment:
      Census Tract Total Project Cost            Total Project GHGReductions  \
                                count        sum                         sum   
1686  6.037206e+09                369  128838282                       71147   
1701  6.037208e+09                246   79854037                       21955   
1250  6.037113e+09                115   67551314                       45625   
5715  6.073006e+09                 38   65309556                        1546   
3340  6.037901e+09                 61   37506825                       44030   
2391  6.037408e+09                207   34961176                       31158   
3482  6.037980e+09                 67   30672161                       23886   
6642  6.081602e+09                 61   26686653                       14972   
7164  6.085512e+09                 49   24118370                       17130   
2268  6.037403e+09                 81   23674333                       11183   

     Is Benefit Disadvantaged Communities Is Low Income Communities  
                                     mean                      mean  
1686                               0.5176                    0.0108  
1701                               0.7886                    0.0447  
1250                               0.0087                    0.0435  
5715                               0.0000                    0.5789  
3340                               0.0000                    0.9672  
2391                               0.8164                    0.7923  
3482                               0.0000                    0.0149  
6642                               0.0000                    0.9508  
7164                               0.0000                    0.0204  
2268                               0.3333                    0.8148  

Top 10 Census Tracts by Project Count:
      Census Tract Total Project Cost            Total Project GHGReductions  \
                                count        sum                         sum   
1686  6.037206e+09                369  128838282                       71147   
1701  6.037208e+09                246   79854037                       21955   
2391  6.037408e+09                207   34961176                       31158   
6443  6.077002e+09                132     616605                         650   
6448  6.077002e+09                121     401633                         284   
1250  6.037113e+09                115   67551314                       45625   
6452  6.077003e+09                113     438571                         580   
2670  6.037504e+09                103   13744994                       12914   
6449  6.077002e+09                102     534104                         625   
2276  6.037403e+09                 99    5585697                       12839   

     Is Benefit Disadvantaged Communities Is Low Income Communities  
                                     mean                      mean  
1686                               0.5176                    0.0108  
1701                               0.7886                    0.0447  
2391                               0.8164                    0.7923  
6443                               0.0985                    0.9848  
6448                               0.0661                    0.9835  
1250                               0.0087                    0.0435  
6452                               0.0885                    0.9469  
2670                               0.1942                    0.0583  
6449                               0.1275                    0.9804  
2276                               0.0000                    0.1818  

Top 10 Census Tracts by GHG Reductions:
      Census Tract Total Project Cost            Total Project GHGReductions  \
                                count        sum                         sum   
419   6.009000e+09                 13    6718556                      735067   
7330  6.091010e+09                  8    1842014                      730641   
7963  6.113011e+09                 21    9324110                      297675   
7411  6.095253e+09                 17    2631219                      163765   
4432  6.061022e+09                 13    1199979                       78245   
1686  6.037206e+09                369  128838282                       71147   
3677  6.053011e+09                 26    5634176                       53453   
1250  6.037113e+09                115   67551314                       45625   
3340  6.037901e+09                 61   37506825                       44030   
4166  6.059086e+09                 84   18951369                       32458   

     Is Benefit Disadvantaged Communities Is Low Income Communities  
                                     mean                      mean  
419                                0.0000                    0.1538  
7330                               0.0000                    0.7500  
7963                               0.0000                    0.1429  
7411                               0.0000                    0.2353  
4432                               0.0000                    0.0000  
1686                               0.5176                    0.0108  
3677                               0.7692                    0.3462  
1250                               0.0087                    0.0435  
3340                               0.0000                    0.9672  
4166                               0.3214                    0.2262
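
The tract IDs in the tables above print as floats (for example `6.037206e+09`), which hides the leading zero of California's state FIPS code (06) and makes joins to tract-level geometries or other census tables error-prone. A minimal sketch of converting them to 11-character GEOID strings follows. It assumes the IDs are whole numbers stored as floats, with NaN where the tract is missing; the helper name `to_geoid` and the sample values are hypothetical.

```python
import pandas as pd

def to_geoid(tract: pd.Series) -> pd.Series:
    """Convert numeric census tract IDs to 11-character GEOID strings."""
    # Nullable Int64 keeps missing tracts as <NA> instead of forcing floats.
    as_int = pd.to_numeric(tract, errors='coerce').round().astype('Int64')
    # zfill restores the leading zero of the state FIPS code.
    return as_int.astype('string').str.zfill(11)

# Sample values are illustrative, not taken from the dataset.
toy = pd.Series([6037206010.0, 6073006000.0, float('nan')])
geoids = to_geoid(toy)
print(geoids)
```

Applying this before the `groupby('Census Tract')` step would also keep the IDs readable in printed tables.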

# 1. Create program efficiency metrics
program_efficiency = data.groupby('project_category').agg({
    'Total Project GHGReductions': 'sum',
    'Total Project Cost': 'sum'
}).assign(
    efficiency = lambda x: x['Total Project GHGReductions'] / x['Total Project Cost']
)

# 2. Analyze spatial equity
spatial_equity = merged_data.groupby('CES_quartile').agg({
    'Total Project Cost': ['sum', 'mean', 'count'],
    'Total Project GHGReductions': ['sum', 'mean']
})

# 3. Create temporal analysis
temporal_patterns = data.groupby(['project_category', 'Date Operational']).agg({
    'Total Project Cost': 'sum',
    'Is Benefit Disadvantaged Communities': 'mean'
})
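
Grouping on the raw `Date Operational` values produces one group per distinct date, which can be too fine-grained to show trends. A minimal sketch of aggregating by calendar year instead is shown below. It assumes `Date Operational` can be parsed by `pd.to_datetime`; rows with unparseable or missing dates are dropped. The helper name `yearly_patterns` and the toy rows are hypothetical.

```python
import pandas as pd

def yearly_patterns(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate cost and DAC share by project category and calendar year."""
    out = df.copy()
    # Unparseable dates become NaT, so their year is NaN and is dropped below.
    out['year'] = pd.to_datetime(out['Date Operational'], errors='coerce').dt.year
    return (
        out.dropna(subset=['year'])
           .astype({'year': int})
           .groupby(['project_category', 'year'])
           .agg(total_cost=('Total Project Cost', 'sum'),
                dac_share=('Is Benefit Disadvantaged Communities', 'mean'))
    )

# Toy rows with made-up values, only to show the mechanics.
toy = pd.DataFrame({
    'project_category': ['A', 'A', 'A', 'B'],
    'Date Operational': ['2019-03-01', '2019-11-15', '2020-06-30', None],
    'Total Project Cost': [100, 200, 50, 999],
    'Is Benefit Disadvantaged Communities': [True, False, True, True],
})
yearly = yearly_patterns(toy)
print(yearly)
```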

	Census Tract	Total Project Cost	Total Program GGRFFunding	Total Project GHGReductions	Annual Project GHGReductions	Project Count	Total GGRFDisadvantaged Community Funding	Funding Benefiting Disadvantaged Communities	Estimated Num Vehicles In Service	Funding Within Disadvantage Communities	...	Indirect Jobs Fte	Induced Jobs Fte	Compost Produced Tons	Compost Produced Tons Yr	Net Density DUA	Applicants Assisted	Invasive Cover 12 Months	Invasive Cover 36 Months	Project Acreage	Intermediary Admin Expenses Calc
count	1.193700e+05	1.414290e+05	1.414290e+05	1.414290e+05	141429.000000	141429.000000	5.518700e+04	55187.000000	141429.000000	5.518700e+04	...	141429.000000	141429.000000	141429.000000	141429.0	141429.000000	141429.000000	141429.000000	141429.000000	1.414290e+05	1.414290e+05
mean	6.053889e+09	9.206412e+05	7.791664e+04	7.717972e+02	0.205389	4.090872	2.736820e+04	110.217551	0.047331	2.030028e+04	...	0.064567	0.117056	0.440977	0.0	0.082260	0.019642	0.010656	0.010578	9.771087e+00	1.911114e+03
std	2.641870e+07	3.736191e+07	1.011645e+06	2.371604e+04	3.361723	18.381861	6.327936e+05	1738.772195	1.972262	5.590536e+05	...	1.444316	2.261146	45.712955	0.0	4.382593	1.406914	0.758250	0.757677	3.669526e+03	1.196317e+05
min	6.001400e+09	0.000000e+00	0.000000e+00	-2.785930e+05	0.000000	0.000000	0.000000e+00	0.000000	0.000000	0.000000e+00	...	0.000000	0.000000	0.000000	0.0	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00
25%	6.037271e+09	4.000000e+03	3.500000e+03	8.000000e+00	0.000000	1.000000	0.000000e+00	0.000000	0.000000	0.000000e+00	...	0.000000	0.000000	0.000000	0.0	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00
50%	6.059022e+09	8.000000e+03	7.500000e+03	1.500000e+01	0.000000	1.000000	0.000000e+00	0.000000	0.000000	0.000000e+00	...	0.000000	0.000000	0.000000	0.0	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00
75%	6.073016e+09	2.150000e+04	1.900000e+04	4.000000e+01	0.000000	3.000000	5.500000e+03	0.000000	0.000000	1.500000e+03	...	0.000000	0.000000	0.000000	0.0	0.000000	0.000000	0.000000	0.000000	0.000000e+00	0.000000e+00
max	6.115041e+09	5.767700e+09	1.412670e+08	4.748924e+06	336.000000	2072.000000	6.443700e+07	102348.000000	503.000000	6.443700e+07	...	110.170000	151.000000	10365.000000	0.0	706.000000	320.000000	85.000000	85.000000	1.380000e+06	2.000000e+07
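
The summary statistics above show a minimum `Total Project Cost` of 0 and a negative minimum for `Total Project GHGReductions`, so efficiency ratios are worth computing defensively. The sketch below counts such rows per category and leaves the ratio as NaN where a category's total cost is zero. The helper name `efficiency_with_flags` and the toy values are hypothetical, and how zero-cost or negative-reduction records should be treated is a judgment call this sketch does not settle.

```python
import numpy as np
import pandas as pd

def efficiency_with_flags(df: pd.DataFrame) -> pd.DataFrame:
    """GHG reduction per dollar by category, with counts of suspect rows."""
    flagged = df.assign(
        zero_cost=df['Total Project Cost'].eq(0),
        negative_ghg=df['Total Project GHGReductions'].lt(0),
    )
    summary = flagged.groupby('project_category').agg(
        ghg=('Total Project GHGReductions', 'sum'),
        cost=('Total Project Cost', 'sum'),
        zero_cost_rows=('zero_cost', 'sum'),
        negative_ghg_rows=('negative_ghg', 'sum'),
    )
    # Mask zero totals so the ratio is NaN instead of inf.
    summary['efficiency'] = summary['ghg'] / summary['cost'].where(summary['cost'] != 0)
    return summary

# Toy rows with made-up values, only to show the mechanics.
toy = pd.DataFrame({
    'project_category': ['A', 'A', 'B'],
    'Total Project Cost': [0, 100, 0],
    'Total Project GHGReductions': [5, -1, 2],
})
summary = efficiency_with_flags(toy)
print(summary)
```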

Patterns of Equity in California's Climate Investments

Analysis of Spatial Distribution and Environmental Justice Implications

Key Findings: