Presentation on Chi2 with a categorical ordinal ranking (0-9) and a Categorical-Binary Target
Presentation
Code
Code (case 3 is enabled others commented out)
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 20 16:23:59 2024
@author: aleja
A table with categorical ranking columns (numeric 0-9) and a categorical binary target.
The numeric ranking columns must be converted to categorical nominal (one-hot) columns before running chi2.
"""
import numpy as np
from sklearn.feature_selection import chi2
import pandas as pd
#make the randomness repeatable for the presentation
np.random.seed(123)
num_of_rows=1000
#x data 0..9 represent ranking of some kind
array = np.random.randint(0, 10, size=(num_of_rows, 3))
print(array)
#Target is a binary int (0 or 1), so it is categorical nominal
#=== keep ONE target case active and comment out the others ===
#Case1: target is completely random, independent of every column
#Target = np.random.randint(0, 2,size=(num_of_rows, 1))
#Case3: force a dependency on ONE ranking value in col0 (rank 2 always gives 1), everything else random
Target =np.where(array[:,0]==2,1,(np.random.randint(0, 2,size=array[:,0].shape)))
#another way to get Case3: keep Case1 active and then run  Target[array[:,0]==2] = 1
#Case2: make col0 fully determine the target to verify chi2 is working;
#needs Case1 active first so Target exists; each CO0 dummy should then score a chi2 of roughly 100
#Target[:,0] =np.where(array[:,0]>4,1,0)
#add Target to the array
array = np.column_stack((array, Target))
#turn into dataframe
my_col_hdr=['Cat_ord0','Cat_ord1','Cat_ord2','Target']
df=pd.DataFrame(array,columns=my_col_hdr)
print(df)
#get_dummies can one-hot encode numeric ranking columns too, as long as the columns are named explicitly
df = pd.get_dummies(df,columns=['Cat_ord0'],prefix='CO0',dtype='int8')
df = pd.get_dummies(df,columns=['Cat_ord1'],prefix='CO1',dtype='int8')
df = pd.get_dummies(df,columns=['Cat_ord2'],prefix='CO2',dtype='int8')
#move the Target column to the end, which is the usual convention.
#get_dummies dropped each original column and appended its dummies, which left Target as the first column
col_to_end=df.columns[0]
df=df[[c for c in df if c != col_to_end] + [col_to_end]]
print(df)
y = np.array(df['Target'])
X = np.array(df.drop(['Target'], axis=1))
#run the chi2 test now that every feature column is categorical (0/1)
chi2_score,p_value=chi2(X,y)
# print the results in vertical form, in blocks of 10 columns
for i,col_name in enumerate(df.columns[:-1]):
    if i%10==0: print("column")
    print("%4d"%i,col_name," chi2:","%6.2f" % chi2_score[i]," p_val:","%0.4f" % p_value[i])
    if i%10==9: print("\n")
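To make the chi2 scores less of a black box, here is a minimal sketch that recomputes them by hand for a single ranking column and compares the result with `sklearn.feature_selection.chi2`. It assumes the score for each 0/1 dummy column is the sum over the target classes of (observed - expected)^2 / expected, where observed is the count of 1s in that class and expected is the class share times the column's total count of 1s. The names `rank`, `observed`, `expected`, and `manual_chi2` are illustrative and not part of the script above, and the sketch uses `np.random.default_rng` rather than `np.random.seed`, so its numbers will differ from the presentation's.

```python
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.default_rng(123)  # assumed seed, only for repeatability
n_rows = 1000

# One hypothetical ranking column (0..9), one-hot encoded by hand.
rank = rng.integers(0, 10, size=n_rows)
X = (rank[:, None] == np.arange(10)).astype(int)  # shape (n_rows, 10)

# Case 3 style target: always 1 where rank == 2, random elsewhere.
y = np.where(rank == 2, 1, rng.integers(0, 2, size=n_rows))

# Manual chi2 per dummy column.
Y = np.column_stack((1 - y, y))      # one indicator column per class (0, 1)
observed = Y.T @ X                   # (2, 10): count of 1s per class and column
feature_count = X.sum(axis=0)        # total count of 1s per column
class_prob = Y.mean(axis=0)          # share of rows in each class
expected = np.outer(class_prob, feature_count)
manual_chi2 = ((observed - expected) ** 2 / expected).sum(axis=0)

# Compare with scikit-learn's result.
sk_chi2, sk_p = chi2(X, y)
print(np.round(manual_chi2, 2))
print(np.round(sk_chi2, 2))
```

If the assumption holds, the two printed rows match, and the dummy for rank 2 stands out because every one of its rows lands in class 1.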
Notes
- Presentation: Using Python and Sklearn's chi2 on generated categorical ranking data, with get_dummies converting each ranking column into categorical binary columns for chi2 processing.
- Presentation Title:Python-ChI2-Categorical ordinal-number Ranking
- Programming Language used: Python 3.11.3 in Spyder 5.4.3, Presentation app: Microsoft's PowerPoint
- Helpful resources: Stackoverflow, search engines, chat AIs, and Python communities.
- Python, SkLearn, Pandas, Stackoverflow, and Numpy belong to their respective owners.
- Presentation shown to spark ideas of use.
- This presentation is not connected to or endorsed by any company.
- Use at your own risk.
- Tags: Python, Python3.11, SkLearn, Chi2 , get-dummies, Pandas, Numpy, array
- Other Info: Chi2 is typically used for feature selection with categorical data, that is, to find which columns matter most for the target.
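
The feature-selection use mentioned in the notes can be sketched with scikit-learn's `SelectKBest`, which keeps the `k` columns with the highest score. This is a minimal illustration on Case 3 style data, not part of the presentation's script: the choice of `k=3`, the `default_rng` seed, and the names `selector`, `scores`, and `selected` are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(123)  # assumed seed, only for repeatability
n_rows = 1000

# Three ranking columns (0..9) and a Case 3 style target:
# always 1 where Cat_ord0 == 2, random elsewhere.
data = rng.integers(0, 10, size=(n_rows, 3))
df = pd.DataFrame(data, columns=['Cat_ord0', 'Cat_ord1', 'Cat_ord2'])
df['Target'] = np.where(df['Cat_ord0'] == 2, 1,
                        rng.integers(0, 2, size=n_rows))

# One-hot encode all three ranking columns in a single call.
X = pd.get_dummies(df.drop(columns='Target'),
                   columns=['Cat_ord0', 'Cat_ord1', 'Cat_ord2'],
                   prefix=['CO0', 'CO1', 'CO2'], dtype='int8')
y = df['Target']

# Keep the k dummy columns with the highest chi2 score (k=3 is arbitrary).
selector = SelectKBest(score_func=chi2, k=3).fit(X, y)
scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
selected = list(X.columns[selector.get_support()])

print(scores.head(5))
print(selected)
```

With this target, `CO0_2` should come out on top by a wide margin; the other selected columns are just the largest of the random noise, so a small `k` is not automatically a list of meaningful features.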