Python – Sklearn – Chi2 – Categorical Ordinal

Presentation on Chi2 with categorical ordinal rankings (0-9) and a categorical binary target

Presentation

Code

Code (Case 3 is enabled; the other cases are commented out)

# -*- coding: utf-8 -*-
"""
Created on Sat Jan 20 16:23:59 2024

@author: aleja

A table with numeric categorical rankings and a binary categorical target.
The numeric rankings need to be converted to nominal (one-hot) columns before chi2.
"""

import numpy as np
from sklearn.feature_selection import chi2
import pandas as pd

#make the randomness repeatable for the presentation
np.random.seed(123)
num_of_rows=1000
#X data: values 0..9 represent a ranking of some kind
array = np.random.randint(0, 10, size=(num_of_rows, 3))
print(array)
#Target is a binary int (0 or 1), so it is categorical nominal

#=== leave exactly one Target case enabled and comment out the others
#Case1: Target is completely random
#Target = np.random.randint(0, 2,size=(num_of_rows, 1))

#Case3: force a dependency on ONE ranking value in col0 (value 2 always gives Target 1), all other rows random
Target =np.where(array[:,0]==2,1,(np.random.randint(0, 2,size=array[:,0].shape)))
#another way, but it needs Case1 enabled first: Target[array[:,0]==2] = 1

#Case2: make col0 fully determine the Target to verify chi2 is working;
#the chi2 score of each CO0 column should then be about 100, check the ranking
#Target =np.where(array[:,0]>4,1,0)


#add Target to the array
array = np.column_stack((array, Target))

#turn into dataframe
my_col_hdr=['Cat_ord0','Cat_ord1','Cat_ord2','Target']
df=pd.DataFrame(array,columns=my_col_hdr)
print(df)


#get_dummies handles numeric rankings too, just tell it which column to encode
df = pd.get_dummies(df,columns=['Cat_ord0'],prefix='CO0',dtype='int8')
df = pd.get_dummies(df,columns=['Cat_ord1'],prefix='CO1',dtype='int8')
df = pd.get_dummies(df,columns=['Cat_ord2'],prefix='CO2',dtype='int8')

#move the Target column to the end, as is conventional
#get_dummies drops each original column and appends the new dummy columns, which leaves Target as the first column
col_to_end='Target'

df=df[[c for c in df if c != col_to_end] + [col_to_end]]
print(df)

y = np.array(df['Target'])

X = np.array(df.drop(['Target'], axis=1))


#run the chi2 test now that the features are one-hot (nominal) columns
chi2_score,p_value=chi2(X,y)

# print the results in vertical form, one feature column per line
for i,col_name in enumerate(df.columns[:-1]):
    if i%10==0: print("column")
    print("%4.0f"%i,col_name," chi2:","%6.2f" % chi2_score[i]," p_val:","%0.4f" % p_value[i])
    if i%10==9: print("\n")
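
To make the scores printed by the script easier to interpret, here is a minimal sketch of how the chi2 score of a single dummy column can be reproduced by hand. It assumes that sklearn's chi2 compares the per-class sums of a column against the sums expected from the class shares, so only the 1s of a dummy column are counted. The column x, the target y, and every variable name below are made up for illustration and are not part of the script above.

```python
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.default_rng(0)

# Hypothetical one-hot column (1 in roughly 10% of the rows) and a binary target
# that is forced to 1 wherever the column is 1, similar in spirit to Case 3
x = (rng.integers(0, 10, size=1000) == 2).astype(int)
y = np.where(x == 1, 1, rng.integers(0, 2, size=1000))

# Observed: how many of the column's 1s fall into each target class
observed = np.array([x[y == 0].sum(), x[y == 1].sum()], dtype=float)

# Expected under independence: share of each class times the column's total of 1s
class_share = np.array([(y == 0).mean(), (y == 1).mean()])
expected = class_share * x.sum()

# Chi-squared statistic summed over the two target classes
manual_score = ((observed - expected) ** 2 / expected).sum()

# The same column scored by sklearn (chi2 expects a 2-D feature matrix)
sklearn_score, sklearn_p = chi2(x.reshape(-1, 1), y)

print("manual: %6.2f  sklearn: %6.2f" % (manual_score, sklearn_score[0]))
```

If the assumption holds, the two numbers agree. This also suggests why a dummy column whose 1s all land in one target class gets a large score even though most of its rows are 0.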

Notes

  • Presentation: Using Python and Sklearn chi2 with generated categorical ranking data, and using get_dummies to convert it to binary categorical columns for chi2 processing.
  • Presentation Title: Python-Chi2-Categorical ordinal-number Ranking
  • Programming Language used: Python 3.11.3 in Spyder 5.4.3, Presentation app: Microsoft’s PowerPoint
  • Helpful resources: Stack Overflow, search engines, chat AIs, and Python communities.
  • Python, Sklearn, Pandas, Stack Overflow, and Numpy are from their respective companies.
  • This presentation is shown to spark ideas for use.
  • This presentation is not connected to or endorsed by any company.
  • Use at your own risk.
  • Tags: Python, Python3.11, SkLearn, Chi2, get-dummies, Pandas, Numpy, array
  • Other Info: Chi2 is typically used for feature selection with categorical data, i.e. to find which columns are most important.
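
As a rough sketch of the feature-selection use mentioned in the last note, the chi2 scores can be handed to sklearn's SelectKBest to keep only the highest-scoring dummy columns. The data below is regenerated in the style of Case 3 with a different random generator, so the numbers will not match the script above. The choice of k=3 and all variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(123)
n_rows = 1000

# Three ranking columns with values 0..9
ranks = rng.integers(0, 10, size=(n_rows, 3))
# Case 3 style target: forced to 1 where column 0 equals 2, random elsewhere
target = np.where(ranks[:, 0] == 2, 1, rng.integers(0, 2, size=n_rows))

features = pd.DataFrame(ranks, columns=["Cat_ord0", "Cat_ord1", "Cat_ord2"])
# One-hot encode every ranking column so chi2 sees nominal 0/1 features
dummies = pd.get_dummies(features, columns=list(features.columns),
                         prefix=["CO0", "CO1", "CO2"], dtype="int8")

# Keep the k dummy columns with the highest chi2 scores (k=3 is arbitrary)
selector = SelectKBest(score_func=chi2, k=3).fit(dummies, target)
top_columns = list(dummies.columns[selector.get_support()])
best_column = dummies.columns[np.argmax(selector.scores_)]

print("kept columns:", top_columns)
print("highest chi2:", best_column)
```

With the forced dependency on value 2 of the first column, the dummy for that value (CO0_2) should come out on top, while the other kept columns are just whichever random columns happened to score highest.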
About LV_TS_Test_Engineer_3000_VI

Automated Test Equipment Software