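Concept
- One-hot encoding is a method for transforming a categorical variable: the variable is converted into one or more new features that each take the value 0 or 1.
- When classifying by blood type, for example, the four types are represented not as the integers 0 to 3 but as (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1).
- It is used so that a model does not read importance into how large or small the numeric codes are.
- Because n indicator columns always sum to 1, they cause a multicollinearity problem, so it is better to create n-1 variables than n.
- That is, blood type is handled with 3 variables instead of 4 and is represented as (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0).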
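The blood-type example above can be sketched in plain Python. The `one_hot` helper and the sample list are hypothetical and exist only for illustration; categories are ordered alphabetically, so the column order is an assumption of this sketch rather than something fixed by the technique.

```python
# Hypothetical helper for illustration; not part of any library.
def one_hot(values, drop_last=False):
    """Map each value to a tuple of 0/1 indicators, one per sorted category."""
    categories = sorted(set(values))
    # With drop_last=True only n-1 indicator columns are kept, so the
    # dropped category is encoded as all zeros.
    kept = categories[:-1] if drop_last else categories
    return [tuple(int(v == c) for c in kept) for v in values]


blood_types = ["A", "B", "AB", "O"]  # made-up sample, one row per type

full = one_hot(blood_types)                     # n = 4 columns
reduced = one_hot(blood_types, drop_last=True)  # n - 1 = 3 columns

for blood_type, row_full, row_reduced in zip(blood_types, full, reduced):
    print(blood_type, row_full, row_reduced)

# Every full row sums to 1, so the four columns together duplicate an
# intercept term; this is where the multicollinearity comes from.
print(all(sum(row) == 1 for row in full))
```

In the reduced form the last category ("O" here) is still identifiable, because it is the only row in which every indicator is 0.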
Code
- Import the libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
- Load the data
DF = sns.load_dataset('mpg')
DF.info()
- Check the values of a column that appears to be categorical
DF['origin'].unique()
- Try applying OneHotEncoder
# sparse_output needs scikit-learn 1.2+; older versions use sparse=False instead
encoder = OneHotEncoder(sparse_output=False)
df = encoder.fit_transform(DF[['origin']])
df
- Apply OneHotEncoder and merge the result into the data frame
def set_onehotencoding(df, column_name):
    encoder = OneHotEncoder(sparse_output=False)  # scikit-learn 1.2+
    # Encode the frame passed in (not the global DF) and drop the last category
    df1 = encoder.fit_transform(df[[column_name]])[:, :-1]
    names = [f"{column_name}{i}" for i in range(df1.shape[1])]
    df = pd.concat([df, pd.DataFrame(df1, columns=names, index=df.index)], axis=1)
    return df.drop(columns=[column_name])
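- Function for checking the distribution of the one-hot encoded columns
def show_columns_unique(df, search_word):
    # Select every column whose name contains the search word
    columns = [column for column in df.columns if search_word in column]
    # Count how often each combination of 0/1 values occurs
    return df[columns].value_counts().reset_index(name='count')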
- Apply the function to origin
DF = set_onehotencoding(DF, 'origin')
DF.head()
- Check the distribution of the one-hot encoded columns
show_columns_unique(DF, "origin")