πŸ€

One Hot Encoding

Tags
Python
ID matched
Created
Jan 6, 2023 09:41 AM
Last Updated
July 15, 2023

Concept

  • One-hot encoding is a way to transform a categorical variable: each category becomes a new feature that takes only the value 0 or 1.
    • For example, when classifying by blood type, the four types are not coded as 0 to 3 but as (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1).
  • It is used so that a model does not read importance into the magnitude of an arbitrary numeric code.
  • Keeping all n columns causes multicollinearity in models with an intercept, because the n columns always sum to 1. It is therefore better to create n-1 variables than n.
    • For blood type, this means using 3 variables instead of 4: (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0).
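
The two encodings described above can be sketched in plain Python. This is only an illustration, not the scikit-learn implementation: the `one_hot` helper, its `drop_last` flag, and the ordering of `BLOOD_TYPES` are assumptions made for this example.

```python
# Hypothetical category list; the order fixes which column each blood type maps to
BLOOD_TYPES = ["A", "B", "AB", "O"]

def one_hot(value, categories, drop_last=False):
    """Return a 0/1 tuple with a single 1 at the position of `value`."""
    vector = tuple(1 if value == category else 0 for category in categories)
    # Dropping the last column gives the n-1 encoding; the last category becomes all zeros
    return vector[:-1] if drop_last else vector

full = [one_hot(b, BLOOD_TYPES) for b in BLOOD_TYPES]
reduced = [one_hot(b, BLOOD_TYPES, drop_last=True) for b in BLOOD_TYPES]

print(full)     # [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
print(reduced)  # [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]

# In the full encoding every row sums to 1, so the last column is always
# 1 minus the sum of the others and carries no extra information.
assert all(row[-1] == 1 - sum(row[:-1]) for row in full)
```

The final check shows why the nth column is redundant: it can always be reconstructed from the other n-1 columns, which is the source of the multicollinearity.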

Code

  • Import the libraries
    • import numpy as np
      import pandas as pd
      import seaborn as sns
      import matplotlib.pyplot as plt
      from sklearn.preprocessing import OneHotEncoder
      from sklearn.linear_model import LinearRegression
  • Load the data
    • DF = sns.load_dataset('mpg')
      DF.info()
  • Check the values of a column that looks categorical
    • DF['origin'].unique()
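  • Try applying OneHotEncoder
    • # sparse_output=False returns a dense array; scikit-learn versions before 1.2 use sparse=False instead
      encoder = OneHotEncoder(sparse_output=False)
      encoded = encoder.fit_transform(DF[['origin']])
      encoded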
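  • Apply OneHotEncoder and merge the result into the data frame
    • def set_onehotencoding(df, column_name):
          # sparse_output replaces the older sparse argument (scikit-learn 1.2+)
          encoder = OneHotEncoder(sparse_output=False)
          # Encode the frame that was passed in (not the global DF) and drop the last category to keep n-1 columns
          encoded = encoder.fit_transform(df[[column_name]])[:, :-1]
          new_columns = ["%s%i" % (column_name, i) for i in range(encoded.shape[1])]
          # Reuse df's index so that concat aligns the rows correctly
          encoded_df = pd.DataFrame(encoded, columns=new_columns, index=df.index)
          df = pd.concat([df, encoded_df], axis=1)
          return df.drop(columns=[column_name])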
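  • Function to check the distribution of the one-hot encoded columns
    • def show_columns_unique(df, search_word):
          # Select every column whose name contains search_word
          columns = [column for column in df.columns if search_word in column]
          # Count how often each combination of 0/1 values occurs
          return df[columns].value_counts().reset_index(name='count')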
  • Apply the function to origin
    • DF = set_onehotencoding(DF, 'origin')
      DF.head()
  • Check the distribution of the one-hot encoded columns
    • show_columns_unique(DF, "origin")