分類問題を解く ~データの用意と可視化~
どうも,筆者です.
久しぶりの更新である.今回は,気象庁が公開している過去の気象データを用いて,分類問題を解く.まず,分類用データがないため,データの用意をする.
とある都市の気象データを以下に示す.
気温(℃),湿度(%),日照時間(h) 平均,平均,平均 20.9,47,9 20.5,40,14 20.7,47,13 21.5,65,3.3 19.3,75,4.2 21.6,68,8.3 21,74,0 22.6,70,10.5 22.8,79,1.4 24.5,62,13.4 24.3,62,10.8 23.2,75,0.2 21.7,87,0 24,70,7.1 24.2,70,3.7 21.7,84,0 24.1,70,3.8 25.9,54,13.4 22.7,81,0.2 25.1,71,5.3 25.2,74,6.3 24.4,83,0.6 24.8,76,7.6 22.5,79,0.4 23,86,0 23.3,66,8.6 24.2,60,8 20.7,83,0.2 22.3,89,0 25,81,1 27.7,70,11.2 29,67,9.3 27.8,75,3.1 27.9,73,7.3 27.9,68,9.5 25.9,77,3 29.9,63,11.6 27.1,74,0 22.3,94,0 26.7,74,9.4 29.3,51,12.4 28.2,64,0.6 27.1,83,0.5 28.1,69,9 25.5,66,5.5 25.6,72,0.9 27.1,75,0.5 28.2,60,13.5 28.7,53,13.8 28.5,58,13.5 27.7,65,7 26.3,71,3.5 25.1,68,2.1 25.4,70,3.7 26.3,64,4.8 22.7,87,0 24.9,85,1.7 26.8,74,3.2 27.2,69,6.8 28,63,5.8 29.1,61,9.8 27.9,74,8.8 26.5,79,6.4 27.9,73,7.5 29,70,10.1 29.3,69,11.2 29.8,67,10.1 29.5,67,8.7 31,56,12.8 30.1,55,11.9 28.9,51,11 28.5,62,11.7 28.3,66,6.2 28.4,63,5.6 28,73,3.5 27.9,74,2.8 29.8,61,10.2 30.9,62,7.5 29.3,73,5.6 29.6,69,10.5 29.6,68,8.5 29.1,68,9.3 30.9,62,5 30.4,61,7.7 27.8,76,2.3 28.1,71,4.2 28.9,65,9.2 26.6,73,1.4 25.3,84,0.8 26.9,75,6.1 24.3,67,7 26.8,55,12.1 23.8,64,5.3 19.9,57,8.9 19.3,47,12.8 20.2,43,13.8 20.9,41,14 21.1,58,8.9 19.4,78,0 19.3,82,2.6 22.6,57,10.7 23.1,57,8.6 21.7,39,6.2 20.7,44,12.7 20.7,47,13.6 21.9,47,14 23.3,46,13.8 22.8,48,13 22.3,55,8.9 22,65,0 25.6,48,13.7 25,57,7 21,86,0.2 23.5,59,4 24.7,63,12.6 24.4,70,5.4 21.9,90,0 25,60,6.8 24.6,72,1.5 24,81,3.2 24.8,76,0.7 22.9,90,0.5 26.6,79,2.7 27.8,74,5.5 29.1,67,7.2 26.9,78,0.8 26,73,1.7 26.9,61,12.7 28.4,55,13.6 28.9,56,11 27.6,73,0.1 27.5,77,3.6 28.2,73,6.7 28.6,76,3.5 27.9,79,5.6 27.9,74,6.3 28.8,66,9.2 29.7,63,8.6 27.8,72,2.6 29.3,61,10.8 28.2,67,8.8 28.5,69,9.1 28.9,67,7.5 29,68,8 27,81,0.4 28.6,75,3.4 27,89,0.1 29.3,78,4.9 26.3,82,0 28.5,73,5.5 29,72,4.3 27,81,0 29.2,72,8.7 29.8,65,7 27.8,66,3.3 27.7,69,3.3 29.3,69,6.1 29.2,75,3.1 29.9,70,10.2 27.3,83,0.1 27.4,73,6.6 29.7,57,12.5 29.5,67,3.5 27.1,89,0.9 27.9,74,7.3 28,62,4.8 27.5,66,0.6 24.9,85,0 25.9,85,1.2 28,74,4.6 26.7,85,3.7 28.2,70,9.8 28.1,73,6 28.5,72,6.3 28,74,2.3 27.8,77,5.5 30.8,62,10.9 30.2,65,8 27.5,58,2.8 26.8,62,10 27.3,72,1 28.7,74,3.5 28.9,64,9.2 27.3,51,11.5
このデータを下記の条件に合致する場合は1,合致しない場合は0とラベルを振る. 【条件】「気温が24℃以上」かつ「湿度が50%以上」かつ「日照時間が3h以上」
また,ここで利用するデータは「気温」と「湿度」のみとする.気温,湿度,上記の条件を適用した結果を以下に示す.
Temperature,Humidity,Status 20.9,47,0 20.5,40,0 20.7,47,0 21.5,65,0 19.3,75,0 21.6,68,0 21,74,0 22.6,70,0 22.8,79,0 24.5,62,1 24.3,62,1 23.2,75,0 21.7,87,0 24,70,1 24.2,70,1 21.7,84,0 24.1,70,1 25.9,54,1 22.7,81,0 25.1,71,1 25.2,74,1 24.4,83,0 24.8,76,1 22.5,79,0 23,86,0 23.3,66,0 24.2,60,1 20.7,83,0 22.3,89,0 25,81,0 27.7,70,1 29,67,1 27.8,75,1 27.9,73,1 27.9,68,1 25.9,77,1 29.9,63,1 27.1,74,0 22.3,94,0 26.7,74,1 29.3,51,1 28.2,64,0 27.1,83,0 28.1,69,1 25.5,66,1 25.6,72,0 27.1,75,0 28.2,60,1 28.7,53,1 28.5,58,1 27.7,65,1 26.3,71,1 25.1,68,0 25.4,70,1 26.3,64,1 22.7,87,0 24.9,85,0 26.8,74,1 27.2,69,1 28,63,1 29.1,61,1 27.9,74,1 26.5,79,1 27.9,73,1 29,70,1 29.3,69,1 29.8,67,1 29.5,67,1 31,56,1 30.1,55,1 28.9,51,1 28.5,62,1 28.3,66,1 28.4,63,1 28,73,1 27.9,74,0 29.8,61,1 30.9,62,1 29.3,73,1 29.6,69,1 29.6,68,1 29.1,68,1 30.9,62,1 30.4,61,1 27.8,76,0 28.1,71,1 28.9,65,1 26.6,73,0 25.3,84,0 26.9,75,1 24.3,67,1 26.8,55,1 23.8,64,0 19.9,57,0 19.3,47,0 20.2,43,0 20.9,41,0 21.1,58,0 19.4,78,0 19.3,82,0 22.6,57,0 23.1,57,0 21.7,39,0 20.7,44,0 20.7,47,0 21.9,47,0 23.3,46,0 22.8,48,0 22.3,55,0 22,65,0 25.6,48,0 25,57,1 21,86,0 23.5,59,0 24.7,63,1 24.4,70,1 21.9,90,0 25,60,1 24.6,72,0 24,81,1 24.8,76,0 22.9,90,0 26.6,79,0 27.8,74,1 29.1,67,1 26.9,78,0 26,73,0 26.9,61,1 28.4,55,1 28.9,56,1 27.6,73,0 27.5,77,1 28.2,73,1 28.6,76,1 27.9,79,1 27.9,74,1 28.8,66,1 29.7,63,1 27.8,72,0 29.3,61,1 28.2,67,1 28.5,69,1 28.9,67,1 29,68,1 27,81,0 28.6,75,1 27,89,0 29.3,78,1 26.3,82,0 28.5,73,1 29,72,1 27,81,0 29.2,72,1 29.8,65,1 27.8,66,1 27.7,69,1 29.3,69,1 29.2,75,1 29.9,70,1 27.3,83,0 27.4,73,1 29.7,57,1 29.5,67,1 27.1,89,0 27.9,74,1 28,62,1 27.5,66,0 24.9,85,0 25.9,85,0 28,74,1 26.7,85,1 28.2,70,1 28.1,73,1 28.5,72,1 28,74,0 27.8,77,1 30.8,62,1 30.2,65,1 27.5,58,0 26.8,62,1 27.3,72,0 28.7,74,1 28.9,64,1 27.3,51,1
これを「train_data.csv」として保存する.また,テスト用のデータも同様に用意する.下記にとある時期の気象データを示す.
21.3,62,13.4 22.4,61,13.1 22.9,62,10.6 23.9,60,12.9 23.6,64,7 19.4,93,0 22.6,81,5.5 22.6,82,1.4 24.1,69,10.8 21.3,78,0 24.1,76,0.3 23.7,63,11.5 22.7,56,13.8 22.6,62,8.7 21.6,77,0 21.2,60,9.5 21.8,67,4 21,77,4.6 23.1,73,5.4 20.9,96,0 23.3,71,2.5 24.5,51,12.3 20.3,79,0.1 24.3,62,8.5 27.4,52,13 26.6,69,9.2 27.1,72,1.6 26.8,75,2.8 26.4,78,2.3 27.3,75,5.2 27.9,68,10.5 28,66,7.4 27.2,68,5.8 25.7,84,0 24.6,93,0 24.3,93,0 25.5,84,0.2 26.9,73,6 27.7,68,8.8 28.5,65,11 29.7,59,7.1 29.4,58,3.3 29.7,58,8.7 30.8,54,13 31.2,57,12.8 31.5,60,12.2 31,64,10.1 32,61,10.6 31,65,8 30.8,63,11.1 30.5,63,10.6 32.7,51,13.5 33.3,46,13.1 32.2,52,12.6 30.8,63,6.7 29.3,62,5.3 28.4,58,5 29.2,59,3.7 28.4,68,5.8 29.6,58,11.6 29.9,61,7.2 31,58,11.5 33.1,51,12.5 32.8,45,12.9 31.7,52,10.7 32.7,53,9.8 33,49,12.9 29.6,58,3 30.2,52,10.9 31.5,54,11.7 30.8,61,3.6 31.3,61,7.5 28.2,74,3 29.6,70,6.5 30.2,65,8.1 26.9,79,0.4 27.9,74,2.4 26.5,38,13.1 25,50,9.2 25.5,52,10 26.2,65,2.3 28.9,70,6.8 30.4,60,10.2 28.8,67,6.6 28,79,0.2 29.8,70,5.5 31.1,57,12.3 31.1,52,11.8 29.5,58,2.9 30.1,59,9.5 29.7,61,7.9 28.6,69,4
これを下記の条件に従いラベルを振る.合致する場合は1,合致しない場合は0とする. 【条件】「気温が24℃以上」かつ「湿度が50%以上」かつ「日照時間が3h以上」
また,ここで利用するデータは「気温」と「湿度」のみとする.気温,湿度,上記の条件を適用した結果を以下に示す.
Temperature,Humidity,Status 21.3,62,0 22.4,61,0 22.9,62,0 23.9,60,0 23.6,64,0 19.4,93,0 22.6,81,0 22.6,82,0 24.1,69,1 21.3,78,0 24.1,76,0 23.7,63,0 22.7,56,0 22.6,62,0 21.6,77,0 21.2,60,0 21.8,67,0 21,77,0 23.1,73,0 20.9,96,0 23.3,71,0 24.5,51,1 20.3,79,0 24.3,62,1 27.4,52,1 26.6,69,1 27.1,72,0 26.8,75,0 26.4,78,0 27.3,75,1 27.9,68,1 28,66,1 27.2,68,1 25.7,84,0 24.6,93,0 24.3,93,0 25.5,84,0 26.9,73,1 27.7,68,1 28.5,65,1 29.7,59,1 29.4,58,1 29.7,58,1 30.8,54,1 31.2,57,1 31.5,60,1 31,64,1 32,61,1 31,65,1 30.8,63,1 30.5,63,1 32.7,51,1 33.3,46,0 32.2,52,1 30.8,63,1 29.3,62,1 28.4,58,1 29.2,59,1 28.4,68,1 29.6,58,1 29.9,61,1 31,58,1 33.1,51,1 32.8,45,0 31.7,52,1 32.7,53,1 33,49,0 29.6,58,1 30.2,52,1 31.5,54,1 30.8,61,1 31.3,61,1 28.2,74,1 29.6,70,1 30.2,65,1 26.9,79,0 27.9,74,0 26.5,38,0 25,50,1 25.5,52,1 26.2,65,0 28.9,70,1 30.4,60,1 28.8,67,1 28,79,0 29.8,70,1 31.1,57,1 31.1,52,1 29.5,58,0 30.1,59,1 29.7,61,1 28.6,69,1
これを「test_data.csv」として保存する.
下記のスクリプトでデータを可視化する.
#!/usr/bin/python # -*- coding: utf-8 -*- import pandas as pd import numpy as np import matplotlib.pyplot as plt def get_data(target_df): X = np.array([target_df['Temperature'].tolist(), target_df['Humidity'].tolist()], dtype=np.float64).T y = np.array(target_df['Status'].tolist(), dtype=np.int32) return X, y def visualization(X, y, marker='o'): fig, ax = plt.subplots(figsize=(10, 10)) x1_min, x1_max = X[:, 0].min()-1, X[:, 0].max()+1 x2_min, x2_max = X[:, 1].min()-1, X[:, 1].max()+1 x1_mesh, x2_mesh = np.meshgrid(np.arange(x1_min, x1_max, 0.1),np.arange(x2_min, x2_max, 0.1)) ax.scatter(X[y==0, 0], X[y==0, 1], c='blue', marker=marker, cmap='jet') ax.scatter(X[y==1, 0], X[y==1, 1], c='red', marker=marker, cmap='jet') ax.set_xlim(x1_mesh.min(), x1_mesh.max()) ax.set_ylim(x2_mesh.min(), x2_mesh.max()) ax.set_xlabel('Temperature') ax.set_ylabel('Humidity') return fig, ax if __name__ == '__main__': train_df = pd.read_csv('train_data.csv', header=0) test_df = pd.read_csv('test_data.csv', header=0) train_X, train_y = get_data(train_df) test_X, test_y = get_data(test_df) for (X, y, name, marker) in [(train_X, train_y, 'train', 'o'), (test_X, test_y, 'test', '^')]: fig, ax = visualization(X, y, marker=marker) ax.set_title('{} data'.format(name)) plt.savefig('visualization_{}_data.png'.format(name)) plt.close(fig)
結果は下記のようになる.
今回はここまでとする.以降,時間を見つけて分類問題を解く.