Vectorizing images in Python for Machine Learning image classification
I'm doing the Astrozoo competition on Kaggle right now. The goal is to create an algorithm that accurately classifies galaxies into different types. The training set is a zip file of over 50,000 images. The first step is to convert the images into a usable format for analysis, for example by resizing each image to 100x100 and converting it into a vector of data points.

Here is the code I've pieced together to do that. Running it converts all jpeg files in a specified folder into vectors and dumps them into a .txt file.

    #take all files in one directory and move it into another directory
    import os
    import glob #allows you to look at file names
    import PIL #allows you to manipulate images (for resize)

    #main script variables
    orig_dir = '/home/luwei/Desktop/Dropbox/Kaggle/astrozoo/testing' #where the images are located
    crop_dimensions = (140, 140, 284, 284) #dimensions to crop each image by
    pixelsize = 5 #set s...
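
As a rough illustration of the crop, resize, and flatten steps described above, here is a minimal sketch. It is not the script from this post: the function names `image_to_vector` and `vectorize_folder`, the grayscale conversion, the comma-separated output format, and the use of NumPy are all assumptions made for the example. Only the crop box and the 100x100 target size come from the text.

```python
import glob
import os

import numpy as np
from PIL import Image


def image_to_vector(path, crop_box=(140, 140, 284, 284), size=(100, 100)):
    """Crop, grayscale, resize, and flatten one image into a 1-D array."""
    with Image.open(path) as img:
        img = img.convert("L")    # assumption: grayscale, one value per pixel
        img = img.crop(crop_box)  # box is (left, upper, right, lower)
        img = img.resize(size)    # size is (width, height)
        return np.asarray(img, dtype=np.uint8).ravel()


def vectorize_folder(src_dir, out_path, **kwargs):
    """Write one comma-separated row per .jpg file in src_dir.

    Returns the number of images written. The output layout is hypothetical;
    adjust it to whatever your analysis code expects.
    """
    paths = sorted(glob.glob(os.path.join(src_dir, "*.jpg")))
    with open(out_path, "w") as out:
        for path in paths:
            vec = image_to_vector(path, **kwargs)
            out.write(",".join(str(v) for v in vec) + "\n")
    return len(paths)
```

Calling `vectorize_folder(orig_dir, "vectors.txt")` would then produce one row of 10,000 values per image (`vectors.txt` is a placeholder name). Keeping the images grayscale is a design choice to keep the vectors small. If color matters for your classifier, dropping the `convert("L")` call gives three values per pixel instead.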