Briana grew up in Seattle, WA, before moving to the Big Island in July 2017. She is a rising senior enrolled in the computer science program at the University of Hawai’i at Hilo. After graduating, she plans to attend graduate school to further her education in computer science. When she is not writing code, she enjoys exploring the Big Island, cooking for her family, and designing video games.

Home Island: Big Island

Institution when accepted: University of Hawai’i at Hilo

Akamai Project: Microscopy Image Classification Using Machine Learning

Project Site: Cyanotech – Kona, Hawai’i Island HI

Mentors: Court Warr & Charles J. O’Kelly

Project Abstract:

Imaging flow cytometry (FlowCam: Fluid Imaging Technologies, Portland, ME) provides image analysis of particles found in aquatic samples. By correctly identifying the particles and their relative proportions within the samples, researchers and cultivators at Cyanotech Corporation can find trends and patterns that help to increase the productivity of spirulina (Limnospira fusiformis), a cyanobacterium commonly used as a dietary supplement. Cyanotech would like to know whether a classification algorithm can perform more accurately than the existing classification method and reduce the manual classification process, which can take up to 30 minutes per sample to complete. The FlowCam images the particles in each sample and converts the images into numerical data, producing a spreadsheet of each image’s unique properties. To build an accurate model with little bias or overfitting, it was important that I extract and build a balanced training data set. I exported historical data from multiple spreadsheets and developed a training set by randomly pulling annotated data from the most common particles, then reduced the dimensionality of the feature space by discarding redundant and unnecessary variables from the dataframe. Each remaining variable was then scaled so that high-variance features would not keep the model from generalizing. Several classification models were trained on the training set and then tested on a separate dataset. Of these models, the top three performers (support vector machine, k-nearest neighbor, and random forest) were selected for further analysis. Each model was then tuned in an attempt to improve accuracy. Once the top-performing model is determined, it will automatically write classification data to the existing FlowCam database. Cyanotech will use the classification model if it performs better than the original classification method. Results of the best-performing classification model will be presented.
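
The data-preparation steps described in the abstract (drawing a balanced random sample from the most common particle classes, then scaling each remaining variable) can be illustrated with a minimal sketch. The abstract does not name the tools or column names used in the project, so everything here is an assumption: the function names `balanced_sample` and `standardize`, the `label` column, and the choice of z-score scaling are hypothetical, and the sketch uses only the Python standard library rather than whatever library the project relied on.

```python
import random
import statistics
from collections import defaultdict


def balanced_sample(rows, label_key, n_classes, per_class, seed=0):
    """Draw up to `per_class` random rows from each of the `n_classes` most common labels."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    # Keep only the most frequent classes (hypothetical stand-in for "most common particles").
    top = sorted(by_label, key=lambda k: len(by_label[k]), reverse=True)[:n_classes]
    sample = []
    for label in top:
        group = by_label[label]
        sample.extend(rng.sample(group, min(per_class, len(group))))
    return sample


def standardize(rows, feature_keys):
    """Return copies of `rows` with each feature rescaled to zero mean and unit variance."""
    stats = {}
    for key in feature_keys:
        values = [row[key] for row in rows]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values) or 1.0  # avoid dividing by zero for constant features
        stats[key] = (mean, std)
    scaled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for key, (mean, std) in stats.items():
            new_row[key] = (row[key] - mean) / std
        scaled.append(new_row)
    return scaled
```

In this sketch, each row is a dictionary of measurements for one particle image plus its annotated label. Sampling the same number of rows per class keeps a dominant particle type from swamping the training set, and standardizing puts all features on a comparable scale before distance-based models such as k-nearest neighbor or support vector machines are trained.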