Week 10 – Image Classification with HPCC GNN on Cloud

DAY 3+4: TESTING VARIOUS THOR SLAVES

Posted on August 12, 2021 (updated August 13, 2021) by carinawang26 in Week 10

With the MobileNetV2 model, 224x224x3 images, and 5 epochs, I ran the job with different numbers of Thor slaves, using the default CPU and memory settings, to compare accuracy and running time.

Since only 4 Thor slaves ran successfully under the default settings, I manually changed the CPU to 8 and the memory to 16G and repeated the tests.

I expected that more Thor slaves would mean a shorter running time.

DAY 1: README.MD

Posted on August 9, 2021 by carinawang26 in Week 10

As I continued working with the HPCC GNN model, I began creating a README.md file to document the steps so that later on, I could turn it into “instructions” for someone to recreate my project.

So far, the README.md file explains how to set up Kubernetes, use an AKS cluster created by someone else, complete the Azure storage setup, create the storage account and shares, deploy HPCC, spray data, and change the number of Thor slaves.

DAY 1: TESTING DIFFERENT MODELS

Posted on August 9, 2021 by carinawang26 in Week 10

Dataset size: 4839 images

In the Jupyter Notebook, I replaced the model name with several other popular image classification models.

List of models: https://keras.io/api/applications/
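Swapping models this way can be reduced to a single switch. The sketch below assumes the Keras Applications API linked above (some models in this post were loaded from separate links instead), and `build_backbone` is a hypothetical helper name. A stand-in namespace replaces `tf.keras.applications` so the sketch runs without TensorFlow installed.

```python
from types import SimpleNamespace


# Hypothetical helper: look up a Keras Applications constructor by name, so that
# changing MODEL_NAME is the only edit needed to try a different backbone.
def build_backbone(applications, name, input_shape=(224, 224, 3)):
    try:
        constructor = getattr(applications, name)
    except AttributeError:
        raise ValueError(f"{name!r} is not available in this applications module") from None
    # include_top=False drops the ImageNet classifier head so a new head can be
    # trained for this dataset; weights="imagenet" loads pretrained weights.
    return constructor(include_top=False, weights="imagenet", input_shape=input_shape)


# Stand-in for tf.keras.applications that simply returns the arguments it receives.
def fake_constructor(**kwargs):
    return kwargs


fake_applications = SimpleNamespace(MobileNetV2=fake_constructor, NASNetMobile=fake_constructor)

MODEL_NAME = "MobileNetV2"
print(build_backbone(fake_applications, MODEL_NAME))
```

With TensorFlow installed, the real call would pass the actual module, for example `build_backbone(tf.keras.applications, "InceptionResNetV2")`.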

MobileNetV2

Between 37 and 48 seconds per epoch
Reached 100% accuracy on the first epoch

InceptionResNetV2

The first link I tried for this model didn’t work, so I tested a different link for the same model, and it worked.

Between 95 and 96 seconds per epoch
Reached 100% accuracy after 1 epoch

NASNetMobile

Between 74 and 84 seconds per epoch
Reached 100% accuracy on the first epoch

EfficientNet_V2

Between 51 and 56 seconds per epoch
Reached 100% accuracy on the first epoch

BiT

Between 288 and 319 seconds per epoch
Reached 100% accuracy on the first epoch

Ultimately, the MobileNetV2 model I originally used was the most time-efficient, at 37 to 48 seconds per epoch. All of the models reached 100% accuracy on the first epoch.
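The timing comparison above can be made concrete by ranking the models on the midpoint of each reported per-epoch range and expressing each as a multiple of the fastest. The midpoint is a rough summary chosen for this sketch, not a measured average of the epochs.

```python
# Per-epoch time ranges in seconds, as reported in this post.
epoch_seconds = {
    "MobileNetV2": (37, 48),
    "InceptionResNetV2": (95, 96),
    "NASNetMobile": (74, 84),
    "EfficientNet_V2": (51, 56),
    "BiT": (288, 319),
}


def rank_by_midpoint(ranges):
    """Return (name, midpoint) pairs sorted from fastest to slowest."""
    midpoints = {name: (low + high) / 2 for name, (low, high) in ranges.items()}
    return sorted(midpoints.items(), key=lambda item: item[1])


ranking = rank_by_midpoint(epoch_seconds)
fastest_name, fastest_mid = ranking[0]
for name, mid in ranking:
    # Ratio > 1 means that many times slower per epoch than the fastest model.
    print(f"{name:<18} {mid:6.1f} s/epoch  {mid / fastest_mid:4.2f}x {fastest_name}")
```

On these numbers, BiT is roughly seven times slower per epoch than MobileNetV2, while EfficientNet_V2 is the closest competitor.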

WEEKEND + DAY 1: OPENING JIRAS

Posted on August 9, 2021 by carinawang26 in Week 10

JIRA: HPCC-26359

Create storage account secret error on Mac

When I ran the create-secret.sh script with the command ./create-secret.sh, it failed with the error “exactly one NAME is required, got 2”.
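The cause of this error isn't recorded here. kubectl reports “exactly one NAME is required” when `kubectl create secret generic` receives more than one positional argument, and one common way that happens is an unquoted shell variable expanding into two words. The sketch below only illustrates that hypothesis: the secret name, key name, and value are hypothetical, and `positional_args` is a simplified parser, not kubectl's own.

```python
import shlex


def positional_args(command: str) -> list[str]:
    """Tokenize like a POSIX shell and return the non-flag arguments that
    follow `kubectl create secret generic`.

    Simplified: assumes every flag uses --flag=value form, so any token that
    does not start with '-' is treated as a positional NAME candidate.
    """
    tokens = shlex.split(command)
    prefix = ["kubectl", "create", "secret", "generic"]
    if tokens[: len(prefix)] != prefix:
        raise ValueError("expected a 'kubectl create secret generic' command")
    return [tok for tok in tokens[len(prefix):] if not tok.startswith("-")]


# Hypothetical value spanning two lines, e.g. two keys captured into one variable.
value = "KEY_ONE\nKEY_TWO"

unquoted = f"kubectl create secret generic my-secret --from-literal=accountkey={value}"
quoted = f"kubectl create secret generic my-secret --from-literal=accountkey={shlex.quote(value)}"

print(positional_args(unquoted))  # ['my-secret', 'KEY_TWO'] -> two NAMEs
print(positional_args(quoted))    # ['my-secret']
```

Quoting only prevents the split. If a variable really did hold two values, the script would still need to select a single one.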

JIRA: ML-496

While experimenting with different models to see how changing the model affects the accuracy/time taken, I got this error message for all of the EfficientNet models: “TypeError: ‘numpy.float64’ object cannot be interpreted as an integer.”
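The post doesn't identify which line of the EfficientNet code raised this error. The message is what Python reports when a float is used where an integer is required, such as a `range()` bound or a layer size. Because `numpy.float64` is a subclass of Python's built-in `float`, a plain float reproduces the same failure. The sketch below shows that failure and one common guard, a hypothetical `as_int` helper that converts whole-number floats and rejects fractional ones. It illustrates the error class, not the fix for ML-496.

```python
def as_int(value):
    """Convert a whole-number float (including numpy.float64) to int.

    Raises ValueError instead of silently truncating a fractional value.
    """
    as_integer = int(value)
    if as_integer != value:
        raise ValueError(f"expected a whole number, got {value!r}")
    return as_integer


def describe_range_call(n):
    """Return len(range(n)), or the TypeError message if n is not an integer."""
    try:
        return len(range(n))
    except TypeError as exc:
        return str(exc)


size = 224.0  # stands in for a computed value such as numpy.float64(224.0)

print(describe_range_call(size))          # 'float' object cannot be interpreted as an integer
print(describe_range_call(as_int(size)))  # 224
```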

Number of Thor Slaves (with default CPU and Memory)	End Time (Total Cluster Time)
1	Terminated with an error
2	Terminated with an error
4 (default number of Thor slaves)	28:12 (second trial: 28:13)
8	Failed in all three trials. The first two failures happened because the gnncarina resource group on Azure must be set to “Selected networks”, so the network has to be added each time a new AKS cluster is created. The third failure was caused by insufficient memory.

Number of Thor Slaves (with 8 CPU and 16G Memory)	End Time (Total Cluster Time)
1	1:38:12
2	48:10 ^^ 1 vs 2 epochs
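The expectation stated in the Day 3+4 post, that more Thor slaves should mean a shorter run, can be checked with the standard formulas speedup = T(1) / T(n) and parallel efficiency = speedup / n. The sketch below applies them to the two 8 CPU / 16G rows. It assumes the times are in h:mm:ss or mm:ss format. Because the 2-slave row carries an unexplained note (“1 vs 2 epochs”), the two runs may not be directly comparable, so treat the result with caution.

```python
def to_seconds(clock: str) -> int:
    """Parse 'h:mm:ss' or 'mm:ss' (assumed formats of the table's times) into seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Total cluster times with 8 CPU and 16G memory, copied from the table above.
times = {1: "1:38:12", 2: "48:10"}

baseline = to_seconds(times[1])
for slaves, clock in sorted(times.items()):
    elapsed = to_seconds(clock)
    speedup = baseline / elapsed      # how many times faster than 1 slave
    efficiency = speedup / slaves     # 1.0 would be perfect linear scaling
    print(f"{slaves} slave(s): {elapsed} s, speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

On these two rows, the speedup from 1 to 2 slaves comes out slightly above 2, which is another reason to confirm that both runs used the same number of epochs before drawing conclusions.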