DAY 3+4: TESTING VARIOUS THOR SLAVES

Posted on August 12, 2021 (updated August 13, 2021) by carinawang26 in Week 10

With the MobileNetV2 model, 224x224x3 images, and 5 epochs, I ran the job on varying numbers of Thor slaves with the default CPU and memory settings to compare accuracy and running time.

Since only the 4-slave configuration worked under the default settings, I then manually changed the CPU to 8 and the memory to 16G.

The expectation is that more Thor slaves means a shorter running time.

Published by carinawang26

Number of Thor slaves (default CPU and memory)	End time (total cluster time)
1	Error-terminated
2	Error-terminated
4 (default number of Thor slaves)	28:12 (second trial: 28:13)
8	Failed (all three trials)

Notes on the 8-slave runs: the first two trials failed because the gnncarina resource group on Azure must be set to "selected networks", so the network has to be added each time a new AKS cluster is created. The third trial failed because there was not enough memory.

Number of Thor slaves (8 CPU and 16G memory)	End time (total cluster time)
1	1:38:12
2	48:10 (note: 1 vs 2 epochs)
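One way to check the expectation that more Thor slaves means a shorter running time is to turn the 8 CPU / 16G timings into a speedup and a parallel efficiency. This is a minimal Python sketch: the helper names `to_seconds` and `scaling` are hypothetical, and it assumes the times are written as h:mm:ss or mm:ss and that both runs did the same amount of work. The "1 vs 2 epochs" note may mean they did not, in which case the numbers are not a like-for-like comparison.

```python
from __future__ import annotations


def to_seconds(t: str) -> int:
    """Convert an 'mm:ss' or 'h:mm:ss' string into total seconds."""
    total = 0
    for part in t.split(":"):
        total = total * 60 + int(part)
    return total


def scaling(timings: dict[int, str]) -> dict[int, tuple[float, float]]:
    """Return {slaves: (speedup, efficiency)} relative to the smallest cluster.

    Assumption: every run performed the same workload (same epochs/data).
    """
    base_n = min(timings)
    base_s = to_seconds(timings[base_n])
    result = {}
    for n, t in sorted(timings.items()):
        speedup = base_s / to_seconds(t)
        # Efficiency = speedup divided by the relative increase in slaves;
        # 1.0 means perfectly linear scaling.
        result[n] = (speedup, speedup / (n / base_n))
    return result


# Timings from the 8 CPU / 16G table.
for n, (s, e) in scaling({1: "1:38:12", 2: "48:10"}).items():
    print(f"{n} slave(s): speedup {s:.2f}x, efficiency {e:.2f}")
```

With these values, 1:38:12 is 5892 s and 48:10 is 2890 s, so going from 1 to 2 slaves gives roughly a 2.04x speedup (efficiency about 1.02), if the two runs are in fact comparable.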
