DAY 5 + WEEKEND: PLAN

  1. Confirm that the dataset size is 4,839 images
  2. Run the full dataset with different models in Jupyter Notebook (a comparison sketch follows this list)
    • MobileNetV2
    • InceptionResNetV2
    • NASNetMobile
    • EfficientNetV2
    • BiT (Big Transfer)
  3. Run the training on the desktop GPU
  4. On the HPCC GNN model, change the number of Thor slaves (1, 2, 4, 8, 12, 20) and document how this variable affects the total cluster time (using MobileNetV2, 224x224x3 images, and 5 epochs)
  5. On the HPCC GNN model, change the CPU and memory allocation, then rerun the same Thor slave counts (1, 2, 4, 8, 12, 20) with 8 CPUs and 16 GB of memory
  6. Write documentation for how to spray 4,000+ images into HPCC
  7. Write documentation for how to recreate this project using Jupyter Notebook and HPCC GNN
  8. Load this model onto my phone (a TensorFlow Lite conversion sketch also follows this list)
  9. Write README.md files
  10. Create a presentation with the following information
    • What is a neural network
    • What is a CNN
    • What is GNN (HPCC's Generalized Neural Network bundle)
    • What is image classification
    • How did I get my data
    • Results / how I trained on my local device
    • Results / how I used Jupyter Notebook (with a link for others to recreate it)
    • Which model I used
    • Moving the job to HPCC (how we set it up), with tables for 1, 2, 4, 8, 12, and 20 Thor slaves in the default configuration
      • Specify which model yielded which result
    • Changing to 8 CPUs and 16 GB of memory
      • Effect on training time and accuracy
    • Results from GPU
    • GitHub link
    • Jira tickets I opened and which ones were resolved
    • Future work
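
For item 2, a minimal sketch of how the Jupyter Notebook comparison could look, assuming a frozen ImageNet backbone with a small classification head. The two-class setup comes from the student/non-student data; the dataset objects (train_ds, val_ds) are placeholders:

```python
import tensorflow as tf

IMG_SHAPE = (224, 224, 3)  # same size used in the HPCC runs (item 4)
NUM_CLASSES = 2            # student vs. non-student

BACKBONES = {
    "MobileNetV2": tf.keras.applications.MobileNetV2,
    "InceptionResNetV2": tf.keras.applications.InceptionResNetV2,
    "NASNetMobile": tf.keras.applications.NASNetMobile,
    "EfficientNetV2B0": tf.keras.applications.EfficientNetV2B0,
    # BiT is loaded from TensorFlow Hub rather than keras.applications.
}

def build_transfer_model(backbone_fn):
    # Frozen ImageNet backbone plus a small trainable head.
    # NOTE: each backbone expects its own preprocess_input; omitted for brevity.
    base = backbone_fn(input_shape=IMG_SHAPE, include_top=False, weights="imagenet")
    base.trainable = False
    return tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

for name, backbone_fn in BACKBONES.items():
    model = build_transfer_model(backbone_fn)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_ds, validation_data=val_ds, epochs=5)
    print(name, model.count_params())
```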

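For item 8, one common route onto a phone is converting the trained Keras model to TensorFlow Lite. A hedged sketch; the file names and saved-model path are assumptions for illustration:

```python
import tensorflow as tf

# Load the trained classifier (hypothetical path).
model = tf.keras.models.load_model("student_classifier.h5")

# Convert to a TensorFlow Lite flatbuffer for on-device inference.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional weight quantization
tflite_model = converter.convert()

with open("student_classifier.tflite", "wb") as f:
    f.write(tflite_model)
```
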
DAY 4+5: JIRA SOLUTION

Following Lili and Roger’s advice, I changed the code from UNSIGNED1 (a 1-byte integer type, whose maximum value is 2^8 − 1 = 255) to UNSIGNED4 (4 bytes, with a maximum of 2^32 − 1, over 4 billion). That 1-byte ceiling is exactly why the model previously stopped running past 255 images. Today, I tested 256 images and it worked successfully.
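
A quick way to see why the ceiling was exactly 255: NumPy’s uint8 behaves like ECL’s UNSIGNED1, so any 1-byte counter or id wraps around once it passes 255. An illustrative snippet:

```python
import numpy as np

# Ten image ids straddling the 1-byte boundary.
ids = np.arange(250, 260, dtype=np.uint32)

print(ids.astype(np.uint8))  # [250 251 252 253 254 255 0 1 2 3] -- wraps after 255
print(ids)                   # [250 251 252 253 254 255 256 257 258 259] -- 4 bytes keep counting
```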

The next step was to spray all 4,000+ images and run the model with the complete dataset.

With 4,839 images and 4 Thor slaves, the model took 1 hour and 13 minutes to run 20 epochs (roughly 3.7 minutes per epoch), reaching 100% accuracy on the student images after 2 epochs. A 5-epoch run took 31 minutes and 22 seconds (about 6.3 minutes per epoch), which suggests a fixed setup cost that longer runs amortize. Next week, I will keep retraining the model to see if/how much the speed increases when continuously processing the same images; a per-epoch timing sketch follows below.
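
On HPCC the total cluster time is reported on the workunit, but for comparing repeated Jupyter/local runs a per-epoch timer is convenient. A minimal Keras callback sketch (the model and dataset names are placeholders):

```python
import time
import tensorflow as tf

class EpochTimer(tf.keras.callbacks.Callback):
    """Records wall-clock seconds for each training epoch."""

    def on_train_begin(self, logs=None):
        self.epoch_times = []

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_times.append(time.time() - self._start)

# Hypothetical usage:
# timer = EpochTimer()
# model.fit(train_ds, epochs=5, callbacks=[timer])
# print(timer.epoch_times)  # compare runs to see if repeated training speeds up
```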

This was a huge step for the internship project and for the HPCC GNN model, which can now process far larger image sets for a variety of purposes.

DAY 2+3: GNN MODEL

The main priority this week has been working with other LexisNexis employees to fix the GNN model’s 255-image constraint. Right now, we are examining issues in the code itself rather than in Azure/the cloud. While working on that, I have also been testing different models (besides the TensorFlow transfer learning one) to see how the choice of model affects the accuracy percentages. Overall, the TensorFlow models run smoothly in Jupyter Notebook with consistently high accuracy.

With the HPCC GNN model’s image-count limitation, I would only be able to train on 255 of the nearly 5,000 student/non-student images. Even with this limitation, I can still take the project past the “proof of concept” stage by running the full dataset successfully in Jupyter Notebook with the transfer learning model. Ideally, though, the HPCC model will process the images in the cloud. Fixing the 255-image limitation will open the door to more practical applications of the HPCC GNN model, even beyond the scope of my internship project, which makes it a top priority for this week.