Latest from the Blog
FINAL PRESENTATION + POSTER
Slide 7 source links: 1. Diagram 2. Diagrams 3. Book
Slide 25: 1. GitHub 2. Blog 3. Flowcharts
Digital Takeaway: 1. Robot Inspiration
DAY 1: FLOWCHART + JIRA
New JIRA tickets I opened: I created one to summarize the findings of my internship regarding the steps to process and classify images with HPCC GNN. In the future, anyone wishing to train the GNN model on their own images can refer to it.
DAY 3+4: TESTING VARIOUS THOR SLAVES
With the MobileNetV2 model, 224x224x3 images, and 5 epochs, I ran various numbers of thor slaves with default CPU and memory to evaluate differences in accuracy and timing.
# of Thor Slaves (with default CPU and Memory) | End Time (Total Cluster Time)
1 | *error-terminated
2 | *error-terminated
4 (default # of thor slaves) | 28:12; second trial: 28:13
8 | Failed; second trial: failed; third trial: still failed
Reason why it failed the third time: not enough memory
Reason why it failed the first 2 times: the gnncarina resource group…
DAY 1: README.MD
As I continued working with the HPCC GNN model, I began creating a README.md file to document the steps so that later on, I could turn it into “instructions” for someone to recreate my project. So far, the README.md file outlines how to set up Kubernetes, use AKS created by others, complete the Azure storage setup, create the storage account and shares, deploy HPCC, spray data, and change the number of thor slaves.
DAY 1: TESTING DIFFERENT MODELS
Dataset size: 4839 images
I changed the model name in Jupyter Notebook to popular image classification models. List of models: https://keras.io/api/applications/
MobileNetV2
• Between 37 and 48 seconds per epoch
• Reached 100% accuracy on the first epoch
InceptionResNetV2
• The first model link didn’t work, so I tested a different link for the same model and it worked
• Between 95 and 96 seconds per epoch
• Reached 100% accuracy after 1 epoch
NASNetMobile
• Between 74 and 84 seconds per epoch
• Reached 100% accuracy on the first epoch
EfficientNet_V2
• Between…
WEEKEND + DAY 1: OPENING JIRAS
JIRA: HPCC-26359 Create storage account secret error on Mac
After trying to run the create-secret.sh file with the command ./create-secret.sh, an error came up saying “exactly one NAME is required, got 2”.
JIRA: ML-496
While experimenting with different models to see how changing the model affects the accuracy/time taken, I got this error message for all of the EfficientNet models: “TypeError: ‘numpy.float64’ object cannot be interpreted as an integer.”
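The TypeError quoted in ML-496 is the message Python raises when a floating-point value is passed where an integer count is required. The post does not say where in the EfficientNet code this happens, so the following is only a generic sketch of that class of error using a built-in float (a numpy.float64 value produces the same message with its own type name). The batch size of 32 and the helper names are hypothetical.

```python
# Illustration only: the actual cause of ML-496 is not stated in the post.

def batches_per_epoch(num_images, batch_size):
    """True division always yields a float, even when the result is whole."""
    return num_images / batch_size


def safe_steps(num_images, batch_size):
    """Use floor division and an explicit int() so the result is a valid count."""
    return int(num_images // batch_size)


steps = batches_per_epoch(4839, 32)  # a float, not an int

try:
    range(steps)  # range() only accepts integers
except TypeError as err:
    message = str(err)

# Converting to an integer first avoids the error.
fixed = range(safe_steps(4839, 32))
```

If a similar float-to-integer mismatch is behind the EfficientNet failure, an explicit conversion at the point where the count is computed would be one possible fix.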
DAY 5 + WEEKEND: PLAN
• Confirm that the dataset size is 4,839 images
• Run the full dataset with different models in Jupyter Notebook: MobileNetV2, InceptionResNetV2, NASNetMobile, EfficientNet_V2, bit
• Run GPU on the desktop
• On the HPCC GNN model, change the number of thor slaves (1, 2, 4, 8, 12, 20) and document how this variable affects the total cluster time (using MobileNetV2, 224x224x3 images, and 5 epochs)
• On the HPCC GNN model, change the CPU and memory. Run the various number of thor slaves again (1, 2, 4, 8, 12,…
DAY 4+5: JIRA SOLUTION
Following Lili and Roger’s advice, I changed the code from UNSIGNED1 (which is 1 byte) to UNSIGNED4 (4 bytes). Previously, the model would not run with more than 255 images, the largest value a 1-byte unsigned field can hold. Today, I tested 256 images and it worked. The next step was to spray all 4,000+ images and run the model with the complete dataset. With 4,839 images and 4 thor slaves, the model took 1 hour and 13 minutes to run 20 epochs, reaching 100% accuracy on the student images after 2…
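To see why 255 was the ceiling, here is a small sketch that mirrors the two field widths with Python's struct module. It is not the GNN code itself; the "B" and "<I" format codes simply stand in for a 1-byte and a 4-byte unsigned integer, and the helper names are made up for illustration.

```python
import struct


def max_unsigned(num_bytes):
    """Largest value an unsigned integer of the given width can hold."""
    return 2 ** (8 * num_bytes) - 1


def fits(count, fmt):
    """Return True if count can be packed into the given struct format."""
    try:
        struct.pack(fmt, count)
        return True
    except struct.error:
        return False


one_byte_limit = max_unsigned(1)   # 255
four_byte_limit = max_unsigned(4)  # 4294967295

# 256 images overflow a 1-byte counter but fit easily in 4 bytes.
overflow_at_256 = not fits(256, "B")
full_dataset_ok = fits(4839, "<I")
```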
DAY 2+3: GNN MODEL
The main priority this week has been communicating with other LexisNexis employees to fix the GNN Model 255 images constraint. As of right now, we are examining issues in the code itself rather than Azure/the cloud. While working on that, I have also been testing different models (besides the TensorFlow Transfer Learning one) to see how they affect the accuracy percentages. Overall, the TensorFlow models run smoothly on Jupyter Notebook with consistently high accuracy rates. With the GNN HPCC Model image count…
DAY 5: CONDENSED MODEL
Based on the TensorFlow Transfer Learning Model, I created a condensed version that is shorter but serves the same purpose. Since the first time I used this model, the accuracy has always reached 100% after the first epoch. I was curious whether adding a new image with a different background would lower the accuracy, but it didn’t change. Then, I ran the condensed model with animal images first, then the student images so that it…
DAY 2+3: OPENING JIRAS
While troubleshooting ML-490, I ran into some new issues and thus created 2 Jiras to document them. The first Jira was created as an improvement: the ECL Workunit status only displays the “loss” value and doesn’t show information about accuracy. The second Jira was created because Kubernetes was not terminating after GNN training, even when the workunit was aborted.
DAY 1: DEBUG JIRA
Last week, I opened a Jira (ML-490) and I’m continuing to get it resolved. The HPCC GNN model will not run more than 255 images on Azure, but I have over 4,000 images that need to be tested. I made a Kubernetes cluster on Azure then ran the model. I also updated GitHub with all of my recent code.
Plan for the week:
• Debug ML-490
• Train with GPU on Azure
• Update Jupyter Notebook (with all 4,577 images and MobileNet V2 model)
• Configure model to predict…
DAY 3+4: HPCC ON AZURE
I sprayed 4,577 images onto Azure with dfuplus images in 22 groups. Then, I created a superfile to include all of the sprayed images. The training produced an error message, so I will modify the code and try again with a smaller dataset. Yesterday, the build failed at 41%. To identify what caused this crash, I tested a few different variables/scenarios. I ran the model on Azure with 600 images (instead of 4,000) to see if the size of the dataset was causing the issue.…
DAY 1+2: TRAINING ON AZURE
After making some adjustments in the code, I ran the model on Azure, where it reached 100% accuracy after 6 epochs.
DAY 5: CNN TRANSFER LEARNING
CNN Transfer Learning with TensorFlow Hub sample code for my project: The next step is to adapt this to GNN, which may be very challenging because it is not yet compatible. Over the weekend, I also tested what would happen if I took two photos from s123450 and incorrectly put their photos in the s123451 folder to see how much the accuracy would go down/how many epochs it takes. It took a few seconds longer, but the accuracy was still at 100%. I tested a…
DAY 4: OPEN JIRAS
To expand my dataset, I took more student videos with the security robot’s webcam and also took non-student videos to develop “negative” data. I processed these videos into still images and converted/resized the new photos. Then, I was able to test the TensorFlow model with a full dataset (positive and negative data) to achieve these results: The re-trained model correctly identified students (s12345#) and non-students (x#). Next, I tested/ran the HPCC GNN Model and sprayed animal data to HPCC. After training the 80×80 images on…
DAY 4: MIDTERM – END PLAN
References
• Transfer Learning with TensorFlow Hub Tutorials: https://www.tensorflow.org/tutorials/images/transfer_learning_with_hub
• MobileNet V2 (Lightweight models for use in mobile application): https://arxiv.org/pdf/1801.04381.pdf
• Hands-on Machine Learning with Scikit-Learn, Keras, & TensorFlow, Chapter 14
• TensorFlow Keras API: https://www.tensorflow.org/api_docs/python/tf/
• TensorFlow Hub API: https://www.tensorflow.org/hub/api_docs/python/hub
Training Image Data with GNN
Image classification
• Categories: AHS students/staff or Not AHS students/staff; Graduation year (Class-2022)
• Each group’s images can be saved in a directory. The directory name is labeled as the category’s name
• Formatted for TensorFlow Hub (JPG, BMP, PNG, etc).
• Put all image files in a…
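The directory-per-category layout described above can be sketched in a few lines of standard-library Python. This is only an illustration of the idea that the folder name serves as the label; the function names, file names, and the demo folders ("s123450" and "x0", following the naming used elsewhere on this blog) are assumptions, not the project's actual code.

```python
import tempfile
from pathlib import Path

# Image formats mentioned in the plan (JPG, BMP, PNG).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".bmp", ".png"}


def label_images(root):
    """Map each image path (relative to root) to its parent directory name."""
    root = Path(root)
    labels = {}
    for path in sorted(root.glob("*/*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            labels[path.relative_to(root).as_posix()] = path.parent.name
    return labels


def build_demo_tree(root):
    """Create a tiny hypothetical dataset: one folder per category."""
    layout = {"s123450": ["a.jpg", "b.jpg"], "x0": ["c.png", "notes.txt"]}
    for category, names in layout.items():
        folder = Path(root) / category
        folder.mkdir()
        for name in names:
            (folder / name).touch()


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(tmp)
    demo_labels = label_images(tmp)  # non-image files are skipped
```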
DAYS 2+3: GNN MODEL
I am currently trying to create a model similar to the one in the GNN Tutorial using 12 student bmp files. Each image is 80×80 with 3 colors. When attempting to train the model, the process never finished. I’m in the process of troubleshooting this error message to run my images using the model and achieve a higher accuracy.
DAY 1: HPCC GNN PHOTOS
Today I ran the HPCC tutorial images on my Jupyter Notebook pre-trained model (https://www.tensorflow.org/tutorials/images/transfer_learning_with_hub). After resizing the animal images to 224 x 224, I created two folders (“dog” and “not dog”) to classify the images. Upon running the model, I received a “SavedModel does not exist” error message. To troubleshoot, I referred to forums that suggested deleting the path the error was referring to (https://github.com/tensorflow/hub/issues/646). The final solution that fixed my error message was simply rebooting my computer. I ran the model with 3…
DAY 4+5: TENSORFLOW LITE
Xcode successfully recognized my phone and the TensorFlow image classification model now runs on my phone. I began researching how to get my model/images loaded onto the phone application. Right now, the object recognition runs successfully but it does not have the dataset of student photos. I saved my model to a .tflite file, but now I need to add my model to the Xcode program that can run on my phone. I also converted JPG to BMP as the file output in my Java…
DAY 3: MODEL TESTING
I retrained the “transfer_learning_with_hub” model with 12 of my sample images. I manually changed the image sizes for the 12 images to match the model image dimension requirements. With the small dataset, the model achieved 100% accuracy. After organizing 3 folders for “cropped”, “original”, and “others”, the next step was to use a Java program to automatically change the size and dimensions of the photos so that I could test the model with over 1,000 photos instead of just 12. All files are currently either…
DAY 2: TENSORFLOW LITE
Following the TensorFlow Lite Image Classification iOS tutorial (https://github.com/tensorflow/examples/tree/master/lite/examples/image_classification/ios), I set up my workstation to build and run the example model. The initial setup was successful, but Xcode showed an error message when I attempted to connect my phone. The goal is for the model to run on the TFL Classify app on my phone so that when the phone camera is pointed at an object, the app will display a result for what it thinks the object is. Xcode displays this error message…
DAY 1: DATASET
I decided on “s123450, s123451, s123452, etc.” as the consistent naming convention. The filename prefix must be renamed before closing VLC and running it again with a new video. Videos saved as WEBM on my desktop → open VLC → Help → Preferences → show all → video → filters → scene filter. I tested the resizing feature with a tutorial (https://www.makeuseof.com/tag/batch-convert-resize-images-mac/) but then decided to keep the original 1280×720. Next, I began looking into tutorials to get TensorFlow Lite on a phone: https://github.com/tensorflow/examples/tree/master/lite/examples/image_classification/ios In…
DAY 5: DATASET
Using various converter programs, I turned WEBM videos into JPG images. The initial software I used successfully turned the video into 100 photos, but the problem was that it couldn’t resize the photos. I tried the NCH Photo Resize software instead, but it didn’t output consistent dimensions. Then, I used BatchPhoto, which successfully resized, but requires a subscription to get rid of the watermark. Since BatchPhoto works with Windows, I sent the sample photos from my Windows desktop to my Mac laptop (where I…
DAY 4: WEBCAM
During the daily meeting, we talked about the security measures necessary when working with data, images, and personally identifiable information in order to ensure the safety of all parties involved. There are precautions and procedures we must follow to stay compliant with data security requirements. After the webcam was mounted on the surface of the robot, I tested the procedures for turning a video into still images. Using Google Meet, I configured the camera input to the webcam, and screen-recorded the window with Screencastify, which…
DAY 3: GNN
The goal for today was to try to spray sample animal images from a database by following the ECL GNN Tutorial. When I first tested it, I got an error message in the landing zone. Solution: change the target name to “imgdb::cw” instead of “animal_images”. The next goal was to get to the page shown in the tutorial video. In order to do that, I tried running BWR_ImageGNN in VS Code (View → Command Palette → ECL Submit). It showed an error message: “No ECL…
DAY 2: GNN + CAMERA
I continued following the “Machine Learning with HPCC GNN Tutorial” on Docker Desktop.
• In ECL, shapes (ex. [n,n] is a 2-dimensional Tensor, [n,n,n,n] is a 4-dimensional Tensor) are expressed using a RECORD-oriented Tensor
• The shape always starts at 0 (ex. 50 x 50 colored image = [0,50,50,3] where 3 = the 3 colors RGB)
Two steps for creating an ECL Tensor:
1. Create the data for the TensData RECORD
2. Create a Tensor with that data. Pack it into a block oriented form called “slices”
GNN Documentation: https://cdn.hpccsystems.com/pdf/ml/GNN.pdf
VS…
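As a rough illustration of the first step (creating cell-level data before it is packed into a Tensor), here is a Python sketch rather than ECL. It assumes each TensData entry is a pair of an index list and a value, and that the indexes are 1-based with the record number first; both points are my reading of the GNN documentation and should be checked against it. The function names and the tiny 2 x 2 image are hypothetical.

```python
def record_shape(height, width, channels):
    """Shape for record-oriented image data; the leading 0 is a placeholder
    for the record dimension, as in the [0,50,50,3] example."""
    return [0, height, width, channels]


def image_to_cells(image, record_id):
    """Flatten one height x width x channels image into (indexes, value) pairs.

    Assumption: indexes are 1-based and ordered [record, row, column, channel].
    """
    cells = []
    for r, row in enumerate(image, start=1):
        for c, pixel in enumerate(row, start=1):
            for ch, value in enumerate(pixel, start=1):
                cells.append(([record_id, r, c, ch], float(value)))
    return cells


# A hypothetical 2 x 2 RGB image.
tiny_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]

shape = record_shape(50, 50, 3)
cells = image_to_cells(tiny_image, record_id=1)
```

Packing these cells into block-oriented "slices" is handled by GNN itself and is not modeled here.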
DAY 1: PROCEDURE CHANGE
During my daily meeting with my mentor David, we talked about an advancement in the application of the facial recognition program onto the robot in order to incorporate HPCC. Instead of using a program to augment the data (ex. turn 1 photo of student A into 100 photos), we will try to take a 10-second video of each student moving their head around, and turn the video into still, jpg images. This will allow me to train the model with real data, instead of “manufactured”…
DAY 5: HPCC IN DOCKER DESKTOP
Today I followed the “Machine Learning with HPCC GNN Tutorial Part 1” on my local Docker Desktop. First, I checked the terminal to make sure that I had the ML_Core bundle installed, along with the other prerequisites below. Instead of using the Virtual Machine that the demo used, I’m using Docker Desktop. Instead of using ECL IDE, I’m using VS Code. Deploying HPCC in Docker Desktop uses the same containerized documentation I used before, but I changed “helm repo add HPCC…” to “git clone https://github.com/hpcc-systems/HPCC-Platform” because…
DAY 4: DATA AUGMENTATION
After completing the Data Augmentation tutorial on Jupyter Notebook (Keras Preprocessing Layers and tf.image), I concluded that tf.image would be a better fit for my project.
Keras preprocessing layers
• Resizing (for consistency) and rescaling pixel values
• Data augmentation: apply preprocessing layers multiple times to the same image (switch the orientation, rotate it, etc.)
• layers.RandomContrast, layers.RandomCrop, layers.RandomZoom
*different for my model: instead of flipping the images, it would be best for the variation to be in lighting, shadows, location, background distractions, etc. to simulate a real…
DAY 3: RETRAINING AN IMAGE CLASSIFIER
I opened the image retraining guide via the terminal and ran the code, which consisted of the following steps that I will be replicating on my facial recognition model:
1. Select the TF2 SavedModel
2. Resize images in the dataset for the module, and augment the photos
3. Put a linear classifier on top of the feature_extractor_layer
4. Start with a non-trainable feature_extractor_layer (for speed)
5. Train the model
6. Test the model’s accuracy using the validation dataset
7. Trained model can be saved for deployment to TF Serving or TF Lite (mobile)
The…
DAY 2: TENSORFLOW
The main objective for today was to follow the TensorFlow 2 quickstart code and the TensorFlow Convolutional Neural Network (CNN) tutorial. To successfully compile the code, I upgraded my TensorFlow version to 2.5 and installed matplotlib (using conda) for the image retraining documentation I will finish following tomorrow. The Jupyter Notebook for Retraining an Image Classifier will be especially useful because I will be following very similar steps when I retrain an existing model. I ran into an issue with the image retraining code -…
DAY 1: HPCC IN AZURE
I started deploying HPCC in Azure after installing the Azure CLI. Halfway through the process, I ran into an ECL error because the helm chart’s version didn’t match the image version. My first attempted solution involved deleting “mycluster” and deploying a matched version (rc4 instead of rc2), but I still received an error message. Ultimately, the issue was fixed by changing the helm chart version to 8.0.20 to match the docker image 8.0.20-1.
helm install mycluster hpcc/hpcc --set global.image.version=8.0.20-1 -f examples/azure/values-auto-azurefile.yaml
I also finished reading the…
DAY 3: Hands-On Machine Learning
I continued reading the Hands-On Machine Learning book while taking notes on skills applicable to my internship project.
Chapter 4
• Gradient descent → algorithm that iteratively adjusts parameters to minimize a cost function
• Polynomial regression → using a linear model to fit nonlinear data
• Logistic regression → estimates the probability that X belongs to a positive class (1 if positive, 0 if negative)
Chapter 5
• Decision function → predict if it’s a positive class (1) or negative (0)
• Support vector machine → machine learning model → supports linear and…
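The Chapter 4 notes on gradient descent and logistic regression can be tied together in a short, standard-library sketch: a one-feature logistic regression fitted by batch gradient descent. This is a toy written for illustration, not code from the book; the data points, learning rate, and epoch count are arbitrary assumptions.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weight, bias):
    """Estimated probability that x belongs to the positive class."""
    return sigmoid(weight * x + bias)


def predict_class(x, weight, bias, threshold=0.5):
    """1 if the estimated probability reaches the threshold, else 0."""
    return 1 if predict_proba(x, weight, bias) >= threshold else 0


def fit(xs, ys, learning_rate=0.1, epochs=500):
    """Batch gradient descent on the log-loss of a one-feature model."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [predict_proba(x, weight, bias) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Toy data: negative examples on the left, positive examples on the right.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
weight, bias = fit(xs, ys)
```

After fitting, `predict_class(-1.0, weight, bias)` and `predict_class(1.0, weight, bias)` give the class for a point on each side of the boundary.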
DAY 2: Machine Learning Training
Today, I continued reading the Hands on Machine Learning book. Chapter 2 went over a simulation of a housing market machine learning model that identifies worthwhile investments based on the location, number of rooms, size, etc. Chapter 3 went into specifics about measuring performance through cross-validation, a confusion matrix, precision and recall, and the ROC curve. I also learned about multiclass (multinomial) classifiers and binary classifier training.
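For reference, precision and recall can be computed directly from the four cells of a binary confusion matrix. The sketch below uses made-up labels and predictions purely to show the arithmetic; it is not taken from the book's code.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision(tp, fp):
    """Share of positive predictions that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels and predictions for eight samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
model_precision = precision(tp, fp)
model_recall = recall(tp, fn)
```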
DAY 1: Machine Learning Tutorials
This week, my main goal is to read through the applicable chapters of the Hands on Machine Learning book, TensorFlow 2 Quickstart, and TensorFlow Tutorials Convolutional Neural Network while simultaneously running the corresponding code on Jupyter. Today, I worked through Chapter 1 and Chapter 2 of the Machine Learning book, which walked through building a sample machine learning model through utilizing housing information to simulate a real-world scenario.
DAY 4+5: PRELIMINARY SETUP
As I close out the first week of my internship, today’s goals were to deploy HPCC on Docker Desktop and install the VS Code ECL Plug-in. I followed a tutorial online, opened the ECL file, and created the launch configuration. To prepare my environment for following the “Containerized HPCC Systems Platform documentation”, I enabled Kubernetes in Docker Desktop. The next step is to follow the Machine Learning GNN tutorial.
The following tasks have been completed:
• Download Docker Desktop
• Update → reconfigure settings
• Check that VS…
DAY 2+3: PRELIMINARY SETUP
To kick off my project, I created GitHub repositories (HPCC-GNN-Cloud and TensorFlowStudy) and cloned them to my local system. In preparation for the next few days, I found the VS Code ECL Plug-In installation guideline and the GNN tutorial. With these resources, I will be able to download/set up necessary prerequisites for my project’s platform. I have also obtained the Containerized HPCC Systems Platform documentation guideline that I will follow next week. The main goal for these two days was to lay down the foundational work,…
Day 1: Preliminary Setup
Today was the first official day of my Summer 2021 internship with LexisNexis. My goals for the day were to deploy HPCC on Docker Desktop, start the Azure Intern Training, complete the Cyber Defense Onboarding Curriculum Training, start the blog, and start reading “Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow”.
By the end of the first day, I had completed the following:
• Started the blog
• Obtained sample photo database for augmentation later on
• Accessed Azure Portal tutorial
• Started “Hands-on…