Today I followed the “Machine Learning with HPCC GNN Tutorial Part 1” on my local Docker Desktop. First, I checked the terminal to make sure that I had the ML_Core bundle installed, along with the other prerequisites below.
Instead of using the Virtual Machine that the demo used, I’m using Docker Desktop. Instead of using ECL IDE, I’m using VS Code.
Deploying HPCC in Docker Desktop follows the same containerized-deployment documentation I used before, but I changed “helm repo add HPCC…” to “git clone https://github.com/hpcc-systems/HPCC-Platform” because the helm-chart repository only carries gold releases, while the latest platform-GNN Docker image is in the 8.2.0-rc build.
Next, I set up GitHub Pull Requests and Issues in VS Code.
I watched a video on tensors, then ran the code in VS Code and got this result:
After completing the Data Augmentation tutorial on Jupyter Notebook (Keras preprocessing layers and tf.image), I concluded that tf.image would be a better fit for my project.
Keras preprocessing layers
Resizing (for consistency) and rescaling pixel values
Data augmentation
Apply preprocessing layers multiple times to the same image (switch the orientation, rotate it, etc.)
*Different for my model: instead of flipping the images, it would be best for the variation to be in lighting, shadows, location, background distractions, etc., to simulate a real photo dataset
Options for using the preprocessing layers
First option: make the layers part of the model
Second option: apply the preprocessing layers to your dataset
I decided to use this option so the data augmentation stays separate from the training model and can be saved separately
For the second option:
Only augment the training set, not the validation set
Train a model using the augmented data
Create custom data augmentation layers
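A minimal sketch of the second option as I understand it: resize and rescale both splits, but augment only the training split. The layer choices and parameter values here are my assumptions, using the TF ≥ 2.6 layer names; on 2.3–2.5 the same layers live under tf.keras.layers.experimental.preprocessing.

```python
import tensorflow as tf

# Resize for consistency, then rescale pixel values to [0, 1].
resize_and_rescale = tf.keras.Sequential([
    tf.keras.layers.Resizing(320, 320),
    tf.keras.layers.Rescaling(1.0 / 255),
])

# Augmentation biased toward lighting/position variation rather than flips.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomContrast(0.2),
    tf.keras.layers.RandomTranslation(0.1, 0.1),
])

def prepare(ds, augment=False):
    # Resize/rescale every split, but only augment the training split.
    ds = ds.map(lambda x, y: (resize_and_rescale(x), y),
                num_parallel_calls=tf.data.AUTOTUNE)
    if augment:
        ds = ds.map(lambda x, y: (data_augmentation(x, training=True), y),
                    num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)
```

Keeping the augmentation in the dataset pipeline (rather than inside the model) is what lets it be saved and modified independently of the training model.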
tf.image
Reimport the dataset
Retrieve an image to work with
Flip the image, grayscale the image, saturate the image, change image brightness, center crop the image, rotate
Random transformations (random saturation, brightness, etc.)
For my purposes, tf.image may be more useful than the basic preprocessing layers because it can emulate real photos more accurately
Apply the augmentation to the dataset
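A sketch of the tf.image transforms above wired into a dataset map, biased toward lighting variation per my earlier note (the function name and parameter ranges are my guesses; assumes float32 images in [0, 1]):

```python
import tensorflow as tf

def augment_image(image, label):
    # Lighting-oriented tf.image transforms: vary brightness, saturation,
    # and contrast instead of flipping, to better simulate real photos.
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_saturation(image, lower=0.7, upper=1.3)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    # Keep pixel values in the valid [0, 1] range after the adjustments.
    return tf.clip_by_value(image, 0.0, 1.0), label

# Applied to the training split only, e.g.:
# train_ds = train_ds.map(augment_image, num_parallel_calls=tf.data.AUTOTUNE)
```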
Then, I created my own Jupyter Notebook to augment a dataset. The first issue I ran into was that the code in the data_augmentation Jupyter Notebook changes the images (adjusting brightness, saturation, etc.) but does not show how to increase the number of images (e.g., take one image and produce 100 versions of it).
I went online and researched other resources for turning one image into multiple images. I found a tutorial titled “Easy Image Dataset Augmentation with TensorFlow”, but it’s not as detailed. The TensorFlow “Data Augmentation” documentation, on the other hand, is very detailed but not aligned with my purposes. I am now splicing together lines of code from both documents to fit my objectives.
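Since neither document spells this step out, here is a minimal sketch of turning one image into many randomly perturbed copies with tf.image (the helper name make_variations is mine; assumes a float32 image in [0, 1]):

```python
import tensorflow as tf

def make_variations(image, n=100):
    """Produce n randomly perturbed copies of a single image tensor."""
    variations = []
    for _ in range(n):
        # Each pass draws fresh random lighting adjustments,
        # so every copy is a slightly different version of the image.
        v = tf.image.random_brightness(image, max_delta=0.2)
        v = tf.image.random_saturation(v, lower=0.7, upper=1.3)
        v = tf.image.random_contrast(v, lower=0.8, upper=1.2)
        variations.append(tf.clip_by_value(v, 0.0, 1.0))
    return tf.stack(variations)  # shape: (n, H, W, C)
```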
In order to augment my dataset, I needed to create a folder of the student images I had, and another folder of images of people who don’t attend my school (I used photos of celebrities). The next step was to make sure that all of the image sizes were the same. To check image size, I opened each photo and right-clicked –> Get Info –> More Info.
After finding out that the image sizes were all different, I looked for a Mac application that could unify them. I kept the original folder with the unedited images, made a copy titled “edited_student_images”, and opened each image in
Preview –> Tools –> Adjust Size –> fit to 320 x 320 pixels
I thought that this was sufficient to create images of the same size, but I later found out that since Preview scales image sizes proportionally by default, they still showed up as different sizes in Jupyter Notebook. Unchecking the “scale proportionally” box fixed the issue.
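Alternatively, the resizing could be done in code instead of Preview. A sketch with tf.image.resize, which stretches to the exact target size regardless of aspect ratio (the programmatic equivalent of unchecking “scale proportionally”; the helper name is mine):

```python
import tensorflow as tf

def load_resized(path, size=(320, 320)):
    # Decode a JPEG and stretch it to exactly `size`, ignoring aspect
    # ratio, then rescale pixel values to [0, 1].
    raw = tf.io.read_file(path)
    img = tf.io.decode_jpeg(raw, channels=3)
    return tf.image.resize(img, size) / 255.0
```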
Now that I have a local file of uniform images of students and non-students, I can work to display the images in Jupyter Notebook.
I started with a sample code from the TensorFlow tutorial
It displays the file names for all of the images that I need, but not the images themselves
I changed the code to try and fix this issue, but then I got this error message
I later realized that the reason it displayed this error message is that there was a hidden file that wasn’t a .jpeg. I manually deleted the .DS_Store file via the terminal (shown below). Tomorrow, I will research a way for the code to automatically detect the file types and only attempt to display images that are .jpeg, so this issue doesn’t produce an error message next time.
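One likely approach is to filter paths by extension before trying to display anything, which would skip hidden files like .DS_Store automatically. A sketch (the helper name jpeg_paths is mine):

```python
import pathlib

def jpeg_paths(folder):
    # Keep only JPEG files, skipping hidden files such as .DS_Store
    # and any other non-image files in the folder.
    return sorted(
        p for p in pathlib.Path(folder).iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg"} and not p.name.startswith(".")
    )
```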
Now, the code in Jupyter Notebook works correctly. The images are all vertically displayed in a row, and they are the same size.
I opened the image retraining guide via the terminal and ran the code which consisted of the following steps that I will be replicating on my facial recognition model:
Select the TF2 SavedModel
Resize images in the dataset for the module, and augment the photos
Put a linear classifier on top of the feature_extractor_layer
Start with a non-trainable feature extractor layer (for speed)
Train the model
Test the model’s accuracy using the validation dataset
The trained model can be saved for deployment to TF Serving or TF Lite (mobile)
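The steps above can be sketched roughly as follows. The real guide uses a TF-Hub feature_extractor_layer (hub.KerasLayer); here I've stood in a frozen Keras MobileNetV2 with weights=None just to keep the sketch self-contained, so this is the shape of the recipe rather than the guide's exact code:

```python
import tensorflow as tf

IMAGE_SIZE = (224, 224)
NUM_CLASSES = 2  # e.g., student vs. non-student

# Stand-in for the TF-Hub feature extractor; the guide loads pretrained
# ImageNet weights, while weights=None keeps this sketch offline-friendly.
base = tf.keras.applications.MobileNetV2(
    input_shape=IMAGE_SIZE + (3,), include_top=False,
    weights=None, pooling="avg")
base.trainable = False  # start with a non-trainable feature extractor

# Linear classifier on top of the feature extractor.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])

# model.fit(train_ds, validation_data=val_ds, epochs=5)
# model.save("saved_model/")  # for TF Serving, or convert to TF Lite
```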
The dataset setup step of the image retraining demonstration outputs an AttributeError.
After researching the error message online, the response was that “the specific function (tf.keras.preprocessing.image_dataset_from_directory) is not available under TensorFlow v2.1.x or v2.2.0 yet. It is only available with the tf-nightly builds and is existent in the source code of the master branch.”
Currently, I am on version 2.0.0 of TensorFlow. I updated my TensorFlow version to 2.3.0 to see if it would fix the error.
Error message:
PackagesNotFoundError: The following packages are not available from current channels:
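That PackagesNotFoundError usually means the conda channels in use don’t carry that TensorFlow build. A sketch of one common workaround (the env name “tf” is my assumption, matching the env mentioned later): install with pip inside the activated environment instead.

```shell
# Assumption: the conda env is named "tf"; pip installs into whichever
# environment is currently active.
conda activate tf
pip install "tensorflow==2.3.0"
python -c "import tensorflow as tf; print(tf.__version__)"
```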
The main objective for today was to follow the TensorFlow 2 quickstart code and the TensorFlow Convolutional Neural Network (CNN) tutorial. To successfully compile the code, I upgraded my TensorFlow version to 2.5 and installed matplotlib (using conda) for the image retraining documentation I will finish following tomorrow. The Jupyter Notebook for Retraining an Image Classifier will be especially useful because I will be following very similar steps when I retrain an existing model. I ran into an issue with the image retraining code – the “Setup” step resulted in a “ModuleNotFoundError: No module named ‘matplotlib'” even after I installed matplotlib and checked the version with “conda list | grep matplotlib” on my terminal. I am still in the process of devising a solution.
Update: The reason it showed an error message is that I installed “matplotlib” in the “base” environment instead of “tf”. This was the same issue for “tensorflow_hub” too. After repeating the process for “tensorflow_hub”, the code now runs successfully without an error message.
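A minimal sketch of installing into the environment the notebook kernel actually uses (env name “tf” from above), rather than into “base”:

```shell
# Activate the env the Jupyter kernel runs in, then install there.
conda activate tf
conda install matplotlib
pip install tensorflow-hub
# Verify both packages show up in this env, not in "base".
conda list | grep -E "matplotlib|tensorflow-hub"
```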
I started deploying HPCC in Azure after installing the Azure CLI. Halfway through the process, I ran into an ECL error because the helm chart’s version didn’t match the image version. My first attempted solution was deleting “mycluster” and deploying a matched version (rc4 instead of rc2), but I still received an error message. Ultimately, the issue was fixed by changing the helm chart version to 8.0.20 to match the Docker image 8.0.20-1.
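A sketch of the matching-versions redeploy. This assumes the hpcc helm repo is already configured, and the `global.image.version` value name is my assumption about the HPCC chart (worth confirming with `helm show values hpcc/hpcc`):

```shell
# Remove the mismatched deployment, then pin the chart version (8.0.20)
# to match the platform Docker image (8.0.20-1).
helm uninstall mycluster
helm install mycluster hpcc/hpcc --version 8.0.20 \
  --set global.image.version=8.0.20-1
```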