Corporate dress codes, branding, style look books, and similar guidelines contribute to the visual identity of a company or brand. Many companies publish a style look book to promote specific dress rules that emphasize the brand and its values. For such companies, and especially for their employees, a system or app that can quickly assess conformity to the guidelines would be a huge help. Imagine taking a selfie with a mobile phone to check the guidelines in seconds and even receive advice.
But the potential of the model goes far beyond this use case. It can easily be adapted to check whether safety standards are being followed, for example whether a construction worker or a motorbike rider is wearing protective clothing.
The model we used is a standard pre-trained VGG16 with a fine-tuned custom classifier. We used transfer learning to achieve good results with a small dataset.
With transfer learning, instead of starting the learning process from scratch, we start from patterns that were learned while solving a different problem, so previous learning is reused. We took the VGG16 model with its default weights, froze the upper layers, cut off the default classifier, and attached and trained our own classifier.
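The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code behind the article: it assumes Keras (via TensorFlow), ImageNet weights as the "default" weights, a fully frozen convolutional base, and a hypothetical classifier head whose layer sizes are made up for the example.

```python
from tensorflow import keras

NUM_CLASSES = 5  # the five dress-code classes mentioned in the article


def build_dress_code_model(weights="imagenet", input_shape=(224, 224, 3)):
    """Frozen VGG16 convolutional base plus a small trainable classifier."""
    # include_top=False cuts off VGG16's default ImageNet classifier.
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape
    )
    # Assumption: freeze the whole pre-trained base so only the new head learns.
    base.trainable = False

    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dense(128, activation="relu")(x)  # hypothetical head size
    outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_dress_code_model()` downloads the ImageNet weights on first use. After that, only the two `Dense` layers are updated during `model.fit(...)`, which is what makes training feasible on a small dataset.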
We used only 64 images per class: 48 for training and 16 as a test set. The dress code has 5 classes. The images also contain a lot of “noise”, which affected accuracy.
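To make the split concrete, here is a small sketch of a per-class 48/16 split. The file names, the class labels 0 to 4, and the helper `split_per_class` are hypothetical; the article does not say how the images were actually divided.

```python
import random


def split_per_class(images_by_class, train_per_class=48, seed=0):
    """Shuffle each class separately and split it into train and test parts."""
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in sorted(images_by_class.items()):
        shuffled = list(paths)
        rng.shuffle(shuffled)
        train += [(path, label) for path in shuffled[:train_per_class]]
        test += [(path, label) for path in shuffled[train_per_class:]]
    return train, test


# Hypothetical file layout: 5 classes with 64 images each.
images_by_class = {
    label: [f"class_{label}/img_{i:02d}.jpg" for i in range(64)]
    for label in range(5)
}
train_set, test_set = split_per_class(images_by_class)
print(len(train_set), len(test_set))  # 240 80
```

Splitting each class on its own keeps the test set balanced at 16 images per class, so every row of a confusion matrix built from it sums to the same total.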
We used a confusion matrix to evaluate the accuracy of the classification.
We see that the model was mostly confused between class 3 (business casual) and class 4 (smart casual). For example, 3 smart casual photos were classified as business casual, and 7 business casual photos were classified as smart casual.
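For readers who have not worked with one, a confusion matrix simply counts how often each true class was predicted as each class. The sketch below uses plain Python and a tiny made-up set of labels with only two classes; it does not reproduce the article's results.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts samples of true class t that were predicted as p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. classified correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def most_confused_pair(matrix):
    """Return (true_class, predicted_class, count) for the largest error cell."""
    cells = [
        (t, p, matrix[t][p])
        for t in range(len(matrix))
        for p in range(len(matrix))
        if t != p
    ]
    return max(cells, key=lambda cell: cell[2])


# Toy labels, invented for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred, num_classes=2)
print(cm)                      # [[2, 1], [2, 2]]
print(most_confused_pair(cm))  # (1, 0, 2)
```

Reading the off-diagonal cells this way is how a pair such as business casual and smart casual stands out as the main source of errors.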
Sometimes it is necessary to dive into the layers of a convolutional network to get insight into what features specific layers learn and how a CNN learns in general. The deeper we go into the layers, the more abstract and less visually interpretable the visualizations become. They begin to encode higher-level concepts such as single borders, corners, and angles. Higher-level representations carry increasingly less information about the visual contents of the image and increasingly more information related to the class of the image.
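One common way to look inside the layers is to build a second model that returns the output of every convolutional layer for a given image. The sketch below assumes Keras and the plain pre-trained VGG16 base rather than our fine-tuned model; the helper names `build_activation_model` and `feature_maps` are hypothetical.

```python
import numpy as np
from tensorflow import keras


def build_activation_model(weights="imagenet", input_shape=(224, 224, 3)):
    """Model that maps an image batch to the outputs of all VGG16 conv layers."""
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape
    )
    # VGG16 names its convolutional layers block1_conv1 ... block5_conv3.
    conv_layers = [layer for layer in base.layers if "conv" in layer.name]
    activation_model = keras.Model(
        inputs=base.input, outputs=[layer.output for layer in conv_layers]
    )
    return activation_model, [layer.name for layer in conv_layers]


def feature_maps(activation_model, image_batch):
    """Return one array of shape (batch, height, width, channels) per layer."""
    batch = np.asarray(image_batch, dtype="float32")
    return activation_model.predict(batch, verbose=0)
```

Each returned array holds one feature map per channel, so plotting a single channel of an early layer and of a late layer side by side (for instance with matplotlib's `matshow`) shows how the representation moves from image-like maps to small, abstract ones.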
Here are some examples of what our model sees.