Although pure coding and debugging are not usually a passion of mine, I recognize the importance of neural networks and other recent developments in Computer Vision. Of the several AI and Machine Learning projects I co-authored during my bachelor's program, I picked this one because I think it is well documented and explains step by step what we did.
### Image Super-Resolution using Convolutional Neural Networks (Recreation of a 2016 Paper)
Image Super-Resolution is a hugely important topic in Computer Vision. Once the technology is sufficiently advanced, we could take all our screenshots, selfies, and cat pictures from the 2006 Facebook era, and even from before, and scale them up to suit modern 4K needs.
To give an example of what was possible by 2020, only four years after the paper discussed here, have a look at this video from 1902:
The 2016 paper we looked at is much more modest: it upscales only a single image. Historically, though, it was one of the first approaches with computing times short enough to enable real-time video upscaling, as seen in the 2020 video or in the techniques Nvidia uses nowadays to upscale video games.
The neural network artificially adds pixels, so that we can finally put our measly selfie on a billboard poster without being appalled by a face that technology has left deformed and pixelated.
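To make the idea a bit more concrete, here is a minimal sketch in NumPy. It is not the code from our project and not the method from the paper. It only illustrates the setup: a low-resolution input is derived from a ground-truth image, enlarged with a naive baseline (plain pixel duplication instead of a neural network), and compared with the ground truth using a pixel-wise cosine similarity like the one mentioned in the gallery captions. The function names, the random test image, and the exact way the similarity is computed (per pixel, across the color channels) are assumptions made for illustration.

```python
import numpy as np


def downsample(img, factor):
    """Create a low-res image by averaging non-overlapping factor x factor blocks."""
    h, w, c = img.shape
    h, w = h - h % factor, w - w % factor  # crop so the size divides evenly
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def upscale_nearest(img, factor):
    """Naive baseline: enlarge the image by duplicating every pixel."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)


def pixelwise_cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between the color vectors of a and b at each pixel.

    Assumed definition: returns an (H, W) map, where 1.0 means the two
    color vectors at that pixel point in the same direction.
    """
    dot = (a * b).sum(axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return dot / (norms + eps)


# Hypothetical stand-in for a real photo: a random 32x32 RGB image in [0, 1).
rng = np.random.default_rng(0)
ground_truth = rng.random((32, 32, 3))

low_res = downsample(ground_truth, 2)    # 16x16 input
naive_sr = upscale_nearest(low_res, 2)   # back to 32x32, without a network
sim_map = pixelwise_cosine_similarity(naive_sr, ground_truth)

print(sim_map.shape, float(sim_map.mean()))
```

A trained super-resolution network would take the place of `upscale_nearest` here, and the similarity map then shows in which image regions the output stays close to the ground truth.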
{% gallery() %}
[
{
"file": "sample_lr.png",
"title": "A low-resolution sample",
"alt": "A sample image with low resolution, used as a baseline for comparison."
},
{
"file": "sample_hr.png",
"title": "A high-resolution sample. This is also called 'ground truth'",
"alt": "A high-resolution image that serves as the reference ground truth for comparison with other samples."
},
{
"file": "sample_sr.png",
"title": "The artificially enlarged image patch resulting from the algorithm",
"alt": "A sample image where the resolution has been artificially increased using an image enhancement algorithm."
},
{
"file": "sample_loss.png",
"title": "A graph showing an example loss curve recorded during training",
"alt": "A graph illustrating the loss function used to train the model, showing the model's performance over time."
},
{
"file": "sample_cos_sim.png",
"title": "One qualitative measurement we used was pixel-wise cosine similarity. It is used to measure how similar the output and the ground truth images are",
"alt": "A visualization of pixel-wise cosine similarity, used to quantify how similar the generated image is to the ground truth image."
}
]
{% end %}
### MTCNN (Application and Comparison of a 2016 Paper)
Here, you can also have a look at another, much smaller project, in which we rebuilt a rather classical Machine Learning approach for face detection. We use preexisting libraries to compare the efficacy of different approaches, showing that Multi-task Cascaded Convolutional Networks (MTCNN) were among the best-performing approaches in 2016. Since I invested much more love and work into the project above, I would prefer that you check that one out if two projects are too much.
[Face detection using a classical AI Approach (Recreation of a 2016 Paper)](https://colab.research.google.com/drive/1uNGsVZ0Q42JRNa3BuI4W-JNJHaXD26bu?usp=sharing)