Coloring the Past: Using AI to Colorize Historical Images - Image Colorisation 101

Intro - The Art and Science of Bringing Color to the Past

Remember those old Victorian photographs and early Kodak prints? Colour photography only became widely available with Kodachrome in the mid-1930s. But what if we could breathe life into those monochrome memories? Welcome to the world of image colorization - a blend of art, science, and cutting-edge technology that's revolutionizing how we view history.

This blog post gives you a rundown of the image colorization space using deep learning.

Why am I doing this?

My final-year undergraduate project was about this topic. I think it's only right to share that knowledge in an understandable post instead of leaving it locked behind a wall of academic wording. (Who doesn't like reading a 10k-word PDF full of jargon?!?)

As you can see from my other work, this is not my first image colourization project. I previously did a breakdown of colorizing an old Richard Feynman photo with DeOldify. See here. During my undergrad, my supervisor recommended I continue working on the topic because of that previous experience.

Whether you're a deep learning enthusiast, a history buff, or simply curious about how those viral "colorized history" posts are created, this post aims to provide you with a comprehensive understanding of image colorization. Strap in, and let's go.

What is Image Colourisation?

Now, what is image colourisation? Well, it's pretty simple: turning greyscale images into colour.

So the next logical question is: what's behind the voodoo magic that allows this to happen?

The power of deep learning, CNNs (Convolutional neural networks) to be precise.

CNNs allow models to "see" what's in the image.

Now, how does image colourisation turn those grey pixels into colour?

By comparing black-and-white images with the colours and features it has already seen before, the model can start to map greyscale pixels onto colour. With the help of some smart colour engineering, should I say!

This is the whole basis of image colourisation.

The greyscale image is the input, and the RGB layers are the output. My metaphor: the greyscale image goes in, the layers of the neural network build up the RGB layers, and a coloured image comes out. Neural networks are great at learning non-linear patterns, so tuning the right RGB combination for the target pixel is a natural fit for deep learning.

  • Quick aside: a non-linear pattern is simply one that does not have a 1:1 relationship. But there is still a relationship.

Decoding Color: Understanding RGB and LAB Color Spaces

But RGB is not the only colour space used; LAB is used as well. LAB is an absolute colour space, meaning colours are defined regardless of the device, and its separation of lightness (brightness) from the colour channels makes it more precise when mapping colours.

CIELAB color space - Wikipedia

I've used Claude to help provide an ELI5 explanation:

Imagine you have a big box of crayons. Some crayons are different shades of the same color, like light blue and dark blue. In the RGB color box, these crayons might be mixed up and hard to find. But in the LAB color box, they're organized in a special way:

The L drawer: This has all the light and dark versions of colors. It's like controlling how much sunlight shines on your drawing.

The A drawer: This has crayons going from green to red.

The B drawer: This has crayons going from blue to yellow.

When computer artists want to color a black-and-white picture, the LAB box makes it easier. They can choose how bright or dark to make things without messing up the colors. And they can pick colors that look good together more easily because the crayons are sorted in a way that makes sense to our eyes.

The LAB box also has some magic crayons that can make colors your regular crayon box can't! This lets artists make really pretty and natural-looking colorful pictures from black-and-white ones.

So, while RGB is like a regular crayon box, LAB is like a super-organized, magical crayon box that helps artists color pictures in a way that looks great to our eyes!

Convolutional Neural Networks in Image Colorization

At a high level, a CNN takes an image as input in the form of a matrix of pixels. Then features (lines, textures, shapes) are identified. As the data goes through each layer, the network is able to identify more complex shapes (dogs, cats, legs, etc.). The final layer is then used for classification.

For the features to be identified we use filters: small matrices of weights that pass over the image. This is done in a sliding-window manner, starting from the top left and moving through each section of the image one by one.
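To make the sliding-window idea concrete, here's a minimal sketch of a single 3x3 filter sliding over a greyscale image in NumPy. This is my own toy example (the function name and filter are made up for illustration), not code from any of the papers discussed here.

import numpy as np

def convolve2d(image, kernel):
    # Slide a small kernel over a 2D greyscale image (no padding, stride 1)
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):          # top-left corner moves down row by row...
        for j in range(out.shape[1]):      # ...and across column by column
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # weighted sum = one output value
    return out

# A simple vertical-edge filter as an example
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])
grey = np.random.rand(8, 8)                # stand-in for a greyscale image
print(convolve2d(grey, kernel).shape)      # (6, 6)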

Here is some short Python code we can break down that converts RGB to LAB.

X = rgb2lab(1.0/255*image)[:,:,0] 
Y = rgb2lab(1.0/255*image)[:,:,1:]

We know that RGB has 3 channels. The image is passed into the scikit-image rgb2lab function.

Now the shape of the image looks like this [insert image here].

Now we select the greyscale layer by picking index zero on the last axis. (The last axis is the channel dimension; the other two axes are the pixel rows and columns.) Calling [:,:,1:] selects channels A and B: green-red and blue-yellow.

[Diagram: an image as a 3D array. The rows and columns are the image dimensions, and the third axis holds the channels (L, A, B). 3D space, remember.]

After converting the color space using the function rgb2lab(), we select the greyscale (Lightness) layer with [:,:,0]. This is typically used as input for the neural network. [:,:,1:] selects the two color layers: A (green–red) and B (blue–yellow).

I'm not the best artist, so the other diagrams and the videos above will be helpful as well.

skimage.color — skimage 0.23.2 documentation (scikit-image.org)

Here's a code snippet showing how the LAB channels can be accessed.

import numpy as np
from skimage import color
import matplotlib.pyplot as plt

# Assume 'image' is your RGB image
lab_image = color.rgb2lab(image / 255.0)  # Normalize RGB values to [0, 1]

L = lab_image[:,:,0]  # Lightness channel (grayscale)
A = lab_image[:,:,1]  # A channel (green-red)
B = lab_image[:,:,2]  # B channel (blue-yellow)

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
axes[0,0].imshow(image)
axes[0,0].set_title('Original RGB')
axes[0,1].imshow(L, cmap='gray')
axes[0,1].set_title('L channel (Grayscale)')
axes[1,0].imshow(A, cmap='RdYlGn_r')
axes[1,0].set_title('A channel (Green-Red)')
axes[1,1].imshow(B, cmap='YlGnBu_r')
axes[1,1].set_title('B channel (Blue-Yellow)')
plt.tight_layout()
plt.show()

A quick note on video colourisation (I'll talk about it in upcoming blog posts): everything here applies to video as well, since videos are simply multiple frames played at a certain speed. Video colourisation has extra issues, though, mainly flickering and inconsistent colours between frames.

TL;DR: how do you make sure the colourisation from frame 1 still applies at frame 50? See here - if you're a very eager beaver.

Now that you understand how image colourisation works, let's describe the various architectures.

The Evolution of Colorization: CNN, User-Guided, and Exemplar-Based Approaches

Based on this paper, we can classify image colourisation into three types: CNN-based, user-guided, and exemplar-based. There are actually more types of image colourisation, which you can see in the paper, but for historical imagery these are the most relevant.

CNN-based image colourisation is the type we just explained above. All successive models are built on top of a CNN.

The computer does need to see both the greyscale and colour images, right?

The influential paper that started it all was Deep Colorization, which showed how deep learning can be used for image colourisation, followed by work using CNNs, early GANs, and autoencoders. The next generation was real-time user-guided image colourisation, which introduced user input. And then came exemplar-based image colourisation, which introduced reference images to help steer the model. Deep Colorization Paper

Check out the videos on these colourisation papers below:

  • Colorful Image Colorization
  • Real-Time User-Guided Image Colorization with Learned Deep Priors (Aug 2017, SIGGRAPH)

These approaches are great, as they nudge the model in the right direction. As we'll see with the t-shirt example later, image colorization has a subjective element to it. It can be art as well as science. (Which is true of all of deep learning, by the way.)

User-guided colourisation has the most entertaining examples, like turning stick figures into images and colouring anime. (If you're a weeb.) These user-guided models tend to use GANs and large pre-trained backbones such as a U-Net.

GANs are used because they can generate images, whereas plain CNN classifiers only label them. A pre-trained network can already identify various features, shapes, lines, etc., so instead of developing a model from scratch we can focus on colourising the image.

GANs are out of fashion now, thanks to diffusion models. (No, I won't be explaining them here, sorry. You are already mathsed up enough.) If you're still interested, check out this.

Plain Image Colourisation

This section will be on the shorter side, as the intro and the loss functions sections will explain most of the dynamics.

Let's deep dive into the Deep Colorization paper mentioned above with the video. The architecture is simple: five fully connected linear layers with ReLU activations, with greyscale image features taken as input. The output layer has two neurons for the U and V colour channel values.
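As a rough sketch of that idea (my own simplification in PyTorch, not the authors' code; the layer sizes and feature dimensions are illustrative), a per-pixel network like this maps a greyscale feature vector to two chrominance values:

import torch
import torch.nn as nn

# Toy assumption: each greyscale pixel is described by a flattened 7x7 patch of
# grey values (49 features); the network predicts that pixel's U and V values.
class TinyColorizer(nn.Module):
    def __init__(self, in_features=49, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),                 # two outputs: U and V
        )

    def forward(self, x):
        return self.net(x)

features = torch.randn(1024, 49)   # 1024 pixels, each with a flattened grey patch
uv = TinyColorizer()(features)     # shape: (1024, 2)
print(uv.shape)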

Feature extraction is done at three levels: low-level (the actual patches of grey values), mid-level (DAISY features, a fancy name for general features and shapes), and semantic labelling (hard labels saying this is a tree or a car). The output is then cleaned up with a post-processing technique called joint bilateral filtering, which measures the spatial distance and the intensity difference between pixels.

Colorful Image Colorization is a great paper. The architecture is eight blocks of stacked convolutional layers; each block contains 2-3 convolutional layers followed by ReLU and batch normalisation, and striding is used instead of pooling for downsampling.

The cool thing here is how the paper manipulates the colour space of the image: instead of regressing colours directly, it predicts a probability distribution over 313 quantised "ab" pairs for each pixel, and at inference it picks the ab pair for the output image from that distribution. Cool stuff, right? This paper starts to deal with the washout issue mentioned in the next section.
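Here's a hedged sketch of that colour-classification idea (a simplification, not the paper's actual code; the backbone channel count and feature map size are made up): predict a distribution over the quantised ab bins for every pixel, then pick a bin at inference.

import torch
import torch.nn as nn

NUM_BINS = 313   # number of quantised (a, b) colour pairs, as in Colorful Image Colorization

class ColorClassHead(nn.Module):
    def __init__(self, in_channels=256, num_bins=NUM_BINS):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_bins, kernel_size=1)   # per-pixel logits

    def forward(self, features):
        logits = self.conv(features)           # (B, 313, H, W)
        return torch.softmax(logits, dim=1)    # probability over ab bins for each pixel

head = ColorClassHead()
features = torch.randn(1, 256, 56, 56)         # pretend backbone feature map
probs = head(features)
# The simplest inference rule is the most likely bin per pixel; the paper actually
# uses an "annealed mean" of the distribution to keep colours vibrant.
best_bin = probs.argmax(dim=1)                 # (1, 56, 56) bin indices
print(probs.shape, best_bin.shape)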

The main trend here is how the colour representation changes: from direct U and V prediction to probability distributions over colour space. Many objects can have multiple plausible colours, and directly regressed U and V values are forced to choose a single colour, often resulting in "safe" but unrealistic predictions (like the infamous brown tendency). The other trend is upgrading CNNs with residual blocks, batch normalisation, and various activation functions, which are now staples of modern deep learning.

User Guided Models

User-guided and exemplar-based models take feedback from the user, in the form of pixel hints or a reference image. They are popular within the literature right now.

That's because the model produces more accurate results by getting help from the user instead of relying only on the training images it has seen beforehand. A user saying "this car should be red, this t-shirt should be white" helps the model adjust from there.

Here's a great survey paper for more details: [2008.10774] Image Colorization: A Survey and Dataset (arxiv.org)

But what happens if the colourised image is not historically accurate? (Hint, hint: my paper.)

Let's start with Scribbler, a model that lets users add strokes to an image, which the model then uses to colourise it. It uses a feed-forward network with an adversarial (GAN) component to interpret the sketch. The model applies a bounding box to the sketch and was previously trained on various shapes and sizes, so it can provide accurate output.

[1612.00835] Scribbler: Controlling Deep Image Synthesis with Sketch and Color (arxiv.org)

Real-Time User-Guided Colorization: this paper lets the user add "hints": coloured pixels placed on the greyscale image that the model should use. So you place a green pixel on a t-shirt and, guess what, the t-shirt is now colourised green instead of red. This doesn't use a GAN; it's closer to the CNN architecture mentioned earlier. A global hints network also takes account of colour statistics across the whole image, not just the local user input.
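To make the hint mechanism concrete, here's a hedged sketch (my own illustration, not the paper's code; the hint values and positions are invented) of how a sparse colour hint can be packed into the network input alongside the greyscale channel:

import numpy as np

H, W = 256, 256
L_channel = np.random.rand(H, W)          # greyscale (lightness) input
ab_hints = np.zeros((2, H, W))            # user-provided a/b values, zero where no hint
hint_mask = np.zeros((1, H, W))           # 1 where the user clicked, 0 elsewhere

# The user places a roughly green hint on a t-shirt region around pixel (120, 80)
ab_hints[:, 118:123, 78:83] = np.array([-60.0, 40.0]).reshape(2, 1, 1)
hint_mask[:, 118:123, 78:83] = 1.0

# Network input: L channel + ab hints + mask, stacked into a 4-channel array
net_input = np.concatenate([L_channel[None], ab_hints, hint_mask], axis=0)
print(net_input.shape)    # (4, 256, 256)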

Hint-Guided Anime Colorization: a model where you can draw anime sketches and the model colourises them. Told you you'd like this. This one uses a conditional GAN with a U-Net, along with a perceptual loss.

What makes user-guided networks great is also their downfall. These models can be laborious, because you are effectively labelling each greyscale image before passing it into the model. Also, if a user selects an unnatural colour, the model tends to fail. (You wouldn't see a purple dog in the wild, would you? 🤨)

Exemplar Based Models

Now we move on to exemplar-based models, the state of the art for image colourisation. It's best to think of these as the advanced version of user-guided models. Here we have reference images to guide the model. What's great about this is that a reference image gives us a whole range of pixels to draw colours from, not just a single pixel hint or a sketch like the models showcased above.

For the exemplar-based architecture, the reference image is a big deal (duh!). This means the architecture takes two inputs: the reference image and the greyscale image. It's best to think of the reference image as a nudge or weight for the greyscale image. (Something I built upon in my paper [link to my paper].)

There are many ways to implement this architecture, from using a single image as the reference for the whole target, to using local references that adjust specific sections of the target image.

Deep Exemplar-based Colorization

This is the paper that introduced deep exemplar-based colourisation. The model has two main parts: a similarity sub-network that measures semantic similarity between the reference and target using VGG features, and a colourisation sub-network that learns to select, propagate, and predict colours end-to-end. The colourisation sub-network has two main branches: the chrominance branch, which learns to selectively propagate colours from well-matched regions, and the perceptual branch, which predicts plausible colours for unmatched regions based on large-scale data.
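As a rough illustration of the similarity idea (a sketch under my own assumptions, not the paper's implementation; the layer cut-off and cosine-similarity formulation are simplifications), you can compare VGG features of the target and the reference like this:

import torch
import torch.nn as nn
import torchvision.models as models

vgg = models.vgg19(pretrained=True).features[:21].eval()   # early-to-mid VGG features
for p in vgg.parameters():
    p.requires_grad = False

def vgg_similarity(target_rgb, reference_rgb):
    # Both inputs: (1, 3, H, W) tensors; the greyscale target would be repeated to 3 channels upstream
    f_t = vgg(target_rgb).flatten(2)       # (1, C, h*w)
    f_r = vgg(reference_rgb).flatten(2)
    f_t = nn.functional.normalize(f_t, dim=1)
    f_r = nn.functional.normalize(f_r, dim=1)
    # Cosine similarity between every target location and every reference location
    return torch.bmm(f_t.transpose(1, 2), f_r)   # (1, h*w, h*w)

target = torch.randn(1, 3, 224, 224)
reference = torch.randn(1, 3, 224, 224)
print(vgg_similarity(target, reference).shape)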

SPColor: Semantic Prior Guided Exemplar-based Image Colorization

Building upon the Deep Exemplar-based Colorization paper, SPColor introduces semantic information to guide the model. The main components include a semantic prior guided correspondence network (SPC), which identifies objects in the image; a category reduction algorithm (CRA), which develops about 22 semantic categories for efficient image processing; and a similarity masked perceptual loss (SMP loss), a custom loss function that combines perceptual loss with a similarity map to balance color preservation and generation.

The breakthrough in this paper is the use of semantic segmentation, allowing the model to understand spatial context in the image. For example, it can distinguish between a tree and a car, and colorize the image in local areas rather than all at once, helping to avoid mismatches between semantically different regions.

Here we can see how great exemplar-based models are, and why they are the state of the art, from better accuracy to more control for the user. This approach demonstrates significant improvements over previous methods, particularly in handling complex scenes and preserving semantic consistency in the colorized images.

Loss Functions

[Figure: example colourisation results taken from Colorful Image Colorization, 1603.08511 (arxiv.org)]

But you can see the issues with the colourisation: many of the images are washed out, brown, or frankly incorrect, as the model struggles to identify the different objects across images.

(Fun fact: the reason so many images come out brown is that it is the most common colour the model sees across the dataset. By picking this colour it gets the lowest error.)

Why brown, you might ask?

Many colorization models use MSE as their loss function. MSE penalizes large errors more heavily than small ones. Brown emerges as a compromise color that minimizes error across diverse scenes by averaging the color values.

Let's consider a simplified scenario:

  • True colors: [255, 0, 0] (red), [0, 255, 0] (green), [0, 0, 255] (blue)
  • Average color: [85, 85, 85] (a shade of gray/brown)

MSE for average color:

MSE = [(255-85)^2 + (0-85)^2 + (0-85)^2 + 
       (0-85)^2 + (255-85)^2 + (0-85)^2 + 
       (0-85)^2 + (0-85)^2 + (255-85)^2] / 9
    ≈ 14,450

MSE for any specific color (e.g., red):

MSE = [(255-255)^2 + (0-0)^2 + (0-0)^2 + 
       (255-0)^2 + (0-255)^2 + (0-0)^2 + 
       (255-0)^2 + (0-0)^2 + (0-255)^2] / 9
    ≈ 28,900

The average color yields a lower MSE, incentivizing the model to predict "safe", brownish (and ugly) colors.
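You can verify these numbers quickly in Python (a toy check, nothing more):

import numpy as np

true_colors = np.array([[255, 0, 0],    # red
                        [0, 255, 0],    # green
                        [0, 0, 255]])   # blue

avg_pred = np.full_like(true_colors, 85)    # predict the average colour everywhere
red_pred = np.tile([255, 0, 0], (3, 1))     # predict pure red everywhere

print(np.mean((true_colors - avg_pred) ** 2))   # 14450.0
print(np.mean((true_colors - red_pred) ** 2))   # 28900.0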

This is why pixel-wise losses alone don't cut it. They don't capture the spatial relationships between colours in an image, a.k.a. understanding what's going on in the photo and the objects in it (spatial context). In more technical terms, this leads to a kind of "mode collapse" [How to Identify and Diagnose GAN Failure Modes - MachineLearningMastery.com, Monitor GAN Training Progress and Identify Common Failure Modes - MATLAB & Simulink - MathWorks United Kingdom]. The model tends to converge on a limited set of "safe" colours, leading to the washed-out appearance.

Now you can see why designing good loss functions is important.

Loss function definitions

Due to the adversarial nature of GANs, they follow a min-max loss, with the generator and discriminator competing against each other. The generator develops better images to fool the discriminator, which tries to tell the difference between a generated image and a real one. This concept is later echoed in the perceptual losses used in non-GAN models.

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
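Here's a minimal sketch of that min-max game in code (a generic GAN loss with binary cross-entropy, not tied to any specific colourisation paper; in practice the generator usually maximises log D(G(z)) rather than minimising log(1 - D(G(z))), which is what this sketch does):

import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(d_real_logits, d_fake_logits):
    # The discriminator wants real images labelled 1 and generated images labelled 0
    real_loss = bce(d_real_logits, torch.ones_like(d_real_logits))
    fake_loss = bce(d_fake_logits, torch.zeros_like(d_fake_logits))
    return real_loss + fake_loss

def generator_loss(d_fake_logits):
    # The generator wants the discriminator to call its fakes real (label 1)
    return bce(d_fake_logits, torch.ones_like(d_fake_logits))

d_real = torch.randn(8, 1)   # discriminator logits on real colour images
d_fake = torch.randn(8, 1)   # discriminator logits on colourised (generated) images
print(discriminator_loss(d_real, d_fake).item(), generator_loss(d_fake).item())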

Pixel-wise loss directly compares the colour values of each pixel in the generated colourised image to the corresponding pixel in the ground-truth (target) colour image. These are traditional loss functions like MSE (L2) and MAE (L1).

Perceptual loss aims to capture higher-level features and textures that are important to human visual perception, rather than just pixel-level differences.

The key idea is to use a pre-trained neural network (often a CNN classifier like VGG) to measure the similarity between the generated colorized image and the target ground-truth image in the feature space of the pre-trained network. The intuition is that this perceptual loss can better guide the model to generate colorized images that look visually similar to the target, even if the pixel values don't match exactly. [perplexity.ai search]

Perceptual loss and pixel-level loss are combined into a total loss function for the model.

L_total = λ_p * L_perceptual + λ_pix * L_pixel

In LaTeX form: $$L_{\text{total}} = \lambda_p \cdot L_{\text{perceptual}} + \lambda_{\text{pix}} \cdot L_{\text{pixel}}$$

Quick deep learning reminder: the lambdas are weighting coefficients that balance the contribution of each loss term.

Maths Deep Dive for loss functions

Perceptual Loss

An example feature loss equation, taken from this paper: (PDF) Analysis of Different Losses for Deep Learning Image Colorization (2022) (typeset.io).

Let's break down what the formula says. Reconstructed from the components described below, the feature (perceptual) loss at layer $l$ has the form:

$$\mathcal{L}_{\text{feat}}^{l}(u, v) = \frac{1}{C_l W_l H_l} \|\Phi_l(u) - \Phi_l(v)\|_2^2$$

Understanding the components: $C_l$, $W_l$, $H_l$:

  • These symbols represent the number of channels ($C_l$), width ($W_l$), and height ($H_l$) of the feature map at layer $l$. At the input layer the channels are colour channels (like red, green, blue, or the LAB channels); deeper in the network they are learned feature channels.
  • Width and height are the dimensions of the image, which help in understanding the size of the data being processed.

The norm $\|\Phi_l(u) - \Phi_l(v)\|_2^2$:

  • The term $\Phi_l(u)$ and $\Phi_l(v)$ refers to the features extracted from images $u$ and $v$ at layer $l$.
  • The notation $|\cdot|_2$ represents the L2 norm, which is a way to measure the distance between two points in space. In this case, it measures how different the features of the two images are.
  • Squaring this distance (the $^2$ part) emphasizes larger differences, making them more significant in the loss calculation.

Why Divide by $C_l W_l H_l$?

  • The division by $C_l W_l H_l$ normalizes the loss value. This means it adjusts the loss based on the size of the images and the number of features.
  • Normalization is important because it allows for fair comparisons between different images or models, regardless of their size or complexity.

MSE

Also, some more technical details of MSE.

The formula for MSE in the continuous case is based on the $L^2$ distance between the two images:

$$d(u, v) = \|u-v\|_{L^2(\Omega;\, \mathbb{R}^C)} = \sqrt{\int_{\Omega} |u(x) - v(x)|_2^2 \, dx}$$

(MSE is essentially this quantity squared and averaged over the image domain.)

Let's break this down step by step.

  • Variables Explained:
  • $u$ and $v$: These represent two different images or sets of data we are comparing. For example, 'u' could be the colorized version of a greyscale image, and 'v' could be the actual color image we want to achieve.
  • $\Omega$: This symbol represents the area or domain over which we are comparing the two images. Think of it as the entire space of the image we are looking at.
  • $C$: This notation indicates that we are dealing with color information. $C$ represents the number of color channels (like Red, Green, and Blue). So, if we have a color image, $C$ would typically be 3.
  • Understanding the Norm:
  • $\|u-v\|_{L^2(\Omega;\, \mathbb{R}^C)}$: This part of the formula calculates the difference between the two images $u$ and $v$ across the entire area $\Omega$. The $L^2$ indicates that we are using squared differences, which is important for MSE.
  • $|u(x)-v(x)|_2^2$: Here, $x$ represents a specific point in the image. This expression calculates the squared difference in color values at that point. The $2$ in the subscript indicates that we are using the Euclidean norm, which is a way to measure distance in a multi-dimensional space (like color).
  • The Integral:
  • $\int_\Omega$: This symbol means we are adding up (integrating) all the squared differences across the entire image. It helps us get a single number that represents the overall difference between the two images.
  • Breaking down the formula, discrete version:

The formula given is:

$$\text{MSE}(u, v) = \frac{1}{MNC} \sum_{i=1}^M \sum_{j=1}^N \sum_{k=1}^C (u_{ijk} - v_{ijk})^2$$

(The $\frac{1}{MNC}$ factor simply averages the summed squared differences over every pixel and channel, which is what makes it a mean squared error.)

Here's what each part means:

  • $u$ and $v$: These represent the two images we are comparing. $u$ is the colorized image, and $v$ is the original image.
  • $M$: This is the height of the images in pixels. It tells us how many rows of pixels there are.
  • $N$: This is the width of the images in pixels. It tells us how many columns of pixels there are.
  • $C$: This represents the number of color channels in the images. For example, a typical color image has three channels: Red, Green, and Blue (RGB).

Understanding the Summation: The formula uses three summations (the $\sum$ symbols) to add up values:

  • The first summation (over $i$) goes through each row of pixels.
  • The second summation (over $j$) goes through each column of pixels.
  • The third summation (over $k$) goes through each color channel.

This means we are looking at every single pixel in every color channel of both images.

Calculating the Difference: Inside the summation, we see $(u_{ijk} - v_{ijk})^2$:

  • This part calculates the difference between the color value of the pixel in the colorized image $u$ and the original image $v$ for each pixel at position $(i, j)$ and color channel $k$.
  • The difference is then squared. Squaring the difference is important because it makes sure that we do not have negative values, and it emphasizes larger differences more than smaller ones.

MAE

$$\text{MAE}(u, v) = \int_\Omega |u(x)-v(x)|_{\ell^1} \, dx$$

Here, $u$ and $v$ represent two different images. $u$ is the image that the model predicts (the colorized image), and $v$ is the actual image we want (the ground truth image).

  • The symbol $\int_\Omega$ means we are looking at all the pixels in the image. $\Omega$ represents the entire area of the image we are analyzing.

  • The integral helps us sum up the differences across all pixels in the image.

  • The term $|u(x)-v(x)|$ is a way to calculate the difference between the predicted color and the actual color for each pixel.

  • The $l^1$ norm specifically means we are taking the absolute value of the difference. This means we are only interested in how far apart the colors are, without worrying about whether one is greater or smaller than the other.

  • Summing Over Color Channels:

  • Here, $C$ represents the number of color channels in the image. For example, in a typical RGB image, there are three channels: Red, Green, and Blue.

  • The expression $|u_k(x) - v_k(x)|$ calculates the absolute difference for each color channel $k$ at a specific pixel $x$.

The entire formula calculates the total error across all pixels and all color channels. It tells us how well the model has done in predicting the colors.

The formula for MAE in the discrete case is:

$$\text{MAE}(u, v) = \frac{1}{MNC} \sum_{i=1}^M \sum_{j=1}^N \sum_{k=1}^C |u_{ijk} - v_{ijk}|$$

  • Here, $u$ and $v$ represent two images. $u$ is the colored image produced by the computer, and $v$ is the original colored image we want to compare it to.
  • $M$ and $N$ are the dimensions of the images. Specifically, $M$ is the number of rows (height) in the image, and $N$ is the number of columns (width).
  • $C$ represents the number of color channels in the image. For example, a typical colored image has three channels: red, green, and blue (RGB).
  • The formula uses summations to add up values in a systematic way. The first summation ($\sum_{i=1}^M$) goes through each row of the image, the second ($\sum_{j=1}^N$) goes through each column, and the third ($\sum_{k=1}^C$) goes through each color channel.
  • For each pixel located at position $(i, j)$, the formula calculates the difference between the predicted color value $u$ and the actual color value $v$ for each color channel $k$.

Discrete Settings vs Continuous Settings

Throughout this section, I've shown both discrete and continuous versions of the same loss functions. So why do we have different versions of the same thing? (Hopefully you remember some calculus.)

Discrete settings are used because images are represented as discrete pixel values. Loss functions like L1 and L2 operate on these pixel values, making them suitable for direct computation of differences between predicted and actual values.

Continuous Settings may involve treating pixel values as continuous variables, which can be beneficial for certain types of models that predict color distributions rather than specific values.
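In practice we always compute the discrete versions. Here's a small NumPy sketch matching the summation formulas above (the image shapes are illustrative):

import numpy as np

u = np.random.rand(64, 64, 3)   # predicted colour image (M x N x C)
v = np.random.rand(64, 64, 3)   # ground-truth colour image

mse = np.mean((u - v) ** 2)     # (1 / MNC) * sum of squared differences
mae = np.mean(np.abs(u - v))    # (1 / MNC) * sum of absolute differences

print(mse, mae)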

Code version of the Loss functions

# [from perplexity] (https://www.perplexity.ai/)
import torch
import torch.nn as nn
import torchvision.models as models

class SimplePerceptualLoss(nn.Module):
    def __init__(self):
        super(SimplePerceptualLoss, self).__init__()
        
        # Load pre-trained VGG16 and use its first few layers
        vgg = models.vgg16(pretrained=True)
        self.feature_extractor = nn.Sequential(*list(vgg.features)[:5]).eval()
        
        # Freeze the parameters
        for param in self.feature_extractor.parameters():
            param.requires_grad = False
    
    def forward(self, generated, target):
        # Extract features from generated and target images
        gen_features = self.feature_extractor(generated)
        target_features = self.feature_extractor(target)
        
        # Compute mean squared error between features
        loss = nn.MSELoss()(gen_features, target_features)
        
        return loss

# Usage example
perceptual_loss = SimplePerceptualLoss()

# Example tensors representing generated and target images
generated = torch.randn(1, 3, 256, 256)
target = torch.randn(1, 3, 256, 256)

loss = perceptual_loss(generated, target)
print(f"Perceptual Loss: ")

The main line is loss = nn.MSELoss()(gen_features, target_features): it compares the VGG features of the colourised output with the VGG features of the target image.
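To tie this back to the total loss formula earlier, here's a hedged sketch of combining a pixel-wise L1 loss with the perceptual loss above (the lambda weights are illustrative, not taken from any paper; perceptual_loss is the SimplePerceptualLoss instance defined in the snippet above):

import torch
import torch.nn as nn

pixel_loss_fn = nn.L1Loss()
lambda_p, lambda_pix = 0.1, 1.0     # illustrative weights

generated = torch.randn(1, 3, 256, 256)
target = torch.randn(1, 3, 256, 256)

# L_total = lambda_p * L_perceptual + lambda_pix * L_pixel
total_loss = (lambda_p * perceptual_loss(generated, target)
              + lambda_pix * pixel_loss_fn(generated, target))
print(f"Total loss: {total_loss.item()}")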

It's funny how you can create a loss function for everything; that's the real lesson of deep learning. Go ask Sam Altman.

The main thing to keep in mind for image colourisation is that the loss measures the difference between the colourised output and the true colour image, and that difference is what's used to adjust the model.

Conclusion

As we've journeyed through the interesting world of image colorization, we've seen how this field has rapidly evolved from simple pixel-based techniques to advanced deep learning tools.

  • We started with the basics of color theory and how computers interpret color spaces like RGB and LAB.
  • We explored the fundamental role of Convolutional Neural Networks (CNNs) in modern colorization techniques.
  • We traced the evolution of colorization methods, from plain CNN-based approaches to more advanced user-guided and exemplar-based models.
  • We delved into the intricacies of loss functions, understanding how pixel-wise, perceptual, and GAN losses contribute to more accurate and visually pleasing results.
  • Finally, we examined state-of-the-art exemplar-based models that leverage semantic information and reference images to produce more accurate colorization.

Within a decade, the field of image colourisation via deep learning has progressed a lot. It makes you wonder what the next decade has in store for us, with LLMs and better image generation models. Let's see. Also, I've opted to move the ethics and humanities discussion into a separate blog post, covering questions like: what happens if a colourisation is not historically accurate, and what's next? That's something my paper does a deep dive into. Read my paper here
