Category: Machine Learning

  • Double Machine Learning for Dummies as explained by ChatGPT

    I asked ChatGPT to:

    explain double machine learning for dummies

    Here is the answer:

    Double machine learning (DML) is a statistical method used to estimate treatment effects in the presence of high-dimensional controls. To put it simply, it’s a way to answer questions about cause and effect using data and machine learning, even when there are lots of potentially relevant variables.

    Here is a more simplified explanation using a practical example:

    Let’s say you are a coach of a football team, and you want to determine the impact of a new training method on the performance of your team. However, there are many factors that could affect performance – things like diet, amount of sleep, weather conditions, player mood, previous experience, and more.

    With traditional methods, you would have to painstakingly adjust for all these factors, or "controls", to determine the real impact of the new training method. This can be time-consuming, difficult, and may not be accurate, especially if there are a lot of controls.

    Here’s where double machine learning comes in. DML uses machine learning twice (hence the "double" part) to solve this problem:

    1. First, it uses machine learning to predict the outcome (football performance) based on all the control variables (diet, sleep, weather, etc.), and then subtracts this prediction from the actual outcome. This gives you a "residual" outcome that can’t be explained by the controls.

    2. Then, it does the same thing for the treatment (the new training method): it predicts the treatment based on the controls, and subtracts this prediction from the actual treatment. This gives you a "residual" treatment that can’t be explained by the controls.

    After that, you simply correlate the "residual" outcome with the "residual" treatment to estimate the effect of the treatment. This gives you an estimate of the impact of the new training method, accounting for all the controls without needing to adjust for them one by one.

    In other words, double machine learning helps you figure out the cause-and-effect relationship between variables when there are many other variables that could potentially influence the result. It does this by using machine learning to "filter out" the effect of the other variables, leaving just the effect you’re interested in.
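
To make those two steps concrete, here is a minimal sketch in Python on synthetic data. The data, the scikit-learn estimators, and the true effect of 2.0 are my own illustration rather than part of the ChatGPT answer, and the out-of-fold predictions stand in for the cross-fitting a full DML estimator would use:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# synthetic data: five controls, a treatment that depends on them, true effect = 2.0
n = 2000
X = rng.normal(size=(n, 5))                           # controls (diet, sleep, weather, ...)
t = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)      # treatment depends on the controls
y = 2.0 * t + X[:, 0] - X[:, 1] + rng.normal(size=n)  # outcome depends on treatment and controls

# step 1: residual outcome = outcome minus its (out-of-fold) prediction from the controls
y_res = y - cross_val_predict(RandomForestRegressor(), X, y, cv=5)

# step 2: residual treatment = treatment minus its (out-of-fold) prediction from the controls
t_res = t - cross_val_predict(RandomForestRegressor(), X, t, cv=5)

# relate the residual outcome to the residual treatment to estimate the effect
effect = LinearRegression().fit(t_res.reshape(-1, 1), y_res).coef_[0]
print(effect)  # should land close to the true effect of 2.0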

  • How to use bnlearn to learn causal structures

This article on causal machine learning covers a practical example of how to learn structural causal models (SCMs) directly from data. We will use bnlearn, an open-source Python library for learning the graphical structure of Bayesian networks. Check out my Github repo for additional code examples. For other frameworks, check out my page on causal stuff.

Learning a Bayesian network can be split into structure learning and parameter learning, both of which are implemented in bnlearn.

    • Structure learning: Given a set of data samples, estimate a DAG that captures the dependencies between the variables.
    • Parameter learning: Given a set of data samples and a DAG that captures the dependencies between the variables, estimate the (conditional) probability distributions of the individual variables.

    Libraries

We will learn through a practical example and code. The following libraries are used to implement the example. NumPy and pandas are used to recreate a classic synthetic dataset often used in causal machine learning, the "sprinkler" dataset. bnlearn is then used to learn the causal structure among the variables in the dataset.

    You will need the following imports in Python:

    import numpy as np
    import pandas as pd
    import bnlearn as bn
    

    The sprinkler dataset


    Photo by Rémi Müller on Unsplash

    Imagine a small world with a lawn that is sometimes wet. I bet you can smell that lawn just thinking about it. Only two things cause this lawn to be wet. If it rains or if the sprinkler is on. Otherwise the lawn is dry (i.e. ¬wet). While clouds are needed for rain, not all clouds carry rain. It may therefore be cloudy without rain. On sunny days the lawn might need some water and then the sprinkler is turned on. On other sunny days the lawn does not need water and the sprinkler is off. The sprinkler is never on when it is cloudy, because somehow clouds help the lawn stay moist if not wet.

    The lawn world implies four stochastic variables:

    • Cloudy (independent)
    • Rain (depends on Cloudy)
    • Sprinkler (depends on not-Cloudy)
    • Grass wet (depends on Rain and Sprinkler)

The following code samples the four variables and creates the sprinkler dataset:

n_samples = 10000
# it is cloudy 75% of the time
cloudy = np.random.choice(2, p=[0.25, 0.75], size=n_samples)
# when it is cloudy, it rains 30% of the time; it never rains without clouds
rain = cloudy * np.random.choice(2, p=[0.7, 0.3], size=n_samples)
# the sprinkler is on half the time, but only when it is not cloudy (and not raining)
sprinkler = (1-rain) * (1-cloudy) * np.random.choice(2, p=[0.5, 0.5], size=n_samples)
# the grass is wet exactly when it rains or the sprinkler is on
grass_wet = np.maximum(rain, sprinkler)
data = np.column_stack((cloudy, rain, sprinkler, grass_wet))
df = pd.DataFrame(data, columns=["cloudy", "rain", "sprinkler", "grass_wet"])
    

    The resulting dataset may look like this:

Cloudy  Rain  Sprinkler  Grass wet
  1      0        0          0
  1      1        0          1
  0      0        0          0
  0      0        1          1

    Take a moment to verify that the observations are consistent with the story told above.
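
One way to verify this across the whole sample is to cross-tabulate the variables with pandas, for example:

# the sprinkler should never be on (1) when it is cloudy (1)
print(pd.crosstab(df["cloudy"], df["sprinkler"]))

# the grass should be wet exactly when it rains or the sprinkler is on
print(pd.crosstab(df["grass_wet"], [df["rain"], df["sprinkler"]]))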

    Learning the causal structure

The variables were created to have a specific causal structure. Examples of structures among the variables are shown below, where an arrow (X → Y) should be read as "X causes Y":

    • Cloudy → Rain → Grass wet (chain)
    • Rain → Grass wet ← Sprinkler (collider)
    • Sprinkler ← Cloudy → Rain (fork)

    Let’s see if we can learn this causal structure using bnlearn as shown in the following code snippet:

# learn the DAG structure from the observed data
model = bn.structure_learning.fit(df)
# test the statistical significance of the learned edges
model = bn.independence_test(model, df)
    

Notice that we may learn the wrong causal relationships. For example, it may seem that turning the sprinkler off causes clouds to appear. This is because, in our little world, the sprinkler is never on while it is cloudy, so the two variables are strongly associated. However, observations where the sprinkler is off and no clouds appear are evidence against that interpretation, and such observations may or may not be present in the sample we generated above.
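
A quick way to see how much of that evidence the sample contains is to count the relevant observations:

# rows with the sprinkler off and no clouds speak against
# the spurious "sprinkler off causes clouds" interpretation
n_evidence = ((df["sprinkler"] == 0) & (df["cloudy"] == 0)).sum()
print(n_evidence)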

    Visualising the causal DAG

Because bnlearn uses networkx under the hood, we can visualise the learned graph with a single line of code:

    G = bn.plot(model)
    

    If all went well with the data generation and learning, the graph should look similar to this.

    If it does not, simply try to generate the data again, optionally increasing the number of samples.
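
bnlearn also implements parameter learning, the second step mentioned at the top of this article. A minimal sketch, assuming bnlearn's parameter_learning.fit and inference.fit APIs, could look like this:

# estimate the conditional probability distributions (CPDs) for the learned DAG
model = bn.parameter_learning.fit(model, df)

# query the fitted model, e.g. how likely is wet grass given that it rains?
query = bn.inference.fit(model, variables=["grass_wet"], evidence={"rain": 1})
print(query)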

    Conclusion

bnlearn can be used to learn causal relationships among variables directly from data. It does not always work and is somewhat sensitive to the sample drawn: causal relationships may be misinterpreted if the sample does not contain enough evidence to rule out the wrong ones.

  • Cosine similarity in Python

Cosine similarity is the normalised dot product between two vectors. It is called "cosine" similarity because the dot product equals the product of the Euclidean magnitudes of the two vectors and the cosine of the angle between them, so dividing by the magnitudes leaves just that cosine. If you want, read more about cosine similarity and dot products on Wikipedia.

    Here is how to compute cosine similarity in Python, either manually (well, using numpy) or using a specialised library:

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity
    
    # vectors
    a = np.array([1,2,3])
    b = np.array([1,1,4])
    
    # manually compute cosine similarity
    dot = np.dot(a, b)
    norma = np.linalg.norm(a)
    normb = np.linalg.norm(b)
    cos = dot / (norma * normb)
    
    # use library, operates on sets of vectors
    aa = a.reshape(1,3)
    ba = b.reshape(1,3)
    cos_lib = cosine_similarity(aa, ba)
    
    print(
        dot,
        norma,
        normb,
        cos,
        cos_lib[0][0]
    )
    

The two values may differ slightly in the last decimals. On my computer I get:

    • 0.9449111825230682 (manual)
    • 0.9449111825230683 (library)
  • How to do backpropagation in Numpy

    I have adapted an example neural net written in Python to illustrate how the back-propagation algorithm works on a small toy example.

My modifications include printing, a learning rate, and the use of the leaky ReLU activation function instead of sigmoid.

    import numpy as np
    
    # seed random numbers to make calculation
    # deterministic (just a good practice)
    np.random.seed(1)
    # make printed output easier to read
    # fewer decimals and no scientific notation
    np.set_printoptions(precision=3, suppress=True)
    
    # learning rate
    lr = 1e-2
    
    # sigmoid function
    def sigmoid(x,deriv=False):
        if deriv:
            result = x*(1-x)
        else:
            result = 1/(1+np.exp(-x))
        return result
    
    # leaky ReLU function
    def prelu(x, deriv=False):
        c = np.zeros_like(x)
        slope = 1e-1
        if deriv:
            c[x<=0] = slope
            c[x>0] = 1
        else:
            c[x>0] = x[x>0]
            c[x<=0] = slope*x[x<=0]
        return c
    
    # non-linearity (activation function)
    nonlin = prelu # instead of sigmoid
    
    # initialize weights randomly with mean 0
    W = 2*np.random.random((3,1)) - 1
    
    # input dataset
    X = np.array([  [0,0,1],
                    [0,1,1],
                    [1,0,1],
                    [1,1,1] ])
    # output dataset            
    y = np.array([[0,0,1,1]]).T
    
    print('X:\n', X)
    print('Y:\n', y)
    print()
    
    for iter in range(1000):
    
        # forward propagation
        l0 = X
        l1 = nonlin(np.dot(l0,W))
    
        # how much did we miss?
        l1_error = y - l1
    
        # compute gradient (slope of activation function at the values in l1)
        l1_gradient = nonlin(l1, True)    
        # set delta to product of error, gradient and learning rate
        l1_delta = l1_error * l1_gradient * lr
    
        # update weights
        W += np.dot(l0.T,l1_delta)
        
        if iter % 100 == 0:
            print('pred:', l1.squeeze(), 'mse:', (l1_error**2).mean())
    
    print ("Output After Training:")
    print ('l1:', np.around(l1))
    
  • Neural networks on GPUs: cost of DIY vs. Amazon

I like to dabble with machine learning and specifically neural networks. However, I don't like to wait for exorbitant amounts of time. Since my laptop does not have a graphics card that is supported by the neural network frameworks I use, I have to wait for a long time while my models get fitted. This is a problem.

    The solution to the problem is to get access to a computer with a supported Nvidia GPU. Two approaches are to either get my own rig or rent one from Amazon. Which is cheaper?

    Cost analysis

    I will assume that I will train models on my machine (whether local or at Amazon) for two hours every day.

The Amazon p2 range of EC2 machines comes with Nvidia K80 cards, which cost about 50.000 DKK. Already this analysis is going to be difficult; I will not buy a computer that costs 50.000 DKK just to train NN models. So, in this analysis I will be comparing apples to oranges, but that is how it is.

    Cost of Amazon

    The p2.xlarge EC2 instance has a single K80 GPU, which is at least as good as any rig I would consider buying.

The on-demand price is $0.9/hour; the spot price is about five times lower. Usage for two hours every day for a whole year costs about 4.500 DKK on-demand and about 900 DKK on spot instances. However, p2 instances are sometimes unavailable in the European spot markets.
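
As a sanity check, here is the arithmetic behind those yearly figures; the exchange rate of roughly 6.9 DKK/USD is my own assumption:

hours_per_day = 2
days_per_year = 365
on_demand_usd_per_hour = 0.9
spot_usd_per_hour = on_demand_usd_per_hour / 5  # spot is roughly five times cheaper
dkk_per_usd = 6.9                               # assumed exchange rate

on_demand_dkk = hours_per_day * days_per_year * on_demand_usd_per_hour * dkk_per_usd
spot_dkk = hours_per_day * days_per_year * spot_usd_per_hour * dkk_per_usd
print(round(on_demand_dkk), round(spot_dkk))    # roughly 4.500 and 900 DKK per year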

    Cost of DIY

    What is the best GPU to get for a DIY machine learning rig? In 2016, Quora answers suggested that the Nvidia cards Titan X and GTX980TI would be best. Let's go with that.

Either card alone costs quite a bit more than 4.500 DKK. A finished rig would probably cost around 15.000 DKK (Titan X) or 10.000 DKK (GTX 980 Ti).

Electricity also has to be factored in, and the cards are somewhat slower than the K80.

    Best choice for increased usage

With increased usage the DIY approach becomes cheaper than Amazon, albeit still the slower option. At around 5 to 7 hours of training per day, the DIY rigs break even after about a year.
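
That break-even point can be checked with the same arithmetic as above (again assuming roughly 6.9 DKK/USD):

def amazon_dkk_per_year(hours_per_day, usd_per_hour=0.9, dkk_per_usd=6.9):
    return hours_per_day * 365 * usd_per_hour * dkk_per_usd

print(round(amazon_dkk_per_year(5)))  # ~11.300 DKK/year, above the 10.000 DKK GTX rig
print(round(amazon_dkk_per_year(7)))  # ~15.900 DKK/year, above the 15.000 DKK Titan X rig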

    Further reading

    Build a deep learning rig for $800.

  • (Tentative)

The symbiosis between humans and AI could transform humans into rational organisms (cf. Daniel Kahneman, who has shown that humans on their own are not rational organisms). How so? Our behaviour is being tracked in ever more minute detail across all important areas of life. Artificial intelligence is getting better and better at estimating whether we are happy, healthy, and wealthy from an analysis of all the traces we leave behind everywhere. We are now in a situation where we can, or soon will be able to, ask questions such as: how happy, healthy, and wealthy was person X at time t? Which actions h1, h2, h3, ... had person X taken (e.g. on Spotify, travel, job changes, doctor's visits) leading up to this moment? How happy will X be at time t+1, t+10, t+1000 if everything continues as it is now? Which actions should X take to maximise their happiness at time t+1000?
In other words, there are complex areas of life where complex AI has the potential to maximise our long-term utility (e.g. our "joy of life" or net worth in 10 years). Imagine that a personal AI could
- Find your next home
- Find a school or leisure activity for your child
- Find investment opportunities
- Find love
- Find friends
- Find your next meal
- etc.

  • What kind of Machine Learning person are you?

You may ask yourself: if I am a machine learning person, then what kind am I? See for yourself in Jason Eisner's Three Cultures of Machine Learning.

  • PyBrain quickstart and beyond

After pip install pybrain, the PyBrain quick start essentially goes as follows:

    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.structure import TanhLayer
    from pybrain.datasets import SupervisedDataSet
    from pybrain.supervised.trainers import BackpropTrainer
    
    # Create a neural network with two inputs, three hidden, and one output
    net = buildNetwork(2, 3, 1, bias=True, hiddenclass=TanhLayer)
    
    # Create a dataset that matches NN input/output sizes:
    xor = SupervisedDataSet(2, 1)
    
    # Add input and target values to dataset
    # Values correspond to XOR truth table
    xor.addSample((0, 0), (0,))
    xor.addSample((0, 1), (1,))
    xor.addSample((1, 0), (1,))
    xor.addSample((1, 1), (0,))
    
    trainer = BackpropTrainer(net, xor)
    #trainer.trainUntilConvergence()
    for epoch in range(1000):
        trainer.train()
    

However, it does not reliably learn XOR, as can be seen by running the following test:

    testdata = xor
    trainer.testOnData(testdata, verbose = True)  # Works if you are lucky!
    

Kristina Striegnitz has written and published an XOR example that works more reliably. The code is effectively reproduced below, in case the original should disappear:

    # ... continued from above
    
    # Create a recurrent neural network with four hidden nodes (default is SigmoidLayer) 
    net = buildNetwork(2, 4, 1, recurrent = True)
    
    # Train the network using arguments for learningrate and momentum
    trainer = BackpropTrainer(net, xor, learningrate = 0.01, momentum = 0.99, verbose = True)
    for epoch in range(1000):
        trainer.train()
    
    # This should work every time...
    trainer.testOnData(testdata, verbose = True)