Category: AI

  • How to call OpenAI’s ChatGPT API

    Here is how you can call OpenAI’s ChatGPT API, given that you have an API key (you can create one in your OpenAI account settings).

    import openai  # pip install openai
    
    openai.api_key = ''  # your API key goes here
    
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"},
            {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
            {"role": "user", "content": "Where was it played?"}
        ]
    )
    print(response)
    

    This will print out something like the following:

    <OpenAIObject chat.completion id=chatcmpl-xxx at xxx> JSON: {
      "id": "chatcmpl-xxx",
      "object": "chat.completion",
      "created": 1689160553,
      "model": "gpt-3.5-turbo-0613",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "The 2020 World Series was played at Globe Life Field in Arlington, Texas."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 53,
        "completion_tokens": 17,
        "total_tokens": 70
      }
    }
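
    If you only want the assistant’s reply rather than the full response object, you can index into the first choice. This is a small sketch, assuming the response variable from the call above:

    answer = response["choices"][0]["message"]["content"]
    print(answer)
    # The 2020 World Series was played at Globe Life Field in Arlington, Texas.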
  • How to sort numbers with an evolutionary algorithm (CMA-ES)

    Yes, this is clearly nonsense. Sorting is not a hard problem, and standard algorithms such as quicksort and mergesort have O(n^2) (worst case) and O(n log(n)) complexity, respectively. But let me scratch this itch of sorting numbers using an evolutionary algorithm, specifically the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Technically, we will use what I think is the original library by the inventor of the method, Nikolaus Hansen.

    In python, we will make use of these two libraries:

    import cma  # pip install cma
    import numpy as np  # pip install numpy
    

    Solving without constraints

    CMA-ES, like other metaheuristics, uses the concept of a fitness function to search for good or optimal solutions to a problem. The algorithm does not need to know the structure of your problem, as all knowledge is encapsulated in the fitness function. The algorithm generates candidate solutions and evaluates them with the fitness function. The fitness of the solutions is used to generate the next batch of solutions, and so on until convergence (fitness = 0).

    While you can define the feasibility of solutions by providing constraints, this is not a requirement. Therefore, we will first try to solve the toy problem without constraints.

    Fitness function and initial solution

    For CMA-ES to work, you must provide a fitness function that is used to evaluate solutions. For sorting, we define our fitness function as the Euclidean distance between a solution x and the optimal solution xopt, which is a sorted list.

    Clearly, it is nonsense that we must first have the optimal solution in order to define the fitness function. Why search if we already have the answer? But again, this is just a simple example chosen so we can focus on the method, not the application.

    In addition to a fitness function, you must also provide a seed solution, which we will call x0. The algorithm will start from x0 and search for better solutions using that as a starting point. Conceptually, a bunch of "neighbours" are evaluated in each step and the direction of search is determined by computing their fitness. Most metaheuristics will intensify search in promising neighbourhoods and ignore the less promising ones. You can read more about CMA-ES on Wikipedia.

    Below we will set up the problem by defining the fitness function ff and an initial solution x0.

    # Optimal solution (used in fitness function)
    n = 40
    xopt = np.arange(n).astype(float)
    # [0, 1, ..., 38, 39]
    
    # fitness function, the euclidean distance x -> xopt
    ff = lambda x: np.linalg.norm(xopt-x)
    
    # Initial solution, a random permutation of the optimal solution
    x0 = np.random.permutation(xopt)
    # [26, 16, ..., 38, 12]
    
    # initial standard deviation
    sigma0 = 0.5
    

    Now that we have defined the fitness function, we can forget that we ever knew the optimal solution. It is, however, embedded in the fitness function. For your own problem, you would have some meaningful way of defining the fitness of a solution. Keep in mind that a value of 0 means perfect fitness, while larger values mean worse fitness.
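
    As a quick sanity check (not part of the original setup), the fitness of the optimal solution should be exactly 0, while the shuffled starting point should have a larger, i.e. worse, fitness:

    print(ff(xopt))  # 0.0, perfect fitness
    print(ff(x0))    # some positive number, i.e. worse fitness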

    Optimise: using wrapper API

    First, we can optimise using the wrapper functions provided by cma. For some reason, using these wrappers results in slower convergence and I don’t know why. There are several ways to use the wrapper API. Below you see two different ways, which I believe are equivalent:

    # method 1
    es = cma.CMAEvolutionStrategy(x0, sigma0)
    es.optimize(ff)
    xbest = es.result.xbest
    
    # method 2
    xbest, es = cma.fmin2(ff, x0, sigma0)
    
    print(xbest.round(0))
    

    Optimise: using stop-ask-tell

    Next, we will solve the problem without the wrappers. The cma library uses a stop-ask-tell protocol.

    • Stop: returns true if the algorithm has converged
    • Ask: the algorithm returns the current pool of solutions
    • Tell: the user provides a fitness value for each solution

    While the following code is slightly longer, it converges faster for some reason. Again, I don’t know why. It is however an equivalent way to solve the problem and also finds the optimal solution.

    es = cma.CMAEvolutionStrategy(x0, sigma0)
    
    fvals = []  # used for plotting later
    while not es.stop():
        solutions = es.ask()
        fitness = np.array([ff(x) for x in solutions])
        fvals.append(fitness.min())
        es.tell(solutions, fitness)
    
    xbest = es.result_pretty().xbest
    print(xbest.round(0))
    

    The program finds the solution in less than 1 second on my laptop (MacBook Pro M2). Is that impressive? Well, it is to some degree. The solution space is essentially any combination of 40 real numbers, since we did not specify any constraints on the values. You can specify constraints in the cma library, which is what we will do next.

    Adding constraints

    You may provide constraints to CMA-ES as a vector-valued function, g_i, which defines a solution x as feasible if and only if g_i(x) ≤ 0 for all i. The following code is based on the example notebook from the pycma website. The structure of the code is almost identical to what we had before. The only difference is that we now combine the old fitness function and the new constraint function into a new "special" fitness function that is used during optimisation.

    For our problem of sorting numbers, we want to enforce the constraint that any number must be less than or equal to any numbers to the right of it. If you think about it, that is the same as saying the numbers must be sorted. This means that our initial solution, which is just a permutation of sorted numbers, will be infeasible with near 100% probability.

    Modified example that adds a constraint:

    # Create constraint function
    def constraints(x):
        # x_i must be less than or equal to all x's to the right of it,
        # i.e. x_i minus the minimum of x[i:] must be <= 0
        return [xi - x[i:].min() for i, xi in enumerate(x)]
    
    # Combine old fitness function and constraints
    # This is used in place of the old fitness function
    ffc = cma.ConstrainedFitnessAL(ff, constraints)
    
    es = cma.CMAEvolutionStrategy(x0, 0.5)
    
    while not es.stop():
        solutions = es.ask()
        fitness = np.array([ffc(x) for x in solutions])
        es.tell(solutions, fitness)
        ffc.update(es)  # adapt the augmented Lagrangian coefficients, as in the pycma example
    
    xbest = es.result.xbest
    

    For this particular problem, adding the constraint seemingly does not help the problem converge faster. Maybe it already converges as fast as it can, and the constraint just adds overhead and an initial scramble for a feasible solution?

    Early stopping

    Plotting the fitness value as it evolves over time, it is clear that we could have stopped earlier with a pretty good solution. Maybe a pretty good solution does not make sense for sorting, but it would make sense in many other scenarios, such as financial optimisation, where there is a significant amount of uncertainty.

    Plot fitness over time:

    import matplotlib.pyplot as plt
    
    plt.plot(fvals)
    plt.xlabel('Time')
    plt.ylabel('Fitness')
    plt.title('Fitness over time')
    plt.show()
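
    If you only need a "pretty good" solution, you can break out of the ask-tell loop as soon as the best fitness drops below a threshold. Below is a minimal sketch of this idea, reusing ff, x0 and sigma0 from above; the threshold of 1.0 is an arbitrary value chosen for illustration.

    es = cma.CMAEvolutionStrategy(x0, sigma0)
    
    GOOD_ENOUGH = 1.0  # arbitrary threshold, tune for your problem
    
    while not es.stop():
        solutions = es.ask()
        fitness = [ff(x) for x in solutions]
        es.tell(solutions, fitness)
        if min(fitness) < GOOD_ENOUGH:
            break  # stop early with a good-enough solution
    
    print(es.result.xbest.round(0))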
    

  • How to compute and plot Bollinger Bands® in Python

    The aim is to produce a plot like this. The orange line is your data, the green line is the upper Bollinger band, the blue line is the lower Bollinger band. The red dots indicate where your data is either above or below the bands. The bands are simply a rolling mean of the data plus/minus a number (alpha) of rolling standard deviations.

    Copy-paste this code:

    import pandas as pd
    import numpy as np
    from matplotlib import pyplot as plt
    
    N = 100
    XMAX = 5
    WINMA = 10
    ALPHA = 2
    
    def get_bollinger(data, winma=10, alpha=2):
        ser = pd.Series(data)
        ma = ser.rolling(winma).mean()
        std = ser.rolling(winma).std()
        lower = (ma - alpha*std).bfill().values
        upper = (ma + alpha*std).bfill().values
        return lower, upper
    
    def get_alerts(data, lower, upper):
        low = np.argwhere(data < lower)
        high = np.argwhere(data > upper)
        return low, high
    
    if __name__=='__main__':
    
        X = np.linspace(0.0, XMAX, num=N)
        data = np.sin(X) + np.random.random(N)
        lower, upper = get_bollinger(data, winma=WINMA, alpha=ALPHA)
        low, high = get_alerts(data, lower, upper)
        for i in low:
            plt.plot(X[i], data[i], 'ro')
        for i in high:
            plt.plot(X[i], data[i], 'ro')
        plt.plot(X, lower)
        plt.plot(X, data)
        plt.plot(X, upper)
        plt.show()
  • How to scrape images from the web

    I’m interested in object detection and other computer vision tasks. For example, I’m working on a teddy-bear detector with my son.

    So, how do you quickly download images for a certain category? You can use this approach that I learned from a course on Udemy.

    # pip install icrawler
    from icrawler.builtin import GoogleImageCrawler
    
    keywords = ['cat', 'dog']
    for keyword in keywords:
        google_crawler = GoogleImageCrawler(
            parser_threads=2,
            downloader_threads=4,
            storage={'root_dir': 'images/{}'.format(keyword)}
        )
        google_crawler.crawl(
            keyword=keyword, max_num=10, min_size=(200, 200))

    In the above example, the crawler will find images in two categories, cats and dogs, as if you had searched for ‘cat’ and ‘dog’ on Google Images and downloaded what you found.

    Let’s walk through the parameters used in the code. First, there is the constructor, which is called with three arguments in the example. The most important parameter is storage, which specifies where the images will be stored. Second, we have the call to the crawl function. Here, the max_num parameter is used to specify that at most 10 images per category should be downloaded. The min_size argument specifies that the images must be at least 200 x 200 pixels.

    That’s it. Happy downloading.

  • How to sample from softmax with temperature

    Here is how to sample from a softmax probability vector at different temperatures.

    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    import seaborn as sns
    
    mpl.rcParams['figure.dpi']= 144
    
    trials = 1000
    softmax = [0.1, 0.3, 0.6]
    
    def sample(softmax, temperature):
        EPSILON = 10e-16  # to avoid taking the log of zero
        preds = np.asarray(softmax).astype('float64') + EPSILON
        # temperature-scale the log-probabilities
        preds = np.log(preds) / temperature
        exp_preds = np.exp(preds)
        # renormalise to a probability distribution
        preds = exp_preds / np.sum(exp_preds)
        # draw one sample from the adjusted distribution
        probas = np.random.multinomial(1, preds, 1)
        return probas[0]
    
    temperatures = [(t or 1) / 100 for t in range(0, 101, 10)]
    probas = [
        np.asarray([sample(softmax, t) for _ in range(trials)]).sum(axis=0) / trials
        for t in temperatures
    ]
    
    sns.set_style("darkgrid")
    plt.plot(temperatures, probas)
    plt.show()
    

    Notice how the probabilities change at different temperatures. The softmax probabilities are [0.1, 0.3, 0.6]. At the lowest temperature of 0.01, the dominant index (value 0.6) has near 100% probability of being sampled. At higher temperatures, the selection probabilities move towards the softmax values, e.g. 60% probability for the third index.
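
    You can also see the same effect without sampling by computing the temperature-adjusted distribution directly. This is a small sketch (not part of the original code) that renormalises exp(log(p) / T); the function name adjust is chosen for illustration:

    import numpy as np
    
    def adjust(softmax, temperature):
        # scale the log-probabilities by 1/temperature and renormalise
        logits = np.log(np.asarray(softmax, dtype='float64')) / temperature
        p = np.exp(logits)
        return p / p.sum()
    
    print(adjust([0.1, 0.3, 0.6], 0.01))  # close to [0, 0, 1]
    print(adjust([0.1, 0.3, 0.6], 1.0))   # exactly [0.1, 0.3, 0.6]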

  • How to do backpropagation in Numpy

    I have adapted an example neural net written in Python to illustrate how the back-propagation algorithm works on a small toy example.

    My modifications include printing, a learning rate and using the leaky ReLU activation function instead of sigmoid.

    import numpy as np
    
    # seed random numbers to make calculation
    # deterministic (just a good practice)
    np.random.seed(1)
    # make printed output easier to read
    # fewer decimals and no scientific notation
    np.set_printoptions(precision=3, suppress=True)
    
    # learning rate
    lr = 1e-2
    
    # sigmoid function
    def sigmoid(x,deriv=False):
        if deriv:
            result = x*(1-x)
        else:
            result = 1/(1+np.exp(-x))
        return result
    
    # leaky ReLU function
    def prelu(x, deriv=False):
        c = np.zeros_like(x)
        slope = 1e-1
        if deriv:
            c[x<=0] = slope
            c[x>0] = 1
        else:
            c[x>0] = x[x>0]
            c[x<=0] = slope*x[x<=0]
        return c
    
    # non-linearity (activation function)
    nonlin = prelu # instead of sigmoid
    
    # initialize weights randomly with mean 0
    W = 2*np.random.random((3,1)) - 1
    
    # input dataset
    X = np.array([  [0,0,1],
                    [0,1,1],
                    [1,0,1],
                    [1,1,1] ])
    # output dataset            
    y = np.array([[0,0,1,1]]).T
    
    print('X:\n', X)
    print('Y:\n', y)
    print()
    
    for iter in range(1000):
    
        # forward propagation
        l0 = X
        l1 = nonlin(np.dot(l0,W))
    
        # how much did we miss?
        l1_error = y - l1
    
        # compute gradient (slope of activation function at the values in l1)
        l1_gradient = nonlin(l1, True)    
        # set delta to product of error, gradient and learning rate
        l1_delta = l1_error * l1_gradient * lr
    
        # update weights
        W += np.dot(l0.T,l1_delta)
        
        if iter % 100 == 0:
            print('pred:', l1.squeeze(), 'mse:', (l1_error**2).mean())
    
    print ("Output After Training:")
    print ('l1:', np.around(l1))
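
    As a quick sanity check (assuming the script above has just run), you can use the trained weights W to predict the output for one of the training rows; in this toy dataset the output follows the first input component, so the prediction should be close to 1:

    x_new = np.array([[1, 0, 1]])
    print('prediction:', nonlin(np.dot(x_new, W)).squeeze())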
    
  • No one in ad tech needs to know your name

    I work in the ad tech industry, which means that I track people online for a living. Mainly, I do it because the industry has interesting computer science problems and because the job pays well.
    I will not defend ad tech, mainly because ad tech is not important enough to humanity to defend. I do, however, believe that ad tech’s algorithms are important to humanity, because they can be applied to important areas such as your health, personal finance and education. But I have a different point today.

    I have a subtle point about privacy. I have noticed that at no point does the ad tech industry need to know who you really are. Ad tech does not need to know your real name, your parents’ real names, your actual street address or any other piece of information that identifies you as you to another human being. It is a little bit hard to explain, but I will try. Ad tech is powered by algorithms, and these algorithms operate in an abstract space where your true identity is not important. Most ad tech knows you by a random number that was assigned to you. All your interests are also represented by random numbers. The place you live is yet another. Ad tech algorithms only care about the relationships between these numbers, not what the numbers actually represent in the real world.

    Here is how it works. You get assigned a random number, e.g. 123, to represent you. Then, ad tech will attempt to link your number, 123, with the numbers of boxes that represent products or services that you might be interested in. For example, a box A could be people who need a vacation and box B could be people who could be tempted to buy a new BMW. Ideally, if you really need a vacation and someone really wants to sell you that vacation, then a connection between 123 and A should be made. From ad tech’s perspective, the number 123 is linked to the box A. The algorithm does not need to use labels like “Alice Anderson” or “Bob Biermann”, because the numbers 1 and 2 will get the job done just fine -- from a mathematical point of view.

    At some point your true identity becomes interesting, long after ad tech has left the scene. At some point, somebody (e.g. a human being or a robot) might need to print your real name and street address on a cardboard box, put the product you ordered inside and ship it via DHL. Up until that exact point, your name, street address or any other personally identifiable information is utterly unimportant to anybody. Nobody cares and no advertisement algorithm needs to know. I think this is an important point.
    Ad-tech algorithms, if not ad tech itself, can have a massive and positive impact on areas of life that you probably care about. For example, algorithms can help you with your health, personal finance, insurance, education, whether you should buy Bitcoin or Ether today, whether you should attend job interview A instead of job interview B, or whether your kids should attend school X or Y. In these areas, relatively unaltered algorithms from ad tech can help. It is important to keep in mind that, again, no algorithm needs to know your name in order to work. Not even if that algorithm is looking through your medical record and correlating your stats with the stats of millions of other patient records.
    Of course it is true that your real identity can be learned from seemingly anonymised data. It might even be fairly trivial to do so, using good old detective skills. Differential privacy has some fairly hard results in that area. However, the main point is that someone has to make a conscious decision to look into the data on a mission to find you and possibly design a new algorithm for that purpose.

    Now I get to my main point. Yes, ad tech CAN know who you are with some detective work. However, ad tech does not NEED to know who you are in order to work. This is so important because it means that we can potentially harness the power of algorithms in areas of life that matter — without compromising the privacy of anybody. It is not going to be easy to obtain the granular and self-controlled privacy that is needed, but it is worthwhile. And that is why I joined ad tech in the first place, because the computer science problems are interesting and important — and well, interesting and important things tend to pay well.

  • Neural networks on GPUs: cost of DIY vs. Amazon

    I like to dabble with machine learning and specifically neural networks. However, I don't like to wait for exorbitant amounts of time. Since my laptop does not have a graphics card that is supported by the neural network frameworks I use, I have to wait for a long time while my models get fitted. This is a problem.

    The solution to the problem is to get access to a computer with a supported Nvidia GPU. Two approaches are to either get my own rig or rent one from Amazon. Which is cheaper?

    Cost analysis

    I will assume that I will train models on my machine (whether local or at Amazon) for two hours every day.

    The Amazon p2 range of EC2 machines comes with Nvidia K80 cards, which cost about 50.000 DKK apiece. Already, this analysis is going to be difficult; I will not buy a computer that costs 50.000 DKK just to train NN models. So, in this analysis I will be comparing apples to oranges, but that is how it is.

    Cost of Amazon

    The p2.xlarge EC2 instance has a single K80 GPU, which is at least as good as any rig I would consider buying.

    The on-demand price is $0.9/hour; the spot price is about five times cheaper. Usage for two hours every day for a whole year costs roughly 4.500 DKK for on-demand and 900 DKK for spot instances. However, the p2 instances are sometimes unavailable in the European spot markets.

    Cost of DIY

    What is the best GPU to get for a DIY machine learning rig? In 2016, Quora answers suggested that the Nvidia cards Titan X and GTX980TI would be best. Let's go with that.

    Either card alone costs quite a bit more than 4.500 DKK, and that is only for the graphics card. The finished rig would probably cost around 15.000 DKK (Titan) or 10.000 DKK (GTX).

    The electricity also has to be factored in, and the cards are basically slower than the K80.

    Best choice for increased usage

    With increased usage, the DIY approach becomes cheaper than Amazon, albeit still the slower option. With usage of 5 or 7 hours/day, the DIY approaches break even after about a year.
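
    To make the break-even claim concrete, here is a small back-of-the-envelope calculation. The prices are the assumptions from the text above (0.9 USD/hour on-demand, a rig of roughly 10.000 or 15.000 DKK) plus an assumed exchange rate of about 6.5 DKK per USD; electricity is ignored.

    # Back-of-the-envelope: DIY rig vs. Amazon on-demand
    USD_TO_DKK = 6.5            # assumed exchange rate
    ON_DEMAND_USD_PER_HOUR = 0.9
    
    for rig_cost_dkk in (10_000, 15_000):
        for hours_per_day in (2, 5, 7):
            amazon_dkk_per_year = ON_DEMAND_USD_PER_HOUR * USD_TO_DKK * hours_per_day * 365
            break_even_years = rig_cost_dkk / amazon_dkk_per_year
            print(f'rig {rig_cost_dkk} DKK, {hours_per_day} h/day: '
                  f'Amazon costs {amazon_dkk_per_year:.0f} DKK/year, '
                  f'DIY breaks even after {break_even_years:.1f} years')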

    Further reading

    Build a deep learning rig for $800.

  • How AI, robotics and advanced manufacturing could impact everybody’s life on Earth

    What if everybody could live a wealthy, healthy, job-less and creative life in a post-scarcity Universe? Are we currently on a trajectory to this new reality and what are the obstacles we may face on the way? What are the important game-changing technologies?

    TODO: create an agenda (very tentative):

    1) contrast current life circumstances with a potential future
    2) identify the key problems that we could solve with technology
    3) review the players in society that will take part in this change
    4) contrast views on the opportunities and threats of these technologies
    5) ...

    Our future life conditions here on Earth might soon be impacted by game-changing advancements in artificial intelligence, robotics, manufacturing and genetics; at least if you ask people like Elon Musk, Andrew Ng and Ray Kurzweil. What are the most important technologies and what is the impact they might have? What are the dangers? Opinions differ, so the intention here is to review and contrast what leading fiction writers, scientists, visionaries and entrepreneurs think about the question: how will AI, robots, and advanced manufacturing impact everybody's life circumstances here on Earth?

    Fiction to be reviewed

    Post-scarcity:
    - The Culture series

    AI:
    - Asimov

    The Human-Computer Cortex:
    - That Swedish guy who wrote sci-fi computer implants in the 70's

    Non-fiction to be reviewed

    AI:
    - Douglas Hofstadter: GEB

    Videos to be reviewed

    AI:


    The Human-Computer Cortex:

    News articles to be reviewed

    AI:
    - https://aifuture2016.stanford.edu/
    - http://fortune.com/2016/06/15/future-of-work-2/
    - http://www.businessinsider.com/researchers-predictions-future-artificial-intelligence-2015-10?r=US&IR=T&IR=T

    3D printing:
    - https://hbr.org/2013/03/3-d-printing-will-change-the-world