Neural networks on GPUs: cost of DIY vs. Amazon

I like to dabble with machine learning, specifically neural networks. However, I don’t like to wait for exorbitant amounts of time. Since my laptop does not have a graphics card that is supported by the neural network frameworks I use, I have to wait a long time while my models get fitted. This is a problem.

The solution to the problem is to get access to a computer with a supported Nvidia GPU. There are two approaches: build my own rig, or rent one from Amazon. Which is cheaper?

Cost analysis

I will assume that I will train models on my machine (whether local or at Amazon) for two hours every day.

The Amazon p2 range of EC2 machines comes with Nvidia K80 cards, each of which costs about 50.000 DKK. Already this analysis is going to be difficult; I will not buy a computer that costs 50.000 DKK just to train NN models. So, in this analysis I will be comparing apples to oranges, but that is how it is.

Cost of Amazon

The p2.xlarge EC2 instance has a single K80 GPU, which is at least as good as any rig I would consider buying.

The on-demand price is $0.9/hour; the spot price is about five times cheaper. Two hours of usage every day for a whole year costs about 4.500 DKK on-demand and about 900 DKK on spot instances. However, p2 instances are sometimes unavailable in the European spot markets.
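
As a sanity check on those numbers, here is a quick sketch of the arithmetic; the exchange rate of roughly 7 DKK/USD and the $0.18/hour spot price are my own rounded assumptions:

# Back-of-the-envelope EC2 costs (assumed exchange rate: ~7 DKK/USD)
DKK_PER_USD = 7.0
HOURS_PER_YEAR = 2 * 365  # two hours of training every day

def annual_cost_dkk(usd_per_hour):
    return usd_per_hour * HOURS_PER_YEAR * DKK_PER_USD

print(annual_cost_dkk(0.90))  # on-demand: ~4.600 DKK
print(annual_cost_dkk(0.18))  # spot (about five times cheaper): ~900 DKK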

Cost of DIY

What is the best GPU to get for a DIY machine learning rig? In 2016, Quora answers suggested that the Nvidia Titan X and GTX 980 Ti cards would be best. Let’s go with that.

Either card alone costs quite a bit more than 4.500 DKK, and that is only the graphics card. A finished rig would probably cost around 15.000 DKK (Titan X) or 10.000 DKK (GTX 980 Ti).

Electricity also has to be factored in, and both cards are slower than the K80 anyway.

Best choice for increased usage

With increased usage the DIY approach becomes cheaper than Amazon, albeit still the slower option. At around 5 hours/day (GTX rig) or 7 hours/day (Titan rig), the DIY options break even against on-demand pricing within a year.
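
A rough break-even sketch backs this up; it assumes the rig prices above, the $0.9/hour on-demand rate, and the same ~7 DKK/USD exchange rate as before:

# Daily usage needed for a rig to pay for itself within a year
DKK_PER_USD = 7.0

def break_even_hours_per_day(rig_price_dkk, usd_per_hour=0.9):
    total_hours = rig_price_dkk / (usd_per_hour * DKK_PER_USD)
    return total_hours / 365

print(break_even_hours_per_day(10000))  # GTX rig: ~4-5 hours/day
print(break_even_hours_per_day(15000))  # Titan rig: ~6-7 hours/day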

Further reading

Build a deep learning rig for $800.

(Tentative)

The symbiosis between humans and AI could transform humans into rational organisms (cf. Daniel Kahneman, who has shown that humans on their own are not rational organisms). How so? Our behavior is being tracked in ever more minute detail across all of life’s important areas. Artificial intelligence is getting better and better at estimating whether we are happy, healthy, and wealthy from an analysis of all the traces we leave behind everywhere. We are now in a situation where we can, or soon will be able to, ask questions like: How happy, healthy, and wealthy was person X at time t? Which actions h1, h2, h3, … had person X taken (e.g. on Spotify, travel, job changes, doctor’s visits) leading up to that moment? How happy will X be at time t+1, t+10, t+1000 if everything continues as it is? Which actions should X take to maximize their happiness at time t+1000?
In other words, there are complex areas of life where complex AI has the potential to maximize our long-term utility (e.g. our “joy of life” or our wealth 10 years from now). Imagine that a personal AI could
– Find your next home
– Find a school or leisure activity for your child
– Find investment opportunities
– Find love
– Find friends
– Find your next meal
– etc.

PyBrain quickstart and beyond

After pip install pybrain, the PyBrain quick start essentially goes as follows:

from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import TanhLayer
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
 
# Create a neural network with two inputs, three hidden, and one output
net = buildNetwork(2, 3, 1, bias=True, hiddenclass=TanhLayer)
 
# Create a dataset that matches NN input/output sizes:
xor = SupervisedDataSet(2, 1)
 
# Add input and target values to dataset
# Values correspond to XOR truth table
xor.addSample((0, 0), (0,))
xor.addSample((0, 1), (1,))
xor.addSample((1, 0), (1,))
xor.addSample((1, 1), (0,))
 
trainer = BackpropTrainer(net, xor)
#trainer.trainUntilConvergence()
for epoch in range(1000):
    trainer.train()

However, this setup does not reliably learn XOR, which can be seen by running the following test:

testdata = xor
trainer.testOnData(testdata, verbose = True)  # Works if you are lucky!

Kristina Striegnitz has written and published an XOR example that works more reliably. Her code is effectively reproduced below, in case the original should disappear:

# ... continued from above
 
# Create a recurrent neural network with four hidden nodes (default is SigmoidLayer) 
net = buildNetwork(2, 4, 1, recurrent = True)
 
# Train the network using arguments for learningrate and momentum
trainer = BackpropTrainer(net, xor, learningrate = 0.01, momentum = 0.99, verbose = True)
for epoch in range(1000):
    trainer.train()
 
# This should work every time...
trainer.testOnData(testdata, verbose = True)
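
As a final check (my addition, not part of the original example), you can activate the trained network on each input pattern and verify that the outputs are close to the XOR truth table:

# ... continued from above

# Inspect the trained network's output for each XOR input
for inp in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inp, '->', net.activate(inp))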