Categories
Cloud Programming

How to use non-default profile in boto3

Given an AWS credentials file that looks like this: [default] aws_access_key_id = DEFAULT aws_secret_access_key = SECRET1   [dev] aws_access_key_id = DEV aws_secret_access_key = SECRET2   [prod] aws_access_key_id = PROD aws_secret_access_key = SECRET3[default] aws_access_key_id = DEFAULT aws_secret_access_key = SECRET1 [dev] aws_access_key_id = DEV aws_secret_access_key = SECRET2 [prod] aws_access_key_id = PROD aws_secret_access_key = SECRET3 You can use […]

Categories
Cloud Systems

Twitter HyperLogLog monoids in Spark

Want to count unique elements in a stream without blowing up memory? In more specific words, do you want to use a HyperLogLog counter in Spark? Until today, I’d never heard the word “monoid” before. However, Twitter Algebird is a project that contains a collection of monoids including a HyperLogLog monoid, which can be used […]

Categories
Cloud

Word-count exercise with Spark on Amazon EMR

This is a mini-workshop that shows you how to work with Spark on Amazon Elastic Map-Reduce; It’s a kind of hello world of Spark on EMR. We will solve a simple problem, namely use Spark and Amazon EMR to count the words in a text file stored in S3. To follow along you will need […]