Category: Cloud

  • How to use non-default profile in boto3

    Given an AWS credentials file that looks like this:

        [default]
        aws_access_key_id = DEFAULT
        aws_secret_access_key = SECRET1

        [dev]
        aws_access_key_id = DEV
        aws_secret_access_key = SECRET2

        [prod]
        aws_access_key_id = PROD
        aws_secret_access_key = SECRET3

    you can use any profile, say dev, like this in Python:

        import boto3.session

        dev = boto3.session.Session(profile_name='dev')
        s3 = dev.resource('s3')
        for bucket in s3.buckets.all():
            print(bucket.name)
        print('')
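
    The same session can also hand out low-level clients, and a profile can be combined with an explicit region. Here is a minimal sketch along the same lines; the region name is an assumption for illustration, not something the credentials file above specifies:

        import boto3.session

        # Reuse the 'dev' profile; us-east-1 is an assumed region, use whichever
        # region your resources actually live in.
        dev = boto3.session.Session(profile_name='dev', region_name='us-east-1')
        s3_client = dev.client('s3')
        print(s3_client.list_buckets()['Buckets'])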

  • Twitter HyperLogLog monoids in Spark

    Want to count unique elements in a stream without blowing up memory? More specifically, do you want to use a HyperLogLog counter in Spark? Until today, I’d never heard the word “monoid” before. It turns out that Twitter Algebird is a project that contains a collection of monoids, including a HyperLogLog monoid, which can be used […]
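
    The excerpt is cut off, and Algebird’s HyperLogLog monoid is a Scala library, so no Algebird code is shown here. As a rough Python sketch of the same idea, PySpark’s built-in approx_count_distinct aggregate is also backed by a HyperLogLog++ sketch and estimates distinct counts in bounded memory; the app name and column name below are just illustrative:

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        # Assumed local session; on a cluster the master URL would be configured.
        spark = SparkSession.builder.appName("hll-demo").getOrCreate()

        # A toy data set of user ids with many duplicates.
        df = spark.createDataFrame([(x % 1000,) for x in range(100000)], ["user_id"])

        # approx_count_distinct uses a HyperLogLog++ sketch, so the number of
        # unique ids is estimated without holding them all in memory.
        df.agg(F.approx_count_distinct("user_id", rsd=0.01).alias("approx_uniques")).show()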

  • Word-count exercise with Spark on Amazon EMR

    This is a mini-workshop that shows you how to work with Spark on Amazon Elastic MapReduce; it’s a kind of “hello world” for Spark on EMR. We will solve a simple problem: using Spark and Amazon EMR to count the words in a text file stored in S3. To follow along, you will need […]
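
    The excerpt is cut off, but as a rough sketch of the exercise, a PySpark word count over a file in S3 might look like this; the bucket and key names are placeholders, and on EMR the script would typically be submitted as a step with spark-submit:

        from operator import add
        from pyspark.sql import SparkSession

        # Placeholder paths: substitute your own S3 bucket and keys.
        INPUT = "s3://your-bucket/input/book.txt"
        OUTPUT = "s3://your-bucket/output/word-counts"

        spark = SparkSession.builder.appName("word-count").getOrCreate()

        counts = (spark.sparkContext.textFile(INPUT)   # one record per line
                  .flatMap(lambda line: line.split())  # split lines into words
                  .map(lambda word: (word, 1))         # pair each word with a count of 1
                  .reduceByKey(add))                   # sum the counts per word

        counts.saveAsTextFile(OUTPUT)
        spark.stop()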