Reading diary August 28, 2012

I’m keeping public track of what I read. Today I read three papers using the three-pass method.

First pass

I did a first pass over two papers on distributed file systems. Both papers are by Sage A. Weil, Scott A. Brandt, Ethan L. Miller, and Darrell D. E. Long of the University of California, Santa Cruz.

  • Ceph: A Scalable, High-Performance Distributed File System
  • CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data

Second pass

I gave a Facebook paper a second pass. The paper describes a system for efficient photo storage and retrieval that, among other things, minimizes metadata so it can be kept in memory, stores a large number of photos in a single physical file, and keeps pointers (offsets) into that file.

  • Finding a needle in Haystack: Facebook’s photo storage

Find all of these papers on Google Scholar.

How to read computer science papers

Situation: You have a large pile of computer science papers in front of you. You want to read them all. What to do?

My suggestion is that you read the two guides below. They are really short and helpful. I’m one year into my CS PhD, and I still find reading a large pile of papers quite hard, especially when the papers explore problems in a field I’m not very familiar with.


Benchmark: Reading uncompressed and compressed files from disc

In this post I’ll compare the running time of reading uncompressed and compressed files from disc.

I’ll run a test using two files, data.txt (858M) and data.txt.gz (83M), that have the same content.

About cat and zcat

The well-known command cat prints the contents of a file. The lesser-known zcat prints the contents of a gzip-compressed file.
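As a rough sketch of how such a comparison can be run (the file names data.txt and data.txt.gz come from this post, but the seq-generated sample data below is my own stand-in, not the actual 858M data set):

```shell
# Create a sample file and a gzip'ed copy with identical content.
# (Stand-in data; the post's real data.txt is 858M.)
seq 1 1000000 > data.txt
gzip -c data.txt > data.txt.gz   # -c writes to stdout, leaving data.txt in place

# Time reading each file. Redirecting to /dev/null means we measure
# only the read (and, for zcat, the decompression) work.
time cat  data.txt    > /dev/null
time zcat data.txt.gz > /dev/null
```

Which variant wins depends on the machine: with a slow disc and a cold cache, the compressed read can come out ahead because far fewer bytes are read, while on a warm cache the decompression CPU cost may dominate.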
