You can use the shell to extract a random sample of lines from a file on *nix systems. The two commands you need are "shuf" and "head" (plus "tail" for CSV files with a header). The shuf command randomly shuffles the lines of its input. The head command cuts off its input after the first k lines. Examples for both general files and CSV files are given below.
General pattern
To randomly sample 100 lines from any file in *nix:
shuf INPUT_FILE | head -n 100 > SAMPLE_FILE
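As a possible shortcut, GNU shuf has a -n option that limits the number of output lines, so the pipe into head can be dropped. The sketch below builds a throwaway input with seq so it can be run as-is; the file names demo_input.txt and demo_sample.txt are placeholders, not part of the original recipe.

```shell
# Build a hypothetical 1000-line input file for demonstration.
seq 1 1000 > demo_input.txt

# shuf -n 100 prints at most 100 randomly chosen input lines,
# so no separate head step is needed.
shuf -n 100 demo_input.txt > demo_sample.txt

# Print the number of sampled lines.
wc -l < demo_sample.txt
```

If the input has fewer than 100 lines, both variants simply return every line in random order.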
Pattern for CSV
If your file is a CSV file, you probably want to keep the header and sample only the body. You can use the head and tail commands, respectively, to extract the header and to skip it when sampling the contents of the CSV file.
Extract the header of the CSV file:
head -n 1 INPUT_FILE.csv > SAMPLE_FILE.csv
Sample 100 lines from the body of the CSV file and append them to the sample file (notice ">" above versus ">>" below):
tail -n +2 INPUT_FILE.csv | shuf | head -n 100 >> SAMPLE_FILE.csv
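The two CSV steps can also be combined into a single command group with one redirect, which avoids having to remember which step uses ">" and which uses ">>". This is only a sketch: demo_input.csv and demo_sample.csv are made-up names, the demo data is generated on the spot, and the approach assumes no quoted field contains an embedded newline.

```shell
# Build a hypothetical CSV with a header and 500 data rows.
{
  echo "id,value"
  seq 1 500 | awk '{ print $1 "," ($1 * 2) }'
} > demo_input.csv

# Emit the header first, then 100 randomly sampled body rows,
# and write both to the sample file with a single redirect.
{
  head -n 1 demo_input.csv
  tail -n +2 demo_input.csv | shuf -n 100
} > demo_sample.csv
```

The resulting file has the original header followed by the sampled rows in random order.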
Install dependencies on Mac
On Mac, the shuf command is not shipped with the OS. You can get it by installing GNU coreutils via brew, which names it "gshuf":
brew install coreutils
So, on Mac you should replace shuf with gshuf in the examples above.
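If the same script has to run on both Linux and Mac, one option is to detect which name is available at run time. The following is a minimal sketch under that assumption; the function name pick_shuf, the variable SHUF, and the demo file names are hypothetical.

```shell
# Print the name of the available shuffle command (shuf or gshuf).
# Returns non-zero if neither is installed.
pick_shuf() {
  if command -v shuf >/dev/null 2>&1; then
    echo shuf
  elif command -v gshuf >/dev/null 2>&1; then
    echo gshuf
  else
    echo "neither shuf nor gshuf found" >&2
    return 1
  fi
}

SHUF=$(pick_shuf) || exit 1

# Demo: sample 10 lines from a generated 50-line file.
seq 1 50 > demo_lines.txt
"$SHUF" -n 10 demo_lines.txt > demo_lines_sample.txt
```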