How to split a log file into smaller files

In this example I had a big log file (many millions of lines) that I wanted to split into smaller log files of one million lines each for processing on Elastic MapReduce.

-rw-r--r--  1 kostas staff 543067012012 Oct 11 13:45 huge_logfile

This is a job for the split command. Because individual lines in the log file must be kept intact, the -l option is used to specify the number of lines in each file. In this example, the file is first filtered with grep, which keeps only the lines matching a pattern, to show how split is used when data is piped in:

grep 'some-pattern' huge_logfile | split -a 6 -l 1000000 - log_

The dash tells split to read from standard input, and log_ is the prefix for the generated filenames. The -a 6 option tells split to append a six-character suffix to the prefix when naming files. The default suffix length is two characters, which allows only 676 distinct filenames. After the command finishes, the directory listing looks like this:
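
To see how the options interact at a small scale, here is a minimal sketch that runs the same kind of pipeline on a ten-line stand-in file. The names sample_log and part_, the pattern '5', and the chunk size of 4 are made up for illustration. grep -v (which drops matching lines) is used here only so that most of the tiny file survives the filter.

```shell
# Work in a throwaway directory so the demo leaves nothing behind elsewhere.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical ten-line stand-in for the real log file: the numbers 1 to 10.
seq 1 10 > sample_log

# Drop the line containing "5", then write the remaining nine lines
# in chunks of four, reading from standard input ("-").
grep -v '5' sample_log | split -a 6 -l 4 - part_

# List the generated chunks.
ls part_*
```

This should leave three files, part_aaaaaa, part_aaaaab and part_aaaaac, holding four, four and one lines respectively. The last chunk simply gets whatever is left over.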

huge_logfile
log_aaaaaa
log_aaaaab
log_aaaaac
log_aaaaad
log_aaaaae
...
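
One way to check that nothing was lost is to concatenate the chunks and compare the result with the filtered input. The sketch below does this on a small made-up file (sample_log, filtered and part_ are hypothetical names). Unlike the pipeline above, it saves the grep output to a file first so there is something to compare against, which may not be practical for a very large log.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical ten-line input, filtered into an intermediate file.
seq 1 10 > sample_log
grep -v '5' sample_log > filtered

# Split the filtered file into chunks of four lines each.
split -a 6 -l 4 filtered part_

# The suffixes sort alphabetically in the order the chunks were written,
# so the glob reassembles them in their original order.
if cat part_* | cmp -s - filtered; then
    echo "chunks match the filtered input"
else
    echo "chunks differ from the filtered input"
fi
```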
