Note: Since writing this post, I've learned about the fileinput module, which turns most of the following into a one-liner:
    import fileinput
    for line in fileinput.input():
        process(line)
It works both with input you pipe into the program and with filenames passed as arguments.
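As a slightly fuller sketch of the same idea: `fileinput.input()` reads the files named in `sys.argv[1:]`, and falls back to standard input when none are given. The `process` and `run` functions below are hypothetical stand-ins for illustration; they are not part of `fileinput`.

```python
import fileinput


def process(line):
    # Hypothetical stand-in for real per-line work: uppercase the line.
    return line.upper()


def run(files=None):
    # With files=None, fileinput uses sys.argv[1:], or stdin if that is empty.
    results = []
    with fileinput.input(files=files) as stream:
        for line in stream:
            results.append(process(line))
    return results
```

A script could then end with `sys.stdout.writelines(run())` and be invoked either as `cat some_file | script.py` or as `script.py some_file`.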
Read on for my old post.
In this post I'll show how you could process a file line by line with some "pluggable" Python code (see the code listings below):
    cat some_file | line_by_line.py <line_proc_module> <line_proc_module> ...
In the example above, a "line proc module" is any Python module that contains a function called proc_line that takes a string and returns another string. For example:
    # Example of proc_line
    def proc_line(line):
        return line  # Hint: Do something to the line before returning
Example: Uppercasing
In this example the user has written a "line proc module" called "uppercase" (see code listings below):
    $ echo "bla bla bla" | ./line_by_line.py uppercase
    BLA BLA BLA
Example: Chaining
Here the user has written two "line proc modules", and chained them together (see code listings below):
    $ echo "bla bla bla" | ./line_by_line.py uppercase leet
    BL4 BL4 BL4
First "uppercase" is applied, then "leet" is applied, to each line.
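The chaining step can be sketched on its own, without the command-line wrapper. This is a minimal illustration, assuming the two functions are defined inline rather than imported as modules; `apply_chain` is a hypothetical helper name, not something from the original script.

```python
from functools import reduce


def uppercase(line):
    return line.upper()


def leet(line):
    return line.replace("A", "4").replace("a", "4")


def apply_chain(procs, line):
    # Feed the line through each function in order;
    # the output of one becomes the input of the next.
    return reduce(lambda current, proc: proc(current), procs, line)


print(apply_chain([uppercase, leet], "bla bla bla"))  # prints: BL4 BL4 BL4
```

With these two particular functions the order happens not to change the result, because leet replaces both "a" and "A". In general the order of the modules does matter.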
Code listings
uppercase.py:
    def proc_line(line):
        return line.upper()
leet.py:
    def proc_line(line):
        return line.replace("A", "4").replace("a", "4")
line_by_line.py:
    #!/usr/bin/env python3
    import importlib
    import sys


    def main(argv):
        if len(argv) < 2:
            print("Usage: line_by_line.py <name-of-module> ...")
            print("Each module must contain a function proc_line(line)")
            return 1

        # Import every module named on the command line, in order.
        usermodules = []
        for name in argv[1:]:
            try:
                usermodules.append(importlib.import_module(name))
            except ImportError:
                # Report the module that actually failed, not just the first one.
                print('Failed to import module "%s"' % name)
                return 1

        # Pass each line of stdin through every module's proc_line, in order.
        for line in sys.stdin:
            for mod in usermodules:
                line = mod.proc_line(line)
            # The line still ends with its own newline, so don't add another.
            sys.stdout.write(line)
        return 0


    if __name__ == "__main__":
        sys.exit(main(sys.argv))
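One possible refinement, sketched here under the assumption that you would rather fail early than hit an error on the first line of input: check at import time that each module really provides a callable proc_line. The helper name `load_processors` is hypothetical; only `importlib.import_module` comes from the standard library.

```python
import importlib


def load_processors(module_names):
    # Hypothetical helper: import each named module and collect its proc_line.
    processors = []
    for name in module_names:
        module = importlib.import_module(name)
        proc = getattr(module, "proc_line", None)
        if not callable(proc):
            raise AttributeError(
                'module "%s" has no proc_line(line) function' % name
            )
        processors.append(proc)
    return processors
```

Called as `load_processors(sys.argv[1:])`, this returns the functions in command-line order, ready to be applied to each line in turn.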