Note: Since writing this post, I’ve learned about the fileinput module, which turns most of the following into a one-liner:
import fileinput
for line in fileinput.input():
    process(line)
It works both with input piped into the program and with filenames passed as arguments.
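To make the snippet above runnable, here is a minimal sketch. The process function (uppercasing) and the run helper with its out parameter are hypothetical names chosen for illustration; only fileinput.input() comes from the standard library.

```python
import fileinput
import sys

def process(line):
    # Hypothetical transformation for illustration: uppercase the line.
    return line.upper()

def run(files=None, out=sys.stdout):
    # With files=None, fileinput reads the files named in sys.argv[1:],
    # or standard input if no names were given.
    for line in fileinput.input(files=files):
        # Each line keeps its trailing newline, so write() is used
        # instead of print() to avoid doubling it.
        out.write(process(line))
```

Calling run() at the end of a script would then handle both "cat some_file | script.py" and "script.py some_file".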
Read on for my old post.
In this post I’ll show how you could process a file line by line with some “pluggable” Python code (see code listings below):
cat some_file | line_by_line.py ...
In the example above, a “line proc module” is any Python module that contains a function called proc_line, which takes a string and returns another string, such as the following:
# Example of proc_line
def proc_line(line):
    return line  # Hint: Do something to the line before returning
Example: Uppercasing
In this example the user has written a “line proc module” called “uppercase” (see code listings below):
$ echo "bla bla bla" | ./line_by_line.py uppercase
BLA BLA BLA
Example: Chaining
Here the user has written two “line proc modules”, and chained them together (see code listings below):
$ echo "bla bla bla" | ./line_by_line.py uppercase leet
BL4 BL4 BL4
For each line, “uppercase” is applied first, then “leet”.
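Chaining is just function composition: the output of one proc_line becomes the input of the next. A minimal sketch of the idea, using plain functions instead of modules (the names chain and pipeline are made up for this illustration):

```python
from functools import reduce

def uppercase(line):
    return line.upper()

def leet(line):
    return line.replace("A", "4").replace("a", "4")

def chain(procs):
    # Build one function that feeds the line through each proc in order.
    def run(line):
        return reduce(lambda acc, proc: proc(acc), procs, line)
    return run

pipeline = chain([uppercase, leet])
print(pipeline("bla bla bla"))  # BL4 BL4 BL4
```

The order matters in general, even though these two particular functions happen to give the same result either way.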
Code listings
uppercase.py:
def proc_line(line):
    return line.upper()
leet.py:
def proc_line(line):
    return line.replace("A", "4").replace("a", "4")
line_by_line.py:
#!/usr/bin/python
import sys

def main(argv):
    usermodules = []
    if len(argv) < 2:
        print("Usage: line_by_line.py ...")
        print("module must contain a function proc_line(line)")
        return 1
    # import each module named on the command line, in order
    for name in argv[1:]:
        try:
            usermodules.append(__import__(name))
        except ImportError:
            print('Failed to import module "%s"' % name)
            return 1
    # readline() returns an empty string at end of input, which ends the loop
    line = sys.stdin.readline()
    while line:
        # do something to the line and print the result
        for mod in usermodules:
            line = mod.proc_line(line)
        # the line already ends with a newline, so write it as-is
        sys.stdout.write(line)
        # fetch new line
        line = sys.stdin.readline()
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv))
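The module-loading step could also check up front that each module really provides proc_line, so a bad module is reported before any input is read. This is only a sketch of one way to do it: load_procs is a hypothetical helper, not part of the script above, and it uses importlib.import_module from the standard library in place of __import__.

```python
import importlib

def load_procs(names):
    # Import each named module and collect its proc_line function.
    procs = []
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            raise SystemExit('Failed to import module "%s"' % name)
        proc = getattr(module, "proc_line", None)
        if not callable(proc):
            raise SystemExit('Module "%s" has no proc_line(line) function' % name)
        procs.append(proc)
    return procs
```

With this, main could call load_procs(argv[1:]) and then loop over the returned functions directly, instead of looking up mod.proc_line for every line.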