O'Reilly Hacks
BSD Hacks
By Dru Lavigne
May 2004

HACK #17: Delimiter Dilemma
Deal with double quotation marks in delimited files
The Code

The Python script redelim.py implements the preceding algorithm. It prompts the user for the original datafile and a name for the new datafile. The delim and new_delim variables are hardcoded, but those are easily changed within the script.

This script copies a space-delimited text file with text values in double quotes to a new, tab-delimited file without the double quotes. The advantage of using this script is that it leaves spaces that were within double quotes unchanged.
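To make that behavior concrete, here is a minimal sketch of the same quote-toggle idea applied to a single string instead of a file. It is written for Python 3, and the function name redelim_text, its parameters, and the sample record are illustrative assumptions rather than part of redelim.py.

```python
def redelim_text(text, delim=' ', new_delim='\t'):
    """Replace delim with new_delim outside double quotes; drop the quotes."""
    out = []
    in_quotes = False  # plays the role of the script's text/non-text switch
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes   # toggle, and do not copy the quote
        elif ch == delim and not in_quotes:
            out.append(new_delim)       # a delimiter outside quotes is replaced
        else:
            out.append(ch)              # everything else is copied unchanged
    return ''.join(out)


sample = '1 "New York" "Big Apple" 42'  # hypothetical input record
print(repr(redelim_text(sample)))
```

The spaces inside "New York" and "Big Apple" survive, while the three spaces between fields become tabs and the double quotes disappear.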

There are no command-line arguments for this script. The script will prompt the user for source and destination file information.

You can redefine the variables for the original and new delimiters, delim and new_delim, in the script as needed.

#!/usr/local/bin/python
import os

print """ Change text file delimiters. """

# Ask user for source and target files.
sourcefile = raw_input('Please enter the path and name of the source file:')
targetfile = raw_input('Please enter the path and name of the target file:')

# Open files for reading and writing.
source = open(sourcefile,'r')
dest   = open(targetfile,'w')

# The variable 'tswitch' acts as a text/non-text switch that reminds python
# whether it is working within a text or non-text data field:
# 1 means outside double quotes, -1 means inside double quotes.
tswitch = 1

# If the source delimiter that you want to change is not a space,
# redefine the variable delim in the next line.
delim = ' '

# If the new delimiter that you want to use is not a tab,
# redefine the variable new_delim in the next line.
new_delim = '\t'

for charn in source.read():
    if tswitch == 1:
        if charn == delim:
            dest.write(new_delim)
        elif charn == '"':
            tswitch = tswitch * -1
        else:
            dest.write(charn)
    elif tswitch == -1:
        if charn == '"':
            tswitch = tswitch * -1
        else:
            dest.write(charn)

source.close()
dest.close()

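Use of redelim.py assumes that you have installed Python, which is available through the ports collection or as a binary package. As written, the script uses Python 2 syntax (the print statement and raw_input). The only module it imports, os, is part of Python's standard library and is installed by default.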
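For readers on Python 3, where the print statement and raw_input are not available, the following is a hedged sketch of the same procedure as a function over file paths. The function name redelim_file, its parameters, and the throwaway sample file are assumptions made for illustration. Unlike the script, it reads the source line by line rather than all at once, which is a choice of this sketch and not of the original.

```python
import os
import tempfile


def redelim_file(source_path, target_path, delim=' ', new_delim='\t'):
    """Copy source_path to target_path, swapping delimiters outside quotes."""
    in_quotes = False  # kept across lines, like the script's switch
    with open(source_path, 'r') as source, open(target_path, 'w') as dest:
        for line in source:
            for ch in line:
                if ch == '"':
                    in_quotes = not in_quotes   # quotes are dropped
                elif ch == delim and not in_quotes:
                    dest.write(new_delim)       # delimiter outside quotes
                else:
                    dest.write(ch)              # copied unchanged


# Demonstration on a temporary file holding hypothetical data.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'old.txt')
dst = os.path.join(workdir, 'new.txt')
with open(src, 'w') as f:
    f.write('1 "New York" 42\n2 "Los Angeles" 7\n')

redelim_file(src, dst)
with open(dst) as f:
    result = f.read()
print(repr(result))
```

The with statement closes both files even if an error occurs partway through, so the explicit close() calls are not needed in this variant.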


© 2007 O'Reilly Media, Inc.