Really the title should be “the joy of UNIX”, but this is funnier.
I love text files – you can do so much with them. Today I had a problem at work where a user told me a file was broken because they could only find ~600 unique identifiers in a file more than 214,000 lines long. The problem was that they were loading it into Excel, which only supports 64K (65,536) rows. I had to back this up though, and this is where my love of text files (and UNIX utils) comes to the fore.
Each identifier has multiple rows in the file, and I just wanted to count the unique identifiers in the original file and in one I’d truncated to 64K lines. This was on a Windows box. I spent a little while considering the options (loading it into a database table, using something like Log Parser), then remembered good old awk and grep, and the fact that I had Cygwin installed. A very simple awk script printed the identifier only if the value of another column was correct, which easily reduced the file to a list of unique identifiers. I piped its output to grep to count the identifiers, and it was job done.
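For the curious, the pipeline looked roughly like the sketch below. The file layout here is made up for illustration: I'm assuming a tab-separated file called sample.txt with the identifier in column 1 and a record type in column 2, where exactly one row per identifier has the type "HEADER". The real file and column values were different.

```shell
# Hypothetical sample data: column 1 is the identifier, column 2 is a record type.
# Assumption: exactly one row per identifier has the type "HEADER".
printf 'A1\tHEADER\nA1\tDETAIL\nA1\tDETAIL\nB2\tHEADER\nB2\tDETAIL\nC3\tHEADER\n' > sample.txt

# Print the identifier only for rows whose second column matches,
# then let grep count the non-empty lines that come out.
count=$(awk -F'\t' '$2 == "HEADER" { print $1 }' sample.txt | grep -c .)
echo "$count"
```

If the filter column did not already guarantee one row per identifier, putting sort -u between awk and grep should remove the duplicates before counting.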
The only thing I was missing was a util to truncate the file – I had to do it by hand. I’m sure something exists, I’m just not aware of it.
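For what it's worth, head (part of the coreutils that Cygwin normally installs) looks like the util for the job. A minimal sketch, with big.txt and truncated.txt as made-up file names and seq standing in for the real data:

```shell
# Hypothetical stand-in for the real file: 214,000 numbered lines.
seq 1 214000 > big.txt

# Keep only the first 65,536 lines, the row limit of older Excel worksheets.
head -n 65536 big.txt > truncated.txt

# Count the lines that survived (tr strips any padding wc may add).
lines=$(wc -l < truncated.txt | tr -d ' ')
echo "$lines"
```

The same identifier count can then be run against truncated.txt to compare it with the full file.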