Finding Duplicates with sort and uniq

Imagine this: You have two text files full of information, with one data entry on each line. You want to find out which lines occur in both files. Now, if the files are mostly the same, it’s probably best to use a program called diff. However, if the files are mostly different, you can use this little incantation:

cat file1.txt file2.txt | sort -n | uniq -d

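This will concatenate file1.txt and file2.txt, sort the combined lines numerically (sort -n), and display only the lines that occur more than once (uniq -d). The sort is what makes this work, because uniq only compares adjacent lines.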
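One caveat with the one-liner: a line that is repeated inside a single file also gets reported by uniq -d, even if it never shows up in the other file. Here is a minimal sketch of one way to guard against that, by de-duplicating each file before combining them. The file names and node numbers are made up for illustration, and it assumes a system with mktemp available.

```shell
# Hypothetical sample data: two electrode files, one node number per line.
tmpdir=$(mktemp -d)
printf '%s\n' 3 7 12 3 > "$tmpdir/electrode_a.txt"   # node 3 is listed twice in this file
printf '%s\n' 12 5 7 40 > "$tmpdir/electrode_b.txt"

# De-duplicate each file on its own (sort -u) so that a node repeated
# within one file is not mistaken for an overlap between the two files.
# Then sort the combined list numerically and keep only repeated lines.
overlap=$(
  { sort -u "$tmpdir/electrode_a.txt"; sort -u "$tmpdir/electrode_b.txt"; } |
    sort -n | uniq -d
)
echo "$overlap"

# Clean up the temporary sample files.
rm -r "$tmpdir"
```

With this sample data the plain one-liner would also list node 3, because it appears twice in the first file. With the per-file sort -u step, only nodes 7 and 12 remain, and those are the ones actually shared by both files.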

This came in handy for manipulating electrode files today. Our electrode files just contain lists of node numbers. The simulator gets unhappy when you try to do things with overlapping electrodes, so this let us find the offending overlap and remove it without too much trouble.

Brock's Blog

Formerly Virtually Shocking
