Extracting certain lines from a file using Perl

As often happens to me when processing data, I needed to extract a specific set of lines from many files. If the files are small or few, or the list of lines is short, this can be done manually. If the list of lines or number of files are large, well, that’s what computers are for.

As Google will reveal, there are a number of ways to hack this up with bash, grep, sed, and other command-line tools, but none (that I know of) are really designed for this.

Here’s my script. It’s designed around some heart model stuff, and we use base-0 (first line is line 0, second line is line 1, etc), while many other tasks require base-1 (first line is line 1, second line is line 2, etc), and I’m sure there are other numbering schemes out there. Therefore, I included an optional base parameter. Without that it assumes base 1.

Note: I have no idea if this works on non-UNIX-like systems (i.e. Windows).

Use it like:

extract_lines.pl <file with list of lines> <file from which to extract lines> [optional base number]

And now, the code:

#!/usr/bin/env perl

use strict;

unless(@ARGV == 2 || @ARGV == 3){
    die "Usage: extract_lines.pl <line number file> <source file> [base]\n";
}

open(NUMBERS, "<$ARGV[0]") || die "Failed to open line number file $ARGV[0] for read: $!\n";
chomp(my @numbers = <NUMBERS>);
close(NUMBERS);

open(SOURCE, "<$ARGV[1]") || die "Failed to open source file $ARGV[1] for read: $!\n";
chomp(my @source = <SOURCE>);
close(SOURCE);

my $base = 1;

if(@ARGV == 3){
    $base = $ARGV[2];
}

# sort inputs just in case
my @sorted = sort { $a <=> $b } @numbers;

# save mem
undef @numbers;

my $nextline;

$nextline = shift(@sorted);
my $currline = $base;

foreach my $line (@source){
    if($nextline == $currline){
        print $line . "\n";
        if(@sorted > 0){
            $nextline = shift(@sorted);
        }
    }
    $currline++;
}

Bug reports welcome, just leave a comment.

One thought on “Extracting certain lines from a file using Perl

Comments are closed.