Friday, May 25, 2012

Loading an Excel Spreadsheet

A lot of what I'll be writing about is based on code I'm using in production.  It may not be the most optimized, the most "correct" or even all that elegant.  But it works for me.  If you're just getting started with perl, you may find this interesting.

I load a lot of data from many sources into a database for further analysis.  One of the common forms of data files is an excel spreadsheet.  A client's system may produce a daily log of activity, and I need to aggregate that into one place to study long term trends.

Storing the files in a sub-directory (folder to you windows users), and running a perl to load all those files into a database is an easy way to get the data where I need it.  See last weeks post for the perl to cycle through all files in a sub-directory.

You'll need to add this statement after the "use strict;" statement:

use Spreadsheet::ParseExcel;

You'll notice a couple of comments in the code about "having your way with it".  Once of the things you can do is load the excel file into arrays for later fun stuff.  Here's the code you can insert at either of these comments:

    my $workbook = Spreadsheet::ParseExcel::Workbook->Parse($CurrentFile);
    my($row, $col, $sheet, $WkC);
    foreach my $sheet (@{$workbook->{Worksheet}}) {

        for (my $row = $sheet->{MinRow}; defined $sheet->{MaxRow} && $row <= $sheet->{MaxRow} ; $row++) {

            for (my $col = $sheet->{MinCol};  defined $sheet->{MaxCol} && $col <= $sheet->{MaxCol} ; $col++) {

        # ----- Header Row
                if ($row == $sheet->{MinRow}) {
                    my $column_name = $sheet->{Cells}[0][$col]->{Val};
                    $FieldName[$col] = $column_name;
                } else {

                    $WkC = $sheet->{Cells}[$row][$col];
                    # ----- you could decide to skip a column 

                    # ----- if there's no field name in row 1
                    # next if ( !(defined($FieldName[$col])) || ($FieldName[$col] eq '') ); # skip row if it doesn't have a header

                    # ----- cleanup any data here
                    if (($WkC) && ($WkC->Value ne '') && ($FieldName[$col] eq 'SomeSpecificFieldName')) {
                        my $temp_value =$WkC->Value;
                        $temp_value =~ s/\'/\'\'/g; # deal with stinking '

            # ----- Here we might do something interesting with each cell data

                    }

            # ----- Here we can do something interesting with every cell data

                } # end of else for row 0 handling

            } # end of column handling

            # ----- Now that you have the row loaded,
            # ----- You can do something interesting with it here


        } # end of row handling

    } # ----- end of worksheet handling



I hope you find this helpful. 

Friday, May 18, 2012

Cycle through all files in a subdirectory

I recently signed up for the perl Ironman challenge at http://www.enlightenedperl.org/ironman.html.  I've had this blog sitting around for a long time (maybe 7 years) but never posted to it.  I've been using my blog "The Red Stallion Patrol" for all my posts, but wanted to separate out my perls of wisdom.  This is a cross post of an item I posted 4 days ago.

Here's the code I use to cycle through all the files in a sub-directory, make sure they exist and have a non-zero size. If this all passes, process the file.  I use this pretty frequently at work to do "something interesting" with all files in a sub-directory (folder to you windows users) that may contain any number of files (probably added to daily) of a certain type.

#!/usr/bin/perl

use warnings;
use strict;


my $DIR = "/some/directory";
my $inputfile = $ARGV[0];


# ----- Go through all the files in a directory
my $InputDir = join("/", $DIR, $inputfile);
opendir (ARCHIVE, $InputDir ) or die "can't opendir ARCHIVE $InputDir: $!";
my @InputDirFiles = grep { -f }                               # select files only
                    map { "$InputDir/$_" }                    # IMPORTANT - prepend directory name
                    grep { ( $_ ne '.' ) and ( $_ ne '..' ) } # throw out dots entries
                    readdir ARCHIVE;
closedir ( ARCHIVE ) or warn "can't closedir ARCHIVE: $!";

local $, = "\n";

foreach my $CurrentFile (@InputDirFiles) {
    # ----- Does the file exist?
    (my $dev,my $ino,my $mode,my $nlink,my $uid,my $gid,my $rdev,my $size,my $atime,my $mtime,my $ctime,my $blksize,my $blocks)
            = stat("$CurrentFile");

    if (-e _) {
        printf "\t$CurrentFile exists.\n";
        if (-z _) {
            printf "\t\tIt is zero bytes\n";
            exit;
        }
        if (-s _) {
            printf "\t\tIt is $size bytes long\n";
            # ----- at this point the file exists and has size.
            #       you could open it and have your way with it.
        }
    } else {
        printf "\tDidn't find $CurrentFile\n";
        exit;
    }

    # ----- You could also wait until this point to
    #       open it and have your way with it.
}