Friday, May 25, 2012

Loading an Excel Spreadsheet

A lot of what I'll be writing about is based on code I'm using in production.  It may not be the most optimized, the most "correct" or even all that elegant.  But it works for me.  If you're just getting started with perl, you may find this interesting.

I load a lot of data from many sources into a database for further analysis.  One of the common forms of data files is an excel spreadsheet.  A client's system may produce a daily log of activity, and I need to aggregate that into one place to study long term trends.

Storing the files in a sub-directory (folder to you windows users), and running a perl to load all those files into a database is an easy way to get the data where I need it.  See last weeks post for the perl to cycle through all files in a sub-directory.

You'll need to add this statement after the "use strict;" statement:

use Spreadsheet::ParseExcel;

You'll notice a couple of comments in the code about "having your way with it".  Once of the things you can do is load the excel file into arrays for later fun stuff.  Here's the code you can insert at either of these comments:

    my $workbook = Spreadsheet::ParseExcel::Workbook->Parse($CurrentFile);
    my($row, $col, $sheet, $WkC);
    foreach my $sheet (@{$workbook->{Worksheet}}) {

        for (my $row = $sheet->{MinRow}; defined $sheet->{MaxRow} && $row <= $sheet->{MaxRow} ; $row++) {

            for (my $col = $sheet->{MinCol};  defined $sheet->{MaxCol} && $col <= $sheet->{MaxCol} ; $col++) {

        # ----- Header Row
                if ($row == $sheet->{MinRow}) {
                    my $column_name = $sheet->{Cells}[0][$col]->{Val};
                    $FieldName[$col] = $column_name;
                } else {

                    $WkC = $sheet->{Cells}[$row][$col];
                    # ----- you could decide to skip a column 

                    # ----- if there's no field name in row 1
                    # next if ( !(defined($FieldName[$col])) || ($FieldName[$col] eq '') ); # skip row if it doesn't have a header

                    # ----- cleanup any data here
                    if (($WkC) && ($WkC->Value ne '') && ($FieldName[$col] eq 'SomeSpecificFieldName')) {
                        my $temp_value =$WkC->Value;
                        $temp_value =~ s/\'/\'\'/g; # deal with stinking '

            # ----- Here we might do something interesting with each cell data

                    }

            # ----- Here we can do something interesting with every cell data

                } # end of else for row 0 handling

            } # end of column handling

            # ----- Now that you have the row loaded,
            # ----- You can do something interesting with it here


        } # end of row handling

    } # ----- end of worksheet handling



I hope you find this helpful. 

No comments: