Thursday, May 11, 2006

Monitor Web w/ googs.pl

I wanted a way to monitor the web for certain terms (e.g., leaked info on a company). For example, I wanted to keep an array of search terms and operators to query against, and then have the results emailed to me as a nice little HTML report. This is the reason for googs.pl.

I used the Google API to run the queries. I also append the Google operator daterange:, which needs the Julian date (although I'm not sure how well it works), hoping to return only results that are new that day and to email me only if it finds new ones. This way I don't have to look at old stuff all the time or get tons of email. You can comment that feature out if you don't want it. To figure the date I used the Perl module Cal::Date, which is linked below. Then I just set it up as a cron job to run every day.
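
If you would rather not install Cal::Date, the Julian day number can also be worked out from the Unix epoch with core modules only. The following is a minimal sketch of that idea, not what googs.pl does: the helper names julian_day and daterange_for are made up for illustration, and it assumes daterange: wants whole-number Julian days counted in UTC.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Integer Julian day number for a Unix epoch timestamp (UTC).
# 1970-01-01 corresponds to Julian day 2440588.
# (Hypothetical helper, not part of googs.pl or Cal::Date.)
sub julian_day {
    my ($epoch) = @_;
    return int($epoch / 86400) + 2440588;
}

# Build the operator the same way googs.pl does:
# the range starts and ends on the same day.
sub daterange_for {
    my ($epoch) = @_;
    my $jd = julian_day($epoch);
    return " + daterange:${jd}-${jd}";
}

# Example: the date of this post, 2006-05-11 (timegm months are 0-based).
my $epoch = timegm(0, 0, 0, 11, 4, 2006);
print julian_day($epoch), "\n";    # 2453867
print "company + hacking", daterange_for($epoch), "\n";
```

To use it for the current day, pass time() instead of a fixed timestamp, e.g. daterange_for(time()).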


#!/usr/bin/perl
#
# Devin Ertel
# googs.pl

use strict;
use SOAP::Lite;
use MIME::Lite;
use Net::SMTP;
use Cal::Date qw(DJM MJD today);

#Get Today's Date
my $date = today();

#convert to julian
my $jul_today= DJM($date);

#Put Your Google API Key Here
my $google_key='your_google_key_here';

#Google WSDL File Location
my $google_wsdl = "./GoogleSearch.wsdl";

#Put queries here, escape any "'s with \"
my $query;
my @query = ("company + hacking",
"allintext:company + hacking",
"your querys"
);


#assign current julian date to query
my $goog_daterange = " + daterange:".$jul_today."-".$jul_today;

#SOAP::Lite instance with GoogleSearch.wsdl.
my $google_soap = SOAP::Lite->service("file:$google_wsdl");


#Set Up Mail Vars
my $faddy = 'from_address@blah.com';
my $taddy = 'to_address@blah.com';
my $mail_host = 'your_mail_host';

my $subject = "New Information Posted!";
my $msg_body ="";

#Its Google Time

#Loop Through Array of Queries
foreach $query (@query){

#add daterange: operator to current query
my $query_date=$query.$goog_daterange;

my $results = $google_soap ->
doGoogleSearch(
$google_key, $query_date , 0, 10, "false", "", "false",
"", "latin1", "latin1"
);

# Skip This Query On No Results (exiting here would skip the remaining queries)
my @elements = @{ $results->{resultElements} || [] };
@elements or next;

# Loop Results and Output to HTML
foreach my $result (@elements) {

$msg_body .= "<br>".
$result->{'title'}."<br>".
"<a href=\"".$result->{URL}."\">".$result->{URL}."</a><br>".
$result->{snippet}.
"<br><hr>";

}
}

# Exit Without Sending Mail If No Query Returned Results
$msg_body or exit;

#Setup Message

my $msg=MIME::Lite->new (
From => $faddy,
To => $taddy,
Subject => $subject,
Type => 'TEXT/HTML',
Encoding => 'quoted-printable',
Data => $msg_body,
) or die "Could Not Create Msg: $!\n";


#Send Message
MIME::Lite->send('smtp', $mail_host, Timeout=>60);
$msg->send;
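
The report-building step can also be pulled out into small subs so it can be tried without a Google key or a mail host. This is only a sketch under a few assumptions: escape_html, format_result and build_report are hypothetical names, the sample data is made up, and title and snippet are passed through unescaped on the assumption that the API already returns them as HTML.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attribute values.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Turn one result hash (title, URL, snippet) into an HTML fragment.
# Only the URL is escaped; title and snippet are assumed to be HTML already.
sub format_result {
    my ($result) = @_;
    my $url = escape_html($result->{URL});
    return "<br>$result->{title}<br>"
         . "<a href=\"$url\">$url</a><br>"
         . "$result->{snippet}<br><hr>";
}

# Join the fragments for every result. An empty list gives an empty string,
# which the caller can treat as the signal to skip sending mail.
sub build_report {
    my (@results) = @_;
    return join '', map { format_result($_) } @results;
}

# Made-up sample data for illustration.
my @sample = ({
    title   => 'Example <b>company</b> page',
    URL     => 'http://example.com/?q=a&b="c"',
    snippet => 'Some <b>hacking</b> text',
});

my $body = build_report(@sample);
print $body ? $body : "nothing new, no mail sent", "\n";
```

Escaping the URL keeps an ampersand or quote in a result link from breaking the href attribute in the emailed report.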


References:
http://freshmeat.net/projects/caldate/
http://www.google.com/apis/
http://search.cpan.org/~yves/MIME-Lite-3.01/lib/MIME/Lite.pm
