# HG changeset patch
# User Matti Hamalainen
# Date 1295203878 -7200
# Node ID 8003090caa3598d45c29a95f64eec3bfb7c94a91
# Parent  0bc9c1115d42465cd86b4d922d043e9bb95995fa
Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point
directly to Youtube.

diff -r 0bc9c1115d42 -r 8003090caa35 urllog.tcl
--- a/urllog.tcl	Sun Jan 16 18:48:11 2011 +0200
+++ b/urllog.tcl	Sun Jan 16 20:51:18 2011 +0200
@@ -1,7 +1,7 @@
 ##########################################################################
 #
 # URLLog v1.99.12 by ccr/TNSP
-# (C) Copyright 2000-2010 Tecnic Software productions (TNSP)
+# (C) Copyright 2000-2011 Tecnic Software productions (TNSP)
 #
 ##########################################################################
 #
@@ -48,20 +48,6 @@
 ###
-### WWW page creation options
-###
-# 1 = Create the www pages, 0 = Don't.
-set urllog_makewww 0
-
-# TIP: If you don't want the bot to create the HTML-file, you can
-# use a simple Perl/Ruby/Python/PHP/whatnot-scripted page to do that.
-
-# Filename AND FULL PATH of the html-file where the www-format log goes
-# (Remember to set the permissions right after the file has been created)
-set urllog_webfile "/home/niinuska/public_html/urllog.html"
-
-
-###
 ### Search related settings
 ###
@@ -161,14 +147,9 @@
 bind topc - *.* urllog_checkmsg
 bind msg - paska urllog_checkmsg2
 
-if {$urllog_makewww != 0} {
-	bind dcc m dowebfile urllog_dowebfile
-	bind time - "*0 % % % %" urllog_timer
-}
-
 
 ### Initialization messages
-set urllog_message "$urllog_name v$urllog_version (C) 2000-2010 ccr/TNSP"
+set urllog_message "$urllog_name v$urllog_version (C) 2000-2011 ccr/TNSP"
 putlog "$urllog_message"
 
 if {$http_proxy != 0} {
@@ -183,10 +164,6 @@
 	putlog " (Verbose mode enabled)"
 }
 
-if {$urllog_makewww != 0} {
-	putlog " (Creating WWW page to $urllog_webfile)"
-}
-
 if {$urllog_search != 0} {
 	putlog " (Search commands enabled)"
 }
@@ -213,18 +190,6 @@
 }
 
 
-proc urllog_dowebfile {hand idx text} {
-global urllog_name urllog_makewww
-
-if {$urllog_makewww == 1} {
-	urllog_log "Making webfiles..."
-	urllog_make_web_file
-} else {
-	urllog_log "Webfile disabled."
-}
-}
-
-
 proc urllog_isnumber {uarg} {
 set ufoo 1
@@ -268,54 +233,6 @@
 
 
 #-------------------------------------------------------------------------
-proc urllog_make_web_file {} {
-global urllog_file urllog_webfile urllog_message botnick
-
-# Starting message
-urllog_log "Creating HTML-file for WWW..."
-
-# Open files
-set fd [open $urllog_webfile w]
-set fd2 [open $urllog_file r]
-
-# HTML headers
-puts $fd "<html><head><title>Caught URLs</title></head>"
-puts $fd "<body>"
-puts $fd ""
-puts $fd "<h1>URLs caught by $botnick</h1><hr>"
-puts $fd ""
-
-# Process database, convert to links & info
-while {![eof $fd2]} {
-	gets $fd2 foo
-	if {$foo != ""} {
-		regsub -all "<|>|\"" $foo "" foo
-		set foo [split $foo " "]
-		puts $fd "<a href=\"[lindex $foo 0]\">[lindex $foo 0]</a><br>Added on [urllog_ctime [lindex $foo 1]] by [lindex $foo 2]<br><br>"
-	}
-}
-
-# HTML footers
-puts $fd "<hr>Generated by $urllog_message<br>"
-puts $fd "(Last updated [urllog_ctime [unixtime]])<br>"
-puts $fd "</body></html>"
-
-# Close files
-close $fd
-close $fd2
-
-# OK-message
-urllog_log "HTML-file generated OK."
-}
-
-
-#-------------------------------------------------------------------------
-proc urllog_timer {umin uhour uday umonth uyear} {
-	urllog_make_web_file
-}
-
-
-#-------------------------------------------------------------------------
 proc urllog_get_short {utime} {
 global urllog_shorturl urllog_shorturl_prefix
 set ustr "ABCDEFGHIJKLNMOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
@@ -342,298 +259,306 @@
 
 #-------------------------------------------------------------------------
 proc urllog_addurl {urlStr urlNick urlHost urlChan urlTitle} {
-global urlmsg_alreadyknown urllog_file urllog_shorturl
+	global urlmsg_alreadyknown urllog_file urllog_shorturl
 
-### Let's check if we already know the URL
-set fd [open $urllog_file a+]
-close $fd
+	### Let's check if we already know the URL
+	set fd [open $urllog_file a+]
+	close $fd
 
-set fd [open $urllog_file r]
-set urlID -1
-while {![eof $fd]} {
-	set qitems [split [gets $fd] " "]
-	set qindex [lindex $qitems 4]
-	if {$qindex != "" && $qindex > $urlID} {
-		set urlID $qindex
+	set fd [open $urllog_file r]
+	set urlID -1
+	while {![eof $fd]} {
+		set qitems [split [gets $fd] " "]
+		set qindex [lindex $qitems 4]
+		if {$qindex != "" && $qindex > $urlID} {
+			set urlID $qindex
+		}
+		if {[lindex $qitems 0] == $urlStr} {
+			urllog_log "URL said by $urlNick ($urlStr) already known"
+			if {$urllog_shorturl != 0} {
+				set qstr "[urllog_get_short $urlID] "
+			} else {
+				set qstr ""
+			}
+			append qstr "([lindex $qitems 2]@[urllog_ctime [lindex $qitems 1]])"
+			if {[string length $urlTitle] > 0} {
+				set qstr "$urlmsg_alreadyknown - '$urlTitle' $qstr"
+			} else {
+				set qstr "$urlmsg_alreadyknown $qstr"
+			}
+			urllog_verb_msg $urlNick $urlChan $qstr
+			return 0
+		}
 	}
-	if {[lindex $qitems 0] == $urlStr} {
-		urllog_log "URL said by $urlNick ($urlStr) already known"
-		if {$urllog_shorturl != 0} {
-			set qstr "[urllog_get_short $urlID] "
-		} else {
-			set qstr ""
-		}
-		append qstr "([lindex $qitems 2]@[urllog_ctime [lindex $qitems 1]])"
-		if {[string length $urlTitle] > 0} {
-			set qstr "$urlmsg_alreadyknown - '$urlTitle' $qstr"
-		} else {
-			set qstr "$urlmsg_alreadyknown $qstr"
-		}
-		urllog_verb_msg $urlNick $urlChan $qstr
-		return 0
-	}
-}
-close $fd
+	close $fd
 
-### OK, the URL was not already known - thus we add it
-incr urlID
-set urlTime [unixtime]
-set fd [open $urllog_file a+]
-puts $fd "$urlStr $urlTime $urlNick ($urlHost) $urlID"
-close $fd
-urllog_log "Added URL ($urlNick@$urlChan): $urlStr"
+	### OK, the URL was not already known - thus we add it
+	incr urlID
+	set urlTime [unixtime]
+	set fd [open $urllog_file a+]
+	puts $fd "$urlStr $urlTime $urlNick ($urlHost) $urlID"
+	close $fd
+	urllog_log "Added URL ($urlNick@$urlChan): $urlStr"
 
-### Let's say something, to confirm that everything went well.
-if {$urllog_shorturl != 0} {
-	set qstr "[urllog_get_short $urlID] "
-} else {
-	set qstr ""
-}
-if {[string length $urlTitle] > 0} {
-	urllog_verb_msg $urlNick $urlChan "'$urlTitle' ([urllog_chop_url $urlStr]) $qstr"
-} else {
-	urllog_verb_msg $urlNick $urlChan "[urllog_chop_url $urlStr] $qstr"
-}
+	### Let's say something, to confirm that everything went well.
+	if {$urllog_shorturl != 0} {
+		set qstr "[urllog_get_short $urlID] "
+	} else {
+		set qstr ""
+	}
+	if {[string length $urlTitle] > 0} {
+		urllog_verb_msg $urlNick $urlChan "'$urlTitle' ([urllog_chop_url $urlStr]) $qstr"
+	} else {
+		urllog_verb_msg $urlNick $urlChan "[urllog_chop_url $urlStr] $qstr"
+	}
 
-return 1
+	return 1
 }
 
 
 #-------------------------------------------------------------------------
-proc urllog_http_handler {utoken utotal ucurrent} {
+proc urllog_http_handler {utoken utotal ucurr} {
 	upvar #0 $utoken state
 
-	# Stop after around 3000 received bytes, typically we would assume
-	# that <head> section is contained in this amount of data.
-	if {$state(currentsize) >= 3000} {
+	# Stop fetching data after 3000 bytes; this should be enough to
+	# contain the head section of an HTML page.
+	if {$ucurr > 3000} {
 		set state(status) "ok"
 	}
 }
 
 
 #-------------------------------------------------------------------------
 proc urllog_checkurl {urlStr urlNick urlHost urlChan} {
-global botnick urllog_html urllog_tlds urllog_check urllog_file
-global urlmsg_nosuchhost urlmsg_ioerror urlmsg_timeout urlmsg_errorgettingdoc
-global urllog_httprep urllog_shorturl_prefix urllog_shorturl urllog_encoding
+	global botnick urllog_html urllog_tlds urllog_check urllog_file
+	global urlmsg_nosuchhost urlmsg_ioerror urlmsg_timeout urlmsg_errorgettingdoc
+	global urllog_httprep urllog_shorturl_prefix urllog_shorturl urllog_encoding
+
+	### Print status to bot's log
+	urllog_log "$urlStr ($urlNick@$urlChan)"
 
-### Some status
-urllog_log "$urlStr ($urlNick@$urlChan)"
-
-### Try to determine the URL protocol component (if it is missing)
-set u_checktld 1
-
-if {[string match "*www.*" $urlStr] && ![string match "http://*" $urlStr] && ![string match "https://*" $urlStr]} {
-	set urlStr "http://$urlStr"
+	### Try to determine the URL protocol component (if it is missing)
+	set u_checktld 1
+	if {[string match "*www.*" $urlStr] && ![string match "http://*" $urlStr] && ![string match "https://*" $urlStr]} {
+		set urlStr "http://$urlStr"
+	} elseif {[string match "*ftp.*" $urlStr] && ![string match "ftp://*" $urlStr]} {
+		set urlStr "ftp://$urlStr"
 	}
-if {[string match "*ftp.*" $urlStr] && ![string match "ftp://*" $urlStr]} {
-	set urlStr "ftp://$urlStr"
+	if {[regexp "(ftp|http|https)://(\[0-9\]{1,3})\\.(\[0-9\]{1,3})\\.(\[0-9\]{1,3})\\.(\[0-9\]{1,3})" $urlStr u_match u_prefix ni1 ni2 ni3 ni4]} {
+		# Check if the IP is on local network
+		if {($ni1 == 127) || ($ni1 == 10) || ($ni1 == 192 && $ni2 == 168) || ($ni1 == 0)} {
+			urllog_log "URL pointing to local or invalid network, ignored ($urlStr)."
+			return 0
+		}
+
+		# Skip TLD check for URLs with IP address
+		set u_checktld 0
 	}
 
-if {[regexp "(ftp|http|https)://(\[0-9\]{1,3})\\.(\[0-9\]{1,3})\\.(\[0-9\]{1,3})\\.(\[0-9\]{1,3})" $urlStr u_match u_prefix ni1 ni2 ni3 ni4]} {
-	# Check if the IP is on local network
-	if {($ni1 == 127) || ($ni1 == 10) || ($ni1 == 192 && $ni2 == 168) || ($ni1 == 0)} {
-		urllog_log "URL pointing to local or invalid network, ignored ($urlStr)."
+	if {$urllog_shorturl != 0 && [string match "*$urllog_shorturl_prefix*" $urlStr]} {
+		urllog_log "Ignoring ShortURL."
 		return 0
-	}
-
-	# Skip TLD check for URLs with IP address
-	set u_checktld 0
 	}
 
-if {$urllog_shorturl != 0 && [string match "*$urllog_shorturl_prefix*" $urlStr]} {
-	urllog_log "Ignoring ShortURL."
-	return 0
-}
+	### Check the PORT (if the ":" is there)
+	set u_record [split $urlStr "/"]
+	set u_hostname [lindex $u_record 2]
+	set u_port [lindex [split $u_hostname ":"] end]
 
-### Check the PORT (if the ":" is there)
-set u_record [split $urlStr "/"]
-set u_hostname [lindex $u_record 2]
-set u_port [lindex [split $u_hostname ":"] end]
-
-if {![urllog_isnumber $u_port] && $u_port != "" && $u_port != $u_hostname} {
-	urllog_log "Broken URL from $urlNick: ($urlStr) illegal port $u_port"
-	return 0
+	if {![urllog_isnumber $u_port] && $u_port != "" && $u_port != $u_hostname} {
+		urllog_log "Broken URL from $urlNick: ($urlStr) illegal port $u_port"
+		return 0
 	}
 
-# Default to port 80 (HTTP)
-if {![urllog_isnumber $u_port]} {
-	set u_port 80
+	# Default to port 80 (HTTP)
+	if {![urllog_isnumber $u_port]} {
+		set u_port 80
+	}
+
+	### Is it a http or ftp url? (FIX ME!)
+	if {[string range $urlStr 0 3] != "http" && [string range $urlStr 0 2] != "ftp"} {
+		urllog_log "Broken URL from $urlNick: ($urlStr) UNSUPPORTED TYPE (not HTTP or FTP)"
+		return 0
 	}
 
-### Is it a http or ftp url? (FIX ME!)
-if {[string range $urlStr 0 3] != "http" && [string range $urlStr 0 2] != "ftp"} {
-	urllog_log "Broken URL from $urlNick: ($urlStr) UNSUPPORTED TYPE (not HTTP or FTP)"
-	return 0
-	}
+	### Check the Top Level Domain (TLD) validity
+	if {$u_checktld != 0} {
+		set u_sane [lindex [split $u_hostname "."] end]
+		set u_tld [lindex [split $u_sane ":"] 0]
+		set u_found 0
 
-### Check the Top Level Domain (TLD) validity
-if {$u_checktld != 0} {
-	set u_sane [lindex [split $u_hostname "."] end]
-	set u_tld [lindex [split $u_sane ":"] 0]
-	set u_found 0
-
-	if {[string length $u_tld] == 2} {
-		# Assume all 2-letter domains to be valid :)
-		set u_found 1
-	} else {
-		# Check our list of known TLDs
-		foreach itld $urllog_tlds {
-			if {[string match $itld $u_tld]} {
-				set u_found 1
+		if {[string length $u_tld] == 2} {
+			# Assume all 2-letter domains to be valid :)
+			set u_found 1
+		} else {
+			# Check our list of known TLDs
+			foreach itld $urllog_tlds {
+				if {[string match $itld $u_tld]} {
+					set u_found 1
+				}
 			}
 		}
-	}
 
-	if {$u_found == 0} {
-		urllog_log "Broken URL from $urlNick: ($urlStr) illegal TLD: $u_tld."
-		return 0
+		if {$u_found == 0} {
+			urllog_log "Broken URL from $urlNick: ($urlStr) illegal TLD: $u_tld."
+			return 0
 	}
 }
 
-set urlStr [string map $urllog_httprep $urlStr]
+	set urlStr [string map $urllog_httprep $urlStr]
+
 
-### Do we perform additional optional checks?
-if {$urllog_check != 0 && [string range $urlStr 0 6] == "http://"} {
-	# Does the document pointed by the URL exist?
-	if {[catch {set utoken [::http::geturl $urlStr -progress urllog_http_handler -blocksize 1024 -timeout 4000]} uerrmsg]} {
+	### Do we perform additional optional checks?
+	if {$urllog_check == 0 || [string range $urlStr 0 6] != "http://"} {
+		# No optional checks, just add the URL
+		urllog_addurl $urlStr $urlNick $urlHost $urlChan ""
+		return 1
+	}
+
+	### Does the document pointed by the URL exist?
+	if {[catch {set utoken [::http::geturl $urlStr -progress urllog_http_handler -blocksize 1024 -timeout 3000]} uerrmsg]} {
 		urllog_verb_msg $urlNick $urlChan "$urlmsg_ioerror ($uerrmsg)"
 		urllog_log "HTTP request failed: $uerrmsg"
 		return 0
-	}
+	}
 
-	upvar #0 $utoken ustate
-
-	if {$ustate(status) == "timeout"} {
+	if {[::http::status $utoken] == "timeout"} {
 		urllog_verb_msg $urlNick $urlChan "$urlmsg_timeout"
 		urllog_log "HTTP request timed out ($urlStr)"
+		return 0
-	}
+	}
 
-	if {$ustate(status) == "error"} {
+	if {[::http::status $utoken] != "ok"} {
 		urllog_verb_msg $urlNick $urlChan "$urlmsg_errorgettingdoc ([::http::error $utoken])"
 		urllog_log "Error in HTTP transaction: [::http::error $utoken] ($urlStr)"
+		return 0
+	}
+
+	# Fixme! Handle redirects!
+	set ucode [::http::ncode $utoken]
+	if {$ucode >= 200 && $ucode <= 309} {
+		set udata [::http::data $utoken]
+		set umatches [regexp -nocase -inline -- "<meta.*?charset=(\[^\"\]*)\"" $udata]
+		set uconvert 0
+		if {[llength $umatches] > 0} {
+			set uencoding [lindex $umatches 1]
+			if {[string length $uencoding] > 3} {
+				set uconvert 1
+			}
 		}
 
-	# FIXME! Handle redirects!
-	if {$ustate(status) == "ok"} {
-	if {[::http::ncode $utoken] >= 200 && [::http::ncode $utoken] <= 309} {
-	set udata $ustate(body)
-	set umatches [regexp -nocase -inline -- "<meta.*?charset=(\[^\"\]*)\"" $udata]
-	set uconvert 0
-	if {[llength $umatches] > 0} {
-	set uencoding [lindex $umatches 1]
-	if {[string length $uencoding] > 3} {
-	set uconvert 1
-	}
+		set umatches [regexp -nocase -inline -- "<title>(.\*\?)</title>" $udata]
+		if {[llength $umatches] > 0} {
+			set urlTitle [lindex $umatches 1]
+			if {$uconvert != 0} {
+				set urlTitle [encoding convertfrom $uencoding $urlTitle]
 			}
-
-	set umatches [regexp -nocase -inline -- "<title>(.\*\?)</title>" $udata]
+			set urlTitle [urllog_convert_ent $urlTitle]
+			regsub -all "(^ *| *$)" $urlTitle "" urlTitle
+		} else {
+			set urlTitle ""
+		}
+
+		# Rasiatube hack
+		if {[string match "*/rasiatube/view*" $urlStr]} {
+			set umatches [regexp -nocase -inline -- "<embed src=\"(.\*\?)\"" $udata]
+			if {[llength $umatches] > 0} {
-	set urlTitle [lindex $umatches 1]
+				set urlStr [lindex $umatches 1]
+				regsub -all "\/v\/" $urlStr "\/watch\?v=" urlStr
-	}
+			}
+			urllog_log "RasiaTube mangler: $urlStr"
+			urllog_verb_msg $urlNick $urlChan "Korjataan haiseva rasiatube-linkki: $urlStr"
+		}
+
+		urllog_addurl $urlStr $urlNick $urlHost $urlChan $urlTitle
+		return 1
+	} else {
+		urllog_verb_msg $urlNick $urlChan "$urlmsg_errorgettingdoc ([::http::code $utoken])"
+		urllog_log "[::http::code $utoken] - $urlStr"
 	}
 
 	::http::cleanup $utoken
-	} else {
-	# No optional checks, just add the URL
-	urllog_addurl $urlStr $urlNick $urlHost $urlChan ""
-	}
 }
 
 
 #-------------------------------------------------------------------------
 proc urllog_checkmsg {nick uhost hand chan text} {
-### Check the nick
-if {$nick == "*"} {
-	urllog_log "urllog_checkmsg: nick was wc, this should not happen."
+	### Check the nick
+	if {$nick == "*"} {
+		urllog_log "urllog_checkmsg: nick was wc, this should not happen."
+ return 0 } -### Do the URL checking -foreach istr [split $text " "] { - if {[string match "*http://*" $istr] || [string match "*ftp://*" $istr] || - [string match "*www.*" $istr] || [string match "*ftp.*" $istr] || - [regexp "(ftp|http|https)://\[0-9\]\{1,3\}\\.\[0-9\]\{1,3\}\\.\[0-9\]\{1,3\}\\.\[0-9\]\{1,3\}" $istr imatch]} { - urllog_checkurl $istr $nick $uhost $chan + ### Do the URL checking + foreach istr [split $text " "] { + if {[string match "*http://*" $istr] || [string match "*ftp://*" $istr] || + [string match "*www.*" $istr] || [string match "*ftp.*" $istr] || + [regexp "(ftp|http|https)://\[0-9\]\{1,3\}\\.\[0-9\]\{1,3\}\\.\[0-9\]\{1,3\}\\.\[0-9\]\{1,3\}" $istr imatch]} { + urllog_checkurl $istr $nick $uhost $chan } } -return 0 + return 0 } #------------------------------------------------------------------------- ### Find from database according to patterns, up to imax results. +proc urllog_urlfind {ipatterns imax} { + global urllog_file -proc urllog_urlfind {ipatterns imax} { -global urllog_file + ### Search the database for pattern + ### Clear the count, open the URL logfile + set iresults {} + set nresults 0 + set fd [open $urllog_file r] -### Search the database for pattern -### Clear the count, open the URL logfile -set iresults {} -set nresults 0 -set fd [open $urllog_file r] + ### Start searching... + while {![eof $fd]} { -### Start searching... -while {![eof $fd]} { + # Get one URL for inspection + gets $fd foo + set irecord [split [string tolower $foo] " "] + set iurl [lindex $irecord 0] + set iname [lindex $irecord 2] - # Get one URL for inspection - gets $fd foo - set irecord [split [string tolower $foo] " "] - set iurl [lindex $irecord 0] - set iname [lindex $irecord 2] + # Match with all given patterns and rules + set imatched 1 + foreach ipattern $ipatterns { + set foob [split [string tolower $ipattern] " "] + set ftoken [lindex $foob 0] + set fparam [lindex $foob 1] + set fmatch [string match $fparam $iurl] + if {$ftoken == "+" && $fmatch == 0} { set imatched 0 } + if {$ftoken == "-" && $fmatch == 1} { set imatched 0 } + if {$ftoken == "%" && [string match $fparam $iname] == 0} { set imatched 0 } + } - # Match with all given patterns and rules - set imatched 1 - foreach ipattern $ipatterns { - set foob [split [string tolower $ipattern] " "] - set ftoken [lindex $foob 0] - set fparam [lindex $foob 1] - set fmatch [string match $fparam $iurl] - - if {$ftoken == "+" && $fmatch == 0} { set imatched 0 } - - if {$ftoken == "-" && $fmatch == 1} { set imatched 0 } - - if {$ftoken == "%" && [string match $fparam $iname] == 0} { set imatched 0 } + # If the patterns matched, add to the results list + if {$imatched == 1 && $foo != ""} { + incr nresults + lappend iresults $foo + } } - # If the all patterns matched, add to the list... 
-	if {$imatched == 1 && $foo != ""} {
-		incr nresults
-		lappend iresults $foo
-	}
-}
 
-# Close file
-close $fd
+	# Close file
+	close $fd
 
-# Take only last imax results
-return [lrange $iresults [expr $nresults-$imax] $nresults]
+	# Take only last imax results
+	return [lrange $iresults [expr $nresults - $imax] $nresults]
 }
 
 
 #-------------------------------------------------------------------------
 ### Parse arguments, find and show the results
 proc urllog_find {unick uhand uchan utext upublic} {
-global botnick urllog_name urllog_version urllog_shorturl
-global urllog_showmax_pub urllog_showmax_priv urlmsg_nomatch
+	global botnick urllog_name urllog_version urllog_shorturl
+	global urllog_showmax_pub urllog_showmax_priv urlmsg_nomatch
 
-### Parse the given command
+	### Parse the given command
 	urllog_log "$unick/$uhand searched URL: $utext"
 
 	set footokens [split $utext " "]
@@ -648,7 +573,7 @@
 	}
 }
 
-### Get the matches from database
+	### Get the matches from database
 
 	if {$upublic == 0} {
 		set iresults [urllog_urlfind $ipatlist $urllog_showmax_priv]
@@ -656,7 +581,7 @@
 		set iresults [urllog_urlfind $ipatlist $urllog_showmax_pub]
 	}
 
-### Show the results
+	### Show the results
 
 	if {$iresults != ""} {
 		set j 0
 		foreach i $iresults {
@@ -677,35 +602,29 @@
 		urllog_msg $upublic $unick $uchan $urlmsg_nomatch
 	}
 
-return 0
+	return 0
 }
 
 
 #-------------------------------------------------------------------------
 ### Finding binded functions
 proc urllog_pub_urlfind {unick uhost uhand uchan utext} {
-
-urllog_find $unick $uhand $uchan $utext 1
-
-return 0
+	urllog_find $unick $uhand $uchan $utext 1
+	return 0
 }
 
 
 proc urllog_msg_urlfind {unick uhost uhand utext} {
-
-urllog_find $unick $uhand "" $utext 0
+	urllog_find $unick $uhand "" $utext 0
+	return 0
+}
 
-return 0
-}
 
 #-------------------------------------------------------------------------
 proc urllog_checkmsg2 {unick uhost uhand utext} {
-
-urllog_checkurl $utext $unick $uhost "#CHANNEL"
-
-return 0
+	urllog_checkurl $utext $unick $uhost "#CHANNEL"
+	return 0
 }
 
-
 # end of script
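
--- Editor's note (illustration, not part of the changeset) ---

The RasiaTube "fixer" added in this patch extracts the embedded video URL
from the fetched page body and rewrites YouTube's embed form (".../v/ID")
into the direct watch form (".../watch?v=ID"). Below is a minimal standalone
sketch of that rewrite, runnable in plain tclsh. The proc name and the sample
page data are hypothetical, and the <embed src=...> pattern is an assumption
about what RasiaTube pages contained at the time:

proc rasiatube_to_youtube {udata} {
	# Grab the first embedded player URL from the page HTML (pattern assumed)
	set umatches [regexp -nocase -inline -- "<embed src=\"(.*?)\"" $udata]
	if {[llength $umatches] > 0} {
		set url [lindex $umatches 1]
		# Rewrite the embed path "/v/VIDEOID" into the watch form "/watch?v=VIDEOID"
		regsub -all "/v/" $url "/watch?v=" url
		return $url
	}
	return ""
}

# Hypothetical page data for demonstration:
set page {<embed src="http://www.youtube.com/v/abcdefghijk" type="application/x-shockwave-flash">}
puts [rasiatube_to_youtube $page]
# prints: http://www.youtube.com/watch?v=abcdefghijk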