annotate urllog.tcl @ 329:50d47bdd4425
urllog: Bump version.
author | Matti Hamalainen <ccr@tnsp.org> |
---|---|
date | Wed, 03 Jun 2015 18:52:34 +0300 |
parents | a5282cdc56e6 |
children | 9dd4d2e3a4ac |
##########################################################################
#
# URLLog v2.4.2 by Matti 'ccr' Hamalainen <ccr@tnsp.org>
# (C) Copyright 2000-2015 Tecnic Software productions (TNSP)
#
# This script is freely distributable under GNU GPL (version 2) license.
#
##########################################################################
#
# URL-logger script for EggDrop IRC robot, utilizing an SQLite3 database.
# This script requires the SQLite TCL extension. Under Debian, you need:
# tcl8.5 libsqlite3-tcl (and eggdrop eggdrop-data, of course)
#
# NOTICE! If you are upgrading to URLLog v2.0+ from any 1.x version, you
# may want to run a conversion script against your URL-database file,
# if you wish to preserve the old data.
#
# See convert_urllog_db.tcl for more information.
#
# If you are doing a fresh install, you will need to create the
# initial SQLite3 database with the required table schemas. You
# can do that by running: create_urllog_db.tcl
#
##########################################################################

### The configuration should be in config.urllog in the same directory
### as this script. Or change the line below to point wherever
### you wish. See "config.urllog.example" for an example config file.
source [file dirname [info script]]/config.urllog

### Requires utillib.tcl
source [file dirname [info script]]/utillib.tcl


##########################################################################
# No need to look below this line
##########################################################################
set urllog_name "URLLog"
set urllog_version "2.4.2"

set urllog_tld_list [split $urllog_tld_list ","]
set urllog_httprep [split "\@|%40|{|%7B|}|%7D|\[|%5B|\]|%5D" "|"]

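The urllog_httprep list built above alternates a literal character with its percent-encoded form, which is the shape that `string map` expects. How the script applies the list lies outside this excerpt, so the following is only a sketch of one plausible use. The names demo_httprep and demo_escape_url are hypothetical.

```tcl
# Same construction as urllog_httprep above, under a hypothetical name.
set demo_httprep [split "\@|%40|{|%7B|}|%7D|\[|%5B|\]|%5D" "|"]

# Percent-encode the characters listed in the map (assumed usage,
# not confirmed by the part of the script shown here).
proc demo_escape_url {url} {
  global demo_httprep
  return [string map $demo_httprep $url]
}

puts [demo_escape_url "http://example.com/a@b\[1\]"]
```

Because the list is split on "|", that character itself cannot appear as a key or a replacement in the map.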

### Require packages
package require sqlite3
package require http

### Binding initializations
bind pub - !urlfind urllog_pub_urlfind
bind msg - !urlfind urllog_msg_urlfind
bind pubm - *.* urllog_check_line
bind topc - *.* urllog_check_line


### Initialization messages
set urllog_message "$urllog_name v$urllog_version (C) 2000-2015 ccr/TNSP"
putlog "$urllog_message"

### Miscellaneous init messages
if {$urllog_extra_checks != 0} {
  putlog " (Additional URL validity checks enabled)"
}

if {$urllog_check_tld != 0} {
  putlog " (Check TLD)"
}

if {$urllog_verbose != 0} {
  putlog " (Verbose mode enabled)"
}

### HTTP module initialization
if {[info exists http_user_agent] && $http_user_agent != ""} {
  ::http::config -useragent $http_user_agent
} else {
  ::http::config -useragent "$urllog_name/$urllog_version"
}

if {[info exists http_use_proxy] && $http_use_proxy != 0} {
  ::http::config -proxyhost $http_proxy_host -proxyport $http_proxy_port
  putlog " (Using proxy $http_proxy_host:$http_proxy_port)"
}

if {[info exists http_tls_support] && $http_tls_support != 0} {
  package require tls
  ::http::register https 443 [list ::tls::socket -request 1 -require 1 -tls1 1 -cadir $http_tls_cadir]
  putlog " (TLS/SSL support enabled)"
}

### SQLite database initialization
if {[catch {sqlite3 urldb $urllog_db_file} uerrmsg]} {
  putlog " Could not open SQLite3 database '$urllog_db_file': $uerrmsg"
  exit 2
}


#-------------------------------------------------------------------------
### Utility functions
proc urllog_log {arg} {
  global urllog_log_enable urllog_name

  if {$urllog_log_enable != 0} {
    putlog "$urllog_name: $arg"
  }
}


proc urllog_ctime {utime} {
  if {$utime == "" || $utime == "*"} {
    set utime 0
  }
  return [clock format $utime -format "%d.%m.%Y %H:%M"]
}


proc urllog_isnumber {uarg} {
  foreach i [split $uarg {}] {
    if {![string match \[0-9\] $i]} { return 0 }
  }
  return 1
}
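Note that urllog_isnumber above returns 1 for an empty string, because the loop body never runs. Whether callers rely on that is not visible in this excerpt. As a sketch of a stricter variant, the hypothetical demo_isnumber below accepts only one or more ASCII digits.

```tcl
# Hypothetical stricter check: at least one character, ASCII digits only.
proc demo_isnumber {uarg} {
  return [regexp {^[0-9]+$} $uarg]
}

foreach sample {123 12a {}} {
  puts "'$sample' -> [demo_isnumber $sample]"
}
```

The anchored pattern replaces the per-character loop and rejects the empty string, which is the only input where the two functions disagree.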


proc urllog_msg {apublic anick achan amsg} {
  global urllog_preferredmsg

  if {$apublic == 1} {
    putserv "$urllog_preferredmsg $achan :$amsg"
  } else {
    putserv "$urllog_preferredmsg $anick :$amsg"
  }
}


proc urllog_verb_msg {anick achan amsg} {
  global urllog_verbose

  if {$urllog_verbose != 0} {
    urllog_msg 1 $anick $achan $amsg
  }
}


proc urllog_sanitize_encoding {uencoding} {
  # Strip a leading locale prefix such as "fi_FI." (the dot is escaped
  # twice so that the regexp sees a literal "." instead of "any character")
  regsub -- "^\[a-z\]\[a-z\]_\[A-Z\]\[A-Z\]\\." $uencoding "" uencoding
  set uencoding [string tolower $uencoding]
  regsub -- "^iso-" $uencoding "iso" uencoding
  return $uencoding
}
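The normalization above turns charset labels as they appear in locales or HTTP headers into the spelling Tcl uses for its encodings. A minimal sketch of how such a name might then be used follows. The procs demo_sanitize_encoding and demo_decode, and the fallback to iso8859-1, are assumptions for illustration and not part of the script.

```tcl
# Restatement of the normalization above under a hypothetical name.
proc demo_sanitize_encoding {name} {
  regsub -- {^[a-z][a-z]_[A-Z][A-Z]\.} $name "" name
  set name [string tolower $name]
  regsub -- {^iso-} $name "iso" name
  return $name
}

# Decode raw bytes with the named charset, falling back when Tcl
# does not know the encoding (the fallback choice is an assumption).
proc demo_decode {charset bytes {fallback "iso8859-1"}} {
  set enc [demo_sanitize_encoding $charset]
  if {[lsearch -exact [encoding names] $enc] < 0} {
    set enc $fallback
  }
  return [encoding convertfrom $enc $bytes]
}

puts [demo_sanitize_encoding "fi_FI.UTF-8"]   ;# utf-8
puts [demo_sanitize_encoding "ISO-8859-1"]    ;# iso8859-1
```

Checking the result against `encoding names` before calling `encoding convertfrom` avoids an error when a page announces a charset that Tcl does not ship.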


#-------------------------------------------------------------------------
set urllog_shorturl_str "ABCDEFGHIJKLNMOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

proc urllog_get_short {utime} {
  global urllog_shorturl_prefix urllog_shorturl_str

  set ulen [string length $urllog_shorturl_str]

  # Split the value into three "digits" in base $ulen
  set u1 [expr {$utime / ($ulen * $ulen)}]
  set utmp [expr {$utime % ($ulen * $ulen)}]
  set u2 [expr {$utmp / $ulen}]
  set u3 [expr {$utmp % $ulen}]

  return "\[ $urllog_shorturl_prefix[string index $urllog_shorturl_str $u1][string index $urllog_shorturl_str $u2][string index $urllog_shorturl_str $u3] \]"
}
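urllog_get_short writes its argument as three symbols from a 62-character alphabet, so it can represent values from 0 to 62^3 - 1 = 238327. For larger values the first index falls outside the string and `string index` yields an empty string. The sketch below restates the encoding and adds an inverse. The names demo_alphabet, demo_short_encode and demo_short_decode are hypothetical, and the alphabet is copied verbatim from urllog_shorturl_str, including its "LNM" ordering.

```tcl
# Alphabet copied from urllog_shorturl_str above.
set demo_alphabet "ABCDEFGHIJKLNMOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

# Encode an integer as three symbols (same arithmetic as urllog_get_short,
# without the prefix and brackets).
proc demo_short_encode {id alphabet} {
  set n [string length $alphabet]
  set u1 [expr {$id / ($n * $n)}]
  set rest [expr {$id % ($n * $n)}]
  set u2 [expr {$rest / $n}]
  set u3 [expr {$rest % $n}]
  return "[string index $alphabet $u1][string index $alphabet $u2][string index $alphabet $u3]"
}

# Inverse mapping: accumulate the symbol positions in base n.
proc demo_short_decode {code alphabet} {
  set n [string length $alphabet]
  set id 0
  foreach ch [split $code {}] {
    set idx [string first $ch $alphabet]
    if {$idx < 0} { error "invalid character '$ch' in short code" }
    set id [expr {$id * $n + $idx}]
  }
  return $id
}

set code [demo_short_encode 12345 $demo_alphabet]
puts "$code -> [demo_short_decode $code $demo_alphabet]"
```

A web front end that resolves short URLs would need the same alphabet string, since decoding depends on each symbol's position in it.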


#-------------------------------------------------------------------------
proc urllog_chop_url {url} {
  global urllog_shorturl_orig

  if {[string length $url] > $urllog_shorturl_orig} {
    return "[string range $url 0 $urllog_shorturl_orig]..."
  } else {
    return $url
  }
}


#-------------------------------------------------------------------------
# Returns 0 if the URL is already in the database (and reports it),
# 1 if it is not known yet.
proc urllog_exists {urlStr urlNick urlHost urlChan} {
  global urldb urlmsg_alreadyknown urllog_shorturl
  global urllog_msg_channels

  set usql "SELECT id AS uid, utime AS utime, url AS uurl, user AS uuser, host AS uhost, chan AS uchan, title AS utitle FROM urls WHERE url='[utl_escape $urlStr]'"
  urldb eval $usql {
    urllog_log "URL said by $urlNick ($urlStr) already known"
    if {$urllog_shorturl != 0} {
      set qstr "[urllog_get_short $uid] "
    } else {
      set qstr ""
    }
    append qstr "($uuser/$uchan@[urllog_ctime $utime])"
    if {[string length $utitle] > 0} {
      set qstr "$urlmsg_alreadyknown - '$utitle' $qstr"
    } else {
      set qstr "$urlmsg_alreadyknown $qstr"
    }

    if {[utl_match_delim_list $urllog_msg_channels $uchan]} {
      urllog_verb_msg $urlNick $urlChan $qstr
    }
    return 0
  }
  return 1
}


#-------------------------------------------------------------------------
proc urllog_addurl {urlStr urlNick urlHost urlChan urlTitle} {
  global urldb urllog_shorturl

  if {$urlTitle == ""} {
    set uins "NULL"
  } else {
    set uins "'[utl_escape $urlTitle]'"
  }
  set usql "INSERT INTO urls (utime,url,user,host,chan,title) VALUES ([unixtime], '[utl_escape $urlStr]', '[utl_escape $urlNick]', '[utl_escape $urlHost]', '[utl_escape $urlChan]', $uins)"
  if {[catch {urldb eval $usql} uerrmsg]} {
    urllog_log "$uerrmsg on SQL:\n$usql"
    return 0
  }
  set uid [urldb last_insert_rowid]
  urllog_log "Added URL ($urlNick@$urlChan): $urlStr"


  ### Let's say something, to confirm that everything went well.
  if {$urllog_shorturl != 0} {
    set qstr "[urllog_get_short $uid] "
  } else {
    set qstr ""
  }
  if {[string length $urlTitle] > 0} {
    urllog_verb_msg $urlNick $urlChan "'$urlTitle' ([urllog_chop_url $urlStr]) $qstr"
  } else {
    urllog_verb_msg $urlNick $urlChan "[urllog_chop_url $urlStr] $qstr"
  }

  return 1
}
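Both urllog_exists and urllog_addurl build their SQL by pasting escaped values into a string. The SQLite Tcl interface can instead bind Tcl variables that are named inside a brace-quoted statement, which removes the need for manual escaping. The sketch below shows that approach under stated assumptions: demo_db and demo_addurl are hypothetical names, the table layout is guessed from the columns used above, and `clock seconds` stands in for Eggdrop's `unixtime`.

```tcl
package require sqlite3

# In-memory database with a table shaped like the columns used above
# (the real schema created by create_urllog_db.tcl is an assumption here).
sqlite3 demo_db :memory:
demo_db eval {
  CREATE TABLE urls (id INTEGER PRIMARY KEY AUTOINCREMENT,
    utime INT, url TEXT, user TEXT, host TEXT, chan TEXT, title TEXT)
}

proc demo_addurl {urlStr urlNick urlHost urlChan urlTitle} {
  set now [clock seconds]
  # Inside the braces Tcl performs no substitution: SQLite binds $now,
  # $urlStr etc. as parameters, so quotes in the values are harmless.
  # NULLIF stores NULL when the title is empty.
  demo_db eval {
    INSERT INTO urls (utime, url, user, host, chan, title)
    VALUES ($now, $urlStr, $urlNick, $urlHost, $urlChan, NULLIF($urlTitle, ''))
  }
  return [demo_db last_insert_rowid]
}

set id [demo_addurl "http://example.com/?q='quoted'" "nick" "host" "#chan" ""]
puts "inserted row $id"
```

The same binding works in a SELECT, for example `demo_db eval {SELECT id FROM urls WHERE url = $urlStr}`.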


#-------------------------------------------------------------------------
proc urllog_dorequest { urlNick urlChan urlStr urlStatus urlSCode urlCode urlData urlMeta } {
  global urlmsg_ioerror urlmsg_timeout urlmsg_errorgettingdoc

  upvar 1 $urlStatus ustatus
  upvar 1 $urlSCode uscode
  upvar 1 $urlCode ucode
  upvar 1 $urlData udata
  upvar 1 $urlMeta umeta

  if {[catch {set utoken [::http::geturl $urlStr -timeout 6000 -binary 1 -headers {Accept-Encoding identity}]} uerrmsg]} {
    urllog_verb_msg $urlNick $urlChan "$urlmsg_ioerror ($uerrmsg)"
    urllog_log "HTTP request failed: $uerrmsg"
    return 0
  }

  set ustatus [::http::status $utoken]
  if {$ustatus == "timeout"} {
    urllog_verb_msg $urlNick $urlChan "$urlmsg_timeout"
    urllog_log "HTTP request timed out ($urlStr)"
    # Release the token's state before bailing out
    ::http::cleanup $utoken
    return 0
  }

  if {$ustatus != "ok"} {
    urllog_verb_msg $urlNick $urlChan "$urlmsg_errorgettingdoc ([::http::error $utoken])"
    urllog_log "Error in HTTP transaction: [::http::error $utoken] ($urlStr)"
    # Release the token's state before bailing out
    ::http::cleanup $utoken
    return 0
  }

  set ustatus [::http::status $utoken]
  set uscode [::http::code $utoken]
  set ucode [::http::ncode $utoken]
  set udata [::http::data $utoken]
  array set umeta [::http::meta $utoken]
  ::http::cleanup $utoken

  return 1
}
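urllog_dorequest reports success through its return value and hands the response back through variables that the caller names, using `upvar 1`. The sketch below isolates that calling convention without any network access. The proc demo_request and the values it fills in are hypothetical stand-ins.

```tcl
# Hypothetical stand-in for urllog_dorequest: the return value signals
# success, results go into variables named by the caller.
proc demo_request {input resCode resMeta} {
  upvar 1 $resCode code
  upvar 1 $resMeta meta

  if {$input eq ""} {
    return 0
  }
  set code 200
  array set meta [list Content-Type "text/html; charset=utf-8"]
  return 1
}

# The caller passes variable names, not values.
if {[demo_request "http://example.com/" ucode umeta]} {
  puts "code=$ucode type=$umeta(Content-Type)"
}
```

Since the metadata is filled in with `array set`, the name passed for it must be unset or already an array in the caller's scope, and entries left over from an earlier call stay in place unless the caller unsets the array first.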
e59f0c3ea0f4
urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents:
250
diff
changeset
|
283 |
#-------------------------------------------------------------------------
proc urllog_validate_url { urlNick urlChan urlMStr urlMProto urlMHostName } {
  global urllog_tld_list urlmsg_nosuchhost urllog_httprep urlmsg_unknown_tld
  global urllog_shorturl_prefix urllog_shorturl urllog_check_tld
  upvar 1 $urlMStr urlStr
  upvar 1 $urlMProto urlProto
  upvar 1 $urlMHostName urlHostName

  ### Try to guess the URL protocol component (if it is missing)
  set u_checktld 1
  if {[string match "*www.*" $urlStr] && ![string match "http://*" $urlStr] && ![string match "https://*" $urlStr]} {
    set urlStr "http://$urlStr"
  } elseif {[string match "*ftp.*" $urlStr] && ![string match "ftp://*" $urlStr]} {
    set urlStr "ftp://$urlStr"
  }

  ### Handle URLs that have an IPv4-address
  if {[regexp "(\[a-z\]+)://(\[0-9\]{1,3})\\.(\[0-9\]{1,3})\\.(\[0-9\]{1,3})\\.(\[0-9\]{1,3})" $urlStr urlMatch urlProto ni1 ni2 ni3 ni4]} {
    # Check if the IP is on local network
    if {$ni1 == 127 || $ni1 == 10 || ($ni1 == 192 && $ni2 == 168) || $ni1 == 0} {
      urllog_log "URL pointing to local or invalid network, ignored ($urlStr)."
      return 0
    }
    # Skip TLD check for URLs with IP address
    set u_checktld 0
  }

  ### Check now if we have a ShortURL here ...
  if {[string match "$urllog_shorturl_prefix*" $urlStr]} {
    urllog_log "Ignoring ShortURL from $urlNick: $urlStr"
    # set uud ""
    # set usql "SELECT id AS uid, url AS uurl, user AS uuser, host AS uhost, chan AS uchan, title AS utitle FROM urls WHERE utime=$uud"
    # urldb eval $usql {
    #   urllog_verb_msg $urlNick $urlChan "'$utitle' - $uurl"
    #   return 1
    # }
    return 0
  }

  ### Get URL protocol component
  set urlProto ""
  regexp "(\[a-z\]+)://" $urlStr urlMatch urlProto

  ### Check the PORT (if the ":" is there)
  set urlRecord [split $urlStr "/"]
  set urlHostName [lindex $urlRecord 2]
  set urlPort [lindex [split $urlHostName ":"] end]

  if {![urllog_isnumber $urlPort] && $urlPort != "" && $urlPort != $urlHostName} {
    urllog_log "Broken URL from $urlNick: ($urlStr) illegal port $urlPort"
    return 0
  }

  ### Is it an HTTP(S) or FTP URL?
  if {$urlProto != "http" && $urlProto != "https" && $urlProto != "ftp"} {
    urllog_log "Broken URL from $urlNick: ($urlStr) UNSUPPORTED protocol class ($urlProto)."
    return 0
  }

  ### Check the Top Level Domain (TLD) validity
  if {$urllog_check_tld != 0 && $u_checktld != 0} {
    set u_sane [lindex [split $urlHostName "."] end]
    set u_tld [lindex [split $u_sane ":"] 0]
    set u_found 0

    if {[string length $u_tld] == 2} {
      # Assume all 2-letter domains to be valid :)
      set u_found 1
    } else {
      # Check our list of known TLDs
      foreach itld $urllog_tld_list {
        if {[string match $itld $u_tld]} {
          set u_found 1
        }
      }
    }

    if {$u_found == 0} {
      urllog_log "Broken URL from $urlNick: ($urlStr) unknown TLD: ${u_tld}."
      urllog_verb_msg $urlNick $urlChan $urlmsg_unknown_tld
      return 0
    }
  }

  set urlStr [string map $urllog_httprep $urlStr]
  return 1
}
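
The local-network test in urllog_validate_url looks only at 127.x, 10.x, 192.168.x and 0.x addresses. As a minimal sketch, assuming one also wanted to treat the remaining RFC 1918 block (172.16.0.0/12) and link-local addresses (169.254.0.0/16) as local, the check could be factored into a helper like the one below. The proc name `urllog_is_local_ipv4` and its return convention are hypothetical and not part of the script.

```tcl
# Hypothetical helper, not part of urllog.tcl.
# Returns 1 if the dotted-quad IPv4 address is loopback, private
# (RFC 1918), link-local or in 0.0.0.0/8; 0 if it looks public;
# -1 if the string is not a valid dotted-quad address.
proc urllog_is_local_ipv4 {addr} {
  if {![regexp {^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$} $addr -> o1 o2 o3 o4]} {
    return -1
  }
  # Re-scan each octet as decimal so that "010" is not treated as octal
  foreach v {o1 o2 o3 o4} {
    scan [set $v] %d $v
    if {[set $v] > 255} { return -1 }
  }
  if {$o1 == 0 || $o1 == 10 || $o1 == 127} { return 1 }
  if {$o1 == 192 && $o2 == 168} { return 1 }
  if {$o1 == 172 && $o2 >= 16 && $o2 <= 31} { return 1 }
  if {$o1 == 169 && $o2 == 254} { return 1 }
  return 0
}
```

Under these assumptions, `urllog_is_local_ipv4 "172.20.0.1"` reports a local address while `urllog_is_local_ipv4 "8.8.8.8"` does not; a caller would skip the URL whenever the result is non-zero.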


#-------------------------------------------------------------------------
proc urllog_check_url {urlStr urlNick urlHost urlChan} {
  global urllog_encoding http_tls_support urlmsg_errorgettingdoc
  global urllog_extra_checks urllog_extra_strict urllog_rasiatube_hack

  ### Does the URL already exist?
  if {![urllog_exists $urlStr $urlNick $urlHost $urlChan]} {
    return 1
  }

  ### Validate URL components, etc.
  set urlProto ""
  set urlHostName ""
  if {![urllog_validate_url $urlNick $urlChan urlStr urlProto urlHostName]} {
    return 1
  }

  ### Do we perform additional checks?
  if {$urllog_extra_checks == 0 || !(($http_tls_support != 0 && $urlProto == "https") || $urlProto == "http")} {
    # No optional checks, or it's not http/https.
    if {$urllog_extra_strict == 0} {
      # Strict checking disabled, so add the URL, if it does not exist already.
      urllog_addurl $urlStr $urlNick $urlHost $urlChan ""
      return 1
    } elseif {$http_tls_support == 0 && $urlProto == "https"} {
      # Strict ENABLED: If TLS support is disabled and we have https, do nothing
      return 1
    } elseif {$urlProto != "http" && $urlProto != "https"} {
      # Strict ENABLED: It's not http, or https
      return 1
    }
  }

  ### Does the document pointed to by the URL exist?
  if {![urllog_dorequest $urlNick $urlChan $urlStr ustatus uscode ucode udata umeta]} {
    return 1
  }

  ### Handle redirects
  if {$ucode >= 301 && $ucode <= 302} {
    set nurlStr $umeta(Location)
    if {![regexp "\[a-z\]+://" $nurlStr]} {
      # Relative redirect: make sure the path begins with a slash
      # before it is joined to the host name
      if {[string range $nurlStr 0 0] != "/"} {
        set nurlStr "/$nurlStr"
      }
      set nurlStr "${urlProto}://${urlHostName}${nurlStr}"
    }
    urllog_log "Redirection: $urlStr -> $nurlStr"
    set urlStr $nurlStr

    if {![urllog_validate_url $urlNick $urlChan urlStr urlProto urlHostName]} {
      return 1
    }

    if {![urllog_dorequest $urlNick $urlChan $urlStr ustatus uscode ucode udata umeta]} {
      return 1
    }
  }

  ### Handle 2nd level redirects
  if {$ucode >= 301 && $ucode <= 302} {
    set nurlStr $umeta(Location)
    if {![regexp "\[a-z\]+://" $nurlStr]} {
      # Relative redirect: make sure the path begins with a slash
      # before it is joined to the host name
      if {[string range $nurlStr 0 0] != "/"} {
        set nurlStr "/$nurlStr"
      }
      set nurlStr "${urlProto}://${urlHostName}${nurlStr}"
    }
    urllog_log "Redirection #2: $urlStr -> $nurlStr"
    set urlStr $nurlStr

    if {![urllog_validate_url $urlNick $urlChan urlStr urlProto urlHostName]} {
      return 1
    }

    if {![urllog_dorequest $urlNick $urlChan $urlStr ustatus uscode ucode udata umeta]} {
      return 1
    }
  }

  # Final document
  if {$ucode >= 200 && $ucode <= 205} {
    set uenc_doc ""
    set uenc_http ""
    set uencoding ""

    # Get information about specified character encodings
    if {[info exists umeta(Content-Type)] && [regexp -nocase {charset\s*=\s*([a-z0-9._-]+)} $umeta(Content-Type) umatches uenc_http]} {
      # Found character set encoding information in HTTP headers
    }

    if {[regexp -nocase -- "<meta.\*\?content=\"text/html.\*\?charset=(\[^\"\]*)\".\*\?/\?>" $udata umatches uenc_doc]} {
      # Found old style HTML meta tag with character set information
    } elseif {[regexp -nocase -- "<meta.\*\?charset=\"(\[^\"\]*)\".\*\?/\?>" $udata umatches uenc_doc]} {
      # Found HTML5 style meta tag with character set information
    }

    # Make sanitized versions of the encoding strings
    set uenc_http2 [urllog_sanitize_encoding $uenc_http]
    set uenc_doc2 [urllog_sanitize_encoding $uenc_doc]

    # Check if the document has specified encoding
    # KLUDGE!
    set uencoding $uenc_http2
    if {$uencoding == "" && $uenc_doc2 != ""} {
      set uencoding $uenc_doc2
    } elseif {$uencoding == ""} {
      # If _NO_ known encoding of any kind, assume the default of iso8859-1
      set uencoding "iso8859-1"
    }

    urllog_log "Charsets: http='$uenc_http', doc='$uenc_doc' / sanitized http='$uenc_http2', doc='$uenc_doc2' -> '$uencoding'"

    # Get the document title, if any
    set urlTitle ""
    set tmpRes [regexp -nocase -- "<title.\*\?>(.\*\?)</title>" $udata umatches urlTitle]

    # If facebook, get meta info
    if {[regexp -nocase -- "(http|https):\/\/www.facebook.com" $urlStr]} {
      if {[regexp -nocase -- "<meta name=\"description\" content=\"(.\*\?)\"" $udata umatches urlTmp]} {
        if {$urlTitle != ""} { append urlTitle " :: " }
        append urlTitle $urlTmp
      }
    }

    # If character set conversion is required, do it now
    if {$urlTitle != "" && $uencoding != ""} {
      if {[catch {set urlTitle [encoding convertfrom $uencoding $urlTitle]} cerrmsg]} {
        urllog_log "Error in charset conversion: $cerrmsg"
      }

      # Convert some HTML entities to plaintext and do some cleanup
      set utmp [utl_convert_html_ent $urlTitle]
      regsub -all "\r|\n|\t" $utmp " " utmp
      # Collapse runs of spaces into a single space
      regsub -all " +" $utmp " " utmp
      set urlTitle [string trim $utmp]
    }

    # Rasiatube hack
    if {$urllog_rasiatube_hack != 0 && [string match "*/rasiatube/view*" $urlStr]} {
      set rasia 0
      if {[regexp -nocase -- "<link rel=\"video_src\"\.\*\?file=(http://\[^&\]+)&" $udata umatches utmp]} {
        regsub -all "\/v\/" $utmp "\/watch\?v=" urlStr
        set rasia 1
      } else {
        if {[regexp -nocase -- "SWFObject.\"(\[^\"\]+)\", *\"flashvideo" $udata umatches utmp]} {
          regsub "http:\/\/www.dailymotion.com\/swf\/" $utmp "http:\/\/www.dailymotion.com\/video\/" urlStr
          set rasia 1
        }
      }
      if {$rasia != 0} {
        urllog_log "RasiaTube mangler: $urlStr"
        # The channel message is in Finnish: "Fixing the stinking rasiatube link"
        urllog_verb_msg $urlNick $urlChan "Korjataan haiseva rasiatube-linkki: $urlStr"
      }
    }

    # Check if the URL already exists, just in case we had some redirects
    if {[urllog_exists $urlStr $urlNick $urlHost $urlChan]} {
      urllog_addurl $urlStr $urlNick $urlHost $urlChan $urlTitle
    }
    return 1
  } else {
    urllog_verb_msg $urlNick $urlChan "$urlmsg_errorgettingdoc ($ucode)"
    urllog_log "Error fetching document: status=$ustatus, code=$ucode, scode=$uscode, url=$urlStr"
  }
}
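
The first- and second-level redirect blocks in urllog_check_url perform the same steps twice. The sketch below shows the same idea as a bounded loop. Everything in it is an assumption made for illustration: the proc names `urllog_resolve_location` and `urllog_follow_redirects`, the `fetchCmd` callback, and its return format (a two-element list of status code and Location value). The script itself uses urllog_dorequest and the umeta array instead. Relative locations are resolved against the host root only, which is a simplification of full relative-URL resolution. The `lassign` command and `{*}` expansion need Tcl 8.5 or newer.

```tcl
# Hypothetical helper: turn a Location header value into an absolute URL.
proc urllog_resolve_location {proto host location} {
  if {[regexp {^[a-z]+://} $location]} {
    return $location
  }
  if {[string index $location 0] ne "/"} {
    set location "/$location"
  }
  return "${proto}://${host}${location}"
}

# Hypothetical driver: follow at most maxDepth 301/302 redirects.
# fetchCmd is a command prefix that takes a URL and returns {code location}.
# Returns a list {code url} describing the last response.
proc urllog_follow_redirects {fetchCmd url {maxDepth 2}} {
  lassign [{*}$fetchCmd $url] code location
  for {set depth 0} {$depth < $maxDepth && $code >= 301 && $code <= 302} {incr depth} {
    # Stop if the current URL is not absolute; nothing to resolve against
    if {![regexp {^([a-z]+)://([^/]+)} $url -> proto host]} { break }
    set url [urllog_resolve_location $proto $host $location]
    lassign [{*}$fetchCmd $url] code location
  }
  return [list $code $url]
}

# Stand-in fetcher for demonstration only; it performs no network access.
proc fake_fetch {url} {
  switch -- $url {
    "http://example.org/a" { return [list 301 "b"] }
    "http://example.org/b" { return [list 302 "https://example.com/final"] }
    default { return [list 200 ""] }
  }
}

puts [urllog_follow_redirects fake_fetch "http://example.org/a"]
```

With the stand-in fetcher, the loop follows two hops and ends on the 200 response for `https://example.com/final`. Raising `maxDepth` is then a one-argument change rather than a third copy of the block.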
539 | |
540 | |
#-------------------------------------------------------------------------
### Scan a channel line for URL-like tokens and pass each to urllog_check_url
proc urllog_check_line {unick uhost uhand uchan utext} {
  global urllog_log_channels

  ### Check the nick
  if {$unick eq "*"} {
    urllog_log "urllog_check_line: Nick was a wildcard (*), this should not happen."
    return 0
  }

  ### Check the channel
  if {[utl_match_delim_list $urllog_log_channels $uchan]} {
    ### Do the URL checking
    foreach str [split $utext " "] {
      # The pattern is braced so that \. stays a literal dot; in a
      # double-quoted string Tcl's backslash substitution would turn it
      # into a bare ".", which matches any character.
      if {[regexp {((ftp|http|https)://[^[:space:]]+|^(www|ftp)\.[^[:space:]]+)} $str ulink]} {
        # The whole token is passed on, not just the matched part.
        urllog_check_url $str $unick $uhost $uchan
      }
    }
  }

  return 0
}
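
The token scan in urllog_check_line can be tried outside Eggdrop. The following is a minimal sketch, assuming a hypothetical helper named find_url_tokens (not part of urllog.tcl) that collects the matching tokens instead of calling urllog_check_url:

```tcl
# Hypothetical helper, for illustration only: returns the whitespace-separated
# tokens of $text that look like URLs, using a braced regular expression so
# that "\." is a literal dot.
proc find_url_tokens {text} {
  set found {}
  foreach str [split $text " "] {
    if {[regexp {((ftp|http|https)://[^[:space:]]+|^(www|ftp)\.[^[:space:]]+)} $str]} {
      lappend found $str
    }
  }
  return $found
}

# "wwwxexample" is rejected because the character after "www" is not a dot.
puts [find_url_tokens "see http://example.com and www.example.org, not wwwxexample"]
```

Trailing punctuation such as the comma after www.example.org stays attached to the token, because the line is split on spaces only.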


#-------------------------------------------------------------------------
### Parse arguments, find and show the results
proc urllog_find {unick uhand uchan utext upublic} {
  global urllog_shorturl urldb
  global urllog_showmax_pub urllog_showmax_priv urlmsg_nomatch

  # The result limits are hard-coded here; the urllog_showmax_* globals
  # declared above are not consulted in this proc.
  if {$upublic == 0} {
    set ulimit 5
  } else {
    set ulimit 3
  }

  ### Parse the given command
  urllog_log "$unick/$uhand searched URL: $utext"

  set ftokens [split $utext " "]
  set fpatlist ""
  foreach ftoken $ftokens {
    # The first character selects the match type, the rest is the pattern
    set fprefix [string range $ftoken 0 0]
    set fpattern [string range $ftoken 1 end]
    set qpattern "'%[utl_escape $fpattern]%'"

    if {$fprefix eq "-"} {
      # Exclusion. Because the two tests are joined with OR, a row is
      # kept whenever either the URL or the title lacks the pattern.
      lappend fpatlist "(url NOT LIKE $qpattern OR title NOT LIKE $qpattern)"
    } elseif {$fprefix eq "%"} {
      # Match against the user who posted the URL
      lappend fpatlist "user LIKE $qpattern"
    } elseif {$fprefix eq "@"} {
      # Recognized but not implemented; the token is ignored
    } elseif {$fprefix eq "+"} {
      lappend fpatlist "(url LIKE $qpattern OR title LIKE $qpattern)"
    } else {
      # No prefix: the whole token is the pattern
      set qpattern "'%[utl_escape $ftoken]%'"
      lappend fpatlist "(url LIKE $qpattern OR title LIKE $qpattern)"
    }
  }

  if {[llength $fpatlist] > 0} {
    set fquery "WHERE [join $fpatlist " AND "]"
  } else {
    set fquery ""
  }

  set iresults 0
  set usql "SELECT id AS uid, utime AS utime, url AS uurl, user AS uuser, host AS uhost FROM urls $fquery ORDER BY utime DESC LIMIT $ulimit"
  urldb eval $usql {
    incr iresults
    set shortURL $uurl
    if {$urllog_shorturl != 0 && $uid ne ""} {
      set shortURL "$shortURL [urllog_get_short $uid]"
    }
    urllog_msg $upublic $unick $uchan "#$iresults: $shortURL ($uuser@[urllog_ctime $utime])"
  }

  if {$iresults == 0} {
    # If no URLs were found
    urllog_msg $upublic $unick $uchan $urlmsg_nomatch
  }

  return 0
}
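
The prefix handling in urllog_find can be exercised without a database. The sketch below is not the script's code: sketch_where and sketch_escape are hypothetical names, and sketch_escape only assumes that utl_escape doubles single quotes, which the section shown here does not confirm.

```tcl
# Hypothetical stand-in for utl_escape (assumption: it doubles single quotes
# so the value can sit inside an SQL string literal).
proc sketch_escape {str} {
  return [string map {' ''} $str]
}

# Hypothetical helper mirroring the token prefixes handled in urllog_find:
# "-" excludes, "%" matches the user, "@" is ignored, "+" or no prefix
# matches URL or title. Returns the WHERE clause, or "" if nothing applies.
proc sketch_where {text} {
  set clauses {}
  foreach token [split $text " "] {
    set prefix [string index $token 0]
    set pat "'%[sketch_escape [string range $token 1 end]]%'"
    switch -exact -- $prefix {
      "-" { lappend clauses "(url NOT LIKE $pat OR title NOT LIKE $pat)" }
      "%" { lappend clauses "user LIKE $pat" }
      "@" { }
      "+" { lappend clauses "(url LIKE $pat OR title LIKE $pat)" }
      default {
        set pat "'%[sketch_escape $token]%'"
        lappend clauses "(url LIKE $pat OR title LIKE $pat)"
      }
    }
  }
  if {[llength $clauses] == 0} { return "" }
  return "WHERE [join $clauses " AND "]"
}

puts [sketch_where "tcl %ccr"]
```

As in the original, the patterns are spliced into the SQL text, so the query is only as safe as the escaping routine. When the number of conditions is fixed, the SQLite Tcl interface can instead bind Tcl variables referenced as $name inside a braced SQL string.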


#-------------------------------------------------------------------------
### Handlers for the bound search commands (public and private message)
proc urllog_pub_urlfind {unick uhost uhand uchan utext} {
  global urllog_search_channels

  # Public searches are answered only on the configured search channels
  if {[utl_match_delim_list $urllog_search_channels $uchan]} {
    return [urllog_find $unick $uhand $uchan $utext 1]
  }
  return 0
}


proc urllog_msg_urlfind {unick uhost uhand utext} {
  # Private searches have no channel, so an empty one is passed
  urllog_find $unick $uhand "" $utext 0
  return 0
}
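
The channel gate in front of the public handler can be sketched as follows. Everything named sketch_* is hypothetical, and the matching rule (a semicolon-separated list of glob patterns, compared case-insensitively) is only an assumption about what utl_match_delim_list does. The bind call uses Eggdrop's "pub" bind type and is skipped when the script runs outside Eggdrop.

```tcl
# Hypothetical stand-in for utl_match_delim_list (assumption: the setting is
# a semicolon-separated list of glob patterns, matched case-insensitively).
proc sketch_match_channels {patterns chan} {
  foreach pat [split $patterns ";"] {
    if {[string match -nocase $pat $chan]} { return 1 }
  }
  return 0
}

# Hypothetical handler taking the arguments Eggdrop passes to "pub" binds:
# nick, user@host, handle, channel and the text after the command.
proc sketch_pub_handler {nick uhost hand chan text} {
  if {![sketch_match_channels "#tcl;#egg*" $chan]} { return 0 }
  puts "search requested by $nick on $chan: $text"
  return 0
}

# Register the command only inside Eggdrop, where "bind" exists.
if {[llength [info commands bind]] > 0} {
  bind pub - !sketchfind sketch_pub_handler
}

# Direct calls for testing outside the bot: the first channel matches
# "#egg*", the second matches nothing and is silently ignored.
sketch_pub_handler someone user@host * "#eggdrop" "tcl"
sketch_pub_handler someone user@host * "#other" "tcl"
```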

# end of script