annotate urllog.tcl @ 464:506977ea9d0c

urllog: Improve URL validation.
author Matti Hamalainen <ccr@tnsp.org>
date Wed, 07 Feb 2018 19:01:36 +0200
parents cfbe6acc1d73
children cbad3fa706fe
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1 ##########################################################################
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
2 #
425
e5810c52d376 Bump some copyright years and versions.
Matti Hamalainen <ccr@tnsp.org>
parents: 424
diff changeset
3 # URLLog v2.4.3 by Matti 'ccr' Hamalainen <ccr@tnsp.org>
e5810c52d376 Bump some copyright years and versions.
Matti Hamalainen <ccr@tnsp.org>
parents: 424
diff changeset
4 # (C) Copyright 2000-2017 Tecnic Software productions (TNSP)
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
5 #
113
077c7383f36f urllog: Add line about the script's license.
Matti Hamalainen <ccr@tnsp.org>
parents: 112
diff changeset
6 # This script is freely distributable under GNU GPL (version 2) license.
077c7383f36f urllog: Add line about the script's license.
Matti Hamalainen <ccr@tnsp.org>
parents: 112
diff changeset
7 #
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
8 ##########################################################################
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
9 #
458
cfbe6acc1d73 urllog: tcl-tls 1.7.x (tested with 1.7.13) is now a requirement. It is
Matti Hamalainen <ccr@tnsp.org>
parents: 457
diff changeset
10 # NOTICE! NOTICE! This script REQUIRES tcl-tls 1.7.13+ if you wish to
cfbe6acc1d73 urllog: tcl-tls 1.7.x (tested with 1.7.13) is now a requirement. It is
Matti Hamalainen <ccr@tnsp.org>
parents: 457
diff changeset
11 # support SSL/TLS https for URL checking. And you probably do.
cfbe6acc1d73 urllog: tcl-tls 1.7.x (tested with 1.7.13) is now a requirement. It is
Matti Hamalainen <ccr@tnsp.org>
parents: 457
diff changeset
12 #
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
13 # URL-logger script for EggDrop IRC robot, utilizing SQLite3 database
81
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
14 # This script requires SQLite TCL extension. Under Debian, you need:
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
15 # tcl8.5 libsqlite3-tcl (and eggdrop eggdrop-data, of course)
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
16 #
81
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
17 # NOTICE! If you are upgrading to URLLog v2.0+ from any 1.x version, you
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
18 # may want to run a conversion script against your URL-database file,
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
19 # if you wish to preserve the old data.
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
20 #
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
21 # See convert_urllog_db.tcl for more information.
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
22 #
81
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
23 # If you are doing a fresh install, you will need to create the
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
24 # initial SQLite3 database with the required table schemas. You
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
25 # can do that by running: create_urllog_db.tcl
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
26 #
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
27 ##########################################################################
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
28
263
f01d60175c44 urllog: Move configuration to external file.
Matti Hamalainen <ccr@tnsp.org>
parents: 260
diff changeset
29 ### The configuration should be in config.urllog in same directory
f01d60175c44 urllog: Move configuration to external file.
Matti Hamalainen <ccr@tnsp.org>
parents: 260
diff changeset
30 ### as this script. Or change the line below to point where ever
f01d60175c44 urllog: Move configuration to external file.
Matti Hamalainen <ccr@tnsp.org>
parents: 260
diff changeset
31 ### you wish. See "config.urllog.example" for an example config file.
f01d60175c44 urllog: Move configuration to external file.
Matti Hamalainen <ccr@tnsp.org>
parents: 260
diff changeset
32 source [file dirname [info script]]/config.urllog
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
33
291
54d34d086b47 urllog: Use the utility lib for entity conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 289
diff changeset
34 ### Required utillib.tcl
54d34d086b47 urllog: Use the utility lib for entity conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 289
diff changeset
35 source [file dirname [info script]]/utillib.tcl
54d34d086b47 urllog: Use the utility lib for entity conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 289
diff changeset
36
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
37
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
38 ##########################################################################
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
39 # No need to look below this line
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
40 ##########################################################################
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
41 set urllog_name "URLLog"
425
e5810c52d376 Bump some copyright years and versions.
Matti Hamalainen <ccr@tnsp.org>
parents: 424
diff changeset
42 set urllog_version "2.4.3"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
43
300
2a9ee3f68225 urllog: Make TLD check configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 299
diff changeset
44 set urllog_tld_list [split $urllog_tld_list ","]
424
825cac46b1cb Cosmetic / stray trailing whitespace cleanup.
Matti Hamalainen <ccr@tnsp.org>
parents: 422
diff changeset
45 set urllog_httprep [split "\@|%40|{|%7B|}|%7D|\[|%5B|\]|%5D" "|"]
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
46
102
5425dc418505 urllog: Entity data is now in UTF-8, but TCL source files are interpreted with current system locale, which may not be UTF-8. We must therefore "convert" the entity mapping string to UTF-8 to be certain of TCL's interpretation of its encoding.
Matti Hamalainen <ccr@tnsp.org>
parents: 101
diff changeset
47
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
48 ### Require packages
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
49 package require sqlite3
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
50 package require http
7
50b52294e93e urllog: Strip &rlm; entities from titles; Some work on SSL/https support.
Matti Hamalainen <ccr@tnsp.org>
parents: 4
diff changeset
51
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
52 ### Binding initializations
219
4e09bcc48851 urllog: Add settings for specifying channels where URL logging is active, and where !urlfind functionality works (separately, if so desired.)
Matti Hamalainen <ccr@tnsp.org>
parents: 218
diff changeset
53 bind pub - !urlfind urllog_pub_urlfind
4e09bcc48851 urllog: Add settings for specifying channels where URL logging is active, and where !urlfind functionality works (separately, if so desired.)
Matti Hamalainen <ccr@tnsp.org>
parents: 218
diff changeset
54 bind msg - !urlfind urllog_msg_urlfind
249
d98876dd9ee1 urllog: Rename a function.
Matti Hamalainen <ccr@tnsp.org>
parents: 241
diff changeset
55 bind pubm - *.* urllog_check_line
d98876dd9ee1 urllog: Rename a function.
Matti Hamalainen <ccr@tnsp.org>
parents: 241
diff changeset
56 bind topc - *.* urllog_check_line
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
57
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
58
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
59 ### Initialization messages
425
e5810c52d376 Bump some copyright years and versions.
Matti Hamalainen <ccr@tnsp.org>
parents: 424
diff changeset
60 set urllog_message "$urllog_name v$urllog_version (C) 2000-2017 ccr/TNSP"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
61 putlog "$urllog_message"
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
62
289
5067843cee3d urllog: Move some initialization messages around.
Matti Hamalainen <ccr@tnsp.org>
parents: 269
diff changeset
63 ### Miscellaneous init messages
5067843cee3d urllog: Move some initialization messages around.
Matti Hamalainen <ccr@tnsp.org>
parents: 269
diff changeset
64 if {$urllog_extra_checks != 0} {
5067843cee3d urllog: Move some initialization messages around.
Matti Hamalainen <ccr@tnsp.org>
parents: 269
diff changeset
65 putlog " (Additional URL validity checks enabled)"
5067843cee3d urllog: Move some initialization messages around.
Matti Hamalainen <ccr@tnsp.org>
parents: 269
diff changeset
66 }
5067843cee3d urllog: Move some initialization messages around.
Matti Hamalainen <ccr@tnsp.org>
parents: 269
diff changeset
67
300
2a9ee3f68225 urllog: Make TLD check configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 299
diff changeset
68 if {$urllog_check_tld != 0} {
2a9ee3f68225 urllog: Make TLD check configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 299
diff changeset
69 putlog " (Check TLD)"
2a9ee3f68225 urllog: Make TLD check configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 299
diff changeset
70 }
2a9ee3f68225 urllog: Make TLD check configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 299
diff changeset
71
289
5067843cee3d urllog: Move some initialization messages around.
Matti Hamalainen <ccr@tnsp.org>
parents: 269
diff changeset
72 if {$urllog_verbose != 0} {
5067843cee3d urllog: Move some initialization messages around.
Matti Hamalainen <ccr@tnsp.org>
parents: 269
diff changeset
73 putlog " (Verbose mode enabled)"
5067843cee3d urllog: Move some initialization messages around.
Matti Hamalainen <ccr@tnsp.org>
parents: 269
diff changeset
74 }
5067843cee3d urllog: Move some initialization messages around.
Matti Hamalainen <ccr@tnsp.org>
parents: 269
diff changeset
75
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
76 ### HTTP module initialization
269
d62280f2a9c7 urllog: Make user agent string configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 267
diff changeset
77 if {[info exists http_user_agent] && $http_user_agent != ""} {
d62280f2a9c7 urllog: Make user agent string configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 267
diff changeset
78 ::http::config -useragent $http_user_agent
d62280f2a9c7 urllog: Make user agent string configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 267
diff changeset
79 } else {
d62280f2a9c7 urllog: Make user agent string configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 267
diff changeset
80 ::http::config -useragent "$urllog_name/$urllog_version"
d62280f2a9c7 urllog: Make user agent string configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 267
diff changeset
81 }
d62280f2a9c7 urllog: Make user agent string configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 267
diff changeset
82
267
da239a953e24 urllog: Change some setting names, etc.
Matti Hamalainen <ccr@tnsp.org>
parents: 264
diff changeset
83 if {[info exists http_use_proxy] && $http_use_proxy != 0} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
84 ::http::config -proxyhost $http_proxy_host -proxyport $http_proxy_port
267
da239a953e24 urllog: Change some setting names, etc.
Matti Hamalainen <ccr@tnsp.org>
parents: 264
diff changeset
85 putlog " (Using proxy $http_proxy_host:$http_proxy_port)"
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
86 }
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
87
267
da239a953e24 urllog: Change some setting names, etc.
Matti Hamalainen <ccr@tnsp.org>
parents: 264
diff changeset
88 if {[info exists http_tls_support] && $http_tls_support != 0} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
89 package require tls
458
cfbe6acc1d73 urllog: tcl-tls 1.7.x (tested with 1.7.13) is now a requirement. It is
Matti Hamalainen <ccr@tnsp.org>
parents: 457
diff changeset
90 ::http::register https 443 [list ::tls::socket -request true -require true -ssl2 false -ssl3 false -tls1 true -tls1.1 true -tls1.2 true -cadir $http_tls_cadir -autoservername true]
267
da239a953e24 urllog: Change some setting names, etc.
Matti Hamalainen <ccr@tnsp.org>
parents: 264
diff changeset
91 putlog " (TLS/SSL support enabled)"
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
92 }
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
93
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
94 ### SQLite database initialization
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
95 if {[catch {sqlite3 urldb $urllog_db_file} uerrmsg]} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
96 putlog " Could not open SQLite3 database '$urllog_db_file': $uerrmsg"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
97 exit 2
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
98 }
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
99
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
100
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
101 #-------------------------------------------------------------------------
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
102 ### Utility functions
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
103 proc urllog_log {arg} {
267
da239a953e24 urllog: Change some setting names, etc.
Matti Hamalainen <ccr@tnsp.org>
parents: 264
diff changeset
104 global urllog_log_enable urllog_name
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
105
267
da239a953e24 urllog: Change some setting names, etc.
Matti Hamalainen <ccr@tnsp.org>
parents: 264
diff changeset
106 if {$urllog_log_enable != 0} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
107 putlog "$urllog_name: $arg"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
108 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
109 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
110
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
111
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
112 proc urllog_isnumber {uarg} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
113 foreach i [split $uarg {}] {
65
31c8c4f50aa6 urllog: Improve urllog_isnumber function.
Matti Hamalainen <ccr@tnsp.org>
parents: 62
diff changeset
114 if {![string match \[0-9\] $i]} { return 0 }
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
115 }
65
31c8c4f50aa6 urllog: Improve urllog_isnumber function.
Matti Hamalainen <ccr@tnsp.org>
parents: 62
diff changeset
116 return 1
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
117 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
118
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
119
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
120 proc urllog_msg {apublic anick achan amsg} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
121 global urllog_preferredmsg
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
122
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
123 if {$apublic == 1} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
124 putserv "$urllog_preferredmsg $achan :$amsg"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
125 } else {
424
825cac46b1cb Cosmetic / stray trailing whitespace cleanup.
Matti Hamalainen <ccr@tnsp.org>
parents: 422
diff changeset
126 putserv "$urllog_preferredmsg $anick :$amsg"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
127 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
128 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
129
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
130
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
131 proc urllog_verb_msg {anick achan amsg} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
132 global urllog_verbose
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
133
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
134 if {$urllog_verbose != 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
135 urllog_msg 1 $anick $achan $amsg
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
136 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
137 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
138
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
139
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
140 proc urllog_sanitize_encoding {uencoding} {
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
141 regsub -- "^\[a-z\]\[a-z\]_\[A-Z\]\[A-Z\]\." $uencoding "" uencoding
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
142 set uencoding [string tolower $uencoding]
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
143 regsub -- "^iso-" $uencoding "iso" uencoding
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
144 return $uencoding
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
145 }
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
146
294
2bb9bcfb104a Cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 291
diff changeset
147
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
148 #-------------------------------------------------------------------------
150
52350ed97775 urllog: Cleanups, rename/move some global variables.
Matti Hamalainen <ccr@tnsp.org>
parents: 136
diff changeset
149 set urllog_shorturl_str "ABCDEFGHIJKLNMOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
150
150
52350ed97775 urllog: Cleanups, rename/move some global variables.
Matti Hamalainen <ccr@tnsp.org>
parents: 136
diff changeset
151 proc urllog_get_short {utime} {
52350ed97775 urllog: Cleanups, rename/move some global variables.
Matti Hamalainen <ccr@tnsp.org>
parents: 136
diff changeset
152 global urllog_shorturl_prefix urllog_shorturl_str
52350ed97775 urllog: Cleanups, rename/move some global variables.
Matti Hamalainen <ccr@tnsp.org>
parents: 136
diff changeset
153
52350ed97775 urllog: Cleanups, rename/move some global variables.
Matti Hamalainen <ccr@tnsp.org>
parents: 136
diff changeset
154 set ulen [string length $urllog_shorturl_str]
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
155
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
156 set u1 [expr $utime / ($ulen * $ulen)]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
157 set utmp [expr $utime % ($ulen * $ulen)]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
158 set u2 [expr $utmp / $ulen]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
159 set u3 [expr $utmp % $ulen]
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
160
150
52350ed97775 urllog: Cleanups, rename/move some global variables.
Matti Hamalainen <ccr@tnsp.org>
parents: 136
diff changeset
161 return "\[ $urllog_shorturl_prefix[string index $urllog_shorturl_str $u1][string index $urllog_shorturl_str $u2][string index $urllog_shorturl_str $u3] \]"
52350ed97775 urllog: Cleanups, rename/move some global variables.
Matti Hamalainen <ccr@tnsp.org>
parents: 136
diff changeset
162 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
163
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
164
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
165 #-------------------------------------------------------------------------
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
166 proc urllog_chop_url {url} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
167 global urllog_shorturl_orig
68
3762c621d1c3 urllog: Cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 65
diff changeset
168
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
169 if {[string length $url] > $urllog_shorturl_orig} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
170 return "[string range $url 0 $urllog_shorturl_orig]..."
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
171 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
172 return $url
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
173 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
174 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
175
241
669842725e2f Cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 240
diff changeset
176
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
177 #-------------------------------------------------------------------------
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
178 proc urllog_exists {urlStr urlNick urlHost urlChan} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
179 global urldb urlmsg_alreadyknown urllog_shorturl
315
7a987b22a817 urllog: Add new configuration option urllog_msg_channels.
Matti Hamalainen <ccr@tnsp.org>
parents: 313
diff changeset
180 global urllog_msg_channels
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
181
295
141bb4a2b76f utillib: utl_escape (which will be deprecated soon).
Matti Hamalainen <ccr@tnsp.org>
parents: 294
diff changeset
182 set usql "SELECT id AS uid, utime AS utime, url AS uurl, user AS uuser, host AS uhost, chan AS uchan, title AS utitle FROM urls WHERE url='[utl_escape $urlStr]'"
297
ecd465aab52e urllog: 10L.
Matti Hamalainen <ccr@tnsp.org>
parents: 295
diff changeset
183 urldb eval $usql {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
184 urllog_log "URL said by $urlNick ($urlStr) already known"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
185 if {$urllog_shorturl != 0} {
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
186 set qstr "[urllog_get_short $uid] "
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
187 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
188 set qstr ""
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
189 }
422
880a07485275 Add utl_ctime() to utillib and use it elsewhere.
Matti Hamalainen <ccr@tnsp.org>
parents: 372
diff changeset
190 append qstr "($uuser/$uchan@[utl_ctime $utime])"
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
191 if {[string length $utitle] > 0} {
311
adc519c72f53 urllog: Various cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 306
diff changeset
192 set qstr "$urlmsg_alreadyknown - '$utitle' $qstr"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
193 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
194 set qstr "$urlmsg_alreadyknown $qstr"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
195 }
424
825cac46b1cb Cosmetic / stray trailing whitespace cleanup.
Matti Hamalainen <ccr@tnsp.org>
parents: 422
diff changeset
196
315
7a987b22a817 urllog: Add new configuration option urllog_msg_channels.
Matti Hamalainen <ccr@tnsp.org>
parents: 313
diff changeset
197 if {[utl_match_delim_list $urllog_msg_channels $uchan]} {
7a987b22a817 urllog: Add new configuration option urllog_msg_channels.
Matti Hamalainen <ccr@tnsp.org>
parents: 313
diff changeset
198 urllog_verb_msg $urlNick $urlChan $qstr
7a987b22a817 urllog: Add new configuration option urllog_msg_channels.
Matti Hamalainen <ccr@tnsp.org>
parents: 313
diff changeset
199 }
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
200 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
201 }
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
202 return 1
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
203 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
204
18
1e2232135354 More changes for SQLite support.
Matti Hamalainen <ccr@tnsp.org>
parents: 13
diff changeset
205
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
206 #-------------------------------------------------------------------------
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
207 proc urllog_addurl {urlStr urlNick urlHost urlChan urlTitle} {
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
208 global urldb urllog_shorturl
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
209
93
4e02c0219afe urllog: Insert NULL into title column when we didn't get a title.
Matti Hamalainen <ccr@tnsp.org>
parents: 92
diff changeset
210 if {$urlTitle == ""} {
4e02c0219afe urllog: Insert NULL into title column when we didn't get a title.
Matti Hamalainen <ccr@tnsp.org>
parents: 92
diff changeset
211 set uins "NULL"
4e02c0219afe urllog: Insert NULL into title column when we didn't get a title.
Matti Hamalainen <ccr@tnsp.org>
parents: 92
diff changeset
212 } else {
295
141bb4a2b76f utillib: utl_escape (which will be deprecated soon).
Matti Hamalainen <ccr@tnsp.org>
parents: 294
diff changeset
213 set uins "'[utl_escape $urlTitle]'"
93
4e02c0219afe urllog: Insert NULL into title column when we didn't get a title.
Matti Hamalainen <ccr@tnsp.org>
parents: 92
diff changeset
214 }
295
141bb4a2b76f utillib: utl_escape (which will be deprecated soon).
Matti Hamalainen <ccr@tnsp.org>
parents: 294
diff changeset
215 set usql "INSERT INTO urls (utime,url,user,host,chan,title) VALUES ([unixtime], '[utl_escape $urlStr]', '[utl_escape $urlNick]', '[utl_escape $urlHost]', '[utl_escape $urlChan]', $uins)"
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
216 if {[catch {urldb eval $usql} uerrmsg]} {
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
217 urllog_log "$uerrmsg on SQL:\n$usql"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
218 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
219 }
82
1bbc79f41a1c urllog: Rename few variables for clarity.
Matti Hamalainen <ccr@tnsp.org>
parents: 81
diff changeset
220 set uid [urldb last_insert_rowid]
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
221 urllog_log "Added URL ($urlNick@$urlChan): $urlStr"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
222
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
223
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
224 ### Let's say something, to confirm that everything went well.
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
225 if {$urllog_shorturl != 0} {
82
1bbc79f41a1c urllog: Rename few variables for clarity.
Matti Hamalainen <ccr@tnsp.org>
parents: 81
diff changeset
226 set qstr "[urllog_get_short $uid] "
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
227 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
228 set qstr ""
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
229 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
230 if {[string length $urlTitle] > 0} {
311
adc519c72f53 urllog: Various cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 306
diff changeset
231 urllog_verb_msg $urlNick $urlChan "'$urlTitle' ([urllog_chop_url $urlStr]) $qstr"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
232 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
233 urllog_verb_msg $urlNick $urlChan "[urllog_chop_url $urlStr] $qstr"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
234 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
235
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
236 return 1
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
237 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
238
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
239
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
240 #-------------------------------------------------------------------------
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
241 proc urllog_dorequest { urlNick urlChan urlStr urlStatus urlSCode urlCode urlData urlMeta } {
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
242 global urlmsg_ioerror urlmsg_timeout urlmsg_errorgettingdoc
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
243
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
244 upvar 1 $urlStatus ustatus
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
245 upvar 1 $urlSCode uscode
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
246 upvar 1 $urlCode ucode
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
247 upvar 1 $urlData udata
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
248 upvar 1 $urlMeta umeta
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
249
456
102dc89488af urllog: Improve how http headers are formed.
Matti Hamalainen <ccr@tnsp.org>
parents: 425
diff changeset
250 set urlHeaders {}
102dc89488af urllog: Improve how http headers are formed.
Matti Hamalainen <ccr@tnsp.org>
parents: 425
diff changeset
251 lappend urlHeaders "Accept-Encoding" "identity"
457
a7029d65796b urllog: Do not use Connection: keep-alive for production.
Matti Hamalainen <ccr@tnsp.org>
parents: 456
diff changeset
252 # lappend urlHeaders "Connection" "keep-alive"
456
102dc89488af urllog: Improve how http headers are formed.
Matti Hamalainen <ccr@tnsp.org>
parents: 425
diff changeset
253
102dc89488af urllog: Improve how http headers are formed.
Matti Hamalainen <ccr@tnsp.org>
parents: 425
diff changeset
254 if {[catch {set utoken [::http::geturl $urlStr -timeout 6000 -binary 1 -headers $urlHeaders]} uerrmsg]} {
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
255 urllog_verb_msg $urlNick $urlChan "$urlmsg_ioerror ($uerrmsg)"
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
256 urllog_log "HTTP request failed: $uerrmsg"
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
257 return 0
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
258 }
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
259
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
260 set ustatus [::http::status $utoken]
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
261 if {$ustatus == "timeout"} {
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
262 urllog_verb_msg $urlNick $urlChan "$urlmsg_timeout"
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
263 urllog_log "HTTP request timed out ($urlStr)"
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
264 return 0
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
265 }
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
266
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
267 if {$ustatus != "ok"} {
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
268 urllog_verb_msg $urlNick $urlChan "$urlmsg_errorgettingdoc ([::http::error $utoken])"
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
269 urllog_log "Error in HTTP transaction: [::http::error $utoken] ($urlStr)"
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
270 return 0
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
271 }
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
272
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
273 set ustatus [::http::status $utoken]
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
274 set uscode [::http::code $utoken]
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
275 set ucode [::http::ncode $utoken]
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
276 set udata [::http::data $utoken]
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
277 array set umeta [::http::meta $utoken]
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
278 ::http::cleanup $utoken
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
279
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
280 return 1
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
281 }
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
282
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
283 #-------------------------------------------------------------------------
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
284 proc urllog_validate_url { urlNick urlChan urlMStr urlMProto urlMHostName } {
302
f487cc166714 urllog: Add message for unknown TLDs.
Matti Hamalainen <ccr@tnsp.org>
parents: 300
diff changeset
285 global urllog_tld_list urlmsg_nosuchhost urllog_httprep urlmsg_unknown_tld
300
2a9ee3f68225 urllog: Make TLD check configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 299
diff changeset
286 global urllog_shorturl_prefix urllog_shorturl urllog_check_tld
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
287 upvar 1 $urlMStr urlStr
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
288 upvar 1 $urlMProto urlProto
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
289 upvar 1 $urlMHostName urlHostName
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
290
96
e5a6c27be365 urllog: Comments and cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 95
diff changeset
291 ### Try to guess the URL protocol component (if it is missing)
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
292 set u_checktld 1
464
506977ea9d0c urllog: Improve URL validation.
Matti Hamalainen <ccr@tnsp.org>
parents: 458
diff changeset
293 if {![string match "http://*" $urlStr] && ![string match "https://*" $urlStr] && ![string match "ftp://*" $urlStr] && ![string match "*://*" $urlStr]} {
372
9dd4d2e3a4ac urllog: Fix missing http(s)/ftp:// protocol prefix guessing.
Matti Hamalainen <ccr@tnsp.org>
parents: 329
diff changeset
294 if {[string match "*www.*" $urlStr]} {
9dd4d2e3a4ac urllog: Fix missing http(s)/ftp:// protocol prefix guessing.
Matti Hamalainen <ccr@tnsp.org>
parents: 329
diff changeset
295 set urlStr "http://$urlStr"
9dd4d2e3a4ac urllog: Fix missing http(s)/ftp:// protocol prefix guessing.
Matti Hamalainen <ccr@tnsp.org>
parents: 329
diff changeset
296 } elseif {[string match "*ftp.*" $urlStr]} {
9dd4d2e3a4ac urllog: Fix missing http(s)/ftp:// protocol prefix guessing.
Matti Hamalainen <ccr@tnsp.org>
parents: 329
diff changeset
297 set urlStr "ftp://$urlStr"
9dd4d2e3a4ac urllog: Fix missing http(s)/ftp:// protocol prefix guessing.
Matti Hamalainen <ccr@tnsp.org>
parents: 329
diff changeset
298 }
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
299 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
300
95
687bdd74dfac urllog: Check if TLS support is enabled when checking if we can fetch title information via HTTP or SSL/HTTP.
Matti Hamalainen <ccr@tnsp.org>
parents: 93
diff changeset
301 ### Handle URLs that have an IPv4-address
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
302 if {[regexp "(\[a-z\]+)://(\[0-9\]{1,3})\\.(\[0-9\]{1,3})\\.(\[0-9\]{1,3})\\.(\[0-9\]{1,3})" $urlStr urlMatch urlProto ni1 ni2 ni3 ni4]} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
303 # Check if the IP is on local network
92
f6f4595856ff urllog: Cosmetics. Remove useless parenthesis.
Matti Hamalainen <ccr@tnsp.org>
parents: 91
diff changeset
304 if {$ni1 == 127 || $ni1 == 10 || ($ni1 == 192 && $ni2 == 168) || $ni1 == 0} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
305 urllog_log "URL pointing to local or invalid network, ignored ($urlStr)."
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
306 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
307 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
308 # Skip TLD check for URLs with IP address
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
309 set u_checktld 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
310 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
311
96
e5a6c27be365 urllog: Comments and cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 95
diff changeset
312 ### Check now if we have an ShortURL here ...
150
52350ed97775 urllog: Cleanups, rename/move some global variables.
Matti Hamalainen <ccr@tnsp.org>
parents: 136
diff changeset
313 if {[string match "$urllog_shorturl_prefix*" $urlStr]} {
98
fbbe7ee40e2f urllog: Improve one informational / error message.
Matti Hamalainen <ccr@tnsp.org>
parents: 97
diff changeset
314 urllog_log "Ignoring ShortURL from $urlNick: $urlStr"
252
eb2fce89b8ab urllog: Comment out some currently unused code.
Matti Hamalainen <ccr@tnsp.org>
parents: 251
diff changeset
315 # set uud ""
eb2fce89b8ab urllog: Comment out some currently unused code.
Matti Hamalainen <ccr@tnsp.org>
parents: 251
diff changeset
316 # set usql "SELECT id AS uid, url AS uurl, user AS uuser, host AS uhost, chan AS uchan, title AS utitle FROM urls WHERE utime=$uud"
eb2fce89b8ab urllog: Comment out some currently unused code.
Matti Hamalainen <ccr@tnsp.org>
parents: 251
diff changeset
317 # urldb eval $usql {
eb2fce89b8ab urllog: Comment out some currently unused code.
Matti Hamalainen <ccr@tnsp.org>
parents: 251
diff changeset
318 # urllog_verb_msg $urlNick $urlChan "'$utitle' - $uurl"
eb2fce89b8ab urllog: Comment out some currently unused code.
Matti Hamalainen <ccr@tnsp.org>
parents: 251
diff changeset
319 # return 1
eb2fce89b8ab urllog: Comment out some currently unused code.
Matti Hamalainen <ccr@tnsp.org>
parents: 251
diff changeset
320 # }
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
321 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
322 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
323
95
687bdd74dfac urllog: Check if TLS support is enabled when checking if we can fetch title information via HTTP or SSL/HTTP.
Matti Hamalainen <ccr@tnsp.org>
parents: 93
diff changeset
324 ### Get URL protocol component
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
325 set urlProto ""
464
506977ea9d0c urllog: Improve URL validation.
Matti Hamalainen <ccr@tnsp.org>
parents: 458
diff changeset
326 if {[regexp "(\[a-z\]+)://" $urlStr urlMatch urlProto]} {
506977ea9d0c urllog: Improve URL validation.
Matti Hamalainen <ccr@tnsp.org>
parents: 458
diff changeset
327 ### Is it a http or ftp url?
506977ea9d0c urllog: Improve URL validation.
Matti Hamalainen <ccr@tnsp.org>
parents: 458
diff changeset
328 if {$urlProto != "http" && $urlProto != "https" && $urlProto != "ftp"} {
506977ea9d0c urllog: Improve URL validation.
Matti Hamalainen <ccr@tnsp.org>
parents: 458
diff changeset
329 urllog_log "Broken URL from $urlNick: ($urlStr) UNSUPPORTED protocol class ($urlProto)."
506977ea9d0c urllog: Improve URL validation.
Matti Hamalainen <ccr@tnsp.org>
parents: 458
diff changeset
330 return 0
506977ea9d0c urllog: Improve URL validation.
Matti Hamalainen <ccr@tnsp.org>
parents: 458
diff changeset
331 }
506977ea9d0c urllog: Improve URL validation.
Matti Hamalainen <ccr@tnsp.org>
parents: 458
diff changeset
332 } else {
506977ea9d0c urllog: Improve URL validation.
Matti Hamalainen <ccr@tnsp.org>
parents: 458
diff changeset
333 urllog_log "Broken URL from $urlNick: ($urlStr), no protocol specifier."
506977ea9d0c urllog: Improve URL validation.
Matti Hamalainen <ccr@tnsp.org>
parents: 458
diff changeset
334 return 0
506977ea9d0c urllog: Improve URL validation.
Matti Hamalainen <ccr@tnsp.org>
parents: 458
diff changeset
335 }
95
687bdd74dfac urllog: Check if TLS support is enabled when checking if we can fetch title information via HTTP or SSL/HTTP.
Matti Hamalainen <ccr@tnsp.org>
parents: 93
diff changeset
336
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
337 ### Check the PORT (if the ":" is there)
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
338 set urlRecord [split $urlStr "/"]
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
339 set urlHostName [lindex $urlRecord 2]
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
340 set urlPort [lindex [split $urlHostName ":"] end]
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
341
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
342 if {![urllog_isnumber $urlPort] && $urlPort != "" && $urlPort != $urlHostName} {
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
343 urllog_log "Broken URL from $urlNick: ($urlStr) illegal port $urlPort"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
344 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
345 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
346
298
67e43d6f5c41 urllog: Fix comment.
Matti Hamalainen <ccr@tnsp.org>
parents: 297
diff changeset
347 ### Is it a http or ftp url?
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
348 if {$urlProto != "http" && $urlProto != "https" && $urlProto != "ftp"} {
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
349 urllog_log "Broken URL from $urlNick: ($urlStr) UNSUPPORTED protocol class ($urlProto)."
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
350 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
351 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
352
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
353 ### Check the Top Level Domain (TLD) validity
300
2a9ee3f68225 urllog: Make TLD check configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 299
diff changeset
354 if {$urllog_check_tld != 0 && $u_checktld != 0} {
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
355 set u_sane [lindex [split $urlHostName "."] end]
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
356 set u_tld [lindex [split $u_sane ":"] 0]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
357 set u_found 0
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
358
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
359 if {[string length $u_tld] == 2} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
360 # Assume all 2-letter domains to be valid :)
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
361 set u_found 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
362 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
363 # Check our list of known TLDs
300
2a9ee3f68225 urllog: Make TLD check configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 299
diff changeset
364 foreach itld $urllog_tld_list {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
365 if {[string match $itld $u_tld]} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
366 set u_found 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
367 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
368 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
369 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
370
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
371 if {$u_found == 0} {
302
f487cc166714 urllog: Add message for unknown TLDs.
Matti Hamalainen <ccr@tnsp.org>
parents: 300
diff changeset
372 urllog_log "Broken URL from $urlNick: ($urlStr) unknown TLD: ${u_tld}."
f487cc166714 urllog: Add message for unknown TLDs.
Matti Hamalainen <ccr@tnsp.org>
parents: 300
diff changeset
373 urllog_verb_msg $urlNick $urlChan $urlmsg_unknown_tld
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
374 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
375 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
376 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
377
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
378 set urlStr [string map $urllog_httprep $urlStr]
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
379 return 1
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
380 }
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
381
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
382
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
383 #-------------------------------------------------------------------------
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
384 proc urllog_check_url {urlStr urlNick urlHost urlChan} {
299
1ff281e821a3 urllog: Make rasiatube hack configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 298
diff changeset
385 global urllog_encoding http_tls_support urlmsg_errorgettingdoc
304
f1589fe20732 urllog: Added urllog_extra_strict option.
Matti Hamalainen <ccr@tnsp.org>
parents: 302
diff changeset
386 global urllog_extra_checks urllog_extra_strict urllog_rasiatube_hack
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
387
91
6f4bfd8e9447 urllog: Reorder code and make it simpler by removing duplicate checks.
Matti Hamalainen <ccr@tnsp.org>
parents: 90
diff changeset
388 ### Does the URL already exist?
6f4bfd8e9447 urllog: Reorder code and make it simpler by removing duplicate checks.
Matti Hamalainen <ccr@tnsp.org>
parents: 90
diff changeset
389 if {![urllog_exists $urlStr $urlNick $urlHost $urlChan]} {
6f4bfd8e9447 urllog: Reorder code and make it simpler by removing duplicate checks.
Matti Hamalainen <ccr@tnsp.org>
parents: 90
diff changeset
390 return 1
6f4bfd8e9447 urllog: Reorder code and make it simpler by removing duplicate checks.
Matti Hamalainen <ccr@tnsp.org>
parents: 90
diff changeset
391 }
424
825cac46b1cb Cosmetic / stray trailing whitespace cleanup.
Matti Hamalainen <ccr@tnsp.org>
parents: 422
diff changeset
392
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
393 ### Validate URL compoments, etc.
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
394 set urlProto ""
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
395 set urlHostName ""
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
396 if {![urllog_validate_url $urlNick $urlChan urlStr urlProto urlHostName]} {
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
397 return 1
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
398 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
399
267
da239a953e24 urllog: Change some setting names, etc.
Matti Hamalainen <ccr@tnsp.org>
parents: 264
diff changeset
400 ### Do we perform additional checks?
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
401 if {$urllog_extra_checks == 0 || !(($http_tls_support != 0 && $urlProto == "https") || $urlProto == "http")} {
230
3e3756b113a1 urllog: Cleanup.
Matti Hamalainen <ccr@tnsp.org>
parents: 229
diff changeset
402 # No optional checks, or it's not http/https.
306
9858b93387a2 urllog: 100L.
Matti Hamalainen <ccr@tnsp.org>
parents: 304
diff changeset
403 if {$urllog_extra_strict == 0} {
304
f1589fe20732 urllog: Added urllog_extra_strict option.
Matti Hamalainen <ccr@tnsp.org>
parents: 302
diff changeset
404 # Strict checking disabled, so add the URL, if it does not exist already.
f1589fe20732 urllog: Added urllog_extra_strict option.
Matti Hamalainen <ccr@tnsp.org>
parents: 302
diff changeset
405 urllog_addurl $urlStr $urlNick $urlHost $urlChan ""
f1589fe20732 urllog: Added urllog_extra_strict option.
Matti Hamalainen <ccr@tnsp.org>
parents: 302
diff changeset
406 return 1
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
407 } elseif {$http_tls_support == 0 && $urlProto == "https"} {
304
f1589fe20732 urllog: Added urllog_extra_strict option.
Matti Hamalainen <ccr@tnsp.org>
parents: 302
diff changeset
408 # Strict ENABLED: If TLS support is disabled and we have https, do nothing
f1589fe20732 urllog: Added urllog_extra_strict option.
Matti Hamalainen <ccr@tnsp.org>
parents: 302
diff changeset
409 return 1
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
410 } elseif {$urlProto != "http" && $urlProto != "https"} {
304
f1589fe20732 urllog: Added urllog_extra_strict option.
Matti Hamalainen <ccr@tnsp.org>
parents: 302
diff changeset
411 # Strict ENABLED: It's not http, or https
f1589fe20732 urllog: Added urllog_extra_strict option.
Matti Hamalainen <ccr@tnsp.org>
parents: 302
diff changeset
412 return 1
f1589fe20732 urllog: Added urllog_extra_strict option.
Matti Hamalainen <ccr@tnsp.org>
parents: 302
diff changeset
413 }
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
414 }
7
50b52294e93e urllog: Strip &rlm; entities from titles; Some work on SSL/https support.
Matti Hamalainen <ccr@tnsp.org>
parents: 4
diff changeset
415
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
416 ### Does the document pointed by the URL exist?
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
417 if {![urllog_dorequest $urlNick $urlChan $urlStr ustatus uscode ucode udata umeta]} {
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
418 return 1
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
419 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
420
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
421 ### Handle redirects
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
422 if {$ucode >= 301 && $ucode <= 302} {
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
423 set nurlStr $umeta(Location)
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
424 if {![regexp "\[a-z\]+://" $nurlStr]} {
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
425 if {[string range $nurlStr 0 0] != "/"} {
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
426 append nurlStr "/"
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
427 }
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
428 set nurlStr "${urlProto}://${urlHostName}${nurlStr}"
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
429 }
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
430 urllog_log "Redirection: $urlStr -> $nurlStr"
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
431 set urlStr $nurlStr
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
432
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
433 if {![urllog_validate_url $urlNick $urlChan urlStr urlProto urlHostName]} {
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
434 return 1
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
435 }
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
436
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
437 if {![urllog_dorequest $urlNick $urlChan $urlStr ustatus uscode ucode udata umeta]} {
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
438 return 1
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
439 }
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
440 }
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
441
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
442 ### Handle 2nd level redirects
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
443 if {$ucode >= 301 && $ucode <= 302} {
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
444 set nurlStr $umeta(Location)
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
445 if {![regexp "\[a-z\]+://" $nurlStr]} {
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
446 if {[string range $nurlStr 0 0] != "/"} {
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
447 append nurlStr "/"
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
448 }
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
449 set nurlStr "${urlProto}://${urlHostName}${nurlStr}"
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
450 }
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
451 urllog_log "Redirection #2: $urlStr -> $nurlStr"
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
452 set urlStr $nurlStr
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
453
327
a5282cdc56e6 urllog: Fix redirection handling for HTTP 1.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 319
diff changeset
454 if {![urllog_validate_url $urlNick $urlChan urlStr urlProto urlHostName]} {
251
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
455 return 1
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
456 }
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
457
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
458 if {![urllog_dorequest $urlNick $urlChan $urlStr ustatus uscode ucode udata umeta]} {
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
459 return 1
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
460 }
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
461 }
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
462
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
463 # Final document
e59f0c3ea0f4 urllog: Handle first and second level redirects.
Matti Hamalainen <ccr@tnsp.org>
parents: 250
diff changeset
464 if {$ucode >= 200 && $ucode <= 205} {
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
465 set uenc_doc ""
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
466 set uenc_http ""
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
467 set uencoding ""
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
468
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
469 # Get information about specified character encodings
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
470 if {[info exists umeta(Content-Type)] && [regexp -nocase {charset\s*=\s*([a-z0-9._-]+)} $umeta(Content-Type) umatches uenc_http]} {
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
471 # Found character set encoding information in HTTP headers
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
472 }
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
473
150
52350ed97775 urllog: Cleanups, rename/move some global variables.
Matti Hamalainen <ccr@tnsp.org>
parents: 136
diff changeset
474 if {[regexp -nocase -- "<meta.\*\?content=\"text/html.\*\?charset=(\[^\"\]*)\".\*\?/\?>" $udata umatches uenc_doc]} {
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
475 # Found old style HTML meta tag with character set information
150
52350ed97775 urllog: Cleanups, rename/move some global variables.
Matti Hamalainen <ccr@tnsp.org>
parents: 136
diff changeset
476 } elseif {[regexp -nocase -- "<meta.\*\?charset=\"(\[^\"\]*)\".\*\?/\?>" $udata umatches uenc_doc]} {
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
477 # Found HTML5 style meta tag with character set information
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
478 }
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
479
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
480 # Make sanitized versions of the encoding strings
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
481 set uenc_http2 [urllog_sanitize_encoding $uenc_http]
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
482 set uenc_doc2 [urllog_sanitize_encoding $uenc_doc]
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
483
311
adc519c72f53 urllog: Various cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 306
diff changeset
484 # Check if the document has specified encoding
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
485 # KLUDGE!
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
486 set uencoding $uenc_http2
318
5d886e2137d5 urllog: Fix character set conversion a bit.
Matti Hamalainen <ccr@tnsp.org>
parents: 315
diff changeset
487 if {$uencoding == "" && $uenc_doc2 != ""} {
5d886e2137d5 urllog: Fix character set conversion a bit.
Matti Hamalainen <ccr@tnsp.org>
parents: 315
diff changeset
488 set uencoding $uenc_doc2
5d886e2137d5 urllog: Fix character set conversion a bit.
Matti Hamalainen <ccr@tnsp.org>
parents: 315
diff changeset
489 } elseif {$uencoding == ""} {
424
825cac46b1cb Cosmetic / stray trailing whitespace cleanup.
Matti Hamalainen <ccr@tnsp.org>
parents: 422
diff changeset
490 # If _NO_ known encoding of any kind, assume the default of iso8859-1
86
4c2b6482c08c urllog: Different strategy for charset encoding conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 84
diff changeset
491 set uencoding "iso8859-1"
4c2b6482c08c urllog: Different strategy for charset encoding conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 84
diff changeset
492 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
493
311
adc519c72f53 urllog: Various cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 306
diff changeset
494 urllog_log "Charsets: http='$uenc_http', doc='$uenc_doc' / sanitized http='$uenc_http2', doc='$uenc_doc2' -> '$uencoding'"
adc519c72f53 urllog: Various cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 306
diff changeset
495
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
496 # Get the document title, if any
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
497 set urlTitle ""
313
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
498 set tmpRes [regexp -nocase -- "<title.\*\?>(.\*\?)</title>" $udata umatches urlTitle]
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
499
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
500 # If facebook, get meta info
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
501 if {[regexp -nocase -- "(http|https):\/\/www.facebook.com" $urlStr]} {
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
502 if {[regexp -nocase -- "<meta name=\"description\" content=\"(.\*\?)\"" $udata umatches urlTmp]} {
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
503 if {$urlTitle != ""} { append urlTitle " :: " }
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
504 append urlTitle $urlTmp
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
505 }
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
506 }
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
507
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
508 # If character set conversion is required, do it now
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
509 if {$urlTitle != "" && $uencoding != ""} {
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
510 if {[catch {set urlTitle [encoding convertfrom $uencoding $urlTitle]} cerrmsg]} {
8175ef52889b urllog: Improve URL title functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 312
diff changeset
511 urllog_log "Error in charset conversion: $cerrmsg"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
512 }
150
52350ed97775 urllog: Cleanups, rename/move some global variables.
Matti Hamalainen <ccr@tnsp.org>
parents: 136
diff changeset
513
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
514 # Convert some HTML entities to plaintext and do some cleanup
291
54d34d086b47 urllog: Use the utility lib for entity conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 289
diff changeset
515 set utmp [utl_convert_html_ent $urlTitle]
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
516 regsub -all "\r|\n|\t" $utmp " " utmp
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
517 regsub -all " *" $utmp " " utmp
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
518 set urlTitle [string trim $utmp]
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
519 }
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
520
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
521 # Rasiatube hack
299
1ff281e821a3 urllog: Make rasiatube hack configurable.
Matti Hamalainen <ccr@tnsp.org>
parents: 298
diff changeset
522 if {$urllog_rasiatube_hack != 0 && [string match "*/rasiatube/view*" $urlStr]} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
523 set rasia 0
118
e5f2961a6145 urllog: Improve rasiatube URL de-mangling.
Matti Hamalainen <ccr@tnsp.org>
parents: 117
diff changeset
524 if {[regexp -nocase -- "<link rel=\"video_src\"\.\*\?file=(http://\[^&\]+)&" $udata umatches utmp]} {
e5f2961a6145 urllog: Improve rasiatube URL de-mangling.
Matti Hamalainen <ccr@tnsp.org>
parents: 117
diff changeset
525 regsub -all "\/v\/" $utmp "\/watch\?v=" urlStr
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
526 set rasia 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
527 } else {
118
e5f2961a6145 urllog: Improve rasiatube URL de-mangling.
Matti Hamalainen <ccr@tnsp.org>
parents: 117
diff changeset
528 if {[regexp -nocase -- "SWFObject.\"(\[^\"\]+)\", *\"flashvideo" $udata umatches utmp]} {
e5f2961a6145 urllog: Improve rasiatube URL de-mangling.
Matti Hamalainen <ccr@tnsp.org>
parents: 117
diff changeset
529 regsub "http:\/\/www.dailymotion.com\/swf\/" $utmp "http:\/\/www.dailymotion.com\/video\/" urlStr
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
530 set rasia 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
531 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
532 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
533 if {$rasia != 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
534 urllog_log "RasiaTube mangler: $urlStr"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
535 urllog_verb_msg $urlNick $urlChan "Korjataan haiseva rasiatube-linkki: $urlStr"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
536 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
537 }
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
538
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
539 # Check if the URL already exists, just in case we had some redirects
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
540 if {[urllog_exists $urlStr $urlNick $urlHost $urlChan]} {
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
541 urllog_addurl $urlStr $urlNick $urlHost $urlChan $urlTitle
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
542 }
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
543 return 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
544 } else {
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
545 urllog_verb_msg $urlNick $urlChan "$urlmsg_errorgettingdoc ($ucode)"
224
aaf433ab696a urllog: Improve error messages a bit.
Matti Hamalainen <ccr@tnsp.org>
parents: 223
diff changeset
546 urllog_log "Error fetching document: status=$ustatus, code=$ucode, scode=$uscode, url=$urlStr"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
547 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
548 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
549
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
550
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
551 #-------------------------------------------------------------------------
219
4e09bcc48851 urllog: Add settings for specifying channels where URL logging is active, and where !urlfind functionality works (separately, if so desired.)
Matti Hamalainen <ccr@tnsp.org>
parents: 218
diff changeset
552
4e09bcc48851 urllog: Add settings for specifying channels where URL logging is active, and where !urlfind functionality works (separately, if so desired.)
Matti Hamalainen <ccr@tnsp.org>
parents: 218
diff changeset
553
249
d98876dd9ee1 urllog: Rename a function.
Matti Hamalainen <ccr@tnsp.org>
parents: 241
diff changeset
554 proc urllog_check_line {unick uhost uhand uchan utext} {
219
4e09bcc48851 urllog: Add settings for specifying channels where URL logging is active, and where !urlfind functionality works (separately, if so desired.)
Matti Hamalainen <ccr@tnsp.org>
parents: 218
diff changeset
555 global urllog_log_channels
4e09bcc48851 urllog: Add settings for specifying channels where URL logging is active, and where !urlfind functionality works (separately, if so desired.)
Matti Hamalainen <ccr@tnsp.org>
parents: 218
diff changeset
556
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
557 ### Check the nick
87
97c56d1e9ce2 urllog: Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 86
diff changeset
558 if {$unick == "*"} {
249
d98876dd9ee1 urllog: Rename a function.
Matti Hamalainen <ccr@tnsp.org>
parents: 241
diff changeset
559 urllog_log "urllog_check_line: Nick was wc, this should not happen."
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
560 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
561 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
562
219
4e09bcc48851 urllog: Add settings for specifying channels where URL logging is active, and where !urlfind functionality works (separately, if so desired.)
Matti Hamalainen <ccr@tnsp.org>
parents: 218
diff changeset
563 ### Check the channel
315
7a987b22a817 urllog: Add new configuration option urllog_msg_channels.
Matti Hamalainen <ccr@tnsp.org>
parents: 313
diff changeset
564 if {[utl_match_delim_list $urllog_log_channels $uchan]} {
7a987b22a817 urllog: Add new configuration option urllog_msg_channels.
Matti Hamalainen <ccr@tnsp.org>
parents: 313
diff changeset
565 ### Do the URL checking
7a987b22a817 urllog: Add new configuration option urllog_msg_channels.
Matti Hamalainen <ccr@tnsp.org>
parents: 313
diff changeset
566 foreach str [split $utext " "] {
464
506977ea9d0c urllog: Improve URL validation.
Matti Hamalainen <ccr@tnsp.org>
parents: 458
diff changeset
567 if {[regexp "(\[a-z]+://\[^\[:space:\]\]+|^(www|ftp)\.\[^\[:space:\]\]+)" $str ulink]} {
315
7a987b22a817 urllog: Add new configuration option urllog_msg_channels.
Matti Hamalainen <ccr@tnsp.org>
parents: 313
diff changeset
568 urllog_check_url $str $unick $uhost $uchan
219
4e09bcc48851 urllog: Add settings for specifying channels where URL logging is active, and where !urlfind functionality works (separately, if so desired.)
Matti Hamalainen <ccr@tnsp.org>
parents: 218
diff changeset
569 }
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
570 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
571 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
572
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
573 return 0
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
574 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
575
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
576
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
577 #-------------------------------------------------------------------------
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
578 ### Parse arguments, find and show the results
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
579 proc urllog_find {unick uhand uchan utext upublic} {
62
6428b1bcb34b urllog: Remove some global variable references where they are not used.
Matti Hamalainen <ccr@tnsp.org>
parents: 50
diff changeset
580 global urllog_shorturl urldb
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
581 global urllog_showmax_pub urllog_showmax_priv urlmsg_nomatch
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
582
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
583 if {$upublic == 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
584 set ulimit 5
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
585 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
586 set ulimit 3
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
587 }
19
9cf22053e5da Repair !urlfind functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 18
diff changeset
588
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
589 ### Parse the given command
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
590 urllog_log "$unick/$uhand searched URL: $utext"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
591
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
592 set ftokens [split $utext " "]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
593 set fpatlist ""
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
594 foreach ftoken $ftokens {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
595 set fprefix [string range $ftoken 0 0]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
596 set fpattern [string range $ftoken 1 end]
295
141bb4a2b76f utillib: utl_escape (which will be deprecated soon).
Matti Hamalainen <ccr@tnsp.org>
parents: 294
diff changeset
597 set qpattern "'%[utl_escape $fpattern]%'"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
598
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
599 if {$fprefix == "-"} {
128
0d21b9d1d2b9 urllog: Improve search functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 127
diff changeset
600 lappend fpatlist "(url NOT LIKE $qpattern OR title NOT LIKE $qpattern)"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
601 } elseif {$fprefix == "%"} {
128
0d21b9d1d2b9 urllog: Improve search functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 127
diff changeset
602 lappend fpatlist "user LIKE $qpattern"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
603 } elseif {$fprefix == "@"} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
604 # foo
112
fae3dd7a8b20 urllog: Oops, a typo in variable name. Fixed.
Matti Hamalainen <ccr@tnsp.org>
parents: 111
diff changeset
605 } elseif {$fprefix == "+"} {
128
0d21b9d1d2b9 urllog: Improve search functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 127
diff changeset
606 lappend fpatlist "(url LIKE $qpattern OR title LIKE $qpattern)"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
607 } else {
295
141bb4a2b76f utillib: utl_escape (which will be deprecated soon).
Matti Hamalainen <ccr@tnsp.org>
parents: 294
diff changeset
608 set qpattern "'%[utl_escape $ftoken]%'"
128
0d21b9d1d2b9 urllog: Improve search functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 127
diff changeset
609 lappend fpatlist "(url LIKE $qpattern OR title LIKE $qpattern)"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
610 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
611 }
19
9cf22053e5da Repair !urlfind functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 18
diff changeset
612
27
6e381916b016 Some fixes in the query mechanisms of QuoteDB and URLLog.
Matti Hamalainen <ccr@tnsp.org>
parents: 20
diff changeset
613 if {[llength $fpatlist] > 0} {
6e381916b016 Some fixes in the query mechanisms of QuoteDB and URLLog.
Matti Hamalainen <ccr@tnsp.org>
parents: 20
diff changeset
614 set fquery "WHERE [join $fpatlist " AND "]"
6e381916b016 Some fixes in the query mechanisms of QuoteDB and URLLog.
Matti Hamalainen <ccr@tnsp.org>
parents: 20
diff changeset
615 } else {
6e381916b016 Some fixes in the query mechanisms of QuoteDB and URLLog.
Matti Hamalainen <ccr@tnsp.org>
parents: 20
diff changeset
616 set fquery ""
6e381916b016 Some fixes in the query mechanisms of QuoteDB and URLLog.
Matti Hamalainen <ccr@tnsp.org>
parents: 20
diff changeset
617 }
68
3762c621d1c3 urllog: Cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 65
diff changeset
618
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
619 set iresults 0
82
1bbc79f41a1c urllog: Rename few variables for clarity.
Matti Hamalainen <ccr@tnsp.org>
parents: 81
diff changeset
620 set usql "SELECT id AS uid, utime AS utime, url AS uurl, user AS uuser, host AS uhost FROM urls $fquery ORDER BY utime DESC LIMIT $ulimit"
68
3762c621d1c3 urllog: Cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 65
diff changeset
621 urldb eval $usql {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
622 incr iresults
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
623 set shortURL $uurl
82
1bbc79f41a1c urllog: Rename few variables for clarity.
Matti Hamalainen <ccr@tnsp.org>
parents: 81
diff changeset
624 if {$urllog_shorturl != 0 && $uid != ""} {
1bbc79f41a1c urllog: Rename few variables for clarity.
Matti Hamalainen <ccr@tnsp.org>
parents: 81
diff changeset
625 set shortURL "$shortURL [urllog_get_short $uid]"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
626 }
422
880a07485275 Add utl_ctime() to utillib and use it elsewhere.
Matti Hamalainen <ccr@tnsp.org>
parents: 372
diff changeset
627 urllog_msg $upublic $unick $uchan "#$iresults: $shortURL ($uuser@[utl_ctime $utime])"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
628 }
424
825cac46b1cb Cosmetic / stray trailing whitespace cleanup.
Matti Hamalainen <ccr@tnsp.org>
parents: 422
diff changeset
629
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
630 if {$iresults == 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
631 # If no URLs were found
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
632 urllog_msg $upublic $unick $uchan $urlmsg_nomatch
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
633 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
634
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
635 return 0
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
636 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
637
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
638
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
639 #-------------------------------------------------------------------------
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
640 ### Finding binded functions
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
641 proc urllog_pub_urlfind {unick uhost uhand uchan utext} {
219
4e09bcc48851 urllog: Add settings for specifying channels where URL logging is active, and where !urlfind functionality works (separately, if so desired.)
Matti Hamalainen <ccr@tnsp.org>
parents: 218
diff changeset
642 global urllog_search_channels
4e09bcc48851 urllog: Add settings for specifying channels where URL logging is active, and where !urlfind functionality works (separately, if so desired.)
Matti Hamalainen <ccr@tnsp.org>
parents: 218
diff changeset
643
315
7a987b22a817 urllog: Add new configuration option urllog_msg_channels.
Matti Hamalainen <ccr@tnsp.org>
parents: 313
diff changeset
644 if {[utl_match_delim_list $urllog_search_channels $uchan]} {
7a987b22a817 urllog: Add new configuration option urllog_msg_channels.
Matti Hamalainen <ccr@tnsp.org>
parents: 313
diff changeset
645 return [urllog_find $unick $uhand $uchan $utext 1]
219
4e09bcc48851 urllog: Add settings for specifying channels where URL logging is active, and where !urlfind functionality works (separately, if so desired.)
Matti Hamalainen <ccr@tnsp.org>
parents: 218
diff changeset
646 }
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
647 return 0
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
648 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
649
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
650
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
651 proc urllog_msg_urlfind {unick uhost uhand utext} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
652 urllog_find $unick $uhand "" $utext 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
653 return 0
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
654 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
655
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
656 # end of script