annotate urllog.tcl @ 117:d40a0f3af7ab

urllog: Add some entities for conversion.
author Matti Hamalainen <ccr@tnsp.org>
date Thu, 13 Oct 2011 20:13:31 +0300
parents 4f3edcf72987
children e5f2961a6145
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
1 ##########################################################################
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
2 #
114
593874678e45 Clarify authorship by doing sed "s/ccr\/TNSP/Matti 'ccr' Hamalainen/g".
Matti Hamalainen <ccr@tnsp.org>
parents: 113
diff changeset
3 # URLLog v2.2.1 by Matti 'ccr' Hamalainen <ccr@tnsp.org>
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
4 # (C) Copyright 2000-2011 Tecnic Software productions (TNSP)
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
5 #
113
077c7383f36f urllog: Add line about the script's license.
Matti Hamalainen <ccr@tnsp.org>
parents: 112
diff changeset
6 # This script is freely distributable under GNU GPL (version 2) license.
077c7383f36f urllog: Add line about the script's license.
Matti Hamalainen <ccr@tnsp.org>
parents: 112
diff changeset
7 #
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
8 ##########################################################################
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
9 #
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
10 # URL-logger script for EggDrop IRC robot, utilizing SQLite3 database
81
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
11 # This script requires SQLite TCL extension. Under Debian, you need:
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
12 # tcl8.5 libsqlite3-tcl (and eggdrop eggdrop-data, of course)
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
13 #
81
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
14 # NOTICE! If you are upgrading to URLLog v2.0+ from any 1.x version, you
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
15 # may want to run a conversion script against your URL-database file,
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
16 # if you wish to preserve the old data.
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
17 #
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
18 # See convert_urllog_db.tcl for more information.
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
19 #
81
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
20 # If you are doing a fresh install, you will need to create the
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
21 # initial SQLite3 database with the required table schemas. You
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
22 # can do that by running: create_urllog_db.tcl
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
23 #
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
24 ##########################################################################
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
25
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
26 ###
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
27 ### HTTP options
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
28 ###
81
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
29 # Set to 1 if you want to enable use of HTTP proxy.
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
30 # If you do, you MUST set the proxy settings below too.
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
31 set http_proxy 0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
32
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
33 # Proxy host and port number (only used if enabled above)
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
34 set http_proxy_host ""
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
35 set http_proxy_port 8080
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
36
81
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
37 # Enable _experimental_ TLS/SSL support. This may not work at all.
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
38 # If unsure, leave this option disabled (0).
104
da337ca10e0a urllog: Enable SSL/TLS support by default.
Matti Hamalainen <ccr@tnsp.org>
parents: 103
diff changeset
39 set http_tls_support 1
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
40
89
77e05ce9e9b8 urllog: Add certdir option setting.
Matti Hamalainen <ccr@tnsp.org>
parents: 87
diff changeset
41 set http_tls_cadir "/usr/share/ca-certificates/mozilla"
77e05ce9e9b8 urllog: Add certdir option setting.
Matti Hamalainen <ccr@tnsp.org>
parents: 87
diff changeset
42
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
43
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
44 ###
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
45 ### General options
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
46 ###
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
47
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
48 # Filename of the SQLite URL database file
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
49 set urllog_db_file "urllog.sqlite"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
50
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
51
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
52 # 1 = Verbose: Say messages when URL is OK, bad, etc.
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
53 # 0 = Quiet : Be quiet (only speak if asked with !urlfind, etc)
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
54 set urllog_verbose 1
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
55
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
56
50
f69363fc1f61 Update some comments and add a bit of documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 49
diff changeset
57 # 1 = Enable logging of various script actions into bot's log
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
58 # 0 = Don't.
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
59 set urllog_logmsg 1
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
60
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
61
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
62 # 1 = Check URLs for validity and existence before adding.
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
63 # 0 = No checks. Add _anything_ that looks like an URL to the database.
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
64 set urllog_check 1
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
65
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
66
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
67 ###
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
68 ### Search related settings
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
69 ###
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
70
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
71 # 0 = No search-commands available
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
72 # 1 = Search enabled
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
73 set urllog_search 1
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
74
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
75
81
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
76 # Limit how many URLs should the "!urlfind" command show at most.
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
77 set urllog_showmax_pub 3
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
78
81
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
79 # Same as above, but for private message search.
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
80 set urllog_showmax_priv 6
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
81
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
82
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
83 ###
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
84 ### ShortURL-settings
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
85 ###
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
86
73
646b2fd67312 urllog: Improve documentation of different settings.
Matti Hamalainen <ccr@tnsp.org>
parents: 70
diff changeset
87 # 1 = Enable showing of ShortURLs
646b2fd67312 urllog: Improve documentation of different settings.
Matti Hamalainen <ccr@tnsp.org>
parents: 70
diff changeset
88 # 0 = ShortURLs not shown in any bot actions
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
89 set urllog_shorturl 1
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
90
73
646b2fd67312 urllog: Improve documentation of different settings.
Matti Hamalainen <ccr@tnsp.org>
parents: 70
diff changeset
91 # Max length of original URL to be shown, rest is chopped
646b2fd67312 urllog: Improve documentation of different settings.
Matti Hamalainen <ccr@tnsp.org>
parents: 70
diff changeset
92 # off if the URL is longer than the specified amount.
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
93 set urllog_shorturl_orig 30
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
94
73
646b2fd67312 urllog: Improve documentation of different settings.
Matti Hamalainen <ccr@tnsp.org>
parents: 70
diff changeset
95 # Web server URL that handles redirects of ShortURLs
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
96 set urllog_shorturl_prefix "http://tnsp.org/u/"
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
97
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
98
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
99 ###
81
17e542b7985a urllog, quotedb: Improve documentation.
Matti Hamalainen <ccr@tnsp.org>
parents: 73
diff changeset
100 ### Message texts (informal, errors, etc.)
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
101 ###
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
102
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
103 # No such host was found
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
104 set urlmsg_nosuchhost "ei tommosta oo!"
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
105
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
106 # Could not connect host (I/O errors etc)
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
107 set urlmsg_ioerror "kraak, virhe yhdynnässä."
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
108
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
109 # HTTP timeout
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
110 set urlmsg_timeout "ei jaksa ootella"
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
111
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
112 # No such document was found
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
113 set urlmsg_errorgettingdoc "siitosvirhe"
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
114
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
115 # URL was already known (was in database)
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
116 set urlmsg_alreadyknown "wanha!"
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
117 #set urlmsg_alreadyknown "Empiiristen havaintojen perusteella ja tällä sovellutusalueella esiintyneisiin aikaisempiin kontekstuaalisiin ilmaisuihin viitaten uskallan todeta, että sovellukseen ilmoittamasi tietoverkko-osoite oli kronologisti ajatellen varsin postpresentuaalisesti sopimaton ja ennestään hyvin tunnettu."
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
118
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
119 # No match was found when searched with !urlfind or other command
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
120 set urlmsg_nomatch "Ei osumia."
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
121
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
122
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
123 ###
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
124 ### Things that you usually don't need to touch ...
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
125 ###
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
126
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
127 # What IRC "command" should we use to send messages:
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
128 # (Valid alternatives are "PRIVMSG" and "NOTICE")
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
129 set urllog_preferredmsg "PRIVMSG"
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
130
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
131 # The valid known Top Level Domains (TLDs), but not the country code TLDs
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
132 # (Now includes the new IANA published TLDs)
90
a9a4456eb213 urllog: Add .xxx TLD to supported list.
Matti Hamalainen <ccr@tnsp.org>
parents: 89
diff changeset
133 set urllog_tlds "org,com,net,mil,gov,biz,edu,coop,aero,info,museum,name,pro,int,xxx"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
134
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
135
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
136 ##########################################################################
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
137 # No need to look below this line
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
138 ##########################################################################
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
139 set urllog_name "URLLog"
107
92b42a867af4 urllog: Bump version.
Matti Hamalainen <ccr@tnsp.org>
parents: 106
diff changeset
140 set urllog_version "2.2.1"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
141
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
142 set urllog_tlds [split $urllog_tlds ","]
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
143 set urllog_httprep [split "\@|%40|{|%7B|}|%7D|\[|%5B|\]|%5D" "|"]
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
144
102
5425dc418505 urllog: Entity data is now in UTF-8, but TCL source files are interpreted with current system locale, which may not be UTF-8. We must therefore "convert" the entity mapping string to UTF-8 to be certain of TCL's interpretation of its encoding.
Matti Hamalainen <ccr@tnsp.org>
parents: 101
diff changeset
145
111
e09c791b2a48 urllog: Improve entity handling.
Matti Hamalainen <ccr@tnsp.org>
parents: 110
diff changeset
146 set urllog_ent_str "&#45;|-|&#39;|'|—|-|&rlm;||&#8212;|-|&#8211;|--|&#x202a;||&#x202c;||"
e09c791b2a48 urllog: Improve entity handling.
Matti Hamalainen <ccr@tnsp.org>
parents: 110
diff changeset
147 append urllog_ent_str "&lrm;||&aring;|å|&Aring;|Å|&eacute;|é|&#58;|:|&nbsp;| |"
e09c791b2a48 urllog: Improve entity handling.
Matti Hamalainen <ccr@tnsp.org>
parents: 110
diff changeset
148 append urllog_ent_str "&#8221;|\"|&#8220;|\"|&laquo;|<<|&raquo;|>>|&quot;|\"|"
117
d40a0f3af7ab urllog: Add some entities for conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 116
diff changeset
149 append urllog_ent_str "&auml;|ä|&ouml;|ö|&Auml;|Ä|&Ouml;|Ö|&amp;|&|&lt;|<|&gt;|>|"
d40a0f3af7ab urllog: Add some entities for conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 116
diff changeset
150 append urllog_ent_str "&#228;|ä|&#229;|ö"
102
5425dc418505 urllog: Entity data is now in UTF-8, but TCL source files are interpreted with current system locale, which may not be UTF-8. We must therefore "convert" the entity mapping string to UTF-8 to be certain of TCL's interpretation of its encoding.
Matti Hamalainen <ccr@tnsp.org>
parents: 101
diff changeset
151 set urllog_html_ent [split [encoding convertfrom "utf-8" $urllog_ent_str] "|"]
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
152
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
153 ### Require packages
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
154 package require sqlite3
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
155 package require http
7
50b52294e93e urllog: Strip &rlm; entities from titles; Some work on SSL/https support.
Matti Hamalainen <ccr@tnsp.org>
parents: 4
diff changeset
156
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
157 ### Binding initializations
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
158 if {$urllog_search != 0} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
159 bind pub - !urlfind urllog_pub_urlfind
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
160 bind msg - urlfind urllog_msg_urlfind
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
161 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
162
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
163 bind pubm - *.* urllog_checkmsg
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
164 bind topc - *.* urllog_checkmsg
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
165
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
166
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
167 ### Initialization messages
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
168 set urllog_message "$urllog_name v$urllog_version (C) 2000-2011 ccr/TNSP"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
169 putlog "$urllog_message"
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
170
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
171 ### HTTP module initialization
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
172 ::http::config -useragent "$urllog_name/$urllog_version"
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
173 if {$http_proxy != 0} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
174 ::http::config -proxyhost $http_proxy_host -proxyport $http_proxy_port
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
175 }
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
176
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
177 if {$http_tls_support != 0} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
178 package require tls
89
77e05ce9e9b8 urllog: Add certdir option setting.
Matti Hamalainen <ccr@tnsp.org>
parents: 87
diff changeset
179 ::http::register https 443 [list ::tls::socket -request 1 -require 1 -cadir $http_tls_cadir]
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
180 }
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
181
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
182 ### SQLite database initialization
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
183 if {[catch {sqlite3 urldb $urllog_db_file} uerrmsg]} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
184 putlog " Could not open SQLite3 database '$urllog_db_file': $uerrmsg"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
185 exit 2
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
186 }
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
187
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
188
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
189 if {$http_proxy != 0} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
190 putlog " (Using proxy $http_proxy_host:$http_proxy_port)"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
191 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
192
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
193 if {$urllog_check != 0} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
194 putlog " (Additional URL validity checks enabled)"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
195 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
196
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
197 if {$urllog_verbose != 0} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
198 putlog " (Verbose mode enabled)"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
199 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
200
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
201 if {$urllog_search != 0} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
202 putlog " (Search commands enabled)"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
203 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
204
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
205 #-------------------------------------------------------------------------
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
206 ### Utility functions
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
207 proc urllog_log {arg} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
208 global urllog_logmsg urllog_name
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
209
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
210 if {$urllog_logmsg != 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
211 putlog "$urllog_name: $arg"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
212 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
213 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
214
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
215
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
216 proc urllog_ctime { utime } {
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
217
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
218 if {$utime == "" || $utime == "*"} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
219 set utime 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
220 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
221
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
222 return [clock format $utime -format "%d.%m.%Y %H:%M"]
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
223 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
224
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
225
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
226 proc urllog_isnumber {uarg} {
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
227
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
228 foreach i [split $uarg {}] {
65
31c8c4f50aa6 urllog: Improve urllog_isnumber function.
Matti Hamalainen <ccr@tnsp.org>
parents: 62
diff changeset
229 if {![string match \[0-9\] $i]} { return 0 }
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
230 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
231
65
31c8c4f50aa6 urllog: Improve urllog_isnumber function.
Matti Hamalainen <ccr@tnsp.org>
parents: 62
diff changeset
232 return 1
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
233 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
234
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
235
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
236 proc urllog_msg {apublic anick achan amsg} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
237 global urllog_preferredmsg
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
238
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
239 if {$apublic == 1} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
240 putserv "$urllog_preferredmsg $achan :$amsg"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
241 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
242 putserv "$urllog_preferredmsg $anick :$amsg"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
243 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
244 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
245
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
246
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
247 proc urllog_verb_msg {anick achan amsg} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
248 global urllog_verbose
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
249
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
250 if {$urllog_verbose != 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
251 urllog_msg 1 $anick $achan $amsg
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
252 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
253 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
254
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
255
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
256 proc urllog_convert_ent {udata} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
257 global urllog_html_ent
115
5db02af76016 urllog: Improve entity conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 114
diff changeset
258 return [string map -nocase $urllog_html_ent [string map $urllog_html_ent $udata]]
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
259 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
260
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
261
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
262 proc urllog_escape { str } {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
263 return [string map {' ''} $str]
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
264 }
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
265
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
266
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
267 proc urllog_sanitize_encoding {uencoding} {
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
268 regsub -- "^\[a-z\]\[a-z\]_\[A-Z\]\[A-Z\]\." $uencoding "" uencoding
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
269 set uencoding [string tolower $uencoding]
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
270 regsub -- "^iso-" $uencoding "iso" uencoding
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
271 return $uencoding
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
272 }
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
273
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
274
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
275 #-------------------------------------------------------------------------
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
276 proc urllog_get_short {utime} {
68
3762c621d1c3 urllog: Cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 65
diff changeset
277 global urllog_shorturl_prefix
13
e06d41fb69d5 Begin work on converting urllog.tcl to use an SQLite3 database instead of flat file.
Matti Hamalainen <ccr@tnsp.org>
parents: 8
diff changeset
278
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
279 set ustr "ABCDEFGHIJKLNMOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
280 set ulen [string length $ustr]
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
281
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
282 set u1 [expr $utime / ($ulen * $ulen)]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
283 set utmp [expr $utime % ($ulen * $ulen)]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
284 set u2 [expr $utmp / $ulen]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
285 set u3 [expr $utmp % $ulen]
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
286
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
287 return "\[ $urllog_shorturl_prefix[string index $ustr $u1][string index $ustr $u2][string index $ustr $u3] \]"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
288 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
289
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
290
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
291 #-------------------------------------------------------------------------
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
292 proc urllog_chop_url {url} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
293 global urllog_shorturl_orig
68
3762c621d1c3 urllog: Cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 65
diff changeset
294
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
295 if {[string length $url] > $urllog_shorturl_orig} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
296 return "[string range $url 0 $urllog_shorturl_orig]..."
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
297 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
298 return $url
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
299 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
300 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
301
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
302 #-------------------------------------------------------------------------
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
303 proc urllog_exists {urlStr urlNick urlHost urlChan} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
304 global urldb urlmsg_alreadyknown urllog_shorturl
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
305
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
306 set usql "SELECT id AS uid, utime AS utime, url AS uurl, user AS uuser, host AS uhost, chan AS uchan, title AS utitle FROM urls WHERE url='[urllog_escape $urlStr]'"
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
307 urldb eval $usql {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
308 urllog_log "URL said by $urlNick ($urlStr) already known"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
309 if {$urllog_shorturl != 0} {
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
310 set qstr "[urllog_get_short $uid] "
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
311 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
312 set qstr ""
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
313 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
314 append qstr "($uuser/$uchan@[urllog_ctime $utime])"
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
315 if {[string length $utitle] > 0} {
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
316 set qstr "$urlmsg_alreadyknown - '$utitle' $qstr"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
317 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
318 set qstr "$urlmsg_alreadyknown $qstr"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
319 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
320 urllog_verb_msg $urlNick $urlChan $qstr
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
321 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
322 }
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
323 return 1
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
324 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
325
18
1e2232135354 More changes for SQLite support.
Matti Hamalainen <ccr@tnsp.org>
parents: 13
diff changeset
326
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
327 #-------------------------------------------------------------------------
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
328 proc urllog_addurl {urlStr urlNick urlHost urlChan urlTitle} {
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
329 global urldb urllog_shorturl
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
330
93
4e02c0219afe urllog: Insert NULL into title column when we didn't get a title.
Matti Hamalainen <ccr@tnsp.org>
parents: 92
diff changeset
331 if {$urlTitle == ""} {
4e02c0219afe urllog: Insert NULL into title column when we didn't get a title.
Matti Hamalainen <ccr@tnsp.org>
parents: 92
diff changeset
332 set uins "NULL"
4e02c0219afe urllog: Insert NULL into title column when we didn't get a title.
Matti Hamalainen <ccr@tnsp.org>
parents: 92
diff changeset
333 } else {
4e02c0219afe urllog: Insert NULL into title column when we didn't get a title.
Matti Hamalainen <ccr@tnsp.org>
parents: 92
diff changeset
334 set uins "'[urllog_escape $urlTitle]'"
4e02c0219afe urllog: Insert NULL into title column when we didn't get a title.
Matti Hamalainen <ccr@tnsp.org>
parents: 92
diff changeset
335 }
4e02c0219afe urllog: Insert NULL into title column when we didn't get a title.
Matti Hamalainen <ccr@tnsp.org>
parents: 92
diff changeset
336 set usql "INSERT INTO urls (utime,url,user,host,chan,title) VALUES ([unixtime], '[urllog_escape $urlStr]', '[urllog_escape $urlNick]', '[urllog_escape $urlHost]', '[urllog_escape $urlChan]', $uins)"
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
337 if {[catch {urldb eval $usql} uerrmsg]} {
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
338 urllog_log "$uerrmsg on SQL:\n$usql"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
339 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
340 }
82
1bbc79f41a1c urllog: Rename few variables for clarity.
Matti Hamalainen <ccr@tnsp.org>
parents: 81
diff changeset
341 set uid [urldb last_insert_rowid]
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
342 urllog_log "Added URL ($urlNick@$urlChan): $urlStr"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
343
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
344
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
345 ### Let's say something, to confirm that everything went well.
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
346 if {$urllog_shorturl != 0} {
82
1bbc79f41a1c urllog: Rename few variables for clarity.
Matti Hamalainen <ccr@tnsp.org>
parents: 81
diff changeset
347 set qstr "[urllog_get_short $uid] "
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
348 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
349 set qstr ""
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
350 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
351 if {[string length $urlTitle] > 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
352 urllog_verb_msg $urlNick $urlChan "'$urlTitle' ([urllog_chop_url $urlStr]) $qstr"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
353 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
354 urllog_verb_msg $urlNick $urlChan "[urllog_chop_url $urlStr] $qstr"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
355 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
356
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
357 return 1
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
358 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
359
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
360
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
361 #-------------------------------------------------------------------------
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
362 proc urllog_http_handler {utoken utotal ucurr} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
363 upvar #0 $utoken state
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
364
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
365 # Stop fetching data after 3000 bytes, this should be enough to
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
366 # contain the head section of a HTML page.
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
367 if {$ucurr > 64000} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
368 set state(status) "ok"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
369 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
370 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
371
84
fa1e95c2a0bc urllog: Bump version to 2.1.
Matti Hamalainen <ccr@tnsp.org>
parents: 83
diff changeset
372
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
373 #-------------------------------------------------------------------------
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
374 proc urllog_checkurl {urlStr urlNick urlHost urlChan} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
375 global urllog_tlds urllog_check urlmsg_nosuchhost urlmsg_ioerror
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
376 global urlmsg_timeout urlmsg_errorgettingdoc urllog_httprep
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
377 global urllog_shorturl_prefix urllog_shorturl urllog_encoding
95
687bdd74dfac urllog: Check if TLS support is enabled when checking if we can fetch title information via HTTP or SSL/HTTP.
Matti Hamalainen <ccr@tnsp.org>
parents: 93
diff changeset
378 global http_tls_support
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
379
96
e5a6c27be365 urllog: Comments and cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 95
diff changeset
380 ### Try to guess the URL protocol component (if it is missing)
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
381 set u_checktld 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
382 if {[string match "*www.*" $urlStr] && ![string match "http://*" $urlStr] && ![string match "https://*" $urlStr]} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
383 set urlStr "http://$urlStr"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
384 } elseif {[string match "*ftp.*" $urlStr] && ![string match "ftp://*" $urlStr]} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
385 set urlStr "ftp://$urlStr"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
386 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
387
95
687bdd74dfac urllog: Check if TLS support is enabled when checking if we can fetch title information via HTTP or SSL/HTTP.
Matti Hamalainen <ccr@tnsp.org>
parents: 93
diff changeset
388 ### Handle URLs that have an IPv4-address
687bdd74dfac urllog: Check if TLS support is enabled when checking if we can fetch title information via HTTP or SSL/HTTP.
Matti Hamalainen <ccr@tnsp.org>
parents: 93
diff changeset
389 if {[regexp "(\[a-z\]+)://(\[0-9\]{1,3})\\.(\[0-9\]{1,3})\\.(\[0-9\]{1,3})\\.(\[0-9\]{1,3})" $urlStr u_match u_proto ni1 ni2 ni3 ni4]} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
390 # Check if the IP is on local network
92
f6f4595856ff urllog: Cosmetics. Remove useless parenthesis.
Matti Hamalainen <ccr@tnsp.org>
parents: 91
diff changeset
391 if {$ni1 == 127 || $ni1 == 10 || ($ni1 == 192 && $ni2 == 168) || $ni1 == 0} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
392 urllog_log "URL pointing to local or invalid network, ignored ($urlStr)."
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
393 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
394 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
395 # Skip TLD check for URLs with IP address
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
396 set u_checktld 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
397 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
398
96
e5a6c27be365 urllog: Comments and cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 95
diff changeset
399 ### Check now if we have an ShortURL here ...
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
400 if {$urllog_shorturl != 0 && [string match "*$urllog_shorturl_prefix*" $urlStr]} {
98
fbbe7ee40e2f urllog: Improve one informational / error message.
Matti Hamalainen <ccr@tnsp.org>
parents: 97
diff changeset
401 urllog_log "Ignoring ShortURL from $urlNick: $urlStr"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
402 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
403 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
404
95
687bdd74dfac urllog: Check if TLS support is enabled when checking if we can fetch title information via HTTP or SSL/HTTP.
Matti Hamalainen <ccr@tnsp.org>
parents: 93
diff changeset
405 ### Get URL protocol component
687bdd74dfac urllog: Check if TLS support is enabled when checking if we can fetch title information via HTTP or SSL/HTTP.
Matti Hamalainen <ccr@tnsp.org>
parents: 93
diff changeset
406 set u_proto ""
687bdd74dfac urllog: Check if TLS support is enabled when checking if we can fetch title information via HTTP or SSL/HTTP.
Matti Hamalainen <ccr@tnsp.org>
parents: 93
diff changeset
407 if {[regexp "(\[a-z\]+)://" $urlStr u_match u_proto]} {
687bdd74dfac urllog: Check if TLS support is enabled when checking if we can fetch title information via HTTP or SSL/HTTP.
Matti Hamalainen <ccr@tnsp.org>
parents: 93
diff changeset
408 }
687bdd74dfac urllog: Check if TLS support is enabled when checking if we can fetch title information via HTTP or SSL/HTTP.
Matti Hamalainen <ccr@tnsp.org>
parents: 93
diff changeset
409
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
410 ### Check the PORT (if the ":" is there)
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
411 set u_record [split $urlStr "/"]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
412 set u_hostname [lindex $u_record 2]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
413 set u_port [lindex [split $u_hostname ":"] end]
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
414
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
415 if {![urllog_isnumber $u_port] && $u_port != "" && $u_port != $u_hostname} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
416 urllog_log "Broken URL from $urlNick: ($urlStr) illegal port $u_port"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
417 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
418 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
419
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
420 # Default to port 80 (HTTP)
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
421 if {![urllog_isnumber $u_port]} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
422 set u_port 80
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
423 }
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
424
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
425 ### Is it a http or ftp url? (FIX ME!)
97
366e68ad94df urllog: Use u_proto variable to check for if the protocol is supported instead of doing useless additional string checking.
Matti Hamalainen <ccr@tnsp.org>
parents: 96
diff changeset
426 if {$u_proto != "http" && $u_proto != "https" && $u_proto != "ftp"} {
366e68ad94df urllog: Use u_proto variable to check for if the protocol is supported instead of doing useless additional string checking.
Matti Hamalainen <ccr@tnsp.org>
parents: 96
diff changeset
427 urllog_log "Broken URL from $urlNick: ($urlStr) UNSUPPORTED protocol class ($u_proto)."
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
428 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
429 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
430
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
431 ### Check the Top Level Domain (TLD) validity
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
432 if {$u_checktld != 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
433 set u_sane [lindex [split $u_hostname "."] end]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
434 set u_tld [lindex [split $u_sane ":"] 0]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
435 set u_found 0
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
436
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
437 if {[string length $u_tld] == 2} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
438 # Assume all 2-letter domains to be valid :)
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
439 set u_found 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
440 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
441 # Check our list of known TLDs
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
442 foreach itld $urllog_tlds {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
443 if {[string match $itld $u_tld]} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
444 set u_found 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
445 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
446 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
447 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
448
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
449 if {$u_found == 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
450 urllog_log "Broken URL from $urlNick: ($urlStr) illegal TLD: $u_tld."
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
451 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
452 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
453 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
454
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
455 set urlStr [string map $urllog_httprep $urlStr]
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
456
91
6f4bfd8e9447 urllog: Reorder code and make it simpler by removing duplicate checks.
Matti Hamalainen <ccr@tnsp.org>
parents: 90
diff changeset
457 ### Does the URL already exist?
6f4bfd8e9447 urllog: Reorder code and make it simpler by removing duplicate checks.
Matti Hamalainen <ccr@tnsp.org>
parents: 90
diff changeset
458 if {![urllog_exists $urlStr $urlNick $urlHost $urlChan]} {
6f4bfd8e9447 urllog: Reorder code and make it simpler by removing duplicate checks.
Matti Hamalainen <ccr@tnsp.org>
parents: 90
diff changeset
459 return 1
6f4bfd8e9447 urllog: Reorder code and make it simpler by removing duplicate checks.
Matti Hamalainen <ccr@tnsp.org>
parents: 90
diff changeset
460 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
461
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
462 ### Do we perform additional optional checks?
95
687bdd74dfac urllog: Check if TLS support is enabled when checking if we can fetch title information via HTTP or SSL/HTTP.
Matti Hamalainen <ccr@tnsp.org>
parents: 93
diff changeset
463 if {$urllog_check == 0 || ($http_tls_support == 0 && $u_proto == "https") || $u_proto != "http"} {
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
464 # No optional checks, just add the URL, if it does not exist already
91
6f4bfd8e9447 urllog: Reorder code and make it simpler by removing duplicate checks.
Matti Hamalainen <ccr@tnsp.org>
parents: 90
diff changeset
465 urllog_addurl $urlStr $urlNick $urlHost $urlChan ""
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
466 return 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
467 }
7
50b52294e93e urllog: Strip &rlm; entities from titles; Some work on SSL/https support.
Matti Hamalainen <ccr@tnsp.org>
parents: 4
diff changeset
468
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
469 ### Does the document pointed by the URL exist?
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
470 if {[catch {set utoken [::http::geturl $urlStr -progress urllog_http_handler -blocksize 1024 -timeout 3000]} uerrmsg]} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
471 urllog_verb_msg $urlNick $urlChan "$urlmsg_ioerror ($uerrmsg)"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
472 urllog_log "HTTP request failed: $uerrmsg"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
473 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
474 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
475
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
476 if {[::http::status $utoken] == "timeout"} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
477 urllog_verb_msg $urlNick $urlChan "$urlmsg_timeout"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
478 urllog_log "HTTP request timed out ($urlStr)"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
479 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
480 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
481
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
482 if {[::http::status $utoken] != "ok"} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
483 urllog_verb_msg $urlNick $urlChan "$urlmsg_errorgettingdoc ([::http::error $utoken])"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
484 urllog_log "Error in HTTP transaction: [::http::error $utoken] ($urlStr)"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
485 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
486 }
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
487
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
488 # Fixme! Handle redirects!
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
489 set ucode [::http::ncode $utoken]
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
490 set udata [::http::data $utoken]
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
491 array set umeta [::http::meta $utoken]
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
492 ::http::cleanup $utoken
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
493
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
494 if {$ucode >= 200 && $ucode <= 309} {
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
495 set uenc_doc ""
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
496 set uenc_http ""
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
497 set uencoding ""
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
498
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
499 # Get information about specified character encodings
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
500 if {[info exists umeta(Content-Type)] && [regexp -nocase {charset\s*=\s*([a-z0-9._-]+)} $umeta(Content-Type) umatches uenc_http]} {
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
501 # Found character set encoding information in HTTP headers
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
502 }
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
503
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
504 if {[regexp -nocase -- "<meta.\*\?content=\"text/html.\*\?charset=(\[^\"\]*)\".\*\?/>" $udata umatches uenc_doc]} {
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
505 # Found old style HTML meta tag with character set information
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
506 } elseif {[regexp -nocase -- "<meta.\*\?charset=\"(\[^\"\]*)\".\*\?/>" $udata umatches uenc_doc]} {
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
507 # Found HTML5 style meta tag with character set information
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
508 }
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
509
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
510 # Make sanitized versions of the encoding strings
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
511 set uenc_http2 [urllog_sanitize_encoding $uenc_http]
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
512 set uenc_doc2 [urllog_sanitize_encoding $uenc_doc]
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
513
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
514 # KLUDGE!
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
515 set uencoding $uenc_http2
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
516
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
517 # Check if the document has specified encoding
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
518 if {$uenc_doc != ""} {
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
519 # Does it differ from what HTTP says?
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
520 if {$uenc_http != "" && $uenc_doc != $uenc_http && $uenc_doc2 != $uenc_http2} {
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
521 # Yes, we will try reconverting
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
522 set uencoding $uenc_doc2
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
523 }
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
524 } elseif {$uenc_http == ""} {
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
525 # If _NO_ known encoding of any kind, assume the default of iso8859-1
86
4c2b6482c08c urllog: Different strategy for charset encoding conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 84
diff changeset
526 set uencoding "iso8859-1"
4c2b6482c08c urllog: Different strategy for charset encoding conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 84
diff changeset
527 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
528
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
529 # Get the document title, if any
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
530 set urlTitle ""
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
531 if {[regexp -nocase -- "<title>(.\*\?)</title>" $udata umatches urlTitle]} {
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
532 # If character set conversion is required, do it now
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
533 if {$uencoding != ""} {
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
534 putlog "conversion requested from $uencoding"
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
535 if {[catch {set urlTitle [encoding convertfrom $uencoding $urlTitle]} cerrmsg]} {
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
536 urllog_log "Error in charset conversion: $cerrmsg"
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
537 }
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
538 }
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
539
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
540 # Convert some HTML entities to plaintext and do some cleanup
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
541 set utmp [urllog_convert_ent $urlTitle]
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
542 regsub -all "\r|\n|\t" $utmp " " utmp
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
543 regsub -all " *" $utmp " " utmp
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
544 set urlTitle [string trim $utmp]
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
545 }
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
546
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
547 # Rasiatube hack
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
548 if {[string match "*/rasiatube/view*" $urlStr]} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
549 set rasia 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
550 set umatches [regexp -nocase -inline -- "<link rel=\"video_src\"\.\*\?file=(http://\[^&\]+)&" $udata]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
551 if {[llength $umatches] > 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
552 set urlStr [lindex $umatches 1]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
553 regsub -all "\/v\/" $urlStr "\/watch\?v=" urlStr
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
554 set rasia 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
555 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
556 set umatches [regexp -nocase -inline -- "SWFObject.\"(\[^\"\]+)\", *\"flashvideo" $udata]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
557 if {[llength $umatches] > 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
558 set urlStr [lindex $umatches 1]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
559 regsub "http:\/\/www.dailymotion.com\/swf\/" $urlStr "http:\/\/www.dailymotion.com\/video\/" urlStr
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
560 set rasia 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
561 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
562 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
563
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
564 if {$rasia != 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
565 urllog_log "RasiaTube mangler: $urlStr"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
566 urllog_verb_msg $urlNick $urlChan "Korjataan haiseva rasiatube-linkki: $urlStr"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
567 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
568 }
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
569
83
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
570 # Check if the URL already exists, just in case we had some redirects
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
571 if {[urllog_exists $urlStr $urlNick $urlHost $urlChan]} {
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
572 urllog_addurl $urlStr $urlNick $urlHost $urlChan $urlTitle
f171a9fb7b7b urllog: Split urllog_add function to urllog_exists for checking whether given URL already exists in the database. Use urllog_exists where appropriate.
Matti Hamalainen <ccr@tnsp.org>
parents: 82
diff changeset
573 }
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
574 return 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
575 } else {
116
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
576 urllog_verb_msg $urlNick $urlChan "$urlmsg_errorgettingdoc ($ucode)"
4f3edcf72987 urllog: Improvements in document / HTTP encoding handling and conversion.
Matti Hamalainen <ccr@tnsp.org>
parents: 115
diff changeset
577 urllog_log "$ucode - $urlStr"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
578 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
579
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
580 ::http::cleanup $utoken
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
581 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
582
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
583
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
584 #-------------------------------------------------------------------------
87
97c56d1e9ce2 urllog: Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 86
diff changeset
585 proc urllog_checkmsg {unick uhost uhand uchan utext} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
586 ### Check the nick
87
97c56d1e9ce2 urllog: Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 86
diff changeset
587 if {$unick == "*"} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
588 urllog_log "urllog_checkmsg: nick was wc, this should not happen."
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
589 return 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
590 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
591
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
592 ### Do the URL checking
87
97c56d1e9ce2 urllog: Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 86
diff changeset
593 foreach str [split $utext " "] {
97c56d1e9ce2 urllog: Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 86
diff changeset
594 if {[regexp "(ftp|http|https)://|www\..+|ftp\..*" $str]} {
97c56d1e9ce2 urllog: Cleanups.
Matti Hamalainen <ccr@tnsp.org>
parents: 86
diff changeset
595 urllog_checkurl $str $unick $uhost $uchan
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
596 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
597 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
598
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
599 return 0
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
600 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
601
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
602
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
603 #-------------------------------------------------------------------------
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
604 ### Parse arguments, find and show the results
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
605 proc urllog_find {unick uhand uchan utext upublic} {
62
6428b1bcb34b urllog: Remove some global variable references where they are not used.
Matti Hamalainen <ccr@tnsp.org>
parents: 50
diff changeset
606 global urllog_shorturl urldb
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
607 global urllog_showmax_pub urllog_showmax_priv urlmsg_nomatch
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
608
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
609 if {$upublic == 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
610 set ulimit 5
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
611 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
612 set ulimit 3
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
613 }
19
9cf22053e5da Repair !urlfind functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 18
diff changeset
614
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
615 ### Parse the given command
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
616 urllog_log "$unick/$uhand searched URL: $utext"
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
617
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
618 set ftokens [split $utext " "]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
619 set fpatlist ""
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
620 foreach ftoken $ftokens {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
621 set fprefix [string range $ftoken 0 0]
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
622 set fpattern [string range $ftoken 1 end]
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
623
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
624 if {$fprefix == "-"} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
625 lappend fpatlist "url NOT LIKE '%[urllog_escape $fpattern]%'"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
626 } elseif {$fprefix == "%"} {
110
4aa1e1d545ed urllog and quotedb: Use SQL LIKE operator for username search terms to avoid unnecessary case-sensitivity.
Matti Hamalainen <ccr@tnsp.org>
parents: 109
diff changeset
627 lappend fpatlist "user LIKE '[urllog_escape $fpattern]'"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
628 } elseif {$fprefix == "@"} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
629 # foo
112
fae3dd7a8b20 urllog: Oops, a typo in variable name. Fixed.
Matti Hamalainen <ccr@tnsp.org>
parents: 111
diff changeset
630 } elseif {$fprefix == "+"} {
109
74cb254dbf09 urllog and quotedb: Handle "+"-prefix in searches as it is documented.
Matti Hamalainen <ccr@tnsp.org>
parents: 107
diff changeset
631 lappend fpatlist "url LIKE '%[urllog_escape $fpattern]%'"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
632 } else {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
633 lappend fpatlist "url LIKE '%[urllog_escape $ftoken]%'"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
634 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
635 }
19
9cf22053e5da Repair !urlfind functionality.
Matti Hamalainen <ccr@tnsp.org>
parents: 18
diff changeset
636
27
6e381916b016 Some fixes in the query mechanisms of QuoteDB and URLLog.
Matti Hamalainen <ccr@tnsp.org>
parents: 20
diff changeset
637 if {[llength $fpatlist] > 0} {
6e381916b016 Some fixes in the query mechanisms of QuoteDB and URLLog.
Matti Hamalainen <ccr@tnsp.org>
parents: 20
diff changeset
638 set fquery "WHERE [join $fpatlist " AND "]"
6e381916b016 Some fixes in the query mechanisms of QuoteDB and URLLog.
Matti Hamalainen <ccr@tnsp.org>
parents: 20
diff changeset
639 } else {
6e381916b016 Some fixes in the query mechanisms of QuoteDB and URLLog.
Matti Hamalainen <ccr@tnsp.org>
parents: 20
diff changeset
640 set fquery ""
6e381916b016 Some fixes in the query mechanisms of QuoteDB and URLLog.
Matti Hamalainen <ccr@tnsp.org>
parents: 20
diff changeset
641 }
68
3762c621d1c3 urllog: Cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 65
diff changeset
642
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
643 set iresults 0
82
1bbc79f41a1c urllog: Rename few variables for clarity.
Matti Hamalainen <ccr@tnsp.org>
parents: 81
diff changeset
644 set usql "SELECT id AS uid, utime AS utime, url AS uurl, user AS uuser, host AS uhost FROM urls $fquery ORDER BY utime DESC LIMIT $ulimit"
68
3762c621d1c3 urllog: Cosmetics.
Matti Hamalainen <ccr@tnsp.org>
parents: 65
diff changeset
645 urldb eval $usql {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
646 incr iresults
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
647 set shortURL $uurl
82
1bbc79f41a1c urllog: Rename few variables for clarity.
Matti Hamalainen <ccr@tnsp.org>
parents: 81
diff changeset
648 if {$urllog_shorturl != 0 && $uid != ""} {
1bbc79f41a1c urllog: Rename few variables for clarity.
Matti Hamalainen <ccr@tnsp.org>
parents: 81
diff changeset
649 set shortURL "$shortURL [urllog_get_short $uid]"
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
650 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
651 urllog_msg $upublic $unick $uchan "#$iresults: $shortURL ($uuser@[urllog_ctime $utime])"
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
652 }
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
653
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
654 if {$iresults == 0} {
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
655 # If no URLs were found
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
656 urllog_msg $upublic $unick $uchan $urlmsg_nomatch
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
657 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
658
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
659 return 0
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
660 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
661
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
662
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
663 #-------------------------------------------------------------------------
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
664 ### Finding binded functions
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
665 proc urllog_pub_urlfind {unick uhost uhand uchan utext} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
666 urllog_find $unick $uhand $uchan $utext 1
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
667 return 0
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
668 }
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
669
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
670
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
671 proc urllog_msg_urlfind {unick uhost uhand utext} {
28
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
672 urllog_find $unick $uhand "" $utext 0
a59e312b1513 Remove tabs and reindent.
Matti Hamalainen <ccr@tnsp.org>
parents: 27
diff changeset
673 return 0
3
8003090caa35 Lots of code cleanups, add "fixer" for RasiaTube links (which suck) to point directly to Youtube.
Matti Hamalainen <ccr@tnsp.org>
parents: 0
diff changeset
674 }
0
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
675
1c4e2814cd41 Initial import.
Matti Hamalainen <ccr@tnsp.org>
parents:
diff changeset
676 # end of script