annotate fetch_feeds.tcl @ 425:e5810c52d376

Bump some copyright years and versions.
author Matti Hamalainen <ccr@tnsp.org>
date Sun, 08 Jan 2017 03:57:05 +0200
parents 825cac46b1cb
children dbe249968591
#!/usr/bin/tclsh
#
# NOTICE! Change the path above to the correct tclsh binary path!
#
##############################################################################
#
# FeedCheck fetcher v1.1 by Matti 'ccr' Hamalainen <ccr@tnsp.org>
# (C) Copyright 2008-2017 Tecnic Software productions (TNSP)
#
# This script is freely distributable under GNU GPL (version 2) license.
#
##############################################################################

### The configuration should be in config.feeds in the same directory
### as this script, or change the line below to point wherever
### you wish. See "config.feeds.example" for an example config file.
source [file dirname [info script]]/config.feeds

### Required utillib.tcl
source [file dirname [info script]]/utillib.tcl


##############################################################################

package require sqlite3
package require http

if {[info exists http_user_agent] && $http_user_agent != ""} {
  ::http::config -urlencoding utf8 -useragent $http_user_agent
} else {
  ::http::config -urlencoding utf8 -useragent "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
}

if {[info exists http_use_proxy] && $http_use_proxy != 0} {
  ::http::config -proxyhost $http_proxy_host -proxyport $http_proxy_port
}

if {[info exists http_tls_support] && $http_tls_support != 0} {
  package require tls
  ::http::register https 443 [list ::tls::socket -request 1 -require 1 -tls1 1 -cadir $http_tls_cadir]
}
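The setup code above reads its settings from variables that config.feeds is expected to define. These are feeds_dbfile (used when the database is opened further down), http_user_agent, http_use_proxy together with http_proxy_host and http_proxy_port, and http_tls_support together with http_tls_cadir. Below is a minimal sketch of such a file. The values are placeholders, and this is not a copy of the project's config.feeds.example:

```tcl
# Hypothetical config.feeds: placeholder values only, not a copy of the
# project's config.feeds.example.

# SQLite3 database file holding already-seen feed items.
set feeds_dbfile "/home/user/feeds.sqlite"

# Custom User-Agent; an empty string selects the script's built-in default.
set http_user_agent ""

# Set to 1 to send requests through an HTTP proxy.
set http_use_proxy 0
set http_proxy_host "proxy.example.com"
set http_proxy_port 8080

# Set to 1 to handle https:// URLs via the tls package; http_tls_cadir
# must then name a directory of trusted CA certificates.
set http_tls_support 0
set http_tls_cadir "/etc/ssl/certs"
```

Defining the proxy host and port even while http_use_proxy is 0 means the proxy can be switched on by changing a single value.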


##############################################################################

proc fetch_dorequest { urlStr urlStatus urlSCode urlCode urlData urlMeta } {
  upvar 1 $urlStatus ustatus
  upvar 1 $urlSCode uscode
  upvar 1 $urlCode ucode
  upvar 1 $urlData udata
  upvar 1 $urlMeta umeta

  if {[catch {set utoken [::http::geturl $urlStr -timeout 6000 -binary 1 -headers {Accept-Encoding identity}]} uerrmsg]} {
    puts "HTTP request failed: $uerrmsg"
    return 0
  }

  set ustatus [::http::status $utoken]
  if {$ustatus == "timeout"} {
    puts "HTTP request timed out ($urlStr)"
    # Release the request token on the error paths too.
    ::http::cleanup $utoken
    return 0
  }

  if {$ustatus != "ok"} {
    puts "Error in HTTP transaction: [::http::error $utoken] ($urlStr)"
    ::http::cleanup $utoken
    return 0
  }

  set uscode [::http::code $utoken]
  set ucode [::http::ncode $utoken]
  set udata [::http::data $utoken]
  array set umeta [::http::meta $utoken]
  ::http::cleanup $utoken

  return 1
}
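fetch_dorequest passes its results back through variables named by the caller (via `upvar 1`). It returns 1 whenever `::http::status` is "ok". In the Tcl http package, that status only means the transaction completed, so a 404 or 500 page also counts as success and its body gets parsed. The sketch below shows a caller-side check on the numeric code, assuming that only 200 responses should be parsed. fake_dorequest and fetch_checked are hypothetical names, and the stub fakes a 404 so no network access is needed:

```tcl
# Hypothetical stand-in for fetch_dorequest: same argument list and the
# same upvar-based output convention, but it fakes a 404 response
# instead of touching the network.
proc fake_dorequest {urlStr urlStatus urlSCode urlCode urlData urlMeta} {
  upvar 1 $urlStatus ustatus $urlSCode uscode $urlCode ucode
  upvar 1 $urlData udata $urlMeta umeta
  set ustatus "ok"
  set uscode "HTTP/1.1 404 Not Found"
  set ucode 404
  set udata "<html>Not Found</html>"
  array set umeta {Content-Type text/html}
  return 1
}

# Hypothetical caller-side wrapper: "ok" only means the transaction
# completed, so the numeric HTTP code is checked before the page is used.
proc fetch_checked {fetcher url pageVar} {
  upvar 1 $pageVar upage
  if {![$fetcher $url ustatus uscode ucode upage umeta]} {
    return 0
  }
  if {$ucode != 200} {
    puts "HTTP error: $uscode ($url)"
    return 0
  }
  return 1
}

set page ""
set ok [fetch_checked fake_dorequest "http://example.com/missing" page]
puts "ok=$ok, body=$page"
```

Because upvar links can be chained, the body still reaches the outermost caller's `page` variable even though fetch_checked passes only its own linked name down to the stub.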


proc add_entry {uname uprefix uurl utitle} {
  global currclock feeds_db nitems
  set utmp [utl_convert_html_ent $uurl]
  if {[string match "http://*" $utmp] || [string match "https://*" $utmp]} {
    set utest "$utmp"
  } else {
    if {[string range $uprefix end end] != "/" && [string range $utmp 0 0] != "/"} {
      set utest "$uprefix/$utmp"
    } else {
      set utest "$uprefix$utmp"
    }
  }

  set usql "SELECT title FROM feeds WHERE url='[utl_escape $utest]' AND feed='[utl_escape $uname]'"
  if {![feeds_db exists $usql]} {
    # puts "NEW: $utest : $utitle"
    set usql "INSERT INTO feeds (feed,utime,url,title) VALUES ('[utl_escape $uname]', $currclock, '[utl_escape $utest]', '[utl_escape [utl_convert_html_ent $utitle]]')"
    incr nitems
    if {[catch {feeds_db eval $usql} uerrmsg]} {
      puts "\nError: $uerrmsg on:\n$usql"
      exit 15
    }
  }
}
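Before checking the database, add_entry turns each scraped link into an absolute URL. Absolute http:// or https:// links are kept unchanged. Any other link is appended to the prefix, with a "/" inserted only when neither side already supplies one. The sketch below isolates that rule as a hypothetical join_feed_url proc so it can be tried without a database. It skips the utl_convert_html_ent step, and the example paths are made up:

```tcl
# Hypothetical helper mirroring the URL rule in add_entry (minus the
# utl_convert_html_ent call), so it can be exercised on its own.
proc join_feed_url {uprefix uurl} {
  # Absolute links are used unchanged.
  if {[string match "http://*" $uurl] || [string match "https://*" $uurl]} {
    return $uurl
  }
  # Insert a "/" only when neither the prefix nor the link provides one.
  if {[string range $uprefix end end] ne "/" && [string range $uurl 0 0] ne "/"} {
    return "$uprefix/$uurl"
  }
  return "$uprefix$uurl"
}

puts [join_feed_url "http://www.giantitp.com" "/comics/oots1000.html"]
puts [join_feed_url "http://www.halla-aho.com/scripta/" "example.html"]
```

When both the prefix ends in "/" and the link starts with "/", this rule keeps both slashes. For example, `join_feed_url "http://example.org/" "/a.html"` gives `http://example.org//a.html`. Normalizing that case would change the stored URLs and so affect duplicate detection against existing database rows.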


proc add_rss_feed {datauri dataname dataprefix} {
  if {[catch {set utoken [::http::geturl $datauri -binary 1 -timeout 6000 -headers {Accept-Encoding identity}]} uerrmsg]} {
    puts "Error getting $datauri: $uerrmsg"
    return 1
  }
  set upage [::http::data $utoken]
  ::http::cleanup $utoken

  set umatches [regexp -all -nocase -inline -- "<item>.\*\?<title><..CDATA.(.\*\?)\\\]\\\]></title>.\*\?<link>(http.\*\?)</link>.\*\?</item>" $upage]
  set nmatches [llength $umatches]
  for {set n 0} {$n < $nmatches} {incr n 3} {
    add_entry $dataname $dataprefix [lindex $umatches [expr {$n+2}]] [lindex $umatches [expr {$n+1}]]
  }

  if {$nmatches == 0} {
    set umatches [regexp -all -nocase -inline -- "<item>.\*\?<title>(.\*\?)</title>.\*\?<link>(http.\*\?)</link>.\*\?</item>" $upage]
    set nmatches [llength $umatches]
    for {set n 0} {$n < $nmatches} {incr n 3} {
      add_entry $dataname $dataprefix [lindex $umatches [expr {$n+2}]] [lindex $umatches [expr {$n+1}]]
    }
  }

  if {$nmatches == 0} {
    # The first quantifier must be non-greedy as well: a Tcl regexp takes its
    # overall greediness from its first quantifier, so a greedy "[^>]*" here
    # would let a single match run to the last </item> in the page.
    set umatches [regexp -all -nocase -inline -- "<item \[^>\]\*\?>.\*\?<title>(.\*\?)</title>.\*\?<link>(http.\*\?)</link>.\*\?</item>" $upage]
    set nmatches [llength $umatches]
    for {set n 0} {$n < $nmatches} {incr n 3} {
      add_entry $dataname $dataprefix [lindex $umatches [expr {$n+2}]] [lindex $umatches [expr {$n+1}]]
    }
  }

  return 0
}
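add_rss_feed tries three patterns in turn: CDATA-wrapped titles, then plain titles, then `<item>` tags that carry attributes. It walks each result list in steps of 3. That stride comes from how `regexp -all -inline` lays out its result. The demonstration below is self-contained and runs on made-up feed text. With three capture groups, as in fetch_poliisi, the stride becomes 4:

```tcl
# Self-contained demo (the sample feed text is made up): regexp -all -inline
# returns one flat list holding, for each match, the whole match followed
# by every capture group.
set sample {<rss><channel>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>}

set umatches [regexp -all -nocase -inline -- {<item>.*?<title>(.*?)</title>.*?<link>(http.*?)</link>.*?</item>} $sample]

# Two capture groups -> three list elements per match, which is why the
# loops step by 3 and read the title at n+1 and the link at n+2.
# foreach can walk the same triples without index arithmetic.
set items {}
foreach {whole title link} $umatches {
  lappend items [list $link $title]
}
puts $items
```

This prints `{http://example.com/1 First} {http://example.com/2 Second}`. The braced pattern avoids the `\*\?` escaping that the double-quoted patterns in the script need.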


##############################################################################
### Fetch and parse Halla-aho's blog page data
proc fetch_halla_aho { } {
  set datauri "http://www.halla-aho.com/scripta/"
  set dataname "Mestari"
  if {![fetch_dorequest $datauri ustatus uscode ucode upage umeta]} {
    return 0
  }

  # "\\." survives Tcl's backslash substitution as a literal-dot "\." in the regexp.
  set umatches [regexp -all -nocase -inline -- "<a href=\"(\[^\"\]+\\.html)\"><b>(\[^<\]+)</b>" $upage]
  set nmatches [llength $umatches]
  for {set n 0} {$n < $nmatches} {incr n 3} {
    add_entry $dataname $datauri [lindex $umatches [expr {$n+1}]] [lindex $umatches [expr {$n+2}]]
  }

  set umatches [regexp -all -nocase -inline -- "<a href=\"(\[^\"\]+\\.html)\">(\[^<\]\[^b\]\[^<\]+)</a>" $upage]
  set nmatches [llength $umatches]
  for {set n 0} {$n < $nmatches} {incr n 3} {
    add_entry $dataname $datauri [lindex $umatches [expr {$n+1}]] [lindex $umatches [expr {$n+2}]]
  }
}


### The Adventurers
proc fetch_adventurers { } {
  set datauri "http://www.peldor.com/chapters/index_sidebar.html"
  set dataname "The Adventurers"
  if {![fetch_dorequest $datauri ustatus uscode ucode upage umeta]} {
    return 0
  }

  set umatches [regexp -all -nocase -inline -- "<a href=\"(\[^\"\]+)\">(\[^<\]+)</a>" $upage]
  set nmatches [llength $umatches]
  for {set n 0} {$n < $nmatches} {incr n 3} {
    add_entry $dataname "http://www.peldor.com/" [lindex $umatches [expr {$n+1}]] [lindex $umatches [expr {$n+2}]]
  }
}


### Order of the Stick
proc fetch_oots { } {
  set datauri "http://www.giantitp.com/comics/oots.html"
  set dataname "OOTS"
  if {![fetch_dorequest $datauri ustatus uscode ucode upage umeta]} {
    return 0
  }

  # "\\." survives Tcl's backslash substitution as a literal-dot "\." in the regexp.
  set umatches [regexp -all -nocase -inline -- "<a href=\"(/comics/oots\[0-9\]+\\.html)\">(\[^<\]+)</a>" $upage]
  set nmatches [llength $umatches]
  for {set n 0} {$n < $nmatches} {incr n 3} {
    add_entry $dataname "http://www.giantitp.com" [lindex $umatches [expr {$n+1}]] [lindex $umatches [expr {$n+2}]]
  }
}


### Poliisi (Finnish police) press releases
proc fetch_poliisi { datauri dataname dataprefix } {
  if {![fetch_dorequest $datauri ustatus uscode ucode upage umeta]} {
    return 0
  }

  set umatches [regexp -all -nocase -inline -- "<div class=\"channelitem\"><div class=\"date\">(.*?)</div><a class=\"article\" href=\"(\[^\"\]+)\">(\[^<\]+)</a>" $upage]
  set nmatches [llength $umatches]
  for {set n 0} {$n < $nmatches} {incr n 4} {
    set stmp [string trim [lindex $umatches [expr {$n+3}]]]
    add_entry $dataname $dataprefix [lindex $umatches [expr {$n+2}]] "[lindex $umatches [expr {$n+1}]]: $stmp"
  }
}




### Initialize counters and open the database
set nitems 0
set currclock [clock seconds]
global feeds_db
if {[catch {sqlite3 feeds_db $feeds_dbfile} uerrmsg]} {
  puts "Could not open SQLite3 database '$feeds_dbfile': $uerrmsg."
  exit 2
}
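The script opens feeds_dbfile with sqlite3, which creates an empty file if none exists. However, the script never creates the feeds table that add_entry queries and inserts into, so that table has to exist beforehand. Below is a minimal setup sketch, assuming the column layout implied by add_entry's INSERT statement. The column types, the NOT NULL constraints and the create_feeds_table name are assumptions, not taken from the original project:

```tcl
package require sqlite3

# Hypothetical setup helper. The table and column names come from the
# SELECT/INSERT statements in add_entry; the column types are assumptions.
proc create_feeds_table {db} {
  $db eval {
    CREATE TABLE IF NOT EXISTS feeds (
      feed  TEXT NOT NULL,
      utime INTEGER NOT NULL,
      url   TEXT NOT NULL,
      title TEXT
    )
  }
}

# Usage against a throwaway in-memory database.
sqlite3 setup_db :memory:
create_feeds_table setup_db
setup_db eval {INSERT INTO feeds (feed,utime,url,title) VALUES ('EFFI', 0, 'http://example.org/a', 'Example')}
puts [setup_db onecolumn {SELECT count(*) FROM feeds}]
setup_db close
```

Because of `IF NOT EXISTS`, the helper is safe to run more than once against the real database file.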


### Fetch the feeds
fetch_poliisi "http://www.poliisi.fi/oulu/tiedotteet/1/0?all1/0" "Poliisi/Oulu" "http://www.poliisi.fi"

fetch_halla_aho

fetch_adventurers

fetch_oots

#add_rss_feed "http://www.kaleva.fi/rss/145.xml" "Kaleva/Tiede" ""

add_rss_feed "http://www.effi.org/xml/uutiset.rss" "EFFI" ""

add_rss_feed "http://static.mtv3.fi/rss/uutiset_rikos.rss" "MTV3/Rikos" ""

#add_rss_feed "http://www.blastwave-comic.com/rss/blastwave.xml" "Blastwave" ""

#add_rss_feed "http://lehti.samizdat.info/feed/" "Lehti" ""


### Close database
feeds_db close

puts "$nitems new items."