# HG changeset patch
# User Matti Hamalainen
# Date 1315781453 -10800
# Node ID 4c2b6482c08c348774e64ada114989280b3efdca
# Parent a8278d55c6dbe9b833ca4712b3e85aa19b45d4af
urllog: Different strategy for charset encoding conversion.

diff -r a8278d55c6db -r 4c2b6482c08c urllog.tcl
--- a/urllog.tcl Mon Sep 12 01:41:00 2011 +0300
+++ b/urllog.tcl Mon Sep 12 01:50:53 2011 +0300
@@ -138,7 +138,7 @@
 set urllog_tlds [split $urllog_tlds ","]
 set urllog_httprep [split "\@|%40|{|%7B|}|%7D|\[|%5B|\]|%5D" "|"]
 
-set urllog_html_ent [split "‏||—|-|‪||‬||‎||å|å|Å|Å|é|é|:|:|ä|ä|ö|ö|ä|ä|ö|ö| | |-|-|”|\"|“|\"|»|>>|"|\"|ä|ä|ö|ö|Ä|Ä|Ö|Ö|&|&|<|<|>|>|ä|ä|ö|ö|Ä|Ä" "|"]
+set urllog_html_ent [split "—|-|‏||—|-|‪||‬||‎||å|Ã¥|Å|Ã…|é|é|:|:| | |”|\"|“|\"|»|>>|"|\"|ä|ä|ö|ö|Ä|Ä|Ö|Ö|&|&|<|<|>|>" "|"]
 
 ### Require packages
 package require sqlite3
@@ -474,17 +474,19 @@
     if {[llength $umatches] > 0} {
         set uencoding [lindex $umatches 1]
         if {[string length $uencoding] > 3} {
+            regsub -nocase "-" $uencoding "" uencoding
             set uconvert 1
         }
     }
 
+    if {$uconvert == 0} {
+        set uencoding "iso8859-1"
+    }
     set umatches [regexp -nocase -inline -- "(.\*\?)" $udata]
     if {[llength $umatches] > 0} {
         set urlTitle [lindex $umatches 1]
-        if {$uconvert != 0} {
-            if {[catch {set urlTitle [encoding convertfrom $uencoding $urlTitle]} cerrmsg]} {
-                urllog_log "Error in charset conversion: $cerrmsg"
-            }
+        if {[catch {set urlTitle [encoding convertfrom $uencoding $urlTitle]} cerrmsg]} {
+            urllog_log "Error in charset conversion: $cerrmsg"
         }
         set urlTitle [urllog_convert_ent $urlTitle]
         regsub -all "(^ *| *$)" $urlTitle "" urlTitle