Alter.Org.UA

url_rewrite -- Win32 helper for Squid storeurl_rewrite_program

About

As discussed in the YouTube caching article, different URLs often point to the same resource. Squid has a special option, storeurl_rewrite_program, which names a helper program that tells Squid whether a given URL refers to data already cached under another name. This reduces unnecessary network requests. The helper generates a resource-unique key for each URL by extracting its significant part, which can be treated as a kind of URL normalization. The main idea is that each resource has its own unique key, regardless of insignificant variations in the URL. This technique is needed for sites that have mirrors and/or use CDNs, and for sites that generate references to resources with extra (but insignificant) client information.

Examples

http://vec04.maps.yandex.net/tiles?l=map&v=2.39.0&x=1197&y=693&z=11&lang=uk_UA
http://vec01.maps.yandex.net/tiles?l=map&v=2.39.0&x=1197&y=693&z=11&lang=uk_UA
http://vec01.maps.yandex.net/tiles?l=map&x=1197&y=693&z=11&lang=uk_UA&v=2.39.0

http://r12---sn-5hn7sb7k.c.youtube.com/videoplayback?algorithm=.....
...&id=d2e4d35a7a16e4c9&ip=162.25.15.93&....
http://r14---sn-57re8b9k.d.youtube.com/videoplayback?algorithm=.....
...&id=d2e4d35a7a16e4c9&ip=205.2.151.19&....
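
The idea behind the Yandex tile examples above can be sketched in a few lines of Python. This is only an illustration of the normalization concept, not how url_rewrite.exe works internally. The function name tile_key and the list of significant parameters are assumptions made for this sketch; the key prefix mirrors the style used in the sample rules.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: these query parameters identify a tile; the mirror number
# in the host name and the "v" parameter are treated as insignificant.
SIGNIFICANT = ("l", "x", "y", "z", "lang")


def tile_key(url):
    """Return a mirror-independent key for a Yandex tile URL, or None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith(".maps.yandex.net") or parts.path != "/tiles":
        return None
    query = parse_qs(parts.query)
    try:
        fields = ["%s=%s" % (name, query[name][0]) for name in SIGNIFICANT]
    except KeyError:
        return None  # a significant parameter is missing
    return "squid://vec.yandex.INTERNAL/" + "&".join(fields)


urls = [
    "http://vec04.maps.yandex.net/tiles?l=map&v=2.39.0&x=1197&y=693&z=11&lang=uk_UA",
    "http://vec01.maps.yandex.net/tiles?l=map&v=2.39.0&x=1197&y=693&z=11&lang=uk_UA",
    "http://vec01.maps.yandex.net/tiles?l=map&x=1197&y=693&z=11&lang=uk_UA&v=2.39.0",
]
keys = {tile_key(u) for u in urls}
print(keys)
```

All three example URLs collapse to a single key, so the tile only needs to be stored once.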

Usage

Unpack the archive and copy url_rewrite.exe to C:\Squid\libexec

For better performance, we should not pass every request to the helper program. Instead, keep a list of the resources that need normalization. Each acl must be written on a single line without breaks. It is also recommended to use simple regexps here, again for performance.

c:\Squid\etc\squid.conf
acl store_rewrite_list url_regex -i \.youtube\.com\/get_video\?
acl store_rewrite_list url_regex -i \.youtube\.com\/videoplay.*
acl store_rewrite_list url_regex -i \.youtube\.[a-z][a-z]\/videoplayback
acl store_rewrite_list url_regex -i \.youtube\.[a-z][a-z]\/get_video\?
acl store_rewrite_list url_regex -i \.googlevideo\.com\/videoplayback\?
acl store_rewrite_list url_regex -i \.googlevideo\.com\/get_video\?
acl store_rewrite_list url_regex -i \.google\.com\/videoplayback
acl store_rewrite_list url_regex -i \.google\.com\/get_video\?
acl store_rewrite_list url_regex -i \.google\.[a-z][a-z]\/videoplayback
acl store_rewrite_list url_regex -i \.google\.[a-z][a-z]\/get_video\?
acl store_rewrite_list url_regex -i \.ytimg\.com\/.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)
acl store_rewrite_list url_regex -i \.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\/(videoplayback|videoplay|get_video)\?
acl store_rewrite_list url_regex -i (kh|mt)(.?)\.google\.com
acl store_rewrite_list url_regex -i maps\.googleapis\.com
acl store_rewrite_list url_regex -i mt\d+.googleapis\.com
acl store_rewrite_list url_regex -i vec([0-9]+)\.maps\.yandex\.net\/tiles
acl store_rewrite_list url_regex -i pvec([0-9]+)\.maps\.yandex\.net
acl store_rewrite_list url_regex -i static\.video\.yandex\.net\/(.*)\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4|wmw|avi|mpg|mpeg)
acl store_rewrite_list url_regex -i tub-ua\.yandex\.net
acl store_rewrite_list url_regex -i s\d+\.dotua\.org\/fsua_items.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)
acl store_rewrite_list url_regex -i fbcdn\.net.*\.(jpg|jpeg|gif|png|ico)
acl store_rewrite_list url_regex -i cdn(.?)/[0-9a-zA-Z_-]*.?\.flv
acl store_rewrite_list url_regex -i st(.*)\.userapi\.com
acl store_rewrite_list url_regex -i ecn\.dynamic\..*\.tiles\.virtualearth\.net\/comp
acl store_rewrite_list url_regex -i static\.video\.yandex\.ru\/swf
acl store_rewrite_list url_regex -i video\.meta\.ua\/players
acl store_rewrite_list url_regex -i \.vkadre\.ru\/.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)
acl store_rewrite_list url_regex -i (st|cs)(.?)\.vk\.me\/.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)
acl store_rewrite_list url_regex -i img\d+.slando\.ua\/.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)
acl store_rewrite_list url_regex -i .*s\d*\.staticclassifieds\.com\/static
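
Before restarting Squid, it can help to check which patterns from the list catch a given URL. The sketch below does this with Python's re module for three patterns copied from the list above. Squid uses its own regex library, so Python's matching is only an approximation, and the function name matching_acl_patterns is made up for this example.

```python
import re

# A few patterns copied from store_rewrite_list above.
ACL_PATTERNS = [
    r"\.youtube\.com\/videoplay.*",
    r"vec([0-9]+)\.maps\.yandex\.net\/tiles",
    r"fbcdn\.net.*\.(jpg|jpeg|gif|png|ico)",
]

# re.IGNORECASE plays the role of the -i flag of url_regex.
COMPILED = [re.compile(p, re.IGNORECASE) for p in ACL_PATTERNS]


def matching_acl_patterns(url):
    """Return the source patterns that match anywhere in the URL."""
    return [c.pattern for c in COMPILED if c.search(url)]


print(matching_acl_patterns("http://vec01.maps.yandex.net/tiles?l=map&x=1197"))
print(matching_acl_patterns("http://example.com/index.html"))
```

A URL that matches no pattern would not be passed to the helper at all.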

Define the rewrite rules in a separate file. The config uses Perl regexps. You can find a sample rewrite.pl in the archive. The sample config is generated from a Perl helper script that is used on Unix systems for the same purpose.
Format: @<regexp>@<replacement>@
If you update the rules, don't forget to update store_rewrite_list too.

c:\Squid\etc\rewrite.pl
        =~s@^http://(.*/videoplayback\?.*redirect=yes&.*)@http://$1@){}
        =~s@^http://(.*)/videoplayback\?(.*)&id=([a-zA-Z\d\-_]+)&.*(&range=\d*-\d*)&.*@squid://videos.youtube.INTERNAL/ID=$3$4@){}
        =~s@^http://(.*)/videoplayback\?(.*)&id=([a-zA-Z\d\-_]+)&.*@squid://videos.youtube.INTERNAL/ID=$3@){}
        =~s@^http://(.*)\.ytimg\.com\/(.*)@squid://ytimg.INTERNAL/ID=$2@){}
        =~s@^http://(.*)/videoplay\?(.*)&id=([a-zA-Z\d\-_]+)&.*(&range=\d*-\d*)&.*@squid://videos.youtube.INTERNAL/ID=$3$4@){}
        =~s@^http://(.*)/videoplay\?(.*)&id=([a-zA-Z\d\-_]+)&.*@squid://videos.youtube.INTERNAL/ID=$3@){}
        =~s@^http://(.*)/videoplay\?(.*)&id=([a-zA-Z\d\-_]+)$@squid://videos.youtube.INTERNAL/ID=$3@){}
        =~s@^http://(.*)/get_video\?(.*)video_id=([a-zA-Z\d\-_]+)&.*@squid://videos.youtube.INTERNAL/ID=$3@){}
        =~s@^http://(.*)/get_video\?(.*)video_id=([a-zA-Z\d\-_]+)@squid://videos.youtube.INTERNAL/ID=$3@){}
        =~s@^http://(.*)rapidshare(.*)/files/(.*)/(.*)/(.*)@squid://files.rapidshare.INTERNAL/$5@){}
        =~s@^http://(.*)fbcdn\.net\/(.*)\/([sp]200x200)/([a-zA-Z\d\-_]*)\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)@squid://files.facebook.INTERNAL/sz$4.$5@){}
        =~s@^http://(.*)fbcdn\.net\/(.*)\/([sp]\d*x\d*/[a-zA-Z\d\-_]*)\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)@squid://files.facebook.INTERNAL/$3.$4@){}
        =~s@^http://(.*)fbcdn\.net\/(.*)\/(.*)/([a-zA-Z\d\-_]*)\.(jpg|jpeg|gif|png|ico|mp3|flv)@squid://files.facebook.INTERNAL/$4.$5@){}
        =~s@^http://(.*)fbcdn\.net/(.*)\/([a-zA-Z\d\-_]*)\.(jpg|jpeg|gif|png|ico|mp3|flv)@squid://files.facebook.INTERNAL/$3.$4@){}
        =~s@^http://(kh|mt)(.*)\.google\.com/(.*)\/(.*)(&s=.*)@squid://$1.google.maps.INTERNAL/$3/$4@){}
        =~s@^http://(kh|mt)(.*)\.googleapis\.com/(.*)\/(.*)(&s=.*)@squid://$1.googleapis.maps.INTERNAL/$3/$4@){}
        =~s@^http://st(.*)\.userapi\.com/(.*)\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)@squid://userapi.com.INTERNAL/$2.$3@){}
        =~s@^http://contenidos2(.*)/(.*)@squid://files.contenidos2.INTERNAL/$2@){}
        =~s@^http://cdn(.*)/([0-9a-zA-Z_-]*?\.flv)@squid://files.cdn.INTERNAL/$2@){}
        =~s@^http://web.vxv.com/data/media/(.*)@squid://files.vxv.INTERNAL/$1@){}
        =~s@^http://ecn\.dynamic\.t(\d*).tiles\.virtualearth\.net\/comp/ch/(.*)@squid://bing.maps.INTERNAL/$2@){}
        =~s@^http://(.*)megaupload\.com/files/(.*)/(.*)@squid://files.megaupload.INTERNAL/$3@){}
        =~s@^http://(.*)mediafire\.com/(.*)/(.*)@squid://files.mediafire.INTERNAL/$3@){}
        =~s@^http://(.*)depositfiles\.com/(.*)/(.*)/(.*)@squid://files.depositfiles.INTERNAL/$4@){}
        =~s@^http://(.*)\.files\.youporn\.com\/(.*)\/([0-9a-zA-Z_-]*?\.flv)\?.*@squid://videos.youporn.INTERNAL/$3@){}
        =~s@^http://(.*)\.files\.youporn\.com\/(.*)\/([0-9a-zA-Z_-]*?\.flv)@squid://videos.youporn.INTERNAL/$3@){}
        =~s@^http://(.*)\.tube8\.com\/(.*)\/([0-9a-zA-Z_-]*?\.flv)\?.*@squid://videos.tube8.INTERNAL/$3@){}
        =~s@^http://(.*)\.tube8\.com\/(.*)\/([0-9a-zA-Z_-]*?\.flv)@squid://videos.tube8.INTERNAL/$3@){}
        =~s@^http://(.*)megaporn\.com\/files\/(.*)\/(.*)@squid://files.megaporn.INTERNAL/$3@){}
        =~s@^http://static\.video\.yandex\.ru\/swf\/.*&r=(\d*).*@squid://video.yandex.INTERNAL/$1@){}
        =~s@^http://vec0\d\.maps\.yandex\.net\/tiles\?l=map&v=([\d\.]*)&x=(\d*)&y=(\d*)&z=(\d*)&g=.*@squid://vec.yandex.INTERNAL/l=map&x=$2&y=$3&z=$4&lang=uk_UA@){}
        =~s@^http://vec0\d\.maps\.yandex\.net\/tiles\?l=map&v=([\d\.]*)&x=(\d*)&y=(\d*)&z=(\d*)&lang=(.*)@squid://vec.yandex.INTERNAL/l=map&x=$2&y=$3&z=$4&lang=$5@){}
        =~s@^http://\d+\.pvec\.maps\.yandex\.net\/(.*)@squid://pvec.yandex.INTERNAL/$1@){}
        =~s@^http://static\.video\.yandex\.net\/(.*)\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4|wmw|avi|mpg|mpeg)\?*.*@squid://static.video.yandex.yandex.INTERNAL/$1.$2@){}
        =~s@^http://video\.meta\.ua\/players\/video\/.*fileID=([\da-zA-Z]*)&.*@squid://video.meta.ua.INTERNAL/$1@){}
        =~s@^http://s\d+\.dotua\.org\/fsua_items(.*)\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)@squid://dotua.org.INTERNAL/fsua/$1.$2@){}
        =~s@^http://img\d+.slando\.ua\/(.*)\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)@squid://img.slando.ua.INTERNAL/$1.$2@){}
        =~s@^http://(.*)\.s\d*\.staticclassifieds\.com\/static(.*)@squid://staticclassifieds.com.INTERNAL/$1/$2@){}
        =~s@^http://(st|cs)\d+\.vk\.me\/(.*)\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4).*@squid://$1.vk.me.INTERNAL/$2.$3@){}
        =~s@^http://video\d*\.vkadre\.ru\/assets\/(.*)\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)@squid://vkadre.ru.INTERNAL/$1.$2@){}
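
To test rules without Squid, the @<regexp>@<replacement>@ format can be emulated in Python. The sketch below parses two rule lines copied from the sample above, converts Perl-style $N references to Python's \g<N> form, and returns the result of the first matching rule. The "first match wins" order and the names parse_rules and store_key are assumptions of this sketch, and Python's re is close to, but not identical with, PCRE.

```python
import re

# Two rule lines copied verbatim from the sample rewrite.pl above.
RULES_TEXT = r"""
        =~s@^http://(.*)/videoplayback\?(.*)&id=([a-zA-Z\d\-_]+)&.*@squid://videos.youtube.INTERNAL/ID=$3@){}
        =~s@^http://(.*)\.ytimg\.com\/(.*)@squid://ytimg.INTERNAL/ID=$2@){}
"""


def parse_rules(text):
    """Turn rule lines into (compiled_pattern, replacement_template) pairs."""
    rules = []
    for line in text.splitlines():
        parts = line.strip().split("@")
        if len(parts) != 4 or parts[0] != "=~s":
            continue  # not a rule line
        pattern, replacement = parts[1], parts[2]
        # Escape literal backslashes, then map $1 -> \g<1> for re.sub().
        template = re.sub(r"\$(\d+)", r"\\g<\1>",
                          replacement.replace("\\", "\\\\"))
        rules.append((re.compile(pattern), template))
    return rules


def store_key(url, rules):
    """Apply the first matching rule; return the URL unchanged otherwise."""
    for pattern, template in rules:
        key, count = pattern.subn(template, url, count=1)
        if count:
            return key
    return url


rules = parse_rules(RULES_TEXT)
a = "http://r12---sn-5hn7sb7k.c.youtube.com/videoplayback?algorithm=x&id=d2e4d35a7a16e4c9&ip=162.25.15.93&key=1"
b = "http://r14---sn-57re8b9k.d.youtube.com/videoplayback?algorithm=x&id=d2e4d35a7a16e4c9&ip=205.2.151.19&key=1"
print(store_key(a, rules))
print(store_key(b, rules))
```

Both mirror URLs should produce the same squid://videos.youtube.INTERNAL key, since only the id parameter is kept.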

Declare use of the helper program only for the listed URLs. Since we work under Windows, doubled backslashes should be used in the helper program parameters to specify the path to the rewrite rules.

storeurl_access allow store_rewrite_list
storeurl_access deny all
storeurl_rewrite_program C:/Squid/libexec/url_rewrite.exe C:\\Squid\\etc\\rewrite.pl
storeurl_rewrite_children 5
storeurl_rewrite_concurrency 10 # since v1.3
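
For readers curious about what Squid expects from such a helper, here is a minimal Python sketch of the line-based exchange. It assumes the layout described in the Squid 2.7 documentation: one request per line with the URL as the first field, and, when concurrency is enabled, a channel ID in front of both the request and the reply. It is not the source of url_rewrite.exe, and the names handle_line and serve are hypothetical.

```python
import io


def handle_line(line, concurrent, rewrite):
    """Build the reply for one request line.

    Assumed request layout: [channel-ID] URL client_ip/fqdn user method ...
    """
    fields = line.split()
    if not fields:
        return ""
    if concurrent:
        if len(fields) < 2:
            return fields[0]  # channel ID only, nothing to rewrite
        channel, url = fields[0], fields[1]
        return "%s %s" % (channel, rewrite(url))
    return rewrite(fields[0])


def serve(inp, out, concurrent, rewrite):
    """Answer requests line by line, flushing after every reply."""
    for line in inp:
        out.write(handle_line(line, concurrent, rewrite) + "\n")
        out.flush()


# Demo with in-memory streams and a trivial rewrite function.
demo_in = io.StringIO("0 http://vec01.maps.yandex.net/tiles?l=map 10.0.0.1/- - GET -\n")
demo_out = io.StringIO()
serve(demo_in, demo_out, True, lambda url: url.replace("vec01", "vec"))
print(demo_out.getvalue(), end="")
```

A real helper would call serve with sys.stdin and sys.stdout. Flushing after each reply matters because Squid waits for the answer before continuing with that request.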

It is also necessary to enable forced caching for the following resources:

c:\Squid\etc\squid.conf
#	usage: refresh_pattern [-i] regex min percent max [options]
#	The refresh_pattern lines are checked in the order listed here.
refresh_pattern ^ftp:		1440	20%	10080
refresh_pattern ^gopher:	1440	0%	1440
refresh_pattern youtube.*videoplay  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern youtube.*get_video  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern google.*videoplay   14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern googlevideo.*get_video  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern ytimg\.com\/.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern (mt|kh|pap).*\.google\.com  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern (mt|kh|pap).*\.googleapis\.com  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern s\d+\.dotua\.org\/fsua_items.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern .*static\.video\.yandex\.ru\/swf\/.*&r=.*  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern vec.*\.maps\.yandex\.net\/tiles\?	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern static.*\.maps\.yandex\.	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern pvec.*\.maps\.yandex\.net	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern lrs\.maps\.yandex\.net\/tiles\?		14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern yandex\.st\/.*(jpg|jpeg|gif|png|ico|mp3|flv|mp4)		14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern static\.video\.yandex\.net\/.*(jpg|jpeg|gif|png|ico|mp3|flv|mp4).*		14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern .*ecn\.dynamic.*\.tiles\.virtualearth\.net\/comp   14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern fbcdn\.net.*\.(jpg|jpeg|gif|png|ico|mp3|flv)	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern static\.ak\.fbcdn\.net.*\.(jpg|jpeg|gif|png|ico|mp3|flv)	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern (st|cs)\d+\.vk\.me\/.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern img\d+.slando\.ua\/.*\.(jpg|jpeg|gif|png|ico|mp3|flv)	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern .*s\d*\.staticclassifieds\.com\/static	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern \.vkadre\.ru\/assets\/.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern .*\.(css)$	1440	90%	1440 ignore-no-cache override-expire override-lastmod ignore-private
refresh_pattern .*\.(js)$	1440	90%	1440 ignore-private
refresh_pattern -i (/cgi-bin/|\?) 10	20%	120
refresh_pattern .		10	20%	4320
#	see also refresh_pattern for a more selective approach.
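
The min, percent and max fields are easier to read with a small calculation. The sketch below is a simplified reading of the refresh_pattern rules as given in the Squid documentation, with all times in minutes. It ignores the Expires header and all of the override and ignore options, so treat it as an aid for the numbers rather than as Squid's actual algorithm. The function name is_fresh is made up.

```python
def is_fresh(age, resource_age, min_age, percent, max_age):
    """Simplified refresh_pattern check; all times are in minutes.

    age:          time since the object was fetched
    resource_age: time between Last-Modified and the fetch, or None
    """
    if age > max_age:
        return False  # older than max: always stale
    if resource_age:
        # LM-factor rule: fresh while the age is a small enough
        # fraction of the resource's age at fetch time.
        if age / resource_age < percent / 100.0:
            return True
    return age <= min_age  # within min: fresh without revalidation


DAY = 24 * 60  # minutes per day; 14400 minutes is 10 days

# Yandex tile pattern from above: 14400 90% 20080
print(is_fresh(5 * DAY, None, 14400, 90, 20080))
print(is_fresh(12 * DAY, None, 14400, 90, 20080))
print(is_fresh(12 * DAY, 30 * DAY, 14400, 90, 20080))
print(is_fresh(15 * DAY, 30 * DAY, 14400, 90, 20080))
```

With these values, a tile is served from cache for at least 10 days, and for at most 20080 minutes (just under 14 days) when the LM-factor rule applies.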

Restart Squid and enjoy.

Download

Ready to use.
No additional modules/libraries required. Statically linked with PCRE 7.2.

squid_url_rewrite_v1d.rar/tgz (113.1 Kb/168.9 Kb)

History

Added the -v option, which enables verbose output on init. By default, no extra messages are now printed while reading the config.

squid_url_rewrite_v1d.rar/tgz (113.1 Kb/168.9 Kb)
2013.09.24

Added storeurl_rewrite_concurrency support, see squid.conf and Squid Wiki

squid_url_rewrite_v1c.rar/tgz (113 Kb/168.8 Kb)
2013.09.24

Fixed a bug with support for "large" rulesets (more than 32 rules).

squid_url_rewrite_v1b.rar/tgz (110.6 Kb/166.4 Kb)
2013.09.01

The first version. It turned out that it cannot handle more than 32 rules in the config.

squid_url_rewrite_v1.rar/tgz (109.7 Kb/165.5 Kb)
2012.12.31


Please send your comments and suggestions here: FB or mail alterX@alter.org.ua (remove X)

designed by Alter aka Alexander A. Telyatnikov powered by Apache+PHP under FBSD © 2002-2024