Alter.Org.UA  

HTTP Caching

Intro

In general, an effective cache reduces page load time. It was already discussed here, but the main idea is that even on very good, uncongested channels server response time becomes significant. With a 30 ms ping, loading 50 icons takes about 1.5 seconds. When a local cache (proxy) is used, it happens almost immediately. If I use an ISP proxy with a 1 ms ping, it takes about 50 ms, which is also almost immediate. So it is reasonable to cache content on the client side or on an intermediate server close to the client. A local proxy also helps a lot on slow channels (GPRS, DSL). One more benefit: it makes it possible to filter out undesirable content (e.g. banners), which is also rather important on slow channels. See Local Proxy.
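The arithmetic behind these figures is easy to check. A minimal sketch, assuming the icons are fetched one after another and each request costs exactly one round trip (real browsers open several parallel connections, so actual times would be lower); the function name is made up for this example:

```python
def sequential_load_time(objects: int, rtt_ms: float) -> float:
    """Seconds needed to fetch `objects` items one at a time, one round trip each."""
    return objects * rtt_ms / 1000.0


print(sequential_load_time(50, 30))  # distant server, 30 ms ping
print(sequential_load_time(50, 1))   # ISP proxy, 1 ms ping
```

The first call gives the 1.5 seconds quoted above, the second the 50 ms for the ISP proxy.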

ISP-level content caching gives fast access to relatively slow, overloaded, or simply distant resources with high ping. In contrast to the client, an ISP does not need to cache all HTTP traffic. It is much more efficient to distinguish between static or rarely updated resources and dynamic ones. Dynamic content may pass through uncached; it should not affect ISP cache performance. Cached static content saves bandwidth, especially on popular video/audio from youtube, rutube, vkontakte, etc.

The remaining question is what to cache, and how to aggregate different URIs pointing to the same content. Some days ago I performed a little investigation into caching YouTube video.

HTTPS caching

HTTPS is intended for relatively secure transfer of private data. Relatively, because state security officers are able to read it, although that is not simple. Actually, there is not much really private data in the content. Passwords, private messages and anonymous posts really are private. But banners, service JavaScript, design elements and published pictures/video/audio do not seem to need encryption; they are public from the start. HTTPS may help with validation: we can be sure that data was not modified by a man in the middle. At the same time, HTTPS definitely prevents effective caching.

  • for a long time the protocol denied caching of HTTPS data even on the client side. So older browser versions can't cache HTTPS at all, and many newer ones can only after special tuning.
  • there is no way for an ISP to perform transparent caching effectively.
  • caching encrypted content does not seem reasonable, because cache-management and time-to-live options are not available.
  • data compression is inefficient on encrypted data. Encryption must be performed after compression.
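The last point can be illustrated with a small experiment. This is only a sketch: random bytes stand in for ciphertext (an assumption, based on good ciphertext being statistically close to random data), and the sample markup is made up.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Repetitive markup, typical of HTML/CSS/JS.
plain = b"<div class='item'><a href='/page'>link</a></div>\n" * 2000
# Random bytes used as a stand-in for encrypted data (assumption).
encrypted_like = os.urandom(len(plain))

print(f"plain text : {compression_ratio(plain):.3f}")
print(f"ciphertext : {compression_ratio(encrypted_like):.3f}")
```

The repetitive text shrinks to a small fraction of its size, while the random data does not shrink at all, which is why compression has to happen before encryption.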

It would be reasonable to separate private and public content and use https and http respectively. Also, it is possible to send unencrypted content checksum over https. There are some difficulties. Most browsers do not like mixed content. And there is no defined way to transfer checksums.
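A minimal sketch of the checksum idea, assuming SHA-256 as the digest. How the checksum would actually travel inside the HTTPS page is left open, exactly because there is no defined way; the function names here are hypothetical.

```python
import hashlib
import hmac


def content_checksum(body: bytes) -> str:
    """Hex SHA-256 digest of a public resource."""
    return hashlib.sha256(body).hexdigest()


def verify(body: bytes, trusted_checksum: str) -> bool:
    """Check a body fetched over plain HTTP against a checksum received over HTTPS."""
    return hmac.compare_digest(content_checksum(body), trusted_checksum)


# Server side: the checksum would be published over HTTPS (transport is hypothetical).
original = b"public banner image bytes"
checksum = content_checksum(original)

# Client side: the body arrives over HTTP, possibly from an intermediate cache.
print(verify(original, checksum))
print(verify(original + b" modified in transit", checksum))
```

With such a scheme the public body stays cacheable by any proxy, while modification by a man in the middle is still detected.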

I like the way Facebook did it. Authorization and personal settings are carried out over HTTPS, all the rest over HTTP.

Caching YouTube online video

I tried to find out why YouTube video is not cached. It seems that it should be, because each time I play a clip the received content is the same. Squid is configured to cache requests with GET parameters (like http://test.net/data.php?id=12345). But the video is not cached and is downloaded each time. It turned out that

  • several GET parameters are unique for each request. Some statistics and client information are passed there.
  • a different storage server is used each time.
In practice, we need more complicated (e.g. regexp-based) resource identification methods. For example, a YouTube URI looks like:
http://r12---sn-5hn7sb7k.c.youtube.com/videoplayback?algorithm=throttle-factor&
burst=40&cp=U0hUSlZMVl9NU0NONF9ORlpHOjM0RE5Ea0FhNzh0&expire=1355618689&factor=1.25&
fexp=922401%2C920704%2C912806%2C925703%2C928001%2C922403%2C922405%2C929901%2C913605
%2C913546%2C913556%2C920201%2C913302%2C919009%2C914903%2C911116%2C910221%2C901451%2C902556&
gcr=ua&id=d2e4d35a7a16e4c9&ip=62.205.155.193&ipbits=8&itag=35&key=yt1&ms=au&mt=1355594472
&mv=m&newshard=yes&
signature=B09349257DBD3FB48F1F4EC0C062F69E6311B60D.AB3F761505B087D3C0A56C250DEDE2EFB809F228&
source=youtube&sparams=algorithm%2Cburst%2Ccp%2Cfactor%2Cgcr%2Cid%2Cip%2Cipbits%2Citag%2Csource
%2Cupn%2Cexpire&sver=3&upn=kUlkGtNR8a4&ptk=youtube_multi&cpn=CYOwEfaq23k1BTh7
but actually the video data is identified by the following regexp (the id in the sample above is hexadecimal, so the pattern has to accept hex digits)
.*\.youtube\.com\/videoplayback\?.*&id=([0-9a-f]+)&.*
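To show the idea in action, here is a small sketch that extracts the id from a videoplayback URI. The pattern is a slightly stricter variant written for this example, and the two shortened URIs are assumptions made up for illustration (only the first host name and the id come from the sample above).

```python
import re

# Host ending in .youtube.com, the /videoplayback path, and a hexadecimal id parameter.
VIDEO_ID_RE = re.compile(
    r"^https?://[^/]*\.youtube\.com/videoplayback\?(?:.*&)?id=([0-9a-fA-F]+)(?:&|$)"
)


def video_id(uri: str):
    """Return the id parameter of a videoplayback URI, or None if it does not match."""
    m = VIDEO_ID_RE.match(uri)
    return m.group(1) if m else None


# Different storage server and expiry time, same video id.
a = ("http://r12---sn-5hn7sb7k.c.youtube.com/videoplayback"
     "?burst=40&expire=1355618689&id=d2e4d35a7a16e4c9&itag=35")
b = ("http://r7---sn-abcdefgh.c.youtube.com/videoplayback"
     "?burst=40&expire=1355699999&id=d2e4d35a7a16e4c9&itag=35")

print(video_id(a))
print(video_id(a) == video_id(b))
```

Both URIs yield the same identifier, so a cache keyed on it would treat them as one object.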

There is a url_rewrite_program option in squid.conf. It allows an external module to analyze the requested URI and redirect to a previously used source if necessary. This should improve the cache hit probability. It looks possible to write a module which extracts the actual resource identifier from the URI and substitutes a previous URI with the same identifier for the new one. However, there is a small problem: the old URI may expire. Also, cache management flags may prevent the resource from being cached at all. Such a case cannot be handled with url_rewrite_program alone. It seems we need some plugin which can analyze and adjust both the URI and the caching options.

Also, Squid 2.7 has a similar option, storeurl_rewrite_program. It almost solves our task. An external module analyzes the URI and generates a so-called resource identifier, which is used to identify the object inside the local cache.
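A rough sketch of what such an external module could look like. It assumes the helper speaks the same line-oriented protocol as url_rewrite_program (one request per line with the URI as the first field, answered with the replacement URI or an empty line for "no change") and that helper concurrency is off. Keeping itag in the key is also an assumption: it appears to select the video format, so two formats of one clip should not share a cache object. The function names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def store_url(uri: str) -> str:
    """Map a videoplayback URI to a stable cache key, or '' for 'no change'."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    if not host.endswith(".youtube.com") or parts.path != "/videoplayback":
        return ""
    query = parse_qs(parts.query)
    video = query.get("id", [""])[0]
    if not video:
        return ""
    itag = query.get("itag", [""])[0]
    # One internal host name groups all storage servers into a single cache key.
    return f"http://videos.youtube.INTERNAL/videoplayback?id={video}&itag={itag}"


def handle_line(line: str) -> str:
    """Answer one helper request line; the URI is assumed to be the first field."""
    fields = line.split()
    return store_url(fields[0]) if fields else ""


request = ("http://r12---sn-5hn7sb7k.c.youtube.com/videoplayback"
           "?expire=1355618689&id=d2e4d35a7a16e4c9&itag=35 10.0.0.1/- - GET -")
print(handle_line(request))
```

In a real helper, handle_line would be called for every line read from standard input, and the output would be flushed after each answer so that Squid is not left waiting on a buffer.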

Note: there is no such option in Squid 3.x, and there is no estimate for its implementation. It is also difficult to port the code, since 3.0 was completely rewritten in C++ with wide use of classes. Also note that 2.x doesn't support IPv6, while 3.x does.

Almost ready to use code from nabble.com by Lucas Diaz

When I tried to use this, it turned out that some changes are needed in squid.conf to force caching of video streams and pictures. Many sites use cache management options that prevent caching.

#	usage: refresh_pattern [-i] regex min percent max [options]
#	The refresh_pattern lines are checked in the order listed here.
refresh_pattern ^ftp:		1440	20%	10080
refresh_pattern ^gopher:	1440	0%	1440
refresh_pattern youtube.*videoplay  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern youtube.*get_video  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern google.*videoplay   14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern googlevideo.*get_video  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern ytimg\.com\/.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern (mt|kh|pap).*\.google\.com  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern (mt|kh|pap).*\.googleapis\.com  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern s\d+\.dotua\.org\/fsua_items.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern .*static\.video\.yandex\.ru\/swf\/.*&r=.*  14400   90%     24400   ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern vec.*\.maps\.yandex\.net\/tiles\?	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern static.*\.maps\.yandex\.	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern pvec.*\.maps\.yandex\.net	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern lrs\.maps\.yandex\.net\/tiles\?		14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern yandex\.st\/.*(jpg|jpeg|gif|png|ico|mp3|flv|mp4)		14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern static\.video\.yandex\.net\/.*(jpg|jpeg|gif|png|ico|mp3|flv|mp4).*		14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private ignore-auth
refresh_pattern .*ecn\.dynamic.*\.tiles\.virtualearth\.net\/comp   14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern fbcdn\.net.*\.(jpg|jpeg|gif|png|ico|mp3|flv)	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern static\.ak\.fbcdn\.net.*\.(jpg|jpeg|gif|png|ico|mp3|flv)	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern (st|cs)\d+\.vk\.me\/.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern img\d+.slando\.ua\/.*\.(jpg|jpeg|gif|png|ico|mp3|flv)	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern .*s\d*\.staticclassifieds\.com\/static	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern \.vkadre\.ru\/assets\/.*\.(jpg|jpeg|gif|png|ico|mp3|flv|mp4)	14400	90%	20080 ignore-no-cache override-expire override-lastmod ignore-reload ignore-private
refresh_pattern .*\.(css)$	1440	90%	1440 ignore-no-cache override-expire override-lastmod ignore-private
refresh_pattern .*\.(js)$	1440	90%	1440 ignore-private
refresh_pattern -i (/cgi-bin/|\?) 10	20%	120
refresh_pattern .		10	20%	4320
#	see also refresh_pattern for a more selective approach.

cache_mem 400 MB
maximum_object_size_in_memory 4 MB
maximum_object_size 32 MB
read_ahead_gap 64 KB
range_offset_limit -1

and add rules for name-based object joining to rewrite.pl.
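The three numbers in each refresh_pattern rule are min (minutes), percent and max (minutes). As a rough illustration of how they interact, here is a simplified sketch of the freshness decision. It is an approximation based on the algorithm summarized in the squid.conf comments, not Squid's actual code: it ignores explicit Expires headers and every option except override-lastmod, and the function and parameter names are made up for this example.

```python
def is_fresh(age: float, lm_age: float, min_: float, percent: float,
             max_: float, override_lastmod: bool = False) -> bool:
    """Simplified refresh_pattern check; all times are in minutes.

    age    - time the object has spent in the cache
    lm_age - how old the content was when stored (Date minus Last-Modified),
             0 if the server sent no Last-Modified header
    """
    if age > max_:
        return False                      # older than max: always stale
    if lm_age > 0:
        if age < lm_age * percent / 100.0:
            return True                   # within the LM-factor window
        # With override-lastmod, min still wins over the LM-factor verdict.
        return override_lastmod and age < min_
    return age < min_                     # no Last-Modified: only min applies


# YouTube rule from the listing: min 14400, 90%, max 24400, override-lastmod.
print(is_fresh(10000, 60, 14400, 90, 24400, override_lastmod=True))
print(is_fresh(10000, 60, 14400, 90, 24400, override_lastmod=False))
print(is_fresh(30000, 60, 14400, 90, 24400, override_lastmod=True))
```

Under these assumptions a video stored about a week ago is still served from cache only because of the large min value combined with override-lastmod, which is what the long rules in the listing rely on.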


One more solution: yt-cache

Results

URL Joining efficiency (rewrite.pl)
Host | Join-hits | Hits | Total | Efficiency, %
videos.youtube.INTERNAL 18181735368150
vk.me.INTERNAL 8840148918
files.facebook.INTERNAL 33127550
vec.yandex.INTERNAL 018022390
  • Host - internal name of the joined host group.
  • Join-hits - new URLs that could be associated with already cached objects.
  • Hits - the URL was previously cached.

Heroes and anomalies

Domain | Cache/Refresh traffic | Cache/Refresh hits | Total traffic | Total hits | Traffic Efficiency, % | Hit Efficiency, % | Original cache efficiency | Weight
sb.google.com ~1Gb 880 000 ~2Gb 930 000 95560% 48%

Domain | Blocked traffic | Blocked hits | Total traffic | Total hits | Traffic Efficiency, % | Hit Efficiency, % | Original cache efficiency
local.com.ua ~4Mb 3270 ~5Mb 3340 00

Domain | Traffic Efficiency, % | Hit Efficiency, % | Original cache efficiency | Weight
cs*.vk.me 23370% 88%
youtube.com 78940% 4%
st*.vk.me 99980% 1%

Domain | Traffic Efficiency, % | Hit Efficiency, % | Original cache efficiency | Weight
vec*.maps.yandex.net 4544 1%
api-maps.yandex.ru 4544 1%
photos-*.fbcdn.net 21 1%
www.google.com 1077% 1%
en.wikipedia.org 1522%
video*.vkadre.ru 75500% 1%
platform.twitter.com 753737%

History

Squid-related topics are moved to separate page.

2013.09.16

The 1st version.

2012.12.16


2012.12.16, updated 2013.09.16

Mail to alterX@alter.org.ua (remove X)  
designed by Alter aka Alexander A. Telyatnikov, powered by Apache+PHP under FBSD © 2002-2017