cs161 Lecture 25: Content Distribution Networks (Coral)

Users want data, not machines
  Want a content-addressable network
  Ground up: route based on names
    Hierarchical (lots of names!)
    Flat (Chord, p2p in general)
  Use existing points of indirection
    Load balancing
    DNS tricks (Akamai)
    HTTP redirect tricks (Coral, which also does DNS)

What's the high-level problem to be solved?
  You have cooperating caching proxies scattered over the Internet.
  Direct browser to nearest cached copy.
  If not cached nearby, fetch from real server into a nearby cache.

Why is this helpful?
  Might reduce server load.
  Might reduce delay visible to user.

Doesn't Akamai already solve this problem?
  They use extensive network knowledge
  For everyone, without website mods

How might you use Chord?
  1. Simplistic - just moves the hotspot
  2. put(url+num, content)
  3. blocks, not whole files
  4. cache along the lookup
  5. Coral also uses locality

What are the constraints that make it hard?
  No support from browser.
  No support from final server.

What tools are available?
  We only get to see DNS and HTTP requests.
  Assuming "Coralized" names like www.cnn.com.nyucd.edu

What can we achieve with just a bunch of DNS servers for nyucd.edu?
  Browser probably chooses a random DNS server.
  That DNS server can send the browser an A record for one of the proxies.
  But which one?
  Idea 1: if DNS server is close (low ping time) to browser,
    then DNS server can return any proxy close to the DNS server.
    So we'd want to somehow cause browser to use nearby Coral DNS server.
  Idea 2: build a database mapping IP net numbers to nearby proxies.
    Each proxy registers its net number.
    DNS server looks up browser's IP net number to find proxy.
    What about browsers not on the same net as a proxy?
      Might still be a nearby proxy.

How does Coral cause browser to use a nearby Coral DNS server?
  L2.L1.L0 trick to have one chance per hierarchy level
    By giving a response for, say, L1.L0.nyud.net, client "locks in"
  nodes(level, count, target) to find good "next" DNS server
  traceroute and hints in DHT to implement nodes()
  join bigger clusters (oscillations defeated by preference function)
  Why CAN'T we get directed to a proxy that has the content?

How does Coral find a nearby cached copy of a URL?
  Inserts/Deletes (search the high level clusters first (no penalty))
  "sloppy" part gets rid of hot spots
  hierarchical part gives you nearby nodes

What does Coral store in the DHT?
  router IP addresses (found w/ traceroute) -> nearby proxy
  24-bit IP prefixes -> nearby proxy
  URL -> proxy

If browser is at Brown, and nearest proxy is at RISD, will we find it?
  5 hops to www.cnn.com takes us to...

Does Coral handle flash crowds (very popular URLs) well?
  What might go wrong?
    Every proxy fetches the URL direct from server.
    DHT hot-spots.
  What does Coral do about it?

What DHT techniques did they use?
  Hierarchy for locality.
  Why don't they just cache along the path?

How do they choose clusters?
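
A rough sketch of the "24-bit IP prefixes -> nearby proxy" and "URL -> proxy" mappings, and of searching the closest cluster before wider ones. This is a toy under stated assumptions, not Coral's actual code: `prefix24`, `ClusterIndex`, `lookup`, `max_per_key`, and the proxy names are all made up for illustration, a plain dict stands in for each cluster's DHT, and the per-key cap only hints at the "sloppy" idea of refusing to pile more values onto one key.

```python
import ipaddress


def prefix24(ip):
    """Return the 24-bit network prefix of an IPv4 address as a string key."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


class ClusterIndex:
    """Toy stand-in for one cluster's DHT: key -> list of proxy names."""

    def __init__(self, max_per_key=4):
        self.table = {}
        self.max_per_key = max_per_key  # hypothetical cap, hints at "sloppy" storage

    def put(self, key, proxy):
        """Store proxy under key; return False if the key is already full."""
        values = self.table.setdefault(key, [])
        if proxy in values:
            return True
        if len(values) >= self.max_per_key:
            return False
        values.append(proxy)
        return True

    def get(self, key):
        return list(self.table.get(key, []))


def lookup(indexes, key):
    """Search clusters from closest to global; return (position, proxies).

    Position is the index into `indexes` of the first cluster with a hit,
    or None if no cluster knows the key.
    """
    for pos, index in enumerate(indexes):
        proxies = index.get(key)
        if proxies:
            return pos, proxies
    return None, []


if __name__ == "__main__":
    local, regional, world = ClusterIndex(), ClusterIndex(), ClusterIndex()
    # A proxy registers its own net number, then a URL it has cached.
    regional.put(prefix24("128.148.32.17"), "proxy-risd")
    regional.put("http://www.cnn.com/", "proxy-risd")
    world.put("http://www.cnn.com/", "proxy-far")
    print(lookup([local, regional, world], "http://www.cnn.com/"))
```

Ordering the list from the closest cluster outward means a nearby copy wins whenever one exists, and the global cluster is consulted only as a fallback.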