What is a Content Delivery Network (CDN)?
This article takes a deep dive into what a Content Delivery Network is and how it helps deliver content.
Introduction
A CDN, or Content Delivery Network, is a group of geographically distributed servers that speed up the delivery of web content by bringing it closer to where users are.
“Geographically distributed servers” means there are servers throughout the world: for example, CDN servers installed in the US region, the Asia region, and other regions, depending on where your traffic comes from.
But why do we need a CDN?
If every request to load the website goes to the application servers, users who are close to those servers get faster responses (lower latency) because the request travels a shorter distance. Users who are far from the application servers face a poor experience because of high latency when loading the website.
Most of the content a website sends back is static: icons, HTML/CSS files, images, videos, and other data that does not change based on who is requesting the page.
Since this static content needs no processing on the servers and is returned to the end user as-is, we can install a CDN close to users to store it, so that the website loads with as little latency as possible.
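To put rough numbers on why distance matters: light in optical fiber travels at roughly 200,000 km/s, about 200 km per millisecond. The sketch below computes the theoretical minimum round-trip time for a given one-way distance. The two distances are illustrative assumptions, and real latency is higher because of indirect routes, queuing, and the several round trips needed for DNS lookups and TCP/TLS handshakes.

```python
# Approximate speed of light in optical fiber: ~200,000 km/s, i.e. ~200 km per ms.
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time: the signal travels there and back."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Illustrative distances (assumptions, not measurements).
far_origin_km = 12_000  # e.g. a user on another continent from the origin
near_edge_km = 500      # e.g. the same user reaching a nearby edge server

print(min_rtt_ms(far_origin_km))  # 120.0
print(min_rtt_ms(near_edge_km))   # 5.0
```

Because a page load usually needs several round trips, this per-trip gap is paid multiple times before the user sees anything.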
Impact?
By bringing the website’s content closer to the end consumer, activities like watching a movie, playing songs, or checking social media posts happen with low latency, providing a high-performance web experience.
Remember: lower latency when serving web traffic means users are more likely to stay on the website and convert into paying customers.
Glossary
“Origin Servers”: These servers are the source of truth for any content. They contain the original version of the content that needs to be delivered to the user. An origin server may be owned or managed by the content provider, or it may be hosted by third-party providers such as AWS S3 or Google Cloud Storage.
“Edge Servers”: These servers are spread across many geographical locations throughout the world, known as points of presence (PoPs). Edge servers hold content cached from the origin server and are responsible for serving this cached content to end users. CDN edge servers are owned or managed by CDN providers such as Akamai, Amazon CloudFront, and Cloudflare.
Edge servers can get content from the origin servers in two ways: through a “push strategy”, where the origin servers push the content to be cached to the edge servers ahead of time (for example, at regular intervals), or through a “pull strategy”, where an edge server requests content from the origin servers on demand, typically the first time a user asks for something it has not cached yet.
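The two strategies can be sketched as a toy edge cache. This is a minimal illustration, not any CDN’s real implementation: `fetch_from_origin` is a hypothetical stand-in for an HTTP request to the origin, and the time-to-live (`ttl_seconds`) expiry is an assumed policy chosen to show when a pull goes back to the origin.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy edge-server cache that can be filled by push or by pull.

    `fetch_from_origin` is a hypothetical stand-in for a request to the
    origin server; `ttl_seconds` is an assumed expiry policy.
    """

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.store: Dict[str, Tuple[bytes, float]] = {}  # path -> (content, expires_at)
        self.origin_fetches = 0  # how many requests actually reached the origin

    def push(self, path: str, content: bytes) -> None:
        """Push strategy: the origin places content on the edge ahead of time."""
        self.store[path] = (content, self.clock() + self.ttl_seconds)

    def get(self, path: str) -> bytes:
        """Pull strategy: serve from cache, fetching from the origin on a miss or expiry."""
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: the origin is not contacted
        content = self.fetch_from_origin(path)  # cache miss or stale entry
        self.origin_fetches += 1
        self.store[path] = (content, now + self.ttl_seconds)
        return content


# Usage with a fake clock so expiry is deterministic.
fake_now = [0.0]
origin_files = {"/logo.png": b"<png bytes>"}
cache = EdgeCache(lambda p: origin_files[p], ttl_seconds=60, clock=lambda: fake_now[0])

cache.get("/logo.png")       # miss: pulled from the origin
cache.get("/logo.png")       # hit: served from the edge
fake_now[0] = 61.0
cache.get("/logo.png")       # expired: pulled again
cache.push("/style.css", b"body{}")
cache.get("/style.css")      # hit: content was pushed in advance
print(cache.origin_fetches)  # 2
```

Pull keeps the edge populated only with what users actually request, at the cost of a slow first request; push avoids that first miss but requires the origin to decide ahead of time what to distribute.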
How does a CDN work?
There are two popular ways to route users to a CDN’s servers: 1) dynamic DNS resolution and 2) Anycast.
To keep it simple, I’ll walk through how dynamic DNS resolution routes a request through a CDN and delivers the content to the end user’s machine. Here are the steps:
1. A user requests content by entering the website name in the browser, let’s say “www.example.com”.
2. The browser first checks its own cache and the operating system’s cache for the IP address of “www.example.com”. If the address is cached, the browser requests the content from that IP address directly. Otherwise, the operating system’s resolver asks the computer’s configured DNS name server (a recursive resolver, often run by the ISP) for the address.
3. If that DNS name server has no cached answer either, it queries a ROOT server, which returns the IP addresses of the top-level domain (TLD) servers, for example the .com, .net, or .org servers.
4. The DNS name server then queries the .com TLD servers, which return the IP addresses of the “example.com” authoritative name servers.
5. Companies that want to serve traffic through a CDN configure their website’s authoritative name servers to point to the CDN, typically with a CNAME (alias) record. So, instead of returning the IP address of the origin servers that host “www.example.com”, the “example.com” authoritative name servers return the CDN provider’s hostname, “www.example.cdnprovider.com”. If they have no record for the requested name, they return an error saying the query cannot be answered.
6. The DNS name server follows this alias and now needs the IP address of “www.example.cdnprovider.com”.
7. It queries the .com TLD servers, which return the IP addresses of the “cdnprovider.com” authoritative name servers. This time the request skips the ROOT servers, because the DNS name server already cached the .com TLD server addresses during the first lookup.
8. The DNS name server asks the “cdnprovider.com” authoritative name servers (ANS) for the IP address of “www.example.cdnprovider.com”.
9. The ANS uses a geolocation database to determine which edge server is closest to the user and returns the IP address of that edge server.
10. The user’s browser connects to the CDN’s edge server over HTTP (in practice, usually HTTPS) and gets the desired content.
That is how CDNs are usually incorporated to serve the majority of a website’s traffic.
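The resolution steps above can be modeled with a toy iterative resolver. This is a simplified sketch under stated assumptions: the server names, the record layout, and the address 203.0.113.10 (a documentation-only range) are hypothetical, and real resolvers exchange DNS messages over the network, honor TTLs, and use NS and glue records rather than the `("REFER", …)` shortcut used here.

```python
from typing import Dict, List, Tuple

# Hypothetical DNS data. ("REFER", server) stands in for an NS referral,
# ("CNAME", name) for an alias, and ("A", ip) for a final address.
SERVERS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root": {"com.": ("REFER", "com-tld")},
    "com-tld": {
        "example.com.": ("REFER", "example-ans"),
        "cdnprovider.com.": ("REFER", "cdnprovider-ans"),
    },
    # The site's authoritative servers alias www to the CDN's hostname.
    "example-ans": {"www.example.com.": ("CNAME", "www.example.cdnprovider.com.")},
    # A real CDN ANS would pick this address per client location.
    "cdnprovider-ans": {"www.example.cdnprovider.com.": ("A", "203.0.113.10")},
}


def in_zone(name: str, zone: str) -> bool:
    return name == zone or name.endswith("." + zone)


class ToyResolver:
    """Iterative resolver that caches referrals, like the user's DNS name server."""

    def __init__(self) -> None:
        self.referrals: Dict[str, str] = {}  # zone -> server responsible for it
        self.queries: List[str] = []  # which servers were contacted, in order

    def _start_server(self, name: str) -> str:
        # Begin at the most specific cached zone; fall back to the root.
        known = [zone for zone in self.referrals if in_zone(name, zone)]
        return self.referrals[max(known, key=len)] if known else "root"

    def resolve(self, name: str, max_steps: int = 10) -> str:
        server = self._start_server(name)
        for _ in range(max_steps):
            self.queries.append(server)
            records = SERVERS[server]
            matches = [zone for zone in records if in_zone(name, zone)]
            if not matches:
                raise LookupError(f"{server} cannot answer {name}")
            zone = max(matches, key=len)
            rtype, value = records[zone]
            if rtype == "REFER":
                self.referrals[zone] = value  # remember who serves this zone
                server = value
            elif rtype == "CNAME":
                return self.resolve(value, max_steps)  # follow the alias
            else:
                return value  # "A" record: the final IP address
        raise LookupError(f"too many steps resolving {name}")


resolver = ToyResolver()
print(resolver.resolve("www.example.com."))  # 203.0.113.10
print(resolver.queries)
# ['root', 'com-tld', 'example-ans', 'com-tld', 'cdnprovider-ans']
```

The query log shows the point of step 7: after following the CNAME, the resolver goes straight to the cached .com TLD servers instead of asking the root again.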
Which edge server is closest to the user?
In the walkthrough of how content gets served to the user’s machine, the main question becomes: how does the CDN provider know which edge server is closest to the user?
When the user’s DNS name server asks for the IP address of “www.example.cdnprovider.com”, the query reaches the CDN provider’s authoritative name server (ANS) for “cdnprovider.com”. The ANS typically sees the IP address of the user’s DNS name server rather than the user’s own address, so it looks that address up in an IP geolocation table and returns the IP address of the closest CDN edge server, assuming the user is near their DNS name server. Some DNS name servers also pass along part of the client’s IP address through the EDNS Client Subnet extension, which lets the ANS locate the user more precisely.
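A minimal sketch of that decision, assuming a tiny hand-written geolocation table keyed by resolver IP prefixes and choosing the edge server by great-circle distance. All prefixes, coordinates, and edge IPs are hypothetical (taken from documentation-only address ranges), and real CDNs typically weigh more than distance, such as measured latency and server load.

```python
import ipaddress
import math
from typing import Dict, List, Optional, Tuple

Coords = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical IP-geolocation table keyed by DNS resolver prefixes.
GEO_TABLE: List[Tuple[ipaddress.IPv4Network, Coords]] = [
    (ipaddress.IPv4Network("198.51.100.0/24"), (40.7, -74.0)),  # New York area
    (ipaddress.IPv4Network("192.0.2.0/24"), (35.7, 139.7)),     # Tokyo area
]

# Hypothetical edge servers: IP address -> location of the PoP.
EDGES: Dict[str, Coords] = {
    "203.0.113.10": (39.0, -77.5),  # Virginia
    "203.0.113.20": (35.7, 139.7),  # Tokyo
    "203.0.113.30": (50.1, 8.7),    # Frankfurt
}


def haversine_km(a: Coords, b: Coords) -> float:
    """Great-circle distance between two points on Earth (radius ~6371 km)."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def locate(resolver_ip: str) -> Optional[Coords]:
    """Look up the resolver's approximate location; None if the prefix is unknown."""
    ip = ipaddress.IPv4Address(resolver_ip)
    for network, coords in GEO_TABLE:
        if ip in network:
            return coords
    return None


def pick_edge(resolver_ip: str, fallback: str = "203.0.113.10") -> str:
    """Return the IP of the edge server nearest to the resolver's estimated location."""
    coords = locate(resolver_ip)
    if coords is None:
        return fallback
    return min(EDGES, key=lambda edge_ip: haversine_km(coords, EDGES[edge_ip]))


print(pick_edge("198.51.100.7"))  # 203.0.113.10 (Virginia)
print(pick_edge("192.0.2.55"))    # 203.0.113.20 (Tokyo)
print(pick_edge("10.0.0.1"))      # 203.0.113.10 (fallback)
```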
Benefits of CDN
There are four main benefits of using a CDN in your architecture:
Lower Latency: Bringing content closer to the user means it travels a shorter distance to reach the user’s machine, so website pages load with lower latency. The biggest payoff is higher user engagement and thus a greater chance of converting visitors into paying customers.
Lower Bandwidth Cost: Every request that reaches the origin server consumes its network bandwidth. Through caching and other optimizations (for example, minification), CDNs serve most requests from the edge and deliver content using less bandwidth, which reduces hosting costs on the origin servers, since those costs are driven largely by bandwidth.
High Availability: Even under abnormally high traffic, the failure of one or two edge servers need not affect the entire network, because traffic can be served from other edge servers. This keeps the system highly available.
High Security: DDoS attacks attempt to take down a service by flooding the website with fake traffic. Modern CDNs provide mechanisms to defend against this, for example by absorbing attack traffic across their many edge servers and filtering or rate-limiting malicious requests before they reach the origin.
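The bandwidth benefit can be estimated with simple arithmetic: only cache misses travel back to the origin, so origin traffic shrinks in proportion to the cache hit ratio. The traffic volume and the 90% hit ratio below are assumptions for illustration, and the sketch ignores the CDN’s own fees and any extra savings from compression or minification.

```python
def origin_egress_gb(total_gb: float, cache_hit_percent: int) -> float:
    """Traffic the origin still serves: only cache misses travel back to it."""
    if not 0 <= cache_hit_percent <= 100:
        raise ValueError("cache_hit_percent must be between 0 and 100")
    return total_gb * (100 - cache_hit_percent) / 100


monthly_traffic_gb = 10_000  # assumed traffic delivered to users per month
hit_percent = 90             # assumed share of traffic served from edge caches

print(origin_egress_gb(monthly_traffic_gb, hit_percent))  # 1000.0
print(origin_egress_gb(monthly_traffic_gb, 0))            # 10000.0 (no caching)
```

Under these assumptions the origin serves a tenth of the traffic it would serve without a CDN.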
Case Study: “Netflix Open Connect”
Open Connect is a CDN that Netflix built specifically to deliver content to its own users while avoiding network congestion.
The idea is simple: Netflix installs Open Connect Appliances (OCAs), which are physical devices, at eligible Internet Service Providers (ISPs) so that network congestion is kept to a minimum. Like typical CDN caches, these appliances store the movies and TV shows that are most popular in their region.
Each title is usually stored on the OCAs in multiple copies at different resolutions. So, if a user’s internet connection is poor, the user’s device can download a lower-resolution version from the OCAs, and it can switch between versions as network conditions change during playback. This is called adaptive bitrate streaming.
Thus, you can imagine how much network distance is saved for the majority of users, whose streams are served from inside their own ISP’s network instead of crossing the wider internet. The impact is less network congestion for Netflix’s users and lower latency when delivering content to them. Of course, it’s a big investment to improve customer experience, but given Netflix’s popularity, it is probably worth it.
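A common way to implement the adaptive bitrate choice described in the case study is a throughput-based rule: pick the highest bitrate that fits within a fraction of the recently measured download speed. The sketch below is a generic illustration of that heuristic, not Netflix’s actual player logic; the bitrate ladder and the 0.8 safety factor are hypothetical values.

```python
from typing import List


def pick_rendition(bitrates_kbps: List[int], throughput_kbps: float,
                   safety_factor: float = 0.8) -> int:
    """Choose the highest bitrate that fits in a fraction of the measured throughput.

    `safety_factor` leaves headroom for throughput dips; if nothing fits,
    fall back to the lowest-bitrate rendition so playback can continue.
    """
    budget = throughput_kbps * safety_factor
    fitting = [b for b in bitrates_kbps if b <= budget]
    return max(fitting) if fitting else min(bitrates_kbps)


ladder = [235, 750, 1750, 3000, 5800]  # hypothetical bitrate ladder, in kbps
print(pick_rendition(ladder, 4000))  # 3000
print(pick_rendition(ladder, 1000))  # 750
print(pick_rendition(ladder, 200))   # 235 (fallback to lowest)
```

Re-running this choice for each video segment is what lets playback drop to a lower resolution on a weak connection and climb back up when bandwidth recovers.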
That’s it, folks, for this edition of the newsletter. Please consider liking and sharing it with your friends, as it motivates me to keep bringing you good content for free. If you think I am doing a decent job, share a short summary of this article with your network. Connect with me on LinkedIn or Twitter for more technical posts in the future!