A CDN (content delivery network, also called a content distribution network) is a network of servers used to distribute content. Before we can fully understand it, we need to understand two other terms.
1. Node
Before a CDN can serve users, the provider deploys multiple nodes nationwide or globally. A node can be thought of as a data center or server cluster, and is professionally referred to as a PoP (Point of Presence) or edge server. Each node serves the users around it, and because it is close to them, its response time is short. In addition, a node is a cluster of many servers, so it can withstand heavy traffic.
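To make the idea of "the node around the user" concrete, here is a minimal sketch that picks the geographically nearest node. The PoP names and coordinates are made up for illustration, and using raw distance is a simplifying assumption: real CDNs typically steer users through DNS or anycast routing and measured latency rather than a distance formula.

```python
import math

# Hypothetical PoP locations as (latitude, longitude) in degrees; illustrative only.
POPS = {
    "frankfurt": (50.11, 8.68),
    "singapore": (1.35, 103.82),
    "virginia": (38.95, -77.45),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_pop(user_location, pops=POPS):
    """Return the name of the PoP with the smallest distance to the user."""
    return min(pops, key=lambda name: haversine_km(user_location, pops[name]))
```

For example, `nearest_pop((48.86, 2.35))` (a user in Paris) selects the Frankfurt entry from this made-up table.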
2. Source Server
The source server (also called the origin server) is the server where you deploy your website; it provides the initial content for the entire CDN. Without a CDN, every user request reaches your source server. The farther the user is from the source server, the more hops each packet passes through, and the longer the page takes to load. Each hop may also be affected by network congestion, which delays page loading further.
This is especially noticeable when visiting overseas websites: because of the long distance and network congestion, users often wait more than 3 seconds for a page, which many users will not tolerate.
If your website uses CDN acceleration, when a user requests your website, the CDN finds the node closest to the user and checks whether the requested content is cached on that node. If it is, the node sends it directly to the user without contacting the source server. If it is not, the node automatically requests the content from the source server, sends it to the user, and caches a copy at the same time. The next time a user requests the same content, the node serves it directly instead of requesting the source server.
In other words, a node accesses the source server only the first time a piece of content is requested. Later requests for the same content, from new and returning users alike, are served from the node’s cache and do not reach the source server until the cached copy expires or is refreshed. The source server is effectively shielded: it does not even know that these requests took place.
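The hit-or-miss flow described above can be sketched in a few lines. This is a simplified model, not how any particular CDN is implemented: `Origin` and `EdgeNode` are hypothetical names, the cache is a plain dictionary, and expiration is ignored.

```python
class Origin:
    """Stand-in for the source server; counts how often it is asked for content."""

    def __init__(self, pages):
        self.pages = pages
        self.requests = 0

    def fetch(self, path):
        self.requests += 1
        return self.pages[path]


class EdgeNode:
    """A node that serves from its cache and pulls from the origin on a miss."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, path):
        if path in self.cache:              # cache hit: origin is not contacted
            return self.cache[path], "HIT"
        body = self.origin.fetch(path)      # cache miss: pull from the origin
        self.cache[path] = body             # keep a copy for later requests
        return body, "MISS"


origin = Origin({"/index.html": "<h1>Hello</h1>"})
edge = EdgeNode(origin)
first = edge.get("/index.html")    # first request: the node pulls from the origin
second = edge.get("/index.html")   # same content again: served from the cache
```

After both calls, `origin.requests` is 1: only the first request reached the source server.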
The process by which a node pulls content from the source server is commonly known as reverse proxying: the user requests content from the server cluster closest to them, and that node fetches it from the source server on the user’s behalf. Responses from a nearby node are very fast, usually measured in milliseconds rather than seconds. By some estimates, CDNs carry more than 70% of global internet traffic, and the proportion is still rising.
Now let’s take a look at the term “content distribution network”, which actually covers three aspects:
- Content
CDNs are used to process content. What content? This includes dynamic/static web pages, CSS, JavaScript, images, videos, audio, downloadable files, and more.
- Distribute
A CDN distributes content to its nodes. How? When a node needs content, it automatically pulls it from the source server, acting as a reverse proxy.
- Network
From an internal perspective, a CDN is a network topology, or more precisely a privately operated network. Users only exchange data with nodes and do not know the CDN’s internal structure; only the organization that builds the CDN knows it. Generally speaking, a CDN is a private network built by a provider that distributes the content webmasters deploy on the source server to its nodes, thereby improving the website’s response speed. Users and webmasters do not need to know the CDN’s internal topology to enjoy its benefits.
Although the principle of a CDN is simple, its internal architecture is complex and involves many engineering challenges, such as multi-level caching, load balancing, big data processing, distributed storage, health monitoring, and hot swapping.
Benefits of using CDN
- Increase website access speed
This is the original purpose of a CDN and still its main role. A CDN can improve website access speed substantially, and the gain is most obvious for users who are far from the source server.
- Reduce server pressure
On a well-cached site, almost 90% of the traffic can be taken over by edge nodes. The source server is accessed only when content is updated or a cache entry expires, which reduces the pressure on it and allows an inexpensive server configuration to sustain significant traffic.
- Enable national/global user access to the website
CDN nodes are spread across the country or the globe, so overseas users and users in remote areas can also reach your website. Without a CDN, users in some regions may be unable to access your website because of poor network paths.
- Improve server security
User requests reach the outermost layer of the CDN first. The user does not know where the source server is, because its location is hidden behind the CDN. A source server whose address is unknown is much harder to attack.
In addition, CDNs typically come with their own firewalls or other security measures, so even if attackers attempt brute-force DDoS attacks, they are unlikely to overwhelm the website. The CDN detects such attacks and blocks suspicious IPs. Even if an IP is not blocked, the large clusters and load balancing of a CDN can absorb far more DDoS traffic than a single server can.
- Load balancing
A CDN comes with built-in load balancing, so you do not need to worry about sudden traffic peaks, and your source server is shielded from them.
- 24/7 service
Because the CDN has cached the website’s content, users can still access the cached content even if the source server goes down.
- Reduce expenses
To cope with peak traffic without a CDN, you must rent a powerful server and purchase enough bandwidth, which is a significant expense. Most of the time this capacity sits idle, so you pay several times the normal cost just to handle extreme scenarios. With a CDN, you can reduce the server configuration to a much lower level. CDNs are generally billed pay-as-you-go, so you pay only for what you consume.
- Lower maintenance cost
Unstable and frequently attacked websites not only annoy users but also fare poorly with search engines. Many small companies and individual webmasters lack the ability to secure their websites, and servers exposed directly to the internet are attacked constantly. A CDN hides your source server, making it difficult for attackers to find it, and can also help you withstand brute-force DDoS attacks. Without a CDN, DDoS attacks are very hard for a small site to handle.
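The effect of caching on server pressure and cost comes down to simple arithmetic: the source server only sees the requests that miss the cache. The sketch below assumes a single overall cache hit ratio; the function name and the traffic figure of 10,000 requests per second are illustrative, not measurements.

```python
def origin_requests_per_second(total_rps, hit_ratio):
    """Requests that still reach the source server at a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_rps * (1.0 - hit_ratio)


peak_rps = 10_000  # hypothetical peak traffic across all users
for ratio in (0.0, 0.5, 0.9):
    load = origin_requests_per_second(peak_rps, ratio)
    print(f"hit ratio {ratio:.0%}: source server handles about {load:,.0f} requests/s")
```

With a 90% hit ratio, the source server has to be sized for roughly one tenth of the peak, which is why a modest configuration can be enough.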
How to structure a website
A reasonable architecture makes a website better suited to a CDN, reducing costs while also improving security. If your website was not designed with a CDN in mind, you may need to make some adjustments.
Generally speaking, there are two main principles for a CDN-friendly website architecture:
1. Dynamic and static separation
Separate a website’s static content from its dynamic content.
Static content is content that does not change with user behavior, such as articles and product introductions. It is the same whether or not the user is logged in. Dynamic content, on the other hand, changes with user behavior, such as user information, message lists, and favorite buttons.
A page often mixes static and dynamic content, and we can use JavaScript to load dynamic content on the page.
To reduce the number of requests and improve the SEO effect, you can treat user comments, clicks, and likes as static content, as long as you refresh the CDN cache regularly or according to rules.
The CDN cache can have an expiration time set or can be actively refreshed. Generally, CDNs support manual refresh or API refresh (programmatic refresh).
In addition, for security reasons, it is best to prepare two servers: one that hosts static content with the CDN enabled, and one that hosts dynamic content without the CDN. Servers that host dynamic content are more vulnerable to attacks, and with this split, access to static content is not affected even if the dynamic content server goes down.
Since you use different servers, you should also set up different domain names for static and dynamic content.
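The two ways a cached copy stops being served, expiration and active refresh, can be sketched as follows. This is a toy model under stated assumptions: `TtlCache` is a hypothetical class, the current time is passed in explicitly to keep the example deterministic, and a real CDN would expose refresh through its own console or API rather than a method call.

```python
class TtlCache:
    """Cache whose entries expire after ttl_seconds or when purged explicitly."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}                    # url -> (body, time stored)

    def put(self, url, body, now):
        self.entries[url] = (body, now)

    def get(self, url, now):
        """Return the cached body, or None if it is missing or expired."""
        entry = self.entries.get(url)
        if entry is None:
            return None
        body, stored_at = entry
        if now - stored_at >= self.ttl:      # expired: drop it so the node re-pulls
            del self.entries[url]
            return None
        return body

    def purge(self, url):
        """Active refresh: drop the entry regardless of its remaining lifetime."""
        self.entries.pop(url, None)


cache = TtlCache(ttl_seconds=300)
cache.put("/article/1", "version 1", now=0)
fresh = cache.get("/article/1", now=100)     # still within the 300 s lifetime
expired = cache.get("/article/1", now=300)   # lifetime is over, returns None
cache.put("/article/1", "version 2", now=300)
cache.purge("/article/1")                    # e.g. after new comments are posted
purged = cache.get("/article/1", now=301)    # returns None although not expired
```

A `None` result stands for a cache miss, at which point the node would pull the content from the source server again.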
2. Resource file separation
Resource files are files whose content rarely changes, such as images, videos, and compressed archives.
Due to many reasons such as website template changes, content changes, user comments, and placement of advertisements, static content needs to be updated frequently, which requires refreshing the CDN cache. We can usually refresh the CDN cache for a URL, directory, or domain name.
The problem is most serious when refreshing the CDN cache for a whole domain name: if resource files and static content are located under the same domain name, the cache for all resource files is refreshed as well. The volume of resource files is often several times that of static content, so refreshing them not only consumes a lot of traffic but also puts considerable pressure on the source server.
If you treat user comments, clicks, and likes as static content in order to improve SEO, you need to refresh all static content under the domain name frequently, and the resource files are refreshed each time too.
To prevent resource files from being refreshed, I highly recommend setting up a separate domain name that stores only resource files. However, adding a new domain name for resource files can increase development costs, so readers should weigh the trade-off for themselves.
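The difference a separate resource domain makes can be seen in a small simulation of the three refresh scopes (URL, directory, domain name). The `purge` function and the `example.com` hostnames are hypothetical and only model which cached URLs would be dropped; they do not correspond to any specific CDN’s API.

```python
from urllib.parse import urlsplit


def purge(cached_urls, *, url=None, directory=None, domain=None):
    """Return the cached URLs that survive a purge of one URL, directory, or domain."""
    def matches(candidate):
        parts = urlsplit(candidate)
        if url is not None:
            return candidate == url
        if directory is not None:            # directory is a URL prefix ending in "/"
            d = urlsplit(directory)
            return parts.netloc == d.netloc and parts.path.startswith(d.path)
        if domain is not None:
            return parts.netloc == domain
        return False

    return {u for u in cached_urls if not matches(u)}


cached = {
    "https://www.example.com/articles/1.html",   # static content
    "https://www.example.com/articles/2.html",
    "https://res.example.com/img/logo.png",      # resource files on their own domain
    "https://res.example.com/video/intro.mp4",
}
remaining = purge(cached, domain="www.example.com")
```

Here `remaining` still contains both `res.example.com` entries: refreshing the static-content domain leaves the large resource files cached, so they are not pulled from the source server again.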