Netflix on AWS

Gagan Kr.
8 min read · Sep 22, 2020

Netflix is the world’s leading streaming entertainment service with 193 million paid memberships in over 190 countries enjoying TV series, documentaries and feature films across a wide variety of genres and languages. Members can watch as much as they want, anytime, anywhere, on any internet-connected screen. Members can play, pause and resume watching, all without commercials or commitments.

Netflix was originally a DVD shipping business that mailed DVDs of your chosen programs to you. This went well until 2008, when the company suffered a major database corruption and for 3 days could not ship any DVDs to its customers. That was when senior management at Netflix realized they had to move away from continuous vertical scaling, which leads to single points of failure, towards a more reliable and scalable horizontally scaled system. They chose Amazon Web Services despite having Amazon as a competitor (Amazon has its own streaming service, Amazon Prime Video) because AWS offered the greatest scaling capabilities and the biggest set of available features. It took 7 years of migration for Netflix to shut down its last remaining data centres and move completely to the cloud.

Who doesn’t know Netflix in today’s world? The taste it brings to its shows is beyond imagination: every show, series and movie feels new, realistic and nail-biting. Netflix has established itself as much more than a simple streaming service for third-party content, with a quickly growing library of originally produced movies and TV shows. Netflix has its own studios and creates a wide range of its own branded content, much of which is highly regarded by both critics and audiences.

Netflix offers drama series, documentaries, anime series, comedy series, kids’ series, movies and stand-up comedy, and more flavours are yet to be added.

But I am not here to explain the flavours it serves or the latest and upcoming web series from Netflix. Instead, let me ask you a question: have you ever wondered how it manages such a huge catalogue, which comprises more than 13,941 titles, more than 1,500 of them originals produced since it began making original content in 2013?

You may say everything is hosted in one place, i.e. on its server, and served from there. This may be the case for your college or school website, or for any other organisation. Most of you have probably grown very tired of searching for your results at result time, sitting in front of your device for hours and banging your head while getting errors such as “Server failed to respond” or “We are facing technical issues due to server overload”, and the list of issues is never-ending.

This is exactly what such companies want to avoid. Their traffic fluctuates, and with millions of customers paying for or freely using their services, they never want to lose a customer at any cost. Scalability is also very challenging for startups: web traffic may boom all at once, or the idea may flop. If the idea flops, the hardware they bought becomes useless; if traffic rises suddenly, they can run out of cash quickly, and it takes time to set up and configure server components. A large technical team is also needed to maintain all of this. To overcome these problems, the world is shifting towards cloud computing, and Netflix is powered by the AWS (Amazon Web Services) cloud. There are many providers in the market, but AWS has no close competitor: it alone holds 51% of the market, well ahead of Google in cloud infrastructure.

Before looking at how cleverly Netflix uses AWS, let’s first understand what cloud computing is. Cloud computing is a set of tools that helps you spend less time on management and more time on your creative work. For example, if you have your own server, you need a pool of professionals to take care of it, including its maintenance, security, load balancing, power and internet costs and, most important, emergency backup, because you can never afford to go down. This approach has been around for a long time and still exists; it is known as on-premises hosting.

Later, large organisations such as IBM and Google began sharing their resources for a fixed period of time, which is known as time sharing. When a provider gives you dedicated compute power and storage with full support and all sorts of flexibility, that falls under the cloud. You can also buy your own machines, combine their computing power as and when required, and still use the services offered by the cloud companies.

ARCHITECTURE

Netflix Realizes Multi-Region Resiliency Using Amazon Route 53

What happens when you need to move 89 million viewers to a different AWS region? Netflix’s infrastructure, built on AWS, makes it possible to be extremely resilient, even when the company is running services in many AWS Regions simultaneously. In this episode of This is My Architecture, Coburn Watson, director of performance and reliability engineering at Netflix, walks through the company’s DNS architecture, built on Amazon Route 53 and augmented with Netflix’s Zuul, that allows the team to evacuate an entire region in less than 40 minutes. Watch the video to learn more.
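To make the idea of evacuating a region more concrete, here is a minimal sketch of how routing weights could be shifted away from a failing region. It assumes a weighted DNS setup (Route 53 does offer weighted routing, but the text above does not say which mechanism Netflix actually uses), and the region names, weights and the function `evacuate_region` are all hypothetical. No AWS API is called; the sketch only computes the new weights.

```python
from typing import Dict


def evacuate_region(weights: Dict[str, int], failing: str) -> Dict[str, int]:
    """Return new routing weights with the failing region drained to zero.

    The drained weight is spread over the remaining regions in proportion
    to the weight each of them already carries (an assumed policy).
    """
    if failing not in weights:
        raise KeyError(f"unknown region: {failing}")

    healthy = {r: w for r, w in weights.items() if r != failing and w > 0}
    if not healthy:
        raise ValueError("no healthy region left to absorb traffic")

    to_move = weights[failing]
    total_healthy = sum(healthy.values())

    result = dict(weights)
    result[failing] = 0
    moved = 0
    for region, weight in healthy.items():
        share = to_move * weight // total_healthy  # proportional, rounded down
        result[region] = weight + share
        moved += share

    # Give any rounding remainder to the largest healthy region.
    largest = max(healthy, key=healthy.get)
    result[largest] += to_move - moved
    return result


if __name__ == "__main__":
    # Hypothetical starting weights, expressed as percentages of traffic.
    current = {"us-east-1": 50, "us-west-2": 30, "eu-west-1": 20}
    print(evacuate_region(current, "us-east-1"))
```

In a real system the shift would be applied gradually and the receiving regions would need to scale up first; the sketch ignores both.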

Diagram showing architecture used by Netflix

Application Monitoring on a Massive Scale

Netflix uses Amazon Web Services (AWS) for nearly all its computing and storage needs, including databases, analytics, recommendation engines, video transcoding, and more — hundreds of functions that in total use more than 100,000 server instances on AWS.

This results in an extremely complex and dynamic networking environment where applications are constantly communicating inside AWS and across the Internet. Monitoring and optimizing its network is critical for Netflix to continue improving customer experience, increasing efficiency, and reducing costs. In particular, Netflix needed a solution for ingesting, augmenting, and analyzing the multiple terabytes of data its network generates daily in the form of virtual private cloud (VPC) flow logs. This would enable Netflix to identify performance-improvement opportunities, such as identifying apps that are communicating across regions and collocating them. The company would also be able to increase uptime by quickly detecting and mitigating application downtime.

Each log record carries information about the communications between two IP addresses. However, in a dynamic environment like the one at Netflix, where an IP address can float between applications from day to day or even minute to minute, IP addresses alone don’t have much meaning. “The data sources we had before we took on this initiative were one sided,” says John Bennett, senior software engineer at Netflix. “We’d know an application was connecting to others, but we didn’t know both sides of the conversation and how to optimize those communications or the placement of the applications on the network.”

Netflix set out to establish a new data source that could give it more insight into communication among applications and regions by combining VPC flow logs with application metadata.
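The sketch below illustrates the idea of combining flow logs with application metadata when an IP address can move between applications over time. It is only an illustration under stated assumptions: the record fields, the `IpOwnership` class and the `enrich` function are hypothetical and do not reflect the real VPC flow log format or Netflix’s code.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FlowRecord:
    """Simplified, hypothetical flow log record."""
    src_ip: str
    dst_ip: str
    timestamp: int  # seconds since epoch
    bytes_sent: int


class IpOwnership:
    """Remembers which application held an IP address from a given time on."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Tuple[int, str]]] = {}

    def assign(self, ip: str, start: int, app: str) -> None:
        # Keep each IP's history sorted by start time.
        bisect.insort(self._history.setdefault(ip, []), (start, app))

    def owner_at(self, ip: str, timestamp: int) -> Optional[str]:
        history = self._history.get(ip, [])
        starts = [start for start, _ in history]
        # Last assignment that began at or before the timestamp.
        idx = bisect.bisect_right(starts, timestamp) - 1
        return history[idx][1] if idx >= 0 else None


def enrich(record: FlowRecord, ownership: IpOwnership) -> Dict[str, object]:
    """Attach application names to both sides of the conversation."""
    return {
        "src_app": ownership.owner_at(record.src_ip, record.timestamp) or "unknown",
        "dst_app": ownership.owner_at(record.dst_ip, record.timestamp) or "unknown",
        "timestamp": record.timestamp,
        "bytes_sent": record.bytes_sent,
    }
```

Because the lookup is keyed on both the IP address and the time of the flow, the same address resolves to different applications at different moments, which is the problem the paragraph above describes.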

Centralizing Flow Logs Using Amazon Kinesis Data Streams

From the outset, AWS enabled Netflix to experiment with different approaches to analyzing its network data. “Early in the design process, the flexibility to try different ways of processing the data was important,” says Bennett. “We experimented with multiple designs and used many AWS products to get here.”

The solution Netflix ultimately deployed — known internally as Dredge — centralizes flow logs using Amazon Kinesis Data Streams. The application reads the data from Amazon Kinesis Data Streams in real time and enriches IP addresses with application metadata to provide a full picture of the networking environment. “Usually, we would put the data into a database, which would build an index to enable faster querying,” says Bennett. “Dredge joins the flow logs with application metadata as it streams and indexes it without using a database, which eliminates a lot of the complexity.”

The enriched data lands in an open-source analytics application called Druid. Netflix uses the OLAP querying functionality of Druid to quickly slice data into regions, availability zones, and time windows to visualize it and gain insight into how the network is behaving and performing.
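To give a feel for what slicing the data into regions and time windows means, here is a plain-Python stand-in for that kind of roll-up. It does not use Druid or its query API; the event fields (`timestamp`, `src_region`, `dst_region`, `bytes`) and both function names are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

RollupKey = Tuple[int, str, str]  # (window start, source region, destination region)


def rollup(events: Iterable[Mapping[str, object]],
           window_seconds: int = 60) -> Dict[RollupKey, int]:
    """Sum bytes per time window and per (source, destination) region pair."""
    totals: Dict[RollupKey, int] = defaultdict(int)
    for event in events:
        ts = int(event["timestamp"])
        window_start = ts - ts % window_seconds  # align to the window boundary
        key = (window_start, str(event["src_region"]), str(event["dst_region"]))
        totals[key] += int(event["bytes"])
    return dict(totals)


def cross_region_bytes(totals: Mapping[RollupKey, int]) -> int:
    """Total bytes exchanged between two different regions."""
    return sum(b for (_, src, dst), b in totals.items() if src != dst)
```

A roll-up like this is one way to spot applications that talk across regions, which the article names as a candidate for collocation.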

AWS was the logical choice for Dredge in part because the data was already resident in the AWS Cloud. “It would have been daunting to publish, stream, and consume that much information from an external system such as Kafka,” says Bennett. “It took just a few API calls to centralize multiple terabytes of flow logs into Amazon Kinesis Data Streams. Now we can focus on getting insights from the data rather than simply getting access to it.”

The scalability of Amazon Kinesis Data Streams was a good fit for the Dredge application because of the cyclical and elastic nature of network usage at Netflix. “When it comes to our networking data, it’s more cost efficient to be able to scale up and down, which is not as easy to do with alternatives to Amazon Kinesis Data Streams,” says Bennett.

Improving Customer Experience with Real-Time Network Monitoring

Netflix’s Amazon Kinesis Data Streams-based solution has proven to be highly scalable, each day processing billions of traffic flows. Typically, about 1,000 Amazon Kinesis shards work in parallel to process the data stream. “Amazon Kinesis Data Streams processes multiple terabytes of log data each day, yet events show up in our analytics in seconds,” says Bennett. “We can discover and respond to issues in real time, ensuring high availability and a great customer experience.”
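The following sketch shows how records can be spread across many shards so that they are processed in parallel. Kinesis Data Streams hashes each record’s partition key with MD5 into a 128-bit number and routes the record to the shard whose hash key range contains it. The sketch assumes the ranges are split evenly across shards and uses a source IP as the partition key; both are assumptions, since the article does not say how Netflix keys its records.

```python
import hashlib

HASH_SPACE = 2 ** 128  # size of the 128-bit MD5 hash space


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # 0 <= hash_value < 2**128
    return hash_value * shard_count // HASH_SPACE


if __name__ == "__main__":
    # Hypothetical partition keys: source IP addresses from flow logs.
    for ip in ("10.0.0.5", "10.0.0.9", "10.1.2.3"):
        print(ip, "->", shard_for_key(ip, 1000))
```

The same key always lands on the same shard, so all records for one key stay in order, while different keys spread out over the roughly 1,000 shards mentioned above.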

Netflix is now able to identify new ways to optimize its applications, whether that means moving an application from one region to another or changing to a more appropriate network protocol for a specific type of traffic. “Our solution built on Amazon Kinesis enables us to identify ways to increase efficiency, reduce costs, and improve resiliency for the best customer experience,” says Bennett.

Although a streaming data solution is not new to the IT industry, it is an innovation in the networking space. “Netflix is heavily invested in AWS in part because it abstracts the underlying network, so we don’t have to deal with switches and routers,” says Bennett. “We’re monitoring, analyzing, and optimizing at a higher level of the stack — in ways we would never even consider if we were running our own data centers.”

How Netflix Encodes at Scale

In this session, Netflix explores the various strategies employed by the encoding service to automate management of a heterogeneous collection of Amazon EC2 Reserved Instances, resolve compute contention, and distribute instances based on priority and workload. The Netflix encoding team is responsible for transcoding different types of media sources to a large number of media formats to support all Netflix devices. Transcoding these media sources has compute needs ranging from running compute-intensive video encodes to low-latency, high-volume image and text processing. The encoding service may require hundreds of thousands of compute hours to be distributed at a moment’s notice where they are needed most.
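As a rough illustration of distributing instances based on priority and workload, the sketch below hands out a fixed pool of instances to workloads in strict priority order. This greedy policy is an assumption chosen for simplicity, not the algorithm Netflix describes in the session, and the workload names, priorities and the function `allocate_instances` are hypothetical.

```python
from typing import Dict, List, Tuple

# (workload name, priority, instances wanted); a lower number is more urgent.
Workload = Tuple[str, int, int]


def allocate_instances(pool_size: int, workloads: List[Workload]) -> Dict[str, int]:
    """Grant instances to workloads in priority order until the pool runs out."""
    if pool_size < 0:
        raise ValueError("pool_size must not be negative")

    allocation: Dict[str, int] = {}
    remaining = pool_size
    for name, _priority, wanted in sorted(workloads, key=lambda w: w[1]):
        granted = min(wanted, remaining)  # never hand out more than is left
        allocation[name] = granted
        remaining -= granted
    return allocation


if __name__ == "__main__":
    demand = [
        ("video-encode", 2, 80),
        ("image-processing", 1, 30),
        ("backfill", 3, 50),
    ]
    print(allocate_instances(100, demand))
```

Under contention the lowest-priority workload simply waits; a production scheduler would also have to handle different instance types and changing demand, which this sketch leaves out.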

How Netflix Tunes Amazon EC2 Instances for Performance

Netflix uses Amazon EC2 instance types and features to create a high-performance cloud, achieving near-bare-metal speed for its workloads. This session summarizes the configuration, tuning, and activities for delivering the fastest possible Amazon EC2 instances. Brendan Gregg, a member of the performance and OS engineering team at Netflix, shows how to choose Amazon EC2 instance types, how to choose between Xen modes (HVM, PV, or PVHVM), and the importance of Amazon EC2 features such as SR-IOV for bare-metal performance. He also covers basic and advanced kernel tuning and monitoring, including the use of Java and Node.js flame graphs and performance counters.

Watch the video

Conclusion

This is how Netflix cleverly uses AWS for its organisational needs. The world is now shifting towards the cloud for its infrastructure needs, and the cloud is a must-know technology for every tech enthusiast.

Gagan Kr.

Passionate about cloud and DevOps tools and technologies. I love to integrate these technologies, get to the core of them, and share what I learn with my audience!