
Transcoding, and format conversion more generally, is a common step in many video handling applications.  Transcoding periodically becomes a “holy grail” for video delivery systems seeking to support ad insertion and dynamic program assembly.  The need to provide content in multiple formats is driven by a number of factors, including (1) advances in content representation and compression techniques, (2) content originating in a variety of both old and new formats, (3) content delivery over a variety of communications paths – sometimes even switching paths dynamically – that differ in their qualities of service, and (4) a wide range of playback devices that differ in the formats and/or format parameters they support. Solutions to this problem range from pre-computing and storing all format variants of each piece of content to storing only a single format and transcoding to a target format in real time when the content is requested.  The combinatorial effects of format and protocol proliferation lead many to assume that realtime transcoding is necessary and cost effective.  In the past, however, we have found that the economies of cheap storage greatly favor the pre-compute and store approach.  While our updated analysis below leads to a similar conclusion, we see novel ways that transcoding in the cloud may be applied.

The problem is to assess the viability of realtime transcoding in light of both the cheap processing made available by cloud computing and the performance gains made possible by Graphics Processing Units (GPUs). While GPU-equipped machines are not currently available in the cloud, some of the applications that benefit most from GPUs have also been moving into the cloud (visualization and analysis of sensor data sets, for example). Thus we have included a possible future GPU-enabled cloud in this analysis.

Model System Configuration

We developed a model that maps a collection of service attributes onto a target system configuration to produce a net annualized cost.  Four system configurations are considered: three configurations consistent with the Bronze, Silver, and Gold systems described for Video in the Container, and one configuration consistent with published or reported Amazon AWS capabilities.

The model AWS configuration employs one class of machines to provide access to data striped across a set of Elastic Block Store (EBS) volumes and another class of machines with greater processing capability to perform transcoding. To produce a meaningful comparison, we assume that I/O performance and network throughput on AWS machines can sustain a 2 Gbps peak. While this assumption is premature, we know even higher throughput is feasible, and the objective here is to investigate what should be possible if Amazon (or some other provider) makes I/O performance one of the selectable machine characteristics, a feature reportedly under consideration.

Figure 1 illustrates a simplified three-tier system configuration. The same basic structure is modeled in both dedicated and AWS environments. The first (top) tier is the storage access tier: a number of machines (or EC2 instances), each with a set of direct-attached storage devices (EBS volumes). Although EBS volumes are accessed over the network, the maximum size of an EBS volume is 1 TB and a volume can be attached to only one virtual machine at a time, so for nominal operation it is reasonable to model them as direct-attached storage. At first glance these EBS constraints may seem onerous for a system that handles such large amounts of data. However, they make the modeling easier, and they also give the system engineer a level of control over storage access contention that a NAS or SAN storage subsystem would not ordinarily provide.
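
One way to picture the storage access tier is as two fixed mappings. The first takes each title chunk to an EBS volume. The second takes each volume to the single instance it is attached to. The sketch below is a minimal illustration that assumes simple round-robin striping and a uniform number of volumes per instance. The class and method names are hypothetical and are not part of any AWS API.

```python
from dataclasses import dataclass


# Hypothetical model of the storage access tier: title chunks are striped
# round-robin across EBS volumes, and each volume is attached to exactly one
# storage access instance (EBS allows only one attachment at a time).
@dataclass(frozen=True)
class StorageTier:
    n_instances: int
    volumes_per_instance: int

    @property
    def n_volumes(self) -> int:
        return self.n_instances * self.volumes_per_instance

    def owner(self, volume: int) -> int:
        """Instance that a given volume is attached to."""
        return volume // self.volumes_per_instance

    def locate(self, title_id: int, chunk: int) -> tuple[int, int]:
        """Return (instance, volume) holding a given chunk of a title.

        Offsetting by title_id spreads the first chunks of different titles
        across different volumes instead of piling them all onto volume 0.
        """
        volume = (title_id + chunk) % self.n_volumes
        return self.owner(volume), volume


tier = StorageTier(n_instances=4, volumes_per_instance=8)
print(tier.n_volumes)                      # 32
print(tier.locate(title_id=10, chunk=3))   # (1, 13)
```

Every chunk read goes to a known instance, so contention on each volume and instance can be reasoned about directly. This is the kind of control the text contrasts with a shared NAS or SAN.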

The second (middle) tier is the transcode tier. Not all content needs to be transcoded. The model allows for three cases: no content is transcoded (producing an empty transcode tier), all content is transcoded, or only some of the content is transcoded (more on this last option later). The last (bottom) tier provides load balancing. Requests flow through the load balancers, but the data may travel directly from storage to clients, or from storage through transcoders to clients, depending on the situation.

Figure 1 - Model System Configuration
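
The sketch below shows one minimal way the three tiers could roll up into a net annualized cost. It assumes straight-line amortization of capital expenses over three years, as in the charts' footnote, plus a yearly operating cost per machine. The class, the function and all prices are hypothetical.

```python
from dataclasses import dataclass

AMORTIZATION_YEARS = 3  # capital expenses amortized over three years


@dataclass
class Tier:
    name: str
    machines: int
    capex_per_machine: float      # purchase price per machine (hypothetical)
    opex_per_machine_year: float  # power, space, support per year (hypothetical)

    def annualized_cost(self) -> float:
        # Straight-line amortized capital plus yearly operating cost.
        return self.machines * (self.capex_per_machine / AMORTIZATION_YEARS
                                + self.opex_per_machine_year)


def system_cost(tiers: list[Tier]) -> float:
    # An empty transcode tier (machines=0) contributes nothing.
    return sum(t.annualized_cost() for t in tiers)


# Hypothetical prices, for illustration only.
config = [
    Tier("storage access", machines=10, capex_per_machine=6000, opex_per_machine_year=1000),
    Tier("transcode", machines=0, capex_per_machine=4000, opex_per_machine_year=800),
    Tier("load balancing", machines=2, capex_per_machine=3000, opex_per_machine_year=500),
]
print(system_cost(config))  # 33000.0
```

For an AWS-style configuration, capex_per_machine would be zero and opex_per_machine_year would be the yearly instance charge. The same roll-up therefore covers both the dedicated and the cloud variants.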

There are a number of thorny details, but the main factors that drive the cost of a configuration are represented. GPUs may or may not be employed in the transcoding tier of a configuration. The processing gain provided by a GPU is expressed as a multiple relative to a dual-core 2.5 GHz Xeon-class machine; initially we use a gain factor of five. Further, in the Amazon environment, we assume that the cost of a dual-core machine equipped with a GPU would be roughly comparable to the cost of Amazon's current 8-core machine.
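
The sketch below shows roughly how the gain factor enters the sizing. The transcode tier is sized by dividing the concurrent transcoding load by the per-machine throughput, scaled by the GPU gain. The function name, the per-machine stream count and the fraction of requests that need transcoding are hypothetical inputs, not figures from our model.

```python
import math


def transcode_machines(peak_streams: int,
                       transcoded_fraction: float,
                       streams_per_baseline_machine: float,
                       gpu_gain: float = 1.0) -> int:
    """Machines needed in the transcode tier (hypothetical sizing rule).

    streams_per_baseline_machine: realtime transcodes that one dual-core
        2.5 GHz Xeon-class machine sustains (assumed input).
    gpu_gain: throughput multiple relative to that baseline machine.
    """
    concurrent_transcodes = peak_streams * transcoded_fraction
    return math.ceil(concurrent_transcodes
                     / (streams_per_baseline_machine * gpu_gain))


print(transcode_machines(10_000, 0.75, 4))              # 1875
print(transcode_machines(10_000, 0.75, 4, gpu_gain=5))  # 375
```

Fewer machines does not translate one-for-one into lower cost. In the Amazon case, each GPU-equipped machine is assumed to cost about as much as an 8-core instance.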

Model Parameters

The main parameters of the model are

  • Number of titles in the title library
  • Average duration of a title
  • Minutes of video delivered by the system in a month

Though system designers need to ascertain a system’s peak streaming capacity, here we use the number of minutes delivered in a month, which is an easier metric to work with in some situations. Using this metric we can, for example, model what would be required to support an online movie service (such as Netflix). Suppose the service made 100K titles available on demand to over 10 million subscribers, and each subscriber watched the traditional average of 4 movies per month. This would equate to 5 billion minutes (5 Bm) of content delivered per month. Translating from average or aggregate delivery to a system’s required peak delivery capacity is a fairly involved process [1].
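
The 5 Bm figure works out if the average title runs about 125 minutes: 10 million × 4 × 125 = 5 billion. The sketch below turns that monthly total into an average number of concurrent streams. It then sizes a peak capacity with the Erlang B formula, a common teletraffic sizing rule that is related to the Erlang distribution cited in [1]. This is not necessarily the method behind our model. The 30-day month, the busy-hour factor of 2.5 and the 0.1% blocking target are all hypothetical.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes, assuming a 30-day month


def monthly_minutes(subscribers: int, views_per_month: float,
                    avg_title_minutes: float) -> float:
    """Minutes of video delivered per month."""
    return subscribers * views_per_month * avg_title_minutes


def average_concurrent_streams(minutes_per_month: float) -> float:
    """Average number of simultaneously active streams (offered load in Erlangs)."""
    return minutes_per_month / MINUTES_PER_MONTH


def erlang_b_servers(offered_load: float, max_blocking: float) -> int:
    """Smallest stream capacity n whose Erlang B blocking is <= max_blocking.

    Uses the numerically stable recursion B(0) = 1,
    B(n) = A * B(n - 1) / (n + A * B(n - 1)).
    """
    blocking, n = 1.0, 0
    while blocking > max_blocking:
        n += 1
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return n


minutes = monthly_minutes(10_000_000, 4, 125)  # 5,000,000,000 (5 Bm)
avg = average_concurrent_streams(minutes)      # about 115,741 streams
PEAK_TO_AVERAGE = 2.5                          # hypothetical busy-hour factor
capacity = erlang_b_servers(avg * PEAK_TO_AVERAGE, max_blocking=0.001)
print(minutes, round(avg), capacity)
```

The busy-hour factor dominates the result. For loads this large, the Erlang B margin above the offered load is small, so the hard part is estimating the peak-to-average ratio.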

In addition to these parameters, we initially assume that each title needs to be available in 16 different formats with a given distribution of expected usage. This far exceeds the number of formats currently supported by virtually all providers, and in this respect the model is biased against the pre-transcode and store approach. Nevertheless, one can argue that an even greater number of combinations is needed to support arbitrary mixing of different audio, video, and bit rate variations. In our experience, however, only a relatively manageable subset of combinations is used. Of course, if fewer formats are needed, the cost of storing the pre-transcoded content goes down. It falls only until the system shrinks to the number of storage units needed to provide the necessary storage access capacity. In this arguably more realistic scenario, the nearly 10x advantage of the silver and gold configurations over the bronze configuration becomes evident.
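
The floor on storage cost can be made concrete. The number of storage units is the larger of two bounds: one set by the capacity needed to hold every stored format, and one set by the bandwidth needed to serve peak demand. The sketch below assumes a hypothetical storage unit of 4 TB serving 2 Gbps, 2 Mbps per format, 125-minute titles and a 580 Gbps peak. None of these figures come from our model.

```python
import math


def storage_units(titles: int, title_minutes: float, format_mbps: list[float],
                  unit_tb: float, unit_gbps: float, peak_gbps: float) -> int:
    """Storage units needed: the larger of the capacity and bandwidth bounds."""
    # Bytes for one title stored in every format (Mbps -> bytes/s -> bytes).
    bytes_per_title = sum(format_mbps) * 1e6 / 8 * title_minutes * 60
    library_tb = titles * bytes_per_title / 1e12  # 1 TB = 1e12 bytes
    by_capacity = math.ceil(library_tb / unit_tb)
    by_bandwidth = math.ceil(peak_gbps / unit_gbps)
    return max(by_capacity, by_bandwidth)


common = dict(titles=100_000, title_minutes=125,
              unit_tb=4.0, unit_gbps=2.0, peak_gbps=580.0)
print(storage_units(format_mbps=[2.0] * 16, **common))  # 750 (capacity-bound)
print(storage_units(format_mbps=[2.0] * 4, **common))   # 290 (bandwidth-bound)
```

Cutting from 16 formats to 4 shrinks the capacity bound from 750 units to 188. The bandwidth bound of 290 units then takes over, so further format reductions save nothing.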

Preliminary Results

The hypothetical service described above postulates a future in which a movie service makes a sizeable library of movie content (100K titles) available on demand to a significant population of subscribers (10 million). Netflix currently makes more than 17,000 titles available on demand and has more than 10 million subscribers, though only a fraction of its subscribers currently use its “play instantly” feature [2]. Using these parameters in the model, we get the following results for the different system configurations:

Preliminary Results without GPUs

† Capital and operational expenses are included. Capital expenses are amortized over three years.

The above chart shows the annualized cost (the Y-axis) for six different system configurations: four base configurations that perform no transcoding (three dedicated bronze, silver, and gold systems and one AWS system) and two configurations that transcode all but one format (a dedicated bronze system and an AWS system both equipped with transcoding machines). The comparison is between systems that store all 16 formats of each piece of content and systems that realtime transcode all but one format into the format needed for each streaming or download event. No GPUs were used. The results show that the transcoded bronze system is ~17 times more expensive than the stored bronze system and that the transcoded AWS system is ~27 times more expensive than the stored AWS system.

With GPUs, the transcoding portion of the overall system cost can be expected to shrink by roughly the GPU performance gain (~5x):

Preliminary Results with GPUs

This chart compares the annual cost of the same four base system configurations that do not perform transcoding with two transcoding systems that are equipped with GPUs. Using GPUs significantly reduces the cost of the systems that transcode. Even so, in this case the better course of action is clearly to transcode the content into all required formats once and store the results on either a dedicated system or a similarly configured AWS system. Though the AWS system costs more, it is not much more, and it offers interesting financial, operational, and architectural flexibility (see discussion below).
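
The expected GPU effect can be written as a rough cost relation. The symbols are ours. C_transcode(1) is the annualized cost of the transcode tier without GPUs, g is the GPU gain, and r is a hypothetical ratio of the per-machine cost of a GPU-equipped transcoder to that of the baseline dual-core machine. Under the AWS assumption that a GPU machine costs about as much as an 8-core instance, r is greater than 1.

```latex
C_{\text{total}}(g) \approx C_{\text{storage}} + C_{\text{lb}}
  + \frac{r}{g}\, C_{\text{transcode}}(1), \qquad g \approx 5
```

With r = 1, this reduces to dividing the transcoding cost by g. With r > 1, the effective reduction is only g/r, and the storage and load-balancing terms are unaffected either way.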

GPU Effects

Many data-intensive applications are seeing significant, sometimes astonishing, performance improvements by using the parallel processing units found in GPUs [3]. The applications that gain the most are those that can be organized as independent, concurrent, fine-grain data flows. In these cases, it is not unusual to see performance increase by one or even two orders of magnitude. Publicly available data on the application of GPUs to video processing, let alone video transcoding, is scarce. The data that does exist, along with reports from colleagues, suggests GPUs might increase the performance of video transcoding by as much as a factor of five. This is in line with our past measurements, although at the time we were evaluating DSPs rather than GPUs. The same problem applies: such processors are good at processing baseband video and audio content, but compressed content is another matter, especially when advanced compression techniques are used. The problem is fundamental: one of the main goals of compression is to identify and remove redundancy in the underlying data set [4]. If a compressed stream could be organized as a number of independent, concurrent, fine-grain data flows, then by definition it would not be as compressed as it could be.
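
Amdahl's law is one way to see why a ~5x gain is plausible even when GPUs deliver 10x to 100x on friendlier workloads. This is our framing, not a measurement. Suppose only a fraction p of the transcoding work can be organized into independent fine-grain flows, and the rest stays sequential because of the dependencies that compression introduces. Then the overall gain is capped at 1/(1 - p), however fast the parallel part runs. The function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when only parallel_fraction of the work is accelerated."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


def max_speedup(parallel_fraction: float) -> float:
    """Upper bound as the parallel part's speedup grows without limit."""
    return 1.0 / (1.0 - parallel_fraction)


print(round(amdahl_speedup(0.8, 100), 2))  # 4.81
print(round(max_speedup(0.8), 2))          # 5.0
```

Under this framing, an overall gain of about five corresponds to roughly 80% of the work parallelizing well. Pushing beyond that requires shrinking the serial remainder, not adding more GPU cores.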

Cloud Effects

Mapping the model system configuration to AWS reveals some interesting opportunities. First, in the pre-transcode and store scenario, using the cloud to perform the pre-transcoding step presents clear advantages. Second, the number of EBS volumes attached to a machine can be reconfigured dynamically. So, for example, tiers of storage access could be soft-configured for a library containing a large amount of rarely used content. That content could be stored in EBS volumes served by a comparatively small number of storage access machines. More generally, it should be possible to smoothly evolve the system toward a more optimal configuration in response to changing workloads. Taking this one step further, it should even be possible to handle component failures in a way that improves the data integrity and service resilience of the system as a whole. These kinds of things are hard to do cost-effectively in small dedicated environments. Finally, service adoption is almost never constant: typically it increases, and sometimes it decreases. Consider a dedicated system deployed to handle peak usage over a period of time. A cloud configuration could turn out to be less costly if, for example, service adoption is expected to ramp up or down significantly over that period.
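
The adoption argument reduces to a simple comparison. A dedicated system is sized once for the peak month and paid for every month. A cloud configuration tracks demand month by month, possibly at a higher price per unit. The sketch below assumes hypothetical unit prices and a hypothetical demand ramp. With those assumptions, the cloud wins whenever its price premium is below the ratio of provisioned peak capacity to consumed capacity.

```python
def dedicated_cost(monthly_demand: list[float], cost_per_unit_month: float) -> float:
    # Capacity is sized for the peak month and paid for in every month.
    return max(monthly_demand) * cost_per_unit_month * len(monthly_demand)


def cloud_cost(monthly_demand: list[float], cost_per_unit_month: float) -> float:
    # Capacity follows demand, so only consumed unit-months are paid for.
    return sum(monthly_demand) * cost_per_unit_month


def breakeven_premium(monthly_demand: list[float]) -> float:
    """Cloud-to-dedicated unit price ratio at which both cost the same."""
    return max(monthly_demand) * len(monthly_demand) / sum(monthly_demand)


# Hypothetical adoption ramp over three years: 10 units, stepping up to 90.
demand = [10 * (month // 4 + 1) for month in range(36)]
print(dedicated_cost(demand, 1.0))   # 3240.0
print(breakeven_premium(demand))     # 1.8
```

Here the cloud configuration stays cheaper as long as its unit price is less than 1.8 times the dedicated unit price. A flat demand profile would drive that break-even ratio toward 1.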

In Part 2 of this thread, we will dig deeper and present a sensitivity analysis of the major model parameters, followed by an exploration of technology variables such as H.264 SVC.

Footnotes and References

[1] To get a sense of what needs to be done, see “Erlang distribution”, http://en.wikipedia.org/wiki/Erlang_distribution

[2] Information taken from Netflix investor relations page at http://ir.netflix.com/.

[3] NVidia maintains an extensive collection of applications that use CUDA to interface to NVidia GPUs. See http://www.nvidia.com/object/cuda_home.html

[4] Another main goal of audio and video compression is to remove data from the underlying data set while maintaining the fidelity of the content as perceived by humans.
