June 7, 2005
Volume 3, issue 4


Streams and Standards: Delivering Mobile Video

The era of video served up to mobile phones has arrived and threatens to be the next “killer app” after wireless calling itself.

TOM GERSTEL, TURNER BROADCASTING SYSTEM

Don’t believe me? Follow along…

Mobile phones are everywhere. Everybody has one. Think about the last time you were on an airplane and the flight was delayed on the ground. Immediately after the dreaded announcement, you heard everyone reach for their phones and start dialing.

A smaller subset of wireless subscribers (even that smaller slice is measured in millions of users) already use their phones to browse the Web for news, information, and entertainment. Anyone who has braved those interfaces knows that navigation is difficult and the experience can be frustrating compared with a PC Web browser. People still want to use their phones for browsing, however, chiefly because of proximity to information. Users have decided that they are willing to endure the shortcomings of the interface to get the information they are looking for right now, rather than waiting until they are in front of another media source.

Other forms of portable devices are available but have their drawbacks. Portable televisions have been around for years and have enjoyed a limited amount of success in the market. They’re reasonably mobile, but their chief limitation has been access to the content people want. Portable DVD players have reached an attractive price point for consumers and are appearing as near-standard equipment in many automobiles. They serve the need to pass time in transit, but require the user to consciously plan ahead and provide content for the device.

Combine people’s desire for a limitless supply of video content with the ability to receive it anywhere they go, and you’ll see why mobile video is an important step for both the content and mobile industries.

Mobile vs. Wireless

There’s a big difference between mobile and wireless, and it’s important to explain each in context. Wireless is a method of access and refers solely to how a device gets content. Wireless devices can be anything from a mobile phone to a laptop computer to a billboard sign.

Mobile devices can be thought of in terms of their size. Mobile operators and content providers generally refer to anything PDA-size and smaller as a mobile device. People view such devices as personal extensions of themselves. They are not meant to be shared with others. In that respect, mobile device usage is quite different from traditional television viewing.

Delivery Mechanisms

Akin to Internet streaming video, there are two major classifications of mobile video: live streaming and VOD (video-on-demand). VOD is further classified into subcategories: TVOD (true video on demand), where the content is streamed to the phone at the request of the users; and presumptive download, where the content offering is predownloaded to the phone.

TVOD makes the most efficient use of network resources, as only the content the user requests is ever streamed to the phone. It also allows for a wider variety of content to be made available, as the limitation of storage on the phone isn’t a factor. TVOD has several drawbacks in a mobile environment, however. To provide a satisfactory user experience, consistent bandwidth matching or exceeding the data rate of the video clip must be available on the user’s cellular site. Data connectivity at a cell site is a shared resource, so other users’ consumption of bandwidth can have a direct and adverse impact on your user experience.

Content providers need guidance from mobile operators on the real-world bandwidth conditions in the operator’s network. One operator’s network may support data-transfer rates up to 128 Kbps in optimal conditions, but real-world conditions dictate encode rates at 56 Kbps or less to prevent buffering of the content while it plays back on the phone. This is an important consideration, as mobile operators faced with bandwidth contention at a cell site will often favor voice traffic.
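To see why the encode rate has to follow real-world throughput rather than the advertised peak, consider a rough sketch. It assumes a constant encode rate, a constant sustained throughput, and no startup buffer, none of which holds exactly on a live network; the function name and the sample figures are illustrative only.

```python
def rebuffer_seconds(clip_seconds: float, encode_kbps: float,
                     throughput_kbps: float) -> float:
    """Estimate total stall time while streaming a clip.

    Simplified model (an assumption, not a measurement): the clip is encoded
    at a constant rate and the network sustains a constant throughput.
    """
    if throughput_kbps <= 0:
        raise ValueError("throughput must be positive")
    # Time needed to move every bit of the clip across the network.
    download_seconds = clip_seconds * encode_kbps / throughput_kbps
    # Any time beyond the clip's own running time is spent buffering.
    return max(0.0, download_seconds - clip_seconds)


# A two-minute clip encoded at 128 Kbps on a link that sustains only 64 Kbps
# spends as long stalled as it does playing.
stalled = rebuffer_seconds(120, 128, 64)

# The same clip encoded at 56 Kbps on a 56-Kbps link plays without stalling.
smooth = rebuffer_seconds(120, 56, 56)
```

Under this model, any encode rate above the sustained throughput produces stalls that grow with the length of the clip, which is one reason longer sessions are more exposed.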

Presumptive downloading of content to phones ensures a higher quality of service for the end user. Every piece of content in the video service is downloaded and stored in the phone’s memory during periods of inactivity. When a user selects a piece of content, the playback is instant and not subject to current network conditions. This allows for the possibility of viewing content even without having a connection to the network (no coverage area, tunnels, airplanes, etc.). From the content provider’s standpoint, it also allows for higher-quality video. Pushing a 128-Kbps file to the phone need not occur in realtime, so higher-data-rate files can be placed on the phone, providing for a more compelling user experience. Although 128 Kbps seems reminiscent of the dial-up modem and ISDN era, given the screen sizes of today’s phones (generally 176 by 144 pixels), it provides what users consider reasonable image quality and motion. The video generally has a lower frame rate than broadband PC-based streaming (8 to 10 frames per second on the phone vs. 15 to 30 on PC-based video), but the small screen combined with the eye’s visual perception compensates for the lower rate, resulting in the appearance of better quality.

While the quality of the content on the phone via presumptive download can be argued to be superior to streaming, the quantity of content available to the user is much narrower in scope. Current iterations of phones supporting this model have anywhere from 32 to 64 MB of memory for video storage. Data rates of 128 Kbps allow for approximately 34 to 68 minutes of video to be stored on the phone. For a single content provider this would be adequate; however, given the many (20-plus) brands already in the mobile content market, that leaves two to three minutes of content per provider on the phone.
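The storage arithmetic above can be checked with a short sketch. It assumes binary units (1 MB = 1,024 KB and 1 Kbps = 1,024 bits per second), which is the convention that reproduces the 34- and 68-minute figures; the function name is illustrative.

```python
def minutes_of_video(storage_mb: float, rate_kbps: float) -> float:
    """Minutes of video that fit in storage_mb at a constant rate_kbps.

    Assumes binary units: 1 MB = 1024 KB, 1 KB = 8 Kbit, 1 Kbps = 1024 bit/s.
    """
    kilobits = storage_mb * 1024 * 8
    seconds = kilobits / rate_kbps
    return seconds / 60


PROVIDERS = 20  # the "20-plus" brands mentioned above

for storage_mb in (32, 64):
    total = minutes_of_video(storage_mb, 128)
    share = total / PROVIDERS
    print(f"{storage_mb} MB: {total:.1f} min total, {share:.1f} min per provider")
```

With 20 providers sharing the memory equally, the per-provider share works out to roughly two to three minutes, and it shrinks further as more brands enter the market.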

Update frequency is also a problem for some content providers. Updates to the phone are done in “carousel” fashion to provide equal distribution scheduling for competing content owners, with a complete update cycle taking 60 to 90 minutes. For entertainment content, this is not an issue. For news content, however, waiting 90 minutes to push a breaking news story is a poor user experience. The nature of constantly updated news offerings (television, Web sites, radio) has trained users to expect that whatever they are reading, seeing, and hearing is the latest, best information available to them.
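The effect of the carousel on a breaking story can be illustrated with a small model. The operators’ actual scheduling is not described here, so the sketch simply assumes a repeating cycle in which each provider gets one fixed slot; the slot lengths and provider positions are hypothetical.

```python
from itertools import accumulate


def wait_for_next_slot(slot_minutes, provider_index, publish_minute):
    """Minutes from publication until the provider's next slot begins.

    Assumes a fixed, repeating carousel with one slot per provider
    (a simplification; real schedules may differ).
    """
    cycle = sum(slot_minutes)
    # Start time of each slot within one cycle: 0, s0, s0+s1, ...
    starts = [0, *accumulate(slot_minutes)][:-1]
    offset = publish_minute % cycle
    return (starts[provider_index] - offset) % cycle


# Twenty providers with 4.5-minute slots give a 90-minute cycle.
slots = [4.5] * 20

# A story published one minute after the first provider's slot began
# waits almost the whole cycle for the next one.
worst_case = wait_for_next_slot(slots, provider_index=0, publish_minute=1)

# A provider later in the rotation may be luckier with the same timing.
luckier = wait_for_next_slot(slots, provider_index=5, publish_minute=10)
```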

Live video streaming to phones is the true bridge between traditional television and the mobile market. Several cable networks are available via add-on services that allow you to take the network with you on your phone anywhere within range. The delivery parameters for live video streaming are similar to TVOD. A connection with a streaming server must exist during the entire session. The chief difference between the two is that the average session time is generally longer for live content, increasing the risk of a “dropped stream” as the user enters an area of weaker coverage.

Codecs and Formats

Content providers wishing to reach the widest possible mobile audience must consider producing their mobile content in a wide variety of codecs and formats. Internet-based content providers already face the dilemma of which formats to offer their video content in. The mobile market is at least as fragmented and has a major drawback: unlike PC-based offerings, where the user can download additional software to view the content, mobile phones are fairly limited in CPU and memory, so application downloading is not as prevalent. Users expect that whatever they need to access video will be preloaded on the phone. Fewer than 10 percent of phones in the marketplace today are estimated to have “native” support for video playback.

On a positive note, mobile operators usually measure the average life of a mobile phone in months, not years. Updated software can and does make it into the marketplace in shorter cycles than one would expect for embedded systems.

The prevalent codecs and formats in today’s marketplace are broken into three categories: mobile standards, vendor standards, and proprietary codecs.

The mobile standards are specified mainly by 3GPP (3rd Generation Partnership Project), a collaboration of several telecommunications standards bodies. The scope of 3GPP is not limited to multimedia; it provides specifications for most of today’s GSM-based networks. For content providers, the series of 3GPP multimedia standards specifies the container for video content, as well as the supported video and audio codecs multiplexed within that container.

Today’s phones use two primary video codecs: H.263 and MPEG-4. H.263 is used in most videoconferencing appliances. This codec offers extremely low latency, making it well suited for live applications. The CPU requirements for decoding are also extremely light, matching well with the CPU power inside of today’s phones. H.263, however, is not considered a “modern” codec, and when compared with the other codecs available today, it is extremely bandwidth-inefficient.

MPEG-4, or more specifically MPEG-4 Simple Profile, is a modern equivalent of H.263. It has many of the same latency and decoding characteristics, with greater encoding efficiency. Most content providers in the mobile space are using MPEG-4 as their video codec for 3GPP content.

Over the next 6 to 12 months, a new de facto codec should come into wide commercial availability: H.264. It’s also known as advanced video coding. The industry plans to use H.264 encoding for everything from mobile devices to HD-DVDs to transmission of high-definition television networks. It promises two to three times more encoding efficiency when compared with MPEG-2, the current broadcast industry standard. Hardware-based H.264 encoders are commercially available today. Software-based encoders are in various stages of commercial development. H.264 provides extraordinary image quality for a given data rate and is expected to ship standard for videophones in the next few months.

Given phone life cycles, however, content providers will need to continue to encode either H.263 or MPEG-4 video for the foreseeable future.

The 3GPP specification has several audio codecs that provide a good mix of dynamic range and encoding efficiency for the content provider:

AAC (advanced audio coding) is ideally suited for music content and provides good fidelity at low data rates (16 to 32 Kbps).
AMR (adaptive multi-rate) provides good voice reproduction at ranges of 4.75 to 12.2 Kbps.
QCELP (Qualcomm code excited linear predictive), based on the Qualcomm PureVoice codec, is included in the 3GPP2 standard for CDMA-based wireless networks.

The two prevalent vendor standards are RealNetworks’ RealVideo and Microsoft’s Windows Media codecs/formats. Windows Media is generally preloaded on phones and PDAs running either Pocket PC Phone Edition or Pocket PC 2003. RealNetworks’ mobile player is available for Symbian, Palm OS, and Pocket PC phones. Some Nokia models (9200 series Communicators and the 3650 and 7650) have the software in their embedded systems.

These formats provide the easiest entrance to the mobile market for content providers, as many of them are familiar with and already producing content in these formats for their Web presence. Some slight tweaks to the encoding parameters are required for mobile audiences, but, overall, the barrier to entry is much lower than for the mobile standards or proprietary codecs. Both of these vendor codecs are also efficient encoders at low data rates, given their origins in the dial-up Internet space.

While Pocket PC- and Palm-based phones are gaining market share, they are by no means the majority of phones in circulation, and their current price points will prevent that from occurring. Using these codecs will help you reach part of your audience, but not the complete set.

The third category of formats and codecs is proprietary. Many of these are based on the J2ME virtual machine and are viable ways of providing video content to customers whose phones don’t have native video support. The codecs themselves are optimized for low-bandwidth operations and decoding efficiency. Encoding content in these formats is done with tools provided by the codec vendor. Standards-based J2ME decoders are difficult to implement, given the additional CPU overhead of the virtual machine combined with the decode operation, creating a niche for these optimized codecs.

Given that these players and codecs require download onto an end user’s phone, there is a higher barrier to entry and acceptance when compared with preloaded functionality. This approach does, however, allow enthusiastic users unwilling or unable to upgrade their phones to take advantage of your product.

There are many codecs and formats, and no simple mechanism for choosing which ones to support. Providers need to identify their audiences not only by demographic, but also by the devices they carry. For providers wishing to “be everywhere,” it’s not a single category choice. Most providers produce this content in most if not all of the codecs and formats in use today.
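One way to reason about which categories an audience requires is to group the devices it carries by capability. The sketch below is only an illustration: the device records, the capability flags, and the order of preference (native 3GPP first, then a vendor player, then a downloadable J2ME player) are assumptions, not an industry rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical description of a phone's playback capabilities."""
    name: str
    native_3gpp: bool = False    # preloaded 3GPP player
    vendor_player: bool = False  # Windows Media or RealNetworks player
    j2me: bool = False           # can run a downloaded J2ME player


def delivery_category(device: Device) -> str:
    """Pick one of the three categories for a device (assumed preference order)."""
    if device.native_3gpp:
        return "mobile standard"
    if device.vendor_player:
        return "vendor standard"
    if device.j2me:
        return "proprietary"
    return "unsupported"


def categories_needed(devices) -> set:
    """Categories a provider must encode for to reach every capable device."""
    return {delivery_category(d) for d in devices} - {"unsupported"}


audience = [
    Device("phone A", native_3gpp=True),
    Device("PDA phone B", vendor_player=True),
    Device("older phone C", j2me=True),
    Device("voice-only phone D"),
]
needed = categories_needed(audience)
```

Even this small, made-up audience requires all three categories, which is consistent with providers encoding in most of the formats in use.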

Serving Customers

Providing a successful video offering to mobile devices requires coordination with and cooperation of the mobile operator. Mobile-based Web sites can run without operator cooperation, but, given the bandwidth and network requirements for video, it’s not possible to “engineer for success” without the operator’s input. Most of today’s video services are developed in conjunction with the operator, who in turn helps to market the service to customers. That marketing can bring in hundreds of thousands of potential users, quickly raising the need to have an appropriately scaled serving infrastructure.

From a quality-of-service perspective, it’s preferable today to have the wireless operators service the request from the end user. They are better equipped to route traffic appropriately within their networks, and they have the knowledge of their network infrastructures. For one operator, Turner Broadcasting operates an origin streaming server complex that the operator’s streaming caches connect to for content. Each subsequent request is checked for cache validity and logged to the origin server complex, but fulfilled from the operator’s cache. In the event of cache invalidation, content is refetched at several times the rate required to stream to a device. The content provider gets accurate usage information, the operator can ensure the most efficient delivery of the content, and the end user gets the best possible experience from having the content served topologically closest to them.
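The origin-and-cache arrangement can be sketched in a few lines. This is a toy model rather than the production system: the class names, the version counter used as a validity check, and the in-memory log are all assumptions made for illustration.

```python
class Origin:
    """Toy origin server: holds content and logs every validity check."""

    def __init__(self):
        self.content = {}     # clip id -> (version, payload)
        self.access_log = []  # one entry per end-user request

    def publish(self, clip_id, payload):
        version = self.content.get(clip_id, (0, b""))[0] + 1
        self.content[clip_id] = (version, payload)

    def validate(self, clip_id, cached_version):
        # Every request is logged here, even when the cache serves it.
        self.access_log.append(clip_id)
        return self.content[clip_id][0] == cached_version

    def fetch(self, clip_id):
        return self.content[clip_id]


class OperatorCache:
    """Toy operator-side cache that revalidates with the origin per request."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}
        self.refetches = 0

    def serve(self, clip_id):
        cached = self.store.get(clip_id)
        cached_version = cached[0] if cached else None
        if not self.origin.validate(clip_id, cached_version):
            # Cache miss or invalidation: pull a fresh copy from the origin.
            self.store[clip_id] = self.origin.fetch(clip_id)
            self.refetches += 1
        return self.store[clip_id][1]


origin = Origin()
cache = OperatorCache(origin)
origin.publish("headline", b"v1")
for _ in range(3):
    cache.serve("headline")    # three user requests, one fetch from origin
origin.publish("headline", b"v2")
latest = cache.serve("headline")  # invalidated, so the cache refetches
```

The point of the design is visible in the counters: the origin’s log grows with every user request, while the payload crosses the link between origin and cache only when the content changes.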

QA / Testing

The variables in a mobile offering are too numerous to cover fully with both unit and end-to-end testing. More than 100 models of current-generation mobile phones are in active service today. Although relatively few of those phones natively support video, that number will expand dramatically over the next year as video playback becomes a default feature on many new phones now on their way to market. The offering tends to behave differently from phone to phone, requiring extra testing. Although there are “families” of phones that could limit the amount of necessary device testing, for true QA, each phone must be tested. Unfortunately, this is generally not practical for content providers, and testing of initial rollouts will require the operators’ test resources if that level of coverage is required. Testing resources and standards vary by operator, so standardization may prove difficult, given the differences among devices, services, and operator networks.

Emulators are a reasonable resource for functional testing of the offering. Given the importance of nonfunctional behaviors (latency, buffering, and experience while moving in a car), however, the emulators fall short in many respects, and they are not available for all phones.

Some offerings depend on a new network service being available in a given market. Generally, mobile operators select two or three test markets to perform functional and nonfunctional testing. If the test markets don’t coincide with where your content development resources are, the length of your testing iterations must be altered to compensate for travel to the test market. Given the often rapid-fire iteration of some content offerings, this can keep test plans from on-time completion.

Depending on the offering, beta phones may be required to test your offering. Those are usually in short supply before commercial rollout of the phone. It’s quite possible that a single phone will be provided to the organization for testing/previewing. In addition to the technical demands of testing, business and marketing personnel need time on the phone to demonstrate features and understand the product offering. When scoping the development effort, you must consider scheduling of the actual phone for test windows.

Once the service is launched, service monitoring is possible, but not in as automated or complete a fashion as you would expect from Web-service monitoring. It’s not common for phones to perform programmatic operations from an external source (monitoring software), so the true end-to-end service can’t be automatically monitored. At Turner, we have a set of users who periodically check the offering on their phones and report any trouble to our operations center. Although our services have run with a high degree of reliability, we still find ourselves exposed to prolonged outages because of the length of time required to report a problem. We monitor certain components to ensure a reasonable degree of service level and to address the corresponding issues, as shown in table 1.

TABLE 1 Monitoring Components to Maintain Service Level
Monitoring Component	Diagnostic Value
XML metadata feed for phones passes well-formedness and system-specific parsing checks.	Display errors on the phone. Some phones display a cryptic error if character limits are exceeded in some data fields.
Incrementing log files on mobile RTSP streaming and HTTP metadata servers.	General service availability at points upstream. Was once used to discover issue at operator’s gateway.
Error rate and frequency in RTSP streaming logs.	Can identify content publishing issues (404, File Not Found, etc.).
Error rate and frequency in HTTP metadata logs.	Metadata access controlled via access control lists. Denied requests from “legitimate” users indicate ACL issues.

This provides us with a reasonable level of assurance that the resources we control are available to service requests. In that respect, it is on par with the monitoring Turner does for its Web properties. We can answer the question, “Are we reasonably sure that if users have valid access, they can get our content?” As with Web properties, we place the line of demarcation for support at our front door. Given the relationship between the content provider and the user, it would be nice to have a better “user’s view” of whether something between the phone and our content is inhibiting access.
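Two of the checks in table 1 lend themselves to a short sketch: validating the metadata feed and computing an error rate from server logs. The feed layout, the field names, and the character limits below are hypothetical, since the real feed format and the phones’ limits are not specified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-field character limits; real limits vary by phone.
FIELD_LIMITS = {"title": 40, "description": 120}


def check_feed(xml_text, limits=FIELD_LIMITS):
    """Return a list of problems found in a metadata feed (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    for field, limit in limits.items():
        for element in root.iter(field):
            text = element.text or ""
            if len(text) > limit:
                problems.append(f"{field} is {len(text)} chars (limit {limit})")
    return problems


def error_rate(status_codes):
    """Fraction of logged requests whose status code is 400 or higher."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 400)
    return errors / len(status_codes)


good_feed = "<feed><item><title>Top story</title></item></feed>"
long_feed = "<feed><item><title>" + "x" * 41 + "</title></item></feed>"
broken_feed = "<feed><item>"
```

A check like `check_feed` catches problems before a phone displays a cryptic error, while a rising `error_rate` in the streaming or metadata logs points to publishing or ACL issues; neither replaces a view from the handset itself.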

With recent press as an indication, mobile video continues to evolve as a must-have product for both wireless operators and content producers. Consumer demand will drive the maturation of this product, moving it from early development to mainstream over the next year. New wireless offerings (such as Sirius Satellite Radio’s announced plans to distribute video to rear-seat units in cars) will further drive market demand for content.

TOM GERSTEL is director of enhanced content systems for CNN Internet Technologies, a division of Turner Broadcasting System. He oversees the production systems for sites including CNN.com, NASCAR.com, and CartoonNetwork.com that produce streaming video, mobile video, and video-on-demand for the cable television market. Gerstel has served in various positions within CNN, starting in the Web development group supporting the launch of CNN.com in 1995. He holds a B.S. in television/radio from Ithaca College.

Originally published in Queue vol. 3, no. 4—