Although this worked well, we saw room for improvement. The question for us was: What is the cost of improving performance on the computing/networking resources? To answer that, let’s take a look at the four steps involved in streaming a desktop screen, for example.
Step 1. Capture
Certain resources are required to capture the desktop screen. This process can be fairly CPU-intensive; however, starting with Windows 8.1, Microsoft began providing a very efficient method of capturing the screen. Because of this, the CPU cost dropped to a level that was reasonable, at least from our point of view.
Step 2. Compress/encode
The raw data of a captured screen is too big to send over a network -- it needs to be compressed in some way. In our original Streamer, we used a light compression scheme accelerated by a modern GPU, such as an Nvidia graphics card. This approach required a more powerful GPU but reduced CPU usage, and the lightly compressed stream was easy to render and display on the receiving side. The trade-off was that it consumed a lot of network bandwidth.
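To see why raw frames are unworkable, here is a quick back-of-the-envelope calculation. The 32-bit pixel format is an assumption on our part (it is a common capture format), so treat the numbers as illustrative:

```python
# Raw bandwidth of an uncompressed 1920 x 1080 desktop stream.
# Assumes 32-bit (4-byte) pixels at 30 FPS -- illustrative, not our exact pipeline.
width, height = 1920, 1080
bytes_per_pixel = 4
fps = 30

raw_bits_per_second = width * height * bytes_per_pixel * 8 * fps
print(f"{raw_bits_per_second / 1e9:.2f} Gb/s")  # prints "1.99 Gb/s"
```

Nearly 2 Gb/s for a single screen -- about double what a gigabit link can carry -- so some form of compression is mandatory.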
Step 3. Sending over the network
Even though we compressed the data, it still required a massive amount of bandwidth. Streaming a 1920 x 1080 screen at 30 FPS consumes about 51% of a gigabit network. With a properly configured switch, multiple streams from multiple sources can reach multiple destinations, but a misconfigured switch can easily overwhelm the entire network.
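Working backwards from that 51% figure gives a sense of the compression ratio involved. This is a rough sketch -- the 32-bit pixel format and the absence of protocol overhead are assumptions:

```python
# Implied compression ratio of the old light-compression stream.
GIGABIT = 1_000_000_000               # 1 Gb/s link
stream_bps = 0.51 * GIGABIT           # ~51% of gigabit, as quoted for 1080p30
raw_bps = 1920 * 1080 * 4 * 8 * 30    # assuming 32-bit pixels

ratio = raw_bps / stream_bps
print(f"~{ratio:.1f}:1")                # prints "~3.9:1"
print(int(GIGABIT // stream_bps))       # prints "1" -- only one such stream per link
```

A ratio of roughly 4:1 is consistent with a light, low-latency codec: cheap to decode, but far from the efficiency of a full video codec.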
Step 4. Render and display the stream
On the receiving end, the portion of the video wall that will display the stream receives and renders it. Our old approach using light compression was extremely easy to render, so fairly low-end machines could display streams without much effort.
As you can see, our strength in utilizing resources was in Steps 2 and 4. Our greatest weakness was Step 3, where network bandwidth became the bottleneck. Even though we could perfectly meet customers' demands for multiple video streams on the LED video wall, the network infrastructure had to be set up appropriately and required special care.
Today, people want to stream higher-resolution screens at higher frame rates. Upgrading to a higher-bandwidth network, such as 10 gigabit, would of course accommodate this, but it doesn't actually solve the problem.
So, we looked back into our design and decided to focus on improving Steps 2 and 4 in order to reduce network bandwidth usage. This makes the most sense since computing resources such as the CPU and GPU continue to get faster and more powerful, while network bandwidth remains mostly the same. In a perfect world 10 gigabit networks would be widely available, but we are not holding our breath for that to happen.
In Step 2, we added an industry-standard H.264 encoding engine, which encodes a 1920 x 1080 screen at 60 FPS comfortably on modern CPUs/GPUs. Then we added our own special patent-pending sauce to serve our specific purpose. The result is that network bandwidth usage (Step 3) is now reduced to about 50-100 Mb/s, or 5-10% of a gigabit network, which eases the burden on network infrastructure and enables more simultaneous video streams across the video wall. As an added benefit, the encoding engine also allows much higher image quality than our old compression engine.
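The bandwidth arithmetic behind those numbers, taking the quoted 50-100 Mb/s range at face value:

```python
# Simultaneous 1080p streams per gigabit link, before and after H.264.
GIGABIT = 1_000_000_000
old_bps = 510_000_000                           # ~51% of gigabit (light compression)
h264_low, h264_high = 50_000_000, 100_000_000   # quoted 50-100 Mb/s range

print(GIGABIT // old_bps)                       # prints "1" -- one stream before
print(GIGABIT // h264_high, "to", GIGABIT // h264_low)  # prints "10 to 20" streams now
print(old_bps / h264_low)                       # prints "10.2" -- ~10x reduction at best
```

Even at the high end of the H.264 range, a single gigabit link now carries ten streams where it previously carried one.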
In Step 4, the receiving side decodes and renders the stream, which is more work than before. However, since most modern CPUs/GPUs include very efficient methods of decoding H.264 streams, the CPU/GPU resource usage increases only slightly.
With this rebalancing of resource usage, we achieve very efficient streaming with improved image quality and a roughly tenfold reduction in network bandwidth.
We place a lot of emphasis on future-proofing. Our software technology can easily adapt to hardware improvements. When more powerful encoding/decoding hardware engines become available, our software engine will be able to utilize them to provide higher-resolution streams with minimal increase in network bandwidth.