
XAVC-I or XAVC-L: which to choose?

THE XAVC CODEC FAMILY

The XAVC family of codecs was introduced by Sony back in 2014. Until recently all flavours of XAVC were based on H264 compression; newer XAVC-HS versions that use H265 have since been introduced. The most commonly used versions of XAVC are the XAVC-I and XAVC-L codecs. These have both been around for a while now and are well tried and well tested.

XAVC-I

XAVC-I is a very good intra frame codec, where each frame is individually encoded. It is being used for Netflix shows, it has been used for broadcast TV for many years, and there are thousands and thousands of hours of great content that has been shot with XAVC-I without any issues. Most of the in-flight shots in Top Gun: Maverick were shot using XAVC-I. It is unusual to find visible artefacts in XAVC-I unless you make a lot of effort to find them. But it is a high compression codec, so it will never be entirely artefact free. The video below compares XAVC-I with ProRes HQ, and as you can see there is very little difference between the two, even after several encoding passes.


 

XAVC-L

XAVC-L is the long GoP version of XAVC. Long GoP (Group of Pictures) codecs fully encode a start frame, and then for the following group of frames (typically 12 or more) store only the differences between frames, until the next full frame starts the next group. They record the changes between frames using techniques such as motion prediction and motion vectors which, rather than recording new pixels, move existing pixels from the fully encoded frame through the subsequent frames when there is movement in the shot. Do note that on the F5/F55, FS5, FS7, FX6 and FX9, XAVC-L in UHD or 4K is 8 bit (while XAVC-I is 10 bit).
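
If you are code minded, here is a toy sketch of the basic idea, using made-up frame data. It is a huge simplification: real XAVC-L also uses motion vectors, prediction and entropy coding, none of which are shown here.

```python
import numpy as np

# Toy long GoP encoder: store the first frame of the group in full,
# then only per-pixel differences for the frames that follow.
def encode_gop(frames):
    keyframe = frames[0].copy()
    diffs = [b.astype(np.int16) - a.astype(np.int16)
             for a, b in zip(frames, frames[1:])]
    return keyframe, diffs

def decode_gop(keyframe, diffs):
    out = [keyframe.copy()]
    for d in diffs:
        out.append((out[-1].astype(np.int16) + d).astype(np.uint8))
    return out

# A mostly static 12 frame group: almost every difference is zero,
# which is why long GoP is so efficient on static scenes.
frames = [np.full((4, 4), 100, dtype=np.uint8) for _ in range(12)]
frames[6][2, 2] = 110   # one small change in the middle of the group
key, diffs = encode_gop(frames)
print(sum(np.count_nonzero(d) for d in diffs))   # 2 non-zero deltas in total
assert all((a == b).all() for a, b in zip(frames, decode_gop(key, diffs)))
```

The flip side is visible in the same sketch: fill every frame with random motion and almost every delta becomes non-zero, leaving the codec nothing to save.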

Performance and Efficiency.

Long GoP codecs can be very efficient when there is little motion in the footage. H264 long GoP is generally considered to be around 2.5x more efficient than the I frame version, and this is why the bit rate of XAVC-I is around 2.5x higher than XAVC-L: for most types of shot the two will perform similarly. If there is very little motion and the bulk of the scene being shot is largely static, there will be situations where XAVC-L can perform better than XAVC-I.

Motion Artefacts.

BUT as soon as you add a lot of motion, or a lot of extra noise (which looks like motion to a long GoP codec), long GoP codecs struggle: they don't typically have sufficiently high bit rates to deal with complex motion without some loss of image quality. Let's face it, the primary reason for using long GoP encoding is to save space, and that is done by decreasing the bit rate. So long GoP codecs generally have much lower bit rates, so that they actually deliver those space savings. But that introduces challenges for the codec. Shots such as cars moving to the left while the camera pans right are difficult for a long GoP codec to process, as almost everything is different from frame to frame, including entirely new background information that was hidden behind the cars in one frame becoming visible in the next. Wobbly handheld footage, crowds of moving people, fields of crops blowing in the wind, rippling water and flocks of birds are all very challenging, and will often exhibit visible artefacts in a lower bit rate long GoP codec that you won't ever get in the higher bit rate I frame version.
Concatenation.

A further issue is concatenation. The artefacts that occur in long GoP codecs often move in the opposite direction to the object that's actually moving in the shot. So when you have to re-encode the footage at the end of an edit, or for distribution, the complexity of the motion in the footage increases, and each successive encode will be progressively worse than the one before. This is a very big concern for broadcasters, or for anyone where there may be multiple compression passes using long GoP codecs such as H264 or H265.

Quality depends on the motion.

So, when things are just right and the scene suits XAVC-L it will perform well, and it might show marginally fewer artefacts than XAVC-I. But the artefacts that do exist in XAVC-I are going to be pretty much invisible in the majority of normal situations, while XAVC-L can produce visible artefacts whenever there is complex motion. It is this uncertainty that is the big issue for many, as you cannot easily predict when XAVC-L might struggle. Meanwhile XAVC-I will always be consistently good. Use XAVC-I and you never need to worry about motion or motion artefacts; your footage will be consistently good no matter what you shoot.

Broadcasters and organisations such as Netflix spend a lot of time and money testing codecs to make sure they meet the standards they need. XAVC-I is almost universally accepted as a main acquisition codec, while XAVC-L is much less widely accepted. You can use XAVC-L if you wish, and it can be beneficial if you need to save card or disk space. But be aware of its limitations, and avoid it if you are shooting handheld or shooting anything with lots of motion, especially water, blowing leaves, crowds etc. Also be aware that on the F5/F55, FS5, FS7, FX6 and FX9, XAVC-L in UHD or 4K is 8 bit while XAVC-I is 10 bit. That alone would be a good reason NOT to choose XAVC-L.

Banding in your footage: what causes it, and is it even there?

Once again it’s time to put pen to paper or fingers to keyboard as this is a subject that just keeps coming up again and again.

People really seem to have a lot of problems with banding in footage, and I don't fully understand why, as it's something I only ever encounter if I'm pushing a piece of material really, really hard in post production. Generally, the vast majority of the content I shoot does not exhibit problematic banding, even the footage I shoot with 8 bit cameras.

First things first – don't blame it on the bits. Even an 8 bit recording (from a good quality camera) shouldn't exhibit noticeable banding. An 8 bit recording can contain up to around 13 million tonal values. It's extremely rare for us to shoot luma only, but even if you do it will still have 235 shades, and in standard dynamic range these steps are too small for most people to discern, so you shouldn't ever be able to see them. I think that when most people see banding they are not seeing teeny, tiny, almost invisible steps; what they are seeing is something much more noticeable – so where is it coming from?
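
The arithmetic behind those figures (the ~13 million value appears to come from treating each of the three 8 bit channels as having 235 usable levels):

```python
print(235 ** 3)   # 12,977,875 -> the "up to 13 million" tonal values
print(256 ** 3)   # 16,777,216 -> full-range 8 bit RGB, for comparison
```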

It's worth considering at this stage that most TVs, monitors and computer screens are only 8 bit, sometimes less! So if you are looking at one camera and it's banding free, and then you look at another and you see banding, in both cases you are probably looking at an 8 bit image. It can't just be the capture bit depth that's causing the problem, because you can't see 10 bit steps on an 8 bit monitor.

So what could it be?

A very common cause of banding is compression. DCT based codecs such as JPEG, MJPEG and H264 break the image up into small blocks of pixels called macro blocks. All the pixels in each block are then processed in a similar manner, and as a result there may sometimes be a small step between each block, or between groups of blocks, across a gradient. This can show up as banding. We often see this with 8 bit codecs because 8 bit codecs typically use older technology or are more highly compressed; it's not because there are not enough code values. Decreasing the compression ratio will normally eliminate the stepping.
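
As a toy illustration of the mechanism (keeping just one value per block, a deliberately crude stand-in for heavy quantisation; real DCT codecs retain far more detail than this):

```python
import numpy as np

# A smooth, gentle gradient spanning just a few 8 bit code values.
gradient = np.linspace(100, 104, 64)

# Crude stand-in for heavy compression: split the row into 8 pixel
# "macro blocks" and keep a single average value per block.
blocks = gradient.reshape(-1, 8).mean(axis=1)
banded = np.repeat(np.round(blocks), 8).astype(int)

# Each block is now perfectly flat, so the gentle ramp has become
# 8 pixel wide bands separated by one-code-value steps.
print(banded.reshape(-1, 8))
```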

Scaling between bit depths or frame sizes is another very common cause of banding, so it's absolutely vital that you ensure your monitoring system is up to scratch. It's very common to see banding in video footage on a computer screen, because video data levels are different to computer data levels, and there may also be some small gamma differences, so the image has to be converted on the fly. In addition the computer desktop runs at one bit range and the HDMI output at another, so all kinds of conversions are taking place that can lead to all kinds of problems as you go from a video clip, to computer levels, to HDMI levels. See this article to fully understand how important it is to get your monitoring pipeline properly sorted: https://www.xdcam-user.com/2017/06/why-you-need-to-sort-out-your-post-production-monitoring/
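
One of those on-the-fly conversions in miniature: stretching 8 bit video levels (16–235) to full-range computer levels (0–255) in integer maths leaves gaps in the output.

```python
import numpy as np

video = np.arange(16, 236)                             # legal 8 bit video levels
full = np.round((video - 16) * 255 / 219).astype(int)  # stretch to 0-255

print(len(np.unique(full)))        # 220 output codes actually used...
print(256 - len(np.unique(full)))  # ...36 codes can never occur: gaps that
                                   # a further conversion or a steep grade
                                   # can turn into visible bands
```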

Look Up Tables (LUTs) can also introduce banding. LUTs were never really intended to be used as a quick fix grade; the intention was to use them as an on-set reference or guide, not the final output. The 3D LUTs that we typically use for grading break the full video range into bands, and each band applies a slightly different correction to the footage than the band above or below. These bands can show up as steps in the LUT's output, especially with the most common 17x17x17 3D LUTs. The problem gets even worse if you apply a LUT and then grade on top of it, which is a really bad practice.
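
A toy version of that band structure, using a 1D LUT with nearest-node lookup. Proper LUT processing interpolates between nodes, which hides the bands, but grading on top can bring them back.

```python
import numpy as np

nodes_in = np.linspace(0.0, 1.0, 17)   # 17 nodes, as in a 17x17x17 3D LUT
nodes_out = nodes_in ** (1 / 2.2)      # the "look" the LUT applies

ramp = np.linspace(0.0, 1.0, 1000)     # a perfectly smooth input gradient
nearest = np.abs(ramp[:, None] - nodes_in).argmin(axis=1)
graded = nodes_out[nearest]

print(len(np.unique(graded)))   # only 17 output levels: hard bands
```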

Noise reduction – in camera or post production noise reduction will also often introduce banding. Very often pixel averaging is used to reduce noise: if you have a bunch of pixels that are jittering up and down, taking an average value for all of those pixels will reduce the noise, but you can then end up with steps across a gradient as you jump from one average value to the next. If you shoot log it's really important to turn off any noise reduction (if you can) when you are shooting, because these steps will get exaggerated when you grade the footage. Raising the ISO (gain) in a camera also makes this much worse, as the camera's built-in NR will be working harder, increasing the averaging to compensate for the increased noise.
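
A simplified sketch of that averaging effect (a box filter on a noisy gradient, stored back as integers; real noise reduction is smarter, but the mechanism is similar):

```python
import numpy as np

rng = np.random.default_rng(1)

# A gentle gradient with sensor-style noise on top. The noise actually
# acts as dither, hiding the quantisation steps.
gradient = np.linspace(100.0, 103.0, 300)
noisy = np.round(gradient + rng.normal(0, 2.0, 300)).astype(int)

# Heavy-handed NR: average each pixel with its neighbours, then store
# the result back as integer code values.
kernel = np.ones(15) / 15
smoothed = np.round(np.convolve(noisy, kernel, mode="valid")).astype(int)

print(len(np.unique(noisy)))   # noise spreads the values over many codes
print(np.unique(smoothed))     # a handful of flat plateaus: steps reappear
```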

Coming back to 8 bit codecs again – of course a similar quality 10 bit codec will normally give you more picture information than an 8 bit one. But we have been using 8 bit for decades, largely without any problems. So if you can shoot 10 bit you might get a better end result. But also consider all the other factors mentioned above.

 

How can 16 bit X-OCN deliver smaller files than 10 bit XAVC-I?

Sony's X-OCN (Original Camera Negative) is a new type of codec from Sony. Currently it is only available via the AXS-R7 recorder, which can be attached to a Sony PMW-F5, F55 or the new Venice cinema camera.

It is a truly remarkable codec that brings the kind of flexibility normally only available with 16 bit linear raw files, but with a file size smaller than many conventional high end video formats.

Currently there are two variations of X-OCN.

X-OCN ST is the standard version and X-OCN LT is the "light" version. Both are 16 bit, and both contain 16 bit data based directly on what comes off the camera's sensor. The LT version is barely distinguishable from a 16 bit linear raw recording, and the ST version is "visually lossless". Having that sensor data in post production allows you to manipulate the footage over a far greater range than is possible with traditional video files. Traditional video files already have some form of gamma curve as well as a colour space and white balance baked in. This limits the scope of how far the material can be adjusted and reduces the amount of picture information you have (relative to what comes directly off the sensor).

Furthermore, most traditional video files are 10 bit, with a maximum of 1024 code values or levels within the recording. There are some 12 bit codecs, but these are still quite rare in video cameras. X-OCN is 16 bit, which means you can have up to 65,536 code values or levels within the recording. That's a colossal increase in tonal values over traditional recording codecs.
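
The code value arithmetic:

```python
print(2 ** 10)   # 1,024 levels at 10 bit
print(2 ** 12)   # 4,096 levels at 12 bit
print(2 ** 16)   # 65,536 levels at 16 bit: 64x the code values of 10 bit
```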

But the thing is that X-OCN LT files are a similar size to Sony's own XAVC-I (Class 480) codec, which is already highly efficient. X-OCN LT is around half the size of the popular 10 bit Apple ProRes HQ codec but offers comparable quality. Even the high quality ST version of X-OCN is smaller than ProRes HQ. So you can have image quality and data levels comparable to Sony's 16 bit linear raw, but in a lightweight, easy to handle 16 bit file that's smaller than the most commonly used 10 bit version of ProRes.

But how is this even possible? Surely such an amazing 16 bit file should be bigger!

The key to all of this is that the data contained within an X-OCN file is based on the sensor's output rather than traditional video. The cameras that produce X-OCN material all use Bayer sensors. In a traditional video workflow the data from a Bayer sensor is first converted from the luminance values that the sensor produces into a YCbCr or RGB signal.

So if the camera has a 4096×2160 Bayer sensor, in a traditional workflow this pixel level data gets converted to 4096×2160 of Red, plus 4096×2160 of Green, plus 4096×2160 of Blue (or the same of Y, Cb and Cr). In total you end up with over 26 million data points, which then need to be compressed using a video codec.

However, if we bypass the conversion to a video signal and just store the data that comes directly from the sensor, we only need to record a single set of 4096×2160 data points – 8.8 million. This means we only need to store a third as much data as in a traditional video workflow, and it is this huge data saving that is the main reason why X-OCN can be smaller than traditional video files while retaining amazing image quality. It's simply a far more efficient way of recording the data from a Bayer camera.
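
The data point arithmetic for a 4096×2160 Bayer sensor:

```python
photosites = 4096 * 2160
print(photosites)       # 8,847,360  -> one value per photosite, X-OCN style
print(photosites * 3)   # 26,542,080 -> after conversion to full RGB or YCbCr
```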

Of course this does mean that the edit or playback computer has to do some extra work, because as well as decoding the X-OCN file it has to convert the result to a video image. But Sony developed X-OCN to be easy to work with, which it is: even a modest modern workstation will have no problem working with X-OCN. And the fact that you have that sensor data in the grading suite gives you an amazing degree of flexibility. You can even adjust the way the file is decoded, to tailor whether you want more highlight or shadow information in the video that will be created when the X-OCN is decoded.

Why isn't 16 bit much bigger than 10 bit? Normally a 16 bit file will be bigger than a 10 bit file. But within a video image there are often areas of very similar information. Video compression algorithms take advantage of this: instead of recording a value for every pixel, they record a single value that represents all of the similar pixels. When you go from 10 bit to 16 bit you do have more bits of data to record, but a greater percentage of the code values will be the same or similar, and as a result the codec becomes more efficient. So the file size does increase a bit, but not as much as you might expect.

So, X-OCN, out of the gate, only needs to store a third of the data points of a similar traditional RGB or YCbCr codec. Increasing the bit depth from the typical 10 bits of a regular codec to the 16 bits of X-OCN does then increase the amount of data needed to record it. But the use of a clever algorithm to minimise the data needed for those 16 bits means that the end result is a 16 bit file only a little bigger than XAVC-I, yet still smaller than ProRes HQ even at its highest quality level.
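
One way to see why container bit depth and actual information content are different things: take the same 10 bit signal, place it in a 16 bit container, and measure the Shannon entropy per sample, which is roughly what an efficient encoder's output size tracks. (A toy calculation, not a description of X-OCN's actual internals.)

```python
import numpy as np

def entropy_bits(values):
    """Shannon entropy per sample, in bits."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
ten_bit = rng.integers(0, 1024, 100_000)   # a 10 bit signal
sixteen_bit = ten_bit * 64                 # same signal, 16 bit container

print(entropy_bits(ten_bit))       # ~10 bits per sample
print(entropy_bits(sixteen_bit))   # identical: the container grew,
                                   # the information did not
```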

The 8 bit or 10 bit debate.

Over the years there have been many, often heated debates over the differences between 8 bit and 10 bit codecs. This is my take on the situation, from the acquisition point of view.

The first thing to consider is that a 10 bit codec requires around a 30% higher bit rate to achieve the same level of compression as the equivalent 8 bit codec, so recording 10 bit needs bigger files for the same quality. The EBU recently evaluated several different 8 bit and 10 bit acquisition codecs, and their conclusion was that for acquisition there was little to be gained by using any of the commonly available 10 bit codecs over 8 bit, because of the data overheads.

My experience in post production has been that what limits what you can do with your footage, more than anything else, is noise. If you have a noisy image and you start to push and pull it, the noise in the image tends to limit what you can get away with. If you take two recordings, one at a nominal 100Mb/s and another at, say, 50Mb/s, you will be able to do more with the 100Mb/s material because there will be less noise. Encoding and compressing material introduces noise, often in the form of mosquito noise as well as general image blockiness; the more highly compressed the image, the more noise and blockiness. It's this noise and blockiness that will limit what you can do with your footage in post production, not whether it is 10 bit or 8 bit. If you have a 100Mb/s 10 bit compressed HD recording and a comparable 100Mb/s 8 bit recording, you will be able to do more with the 8 bit recording, because it will in effect be 30% less compressed, which gives a reduction in noise.

Now if you have a 100Mb/s 8 bit recording and a 130Mb/s 10 bit recording, things are more evenly matched, and the 10 bit recording may have a very small edge if it is from a very clean, noise free source. But in reality all cameras produce some noise, and it's likely to be the camera noise that limits what you can do with the images, so the 10 bit codec has little advantage for acquisition, if any.

I often hear people complaining about the codec they are using, citing banding across gradients such as white walls or the sky. Very often this is nothing to do with the codec; very often it is being caused by the display they are using. Computers seem to be the worst culprits. Often you are taking an 8 bit YUV codec, crudely converting that to 8 bit RGB, and then further converting it to 24 bit VGA or DVI, which then gets converted back down to 16 bit by the monitor. It's very often all these conversions between YUV and RGB that cause banding on the monitor, and not the fact that you have shot at 8 bit.

There is certainly an advantage to be had by using 10 bit in post production for any renders, grading or effects. Once in the edit suite you can afford to use larger codecs running at higher bit rates. ProRes HQ or DNxHD at 185Mb/s or 220Mb/s are good choices, but these often wouldn't be practical as shooting codecs, eating through memory cards at over 2Gb per minute. It should also be remembered that these are "I" frame only codecs, so they are not as efficient as long GoP codecs. From my point of view, I believe that to get the equivalent of 8 bit MPEG-2 at 50Mb/s you would need a 10 bit I frame codec running at over 160Mb/s. How do I work that out? Well, if we consider that MPEG-2 long GoP is 2.5x more efficient than I frame only, we get to 125Mb/s (50 x 2.5). Next we add the required 30% overhead for 10 bit (125 x 1.3), which gives 162.5Mb/s. This assumes the minimum long GoP efficiency of x2.5; very often the long GoP advantage is closer to x3.
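
The same sums in code, just reproducing the rule-of-thumb figures above:

```python
base = 50              # Mb/s: 8 bit MPEG-2 long GoP
long_gop_gain = 2.5    # assumed long GoP vs I frame efficiency (2.5-3x)
ten_bit_overhead = 1.3 # assumed extra bit rate needed for 10 bit

i_frame_equivalent = base * long_gop_gain     # 125 Mb/s
print(i_frame_equivalent * ten_bit_overhead)  # 162.5 Mb/s
```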

So I hope you can see that 8 bit still makes sense for acquisition. In the future, as cameras get less noisy, storage gets cheaper and codecs get better, the situation will change. Also, if you are studio based and can record uncompressed 10 bit, then why not? Do though consider how you are going to store your media in the long term, and the overheads needed to throw large files over networks, or even the extra time it takes to copy big files compared to small ones.