by James Delhauer
Having grown up on the cusp of the digital revolution, I sometimes step back and marvel at what has become possible in the last twenty years. Limitations have been toppled like empires, and as technology has disseminated to the masses, the definition of “cinematic” has shifted. Gone are the days of practical matte paintings and model-based set extensions. Where a grandiose set piece might once have consisted of a few hundred extras running alongside a series of well-timed special effects, it’s now commonplace to see vast armies numbering in the tens of thousands clashing with one another or alien monsters tossing planets around like dodgeballs. The ante for what we see on screen has truly gone up over the years. But equally important are the developments in what we hear. Though not as obvious as the developments in cinematic visuals, digital audio technology has come just as far, and few innovations demonstrate this better than the rise of 32-bit float audio.
To understand this relatively new technology, some context is needed.
Digital audio is created by taking an analog signal and encoding it as a sequence of numerical samples using a method known as pulse-code modulation (PCM). Each sample represents the amplitude of the signal and individual samples are generated at even intervals so that they can be reassembled to create a facsimile of the original analog sound. A file’s bit depth represents the number of bits of information present in each sample, with larger bit depths resulting in an improved signal-to-noise ratio and dynamic range. In practical terms, this means that a signal captured at a higher bit depth will contain less distortion and can be manipulated to a greater degree than an identical signal captured at a lower bit depth.
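To make the idea of samples and bit depth concrete, here is a minimal sketch in Python. It assumes an idealized 1 kHz sine wave sampled at 48 kHz and a simple round-to-nearest quantizer; the function names are illustrative and do not come from any particular recorder or library.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Measure an ideal sine wave at evenly spaced intervals (values from -1.0 to 1.0)."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

def quantize(sample, bit_depth):
    """Round a sample in -1.0..1.0 to the nearest signed integer code of the given bit depth."""
    max_code = 2 ** (bit_depth - 1) - 1   # 32767 for 16-bit
    min_code = -(2 ** (bit_depth - 1))    # -32768 for 16-bit
    return max(min_code, min(max_code, round(sample * max_code)))

def dequantize(code, bit_depth):
    """Turn an integer code back into an amplitude between roughly -1.0 and 1.0."""
    return code / (2 ** (bit_depth - 1) - 1)

def max_quantization_error(samples, bit_depth):
    """Largest difference between the original samples and their stored versions."""
    return max(abs(s - dequantize(quantize(s, bit_depth), bit_depth))
               for s in samples)

# One full cycle of a 1 kHz tone at a 48 kHz sample rate (assumed values).
samples = sample_sine(freq_hz=1000, sample_rate_hz=48000, num_samples=48)

for bits in (8, 16, 24):
    print(f"{bits}-bit worst-case rounding error: {max_quantization_error(samples, bits):.2e}")
```

The rounding error shrinks as the bit depth grows, which is the improvement in signal-to-noise ratio described above.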
Traditional uncompressed 16-bit audio files (the format used to encode music onto audio CDs) store samples in a sequence, with each sample represented by a 16-digit binary number. The numerical value of each sample represents a voltage level that corresponds to the signal amplitude, resulting in a theoretical dynamic range of 96.3 dB. 24-bit files (the format used most commonly in modern production environments) extend the binary number from 16 digits to 24 digits, resulting in a much greater theoretical dynamic range of 144.5 dB. Although this is only a fifty percent increase in bit depth, each added bit doubles the number of available amplitude values, so a 24-bit sample can take 256 times as many values as a 16-bit one. The step up has often been compared to the leap from standard-definition to high-definition video, with high-resolution audio allowing for far greater manipulation and signal recovery in post production.
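The dynamic range figures quoted for fixed-point audio follow from a standard rule of thumb: the ratio between full scale and a single quantization step, expressed in decibels. The short Python sketch below reproduces them; the function name is illustrative, and the figures are theoretical limits of the format rather than measurements of any real device.

```python
import math

def fixed_point_dynamic_range_db(bit_depth):
    """Ratio of the full range of codes to one quantization step, in decibels."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (16, 24):
    print(f"{bits}-bit: {fixed_point_dynamic_range_db(bits):.1f} dB")
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB

# Every extra bit doubles the number of values a sample can take.
print(2 ** 24 // 2 ** 16)  # 256
```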
However, the jump from 24-bit audio to 32-bit float audio is far greater and more significant for our industry. Both 16-bit and 24-bit formats use what is known as a fixed-point representation, meaning that each sample is stored as a whole integer. The newly emerging 32-bit float file format instead stores each sample as a floating-point number, in the manner of scientific notation, allowing for a far greater range of values than even a fixed 32-bit format would allow. The result is a file format with a theoretical dynamic range of more than 1,500 dB, which could very well be the last necessary increase in audio resolution, as the full range of sound believed to be possible within Earth’s atmosphere is about 210 dB.
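As a rough check on that figure, the Python sketch below compares a hypothetical 32-bit fixed-point format with 32-bit float. It assumes the standard IEEE 754 single-precision layout and measures the ratio between the largest finite value and the smallest normal positive value; published figures vary by a few dB depending on which endpoints are counted.

```python
import math
import struct

# Largest finite and smallest normal positive 32-bit float values,
# decoded directly from their IEEE 754 bit patterns.
FLOAT32_MAX = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
FLOAT32_MIN_NORMAL = struct.unpack(">f", bytes.fromhex("00800000"))[0]

def ratio_db(largest, smallest):
    """Express the ratio between two amplitudes in decibels."""
    return 20 * math.log10(largest / smallest)

fixed32_db = 20 * math.log10(2 ** 32)                   # about 192.7 dB
float32_db = ratio_db(FLOAT32_MAX, FLOAT32_MIN_NORMAL)  # about 1529 dB

print(f"32-bit fixed: {fixed32_db:.1f} dB")
print(f"32-bit float: {float32_db:.0f} dB")
```

The extra range comes from spending 8 of the 32 bits on an exponent that scales the value up or down, while the remaining bits keep roughly 24 bits of precision at whatever level the signal happens to sit.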
As a result, properly recorded 32-bit float files make it possible to recover nearly inaudible data in the signal, as well as to unclip sounds that exceed 0 dBFS (previously regarded as the loudest signal level achievable in a WAV file). What’s more, both can be done within a single file, allowing a whisper and a bomb blast to be successfully captured without a change in audio levels.
The accompanying graphic shows the audio waveform of a line of dialog recorded in 32-bit float format. The sound has clipped and become greatly distorted. The second image is the same file after its gain has been reduced by 26.1 dB, just enough to bring the entire waveform back into range. The distortion has been entirely removed and the dialog sounds crisp and clear. The file, despite significant clipping, is still perfectly usable. Compare this to the third image, in which the same adjustment was made after converting the original 32-bit float file down to a 24-bit fixed format. Even at reduced volume, the uniform wave pattern shows that the distortion is still present. This file would not be usable under any circumstances.
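The comparison described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not a model of any particular recorder: it uses a synthetic sine wave that peaks 26.1 dB over full scale in place of the dialog, and it models only the file format, not the preamps or converters in front of it. The helper names are illustrative.

```python
import math
import struct

def db_to_gain(db):
    """Convert a level change in decibels to a linear multiplier."""
    return 10 ** (db / 20)

def to_float32(x):
    """Round-trip a value through 4 bytes of IEEE 754 float, as a float sample is stored."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_fixed24(x):
    """Store a value as a signed 24-bit integer code; anything past full scale is clamped."""
    max_code = 2 ** 23 - 1
    code = max(-max_code - 1, min(max_code, round(x * max_code)))
    return code / max_code

OVERSHOOT_DB = 26.1  # how far the peaks exceed 0 dBFS, as in the example above
clean = [math.sin(2 * math.pi * n / 64) for n in range(64)]
hot = [db_to_gain(OVERSHOOT_DB) * s for s in clean]

float_take = [to_float32(s) for s in hot]  # values above 1.0 are kept intact
fixed_take = [to_fixed24(s) for s in hot]  # values above 1.0 are flattened

# Pull both takes down by 26.1 dB, as one would in post.
gain = db_to_gain(-OVERSHOOT_DB)
float_restored = [s * gain for s in float_take]
fixed_restored = [s * gain for s in fixed_take]

float_err = max(abs(a - b) for a, b in zip(float_restored, clean))
fixed_err = max(abs(a - b) for a, b in zip(fixed_restored, clean))

print(f"float take, worst deviation from the clean wave: {float_err:.1e}")
print(f"fixed take, worst deviation from the clean wave: {fixed_err:.2f}")
```

The float take comes back to within rounding error of the clean waveform, while the fixed take remains a quieter copy of the same flattened wave, since the information above full scale was never stored.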
For Local 695 Production Mixers, this technology represents a useful tool in their arsenal. More forgiving files offer the ability to “split the difference” when recording scenes in which a single microphone must handle widely varying sound levels, such as recording two performers who are speaking at different volumes off one boom or capturing dialog that will be interrupted by a practical effect. This could mean the difference between a good take and a bad one, with takes that would have been unusable in a 24-bit format being perfectly acceptable today.
However, there are some misconceptions regarding this technology that must be addressed. On a recent film set, a producer asked whether recording 32-bit float files meant that we’d no longer need to be “quiet on set” in order to get good audio. The answer to this is a resounding NO. The files can’t magically distinguish between an actor giving a performance and the idle chatter of a conversation behind the camera. A cellphone dinging at the wrong time can still blow a take. Professional etiquette is still a must. That same producer went on to ask if production mixing would become unnecessary since levels could be adjusted in post production. Again, the answer is absolutely not. This is akin to suggesting that lighting is unnecessary now that cameras can capture high dynamic range images. Sure, an unlit scene can go through a degree of brightening and manipulation in post production, but at exorbitant cost and to the detriment of the final product. An incompetent production mix (or worse, an unmixed production track) would make dailies of little use outside of visual purposes, would hinder our brothers and sisters in Local 700 when they have to stop to adjust audio levels multiple times during each shot, and would extend the re-recording mix period. The relationship between production and post production has always been that post’s life is cheaper and easier when production does its job well, and this technology, impressive though it may be, will not change that.
On a similar note, multiple articles that I read when researching this piece suggested that productions would be able to get away with capturing all of their on-set audio by planting a single microphone to capture an entire scene. That is also patently false. This technology changes how computers process audio signals in digital files, not the physics of how sound carries through the air on set. Someone being picked up by a microphone across the room is never going to sound the same as someone speaking into a dedicated mic that is being boomed directly in front of them or clipped to their lapel. In short, this technology acts as a safety net when capturing sound in difficult environments. It does not erase almost a century’s worth of best practices.
However, looking to the future, this technology will become critical as the world moves into virtual reality production. Spatial sound requires audio files to be manipulated in real time by whatever algorithm determines the listener’s proximity to a supported sound source. When in a virtual environment, listeners can be exposed to just as many disparate sound sources as they can in the real world, and having the ability to work across the entire scope of human hearing with any given audio source will be a necessity when crafting an immersive virtual soundscape.
At this time, only a handful of production recorders support the capture of 32-bit float audio files, namely the Sound Devices MixPre-3 II, MixPre-6 II, MixPre-10 II & A20 Mini; the Zoom F2 & F6; and the Tentacle Track E. More are set to hit the market in 2022, and their prevalence will only continue to grow as the global chip shortage comes to an end. In the meantime, Local 695 mixers interested in investing in 32-bit float recorders are encouraged to download sample files and explore the benefits in a hands-on manner so that they are ready to work with producers whose productions might benefit from them.
As someone who remembers listening to the muffled sounds of VHS tapes when watching his favorite movies and playing video games with 8-bit audio, the distance we’ve traveled is truly staggering. While we may not find ourselves in need of an audio resolution greater than 32-bit until we figure out how to make movies on other planets, I find myself looking to the future with wonder and curiosity. How will the stories of tomorrow sound and, more significantly for us, what sort of tools will our brothers, sisters, and kin in Local 695 use to capture them?