Humphrey Chen on the Future of Video Consumption in the Workplace

May 20, 2021

Artificial intelligence and machine learning give us the ability to overcome the challenge of quickly finding the key moments in video

COVID-19 significantly disrupted daily life as we knew it, including how much video people consume and how they interact with it. However, it's hard to say that the recent spike in video consumption is exclusively pandemic-related. One could argue that we were already trending in this direction.

In 2019, before the lockdowns started, people were spending an average of around 84 minutes a day watching videos, up from 67 minutes a day in 2018. In 2020, the average jumped to around 100 minutes per day, and it is expected to keep growing in 2021 and beyond. Being stuck at home and unable to socialize in person surely contributed to this rise: the top streaming services (Netflix, Hulu, Amazon Prime, etc.) finished 2020 with more than a 50% increase in U.S. subscribers. However, the increase in video consumption can also be attributed to the shift in our work habits.

The global pandemic and the corresponding remote work movement have drastically accelerated the transition of enterprise video conferencing from a "nice-to-have" to a necessity. According to a Zoom blog post, its platform skyrocketed to 300 million daily meeting participants in April 2020, up from 10 million in December 2019. Video content is central to everything enterprises are doing today, from meetings and team-building happy hours to virtual events, training sessions, and more.

Evolution of Video Consumption

As a result of this shift in our video consumption habits, we are collectively experiencing "video fatigue," and our preferences for consuming content, especially long-form video, are changing yet again. So much content is being consumed both personally (DIY home improvement "how-to" videos, Twitch video game streams, YouTube unboxings, Facebook and Instagram Live feeds, etc.) and professionally (recorded video meetings, internal training videos, virtual event keynotes, webinars, etc.) that we have reached a breaking point. So, how will our preferences continue to change?

The rise in TikTok's popularity gives us an indication of what the future of video consumption will look like. Smaller, bite-sized chunks of video are successful, especially with the younger generations. Short clips not only let viewers process information more quickly but also give that information greater resonance and "sticking power." Shorter-form content also lets viewers watch at their convenience, such as while waiting in the checkout line at the supermarket or filling up the gas tank.

The same principles can be applied to the work environment. Remote work showed us that home and parenting responsibilities can easily disrupt our focus, both during live video conferences and when watching recorded content, and constantly pausing and restarting a recording hurts information retention. Weekdays are stacked with so many video meetings that the last thing we want to do is watch more videos. In fact, video conferencing has become so stressful that Citigroup is rolling out "Zoom-free Fridays" in an effort to combat our collective video fatigue.

It used to be that video engagement was adequately measured by how long someone spent watching a video. That is no longer an ideal metric, because our time is at a premium: someone who watches a video from start to finish may still wish they could have found the information they needed much more quickly. Instead, the focus should be on how quickly we can locate the most relevant topics in a video, extract the information, and apply it to our tasks. In other words, spend more time doing and less time watching.

New Technologies Will Impact the Way We Engage with Video Content

Artificial intelligence and machine learning give us the ability to overcome the challenge of quickly finding the key moments in video by breaking down audio and visual cues, making it easier to index, search, and recall specific moments. However, one size does not fit all. What counts as an important moment is relative; different viewers value different moments, and an AI should understand these nuances and distinctions. Training a machine to understand video is a deeply complex process that relies on orchestrating many different triggers, a task that remains difficult even with current advances in AI.
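To make "index, search, and recall" a little more concrete, here is a minimal Python sketch, assuming a transcript that has already been split into timestamped segments. The `Segment` format and the helper names are hypothetical and do not correspond to any vendor's product or API. The sketch builds a simple inverted index from words to segments, so a keyword query returns the timestamps where those words were spoken.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment and its start time in seconds (hypothetical format)."""
    start: float
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments: list[Segment]) -> dict[str, set[int]]:
    """Map each word to the indices of the segments that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for i, seg in enumerate(segments):
        for word in tokenize(seg.text):
            index[word].add(i)
    return index


def search(query: str, segments: list[Segment], index: dict[str, set[int]]) -> list[float]:
    """Return the start times of segments containing every query word, earliest first."""
    words = tokenize(query)
    if not words:
        return []
    # .get() avoids inserting empty entries into the defaultdict for unknown words.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(segments[i].start for i in hits)


# Hypothetical meeting transcript.
segments = [
    Segment(0.0, "Welcome to the quarterly planning meeting"),
    Segment(42.5, "Next, the budget review for Q3"),
    Segment(130.0, "Questions about the budget timeline?"),
]
idx = build_index(segments)
print(search("budget", segments, idx))         # [42.5, 130.0]
print(search("budget review", segments, idx))  # [42.5]
```

An inverted index is used here only because it is the simplest way to jump from a word to a moment. Keyword lookup on its own cannot capture context or what a particular viewer considers important; real systems would layer speech recognition, speaker separation, and visual analysis on top of something like this.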

Many factors must be taken into consideration to surface important video moments. Machine learning analyzes both video and audio elements, such as topics, speakers, intonation and talk time, body language, animations, and visual aids, to identify the important moments in video content. Today, AI can readily analyze keywords and phrases; however, that alone is too broad and doesn't help define specific actions or context. Those words and phrases carry additional variables, such as terminology and the distinction between multiple voices (for example, a presentation hosted by several people with an array of accents). Taking it a step further, for AI to fully understand the importance of a moment, it must also take into account the responses or reactions to it. For example, if one of the presentation hosts makes a joke that engages the audience and makes them laugh, the AI should flag that interaction as important. Outside of AI and ML, ultra-high-definition 4K video streaming, 5G wireless networks, and longer battery life will also continue to shape the way we engage with and consume video.
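One way to picture how several signals (keyword matches, speaker changes, audience reactions like laughter, and visual aids) can combine, and why "important" depends on the viewer, is a weighted score per moment. The Python sketch below is a deliberately simple stand-in, not how any particular product works: it assumes each signal has already been detected and normalized to the range 0 to 1, and the signal names, weights, and viewer profiles are all hypothetical. A production system would typically learn such signals and their relative importance with trained models rather than hand-set weights.

```python
from dataclasses import dataclass


@dataclass
class Moment:
    """Hypothetical per-moment signals, each assumed to be normalized to 0..1."""
    start: float              # seconds into the video
    keyword_match: float      # overlap with topics the viewer cares about
    speaker_change: float     # how strongly a new speaker takes over here
    audience_reaction: float  # e.g. detected laughter or applause
    visual_aid: float         # e.g. a new slide or animation appears


def score(m: Moment, weights: dict[str, float]) -> float:
    """Weighted sum of the signals; the weights encode what this viewer values."""
    return (
        weights.get("keyword_match", 0.0) * m.keyword_match
        + weights.get("speaker_change", 0.0) * m.speaker_change
        + weights.get("audience_reaction", 0.0) * m.audience_reaction
        + weights.get("visual_aid", 0.0) * m.visual_aid
    )


def top_moments(moments: list[Moment], weights: dict[str, float], k: int = 3) -> list[float]:
    """Return the start times of the k highest-scoring moments, best first."""
    ranked = sorted(moments, key=lambda m: score(m, weights), reverse=True)
    return [m.start for m in ranked[:k]]


moments = [
    Moment(15.0, keyword_match=0.9, speaker_change=0.0, audience_reaction=0.1, visual_aid=0.2),
    Moment(60.0, keyword_match=0.2, speaker_change=1.0, audience_reaction=0.9, visual_aid=0.0),
    Moment(95.0, keyword_match=0.4, speaker_change=0.0, audience_reaction=0.0, visual_aid=1.0),
]

# Two hypothetical viewers with different priorities get different highlights.
analyst = {"keyword_match": 1.0, "visual_aid": 0.5}
event_host = {"audience_reaction": 1.0, "speaker_change": 0.3}
print(top_moments(moments, analyst, k=1))     # [15.0]
print(top_moments(moments, event_host, k=1))  # [60.0]
```

The same three moments rank differently for the two profiles: the analyst's top pick is the topic-heavy moment at 15 seconds, while the event host's is the moment at 60 seconds where a new speaker draws a laugh.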

The rise of video consumption signals that enterprises, content creators, and marketers who rely on this medium need to respond to this overload to operate effectively. Video may be a commodity, but people's time to watch it is not. Companies will be forced to make video simpler to digest, especially long-form content, or risk being left behind in the battle for people's attention.

Source: Streaming Media