Synchronizing advanced audio and video features in mobile applications is one of the biggest challenges, especially in a cross-platform environment like React Native. Developers often rely on an ecosystem of off-the-shelf libraries, but what happens when these tools create more problems than they solve?
We faced this exact challenge while working on an innovative application for actors. The project required simultaneous video recording, playback of dialogue tracks, and speech recognition. This story is a case study of how the limitations of popular libraries forced us to go down to the native layer to deliver uncompromising quality and reliability.
The client's primary problem was seemingly simple but critical to the application's usability. Actors recording "self-tapes" needed to hear dialogue lines read by the app while the camera was rolling. However, this created chaos in the volume levels: the moment recording started, dialogue playback dropped dramatically, leaving levels wildly inconsistent across the app.
Initially, we considered two approaches: a simple hardcoding of volume levels as a temporary fix, or the much more difficult, long-term solution of audio normalization, either in real time or in post-processing.
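For context, the post-processing variant boils down to finding the loudest sample in a recording and scaling everything to a target level. Here is a minimal Swift sketch of peak normalization on an AVAudioPCMBuffer; the normalize function and the 0.9 target are illustrative choices, not code from the actual app:

```swift
import AVFoundation

/// Minimal sketch: scale a PCM buffer so its loudest sample hits `targetPeak`.
/// Hypothetical helper for illustration, not production code.
func normalize(buffer: AVAudioPCMBuffer, targetPeak: Float = 0.9) {
    guard let channels = buffer.floatChannelData else { return }
    let frames = Int(buffer.frameLength)
    let channelCount = Int(buffer.format.channelCount)

    // Find the peak absolute sample value across all channels.
    var peak: Float = 0
    for ch in 0..<channelCount {
        for i in 0..<frames {
            peak = max(peak, abs(channels[ch][i]))
        }
    }
    guard peak > 0 else { return } // silent buffer, nothing to do

    // Apply a single gain factor so the peak lands on the target level.
    let gain = targetPeak / peak
    for ch in 0..<channelCount {
        for i in 0..<frames {
            channels[ch][i] *= gain
        }
    }
}
```

The real-time variant follows the same idea, but runs inside an audio tap with a running peak estimate instead of a full pass over the file.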
Our investigation revealed that the problem wasn't in the application's logic, but in the fundamental limitations and lack of cooperation between the three key libraries the app relied on.
- expo-av: This library suffered from slow audio loading and had settings that were either unavailable or non-functional. Worse, it drastically lowered the volume of all other playing sounds while in recording mode, which was the source of our main problem.
- react-native-voice: It was characterized by an inconvenient API and poor error handling, which complicated implementation and debugging.
- react-native-vision-camera: Despite its video handling capabilities, it lacked built-in audio normalization mechanisms and also had issues with preview orientation, which thankfully had a community-provided patch.

The key conclusion was this: each of these packages is a separate entity. Each one managed the native audio session on its own, overwriting the others' configurations. Trying to find a "magic combination" of settings that would work for all of them was like hoping for a miracle. We had no cohesive control over the audio system as a whole.
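To make the conflict concrete: on iOS there is exactly one AVAudioSession per app, and whichever library calls setCategory last wins. What we wanted was a single, explicit configuration that records and plays simultaneously without ducking other audio. A sketch of that idea follows; the exact category and options shown are illustrative, not our production values:

```swift
import AVFoundation

// One place in the app owns the session, instead of three libraries
// silently reconfiguring it behind each other's backs.
func configureSharedAudioSession() throws {
    let session = AVAudioSession.sharedInstance()

    // .playAndRecord lets dialogue playback and the camera's microphone
    // input run at the same time; omitting the .duckOthers option keeps
    // playback at full volume while recording is active.
    try session.setCategory(
        .playAndRecord,
        mode: .videoRecording,
        options: [.defaultToSpeaker, .allowBluetoothA2DP]
    )
    try session.setActive(true)
}
```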
To regain full freedom in managing audio settings, we decided to build our own native module for iOS using AVFoundation. This allowed us to bypass the libraries' limitations and create a solution perfectly tailored to the application's needs.
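The module itself is a standard React Native bridge: an Objective-C macro exposes a Swift class to JavaScript. A simplified sketch of how such a module can be wired up (the AudioSessionManager name and its method are placeholders for illustration):

```swift
import AVFoundation
import Foundation

// AudioSessionManager.swift
// Requires React/RCTBridgeModule.h in the bridging header, and is
// exposed to JavaScript via a companion Objective-C file, e.g.:
//
//   RCT_EXTERN_MODULE(AudioSessionManager, NSObject)
//   RCT_EXTERN_METHOD(prepareForSelfTape:(RCTPromiseResolveBlock)resolve
//                     rejecter:(RCTPromiseRejectBlock)reject)
@objc(AudioSessionManager)
class AudioSessionManager: NSObject {

    // Runs before recording starts, so no library can reconfigure
    // the session after us.
    @objc
    func prepareForSelfTape(
        _ resolve: @escaping RCTPromiseResolveBlock,
        rejecter reject: @escaping RCTPromiseRejectBlock
    ) {
        do {
            let session = AVAudioSession.sharedInstance()
            try session.setCategory(.playAndRecord,
                                    mode: .videoRecording,
                                    options: [.defaultToSpeaker])
            try session.setActive(true)
            resolve(nil)
        } catch {
            reject("audio_session_error", error.localizedDescription, error)
        }
    }

    // Tells React Native this module does not need main-thread setup.
    @objc
    static func requiresMainQueueSetup() -> Bool { false }
}
```

From JavaScript, this reduces to awaiting NativeModules.AudioSessionManager.prepareForSelfTape() before the camera starts.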
Our module's architecture is built on three pillars:
- Cohesive control of the native audio session, instead of letting each library overwrite it.
- Coexistence with Vision Camera's configuration, so recording and dialogue playback no longer fight each other.
- Native audio playback, replacing the slow loading we saw with expo-av.

We also created the basic structure for an Android module to be implemented in the future. A sketch of the playback pillar follows this list.
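On the playback side, one of the simplest wins of going native is that audio can be decoded ahead of time instead of on the play call, which is where expo-av felt slow to us. Here is a sketch of that preloading idea with AVAudioPlayer; the DialoguePlayer wrapper is an illustrative stand-in, not the module's real API:

```swift
import AVFoundation

// Illustrative wrapper: load and decode dialogue lines up front,
// so playback starts instantly when the actor needs the cue.
final class DialoguePlayer {
    private var players: [URL: AVAudioPlayer] = [:]

    // Decode the file and fill the playback buffers ahead of time.
    func preload(_ url: URL) throws {
        let player = try AVAudioPlayer(contentsOf: url)
        _ = player.prepareToPlay()
        players[url] = player
    }

    // Starts with minimal latency because decoding already happened.
    func play(_ url: URL) {
        _ = players[url]?.play()
    }

    func stop(_ url: URL) {
        players[url]?.stop()
        players[url]?.currentTime = 0
    }
}
```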
The best summary of our work's impact is the feedback we received from the client. It confirms not only that we solved the key technical issues but also highlights the value of deep expertise combined with product awareness.
"I met The Widlarz Group when my team was in deep trouble. We needed to ship advanced AI audio features with a unique user interface but lacked the internal expertise. TWG guided us through complex challenges with precision, including audio normalization in react-native-vision-camera, slow sound loading in expo-av, and achieving cohesive control across multiple libraries. Beyond solving these deep technical issues, they seamlessly integrated everything into our app with exceptional UX/UI design. Thanks to their expertise, everything now runs smoothly, and the user experience is significantly better. They are a perfect blend of deep technical expertise and product awareness, and their work made all the difference in delivering a high-quality product."