TechTalk Daily

The Next Thing to Look For in AI Vendors: Interoperation

By Rob Enderle for TDWI

When products first come to market, it isn’t unusual for them to play badly with others, both because the development process is proprietary and because a new product has to be kept secret until launch to avoid premature competition. Once launched, however, it becomes critical that the product pivot to being “open” so that it can inform others about its content, assure trust, and help other supportive applications interoperate with it.

This opportunity and goal were highlighted by the recent announcement of ElevenLabs’ AI-powered text-to-speech offering, which gave ChatGPT videos sound. This video showcases the result, which is impressive but not as impressive as it might have been had the video and sound been created simultaneously rather than sequentially.

Let me explain.

Why Integration During Creation Is Important

If you look at this ChatGPT/ElevenLabs result, the issue is that ElevenLabs’ application doesn’t appear to have access to the code behind the ChatGPT video, so it is reacting only to the result, not to the underlying code that defines that result. Rather than knowing things such as the speed of the vehicles, the internal workings that provide the realistic physics, and any annotations in the code that might better inform how the sound should be created, it can work only from the resulting image.

This reduces the potential accuracy of the result and needlessly increases the time needed to create it. If the two programs were integrated, the user could define both the visuals and the sound in the prompt and then let the various AIs craft the complete result, which should be both far more believable and quicker to produce.

Initially, most of what we are seeing is from simple prompts, so the lack of information passthrough isn’t significant. That’s why we aren’t seeing the problem today. However, as we advance the use of AI, we will begin feeding it complete scripts with direction to create better commercial-grade offerings. Those scripts include vocal prompts. If you were to start with book content, for example, the emotional context in the text isn’t conveyed in the video, so a tool working only from the video will need far more iterations and time than would otherwise be required.

In this example, both ChatGPT and ElevenLabs are parts of a better solution, but only if they function as partners during the creation of the result will that result be optimized. Otherwise, whoever is going second is getting a degraded set of instructions that will result in a degraded outcome.

Let me give you an example. Let’s say you direct the AI to create a walking scene between two people with dialog and action in the directions -- for instance, by providing a paragraph from the book you are turning into a video. ChatGPT will see the entire paragraph, but ElevenLabs sees only the video, not the dialog, which will then need to be added in post-production. If both applications get access to the source, that step is saved, and any additional context ChatGPT has left out of the video still reaches ElevenLabs, which could use it for vocal inference.
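The difference between the two data flows in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names, the sample dialog line, and the "anxious" tone are all hypothetical, and nothing here reflects how ChatGPT or ElevenLabs actually work or any API they expose. It simply contrasts what an audio stage receives when it goes second with what it receives when the source is shared.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SceneSource:
    """Hypothetical shared source: the book paragraph plus any direction."""
    description: str
    dialog: Tuple[str, ...] = ()
    tone: Optional[str] = None  # e.g. direction taken from a prior paragraph


@dataclass(frozen=True)
class RenderedVideo:
    """Stand-in for the video stage's output: frames only, no source attached."""
    frames: str


def render_video(source: SceneSource) -> RenderedVideo:
    # The video stage sees the entire source but passes on only the picture.
    return RenderedVideo(frames=f"<frames for: {source.description}>")


def audio_inputs_sequential(video: RenderedVideo) -> dict:
    # Going second: the audio stage can react only to the finished video.
    return {"frames": video.frames, "dialog": (), "tone": None}


def audio_inputs_shared(video: RenderedVideo, source: SceneSource) -> dict:
    # Integrated: the audio stage also receives the original source.
    return {"frames": video.frames, "dialog": source.dialog, "tone": source.tone}


def missing_for_audio(inputs: dict) -> list:
    """List what would have to be added back in post-production."""
    return [key for key in ("dialog", "tone") if not inputs[key]]


# Illustrative scene only; the dialog and tone are made up for this sketch.
source = SceneSource(
    description="two people walking",
    dialog=("We should go back.",),
    tone="anxious",
)
video = render_video(source)
sequential_gaps = missing_for_audio(audio_inputs_sequential(video))
shared_gaps = missing_for_audio(audio_inputs_shared(video, source))
print(sequential_gaps)  # ['dialog', 'tone']
print(shared_gaps)      # []
```

In the sequential case, the dialog and tone never reach the second stage and have to be restored by hand. In the shared case, nothing is lost between stages.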

Finally, facial expressions don’t always convey the energy in the words being said -- for instance, if the speaker is walking away or not always looking at the camera. Without seeing the original direction, which might have been in a book’s prior paragraph, ElevenLabs may not have the information it needs to get the tone right initially, which would also have to be addressed in post-production, adding time and effort and potentially introducing avoidable errors…

To learn more about how isolated AI vendors might fall behind and more, read the full article: The Next Thing to Look For in AI Vendors: Interoperation


About the Author

Rob Enderle is the president and principal analyst at the Enderle Group, where he provides regional and global companies with guidance on how to create a credible dialogue with the market, target customer needs, create new business opportunities, anticipate technology changes, select vendors and products, and practice zero-dollar marketing. You can reach the author via email.
