Israeli AI company D-ID, which provided technology for projects like Deep Nostalgia, is launching a new platform where users can upload a single image and text to generate video. With the new site, called Creative Reality Studio, the company is targeting sectors like corporate training and education, internal and external corporate communication, product marketing and sales.
The platform is pretty simple to use: Users can upload an image of a presenter or select one from the pre-created presenters to start the video creation process. Paid users can access premium presenters who are more “expressive” as they have better facial expressions and hand movements than the default ones. After that, users can either type the text from a script or simply upload an audio clip of someone’s speech. Users can then select a language (the platform supports 119 languages), voice and styles like cheerful, sad, excited and friendly.
The company’s AI-based algorithms then generate a video based on these parameters, and users can distribute it anywhere. The firm claims that generating a clip takes only half as long as the clip’s duration, but in our tests, it took a couple of minutes to generate a one-minute video. This could change depending on the type of presenter and the language you select.
“The COVID-19 pandemic has accelerated needs for digital content across the globe. A big problem for organizations is the creation of educational content. Reading documents and going through presentations could be dry and boring. Plus, they have to spend thousands of dollars to hire actors and create educational videos. So we are using our AI to create presenters and tutors to reenact humans and make the content more engaging and effective,” Gil Perry, D-ID’s CEO, told TechCrunch in an interview.
Perry pointed out many use cases for this technology, ranging from a CEO’s multilingual message to employees to personalized wishes for an organization’s users.
D-ID launched the studio for testing in mid-August to iron out bugs before the public launch. And while its main focus is to cater to companies of all sizes, the company is seeing a lot of interest from creators on the platform.
The creation of offensive deepfake videos is a risk. That’s why the Israeli firm has put guardrails in place, such as filtering out swear words and racist remarks and using image recognition to prevent the use of famous people’s faces. It uses the Microsoft Azure text moderation API to weed out sexual remarks and offensive language in video scripts. D-ID said the platform’s usage terms prohibit users from creating political videos. If any of these rules are breached, the company can suspend the violator’s account and remove their video from the library.
D-ID raised $25 million in a Series B round led by Macquarie Capital back in March, bringing its total raised to date to $47 million. Until now, the company had relied on others using its API to create content (Deep Nostalgia is a prime example), with clients like Mondelez, Warner Bros. and India-based short video app Josh. Now, the company is expanding its money-making products by launching a PowerPoint plug-in along with this self-serve platform. The plug-in adds an interactive presenter to the deck, so users don’t have to simply read out slides. They can choose between different avatars, voices and languages, just like on the self-serve platform. But there’s no provision to have a custom presenter at the moment.
At launch, users will be able to sign up for a free 14-day trial account and create up to five minutes of AI-generated 720p video. After that, they can pay $49 a month for access to 15 minutes of full HD AI-generated video, the PowerPoint plug-in and email support.
Users can also upload their own audio clips for voice cloning. Plus, the company is working on a tool to let users upload their own footage to train the AI to be more expressive so it can better imitate the person in the video. All these features will be limited to the company’s enterprise tier.
While the company faces competition from the likes of Rephrase.ai and Soul Machines in the AI-generated video space, it says hardly any other companies claim to generate high-quality videos from a single image.
Perry said that D-ID is not aiming to limit itself to corporate training, communication and marketing videos. It also has ambitions of facilitating real-time video call translation and clone presenters, which make an avatar appear on video in your place while you dictate the audio.
The company is also looking to become a key player in web3/metaverse development. “Given that we have expertise in generating videos from a single image, we are thinking about ways to create digital avatars for the metaverse,” Perry said.