One day on X (formerly Twitter), I came across a video that had been translated into another language with an application called HeyGen. When you upload a video of yourself speaking in Japanese to HeyGen, it is translated into the language you specify within a few minutes.
I was impressed that the technology can render speech in another language in a voice that sounds just like the speaker's own, and the tool also appears to move the speaker's mouth in the output video to match the translated language.
The future of simultaneous translation of multiple languages that we have dreamed of may be just around the corner. In this issue of "Editor-in-Chief Focus," I would like to take a deeper look at the currently popular "HeyGen."
◉ "Editor-in-Chief Focus"
The editor-in-chief of "Iolite," a business magazine covering next-generation technology, finance, and economics, follows hot topics and the forefront of the latest news.
—AI-powered video generation platform "HeyGen"
"HeyGen", which was officially released from beta in July 2022, is an AI-powered video generation platform that can generate avatars and create videos using AI.
A web-based application available in a browser, HeyGen has been growing at a rate of 50% per month since its launch. It is one of the most notable services among AI-related services.
Its main offerings are Talking Photo, a function that redraws the speaker's mouth (lip sync), and a service that automatically translates text you enter and has the selected avatar speak it. The number of users is growing day by day, and as of August 2023 the site had 3.8 million monthly visitors.
It also provides video production services for corporations, and HeyGen's services are used by world-famous companies such as Accenture, Amazon, and NVIDIA, as well as educational institutions such as Columbia University.
—Development team
But how did an application released in 2022 manage to grow its user base so explosively in such a short time? In fact, CEO Joshua Xu previously worked on AI development at Snapchat, which once ranked number one among the social networks chosen by American teenagers.
In addition, HeyGen was selected as one of the best artificial intelligence software products of 2023 by the software review platform "Tekpon," which suggests that the team has solid technical capabilities and extensive knowledge of AI.
Regarding Tekpon's selection, Joshua Xu said, "The results reflect the team's relentless efforts, and being listed on Tekpon has further motivated us to continue innovating and always provide excellent value to our customers."
—Characteristics of HeyGen
So, what are the characteristics of HeyGen, whose user base is growing so rapidly?
[Features]
- Over 100 avatars to choose from
- A wide variety of video templates
- Videos can be created in over 40 languages
- Efficient creative production using ChatGPT and Canva
Over 100 avatars to choose from
You can choose from a variety of nationalities, genders, and costumes, and you can even select an illustration of Shakespeare or the Mona Lisa to read the text you enter. For an additional fee, you can also use the face from an uploaded photo as an avatar, creating your own custom avatar.
A wide variety of video templates
Video templates are divided into categories such as advertising, social media, news, and education, and you can choose your favorite design from over 100 types. Another plus is that some templates are designed for vertical video, which is in high demand on social media. There is no time limit for downloading the videos you create, and there is also a function for sharing videos with other users.
You can create videos in over 40 languages
You can choose from over 40 languages, and filters let you narrow down the voices by gender, age, and tone, so you can pick the one that best suits your video concept. You can also record your own voice and create a voice clone, just as you can create a custom avatar.
You can pair your voice with an avatar of your choice, or create a digital clone by pairing your voice with a custom avatar made from a photo of your face.
There is also a service that translates videos of about five minutes into multiple languages with a single click. At the moment it takes several minutes to output a video, but in the future, simultaneous multilingual translation may become possible.
You can create content efficiently using ChatGPT and Canva
You can also work more efficiently by combining HeyGen with existing services. You can use ChatGPT to write the script and have HeyGen read it aloud, or you can use "Canva," a free and easy-to-use design service, to create designs that feature HeyGen's AI avatars. Depending on which tools you combine it with, HeyGen can make creative production very efficient.
—Side business ideas: whether or not it can be used commercially
To get straight to the point, HeyGen can be used commercially. However, the HeyGen logo is displayed on videos created and downloaded with the free plan. As I will explain later, the logo appears to be removed once you sign up for the Creator Plan or higher.
The following are some ways to use the features available at the time of writing (October 2023) for business purposes.
[Usage ideas]
- Use in educational content
- Create content using your own or your company's IP
- Personalized video distribution
Use in educational content
According to a report by Forrester Research, a US research company, a one-minute video conveys as much information as 1.8 million words of text, or about 3,600 typical web pages. The 1.8 million figure is counted in English words, so the equivalent in Japanese can be assumed to exceed 3 million characters.
This report was published in 2014, so the amount of information on a modern web page has likely increased, but in any case there is a large gap between the amount of information obtained from text and the amount obtained from video. That gap is what makes video a good fit for educational content.
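The figures quoted above can be sanity-checked with a little arithmetic. The sketch below uses only the numbers cited in the text (1.8 million words, 3,600 pages, 3 million Japanese characters). The variable names are my own, and the per-page and per-word ratios are simply what those numbers imply, not values taken from the report itself.

```python
# Figures quoted in the text (attributed to a Forrester Research report).
WORDS_PER_ONE_MINUTE_VIDEO = 1_800_000  # English words
EQUIVALENT_WEB_PAGES = 3_600            # "typical" web pages
JAPANESE_CHARACTERS = 3_000_000         # lower bound assumed in the text

# Implied length of a "typical" web page, in English words.
words_per_page = WORDS_PER_ONE_MINUTE_VIDEO / EQUIVALENT_WEB_PAGES

# Implied minimum number of Japanese characters per English word.
chars_per_word = JAPANESE_CHARACTERS / WORDS_PER_ONE_MINUTE_VIDEO

print(f"{words_per_page:.0f} words per page")
print(f"{chars_per_word:.2f} Japanese characters per English word")
```

In other words, the comparison assumes a web page of about 500 English words, and the Japanese estimate assumes at least about 1.67 characters for every English word.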
Create content using your own or your company's IP
As mentioned above, HeyGen lets you upload videos of yourself speaking, or upload images and use them as avatars. It can also make Shakespeare or the Mona Lisa speak in multiple languages, so you could use your own IP to mass-produce videos like a VTuber, or even give lectures remotely through your own digital clone.
Personalized video distribution
You can change the avatar to match where a video will be distributed and who is expected to watch it, and so distribute personalized content. For example, you could use a lively avatar in an advertisement for an energy drink, and an avatar in the age group where eyesight begins to deteriorate in an advertisement for glasses, so that viewers can picture themselves using the product.
Looking further ahead, services will likely appear that let you change the presenter of video content you watch at a fixed time every day, such as the news, to one of your liking.
—Pricing Plans
The Talking Photo function is available starting with the Creator Plan, and the Business Plan appears to include API access and priority video processing.
—Competing Services (HeyGen vs D-ID)
There are several other applications that make images and avatars talk, and one that is often named as a competitor is "D-ID". I found a video that compares output generated by HeyGen and D-ID, so I am including it here along with its source.
Reference: GSaab Graphics (AI avatar), "HeyGen vs D - ID | Talkative AI vs. DID: Talking Photos"
To my eye, the video generated with D-ID showed slight distortion around the center of the avatar, and the video generated with HeyGen looked more natural.