The Obsession Behind a 0.85-Second Response: Why Noise-Defying AI "HITO" Chooses Hardware-Agnostic Software Deployment

2026/07/01 18:34 (Updated 2026/07/01 19:59)
Editors of Iolite
Written by Noriaki Yagi

The Story Behind "HITO" and the Digital Transformation of Physical Spaces

——While the digitization of websites and internal corporate systems has advanced, "DX (Digital Transformation) in physical spaces"—such as in brick-and-mortar stores or reception areas—has largely been limited to the installation of impersonal touch-panel kiosks; I feel there are still significant hurdles to overcome. How do you envision "HITO" breaking through this stagnation in physical-space digitization, and what role do you see it playing as next-generation social infrastructure?

Takayuki Moriya (hereinafter "Moriya"): DX is progressing across society as a whole, but I believe that "DX in physical spaces" is the ultimate challenge. We have long been creating virtual humans, and we have a strong desire to fuse them with AI to achieve a level of realism that closely mimics actual human beings.

There are three main reasons why we are focusing on solving corporate challenges in physical spaces, not just online: handling inbound tourism, addressing population decline (labor shortages), and standardizing the quality of customer service.

For instance, it can resolve issues that depend heavily on specific individuals—such as inconsistencies in the information provided depending on the staff member handling the interaction. Above all, we view DX in physical spaces as an essential solution to the serious problems Japan currently faces regarding inbound tourism and labor shortages.

——So, the launch coincided perfectly with the alignment of Aww’s strengths and these societal challenges. What kind of response have you received since issuing the press release?

Moriya: Actually, even before launching this packaged service, we had already established a track record of solving problems for automotive and housing manufacturers using interactive solutions.

However, those were previously custom-development projects. To offer the service to a wider range of companies, we rapidly developed an in-house SaaS model. That is how we arrived at the launch of "HITO."

Fortunately, we have already received inquiries from various industries, and full-scale implementations are underway. For companies that are unsure how to use the service, we also offer reasonably priced trial options, such as Proof of Concept (PoC) programs, so please do not hesitate to contact us if you are interested.

An Uncompromising Commitment to Ultra-Low Latency

——What were the challenges in achieving an ultra-low-latency response—as fast as 0.85 seconds from the moment of speaking—while simultaneously rendering full 3DCG in real-time?

Moriya: We prioritized speed above all else. In a conversational experience, slow response times are the biggest source of frustration for users. Initially, we experimented with UI and UX tweaks to mitigate the perceived latency, but ultimately, we decided to focus relentlessly on raw speed itself.

Under the hood, we combine multiple AI models—including ASR (Automatic Speech Recognition), LLMs (Large Language Models), and TTS (Text-to-Speech)—and we had to shave off time in 0.1-second increments across every single process. When researching other services, we encountered the classic trade-off where increasing speed often compromises accuracy; deciding exactly what to prioritize and what to sacrifice was a major hurdle.
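The idea of shaving time off every stage can be pictured as a latency budget. The following is a minimal sketch, not Aww's implementation: the stage names and per-stage timings are hypothetical values chosen for illustration, and only the 0.85-second target comes from the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    latency_s: float  # hypothetical measured latency for this stage


def total_latency(stages):
    """Sum the latency of every stage in a sequential pipeline."""
    return sum(s.latency_s for s in stages)


def over_budget(stages, budget_s):
    """Seconds that still have to be shaved off (0.0 if within budget)."""
    return max(0.0, round(total_latency(stages) - budget_s, 3))


def biggest_contributors(stages):
    """Stages ordered from slowest to fastest, i.e. where to optimize first."""
    return sorted(stages, key=lambda s: s.latency_s, reverse=True)


# Illustrative numbers only; these are NOT measurements of HITO.
PIPELINE = [
    Stage("end_of_speech_detection", 0.20),
    Stage("asr", 0.15),
    Stage("llm_first_token", 0.30),
    Stage("tts_first_audio", 0.25),
    Stage("render_and_playback", 0.05),
]
BUDGET_S = 0.85  # response target quoted in the interview

if __name__ == "__main__":
    print("total:", round(total_latency(PIPELINE), 3))
    print("still to cut:", over_budget(PIPELINE, BUDGET_S))
    print("optimize first:", biggest_contributors(PIPELINE)[0].name)
```

With these made-up numbers the pipeline is one 0.1-second step over the target, which is the kind of gap the interview describes closing stage by stage.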

——What kind of testing did you conduct specifically?

Moriya: A service might look accurate on paper, but whether it actually works in a real-world setting is a different story. That’s why we physically tested over 120 ASR services and dozens of TTS services. We repeatedly verified which ones aligned with the conversational experience we envisioned, navigating cases where high speed came at the cost of accuracy, and vice versa.
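One simple way to reason about the speed-versus-accuracy trade-off when comparing many services is to keep only the candidates that no other candidate beats on both axes (a Pareto front). The sketch below assumes this selection method for illustration; the interview does not say how Aww ranked services, and the candidate names and figures are invented.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    latency_s: float   # lower is better
    error_rate: float  # lower is better


def dominates(a, b):
    """True if `a` is at least as good as `b` on both axes and better on one."""
    return (
        a.latency_s <= b.latency_s
        and a.error_rate <= b.error_rate
        and (a.latency_s < b.latency_s or a.error_rate < b.error_rate)
    )


def pareto_front(candidates):
    """Candidates that are not dominated by any other, fastest first."""
    front = [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
    return sorted(front, key=lambda c: c.latency_s)


# Hypothetical services and measurements, for illustration only.
CANDIDATES = [
    Candidate("service_a", 0.10, 0.12),
    Candidate("service_b", 0.20, 0.06),
    Candidate("service_c", 0.25, 0.09),  # slower AND less accurate than b
    Candidate("service_d", 0.40, 0.04),
]

if __name__ == "__main__":
    for c in pareto_front(CANDIDATES):
        print(c.name, c.latency_s, c.error_rate)
```

The surviving candidates still have to be judged against the intended conversational experience, which is the hands-on verification described above.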

One particularly difficult challenge was processing audio in noisy environments—a unique aspect of physical spaces. In settings filled with ambient noise or the voices of other people, it was crucial to accurately determine "end-of-speech"—recognizing exactly when the user had finished speaking. To address this, we trained our models using entirely proprietary data, enabling them to respond quickly and accurately even in noisy environments.
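To make the end-of-speech problem concrete, here is a deliberately simple energy-based detector with an adaptive noise floor. It is a stand-in for illustration only: the interview says Aww uses models trained on proprietary data, and every parameter name and default below (`frame_ms`, `hangover_ms`, `snr_ratio`, `noise_alpha`) is an assumption.

```python
def detect_end_of_speech(
    frame_energies,
    frame_ms=20,       # assumed duration of one audio frame
    hangover_ms=300,   # assumed silence required before declaring the end
    snr_ratio=3.0,     # a frame counts as speech if energy > floor * ratio
    noise_alpha=0.05,  # smoothing factor for the noise-floor estimate
):
    """Return the index of the frame where end-of-speech is declared, or None.

    The noise floor is tracked only on non-speech frames, so a steadily
    noisy room raises the threshold instead of being mistaken for speech.
    """
    if not frame_energies:
        return None

    noise_floor = frame_energies[0]
    in_speech = False
    silent_ms = 0

    for i, energy in enumerate(frame_energies):
        if energy > noise_floor * snr_ratio:
            in_speech = True
            silent_ms = 0  # speech resumed, so reset the silence timer
            continue

        # Non-speech frame: update the running noise-floor estimate.
        noise_floor = (1 - noise_alpha) * noise_floor + noise_alpha * energy
        if in_speech:
            silent_ms += frame_ms
            if silent_ms >= hangover_ms:
                return i
    return None


if __name__ == "__main__":
    # 10 noise frames, 20 speech frames, then noise again.
    energies = [1.0] * 10 + [10.0] * 20 + [1.0] * 20
    print(detect_end_of_speech(energies))
```

The `hangover_ms` setting shows the trade-off directly: a shorter value responds faster but risks cutting the speaker off during a pause, while a longer value adds latency to every turn. A fixed threshold like this also cannot separate the user from other nearby voices, which is the part the interview attributes to trained models.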

The Aim of Hardware-Independent Software Deployment

——While there are competitors focusing specifically on creating a sense of "real presence" through dedicated hardware, HITO has deliberately chosen a software-first strategy that does not rely on specific hardware. Could you explain your technical strengths and strategic goals regarding future expansion across multiple platforms—such as smartphones, digital signage, and AR/VR environments?

Moriya: While there are certainly advantages to approaching this via dedicated devices, our greatest strength lies in the "virtual human" itself.

Although such installations are still relatively rare in Japan, cities overseas are already teeming with digital signage. As the cost of LED displays drops, we expect to see rapid adoption in Japan as well. This will result in a landscape where the necessary display infrastructure is already widely established.

Therefore, we adopted an approach that lowers the barrier to entry by utilizing existing devices, thereby avoiding the costs and inventory risks associated with developing proprietary hardware.

However, because the system is installed in physical spaces, the hardware inevitably interacts with architecture and interior design. We want to avoid unsightly setups—such as a monitor simply sitting there looking clunky—so we place great importance on how the technology is presented and integrated into the space, such as by wall-mounting it like a picture frame. Ultimately, we do envision rolling out a total package that includes the hardware itself.

© Takuya Kamioka, "Imma Land." From DIESEL's "imma天"; currently on display at the Aww office.

IP Investment Immune to Obsolescence and ROI Maximization

——When a company adopts a "HITO" (virtual human) as the "face of the brand" (a fully customized solution), how do you think the characteristics of this IP—specifically the ability to maintain complete control and the fact that it does not age or deteriorate over time—transform the company's long-term brand investment (ROI)?

Moriya: We don't just implement systems; we often get involved in projects starting from the creative production and consulting phases, leveraging AI technology.

In doing so, I frequently notice cases where companies lose sight of their core brand value because they are too focused on chasing short-term KPIs. Trying to hit immediate numerical targets can actually end up lowering LTV (Lifetime Value) and ROI.

This is where the unique strength of virtual humans—their potential to grow into long-lasting IP—comes into play.

Initially, they are adopted to solve immediate, clear challenges, such as cutting labor costs or handling inbound inquiries. In physical spaces, the ability to provide multilingual, 24-hour service—unaffected by shift schedules or physical health issues, and maintaining consistent quality in reception, guidance, and customer service—leads directly to ROI by reducing operational burdens and minimizing lost opportunities.

However, if customers respond positively to the character, its role can expand—perhaps to serving as an advertising model or handling the initial stages of job interviews—eventually evolving into the "face of the company" (an IP asset). Furthermore, through conversations with visitors, the company gains a deeper understanding of its customers—such as the questions they ask most frequently and what interests them—insights that are unique to the physical space environment.

We explain to companies that this approach offers the potential to achieve high ROI: operational efficiency and consistent service quality in the short term, and brand building and the accumulation of customer insight assets in the long term.

Building Character LLMs and Protecting Privacy

——When designing from scratch a “Character-LLM (self-generating character and humor)” that embodies a company’s brand and philosophy, please tell us about the process you are most particular about.

Moriya: Actually, rather than focusing on the unique characteristics of each character, we are most focused on "how accurately we can reproduce the answers the client is looking for."

We prepare a large number of personas, conduct thorough evaluations, and make adjustments to provide the best answers in every situation. Of course, we apply a minimum level of characterization, such as speaking style and tone of voice, but our top priority is "establishing linguistic communication without failure."

——We believe that the ability to operate on a closed network will expand the possibilities for implementation, especially in financial and medical institutions. In the future, as the construction of "sovereign AI (autonomous data infrastructure)" progresses, what kind of strategy do you have in mind for data privacy protection and collaboration with local AI infrastructure?

Moriya: We are already working on ways to deploy HITO in areas with extremely strict security requirements, such as financial institutions.

Currently, AI foundation models are rapidly becoming commoditized, and we are entering an era in which sufficiently accurate responses can be generated in a local environment without necessarily connecting to the cloud. Voice conversations in physical spaces in particular often contain personal information, so we believe that a system that lets data be used securely while protecting privacy will be extremely important for improving the service in the future.

As you pointed out, we believe that AI can be introduced in fields such as finance, medicine, and government. If AI is used not only internally but also in services provided to customers, and the resulting data is used with privacy protected, it will become possible to deliver and improve data-driven services.

Convergence of Multimodality and Blockchain

——The integration of visual information (multimodality) is considered key to future evolution. When HITO becomes capable of instantly reading a visitor's "facial expressions" and "attire" via camera and autonomously providing hospitality tailored to the context, what kind of developments do you envision?

Moriya: While we already use cameras for person recognition, we haven't yet fully implemented facial expression analysis. Right now, we are prioritizing translating the conversational nuances and strategies of top-tier sales representatives—such as knowing exactly how to phrase things to prompt action—into the system.

Of course, we will continue researching areas like emotion recognition, but we believe it is more important to first thoroughly refine our ability to naturally guide interactions and solve problems through conversation.

——When autonomous AI agents like HITO establish their own economic spheres as influencers, do you envision integrating blockchain technologies—such as "AI character identity (IP) verification" or "co-creation of value with fan communities (DAOs)" via tokens?

Moriya: I see that as a highly meaningful ultimate goal for our evolution. We envision a world where AI agents handle front-facing tasks while blockchain functions as the underlying infrastructure.

However, a fundamental premise for us is that while many blockchain projects originate in the crypto space, real-world business solutions that generate actual revenue should come first. We want AI (HITO) to be utilized to solve specific problems, with blockchain subsequently applied in a sound manner—for instance, in areas like character rights management or proof-of-visit verification.

We could even envision applications beyond customer support, such as managing Japanese IP on the blockchain. That is the ideal sequence for social implementation to unfold.

The Role Humans Should Play in the Age of AI

——There is a powerful message in the fact that this crystallization of advanced technology was named "HITO" (meaning "human"). In a future where multimodal capabilities advance and AI autonomously delivers even hospitality services, what role do you envision for flesh-and-blood humans?

Moriya: I believe we should reduce the amount of work humans do. It is not a matter of AI "stealing" human jobs, but rather "reducing" them. My hope is for people to lead lives that are more truly human.

To take an extreme example, there are already many tasks in society that do not necessarily require a human to perform them. By entrusting such tasks to AI, we should dedicate our time and talent to creating value that only humans can generate—such as high-level corporate sales or planning that demands creativity.

The greatest change AI brings is that it gives us time. We can use that freed-up time to focus on deep thinking and activities that are authentically human. That, I believe, is the ideal form for the society of the future.

The Source of Input and Sensibility

——Your approach feels very artistic, and your perspective on branding is incredibly sharp. Is there anything specific you do to hone that sensibility?

Moriya: It might just be that I’ve naturally turned what I love into my career, but I do make a conscious effort to expose myself to a massive amount of information every day through YouTube, social media, and other channels.

Also, the sheer volume of experience I gained in my twenties forms the foundation of who I am today. When I was 24 or 25, I was entrusted with projects worth hundreds of millions of yen; I remember panicking but fighting hard to pull through them. That experience taught me how to handle even the largest-scale jobs with composure.

We live in an era where information is easily accessible, but back then, finding the specific information I needed required what might seem like "wasted time"—such as staying up late watching countless DVDs. However, I believe those inefficient, gritty experiences in gathering information are what ultimately inform my current sensibilities.

Another thing is that I dislike the idea of getting older and relying solely on my own values. I make a conscious effort to stay up-to-date on what young people are looking at and what they find interesting, even if it's just by observing a single social media post.

Media Contact
Aww Inc.
Sara Giusto, Public Relations
Email: info@aww.tokyo

Image courtesy of Aww Inc. / Photo by Shogo Kurobe


Profile

◉ Takayuki Moriya

CEO, Aww Inc.

After launching a business in his twenties, he worked as a producer on numerous projects, including corporate advertising strategies, branding, consulting, commercial production, and music videos for international artists.

He founded Aww Inc. and created "imma," Asia’s first virtual human. She has been featured in over 8,000 media outlets worldwide and has grown to the point of hosting the Paralympic closing ceremony and the World Expo opening ceremony. Moriya himself was also responsible for the selection process for "Myaku-Myaku," the official mascot of the World Expo.


