Voxygen Testimonial: How Can Speech Synthesis Improve User Experience?

Dydu works with technology partners such as Voxygen to improve our callbots’ speech synthesis. Would you like to know how a synthetic voice is created, or understand the challenges of voice marketing? Read on to discover the experience of Christian Sassady, sales manager at Voxygen.

Can you quickly describe your company?

Voxygen is a software publisher specialising in speech synthesis based on artificial intelligence. We sell software solutions and create personalised voices for our clients. Voxygen was founded in 2011 as a spin-off from Orange Group. Our headquarters are in Brittany.

What use cases do your clients call on your services for? Which sectors are most represented among your clients?

Most of our projects are in customer relations and focus on call centre and IVR (interactive voice response) issues. We work with a wide range of use cases, such as one-off messages, emergency messages or dynamic self-care scenarios (IVR, callbots). We also work a lot with transport companies for customer announcements. We are best known for digitalising the voice of Simone Hérault, who has been the voice of the French rail service, SNCF, for more than 35 years and has formed a genuine relationship with users through it. A digitalised voice enables a seamless omnichannel user experience.

There are more and more use cases linked to the development of voice assistants, connected speakers and mobile apps. The health and car industries are also using voice more. In the future, voice assistants will be installed in all vehicles. This is a service that all manufacturers are developing and will continue to develop in the years to come. In the health sector, speech synthesis is used for medical simulation, care and the training of students and doctors.

Can you describe Voxygen’s products and services?

We offer a speech synthesis software solution that converts text into speech, as well as a service that creates specific voices. We have a catalogue of standard voices but can also create personalised voices for clients with a strong voice identity who want to optimise their customer experience. Our goal is not to create a big catalogue of digitalised voices available to everyone. We firmly believe that all big companies will eventually have their own voice identity, just as they have a logo, slogan or graphic identity, so we create personalised voices for them. They can then apply this voice identity to all their voice services and thus optimise the customer experience.

Our voices are developed using artificial intelligence solutions and neural networks. These synthesised voices are then used by our clients and partners via our Text-to-Speech software. This software converts text into expressive speech that closely resembles a human voice.

Voxygen’s speech synthesis component is often integrated into a broader product, which allows for genuine human-machine interaction. An example of this is the callbot project run by dydu and the insurance provider MACSF. Dydu’s natural language processing software calls Voxygen’s API to generate the callbot’s voice.
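
For readers who want to picture this kind of integration, here is a minimal Python sketch of a single callbot turn, in which a language-understanding layer chooses the reply text and a speech synthesis layer turns it into audio. It is an illustration only: `SpeechSynthesizer`, `FakeSynthesizer`, `understand`, `handle_turn` and the `brand_voice` identifier are hypothetical names invented for this example, and none of them reflects the actual dydu or Voxygen APIs.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechSynthesizer(Protocol):
    """Hypothetical interface standing in for a text-to-speech API."""

    def synthesize(self, text: str, voice: str) -> bytes: ...


@dataclass
class FakeSynthesizer:
    """Stand-in for a real TTS service; returns placeholder bytes, not audio."""

    def synthesize(self, text: str, voice: str) -> bytes:
        return f"[{voice}] {text}".encode("utf-8")


def understand(utterance: str) -> str:
    """Toy stand-in for the NLP engine: map a caller utterance to a reply."""
    if "contract" in utterance.lower():
        return "I can help with your contract. What is your member number?"
    return "Sorry, I did not understand. Could you rephrase?"


def handle_turn(utterance: str, tts: SpeechSynthesizer, voice: str) -> bytes:
    """One callbot turn: the NLP layer picks the reply, the TTS layer voices it."""
    reply_text = understand(utterance)
    return tts.synthesize(reply_text, voice)


# Assumed voice identifier; a real deployment would use the client's own voice.
audio = handle_turn("I have a question about my contract", FakeSynthesizer(), "brand_voice")
```

Keeping the synthesizer behind a small interface means the dialogue logic does not need to know whether the voice is served from the cloud, on-premise or an embedded engine.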

How do you create custom voices?

We have developed a comprehensive method for creating custom voices for our clients. There are several scenarios:

  • A client who has already identified a speaker (otherwise known as a “voice talent”) for their vocal identity. SNCF is an example of this, as it already has many messages recorded by Simone Hérault.
  • A client who is starting from scratch. They haven’t identified a voice yet and need help with the entire project (from defining their needs to creating and delivering a digital voice).

For the second scenario, we run workshops with the client to understand their expectations. What style of voice are they looking for in relation to their use cases? What expressiveness? What brand values should the digital voice convey? Based on the client’s brief, we then work with recording studios to cast the voice and present it to the client.

Once we’ve identified the speaker, we spend several days with them in a recording studio while they read a script (a list of sentences). This script is prepared before the recording, based on Voxygen’s know-how in linguistics and acoustics, as well as the client’s business data. The more contextualised the voice is, the more natural and smooth it will be for the client’s use cases. Our engineers oversee the recordings, and we create a kind of sound library for that voice.

The audio recordings are then processed to create the synthetic voice using AI technology and our tools. Once the voice has been created, it is delivered along with the software solution chosen by the client. The client can then use our solutions completely autonomously. When a request is sent to the solution, it delivers an audio message in real time, even if the text isn’t one of the recorded sentences. Speech synthesis should be able to pronounce anything.

What’s your value proposition compared to players like Google?

Voxygen is a 100% French company. We’re still an SMB, which allows us to provide our clients and partners with responsive and flexible solutions. Players such as Google only work with the cloud, whereas we also offer on-premise or even embedded software for clients that request it.

The main difference is that we provide our clients with custom support when creating personalised voices. We also believe that we offer real added value in the quality of our voices and even more so in the creation of our personalised voices.

What are your next challenges in terms of product?

Our teams are always working on optimising the quality of our voices in French, as well as in other languages. We improve them using AI and neural technology. Clients are more and more demanding when it comes to the quality of voices. They want a fully automated, flexible and high-end service.

In the future, speech synthesis will make it possible to manage fully automated and personalised business use cases with user-specific data. Some of our clients have already deployed solutions for these types of uses.
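
As a rough illustration of a personalised use case, the Python sketch below fills a message template with user-specific data before the text would be handed to a speech synthesis engine. The template wording, the field names and the `build_message` helper are assumptions made for this example and do not describe any client deployment.

```python
from string import Template

# Hypothetical message template; the placeholders are filled per user.
REMINDER = Template("Hello $first_name, your appointment is on $date at $time.")


def build_message(template: Template, user_data: dict) -> str:
    """Fill the template with user data.

    substitute() raises KeyError if a field is missing, so an incomplete
    message is never sent on to speech synthesis.
    """
    return template.substitute(user_data)


message = build_message(
    REMINDER,
    {"first_name": "Claire", "date": "12 March", "time": "10:30"},
)
```

The resulting text can then be passed to a text-to-speech engine, which is why the voice has to cope with sentences that were never recorded in the studio.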

What are your market prospects?

We’re mainly working on customer relations projects, but more and more use cases are emerging in mobile apps, connected speakers and home automation, for example. Companies can deploy their own voice identity on all their channels, whether on a PC, a smartphone or a connected speaker such as Google Home or Alexa. Using the same voice everywhere will strengthen brand identity and provide customers with an optimised experience. It doesn’t make sense to use different voices on each channel.