My personal journey to a text-to-speech accessibility bot

Marius Merkevičius
5 min read · Dec 3, 2021


I’ve had this project sitting in my garage for quite a while. It is not the only project with this fate, and it works well in combination with the others. Below, I’ll expand on one of the tools.

The project’s primary purpose is to provide accessibility for people who cannot access written information.

For example, let’s take the news. It’s publicly available to anyone. But for a person who cannot see, it’s a totally different story.

Photo by Wei-Cheng Wu on Unsplash

And I’m not talking in hypotheticals here. I have a real case at home. I won’t go into too much detail, but my dad does not see well. Right now, nearly nothing at all. That is when I thought I should make myself a bit more useful and find a way to make the internet a bit more accessible. He is also at an age where he did not have much access to it in his day anyway.

I’d like to share the challenges it raised and the tools you could use to build something like it yourself. Maybe you’ll find some hints or ideas if you’re working on something similar. In any case, there is already a 👉 working concept if you would like to try it out. Or you can always ping me 🔔 if you have an idea where this could be used more broadly.

So what is this project?

I’ll cut to the chase. It is pretty simple. It’s just a text-to-speech framework.

Now, you’re going to say there are a lot of text-to-speech services already. Yeah, I know. However, not many of them support my native language (Lithuanian), are free, and are accessible to a person with little knowledge of tech in general.

I’ve managed to put it together somehow. Let’s jump into the ‘how’.

Build stuff

Working on this project, I ran into various issues. Actually, working around those problems was one of the most fun parts of the project.

From the outside, this may look like duct-tape programming. And it is. But it also works reliably, so I’m happy with the result.

Let’s jump into the journey ahead.

Text to speech software

First of all, I needed to find text-to-speech (TTS for short) software that could synthesize the Lithuanian language. This wasn’t a challenge in itself. I found native software called ‘Liepa’ that already does TTS. It’s built locally, and it’s free to use. It sounds too perfect, right? It is. It runs only on Windows, which makes it hard to use in a Linux server environment.

‘Liepa’ over ‘Wine’

Still, after tinkering around a bit, I managed to run it using Wine. It does not work perfectly, but there is nothing I cannot address with a bit of coding. Here are a few challenges that needed tinkering.

  • Text cannot be longer than 500 characters or so

Once the text goes over that limit, the tool does not synthesize it. Again, relatively easy. Just cut the text into chunks, feed them to the tool one by one, then gather the results and join them into one big file. A bit of work but straightforward.
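A minimal sketch of the chunk-and-join idea, not the project's actual code. It assumes a hard limit of 500 characters, splits on sentence endings where possible, and assumes the synthesizer produces WAV files with identical audio parameters. The names `split_into_chunks` and `join_wav_files` are made up for illustration.

```python
import re
import wave

MAX_CHARS = 500  # assumed limit; the tool stops working at "500 or so"


def split_into_chunks(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit has to be cut hard.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def join_wav_files(paths, output_path):
    """Concatenate WAV files (assumed to share one format) into one file."""
    if not paths:
        raise ValueError("no audio parts to join")
    with wave.open(output_path, "wb") as out:
        for index, path in enumerate(paths):
            with wave.open(path, "rb") as part:
                if index == 0:
                    # Copy channels, sample width and rate from the first part.
                    out.setparams(part.getparams())
                out.writeframes(part.readframes(part.getnframes()))
```

Each chunk would be synthesized separately, and the resulting files passed to `join_wav_files` in the original order.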

  • The whole mechanism uses the file system

You have to save the text to a specific file and launch a process. After the process finishes, you have to gather the results. Rinse and repeat. This took a bit of thinking, but the solution was quite straightforward, even if it needed some polishing.
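The save, launch, gather cycle could look roughly like the sketch below. The file names `input.txt` and `output.wav`, the UTF-8 encoding, and the function name are all assumptions for illustration. The real synthesizer's file names, encoding, and command line are not described here, so the command is passed in by the caller.

```python
import subprocess
from pathlib import Path

INPUT_NAME = "input.txt"    # hypothetical name of the file the tool reads
OUTPUT_NAME = "output.wav"  # hypothetical name of the file the tool writes


def synthesize_via_files(text, command, workdir, timeout=120):
    """Write text to a file, run the synthesizer command, return the audio bytes."""
    workdir = Path(workdir)
    input_path = workdir / INPUT_NAME
    output_path = workdir / OUTPUT_NAME

    # Remove a stale result so an old file is never mistaken for a new one.
    output_path.unlink(missing_ok=True)
    input_path.write_text(text, encoding="utf-8")  # encoding is an assumption

    # check=True raises if the process fails; timeout guards against hangs.
    subprocess.run(command, cwd=workdir, check=True, timeout=timeout)

    if not output_path.exists():
        raise RuntimeError("synthesizer finished without producing audio")
    return output_path.read_bytes()
```

Giving every job its own working directory keeps parallel runs from overwriting each other's files.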

  • Running external processes using ‘Wine’ leaves zombie processes

This one was really a problem for quite a while. It does not happen very often, but when it does, almost the whole system breaks down. The processes cannot be killed, and after some time new processes cannot be launched anymore because of something called a ‘process limit’, which I do not fully understand myself.

Solving this required a bit of creativity, and Docker was essential. I built a Linux-based image with the text synthesizer inside. When launched, the container mounts a local directory, does its job, and is then destroyed. Clean and simple. This mechanism still runs to this day.
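As a rough illustration of the throwaway-container approach, the sketch below builds a `docker run` command with `--rm` (remove the container when it exits) and `-v` (mount a host directory). The image name `liepa-tts:latest` and the mount point `/data` are hypothetical, and how the image itself is built is not covered.

```python
import subprocess
from pathlib import Path

IMAGE = "liepa-tts:latest"  # hypothetical image name
CONTAINER_DIR = "/data"     # hypothetical mount point inside the container


def build_docker_command(host_dir, image=IMAGE):
    """Build a docker command for one disposable synthesis run."""
    host_dir = Path(host_dir).resolve()  # bind mounts need an absolute path
    return [
        "docker", "run",
        "--rm",                                # destroy the container afterwards
        "-v", f"{host_dir}:{CONTAINER_DIR}",   # share the job directory
        image,
    ]


def run_in_container(host_dir, timeout=300):
    """Run one synthesis job; leftover processes die with the container."""
    subprocess.run(build_docker_command(host_dir), check=True, timeout=timeout)
```

Because every job gets a fresh container, any zombie processes left by Wine disappear together with it instead of piling up on the host.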

Making it useful

Now that I have a mechanism that can convert text to audio files, I need something useful to convert. One quick thought that comes to mind: almost all news services provide an RSS feed.

It became a simple job of writing a bot that would:

  • Fetch RSS
  • Store it into the database
  • Schedule a TTS service to synthesize the text
  • Fetch results from TTS
  • Provide an API or website that displays the news and plays it through an audio player
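The first two steps of the list above can be sketched with the Python standard library alone. This assumes a plain RSS 2.0 feed and an SQLite database. The table name `news`, its columns, and the function names are invented for the example and say nothing about how the real bot stores its data.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url):
    """Download the raw RSS document."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def parse_items(feed_xml):
    """Extract id, title and text from every <item> of an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "guid": item.findtext("guid") or item.findtext("link") or "",
            "title": (item.findtext("title") or "").strip(),
            "text": (item.findtext("description") or "").strip(),
        })
    return items


def store_items(db, items):
    """Save new items; ones already stored (same guid) are skipped."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS news "
        "(guid TEXT PRIMARY KEY, title TEXT, text TEXT, audio_path TEXT)"
    )
    db.executemany(
        "INSERT OR IGNORE INTO news (guid, title, text) "
        "VALUES (:guid, :title, :text)",
        items,
    )
    db.commit()


def pending_items(db):
    """Items that still have no audio, i.e. the ones to schedule for TTS."""
    rows = db.execute("SELECT guid, text FROM news WHERE audio_path IS NULL")
    return rows.fetchall()
```

A scheduler would call `fetch_feed`, `parse_items`, and `store_items` periodically, then hand everything returned by `pending_items` to the TTS service.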

Putting everything together

To put everything together, the whole mechanism has three parts.

  • RSS service: gathers the news and stores it (text and converted audio files)
  • TTS service: launches ‘Docker’, takes the text, and converts it into an audio file
  • Messaging service: connects the two services
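To show how the three parts relate, here is a toy, single-process stand-in that uses Python's `queue` module in place of a real messaging service. Which message broker the project actually uses is not stated, so this only illustrates the flow: the RSS side puts text on a request queue, the TTS side consumes it and puts audio on a result queue. The `synthesize` argument is a placeholder for the real synthesizer call.

```python
import queue
import threading


def tts_worker(requests, results, synthesize):
    """TTS side: take (guid, text) jobs and publish (guid, audio) results."""
    while True:
        job = requests.get()
        if job is None:  # sentinel: no more work
            break
        guid, text = job
        results.put((guid, synthesize(text)))


def process_articles(articles, synthesize):
    """RSS side: send every article for synthesis and collect the audio."""
    requests, results = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=tts_worker, args=(requests, results, synthesize)
    )
    worker.start()
    for guid, text in articles:
        requests.put((guid, text))
    requests.put(None)  # tell the worker to stop after the last job
    worker.join()

    audio = {}
    while not results.empty():
        guid, data = results.get()
        audio[guid] = data
    return audio
```

Keeping the two services behind queues means the RSS bot never has to wait for a slow synthesis run, and either side can be restarted on its own.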

You can find instructions on how to build the TTS service yourself 👉 here. Or if you run into a wall and have an idea that relates, you can always ping me 🔔.

Potential (?)

Even though I did not create anything unique, I see huge potential in this.

My dad uses my accessibility app (how it works and what it is will be another blog post), which lets him access various information that would otherwise be unavailable to him.

For a showcase, I asked a friend to let me use the content from his website ➡️ europoszinios.lt, synthesize the text, and use it as a proof of concept. You can find the results here on my website 🕷️.

As I have the text already synthesized, there’s a plan to link back to the EZ website itself, so that the site can become more accessible.

So the sky is the limit. Or rather, the limit is how legal it is to convert content. Just imagine a perfect world where you can create access to various free books, newspapers, instructions, stories, and so on. One can only wish 🤔

Have an idea what this tool could be used for? Ping me, and I’ll help you set it up.

Oh, and maybe you want to try out the synthesizer yourself? Check it out here and enter your own input 🔈. I did add some limitations, as my server is used for various other things as well.
