My Technical Journal On Athena

Building an open-source mobile app that reads books back to you in your own voice.

May 29, 2026·6 min read

Why I Built Athena

For most of my life, I’ve been an avid reader. At some point, “The Librarian” was an inside joke among my friends, and it's a title I wear like a badge of honor.

But recently, I noticed something uncomfortable: I wasn’t reading nearly as much anymore.

Not because I stopped loving books; that’s never going to happen. Mostly, it was because life started fragmenting itself into smaller and smaller pieces. There was always something to do, somewhere to be, some tab left open in my brain demanding attention. Perhaps I was drowning in side quests...

So, I started thinking about the concept of friction.

What if I could read while doing other things? What if listening felt as cognitively engaging as reading silently? What if the voice reading the book sounded exactly like me and, as a result, created some form of cognitive resonance?

That idea became Athena.

Athena is an open-source mobile app that clones your voice from a short recording, uses that as a reference, and reads books back to you in it. The core idea is simple and it's built on my belief that familiarity reduces friction. If the narration sounds like your own internal voice, your brain stays engaged differently. The experience stops feeling like passive listening and becomes more like an extension of your own thoughts.

This is my honest account of building it so far: what worked, what spectacularly broke, and what the process taught me about engineering.

The Stack

Athena is deliberately split into three isolated services. Each exists for a very specific reason:

  • athena-mobile (React Native): Handles the user experience.
  • athena-api (Rails 8): Handles authentication, libraries, background jobs, and the product layer.
  • athena-tts (Python FastAPI): Handles the voice synthesis pipeline.

Honestly, this project means a lot to me. Beyond being something I built primarily for myself, it was a chance to push myself out of my comfort zone, since I’m more of a React and React Native developer.

After spending time contributing to an open-source Rails codebase, I gained a little insight into why Rails engineers move so fast: the framework aggressively optimizes for convention.

On the voice side, Python was non-negotiable. The entire voice-cloning ecosystem (Kokoro-82M, KokoClone, Kanade) lives there. Trying to force machine learning inference workloads through Ruby would have created massive, unnecessary complexity... scratch that, it would have been a disaster for me.

So, I kept a clean separation of concerns: Rails handles the product logic, Python crunches the synthesis, and the two communicate over HTTP.
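To make that boundary a little more concrete, here is a minimal sketch of what a request crossing it could look like. The `SynthesisRequest` type and its field names (`chunk_id`, `text`, `voice_sample_id`) are assumptions for illustration, not Athena's actual API. The point is only that each side needs to agree on a small JSON contract and nothing else.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SynthesisRequest:
    """Hypothetical payload the product layer sends to the TTS service."""
    chunk_id: int
    text: str
    voice_sample_id: str


def encode_request(req: SynthesisRequest) -> bytes:
    """Serialize the request as a UTF-8 JSON body for an HTTP POST."""
    return json.dumps(asdict(req)).encode("utf-8")


def decode_request(body: bytes) -> SynthesisRequest:
    """Parse and validate the JSON body on the receiving side."""
    data = json.loads(body.decode("utf-8"))
    if not str(data.get("text", "")).strip():
        raise ValueError("text must not be empty")
    return SynthesisRequest(
        chunk_id=int(data["chunk_id"]),
        text=data["text"],
        voice_sample_id=str(data["voice_sample_id"]),
    )
```

Because the contract is plain JSON, either service could be rewritten in another language without the other one noticing.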

How the Voice Cloning Actually Works

[Image: synthesis.py. A peek at the Python backend handling the audio synthesis logic.]

This is probably the part most people are curious about. Athena’s voice cloning pipeline is a two-stage process:

  1. First, Kokoro generates natural-sounding narration from the raw text using one of its preset baseline voices. At this point the audio is plain and has none of your vocal characteristics.
  2. Next, Kanade takes that generated narration, maps it against your voice sample, and transfers your unique vocal characteristics onto the speech. This is where the magic happens.
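The two stages above can be sketched as a simple function composition. This is an illustrative outline only: `clone_narration`, `base_tts`, and `convert` are hypothetical names, and both stages are passed in as plain callables so the sketch does not guess at the real Kokoro or Kanade APIs.

```python
from typing import Callable

# Hypothetical stage signatures, for illustration only.
BaseTTS = Callable[[str], bytes]                  # text -> neutral narration audio
VoiceConverter = Callable[[bytes, bytes], bytes]  # (narration, reference) -> cloned audio


def clone_narration(text: str, reference: bytes,
                    base_tts: BaseTTS, convert: VoiceConverter) -> bytes:
    """Run the two stages in order: neutral narration, then voice transfer."""
    if not text.strip():
        raise ValueError("nothing to narrate")
    neutral = base_tts(text)            # stage 1: preset voice, none of your traits
    return convert(neutral, reference)  # stage 2: apply the reference voice
```

Keeping the stages separate like this means either model could be swapped out later without touching the other.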

The result is surprisingly convincing. It’s not flawless; it sounds slightly smoothed out, but it is unmistakably you.

Running on CPU inference, generation takes around 8–15 seconds per chunk. On a GPU, it’s near real-time. For a personal reading app, that CPU tradeoff is perfectly acceptable. If a paragraph takes 10 seconds to generate but buys you a full minute of listening time, the user experience still flows naturally.
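That tradeoff is easy to check with a little arithmetic. The sketch below is a simplified model, not a measurement: it assumes chunks are generated back to back, that every chunk takes the same time to generate and to play, and that playback starts as soon as the first chunk is ready. The function name `total_stall_seconds` is my own.

```python
def total_stall_seconds(gen_seconds: float, play_seconds: float, chunks: int) -> float:
    """Time the listener spends waiting for audio after playback has started."""
    stall = 0.0
    playback_free_at = gen_seconds  # the first chunk plays as soon as it is ready
    for i in range(chunks):
        ready_at = (i + 1) * gen_seconds         # chunks are generated one after another
        start = max(ready_at, playback_free_at)  # cannot play before it exists
        stall += start - playback_free_at        # gap the listener hears as silence
        playback_free_at = start + play_seconds
    return stall
```

With 10 seconds of generation per 60 seconds of audio, `total_stall_seconds(10, 60, 5)` is `0.0`: after the initial wait, generation always stays ahead of playback. Flip the ratio, say 15 seconds to generate 10 seconds of audio, and every chunk after the first adds a pause.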

The Architectural Decisions

My first implementation of the audio generation was entirely synchronous.

The mobile app would send text to the Rails API, wait for the TTS server to generate the audio, and sit there until the final result arrived.

This failed almost immediately: voice synthesis takes time, HTTP requests timed out, and everything felt broken.

So, I redesigned the pipeline. Now, Rails handles it asynchronously:

  • It creates a database record for the audio chunk.
  • It queues a background job using Solid Queue.
  • It immediately returns a 202 Accepted to the client.
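The three steps above can be sketched in a few lines. The real implementation lives in Rails with Solid Queue; this is a language-neutral illustration in Python, where an in-memory dict stands in for the database, `queue.Queue` stands in for the job queue, and every name (`AudioChunk`, `request_audio`, `run_next_job`, the `/audio/...` path) is hypothetical.

```python
import queue
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class AudioChunk:
    """Stand-in for the database record that tracks one chunk."""
    id: int
    text: str
    status: str = "pending"  # becomes "ready" once the job has run
    audio_url: str = ""


CHUNKS: Dict[int, AudioChunk] = {}  # stand-in for the database table
JOBS: queue.Queue = queue.Queue()   # stand-in for the background job queue


def request_audio(text: str) -> Tuple[int, dict]:
    """Record the chunk, enqueue the work, and answer without waiting."""
    chunk = AudioChunk(id=len(CHUNKS) + 1, text=text)
    CHUNKS[chunk.id] = chunk  # 1. create the record
    JOBS.put(chunk.id)        # 2. queue the background job
    return 202, {"id": chunk.id, "status": chunk.status}  # 3. 202 Accepted


def run_next_job() -> None:
    """What a worker does later: synthesize, then mark the record ready."""
    chunk = CHUNKS[JOBS.get_nowait()]
    chunk.audio_url = f"/audio/{chunk.id}.wav"  # hypothetical storage path
    chunk.status = "ready"
```

The request handler never touches the slow part. It only records intent, so the response time no longer depends on how long synthesis takes.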

The mobile app then simply polls the endpoint until the audio is ready.
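A polling loop like that might look roughly like the sketch below. The actual client is a React Native app, so treat this as an illustration of the pattern rather than Athena's code. `fetch_status` is a hypothetical function that performs the GET request and returns the decoded JSON body, and the `"pending"`, `"ready"`, and `"failed"` status values are assumptions.

```python
import time
from typing import Callable, Optional


def poll_until_ready(fetch_status: Callable[[], dict],
                     interval: float = 1.0,
                     timeout: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Call fetch_status until it reports "ready"; return None on timeout."""
    waited = 0.0
    while True:
        body = fetch_status()
        if body.get("status") == "ready":
            return body
        if body.get("status") == "failed":
            raise RuntimeError("audio generation failed")
        if waited >= timeout:
            return None  # give up; the caller decides whether to retry
        sleep(interval)
        waited += interval
```

The `sleep` parameter is injectable so the loop can be tested without real delays. A production client would likely add backoff between attempts as well.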

That refactor taught me a crucial lesson: You do not truly understand queues, background processing, or async architecture until a synchronous system completely fails in your hands.

Lessons?

The most surprising takeaway from this project wasn’t on the ML side. It was how much my mental models shifted while building the backend.

Ruby is changing how I think about software design. Coming from JavaScript, Rails initially felt like magic: methods appearing dynamically, conventions everywhere, abstractions stacked on abstractions. But eventually, the “magic” dissolved, and I started seeing the rigid, brilliant design underneath it. Ruby really is a language designed to make programmers happy LOL.

Ironically, I think learning Ruby has actually improved how I write JavaScript. I’ve started building smaller, cleaner, more composable systems.

More than anything else, Athena taught me this: systems design is not something you memorize. It’s something you absorb through repeated collisions with real problems. You can study queues, auth systems, and caching conceptually. But you only understand them once they fail in production and force you to rethink your architecture.

What’s Next?

The backend is complete. The TTS server is humming. The services communicate correctly.

And now it's time to build the part that users actually touch: the mobile app. React Native with Expo. The onboarding flow. The audio player. The reading experience itself. The exact moment someone hears their own voice reading Achebe or Austen or Machiavelli back to them.

That’s the next phase. Both backend repositories are public:

If you want to run Athena yourself, the READMEs should get you there.