Sana, Servers, and Spoons of Matcha

Tetraslam's Blog

Fixing docs, cloning voices, and catching rebels: a regular Saturday in SF.

Build Log #

Today started off slow but picked up quickly. I handled a small fire at Mosaic: a user tripped on our quickstart curl command because I forgot to include a `Content-Type: application/json` header. Classic Pydantic misinterpretation: without the header, the empty `{}` body gets read as a string instead of an object. Fixed it, updated the docs, and the user's workflow was unblocked within the hour. Also tried out Chatterbox-TTS for Akiko, my in-progress local-first desktop companion, and cloned the voice of Sana (from TWICE), because of course I did. The output isn't totally perfect, but it's nearly there, and that's from a zero-shot clone, which is absolutely insane. Once I fine-tune the model it should be basically identical, which is wild to think about.
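For anyone curious, here's a minimal sketch of the failure mode as I understand it (illustrative, not Mosaic's actual code or endpoint):

```python
import json

# With `Content-Type: application/json`, the framework JSON-decodes the
# request body before Pydantic sees it; without the header, the raw bytes
# come through as a plain string, so an empty `{}` body validates as the
# string "{}" instead of an empty object.
raw_body = "{}"

parsed = json.loads(raw_body)  # what the server sees WITH the header
unparsed = raw_body            # what it sees WITHOUT the header

print(type(parsed).__name__, type(unparsed).__name__)  # dict str

# The docs fix was just adding the header to the quickstart command:
#   curl -H "Content-Type: application/json" ...
```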

mikarin_cry me when bug in docs

I spent a bit of time today polishing the blog automation system. Set up recurring reminders to send context dumps at 8 PM, then pushed the first full pipeline through: dump → task → blog post. Watching the automation trigger right on time was surprisingly satisfying. Also posted the full May Musings blog on my site (https://blog.tetraslam.world/may_musings). It went through multiple heavy edits from the original version, but now it feels very me: scattered, enthusiastic, link-laced :D I love prose.sh btw; it's what I use to host my blog (the original site is hosted on blog.tetraslam.world, and I use the RSS feed to style it for this site!).

Other than that, I mostly chilled out and poked around on Akiko’s frontend. The Sana voice clone idea is giving me new feature thoughts: maybe add a “mood switch” toggle where the same character reads things in slightly different tones? Could be nice for customizing narration or casual dialogues. Also considering letting people train their own Akiko clones via a web interface, though I’m scared of the bandwidth bill. I'm also thinking about how to make Akiko more "local-first" and less reliant on the internet. Memory is a particularly hard problem to solve, but I have a few ideas.
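If I do build the mood switch, it could be as simple as a preset table mapped onto synthesis settings. A rough sketch, with parameter names that are entirely made up (not Chatterbox-TTS's actual API):

```python
# Hypothetical tone presets -> TTS settings. These keys and values are
# illustrative placeholders, not a real voice-model interface.
TONES = {
    "cheerful": {"pitch_shift": 2, "speed": 1.05},
    "sleepy":   {"pitch_shift": -1, "speed": 0.85},
    "flat":     {"pitch_shift": 0, "speed": 1.0},
}

def render_line(text: str, tone: str) -> dict:
    """Attach the chosen tone's settings to a line of dialogue."""
    settings = TONES.get(tone, TONES["flat"])  # unknown tones fall back to flat
    return {"text": text, **settings}

print(render_line("Good morning!", "sleepy"))
# {'text': 'Good morning!', 'pitch_shift': -1, 'speed': 0.85}
```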

No major deployments today, just a bunch of maintenance, tiny wins, and weird ideas I'll chase later. I like having days like this. I had a nice matcha latte with breakfast :)

mikarin_satisfied

Media Diet #

Finished a few more episodes of Dr. Stone Season 2 today. I forgot how tightly paced it gets once they start building the "cellphone"; the science montages are equal parts ridiculous and genuinely well-explained. I laugh my ass off every time Kaseki comes on screen. Also resumed Andor, starting with episode 7 and watching through 8. Every dialogue scene feels like a chess match.

Music-wise, I listened to a ton of tropical jazz and Japanese city pop. Didn't read anything substantial today: mostly blog drafts, Discord scrolls, and config file docs.

I’ve queued up some heavier stuff for tomorrow: a paper on Whisper fine-tuning for multilingual phoneme alignment, and scraping poly.pizza for a project I'm working on. Not sure I’ll get through both, but I like having too much on my to-do list. If I don't get through all of it, I'll at least get through some of it! One day I’ll finish it all and become unstoppable. Maybe.

Small Wins & Face-plants #

Wins:

Fails:

Surprises:

Voice Cloning as UX #

Cloning Sana’s voice wasn’t just for kicks (okay, maybe a little bit). It made me realize how much vocal tone affects perceived personality in agents. The same sentence read with a smile hits way different than read flat, especially for projects like Akiko, where tone is part of the character. Voice isn’t just TTS here; it’s also branding, interaction design, and emotional feedback.

I think more tools should embrace this: give users control over how their AI sounds, not just what it says. Let me pick the vibe: serious, playful, sleepy, chaotic. We already do this for UI themes and chat personalities. Audio is just the next frontier. What if my dev agent whispered suggestions conspiratorially like a noir sidekick? Or a voice assistant read out the weather like it was bedtime poetry? I think I saw a hilarious tweet from Patrick Collison about this.


The tech is catching up, too. ElevenLabs v3 is absolutely insane.

Blog Engine Gets a Brain #

Today’s automation tests were low-key the highlight. It’s dumb how good it feels when a bot reminds me to send context, then turns it into something I actually want to read later. Feels like training a tiny assistant version of myself. The blog task stack is clean now: recurring context ping, blog draft generation at 7 AM, and scheduled post publish if I confirm. All running locally, no bullshit.
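The scheduling half is simple enough to sketch. A minimal version, with function names that are mine rather than the real pipeline's:

```python
import datetime as dt

def next_run(hour: int, now: dt.datetime) -> dt.datetime:
    """Next daily run time for a job, e.g. the 8 PM context ping
    or the 7 AM draft generation. Illustrative only."""
    run = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if run <= now:  # already past today's slot -> schedule tomorrow
        run += dt.timedelta(days=1)
    return run

# Hypothetical pipeline stages: dump -> task -> blog post.
def draft_post(dump: str) -> str:
    return "# Build Log\n\n" + dump + "\n"

def publish(post: str, confirmed: bool) -> str:
    return "published" if confirmed else "held for review"

now = dt.datetime(2025, 6, 7, 21, 30)  # 9:30 PM, past the 8 PM ping
print(next_run(20, now))               # 2025-06-08 20:00:00
print(publish(draft_post("today's context dump"), confirmed=True))
```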

I think this could be a real thing. Not a Substack killer, but a cool alternative: write a little every day, let an agent clean it up, and auto-publish on a static site. Kind of like a memory garden that grows in public. I’m imagining plugins next—like adding project updates, code snippets, or visual graphs from GitHub commits. Maybe even a timeline view.

One bonus idea: sentiment heatmaps across the month. I already tag my mood in the context dump, so I could color-code the posts. I bet patterns would emerge. It’s the same curiosity that made me track hours slept, lines coded, and books read. Not for optimization, just to see.
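A minimal sketch of the heatmap idea, assuming mood tags map onto GitHub-contribution-style colors (the tags and palette here are illustrative, not my actual tagging scheme):

```python
# Map mood tags from the daily context dump to heatmap cell colors.
MOOD_COLORS = {
    "satisfied": "#40c463",
    "wow":       "#216e39",
    "cry":       "#9be9a8",
}
DEFAULT = "#ebedf0"  # untagged or unknown day

def heatmap_row(moods: list[str]) -> list[str]:
    """One row of the month grid: a color per day's mood tag."""
    return [MOOD_COLORS.get(m, DEFAULT) for m in moods]

print(heatmap_row(["satisfied", "cry", "meh"]))
# ['#40c463', '#9be9a8', '#ebedf0']
```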

I’m weirdly excited for tomorrow’s dump now. Feels like writing my own diary but less cringe.

mikarin_wow


Check out my socials: #