Bridging Vision and Execution: My Path from CTO to AI Engineer (and Back Again) PART 1
So, I’ve had a dream of becoming an AI engineer ever since I watched Person of Interest during my years at Strathmore. That’s partly why I jumped straight into a master’s right after undergrad. Unfortunately, I dropped out a little over a year later after several semesters of déjà vu, basically repeating the same courses from my undergrad. And just like that, the dream went dormant.

Fast forward a few years: I joined Kuze.ai and the spark reignited. But here’s the thing: being CTO doesn’t leave much time for focused learning. My days are full of roadmapping, system and architecture design (not just the AI parts, but everything else too), managing teams, approving UI/UX designs, and a myriad of other tasks. I’ve learned a ton, don’t get me wrong, but it’s mostly just enough to implement what we need, not the kind of deep dive I wanted.

So, I set myself a challenge. The best way I know to learn fast is to pick a project, box myself in with a few constraints, and make sure that while the end product matters, learning remains the main priority.

And what project did I choose? An invoicing system for my woodworking business, SF Duka. Why? Because quoting projects is my biggest headache in that business. Customers constantly ask for prices, but here’s the kicker: out of 20 quotes, only about 5 turn into actual sales. The problem is, you don’t know which 5. So you have to treat all 20 like potential revenue, and each quote takes way too much time and effort.

(Side note: in hindsight, maybe I should have built an AI system to predict which customers are actually serious buyers and which ones are just “window shopping.” And to those who ask for custom quotes with zero intention to buy, there’s a very special corner of hell reserved just for you!)

Anyway, project picked, it was time to begin.

Step 1: Scoping Like a Real Engineer

Like any proper project, I started with a PRD. My scope was simple but ambitious:

  • Input: an image plus a message (since that’s what customers usually send).
  • Message must include at least 3 dimensions: L × W × H.
  • Output: the overall cost, a list of materials used (with quantities), and estimated build time.

Constraints:

  • Only models that can run on my laptop (8GB RAM, yes, I like to make life difficult).
  • No “big LLMs” that solve the problem too easily, because this wasn’t just about results, it was about learning.
  • Python + FastAPI, because I wanted to build this like a real product, not just a pile of disconnected scripts.
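To make the PRD concrete, the input/output contract can be sketched as plain data models. These names and fields are my illustration of the scope above, not the actual schema from the project:

```python
from dataclasses import dataclass, field

@dataclass
class QuoteRequest:
    """What the customer sends: a photo plus a free-text message."""
    image_path: str  # path to the uploaded image
    message: str     # must mention at least L x W x H

@dataclass
class MaterialLine:
    """One line of the materials list in the final quote."""
    name: str
    quantity: float
    unit: str
    unit_cost_kes: float  # cost in Kenyan Shillings

@dataclass
class QuoteResponse:
    """What the system returns: cost, materials, and build time."""
    total_cost_kes: float
    materials: list[MaterialLine] = field(default_factory=list)
    build_time_days: float = 0.0
```

Nailing this contract down first means every later stage (parser, vision model, costing) has a fixed target to hit.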

And with that, I was off.

Step 2: Reality Hits Fast

Within the first hour, I was already in trouble. I was so excited to play with models that I sped through the “boring” software engineering parts, especially text parsing. I figured it’d be simple, just some regex and if/else logic to extract furniture type, dimensions, and features.
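That first rule-based attempt looked something like this (an illustrative sketch, not my actual code):

```python
import re

# Matches "120 x 60 x 75"-style dimensions -- and very little else.
DIM_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)"
)

def parse_dimensions(message: str):
    """Return (L, W, H) as floats, or None if the pattern doesn't match."""
    m = DIM_PATTERN.search(message.lower())
    if not m:
        return None
    return tuple(float(v) for v in m.groups())
```

It handles “a table 120 x 60 x 75 cm” just fine, and then returns None the moment someone writes “four feet by two feet, 30 inches high”.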

Big mistake.

Turns out no matter how many rules you write, they’re brittle, unscalable, and guaranteed to miss the wild ways customers phrase requests. That’s when I realized I needed to stop thinking like just a software engineer (rules and edge cases) and start thinking like an AI engineer.

So I pivoted. Enter Gemma, a local LLM. With one neat function, it was parsing customer messages into clean JSON. And just like that, we had a breakthrough.
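For the curious, here is a sketch of that approach, assuming Gemma is served locally through Ollama’s HTTP API (the endpoint, model tag, and prompt are illustrative, not the project’s real code). The useful trick is defensively extracting JSON, since local models love to wrap it in chatter:

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes a local Ollama server

PROMPT = """Extract the furniture type, dimensions (L, W, H in cm) and any
extra features from this customer message. Reply with JSON only, using the
keys: type, length_cm, width_cm, height_cm, features.

Message: {message}"""

def extract_json(raw: str) -> dict:
    """Pull the first JSON object out of a model reply that may contain extra text."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

def parse_message(message: str) -> dict:
    """Ask the local Gemma model to structure a customer message."""
    payload = json.dumps({
        "model": "gemma:2b",  # whichever Gemma variant fits in 8GB RAM
        "prompt": PROMPT.format(message=message),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return extract_json(json.loads(resp.read())["response"])
```

One function in, clean JSON out: exactly the “neat function” feeling, minus all the brittle if/else chains.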

Step 3: Lazy but Effective Data Wrangling

Confession time: I’m lazy. But in a good way. The kind of lazy that writes 3 hours of automation code to save 15 minutes of daily manual work. Trust me, it pays off. I also like hiring engineers like me, they usually find elegant, simple solutions where others build massive, over-engineered contraptions.

Back to the project. I knew my system needed a materials database. Why?

  1. So the model wouldn’t hallucinate or recommend materials that don’t exist in Kenya.
  2. So I could actually cost the builds with real prices.

At SF Duka, we already had a semi-structured Excel sheet of materials, costs, and suppliers, like most small businesses do. My lazy side thought: “Perfect! I’ll just feed it line by line into my model via pandas. Problem solved.”

Yeah… no. Context got lost immediately, and the model had no idea what was going on. A friend (an actual AI engineer) suggested vector databases. They would have worked beautifully but came with too much overhead for what I needed. So, reluctantly, I gave in and spun up a Postgres database.

To save myself time, I even asked Claude (via VS Code’s Cline) to read my Excel file and generate the SQL insert statements. Boom, clean, structured database, ready to roll.
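The same trick is easy to do programmatically too. A sketch of turning sheet rows into INSERT statements (column names here are hypothetical; the comment shows how pandas would feed it):

```python
def row_to_insert(row: dict, table: str = "materials") -> str:
    """Render one materials-sheet row as a SQL INSERT statement."""
    def quote(value):
        # Quote strings (doubling single quotes), leave numbers bare.
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return str(value)

    cols = ", ".join(row)
    vals = ", ".join(quote(v) for v in row.values())
    return f"INSERT INTO {table} ({cols}) VALUES ({vals});"

# With pandas, the whole sheet becomes a script:
#   df = pd.read_excel("materials.xlsx")
#   statements = [row_to_insert(r) for r in df.to_dict("records")]
```

(For a real pipeline you’d use parameterized queries rather than string-built SQL, but for a one-off import this gets the job done.)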

Step 4: Enter the Vision Model

For the vision part, I picked LLaVA:7B. Why? Honestly, I browsed Hugging Face for a few minutes, looked at model sizes, and went with one that seemed manageable. Accuracy wasn’t the priority yet; I just wanted the pipeline working end to end:

  1. Input: image + text via API.
  2. Model processes both.
  3. System outputs cost, materials list, and estimated build time.
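Glued together, the pipeline is roughly this shape. This is a sketch with the model calls injected as plain callables (the names are mine, not the real code); in the actual system those callables wrap Gemma and LLaVA:7B:

```python
def run_pipeline(image_path, message, vision_model, text_model, price_lookup):
    """Wire the three stages together: parse the message, read the image,
    cost the detected materials against the database."""
    spec = text_model(message)            # furniture type + dimensions
    detected = vision_model(image_path)   # material names seen in the photo
    materials = []
    total = 0.0
    for name in detected:
        priced = price_lookup(name)       # match against the materials table
        if priced is None:
            continue                      # unknown material: skip, don't hallucinate
        materials.append(priced)
        total += priced["cost_kes"]
    return {"spec": spec, "materials": materials, "total_cost_kes": total}
```

Keeping the models as injected callables also made it trivial to test the plumbing with stubs before the real models were wired in.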

After some matching logic between vision output and my Postgres materials database, I finally got my first end-to-end response. And let me tell you, it was glorious.
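My matching logic was rough; as one cheap option, the stdlib’s difflib can fuzzily map a vision-model label to the closest database name (again, a sketch, not the project’s exact code):

```python
import difflib

def match_material(detected: str, db_names: list[str], cutoff: float = 0.6):
    """Map a vision-model label to the closest materials-table name, or None."""
    lowered = {n.lower(): n for n in db_names}
    hits = difflib.get_close_matches(
        detected.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None
```

The cutoff is the knob: too low and “pine” matches “plywood”, too high and harmless plurals get rejected.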


Step 5: The Not-So-Accurate but Working Prototype

So, how accurate was it? Short answer: not very. Long answer: hilariously off.

For testing, I used a photo of a luxury chessboard and its pieces. The system spat out a cost of about KES 6,208. The actual build cost? Around KES 47,000.

Yeah… not even close.

But here’s the thing: it worked. It identified 4 out of the 12 materials correctly, pulled costs from the database, and didn’t crash. That’s a win in my book. It’s not accurate yet, but it’s a functioning AI-powered invoicing system built under tight constraints.

Wrapping Up (For Now)

And that’s where I am. I’ve got a working pipeline: image + text -> parsed -> matched to materials -> cost estimate. It’s not yet ready for prime time, but it’s a huge step forward.

In the next part, I’ll dive into improving accuracy, refining material recognition, and (maybe) tackling build-time estimation.

Stay tuned. This journey is just getting started.