Part 2: The Pivot Nobody Saw Coming (Including Me)
I promised a follow-up post. Several months late, but here it is.
And before you judge me, let me explain.
When I last left you, I had limited success with my AI-powered invoice system that could output a number. That was the success part. The limited part? It was wildly inaccurate.
The obvious next step was to train the model on real historical build data. There was just one small problem: I didn’t have enough of it. I had access to only about 400 records, and I was going to need more. A lot more!
I spent a few weeks scratching my head, looking for clever workarounds. Every solution I came up with leaned heavily into traditional engineering tricks, deterministic logic, rules and constraints. And while those would probably have improved performance, they would have robbed me of the opportunity to truly learn what I set out to learn: building and training models from the ground up.
So, naturally, I did what I do best.
I pivoted.
The Business Plot Twist
Here’s where things get interesting.
This project made me painfully aware of something at SF Duka: we were spending a massive amount of time preparing custom quotes with very little return. Twenty quotes. Maybe three converts.
That’s not a sales pipeline. That’s self-inflicted stress and bad business given how small our team is.
So I made a drastic proposal to my business partner: Let’s get rid of custom quoting entirely.
To my surprise, she agreed immediately.
We shifted focus to workshops, team-building experiences, and DIY kits. Instead of building one-off items and quoting endlessly, we now design prototypes, cost them properly once, and sell them repeatedly. Higher leverage. Better margins. Better use of our time, and frankly, a lot more fun!
Ironically, an AI side project ended up reshaping the business itself.
Now, back to the AI journey.
The Next Challenge: Predicting Tweets
Once I let go of the invoice system, I decided to try something ambitious.
Could I train a model to predict what someone will tweet?
My thinking was simple: Surely there’s a dataset of millions of tweets somewhere.
Half right.
Large tweet repositories exist, but many are distributed as tweet IDs only, meaning you need to “hydrate” them by calling the Twitter (X) API to retrieve the full text.
The engineers reading this already see the problem:
Millions of API calls + aggressive throttling = you’re getting locked out before reaching even a fraction of your dataset.
And since America’s favorite billionaire space boy acquired X, API access has become significantly restricted. Rate limits and access tiers make large-scale hydration practically infeasible unless you’re willing to pay serious money.
And beyond the technical limitations, something else started bothering me.
I’m a strong believer in ethical AI and data privacy. Even if this was “just for learning,” the idea of training on millions of people’s tweets without their awareness or consent didn’t sit well with me.
So I dropped it.
New Criteria, New Direction
At this point, I changed the rules.
The project no longer needed to serve me commercially. Instead, it had to meet two criteria:
- I must have easy and legal access to the data.
- It must be and feel ethically clean. No gray areas.
That’s how I landed on something unexpected:
Formula 1.

Yes, I am now attempting to predict F1 race outcomes for the current season.
More specifically, my goals are:
- Predict race winners with >50% accuracy
- Predict top 3 finishers with >70% accuracy
- Predict driver and constructor championship standings with >80% accuracy at mid-season
Reasonable? Probably not.
Fun? Abso-f***king-lutely.
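One way I plan to keep myself honest about those accuracy targets is a simple scoring function over the season. Here's a minimal sketch, with toy data and illustrative driver codes, not my actual pipeline:

```python
# Hypothetical scoring sketch: compare predicted finishing order against
# actual results, race by race. All names and data here are illustrative.

def top_k_accuracy(predictions, results, k=1):
    """Fraction of races where the predicted top k matches the actual top k
    (as a set, so podium order doesn't matter for k=3)."""
    hits = 0
    for race, predicted_order in predictions.items():
        actual_order = results[race]
        if set(predicted_order[:k]) == set(actual_order[:k]):
            hits += 1
    return hits / len(predictions)

# Toy season: two races, predicted vs actual finishing order (driver codes).
predictions = {
    "bahrain": ["VER", "LEC", "HAM"],
    "jeddah":  ["VER", "PER", "ALO"],
}
results = {
    "bahrain": ["VER", "PER", "ALO"],
    "jeddah":  ["VER", "PER", "LEC"],
}

print(top_k_accuracy(predictions, results, k=1))  # winner accuracy: 1.0
print(top_k_accuracy(predictions, results, k=3))  # full-podium accuracy: 0.0
```

Whether "top 3 accuracy" means the exact podium set or partial overlap is a design choice I still need to pin down; the function above uses the stricter set-match definition.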
Reality Check: Building a Model Is a Different Beast
Let me be clear.
I’ve built complex AI-driven systems before, especially at Kuze. But I’ve always used existing models. Building one from scratch? That’s a different monster entirely.
Progress has been… slow. Slower than I expected.
Also, some of you might be wondering: why this approach? Isn’t it better to just read up first? Ordinarily, you’d be right. But I’m a “learn by doing” kind of guy. Picking up a book is the fastest way to make me lose interest. (PS: I’m dyslexic, so my relationship with reading has always been… weird.)
And before you say it: I’m writing these articles because I want to share this journey, and the only thing worse than reading is the idea of me making a video and talking about all this. But maybe one day!
My Setup
The stack:
- FastAPI (Python)
- VS Code
- Cline connected to Claude 3.5 Sonnet
- PostgreSQL for the database. There are arguably better options, but let’s remember what the main goal here is.
- Kaggle dataset (claimed 1950 to present F1 data… though it stops at 2022)
And before anyone accuses me of “vibe coding”, I’m using AI to automate boilerplate: DB connections, schema setup, validation scripts, tests and creating models based on the CSVs I have. Not to magically write the model for me.

AI paired with good engineering can genuinely increase productivity by huge margins. Ask Microsoft: they went from ‘security concerns’ (or whatever the reason was) to ‘using AI is no longer optional for engineers’ in record time.
Data Ingestion
The Kaggle dataset is massive and messy: multiple CSVs, relational dependencies, historical quirks. On top of that, I’m integrating a few free F1 APIs for data like weather.
Cline helped generate ingestion pipelines quickly, but I still spent hours debugging. AI is helpful, not magical. That said, what would have taken me weeks a few years ago took days.
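To give a flavor of the kind of normalization those pipelines handle, here's a hedged, self-contained sketch. SQLite stands in for PostgreSQL so it runs anywhere, and the column names are illustrative rather than the exact Kaggle schema:

```python
# Minimal ingestion sketch. The real pipeline targets PostgreSQL; SQLite is
# swapped in here so the example is self-contained and runnable anywhere.
# File layout and columns are illustrative, not the actual Kaggle schema.
import csv
import io
import sqlite3

RESULTS_CSV = """raceId,driverId,position,statusId
1,44,1,1
1,33,2,1
1,16,\\N,3
"""  # "\N" is a common way CSV dumps encode NULL (e.g. an unclassified DNF)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (race_id INTEGER, driver_id INTEGER,"
    " position INTEGER, status_id INTEGER)"
)

reader = csv.DictReader(io.StringIO(RESULTS_CSV))
rows = [
    (
        int(r["raceId"]),
        int(r["driverId"]),
        None if r["position"] == "\\N" else int(r["position"]),  # normalize NULLs
        int(r["statusId"]),
    )
    for r in reader
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM results").fetchone()[0])  # 3 rows loaded
```

The unglamorous part, and where most of my debugging hours went, is exactly that NULL-normalization line: every CSV in the set has its own quirks.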
Small wins.
Feature Engineering (Where Things Got Real)
With data in Postgres, I thought: “Cool, now we train the model.”
Not quite.
You can’t just throw raw race results at a model and hope it figures out race dynamics. The raw data includes things like:
- Lap times
- Pit stop durations
- Final positions
- Status (finished, DNF, etc.)
But I need derived features such as:
- Percentage of races finished in Top 3 (per driver)
- Average qualifying delta vs teammate
- Did Not Finish (DNF) rate in wet conditions
So yes, I am computing engineered features before training.
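As a concrete example, here's a minimal sketch of one such feature, the per-driver Top 3 finish rate. The toy rows are only shaped like the real results table; in the actual project this comes out of Postgres:

```python
# Sketch of one engineered feature: per-driver Top 3 finish rate.
# Toy data shaped roughly like the results table; illustrative only.
from collections import defaultdict

results = [
    # (driver_id, final_position, finished)
    (44, 1, True),
    (44, 4, True),
    (44, None, False),  # DNF: no classified position
    (16, 2, True),
    (16, 3, True),
]

def top3_rate(rows):
    """Share of a driver's race entries that ended on the podium (Top 3)."""
    starts = defaultdict(int)
    podiums = defaultdict(int)
    for driver, position, finished in rows:
        starts[driver] += 1
        if finished and position is not None and position <= 3:
            podiums[driver] += 1
    return {driver: podiums[driver] / starts[driver] for driver in starts}

print(top3_rate(results))  # driver 44 -> 1/3, driver 16 -> 1.0
```

The other features follow the same pattern: group raw rows by driver (or constructor), filter by a condition like wet weather or DNF status, and emit a rate or average the model can actually use.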
And yes, this step is necessary. For tabular prediction tasks like this, feature engineering is apparently often more important than model choice.
Also, the irony is not lost on me: I’m using AI tools to learn how to build AI systems. So technically… It’s training me so I can train others like it. Isn’t that beautiful?!? 😂😂😂
The F1 Knowledge Gap
There was just one small issue.
To engineer meaningful features, I needed a deep understanding of what contributes to winning a race.
And despite being a car guy, I did not follow F1.
And to those of you saying, “Aren’t you a car guy? For shame!”: not all car guys are the same, and not every car guy is into F1. Though I guess I’m joining the herd now that I’ve started my research.
Unpopular opinion: F1 is not the pinnacle of racing. Rally is! F1 is the pinnacle of engineering, but if you want real racing, you go to the dirt. And I will absolutely die on this hill.

Where I Am Now
I’m nearly done implementing feature engineering for my MVP F1 prediction service.
Next step:
Develop a baseline race outcome prediction model. The step I thought I’d reach in week two.
Several months later… here we are.
Between this, life, and other projects, progress hasn’t been lightning fast. But I’m committed to finishing this properly. Stay tuned.