Blog

Offline-First Architecture for Mobile Apps: A Practical Primer
Offline-first gets requested frequently and implemented poorly. The usual pattern: an app ships, users in areas with patchy connectivity complain, and someone adds try/catch around API calls with a toast that says “You’re offline.” That’s not offline-first — it’s offline-aware. There’s a big difference.

The core principle

Offline-first means the app reads from and writes to a local store first, then syncs to the server in the background. The user never waits for a network response to see or interact with their data. Connectivity is treated as a nice-to-have optimisation, not a requirement.

Choosing a local store

For React Native, WatermelonDB is our default for relational data. It’s SQLite-backed, fast on large datasets, and designed specifically for offline-first sync patterns. For simpler apps (mostly read, occasional write), MMKV with a manual sync queue works with less setup.

The sync queue pattern
1. Write to local DB, assign a client-generated UUID
2. Enqueue a sync operation
3. When connectivity is available, flush the queue against the API
4. On conflict, apply your resolution strategy (last-write-wins, merge, or user prompt)

The conflict resolution strategy is where most implementations get lazy. Define it explicitly before you write a line of sync code — it’s almost impossible to retrofit.
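
Here is a minimal TypeScript sketch of the four steps. It assumes an in-memory store (a real app would persist records and queue, for example in WatermelonDB or MMKV) and last-write-wins as the chosen strategy. `OfflineStore`, `Transport` and the server's conflict contract are hypothetical names, not a library API:

```typescript
// Hypothetical record shape: `id` is generated on the client, `updatedAt` drives last-write-wins.
interface TodoRecord {
  id: string;
  title: string;
  updatedAt: number; // epoch ms, set on the client at write time
}

type SyncOp = { kind: "upsert"; record: TodoRecord };

// Hypothetical server contract: "conflict" means the server already holds a newer
// version and rejected ours (server-side last-write-wins); it returns that version.
type SendResult = { status: "ok" } | { status: "conflict"; serverRecord: TodoRecord };
type Transport = (op: SyncOp) => Promise<SendResult>;

class OfflineStore {
  private records = new Map<string, TodoRecord>();
  private queue: SyncOp[] = [];
  // In a real app this would be a UUID v4 generator; injected to keep the sketch portable.
  private readonly newId: () => string;

  constructor(newId: () => string) {
    this.newId = newId;
  }

  // Steps 1 and 2: write locally, then enqueue. The caller never waits on the network.
  create(title: string, now: number): TodoRecord {
    const record: TodoRecord = { id: this.newId(), title, updatedAt: now };
    this.records.set(record.id, record);
    this.queue.push({ kind: "upsert", record });
    return record;
  }

  get(id: string): TodoRecord | undefined {
    return this.records.get(id);
  }

  pendingCount(): number {
    return this.queue.length;
  }

  // Step 3: flush in order. A network error stops the flush and leaves the
  // remaining operations queued for the next attempt.
  async flush(send: Transport): Promise<void> {
    while (this.queue.length > 0) {
      const op = this.queue[0];
      const result = await send(op).catch(() => null);
      if (result === null) return; // still offline or server unreachable
      if (result.status === "conflict") {
        // Step 4: last-write-wins. The server's copy is newer, so it replaces ours.
        this.records.set(result.serverRecord.id, result.serverRecord);
      }
      this.queue.shift(); // dequeue only after the server has answered
    }
  }
}
```

Because an operation leaves the queue only after the server answers, a lost connection mid-flush costs at most a resend. The server should therefore treat repeated upserts of the same client-generated id as idempotent.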

What to test

Test the offline paths as first-class scenarios: airplane mode on device, background sync after reconnect, sync failure and retry. Most teams test the happy path in CI and assume the offline paths work. They rarely do until you test them deliberately.
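
The "sync failure and retry" scenario is easiest to cover in CI when the retry policy is a small function with an injectable clock. A sketch, assuming exponential backoff (the post doesn't prescribe a retry policy); `retryWithBackoff` and its parameters are hypothetical:

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries `task` with exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
// `sleep` is injectable so a test can simulate long outages instantly.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  maxAttempts: number,
  baseMs: number,
  sleep: Sleep = realSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // No wait after the final attempt; just give up and rethrow.
      if (attempt < maxAttempts - 1) {
        await sleep(baseMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

On a device you still need the real airplane-mode run, but a fake `sleep` and a task that fails a set number of times let every CI run exercise the retry path.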
June 3, 2026
Why We Migrated a 200k-User SaaS from CRA to Next.js App Router
When Fluxen’s dashboard started shipping features weekly, our Create React App setup became the bottleneck. Cold builds crept past 90 seconds, bundle sizes ballooned past 2 MB, and Lighthouse scores on the marketing pages hovered in the 50s. Moving to Next.js App Router wasn’t a weekend project — it took six weeks and touched every corner of the codebase.

The trigger

The final straw was a client demo where the app took 8 seconds to become interactive on a 4G connection. We’d been papering over performance with skeleton loaders and optimistic UI, but we were losing the race. Next.js offered server components, streaming SSR, and built-in image optimisation — exactly what we needed.

What we migrated first

We started with the public-facing pages (marketing, pricing, docs) because they had the clearest performance story. Moving those to static generation, with generateStaticParams pre-rendering the dynamic routes, cut time-to-first-byte from ~600 ms to under 80 ms. The wins were immediate and visible.

The authenticated dashboard was harder. We had hundreds of components with useEffect-heavy data fetching. We adopted a “server shell, client leaf” pattern: layout and navigation became server components; interactive widgets stayed as client components. This reduced the client bundle by 40%.

What broke

Context providers that lived at the app root had to move into a wrapper component marked 'use client', because React context only works in client components. Several third-party libraries (a charting library and a drag-and-drop package) threw hydration errors until we loaded them through next/dynamic with ssr: false.

Results after 8 weeks in production
- Lighthouse performance score: 54 → 91
- JS bundle (initial): 2.1 MB → 780 KB
- Build time (CI): 94 s → 31 s
- Support tickets about slowness: down 70%

The migration was painful but worth it. If you’re still on CRA, start with your public pages. The wins come fast and build the team’s confidence for the harder dashboard work ahead.
April 26, 2026
Reducing Mobile App Cold Start Time: Lessons from a Fintech Project
Cold start time is the metric most mobile teams ignore until a competitor’s app loads visibly faster in a side-by-side demo. For fintech apps where trust is paramount, a sluggish start signals instability before the user has seen a single number. When our client flagged a 4.2-second cold start on a mid-range Android device, we ran a structured optimisation sprint.

Profiling first

We used Android Studio’s CPU Profiler and the React Native startup logger to isolate where time was being spent. The breakdown was revealing:
- JS bundle parsing: 1.8 s
- Module initialisation (analytics, biometrics SDKs): 1.1 s
- First render: 0.9 s
- Other: 0.4 s

Bundle parsing: the biggest win

The bundle had grown to 4.1 MB uncompressed. We audited imports with react-native-bundle-visualizer and found three things: a full date library (moment.js, 280 KB) used for one format operation, a PDF viewer loaded eagerly but only used on one screen, and all Lottie animations included at startup.

Replacing moment.js with individually imported date-fns functions, lazy-loading the PDF screen, and deferring Lottie imports reduced the bundle to 2.6 MB. Bundle parse time dropped from 1.8 s to 1.0 s.

SDK initialisation

Analytics and crash reporting SDKs were initialising synchronously on the JS thread. Moving them to a deferred init pattern (fire after the first meaningful render) saved 0.7 s with no change in functionality.
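
The post doesn't show its deferred init code, so here is one way to structure it: a small queue that collects non-critical initialisers at startup and runs them once the first screen signals that it has rendered. `DeferredInit`, `register` and `markFirstRender` are hypothetical names, and the SDK calls are stand-ins. In React Native, `InteractionManager.runAfterInteractions` is another common hook point; which mechanism the team used isn't stated.

```typescript
type InitTask = { name: string; run: () => void | Promise<void> };

// Collects non-critical initialisers (analytics, crash reporting, ...) and
// runs them only after the first meaningful render has been signalled.
class DeferredInit {
  private tasks: InitTask[] = [];
  private rendered = false;

  register(task: InitTask): void {
    if (this.rendered) {
      void this.runTask(task); // late registrations run immediately
    } else {
      this.tasks.push(task);
    }
  }

  // Call once, e.g. from the first screen's effect after real content is visible.
  markFirstRender(): Promise<void> {
    if (this.rendered) return Promise.resolve();
    this.rendered = true;
    const pending = this.tasks;
    this.tasks = [];
    return Promise.all(pending.map((t) => this.runTask(t))).then(() => undefined);
  }

  // A failing SDK is logged but must not block the others or crash the app.
  private async runTask(task: InitTask): Promise<void> {
    try {
      await task.run();
    } catch (err) {
      console.warn(`deferred init failed: ${task.name}`, err);
    }
  }
}
```

The trade-off is that events fired before `markFirstRender` are not captured by the deferred SDKs, so any startup metrics you care about need to be buffered or recorded another way.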

Result

Cold start on the same device: 4.2 s → 1.4 s. The fix took four days of engineering time. Profile before you optimise — the bottleneck is almost never where you expect it.
December 22, 2025
React Native vs Flutter in 2025: What We Tell Clients

Every few months a prospective client asks us to weigh in on React Native versus Flutter. Our answer has evolved as both frameworks have matured. In 2025 the technical gap has narrowed considerably — the decision is now more about team fit and ecosystem than raw performance.

Where React Native wins

If your team already writes TypeScript for a web product, the ramp-up on React Native is genuinely fast. Code sharing between web and mobile — especially business logic, API clients, and form validation — is practical and saves real time. The Expo ecosystem has matured to the point where you can ship to both stores without touching native code for the majority of app types.

We reach for React Native when: the client has an existing React web team, the app is content-heavy or form-heavy, and there’s no specific need for complex custom UI with platform-native feel.

Where Flutter wins

Flutter’s widget system gives you pixel-identical UI across iOS, Android, and web from a single codebase. There’s no bridge overhead for animations, and the tooling — particularly hot reload and the DevTools profiler — is excellent.

We reach for Flutter when: the design is custom and highly animated, performance on lower-end Android devices is a hard requirement, or the team doesn’t have existing React experience.

The one thing that still tips the scales

Hiring. In most European markets, React Native developers are significantly easier to find than Flutter/Dart developers. For clients who plan to hire a team post-handoff, that’s often the deciding factor.

Neither choice is wrong. Both can produce excellent apps. The framework that the team shipping it knows best will produce the better product.

November 11, 2025

Fine-Tuning vs RAG: How We Actually Choose

Retrieval-Augmented Generation has become the default recommendation for almost every enterprise LLM project, to the point where fine-tuning is treated as exotic or unnecessary. That’s an overcorrection. Both approaches solve real problems; they solve different ones.

What each approach actually solves

RAG solves the knowledge freshness problem. The model doesn’t need to know facts — it retrieves them at query time from a store you control. It’s the right tool when the information changes frequently, when you need source attribution, or when the knowledge base is too large to fit in a context window.

Fine-tuning solves the behaviour and style problem. You can’t RAG your way to a model that consistently responds in a specific tone, formats outputs a specific way, or handles a domain-specific task type reliably.

The decision matrix

| Need | Approach |
|---|---|
| Access to up-to-date information | RAG |
| Consistent output format/structure | Fine-tuning |
| Domain-specific terminology and tone | Fine-tuning |
| Attribution and source transparency | RAG |
| Reducing hallucination on facts | RAG |
| Few-shot task specialisation | Fine-tuning |

What we tell clients who want to start with fine-tuning

Build the RAG pipeline first. It’s faster, cheaper, and easier to iterate. Fine-tune only after you’ve identified a specific, persistent failure mode that retrieval can’t fix. Fine-tuning on top of a good RAG baseline almost always outperforms fine-tuning alone.

September 17, 2025

LLM Evaluation in Production: Beyond Vibes

Shipping an LLM-powered feature without an evaluation framework is the ML equivalent of deploying without tests. A prompt change that “seems fine” in dev can introduce a regression in a tone edge case you didn’t test. A model upgrade can improve average quality while degrading a specific task segment. Without evals, you find out from users.

The three-layer eval stack

1. Unit evals

Deterministic assertions on known inputs. If the output for a specific input should always contain a citation, assert it. If it should never start with “I”, assert that. These run in CI on every prompt change and take seconds.
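
A sketch of those two checks as plain functions, assuming citations are rendered as bracketed numbers such as `[1]` (an assumption; match whatever marker your prompt asks for). `UnitEval` and `runUnitEvals` are hypothetical names; in CI you would typically call them from your existing test runner.

```typescript
// A deterministic check on one model output: null on pass, a reason on failure.
type UnitEval = { name: string; check: (output: string) => string | null };

const containsCitation: UnitEval = {
  name: "contains-citation",
  // Assumes citations look like "[1]", "[12]", ...
  check: (out) => (/\[\d+\]/.test(out) ? null : "no citation marker found"),
};

const neverStartsWithI: UnitEval = {
  name: "never-starts-with-I",
  // Flags "I" as a standalone first word (including "I'm"), but not "It" or "In".
  check: (out) => (/^\s*I\b/.test(out) ? 'output starts with "I"' : null),
};

function runUnitEvals(output: string, evals: UnitEval[]): string[] {
  const failures: string[] = [];
  for (const e of evals) {
    const reason = e.check(output);
    if (reason !== null) failures.push(`${e.name}: ${reason}`);
  }
  return failures;
}
```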

2. Model-graded evals

For qualities that can’t be asserted deterministically (helpfulness, tone, factual grounding), we use a judge model with a rubric. The judge prompt is versioned alongside the application prompt. These are slower and noisier, but catch regressions unit evals miss.

3. Human evals

A sample of real production outputs, rated by a small panel on a defined rubric. We run these before any major prompt change or model upgrade. They’re expensive but they’re ground truth.

The golden dataset

The foundation of all three layers is a golden dataset of 200–500 examples: real user inputs, expected output characteristics, and known failure cases. Building this dataset is the hardest and most important work in LLM evaluation. It compounds over time — every production failure becomes a new golden example.

Making deployment decisions

We set a threshold: a change must not regress more than 2% on the golden dataset and must not introduce any new failures on a list of “never fail” inputs. If it clears both, it ships.
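
The rule translates into a small gate function. This sketch assumes each golden example yields a pass/fail result, reads “regress more than 2%” as a drop of more than two percentage points in pass rate (the post doesn't say whether the 2% is absolute or relative), and counts a “new failure” as a never-fail input that passed in the baseline but fails in the candidate. All names are hypothetical.

```typescript
// Pass/fail per golden example id for one prompt/model configuration.
type EvalRun = Map<string, boolean>;

function passRate(run: EvalRun): number {
  let passed = 0;
  run.forEach((ok) => {
    if (ok) passed++;
  });
  return run.size === 0 ? 0 : passed / run.size;
}

interface GateResult {
  ship: boolean;
  reasons: string[];
}

function deploymentGate(
  baseline: EvalRun,
  candidate: EvalRun,
  neverFail: string[],
  maxRegression = 0.02, // two percentage points of pass rate (assumed reading)
): GateResult {
  const reasons: string[] = [];
  const drop = passRate(baseline) - passRate(candidate);
  // Small epsilon so a drop of exactly 2 points isn't rejected by float rounding.
  if (drop > maxRegression + 1e-9) {
    reasons.push(`pass rate dropped by ${(drop * 100).toFixed(1)} points`);
  }
  for (const id of neverFail) {
    // A "new" failure: fails under the candidate but did not fail in the baseline.
    if (candidate.get(id) === false && baseline.get(id) !== false) {
      reasons.push(`never-fail input regressed: ${id}`);
    }
  }
  return { ship: reasons.length === 0, reasons };
}
```

Returning the reasons, not just a boolean, makes the CI log say why a change was blocked.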

August 17, 2025
Building Accessible Navigation: What WCAG 2.2 Actually Requires
Navigation is the most-audited part of any website, and the most commonly failed. After running accessibility audits on a dozen client projects in the past year, the same three issues appear almost every time: missing skip links, keyboard traps in mobile menus, and focus indicators that exist only in the browser’s default stylesheet.

Skip links

A “Skip to main content” link should be the first focusable element on the page. It can be visually hidden until focused, but it must stay reachable by keyboard and functional. The most common mistake is display: none, which removes it from the focus order entirely. Instead, move it off-screen and bring it back on focus:
```
.skip-link {
  position: absolute;
  top: 0;
  left: 0;
  z-index: 1000;
  /* Shifted up by its own height, so it sits just above the viewport */
  transform: translateY(-100%);
}
.skip-link:focus {
  transform: translateY(0);
}
```
Mobile menu keyboard traps

When a mobile menu opens, focus should move inside it. When it closes, focus must return to the trigger button. Failing to return focus leaves keyboard users stranded at the top of the page. We use a small useFocusTrap hook that captures Tab and Shift+Tab while the menu is open.
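
The `useFocusTrap` hook itself isn't shown, so this is a framework-agnostic sketch of the logic such a hook needs. `FocusTarget` and `KeyEventLike` are hypothetical stand-ins (an `HTMLElement` and a `KeyboardEvent` satisfy them) so the wrap-around can be tested outside a browser, and `createFocusTrap` is a hypothetical name. It assumes focus only moves via Tab; a DOM version would read `document.activeElement` instead of tracking an index.

```typescript
// Minimal stand-ins; HTMLElement and KeyboardEvent both satisfy these shapes.
interface FocusTarget {
  focus(): void;
}
interface KeyEventLike {
  key: string;
  shiftKey: boolean;
  preventDefault(): void;
}

// Tab from the last item wraps to the first; Shift+Tab from the first wraps to the last.
function nextFocusIndex(current: number, count: number, backwards: boolean): number {
  return backwards ? (current - 1 + count) % count : (current + 1) % count;
}

function createFocusTrap(items: FocusTarget[], trigger: FocusTarget) {
  let index = 0;
  if (items.length > 0) items[0].focus(); // on open, move focus inside the menu

  return {
    handleKeyDown(event: KeyEventLike): void {
      if (event.key !== "Tab" || items.length === 0) return;
      event.preventDefault(); // stop the browser tabbing out of the open menu
      index = nextFocusIndex(index, items.length, event.shiftKey);
      items[index].focus();
    },
    // On close, hand focus back to the button that opened the menu.
    release(): void {
      trigger.focus();
    },
  };
}
```

In a React hook, `handleKeyDown` would be attached as a `keydown` listener while the menu is open, and `release` called from the close handler.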

Focus indicators

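WCAG 2.2 introduced Success Criterion 2.4.13 (Focus Appearance, Level AAA), which asks for a focus indicator at least as large as a 2 CSS pixel thick perimeter of the element, with at least 3:1 contrast between its focused and unfocused states. (In the final recommendation, 2.4.11 is Focus Not Obscured (Minimum).) At Level AA, 2.4.7 Focus Visible still requires a visible indicator. A safe cross-browser pattern: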
```
:focus-visible {
  outline: 2px solid #FFB020;
  outline-offset: 3px;
}
```
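
Whether an outline colour clears 3:1 depends on the colours it sits against, so it's worth computing rather than eyeballing. A small sketch using the WCAG 2.x relative-luminance and contrast-ratio definitions; the helper names are ours:

```typescript
// Relative luminance of an sRGB hex colour such as "#FFB020" (WCAG 2.x definition).
function relativeLuminance(hex: string): number {
  const m = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`expected #RRGGBB, got ${hex}`);
  const [r, g, b] = [m[1], m[2], m[3]].map((h) => {
    const c = parseInt(h, 16) / 255;
    // Linearise the gamma-encoded channel value.
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio from 1 (identical colours) to 21 (black on white).
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}
```

By this formula, #FFB020 reaches roughly 11.5:1 against black but only about 1.8:1 against white, so the outline above suits dark backgrounds; on light ones you would want a darker colour.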
Accessibility isn’t a project-end checklist item — it’s cheapest to build in from the first component. These three fixes alone will clear the majority of navigation findings in most audits.
August 4, 2025
The Architecture Review We Wish We’d Demanded Before Starting

There’s a version of every troubled project where the problems were visible from the start, to anyone who looked carefully. The database schema that can’t represent the core business entity without a join across five tables. The event model that can’t support the audit log that compliance will definitely require. The authentication approach that can’t extend to the multi-tenant model on the roadmap.

What we review and why

A pre-engagement architecture review covers six areas: data model, API surface, authentication and authorisation model, infrastructure and deployment topology, observability, and test coverage. We’re not auditing for code quality — we’re looking for structural constraints that will become expensive problems later.

The most valuable output isn’t a list of issues — it’s a risk-ranked prioritisation. Not every architectural problem needs fixing before work starts. The review gives the team a shared, honest picture to make that call.

The conversation with clients

Some clients push back on a review. They want to ship, not audit. Our framing: the review takes a week and might save three months of rework. We’ve had that trade pay off on more projects than we can count.

We share the findings in a written report with a risk rating (high/medium/low) and a recommended action for each finding. High-risk items go into the project plan explicitly.

What we consistently find

The most common high-risk finding is a schema that can’t represent the business entity accurately. The second most common is an auth model that assumes a single-tenant deployment when the product roadmap clearly leads to multi-tenant. Both are cheap to fix at the start and expensive to fix at month six.

July 3, 2025
How We Built a Support Chatbot That Actually Deflects Tickets

The graveyard of failed support chatbots is full of bots that were really just keyword-triggered FAQ lookups with a chat interface. Users ask in natural language, bots match on keywords, the answer misses the question, the user escalates. Everyone loses.

Retrieval-Augmented Generation, not a fine-tuned model

We didn’t fine-tune a model on historical tickets. Fine-tuning is expensive to maintain — every time the product changes, the training set goes stale. Instead we used RAG: a vector store of the current documentation, FAQs, and resolved tickets, with a retrieval step before every generation call.

The retrieval step is where most RAG implementations underperform. We use a hybrid search — dense vector similarity plus BM25 keyword matching, reranked by a cross-encoder. The extra latency (about 200 ms) is worth it for the precision improvement.
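
A compact sketch of the hybrid step, under assumptions the post doesn't spell out: BM25 with the common defaults `k1 = 1.2` and `b = 0.75`, min-max normalisation of both score lists, and a weighted sum with a hypothetical `alpha = 0.5`. Embeddings are passed in precomputed, and the cross-encoder rerank is omitted because it needs a model; you would apply it to the top few fused results.

```typescript
interface Doc {
  id: string;
  text: string;
  embedding: number[];
}

const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// BM25 keyword score of every document for the query (Lucene-style IDF).
function bm25Scores(query: string, docs: Doc[], k1 = 1.2, b = 0.75): number[] {
  const docTokens = docs.map((d) => tokenize(d.text));
  const avgLen = docTokens.reduce((sum, t) => sum + t.length, 0) / docs.length;
  const terms = tokenize(query);
  return docTokens.map((tokens) =>
    terms.reduce((score, term) => {
      const tf = tokens.filter((t) => t === term).length;
      if (tf === 0) return score;
      const n = docTokens.filter((t) => t.indexOf(term) !== -1).length; // docs containing term
      const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
      const norm = tf + k1 * (1 - b + (b * tokens.length) / avgLen);
      return score + (idf * tf * (k1 + 1)) / norm;
    }, 0),
  );
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Rescale to [0, 1] so keyword and vector scores are comparable before mixing.
function minMax(xs: number[]): number[] {
  const lo = Math.min(...xs);
  const hi = Math.max(...xs);
  return xs.map((x) => (hi === lo ? 0 : (x - lo) / (hi - lo)));
}

function hybridSearch(query: string, queryEmbedding: number[], docs: Doc[], topK = 3, alpha = 0.5): Doc[] {
  if (docs.length === 0) return [];
  const keyword = minMax(bm25Scores(query, docs));
  const dense = minMax(docs.map((d) => cosine(queryEmbedding, d.embedding)));
  return docs
    .map((doc, i) => ({ doc, score: alpha * dense[i] + (1 - alpha) * keyword[i] }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((r) => r.doc);
}
```

The keyword half catches exact product terms and error codes that embeddings tend to blur; the dense half catches paraphrases that share no words with the docs.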

Confidence thresholds and graceful escalation

When the model isn’t confident — low similarity scores across the retrieved chunks, or a query that doesn’t match the domain — it escalates explicitly: “I’m not confident I can answer this accurately. Let me connect you with the support team.” Users prefer honest escalation to a confidently wrong answer.
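
The escalation rule might look like this. It's a sketch: the `0.75` threshold, the `Retrieved` shape, the `inDomain` flag (standing in for whatever domain check you use) and `answerOrEscalate` are hypothetical, and the threshold needs tuning against your own similarity distribution.

```typescript
interface Retrieved {
  chunkId: string;
  similarity: number; // e.g. cosine similarity between query and chunk
}

type BotReply =
  | { kind: "answer"; context: Retrieved[] }
  | { kind: "escalate"; message: string };

const ESCALATION_MESSAGE =
  "I'm not confident I can answer this accurately. Let me connect you with the support team.";

// Escalate when no retrieved chunk is similar enough, or when an upstream
// (hypothetical) domain check says the query is off-topic.
function answerOrEscalate(chunks: Retrieved[], inDomain: boolean, minSimilarity = 0.75): BotReply {
  const best = chunks.reduce((max, c) => Math.max(max, c.similarity), -Infinity);
  if (!inDomain || best < minSimilarity) {
    return { kind: "escalate", message: ESCALATION_MESSAGE };
  }
  // Only sufficiently similar chunks are passed on to the generation call.
  return { kind: "answer", context: chunks.filter((c) => c.similarity >= minSimilarity) };
}
```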

What actually drives the 40% deflection

The biggest factor wasn’t the model quality — it was documentation quality. The bot can only be as good as the content it retrieves. We spent two weeks rewriting the top 30 FAQ entries to be more specific and answer-first. That single change improved deflection rate by 12 percentage points.

If you’re building a support bot, audit your documentation before you build anything else.

May 22, 2025
Modernising a Legacy Codebase Without Stopping Feature Development

Every legacy modernisation project starts with someone proposing a full rewrite. It’s the instinct of engineers who’ve spent months fighting an old codebase: burn it down, start fresh. We’ve seen this end badly enough times that we now treat the full-rewrite proposal as a red flag rather than a solution.

Why rewrites fail

The new system has to reach feature parity with the old one before it can ship. The old system keeps being used (and sometimes extended) while the new one is being built. By the time the new system is ready, the old one has moved on, and the team has been running in parallel for months or years.

The strangler fig approach

We use the strangler fig pattern: route traffic through a thin proxy layer, and migrate functionality piece by piece from the old system to the new one. Each migrated module is deployed and tested independently. The old system shrinks over time; the new one grows. There is no “big bang” cutover.
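
At its simplest, the proxy's routing decision is a lookup from path prefix to backend that grows as modules move over. A sketch, assuming path-prefix routing and hypothetical upstream URLs; in practice this table often lives in the proxy's own configuration (nginx, an API gateway) rather than in application code.

```typescript
type Backend = "legacy" | "modern";

// Hypothetical upstreams for illustration.
const UPSTREAMS: Record<Backend, string> = {
  legacy: "http://legacy.internal",
  modern: "http://modern.internal",
};

// Segment-aware match: "/billing" covers "/billing" and "/billing/x", not "/billing-old".
function isUnder(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(prefix + "/");
}

// `path` is the request path without its query string. Anything not yet
// migrated falls through to the legacy system.
function upstreamFor(path: string, migratedPrefixes: string[]): string {
  const migrated = migratedPrefixes.some((p) => isUnder(path, p));
  return migrated ? UPSTREAMS.modern : UPSTREAMS.legacy;
}
```

Migrating a module then means adding its prefix to `migratedPrefixes`, and rolling back means removing it, which is what lets each piece ship independently.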

The key to making this work is identifying the seams. Not every legacy system has clean module boundaries — some are a single large process where everything touches everything else. The first phase of a strangler fig engagement is often introducing those seams.

What to migrate first

Start with the highest-value, lowest-risk modules: things that are well-understood, frequently changed, and have clear inputs and outputs. Not the authentication module (high risk), not the reporting engine (complex) — something in the middle that gives the team a working example of the new architecture in production.

The organisational piece

Modernisation only works if feature development continues on the old system during the migration. This requires explicit rules: new features go on the new system if the relevant module has been migrated; otherwise they go on the old system with a migration ticket created. Without this rule, the migration stalls.

April 26, 2025