UC Berkeley MIDS Capstone · Spring 2026

A Proactive AI Companion
for Sustaining Long-Term Goals

Our Mission

Reinforcement learning for human improvement

We believe that the real promise of AI isn't smarter machines. It's better humans. By using AI to help people align with their long term goals, we can enable them to live better and more fulfilling lives.

The Problem

The Intention–Action Gap

92%

Failure Rate

According to a 2024 Forbes Health survey, 92% of people who set long-term goals never achieve them — that's 9 out of 10 people. The problem isn't willpower or laziness. It's the enormous distance between wanting something and doing something about it, day after day, month after month.

Forbes Health survey: How long do resolutions last? — Forbes Health / OnePoll survey, 2024 — How Long Do Resolutions Last?

Three forces work against you

Present Bias

Our brains are wired to trade tomorrow's reward for today's comfort. Long-term commitments are sacrificed for short-term comfort.

Cognitive Overload

Goals quietly disappear as they fall out of working memory due to constant noise and competing priorities of daily life.

Static Tools

Today's productivity apps are transactional — they record what you did but never understand what you're going through or what works for you.

Four Pillars of Intentive

Semantic Memory

Your goals aren't stored as checkboxes — they're stored as meaning. Intentive learns your preferences and personality over time.

Context-Aware Check-ins

A single natural-language message each day is all it takes. Intentive maps your words to every relevant goal automatically.

Intelligent Reflections

The system connects your daily actions to your long-term arc, creating a continuous thread between who you are and who you want to become.

Proactive Nudges

When life pulls you off course — and it will — Intentive reaches back, gently and without guilt, to re-anchor you to what you said mattered most.

See It In Action

Product Demo

Data Science Techniques

Our Data Science Approach

Multi-Goal Intent Mapping

One of Intentive's core innovations is reducing the cognitive friction of daily tracking. When you write a single check-in — something as casual as “I ran 3 miles and listened to a Python podcast on the way” — our LLM-based classification layer automatically identifies every goal that check-in is relevant to. The user simply writes a short update and the system takes care of the rest.

Benefit 1

Maximizes signal extraction per interaction from minimal user input

Benefit 2

Reduces cognitive load by eliminating redundant goal-by-goal tracking

Multi-Goal Intent Mapping: single check-in → LLM classification → multiple goals updated — A single check-in is mapped to multiple goals via LLM-based classification

Dynamic Goal State Detection & Re-engagement

To determine when intervention is needed, Intentive tracks each goal using a dynamic goal state model with adaptive stall detection. If a user does not engage within the inactivity window, the goal automatically transitions to a stalled state, triggering re-engagement mechanisms such as nudges and context-aware suggestions.

State	Description
Active	Default state when goal is created. Intentive will try to return user to this state.
Paused	User has chosen to pause. Intentive recognises this and will not nudge.
Stalled	Inactivity detected. Triggers nudges, context suggestions, and local event recommendations.
Completed	User marks complete. Goal achievement plan is finished.
Archived	User archives from any state. Goal achievement plan is finished.

Dynamic Goal State Detection and Re-engagement diagram — Adaptive stall detection triggers re-engagement mechanisms automatically

Research-Driven Interventions

Intentive uses a RAG pipeline built on a behavioral science research corpus (PubMed, ArXiv — 5 papers, 278 chunks) to surface scientifically proven goal-achievement tips. This separates Intentive from apps that only provide generic encouragement or re-hash popular beliefs.

Other apps say…

“Believe in yourself! Keep trying and have confidence you can do it! Get sleep and eat vegetables. Eye of the Tiger!”

Intentive says…

“Create Specific Implementation Plans. Develop clear, detailed strategies for overcoming potential challenges in training. Specify exact times, locations, and actions.”

Research-Driven Interventions: PubMed/ArXiv corpus powering goal tips — RAG-powered tips grounded in peer-reviewed behavioral science research

Architecture

System Architecture & Privacy

Intentive is built on a 5-layer stack with clear separation of concerns. The architecture is designed around one core idea: you stay in control — always.

Intentive system architecture diagram — 5-layer stack with Memzero as the dedicated memory API boundary

Privacy, Security & Guardrails

Privacy wasn't an afterthought — it shaped the architecture. The separation between Memzero and the main backend is a privacy decision: memory lives in its own isolated service with its own auth, so access is explicit and auditable. The right to be forgotten maps to a real delete endpoint.

Data Minimisation

Only what the service requires. No ad profiles, no cross-user benchmarking, no training on your content.

Contextual Integrity

Your data flows only within the context you shared it — personal reflection. Never sold or rented.

Granular User Control

Inspect any memory the AI has derived about you. Delete individually or wipe your account with immediate cascade.

Content Guardrails

Five harm categories are screened at ~0ms latency via regex pattern matching — no LLM call required. Every pattern and action is documented in GUARDRAILS.md.

Category	Action	Resource
Crisis / Suicide	BLOCK goals & check-ins	988 Lifeline
Self-Harm	BLOCK goals / WARN check-ins	988 + distress validation
Eating Disorders	BLOCK goals / WARN check-ins	NEDA helpline
Violence	BLOCK (direct) / WARN (indirect)	988 Lifeline
Illegal Activity	BLOCK (serious) / WARN (minor)	—

Agentic Layer

Agents

Intentive has 15 agents across 4 categories, 12 of which are LLM-powered. The key design decision is the two-tier approach — LLM-powered agents handle work that requires reasoning, while rule-based agents handle deterministic tasks faster and cheaper. All agents communicate results via SSE for real-time UI updates.

Cognitive	Goal Intelligence	Contextual	Infrastructure
Daily North Star	Goal Strategy	Context Suggestions	Search Agent
Monthly Momentum	Goal Health Score	Find Goals (Check-in)	Store Memory
Goal Reflection	Completion Assessment	Research Tips
Check-in Patterns	Batch Health Assessment	Stalled Goal Nudges
	Evaluate Goal States

LLM-powered agent Rule-based / no LLM

Memory Layer

Long-Term Memory

Most apps store your data in rows and columns — and we do that too with PostgreSQL. But the interesting layer is Qdrant, our vector store. When you write a check-in, we convert it into a 768-dimensional vector embedding and store it alongside your goals and past memories. Later, when you write another check-in, we retrieve the semantically closest past context and give it to the AI as background. This is how the assistant gets smarter over time — it's not just reading your latest message, it's reading it in the context of everything relevant you've ever written.

PostgreSQL

Structured Facts

Goals
Check-ins
State history

Qdrant

Semantic Memory

768-dim embeddings
Cosine similarity
Context retrieval

Redis

Short-Term Cache

Goal lists
User profiles
Query results

Evaluation

Evaluation Strategies

The core question: would a real user stay on track with their goals because of this system?

We built a two-layer evaluation framework to measure whether outputs are useful, personalized, and well-timed. All results are tracked in Braintrust for comparison across prompt iterations.

Layer 1 — Agent Prompt Evaluation

12 eval modules test each agent prompt in isolation. Heuristic scorers check format and calibration; LLM-as-judge scorers measure empathy, actionability, and specificity. The Prompt Engineering Lab (built in Streamlit) enables rapid iteration: select a prompt, run it against Bedrock or Ollama, and compare outputs side-by-side.

Heuristic scorers

Format compliance, calibration, length

LLM-as-judge scorers

Empathy, actionability, specificity, persona-fit

Prompt Engineering Lab — Streamlit-based interactive prompt testing environment — Streamlit-based Prompt Engineering Lab with one-click Braintrust integration

Layer 2 — Simulated 30-Day User Journeys

We defined 7 synthetic personas, each with distinct behavior patterns. A LangGraph agent runs each journey day by day — an LLM role-plays as the persona and decides whether to check in based on what the system sent them. Nine real agent prompts fire in sequence.

Fading Felix

Loses motivation over time

Busy Blake

Always exhausted, inconsistent check-ins

Comeback Casey

Two-week break, then returns strong

+ 4 others

Various engagement and dropout patterns

7 scoring dimensions

CoherencePersona-fitStrategy calibrationGoal progressFailure modes (6 checks)RetentionNudge reactions

Combined threshold: 0.7+ = system is working for this persona.

Simulated 30-day user journeys — calendar view with check-ins, nudges, and reflections — 30-day simulated journey for Fading Felix — day-by-day check-ins, nudges, and reflections

Key Learnings & Impact

Nudges need refinement

Nudges scored lowest across all persona evaluations — consistently below threshold. Tone, timing, and content are the highest-priority refinement for the next iteration.

Two-tier design is a cost win

Replacing LLM calls with deterministic logic for state management and memory storage reduced per-check-in cost significantly without quality loss.

Simulated evals as leading indicators

The 30-day journey simulations exposed failure modes — like personas disengaging after generic nudges — before any real user testing, accelerating iteration cycles.

Acknowledgements

Prof Korin Reid & Prof Todd Holloway

UC Berkeley MIDS Capstone Instructors

For their guidance, mentorship, and support throughout the Capstone journey.

Prof Vinicio De Sola

Guest Advisor

For guidance on using user personas and peer-reviewed research paper-based RAG systems for grounding goal-specific tips in behavioral science.

Future Work

From Prototype to Product

👥

Real User Pilot

10–20 users over 2 weeks to validate whether simulated eval scores predict real engagement.

💬

Nudge Calibration

Improve nudge tone, timing, and content — the highest-priority refinement identified by evaluation.

🛡️

Fine-Tuned Guardrails

Explore small fine-tuned models for content guardrails to reduce cost while maintaining accuracy.

🎙️

Voice Check-ins

Make daily logging more natural, especially on mobile, through voice input.

📅

Calendar Integration

Surface goal-relevant events and deadlines automatically from your calendar.

📱

Mobile-Native App

Reduce friction for daily engagement with a native iOS/Android experience.

The Team

Meet the Team

Siddharth Manu

Stephanie Andrews

Aaron Lin

Justin Sterling

UC Berkeley MIDS Capstone · Spring 2026 · Siddharth Manu, Stephanie Andrews, Aaron Lin, Justin Sterling

A Proactive AI Companion for Sustaining Long-Term Goals