USABILITY TESTING // UX RESEARCH // IxD

Philz Coffee Usability Study & Redesign

A moderated usability study of the Philz Coffee mobile app with 12 first-time users, uncovering where Philz's signature conversational interface delights and where it trips people up.

Moderated Testing // 12 Participants // iOS & Android // UX Eval Fundamentals // IxD Redesign

STUDY AT A GLANCE ☕

12
Participants Observed

48
Scored Task Attempts

50%
Affordance Friction Rate

100%
Location Task Success

01 // OVERVIEW

The Study

What We Were Testing

The Philz Coffee app uses an unconventional, conversational interface. Ordering feels more like speaking to a barista than tapping a menu. That's a deliberate brand choice. But does it hold up when a first-time user sits down with it alone? This study evaluated learnability, gesture comprehension, and task completion for new users across four core ordering tasks.

Why It Matters

Philz built something distinctive: an ordering experience that mirrors their in-café warmth. The risk is that when the design model doesn't match users' mobile mental models, charm becomes friction. The goal here was to measure that gap precisely, so recommendations could be surgical rather than generic "make it more standard."

02 // METHOD

How We Did It

PARAMETER	DETAILS
Method	Moderated usability sessions (in-person and remote), conducted by 6 members of "Group 2." All members reviewed the sessions, and the group synthesized findings together.
App Tested	Philz Coffee App (iOS and Android)
Participants	15 sessions recorded; 12 usable for analysis. Participant profile: regular coffee or tea drinkers who had never or rarely used the Philz app.
Tasks	4 core scenarios: store/location selection, coffee customization with modifiers, tea selection + special instructions, cart editing and checkout.
Scoring	Each task rated: Success / Success with Difficulty / Did Not Succeed. Gesture log tracked specific interaction behaviors (map vs. search, tap-to-edit recognition, swipe-to-remove attempts).
Goal	Assess learnability, gesture comprehension, and task completion for first-time users. Identify friction points specific to Philz's conversational UI model.

⚠ STUDY LIMITATION

Inconsistent script adherence and participant prompting across 5 moderators likely inflated task success rates. The friction rates reported here should therefore be read as conservative estimates; true friction is probably higher. This is a direct argument for standardized moderation protocols in future rounds.

03 // TASKS

Task Scenarios

TASK 1

📍 Select a Pickup Location

Select a pick-up location in San Francisco, CA.

✓ Success: Uses search or map to find and select any SF location within reasonable time.

TASK 2

☕ Add Coffee w/ Modifiers

Add a medium "Oatmeal Cookie Cold Brew" with light almond milk and sweet honey. All three modifiers required.

✓ Success: Correct drink with all three modifiers (size: medium; milk: light almond milk; sweetener: sweet honey).

TASK 3

🍵 Add Tea + Special Instructions

Add any tea of choice, then add a special instruction: "Please do not use the teabag for brewing, use a mesh/filter to avoid microplastics."

✓ Success: Tea added; special instructions field located and used.

TASK 4

🛒 Edit Cart & Checkout

Remove the cold brew from cart, add a second identical tea, then proceed to checkout. Stop at payment details.

✓ Success: Cold brew removed, tea quantity changed to 2, reached payment screen.

ARTIFACT

Moderator's Guide

📋 MODERATION PROTOCOL

Test the app. Not the person.

The moderator script was designed to establish a low-pressure environment and encourage authentic think-aloud behavior. Moderators were instructed not to guide, confirm, or correct. Observe and document only.

"Thank you for helping me today. We are testing the Philz Coffee app to see how easy it is for new users to navigate. Please think out loud as much as possible: tell me what you are looking at, what you're trying to do, and if anything surprises you. I am testing the app, not you, so there are no wrong answers."

MODERATOR BEST PRACTICES

Do not guide or correct
Encourage think-aloud
Allow productive struggle
Avoid confirming correct actions
Record hesitation moments

GESTURE LOG: TRACKED PER PARTICIPANT

🗺 Map vs. Search bar in Task 1

👆 Recognized underlined text as tappable?

👈 Tried swipe-left to remove item?
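As a rough sketch of how these three behaviors could be recorded and tallied, the snippet below assumes one log row per participant. The field names and the example rows are hypothetical placeholders, not the study's actual log or data.

```python
from collections import Counter

# Placeholder rows showing the assumed shape of one log entry per participant.
# These values are invented for illustration and are NOT the study's data.
EXAMPLE_LOG = [
    {"participant": "P01", "task1_method": "search",
     "saw_underline_as_tappable": True, "tried_swipe_left": False},
    {"participant": "P02", "task1_method": "map",
     "saw_underline_as_tappable": False, "tried_swipe_left": True},
    {"participant": "P03", "task1_method": "search",
     "saw_underline_as_tappable": False, "tried_swipe_left": True},
]


def summarize(log):
    """Tally the three tracked behaviors across all logged participants."""
    return {
        "participants": len(log),
        # Map vs. search bar in Task 1
        "task1_method": Counter(row["task1_method"] for row in log),
        # True counts as 1, so sum() gives the number of participants
        "saw_underline_as_tappable": sum(row["saw_underline_as_tappable"] for row in log),
        "tried_swipe_left": sum(row["tried_swipe_left"] for row in log),
    }


if __name__ == "__main__":
    print(summarize(EXAMPLE_LOG))
```

Keeping the log as flat rows makes it easy for several moderators to fill in the same sheet and merge results afterward.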

04 // PRIOR EVALUATION

What Philz Gets Right

Before the moderated usability study, I evaluated both the Philz and Starbucks apps against Nielsen's 10 Usability Heuristics in my Interaction Design course (60503, Prof. Andy Vitale). That work established something important: Philz's conversational interface isn't just a liability. Parts of it are genuinely well-executed. The usability study findings hit harder when you understand what the design is trying to do.

✓ Where Philz Succeeds

Visibility of System Status

The add-to-cart button delivers one of the more delightful feedback moments in any food ordering app. It reads "I'll Try It!" before the action, then transitions to "YAY!" after the item is added. It's a small interaction, but it confirms the action immediately and in a voice that matches the brand. Users know it worked.

Match Between System and Real World

The conversational sentence structure mirrors how you'd actually speak to a Philz barista. For users who are already Philz fans, this feels native and warm. The language is intentional, not accidental, and it succeeds at reinforcing the brand's in-café personality.

Aesthetic and Minimalist Design

The ordering screen is uncluttered. There are no competing calls to action, no promotional noise, no loyalty point banners fighting for attention mid-order. Several participants noted the interface looked "cute" and "simple" on first impression.

✕ Where It Breaks Down

User Control and Freedom

Starbucks allows order cancellation post-placement. Philz does not. Once an order is submitted, users have no exit. In a heuristic evaluation this is a clear failure of user control; in a live usability session, it's the kind of thing that would cause real anxiety.

Consistency and Standards

Underlined text on the web means a link. In native apps, tappable elements typically look like buttons, chips, or labeled controls. Philz's underlined modifier words break this convention so thoroughly that 5 of 12 participants in the usability study discovered them only after a struggle.

Error Prevention

The Special Instructions field silently caps input with no character counter and no warning. This isn't just a usability failure; it's an error prevention failure. Users believe they've communicated a preference when they haven't. For someone with a dietary restriction, that gap matters.

THE TENSION

Philz's design is trying to do something real: bring the warmth of their café into a mobile experience. That's a legitimate design goal, and in places it works. The problem is that a handful of execution decisions — underlined modifiers as interactive elements, no character counter, a carousel that obscures the full menu — undermine the experience for anyone who isn't already a Philz regular. The usability study quantified exactly where those breakdowns happen.

05 // RESULTS

What Happened

TASK	DESCRIPTION	✓ SUCCESS	⚡ DIFFICULTIES	✕ FAIL	FRICTION RATE
Task 1	Select pickup location	12	0	0	0%
Task 2	Add coffee with modifiers	6	5	1	50%
Task 3	Add tea with special instruction	9	3	0	25%
Task 4	Remove coffee, duplicate tea, checkout	7	4	1	42%
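The friction rate column can be reproduced from the counts in the table. The sketch below assumes friction rate means the share of attempts that ended in difficulty or failure; that definition is inferred from the numbers rather than stated in the study, and the function names are illustrative.

```python
# Counts copied from the results table: (success, with difficulty, fail).
RESULTS = {
    "Task 1": (12, 0, 0),
    "Task 2": (6, 5, 1),
    "Task 3": (9, 3, 0),
    "Task 4": (7, 4, 1),
}


def friction_rate(success: int, difficulty: int, fail: int) -> float:
    """Share of attempts that ended in difficulty or failure (assumed definition)."""
    attempts = success + difficulty + fail
    if attempts == 0:
        raise ValueError("no attempts recorded")
    return (difficulty + fail) / attempts


def friction_percent(success: int, difficulty: int, fail: int) -> int:
    """Friction rate as a whole-number percentage, as shown in the table."""
    return round(100 * friction_rate(success, difficulty, fail))


if __name__ == "__main__":
    for task, counts in RESULTS.items():
        print(f"{task}: {friction_percent(*counts)}%")
```

Each row sums to 12 attempts, so Task 4's 5 friction cases out of 12 round to the 42% shown.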

★ KEY TAKEAWAYS

Modifier affordance: 50% of participants had difficulty noticing or interacting with beverage modifiers. Philz's tappable underlined words were invisible to users who expected dropdowns or tap-through images.

Special Instructions: 25% of participants struggled with the character limit in the special instructions text field, with no counter shown and no feedback when the limit was hit.

Cart modifier: 42% of participants didn't notice the item count button in the cart that allows quantity changes. It's a hidden path that bypasses the need to start over.

06 // THEMES

Friction Patterns

👆 GESTURE & AFFORDANCE MISMATCH

The Invisible Interface

Users' default mobile behaviors (tapping images, expecting dropdown menus) didn't match Philz's interaction model. Tappable underlined words were read as static text or decorative styling, not buttons.

Only 7 of 12 users noticed modifiable text immediately

💬 TERMINOLOGY CONFUSION

What Is "Creamy"?

Modifier labels like "creamy," "light," and "sweet" are evocative but ambiguous. Without scale anchors or definitions, participants slowed down, second-guessed, and asked aloud what amounts these corresponded to.

"What is the quantity of 'creamy'?" (direct participant quote)

🔍 DISCOVERY VS. DEFAULT PATHS

The Carousel Trap

The default "Recommended" carousel view presented an infinite scroll of featured items. Users seeking a specific drink expected search, not endless horizontal browsing. Three users couldn't find the full menu without help.

3 users couldn't find full menu immediately

📝 SPECIAL INSTRUCTIONS CONSTRAINTS

Silent Character Limit

The Special Instructions field enforces a character limit but shows no counter and gives no feedback when the limit is hit. Users continued typing, unaware their request was being silently truncated.

"Nothing pops up showing I've run out of space."

07 // USER VOICE

In Their Words

ONBOARDING

The first two steps kind of lock you out of doing anything else, which I don't love. What if I wanted to make an order really quick? You just want to get to it.

😤 Frustrated

MENU NAVIGATION

Do they not have a search? That to me is a problem.

😤 Frustrated

CUSTOMIZATION

Those are underlined, looks like hyperlinks. Looks like I'm supposed to click and edit them, but nowhere does it tell me to do that.

🤔 Confused

MODIFIER LANGUAGE

The adjectives, I understand from a branding standpoint make it look cute, but it's very difficult for ordering a drink. 'Creamy' and 'Almond Milk' don't make sense together.

🤔 Mixed

SPECIAL INSTRUCTIONS

Oh, I'm out of characters! I can't type anymore. Nothing pops up on the screen showing that I've ran out of space in the text box, so I keep unknowingly typing.

😤 Frustrated

OVERALL SENTIMENT

I know the same things that cause confusion also help to keep it clean. I like the set up if you know what you are doing. After the first time you would know.

⚖️ Balanced

DESIGN PREFERENCE

I much prefer to use checkboxes and radio buttons. I don't understand the difference between: sweet, medium, and light. Does 'sweet' just mean a lot?

🤔 Curious

FIRST IMPRESSION

I like the intro, looks very cute. Looks pretty simple to use.

😊 Positive

08 // UI DISCONNECTS

What Philz Built vs. What Users Did

TASK 2 Hidden Customization

WHAT PHILZ DESIGNED

A conversational sentence structured like a verbal order: "I'll have a large, with creamy oat milk and sweet sugar, iced." Tappable, underlined, bolded words allow the user to modify size, milk type, and sweetener inline — an extension of Philz's in-café voice.

WHAT USERS DID INSTEAD

Tapped the product image, expecting a tap-through
Opened Special Instructions instead of modifiers
Added to cart and tried to edit using the cart's "Edit" button
Did not recognize underlined/bold words as interactive
Discovered modifiers by accident, only after extended exploration

TASK 1 Location Ambiguity

THE ISSUE

Region-based store labels ("San Francisco" vs. "South San Francisco") are ambiguous in a scrollable list. All 12 participants confirmed a store without verifying the full address — only 3 were familiar enough with the area to recognize store names by neighborhood.

Notable: Corte Madera — ~30 minutes north — was grouped under the San Francisco region.

RECOMMENDATION

Show full address and city tag prominently in the store list. Correct geographic groupings so regional labels are accurate (Corte Madera ≠ San Francisco). Consider distance from current location as a default sort.
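A distance-based default sort could look something like the sketch below. It uses the haversine great-circle formula; the store list, the user location, and the coordinates (approximate city centers, not real store addresses) are assumptions for illustration only.

```python
import math

# Hypothetical store list. Coordinates are approximate city centers,
# used only to illustrate the sort; they are not real store addresses.
STORES = [
    {"name": "Corte Madera", "lat": 37.9255, "lon": -122.5275},
    {"name": "South San Francisco", "lat": 37.6547, "lon": -122.4077},
    {"name": "San Francisco", "lat": 37.7749, "lon": -122.4194},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def sort_by_distance(stores, user_lat, user_lon):
    """Return stores ordered nearest-first from the user's location."""
    return sorted(
        stores,
        key=lambda s: haversine_km(user_lat, user_lon, s["lat"], s["lon"]),
    )


if __name__ == "__main__":
    # Assumed user location somewhere in central San Francisco.
    for store in sort_by_distance(STORES, 37.78, -122.41):
        print(store["name"])
```

Sorting by distance would not replace accurate region labels, but it would push a store like Corte Madera down the list for someone standing in San Francisco.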

TASK 4 Quantity Controls

WHAT PHILZ DESIGNED

Tapping the down-arrow on the quantity "1" in the cart opens a bottom drawer — a modal sheet that slides up from the bottom of the screen with stepper controls (+/-) and a "Choose Quantity" confirmation button.

WHAT USERS EXPECTED

A classic dropdown selector or inline +/- stepper — standard patterns that are immediately recognizable. The bottom drawer was a surprise behavior that many participants didn't discover, leading to failed or abandoned attempts at quantity modification.

09 // RECOMMENDATIONS

What to Fix

P1 · HIGH ☕

Modifier Affordance

Add a caret / edit icon next to tappable words
Show a one-time first-run tooltip: "Tap words to customize"
Better visually distinguish modifiers from static text using weight and color

P1 · HIGH 🔍

Clarify Modifier Language

Replace creamy/light/sweet with standard scales or numeric labels
Add microcopy explaining each level (e.g., "creamy = ¾ cream")
Show a plain-language summary of selected modifiers before adding to cart

P2 · MEDIUM 📱

UX Improvements

Stop the "Recommended" carousel from scrolling infinitely; end it with a "See All" card
Show character count in the Special Instructions field
Implement a standard stepper UI directly on the cart line item
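For the character-count recommendation, the counter logic might be sketched as follows. The actual limit in the Philz app is not documented in this study, so `limit` is a parameter, and the 80% warning threshold is an assumption chosen for illustration.

```python
def counter_state(text: str, limit: int, warn_at: float = 0.8) -> dict:
    """Describe what a character counter could show for the current input.

    `limit` and `warn_at` are assumed values; the app's real limit is unknown.
    len() counts code points, so emoji may count differently than on-device.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    used = len(text)
    remaining = limit - used
    if remaining < 0:
        level = "over"      # input would be truncated: tell the user
    elif remaining == 0:
        level = "full"
    elif used >= warn_at * limit:
        level = "warning"   # approaching the limit
    else:
        level = "ok"
    return {
        "label": f"{used}/{limit}",
        "remaining": remaining,
        "level": level,
        "kept": text[:limit],      # what would actually be sent
        "dropped": text[limit:],   # what the user would silently lose
    }


if __name__ == "__main__":
    note = ("Please do not use the teabag for brewing, "
            "use a mesh/filter to avoid microplastics.")
    print(counter_state(note, limit=50))  # 50 is a made-up limit
```

Surfacing the `dropped` portion, or blocking input at the limit with a visible message, addresses the core problem participants described: typing on without knowing the text was cut off.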
