USABILITY TESTING // UX RESEARCH // IxD

Philz Coffee Usability Study & Redesign

A moderated usability study of the Philz Coffee mobile app with 12 first-time users, uncovering where Philz's signature conversational interface delights and where it trips people up.

Moderated Testing · 12 Participants · iOS & Android · UX Eval Fundamentals · IxD Redesign
STUDY AT A GLANCE ☕
12 Participants Observed
48 Scored Task Attempts
50% Affordance Friction Rate
100% Location Task Success
01 // OVERVIEW

The Study

What We Were Testing

The Philz Coffee app uses an unconventional, conversational interface. Ordering feels more like speaking to a barista than tapping a menu. That's a deliberate brand choice. But does it hold up when a first-time user sits down with it alone? This study evaluated learnability, gesture comprehension, and task completion for new users across four core ordering tasks.

Why It Matters

Philz built something distinctive: an ordering experience that mirrors their in-café warmth. The risk is that when the design model doesn't match users' mobile mental models, charm becomes friction. The goal here was to measure that gap precisely, so recommendations could be surgical rather than generic "make it more standard."

02 // METHOD

How We Did It

PARAMETER | DETAILS
Method | Moderated usability sessions (in-person and remote), conducted by 6 members of "Group 2." Each member reviewed sessions and synthesized findings collectively.
App Tested | Philz Coffee App (iOS and Android)
Participants | 15 sessions recorded; 12 usable for analysis. Participant profile: regular coffee or tea drinkers who had never or rarely used the Philz app.
Tasks | 4 core scenarios: store/location selection, coffee customization with modifiers, tea selection + special instructions, cart editing and checkout.
Scoring | Each task rated: Success / Success with Difficulty / Did Not Succeed. Gesture log tracked specific interaction behaviors (map vs. search, tap-to-edit recognition, swipe-to-remove attempts).
Goal | Assess learnability, gesture comprehension, and task completion for first-time users. Identify friction points specific to Philz's conversational UI model.
⚠ STUDY LIMITATION

Inconsistent script adherence and participant prompting across 5 moderators likely inflated task success rates. Results should be read as conservative estimates of friction; the real numbers may be higher. This is a direct argument for standardized moderation protocols in future rounds.

03 // TASKS

Task Scenarios

TASK 1
📍 Select a Pickup Location

Select a pick-up location in San Francisco, CA.

✓ Success: Uses search or map to find and select any SF location within reasonable time.
TASK 2
☕ Add Coffee w/ Modifiers

Add a medium "Oatmeal Cookie Cold Brew" with light almond milk and sweet honey. All three modifiers required.

✓ Success: Correct drink with all three modifiers (size: medium; milk: light almond milk; sweetener: sweet honey).
TASK 3
🍵 Add Tea + Special Instructions

Add any tea of choice, then add a special instruction: "Please do not use the teabag for brewing, use a mesh/filter to avoid microplastics."

✓ Success: Tea added; special instructions field located and used.
TASK 4
🛒 Edit Cart & Checkout

Remove the cold brew from cart, add a second identical tea, then proceed to checkout. Stop at payment details.

✓ Success: Cold brew removed, tea quantity changed to 2, reached payment screen.
ARTIFACT

Moderator's Guide

📋 MODERATION PROTOCOL
Test the app. Not the person.

The moderator script was designed to establish a low-pressure environment and encourage authentic think-aloud behavior. Moderators were instructed not to guide, confirm, or correct. Observe and document only.

"Thank you for helping me today. We are testing the Philz Coffee app to see how easy it is for new users to navigate. Please think out loud as much as possible: tell me what you are looking at, what you're trying to do, and if anything surprises you. I am testing the app, not you, so there are no wrong answers."
MODERATOR BEST PRACTICES
  • Do not guide or correct
  • Encourage think-aloud
  • Allow productive struggle
  • Avoid confirming correct actions
  • Record hesitation moments
GESTURE LOG: TRACKED PER PARTICIPANT
🗺 Map vs. Search bar in Task 1
👆 Recognized underlined text as tappable?
👈 Tried swipe-left to remove item?
04 // PRIOR EVALUATION

What Philz Gets Right

Before the moderated usability study, I evaluated both the Philz and Starbucks apps against Nielsen's 10 Usability Heuristics in my Interaction Design course (60503, Prof. Andy Vitale). That work established something important: Philz's conversational interface isn't just a liability. Parts of it are genuinely well-executed. The usability study findings hit harder when you understand what the design is trying to do.

✓ Where Philz Succeeds
Visibility of System Status

The add-to-cart button delivers one of the more delightful feedback moments in any food ordering app. It reads "I'll Try It!" before the action, then transitions to "YAY!" after the item is added. It's a small interaction, but it confirms the action immediately and in a voice that matches the brand. Users know it worked.

Match Between System and Real World

The conversational sentence structure mirrors how you'd actually speak to a Philz barista. For users who are already Philz fans, this feels native and warm. The language is intentional, not accidental, and it succeeds at reinforcing the brand's in-café personality.

Aesthetic and Minimalist Design

The ordering screen is uncluttered. There are no competing calls to action, no promotional noise, no loyalty point banners fighting for attention mid-order. Several participants noted the interface looked "cute" and "simple" on first impression.

✕ Where It Breaks Down
User Control and Freedom

Starbucks allows order cancellation post-placement. Philz does not. Once an order is submitted, users have no exit. In a heuristic evaluation this is a clear failure of user control; in a live usability session, it's the kind of thing that would cause real anxiety.

Consistency and Standards

Underlined text on the web means a link. In native apps, tappable elements typically look like buttons, chips, or labeled controls. Philz's underlined modifier words break from this convention so sharply that 5 of 12 participants in the usability study did not recognize them as tappable right away.

Error Prevention

The Special Instructions field silently caps input with no character counter and no warning. This isn't just a usability failure; it's an error prevention failure. Users believe they've communicated a preference when they haven't. For someone with a dietary restriction, that gap matters.

THE TENSION

Philz's design is trying to do something real: bring the warmth of their café into a mobile experience. That's a legitimate design goal, and in places it works. The problem is that a handful of execution decisions (underlined modifiers as interactive elements, no character counter, a carousel that obscures the full menu) undermine the experience for anyone who isn't already a Philz regular. The usability study quantified exactly where those breakdowns happen.

05 // RESULTS

What Happened

TASK | DESCRIPTION | ✓ SUCCESS | ⚡ DIFFICULTIES | ✕ FAIL | FRICTION RATE
Task 1 | Select pickup location | 12 | 0 | 0 | 0%
Task 2 | Add coffee with modifiers | 6 | 5 | 1 | 50%
Task 3 | Add tea with special instruction | 9 | 3 | 0 | 25%
Task 4 | Remove coffee, duplicate tea, checkout | 7 | 4 | 1 | 42%
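
The friction rates in the table are consistent with counting every attempt that was not a clean success (difficulty or fail) against the 12 attempts per task. The study does not spell out the formula, so treat the definition below as an assumption that happens to reproduce the reported numbers. A minimal Python sketch, with `friction_rate` as a hypothetical helper name:

```python
# Counts copied from the results table:
# (success, success with difficulty, did not succeed) per task.
results = {
    "Task 1": (12, 0, 0),
    "Task 2": (6, 5, 1),
    "Task 3": (9, 3, 0),
    "Task 4": (7, 4, 1),
}


def friction_rate(success, difficulty, fail):
    """Share of attempts that were not a clean success, as a whole-number percent.

    Assumed definition: (difficulty + fail) / total attempts.
    """
    total = success + difficulty + fail
    return round(100 * (difficulty + fail) / total)


# Friction rate per task, plus the total number of scored attempts.
rates = {task: friction_rate(*counts) for task, counts in results.items()}
total_attempts = sum(sum(counts) for counts in results.values())

for task, rate in rates.items():
    print(f"{task}: {rate}% friction")
print(f"Scored task attempts: {total_attempts}")
```

Task 4 works out to 5 of 12 attempts, which rounds to the 42% shown in the table.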
★ KEY TAKEAWAYS
1. Modifier affordance: 50% of participants had difficulty noticing or interacting with beverage modifiers. Philz's tappable underlined words were invisible to users who expected dropdowns or tap-through images.
2. Special Instructions: 25% of participants struggled with the character limit in the special instructions text field, with no counter shown and no feedback when the limit was hit.
3. Cart modifier: 42% of participants didn't notice the item count button in the cart that allows quantity changes. It's a hidden path that bypasses the need to start over.
06 // THEMES

Friction Patterns

👆 GESTURE & AFFORDANCE MISMATCH
The Invisible Interface

Users' default mobile behaviors (tapping images, expecting dropdown menus) didn't match Philz's interaction model. Tappable underlined words were read as static text or decorative styling, not buttons.

Only 7 of 12 users noticed modifiable text immediately
💬 TERMINOLOGY CONFUSION
What Is "Creamy"?

Modifier labels like "creamy," "light," and "sweet" are evocative but ambiguous. Without scale anchors or definitions, participants slowed down, second-guessed, and asked aloud what amounts these corresponded to.

"What is the quantity of 'creamy'?" (direct participant quote)
🔍 DISCOVERY VS. DEFAULT PATHS
The Carousel Trap

The default "Recommended" carousel view presented an infinite scroll of featured items. Users seeking a specific drink expected search, not endless horizontal browsing. Three users couldn't find the full menu without help.

3 users couldn't find full menu immediately
📝 SPECIAL INSTRUCTIONS CONSTRAINTS
Silent Character Limit

The Special Instructions field enforces a character limit but shows no counter and gives no feedback when the limit is hit. Users continued typing, unaware their request was being silently truncated.

"Nothing pops up showing I've run out of space."
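
What the missing feedback could look like, as a minimal sketch: the field keeps only what fits and reports a counter label plus how many characters were dropped, so the UI can warn instead of truncating silently. The study did not record Philz's actual character cap, so the limit used here is a placeholder, and `apply_limit` and `CounterState` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CounterState:
    text: str       # what the field keeps after enforcing the limit
    label: str      # counter text to show under the field, e.g. "10/10"
    at_limit: bool  # True once the field is full, so the UI can warn
    dropped: int    # characters typed that did not fit


def apply_limit(raw: str, limit: int) -> CounterState:
    """Enforce a character limit and return the feedback the UI should show."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    kept = raw[:limit]
    return CounterState(
        text=kept,
        label=f"{len(kept)}/{limit}",
        at_limit=len(kept) >= limit,
        dropped=len(raw) - len(kept),
    )


# Placeholder limit of 10 characters, chosen only to keep the example short.
state = apply_limit("No teabag please", limit=10)
print(state.label, state.at_limit, state.dropped)
```

With this in place, a nonzero `dropped` value is the signal to tell the user their request no longer fits.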
07 // USER VOICE

In Their Words

ONBOARDING

The first two steps kind of lock you out of doing anything else, which I don't love. What if I wanted to make an order really quick? You just want to get to it.

😤 Frustrated
MENU NAVIGATION

Do they not have a search? That to me is a problem.

😤 Frustrated
CUSTOMIZATION

Those are underlined, looks like hyperlinks. Looks like I'm supposed to click and edit them, but nowhere does it tell me to do that.

🤔 Confused
MODIFIER LANGUAGE

The adjectives, I understand from a branding standpoint make it look cute, but it's very difficult for ordering a drink. 'Creamy' and 'Almond Milk' don't make sense together.

🤔 Mixed
SPECIAL INSTRUCTIONS

Oh, I'm out of characters! I can't type anymore. Nothing pops up on the screen showing that I've ran out of space in the text box, so I keep unknowingly typing.

😤 Frustrated
OVERALL SENTIMENT

I know the same things that cause confusion also help to keep it clean. I like the set up if you know what you are doing. After the first time you would know.

⚖️ Balanced
DESIGN PREFERENCE

I much prefer to use checkboxes and radio buttons. I don't understand the difference between: sweet, medium, and light. Does 'sweet' just mean a lot?

🤔 Curious
FIRST IMPRESSION

I like the intro, looks very cute. Looks pretty simple to use.

😊 Positive
08 // UI DISCONNECTS

What Philz Built vs. What Users Did

TASK 2 Hidden Customization
WHAT PHILZ DESIGNED

A conversational sentence structured like a verbal order: "I'll have a large, with creamy oat milk and sweet sugar, iced." Tappable, underlined, bolded words allow the user to modify size, milk type, and sweetener inline, an extension of Philz's in-café voice.

WHAT USERS DID INSTEAD
  • Tapped the product image, expecting a tap-through
  • Opened Special Instructions instead of modifiers
  • Added to cart and tried to edit using the cart's "Edit" button
  • Did not recognize underlined/bold words as interactive
  • Discovered modifiers by accident, only after extended exploration
TASK 1 Location Ambiguity
THE ISSUE

Region-based store labels ("San Francisco" vs. "South San Francisco") are ambiguous in a scrollable list. All 12 participants confirmed a store without verifying the full address; only 3 were familiar enough with the area to recognize store names by neighborhood.

Notable: Corte Madera, ~30 minutes north, was grouped under the San Francisco region.

RECOMMENDATION

Show full address and city tag prominently in the store list. Correct geographic groupings so regional labels are accurate (Corte Madera ≠ San Francisco). Consider distance from current location as a default sort.
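
A distance-first default sort could be sketched roughly as follows. This uses the haversine great-circle formula; the coordinates are approximate city centers used purely for illustration, not real Philz store locations, and the user position is made up.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# Hypothetical user position somewhere in San Francisco.
user = (37.7800, -122.4100)

# Illustrative entries only: approximate city-center coordinates, not store addresses.
stores = [
    {"label": "Corte Madera", "coords": (37.9255, -122.5275)},
    {"label": "South San Francisco", "coords": (37.6547, -122.4077)},
    {"label": "San Francisco", "coords": (37.7749, -122.4194)},
]

# Nearest first, so a mislabeled region cannot push a distant store to the top.
sorted_stores = sorted(stores, key=lambda s: haversine_km(user, s["coords"]))

for store in sorted_stores:
    print(f'{store["label"]}: {haversine_km(user, store["coords"]):.1f} km')
```

Sorting by computed distance rather than by region label sidesteps the grouping problem: a store's position in the list no longer depends on which region it was filed under.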

TASK 4 Quantity Controls
WHAT PHILZ DESIGNED

Tapping the down-arrow on the quantity "1" in the cart opens a bottom drawer: a modal sheet that slides up from the bottom of the screen with stepper controls (+/-) and a "Choose Quantity" confirmation button.

WHAT USERS EXPECTED

A classic dropdown selector or inline +/- stepper: standard patterns that are immediately recognizable. The bottom drawer was a surprise behavior that many participants didn't discover, leading to failed or abandoned attempts at quantity modification.

09 // RECOMMENDATIONS

What to Fix

P1 · HIGH ☕
Modifier Affordance
  • Add a caret / edit icon next to tappable words
  • Show a one-time first-run tooltip: "Tap words to customize"
  • Better visually distinguish modifiers from static text using weight and color
P1 · HIGH 🔍
Clarify Modifier Language
  • Replace creamy/light/sweet with standard scales or numeric labels
  • Add microcopy explaining each level (e.g., "creamy = ¾ cream")
  • Show a plain-language summary of selected modifiers before adding to cart
P2 · MEDIUM 📱
UX Improvements
  • Stop the "Recommended" carousel from infinite scroll; end with a "See All" card
  • Show character count in the Special Instructions field
  • Implement a standard stepper UI directly on the cart line item