Philz Coffee Usability Study & Redesign
A moderated usability study of the Philz Coffee mobile app with 12 first-time users, uncovering where Philz's signature conversational interface delights and where it trips people up.
The Study
The Philz Coffee app uses an unconventional, conversational interface. Ordering feels more like speaking to a barista than tapping a menu. That's a deliberate brand choice. But does it hold up when a first-time user sits down with it alone? This study evaluated learnability, gesture comprehension, and task completion for new users across four core ordering tasks.
Philz built something distinctive: an ordering experience that mirrors their in-café warmth. The risk is that when the design model doesn't match users' mobile mental models, charm becomes friction. The goal here was to measure that gap precisely, so recommendations could be surgical rather than generic "make it more standard."
How We Did It
| PARAMETER | DETAILS |
|---|---|
| Method | Moderated usability sessions (in-person and remote), conducted by 6 members of "Group 2." Each member reviewed sessions and synthesized findings collectively. |
| App Tested | Philz Coffee App (iOS and Android) |
| Participants | 15 sessions recorded; 12 usable for analysis. Participant profile: regular coffee or tea drinkers who had never or rarely used the Philz app. |
| Tasks | 4 core scenarios: store/location selection, coffee customization with modifiers, tea selection + special instructions, cart editing and checkout. |
| Scoring | Each task rated: Success / Success with Difficulty / Did Not Succeed. Gesture log tracked specific interaction behaviors (map vs. search, tap-to-edit recognition, swipe-to-remove attempts). |
| Goal | Assess learnability, gesture comprehension, and task completion for first-time users. Identify friction points specific to Philz's conversational UI model. |
Inconsistent script adherence and participant prompting across 5 moderators likely inflated task success rates. Results should be read as conservative estimates of friction; the real numbers may be higher. This is a direct argument for standardized moderation protocols in future rounds.
Task Scenarios
Select a pick-up location in San Francisco, CA.
Add a medium "Oatmeal Cookie Cold Brew" with light almond milk and sweet honey. All three modifiers required.
Add any tea of choice, then add a special instruction: "Please do not use the teabag for brewing, use a mesh/filter to avoid microplastics."
Remove the cold brew from cart, add a second identical tea, then proceed to checkout. Stop at payment details.
Moderator's Guide
The moderator script was designed to establish a low-pressure environment and encourage authentic think-aloud behavior. Moderators were instructed not to guide, confirm, or correct. Observe and document only.
What Philz Gets Right
Before the moderated usability study, I evaluated both the Philz and Starbucks apps against Nielsen's 10 Usability Heuristics in my Interaction Design course (60503, Prof. Andy Vitale). That work established something important: Philz's conversational interface isn't just a liability. Parts of it are genuinely well-executed. The usability study findings hit harder when you understand what the design is trying to do.
The add-to-cart button delivers one of the more delightful feedback moments in any food ordering app. It reads "I'll Try It!" before the action, then transitions to "YAY!" after the item is added. It's a small interaction, but it confirms the action immediately and in a voice that matches the brand. Users know it worked.
The conversational sentence structure mirrors how you'd actually speak to a Philz barista. For users who are already Philz fans, this feels native and warm. The language is intentional, not accidental, and it succeeds at reinforcing the brand's in-café personality.
The ordering screen is uncluttered. There are no competing calls to action, no promotional noise, no loyalty point banners fighting for attention mid-order. Several participants noted the interface looked "cute" and "simple" on first impression.
Starbucks allows order cancellation post-placement. Philz does not. Once an order is submitted, users have no exit. In a heuristic evaluation this is a clear failure of user control; in a live usability session, it's the kind of thing that would cause real anxiety.
Underlined text on the web means a link. In native apps, tappable elements typically look like buttons, chips, or labeled controls. Philz's underlined modifier words break this convention so thoroughly that 5 of 12 participants in the usability study struggled to discover them.
The Special Instructions field silently caps input with no character counter and no warning. This isn't just a usability failure; it's an error prevention failure. Users believe they've communicated a preference when they haven't. For someone with a dietary restriction, that gap matters.
Philz's design is trying to do something real: bring the warmth of their café into a mobile experience. That's a legitimate design goal, and in places it works. The problem is that a handful of execution decisions (underlined modifiers as interactive elements, no character counter, a carousel that obscures the full menu) undermine the experience for anyone who isn't already a Philz regular. The usability study quantified exactly where those breakdowns happen.
What Happened
| TASK | DESCRIPTION | SUCCESS | DIFFICULTIES | FAIL | FRICTION RATE |
|---|---|---|---|---|---|
| Task 1 | Select pickup location | 12 | 0 | 0 | 0% |
| Task 2 | Add coffee with modifiers | 6 | 5 | 1 | 50% |
| Task 3 | Add tea with special instruction | 9 | 3 | 0 | 25% |
| Task 4 | Remove coffee, duplicate tea, checkout | 7 | 4 | 1 | 42% |
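The friction rates in the table appear to be the share of participants who either struggled or failed on a task. That definition is inferred from the counts, not stated in the study, so treat the following as a minimal sketch under that assumption. The names `RESULTS`, `friction_rate`, and `FRICTION` are illustrative.

```python
# Counts per task copied from the results table: (success, difficulties, fail).
RESULTS = {
    "Task 1": (12, 0, 0),
    "Task 2": (6, 5, 1),
    "Task 3": (9, 3, 0),
    "Task 4": (7, 4, 1),
}


def friction_rate(success: int, difficulties: int, fail: int) -> float:
    """Share of participants who struggled or failed (assumed definition)."""
    total = success + difficulties + fail
    return (difficulties + fail) / total


# Friction rate per task as a fraction between 0 and 1.
FRICTION = {task: friction_rate(*counts) for task, counts in RESULTS.items()}

for task, rate in FRICTION.items():
    # Format as a whole-number percentage, matching the table's rounding.
    print(f"{task}: {rate:.0%}")
```

Under this definition the computed values round to the percentages reported in the table, including Task 4, where 5 of 12 participants rounds to 42%.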
Friction Patterns
Users' default mobile behaviors (tapping images, expecting dropdown menus) didn't match Philz's interaction model. Tappable underlined words were read as static text or decorative styling, not buttons.
Only 7 of 12 users noticed modifiable text immediately
Modifier labels like "creamy," "light," and "sweet" are evocative but ambiguous. Without scale anchors or definitions, participants slowed down, second-guessed, and asked aloud what amounts these corresponded to.
"What is the quantity of 'creamy'?" (direct participant quote)
The default "Recommended" carousel view presented an infinite scroll of featured items. Users seeking a specific drink expected search, not endless horizontal browsing. Three users couldn't find the full menu without help.
3 users couldn't find full menu immediately
The Special Instructions field enforces a character limit but shows no counter and gives no feedback when the limit is hit. Users continued typing, unaware their request was being silently truncated.
"Nothing pops up showing I've run out of space."
In Their Words
The first two steps kind of lock you out of doing anything else, which I don't love. What if I wanted to make an order really quick? You just want to get to it.
Do they not have a search? That to me is a problem.
Those are underlined, looks like hyperlinks. Looks like I'm supposed to click and edit them, but nowhere does it tell me to do that.
The adjectives, I understand from a branding standpoint make it look cute, but it's very difficult for ordering a drink. 'Creamy' and 'Almond Milk' don't make sense together.
Oh, I'm out of characters! I can't type anymore. Nothing pops up on the screen showing that I've ran out of space in the text box, so I keep unknowingly typing.
I know the same things that cause confusion also help to keep it clean. I like the set up if you know what you are doing. After the first time you would know.
I much prefer to use checkboxes and radio buttons. I don't understand the difference between: sweet, medium, and light. Does 'sweet' just mean a lot?
I like the intro, looks very cute. Looks pretty simple to use.
What Philz Built vs. What Users Did
A conversational sentence structured like a verbal order: "I'll have a large, with creamy oat milk and sweet sugar, iced." Tappable, underlined, bolded words allow the user to modify size, milk type, and sweetener inline, an extension of Philz's in-café voice.
- Tapped the product image, expecting a tap-through
- Opened Special Instructions instead of modifiers
- Added to cart and tried to edit using the cart's "Edit" button
- Did not recognize underlined/bold words as interactive
- Discovered modifiers by accident, only after extended exploration
Region-based store labels ("San Francisco" vs. "South San Francisco") are ambiguous in a scrollable list. All 12 participants confirmed a store without verifying the full address; only 3 were familiar enough with the area to recognize store names by neighborhood.
Notable: Corte Madera, about 30 minutes north, was grouped under the San Francisco region.
Show full address and city tag prominently in the store list. Correct geographic groupings so regional labels are accurate (Corte Madera ≠ San Francisco). Consider distance from current location as a default sort.
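The distance-based default sort suggested above could look roughly like the sketch below. It is not how the Philz app works. The `Store` type, the function names, and every coordinate are hypothetical, and the coordinates are approximate city-center placeholders rather than real store locations.

```python
import math
from dataclasses import dataclass


@dataclass
class Store:
    name: str
    city: str   # city tag shown next to the name in the list
    lat: float  # placeholder latitude, not a real store location
    lon: float  # placeholder longitude, not a real store location


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def sort_by_distance(stores: list[Store], user_lat: float, user_lon: float) -> list[Store]:
    """Return stores ordered nearest-first relative to the user's position."""
    return sorted(stores, key=lambda s: haversine_km(user_lat, user_lon, s.lat, s.lon))


# Hypothetical data: approximate city centers standing in for stores.
STORES = [
    Store("Corte Madera", "Corte Madera, CA", 37.9255, -122.5275),
    Store("South San Francisco", "South San Francisco, CA", 37.6547, -122.4077),
    Store("San Francisco", "San Francisco, CA", 37.7790, -122.4150),
]

# Hypothetical user position in central San Francisco.
NEAREST_FIRST = sort_by_distance(STORES, 37.7749, -122.4194)
```

Sorting by distance and showing the `city` tag beside each name would keep a store like Corte Madera from reading as a San Francisco location, whatever region it is grouped under.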
Tapping the down-arrow on the quantity "1" in the cart opens a bottom drawer: a modal sheet that slides up from the bottom of the screen with stepper controls (+/-) and a "Choose Quantity" confirmation button.
Participants expected a classic dropdown selector or an inline +/- stepper, standard patterns that are immediately recognizable. The bottom drawer was a surprise behavior that many participants didn't discover, leading to failed or abandoned attempts at quantity modification.
What to Fix
- Add a caret / edit icon next to tappable words
- Show a one-time first-run tooltip: "Tap words to customize"
- Better visually distinguish modifiers from static text using weight and color
- Replace creamy/light/sweet with standard scales or numeric labels
- Add microcopy explaining each level (e.g., "creamy = ¾ cream")
- Show a plain-language summary of selected modifiers before adding to cart
- Stop the "Recommended" carousel from infinite scroll; end with a "See All" card
- Show character count in the Special Instructions field
- Implement a standard stepper UI directly on the cart line item
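As one illustration of the fixes above, the character count for the Special Instructions field comes down to a small piece of state that the UI recomputes on every keystroke. The sketch below is an assumption-laden example, not Philz's implementation. The study does not report the real cap, so `MAX_CHARS` and `WARN_AT` are hypothetical values, and `CounterState` and `counter_state` are illustrative names.

```python
from dataclasses import dataclass

MAX_CHARS = 140  # hypothetical limit; the study does not report the real cap
WARN_AT = 20     # hypothetical: emphasize the counter when this few characters remain


@dataclass
class CounterState:
    text: str       # text the field would actually keep
    remaining: int  # characters left before the cap
    label: str      # counter text shown under the field, e.g. "5/140"
    warn: bool      # True when the counter should be visually emphasized
    rejected: int   # characters dropped because they exceeded the cap


def counter_state(typed: str, max_chars: int = MAX_CHARS, warn_at: int = WARN_AT) -> CounterState:
    """Compute what the field keeps and what the counter should display."""
    kept = typed[:max_chars]
    remaining = max_chars - len(kept)
    return CounterState(
        text=kept,
        remaining=remaining,
        label=f"{len(kept)}/{max_chars}",
        warn=remaining <= warn_at,
        rejected=len(typed) - len(kept),
    )
```

A non-zero `rejected` value is the moment participants described as typing "unknowingly": surfacing it with an inline message, rather than dropping input silently, would address the error-prevention gap.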