How to measure feature bloat: the four metrics that matter
You cannot manage what you cannot measure. But you also cannot measure what you cannot name. Most product teams measure feature adoption in aggregate: how many times was feature X used last month? That tells you the feature gets some use. It does not tell you whether it is pulling its weight, creating cognitive load, or silently degrading your product's performance. These four metrics tell you that.
64% of enterprise features rarely or never used (Standish Group, 2002)
12% of features used often (Pendo Feature Adoption Report, 2019)
01. Feature usage rate (per user, per feature, per week)
Feature usage rate is the percentage of active users who interact with a given feature in a given time period. The key word is per user. Aggregate counts are misleading: a feature used 10,000 times per month by 50 power users looks identical in aggregate to a feature used 10,000 times per month by 5,000 users. The first is a niche feature serving a small segment. The second is a widely-adopted feature. They require completely different responses.
The Standish Group CHAOS research from 2002 gives the baseline: across enterprise applications, 7% of features are used always, 13% often, 16% occasionally, 19% rarely, and 45% never. This means the median enterprise feature is in the 'rarely or never' category. Your product is almost certainly not an outlier.
Pendo's 2019 Feature Adoption Report, based on 180 million users across 35,000 applications, found only 12% of features used 'often.' The distribution was: 12% often, 15% sometimes, 73% rarely or never. This is more recent data and confirms the Standish finding with a different methodology.
How to instrument: fire an analytics event whenever a user interacts with a feature. Distinguish between passive viewing (feature visible on screen) and active use (user clicked, typed, or initiated the feature's primary action). Passive views are not usage. Then compute: unique active users per feature per week, divided by total active users. This is your per-feature adoption rate. Sort ascending. The bottom quintile is your audit starting point.
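The computation above can be sketched in a few lines. This is a minimal illustration with hypothetical event data, not a reference implementation; the key details it encodes are excluding passive views and sorting ascending so the bottom quintile surfaces first.

```python
from collections import defaultdict

# Hypothetical event records for one week: (user_id, feature, was_active_use).
# Passive views (was_active_use=False) are excluded, per the rule above.
events = [
    ("u1", "export", True), ("u2", "export", True),
    ("u1", "search", True), ("u2", "search", True),
    ("u3", "search", True), ("u3", "export", False),  # passive view: ignored
    ("u4", "macros", True),
]

def weekly_adoption_rates(events):
    """Per-feature adoption: unique active users / total active users this week."""
    active_users = {user for user, _, active in events if active}
    users_by_feature = defaultdict(set)
    for user, feature, active in events:
        if active:
            users_by_feature[feature].add(user)
    rates = {f: len(users) / len(active_users) for f, users in users_by_feature.items()}
    # Sort ascending: the bottom quintile is the audit starting point.
    return sorted(rates.items(), key=lambda kv: kv[1])

print(weekly_adoption_rates(events))
# lowest-adoption features print first, e.g. ("macros", 0.25)
```

In production the event list would come from your analytics warehouse, but the arithmetic is the same: distinct active users per feature divided by distinct active users overall.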
Benchmarks
Healthy: >30% of active users touch the feature per week. Warning: 5-30%. Candidate for audit: <5%. Candidate for deprecation: <2% for 90 days.
02. Active feature count per user
How many of your N features does the median user actually touch per month? This is a different question from feature usage rate. Usage rate tells you which features are used. Active feature count tells you how wide the experience is for the median user.
Most products exhibit a power law distribution in feature adoption. The median user uses 20-30% of available features. A small cohort of power users uses 60-70%. New users use fewer than 10% in their first month. This distribution is not unusual; it is expected. What is unusual is when the active feature count for the median user is trending downward over time, which is a signal that complexity is increasing faster than adoption.
Contrast Basecamp with Notion. Basecamp has a relatively small and stable feature set. The median Basecamp user uses most of its features because there are not that many to choose from. Notion has a much larger feature surface. The median Notion user uses a fraction of what Notion offers, which is fine if those users are satisfied, but creates a growing maintenance burden for features that most users will never discover.
How to compute: for each user, count the distinct feature types they have interacted with in the last 30 days. Divide by total feature count. Plot the distribution. The median of this distribution is your active feature count ratio. Watch the trend over time: if it is declining as you add features, users are not adopting the new features at the same rate you are shipping them.
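As a sketch, with hypothetical 30-day data and an assumed total feature count, the median ratio falls out of a one-liner over the per-user distribution:

```python
from statistics import median

TOTAL_FEATURES = 20  # hypothetical: total features shipped in the product

# Hypothetical: distinct feature types each user touched in the last 30 days
features_by_user = {
    "u1": {"search", "export", "share", "comments", "tags"},
    "u2": {"search", "export"},
    "u3": {"search", "export", "share", "comments", "tags",
           "macros", "api", "webhooks"},
}

def active_feature_ratio(features_by_user, total_features):
    """Median fraction of the feature set that a user actually touches."""
    ratios = [len(fs) / total_features for fs in features_by_user.values()]
    return median(ratios)

print(active_feature_ratio(features_by_user, TOTAL_FEATURES))
```

Recompute this monthly; a declining median while `TOTAL_FEATURES` grows is the trend the text warns about.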
Benchmarks
Healthy: median user adopts 25%+ of features. Warning: median below 15% and declining. Action required: median below 10% with high churn correlation.
03. Time-to-interactive and time-to-first-value
Feature bloat degrades Time-to-Interactive (TTI) through two mechanisms. The first is JavaScript weight: each feature shipped as a client-side component adds to your bundle size, which increases the time between a user requesting the page and the page being interactive. The second is UI complexity: more features on the screen means more rendering time, more event listeners, and more state management on each page load.
Google's Lighthouse performance scoring sets TTI thresholds: good is under 3.8 seconds, needs improvement is 3.8-7.3 seconds, poor is over 7.3 seconds. TTI is not itself a Core Web Vital, but the same JavaScript weight that inflates TTI also inflates the Core Web Vitals that do factor into Google rankings. For web applications, the TTI impact of feature bloat is not just a UX problem; it is a search problem.
Time-to-First-Value (TTFV) is the product metric version of TTI: how long does it take a new user to reach their first success in the product? Feature bloat extends TTFV because it adds steps and complexity to the onboarding flow. A user who must configure 15 settings before seeing value will abandon more often than a user who sees value in three steps. Evernote's documented onboarding drop-off in its later years correlates directly with the feature accumulation that made the product harder to get started with.
How to measure: instrument your onboarding funnel step by step. Identify the first action that correlates strongly with long-term retention (your 'aha moment'). Measure the time from account creation to that action. Track week-over-week. Feature additions that increase this time signal onboarding complexity.
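A minimal TTFV computation, assuming you already log a signup timestamp and the timestamp of each user's first 'aha' action (both names hypothetical). Users who never reach the aha moment are excluded from the median and should be tracked separately as onboarding drop-off:

```python
from datetime import datetime
from statistics import median

# Hypothetical per-user timestamps: account creation and first 'aha' action.
signups = {
    "u1": datetime(2024, 3, 1, 9, 0),
    "u2": datetime(2024, 3, 1, 10, 0),
    "u3": datetime(2024, 3, 2, 14, 0),
}
first_aha = {
    "u1": datetime(2024, 3, 1, 9, 6),    # 6 minutes to first value
    "u2": datetime(2024, 3, 1, 10, 40),  # 40 minutes
    # u3 never reached the aha moment: excluded, counted as drop-off
}

def median_ttfv_minutes(signups, first_aha):
    """Median minutes from account creation to first success, for users who got there."""
    deltas = [
        (first_aha[u] - signups[u]).total_seconds() / 60
        for u in signups if u in first_aha
    ]
    return median(deltas)

print(median_ttfv_minutes(signups, first_aha))
```

Plot this weekly; a feature release followed by a jump in median TTFV is the signal described above.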
Benchmarks
Good TTFV: under 10 minutes to first success for a new user. Warning: 10-30 minutes. Action required: over 30 minutes or increasing trend.
04. Retention cohorts by feature adoption
The most powerful and counterintuitive metric: do users who touch more features retain better, or worse, than users who touch fewer? The conventional assumption is that feature adoption correlates with retention: more features adopted means more investment, which means harder to churn. This is true for some features: features that embed a user's data or create habits (daily notes in Evernote, channel structure in Slack) do correlate with retention.
The counterintuitive finding in multiple Pendo and Amplitude datasets: users who touch more features in their first 30 days do not always retain better than users who touch fewer. For some products, users who adopt many features in the first week have higher short-term retention and higher long-term churn: they are explorers who lose interest once the exploration is complete, not committed users who have integrated the product into their workflow.
The correct analysis: segment your features by the retention correlation of the users who adopt them. Which features, when adopted in the first 30 days, predict 90-day retention? These are your sticky features. Build more of those. Which features, when adopted, correlate with zero retention improvement? These are candidates for removal or de-emphasis in onboarding.
How to build the cohort analysis: split users into two cohorts, those who adopted feature X in their first 30 days and those who did not. Compare 30-day, 60-day, and 90-day retention between cohorts. Repeat for every feature. The output is a ranked list of features by retention correlation. This list is the most actionable output any product team can generate.
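The per-feature cohort comparison reduces to a retention-lift calculation. A sketch over hypothetical user records, computing 90-day lift only (extending to 30- and 60-day retention is the same arithmetic with more fields):

```python
# Hypothetical user records: features adopted in first 30 days, 90-day retention.
users = [
    {"adopted": {"search", "export"}, "retained_90d": True},
    {"adopted": {"search"},           "retained_90d": True},
    {"adopted": {"macros"},           "retained_90d": False},
    {"adopted": {"search", "macros"}, "retained_90d": True},
    {"adopted": set(),                "retained_90d": False},
]

def retention_lift(users, feature):
    """90-day retention of the adopter cohort minus the non-adopter cohort."""
    adopters = [u for u in users if feature in u["adopted"]]
    others = [u for u in users if feature not in u["adopted"]]
    def rate(cohort):
        return sum(u["retained_90d"] for u in cohort) / len(cohort) if cohort else 0.0
    return rate(adopters) - rate(others)

# Rank every feature by retention lift: sticky features first, removal candidates last.
all_features = set().union(*(u["adopted"] for u in users))
ranked = sorted(all_features, key=lambda f: retention_lift(users, f), reverse=True)
for f in ranked:
    print(f, round(retention_lift(users, f), 2))
```

On real data, apply the 15% / 5% / zero thresholds from the benchmarks to this ranked list, and be wary of small cohorts, where lift estimates are noisy.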
Benchmarks
High-value sticky features: 15%+ retention improvement in cohort. Neutral: less than 5% improvement. Remove candidates: zero or negative correlation.
05. The tools
What each analytics tool is actually good at
Pendo
Feature adoption out of the box. Excellent per-feature usage reports without custom instrumentation. The right tool if you want the Standish-style analysis without engineering work.
Amplitude
Retention cohorts and funnel analysis. Best for the cohort-by-feature-adoption analysis. Requires instrumentation but gives you the most flexible query layer.
Mixpanel
Event-based analysis with strong segmentation. Good for power users building custom reports. Steeper learning curve than Pendo.
PostHog
Open-source, self-hosted option. All the same capabilities as Mixpanel with the option to keep data on your infrastructure. Good for companies with strict data residency requirements.
Heap
Retroactive analysis: instruments every interaction automatically, allowing you to query historical data without re-instrumentation. Useful when you realise after the fact that you needed a metric you did not instrument.