How to Approach Any System Design Interview: A Step-by-Step Framework

If you’ve ever walked into a system design interview and immediately started drawing microservices, load balancers, and Kafka clusters—only to have the interviewer stop you 10 minutes in and ask, “Wait, what problem are we actually solving?”—you’re not alone.

This is the #1 mistake candidates make.

The truth is, system design interviews aren’t about memorizing Netflix’s architecture or knowing every AWS service. They’re about demonstrating structured thinking, clear communication, and justified decision-making.

In this guide, I’ll walk you through the exact 7-step framework that works for any system design question—whether you’re designing a URL shortener, Twitter, or a video streaming platform.

Why Most Candidates Fail (And How to Avoid It)

Before we dive into the framework, let’s address three common mistakes that sink otherwise strong candidates:

Mistake #1: Jumping Straight to Architecture

You hear “Design Instagram” and immediately start drawing boxes labeled “Microservices,” “Kafka,” “Redis,” and “Kubernetes.”

The problem? You haven’t clarified what you’re actually building. Are we focusing on the photo upload flow? The feed generation? Direct messaging? Each would lead to a different architecture.

The fix: Always start with requirements clarification (Step 1 below).

Mistake #2: Over-Engineering Without Justification

You design for 10 billion users when the interviewer mentioned 10 million. You add complex distributed tracing before discussing basic database choices.

The problem? You’re solving problems that don’t exist yet. Interviewers want to see you match complexity to actual requirements.

The fix: Let the scale and requirements drive your design decisions, not the other way around.

Mistake #3: Not Articulating Trade-offs

You pick Cassandra over PostgreSQL, or choose eventual consistency over strong consistency, but you never explain why.

The problem? Every architectural decision involves trade-offs. Not acknowledging them shows a lack of depth.

The fix: For every major decision, explicitly state what you’re gaining and what you’re giving up.

The 7-Step Framework

Here’s the structured approach that will work for any system design interview:

1. Clarify Requirements
2. Back-of-the-Envelope Estimation
3. Define the API
4. High-Level Design
5. Deep Dive into Components
6. Identify Bottlenecks & Scale
7. Discuss Trade-offs

Let’s break down each step with a concrete example: Designing Instagram.

Step 1: Clarify Requirements (3-5 minutes)

The interviewer gives you something vague like “Design Instagram.” Your job is to turn this into a clear, scoped problem.

Don’t assume. Ask questions. Write down the answers.

I break requirements into two categories:

Functional Requirements

What features are we building?

For Instagram, you might clarify:

Upload photos
Follow/unfollow users
View personalized feed
Like and comment on photos

Non-Functional Requirements

What are the scale, performance, and reliability expectations?

Examples:

Scale: 500 million total users
Latency: Feed loads in under 1 second
Availability: 99.9% uptime
Storage: Support for high-resolution photos
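
It helps to translate an availability target into a concrete downtime budget before committing to it. Here is a minimal sketch of that arithmetic; the function name is my own, and the year is simplified to 365 days.

```python
# Convert an availability target into an allowed-downtime budget.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year the system may be down while still meeting the target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


for target in (99.0, 99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% uptime allows about {hours:.2f} hours of downtime per year")
```

At 99.9%, that works out to roughly 8.76 hours per year, which is a useful number to say out loud when the interviewer asks how strict the availability requirement really is.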

Why this matters: If you skip this step, you might design for the wrong problem. The interviewer is testing whether you can handle ambiguity and ask intelligent questions.

Pro tip: Write these down in two columns (Functional | Non-Functional) on your whiteboard. It keeps you organized and shows structured thinking.

Step 2: Back-of-the-Envelope Estimation (3-5 minutes)

This is where many candidates panic, but it’s easier than you think. You’re not doing calculus—you’re just estimating orders of magnitude.

Example Calculation for Instagram:

Users & Activity:

500 million total users
10% are daily active users (DAU) = 50 million DAU
Each user uploads 1 photo per day on average

Storage:

50 million photos per day
Average photo size: 2 MB
Daily storage: 50M × 2 MB = 100 TB/day
Annual storage: 100 TB × 365 = ~36 PB/year

Traffic:

Read-heavy system (users view far more than they upload)
Assume 100:1 read-to-write ratio
50M writes/day = ~580 writes/second
5B reads/day = ~58,000 reads/second

What This Tells You:

✅ You need distributed storage (can’t use a single database)
✅ You need caching (58K reads/sec is too much for DB alone)
✅ You need CDN for photo delivery (reduce latency globally)

Why this matters: These calculations guide your architectural decisions. You’re not pulling Redis or Cassandra out of thin air—you’re showing the math that justifies them.
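
If you want to sanity-check the numbers above, the whole estimation fits in a few lines. This is a rough sketch using the same assumptions (500 million users, 10% daily active, 1 photo per day, 2 MB per photo, 100:1 reads to writes) and decimal units (1 TB = 1,000,000 MB); the variable names are mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

total_users = 500_000_000
dau = total_users * 10 // 100        # 10% are daily active users
photos_per_day = dau * 1             # each DAU uploads 1 photo per day
photo_size_mb = 2
read_write_ratio = 100               # 100 reads for every write

# Storage, using decimal units: 1 TB = 1,000,000 MB and 1 PB = 1,000 TB
daily_storage_tb = photos_per_day * photo_size_mb / 1_000_000
annual_storage_pb = daily_storage_tb * 365 / 1_000

# Traffic, averaged over the day (peak traffic will be higher)
writes_per_sec = photos_per_day / SECONDS_PER_DAY
reads_per_sec = photos_per_day * read_write_ratio / SECONDS_PER_DAY

print(f"Storage: {daily_storage_tb:.0f} TB/day, {annual_storage_pb:.1f} PB/year")
print(f"Traffic: {writes_per_sec:.0f} writes/sec, {reads_per_sec:.0f} reads/sec")
```

The exact results are 36.5 PB/year, about 579 writes/sec, and about 57,870 reads/sec, which is why rounding to ~36 PB, ~580, and ~58,000 is fine in an interview. These are daily averages, so it is worth mentioning that peak load will sit above them.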

Step 3: Define the API (2-3 minutes)

This step is optional, but I always do it because it forces you to think about data flow and inputs/outputs.

Pick 2-3 core APIs. Don’t try to design every endpoint.

Example for Instagram:

POST /api/v1/upload
Request:
  - userId: string
  - photo: binary
  - caption: string
Response:
  - photoId: string
  - uploadUrl: string

GET /api/v1/feed
Request:
  - userId: string
  - pageToken: string (for pagination)
Response:
  - photos: [Photo]
  - nextPageToken: string

POST /api/v1/follow
Request:
  - userId: string
  - targetUserId: string
Response:
  - success: boolean

Why this matters: It shows you understand the system from the client’s perspective. It also makes the next step (high-level design) easier because you know what data needs to flow where.
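
To make the feed endpoint’s pagination contract concrete, here is a small in-memory sketch. Everything in it is illustrative: the `Photo` and `FeedResponse` types, the helper names, and the choice to encode a simple offset in the page token are my assumptions, not a prescribed design. A production feed would more likely use a cursor based on a timestamp or ID.

```python
import base64
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Photo:
    photo_id: str
    caption: str


@dataclass
class FeedResponse:
    photos: List[Photo]
    next_page_token: Optional[str]  # None means there are no more pages


def encode_token(offset: int) -> str:
    """Hide the offset in an opaque token so clients cannot rely on its format."""
    return base64.urlsafe_b64encode(str(offset).encode("ascii")).decode("ascii")


def decode_token(token: Optional[str]) -> int:
    """Return the starting offset; a missing token means the first page."""
    if not token:
        return 0
    return int(base64.urlsafe_b64decode(token.encode("ascii")).decode("ascii"))


def get_feed(feed: List[Photo], page_token: Optional[str] = None,
             page_size: int = 2) -> FeedResponse:
    """Serve one page of an already-ranked feed (ranking is out of scope here)."""
    start = decode_token(page_token)
    end = start + page_size
    next_token = encode_token(end) if end < len(feed) else None
    return FeedResponse(photos=feed[start:end], next_page_token=next_token)


feed = [Photo(f"p{i}", f"caption {i}") for i in range(5)]
first = get_feed(feed)
second = get_feed(feed, first.next_page_token)
print([p.photo_id for p in first.photos], [p.photo_id for p in second.photos])
```

The point of the opaque token is that the client only ever passes back what the server gave it, so you can change the pagination scheme later without breaking clients.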

Step 4: High-Level Design (5-7 minutes)

Now comes the fun part: drawing your architecture.

Start Simple, Then Layer Complexity

Step 4a: The Basics

Client → API Server → Database

Step 4b: Add Components Based on Requirements

Based on our estimation:

Need to handle 58K reads/sec → Add Load Balancer + Cache
Need to store 36PB/year of photos → Add Object Storage (S3)
Need low-latency global access → Add CDN
Need to handle user data + relationships → Add SQL Database
Need to handle feed data at scale → Add NoSQL Database

Final High-Level Architecture:

                    ┌─────────┐
                    │ Client  │
                    └────┬────┘
                         │
                    ┌────▼─────────┐
                    │Load Balancer │
                    └────┬─────────┘
                         │
              ┌──────────┴──────────┐
              │                     │
         ┌────▼────┐           ┌───▼────┐
         │ API     │           │ API    │
         │ Server 1│           │Server 2│
         └────┬────┘           └───┬────┘
              │                    │
      ┌───────┼────────────────────┤
      │       │                    │
  ┌───▼───┐ ┌─▼──────┐   ┌────────▼─────┐
  │ Cache │ │SQL DB  │   │Object Storage│
  │(Redis)│ │(Users, │   │  (S3/Photos) │
  └───────┘ │Follow) │   └──────┬───────┘
            └────────┘           │
                           ┌─────▼─────┐
            ┌──────────┐   │    CDN    │
            │NoSQL DB  │   └───────────┘
            │ (Feeds)  │
            └──────────┘

The Key: Explain why you’re adding each component.

Load Balancer: Distribute traffic across multiple API servers
Cache (Redis): Reduce database load for frequently accessed data (hot feeds)
SQL Database: Store structured data (users, follows, likes) with ACID guarantees
NoSQL Database: Store feed data that needs horizontal scaling
Object Storage: Cost-effective storage for binary data (photos)
CDN: Serve static content from edge locations (reduce latency)

Step 5: Deep Dive into Components (5-7 minutes)

The interviewer will pick 1-2 areas and say, “Let’s go deeper.”

This is where trade-offs become critical.

Example Deep Dive: Database Choices

Why SQL (PostgreSQL) for User Data?

✅ ACID transactions (useful for follows and account changes)
✅ Strong consistency
✅ Relationships (users, follows, likes)
✅ Well-understood query patterns
❌ Harder to scale horizontally

Why NoSQL (Cassandra) for Feed Data?

✅ Horizontal scalability (partition by userId)
✅ High write throughput
✅ Eventual consistency acceptable for feeds
❌ No complex joins
❌ No multi-row ACID transactions

You might use both. User profiles and relationships in SQL. Activity feeds in NoSQL.

Example Deep Dive: Caching Strategy

What to cache?

User profiles (rarely change)
Hot feeds (top 100 posts for active users)
Follower counts

Cache write and expiry strategies:

Write-through: Write to cache and database simultaneously (slower writes, consistent reads)
Write-back: Write to cache first, async to database (faster writes, risk of data loss)
TTL-based: Set time-to-live for cached data (good for feeds that can tolerate staleness)

For Instagram feeds: Use TTL-based caching with 30-second expiry. Users won’t notice if their feed is 30 seconds old, but you’ll massively reduce database load.
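
Here is a minimal sketch of that TTL idea, using an in-memory dictionary as a stand-in for Redis. The `TTLCache` class and the `load_feed_from_db` function are hypothetical names for illustration; in a real system the cache itself would usually handle expiry for you.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Read-through cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: the database is not touched
        value = loader()     # miss or expired: fall through to the database
        self._entries[key] = (now + self._ttl, value)
        return value


def load_feed_from_db() -> list:
    """Hypothetical stand-in for an expensive feed query."""
    print("querying database...")
    return ["photo-3", "photo-2", "photo-1"]


feed_cache = TTLCache(ttl_seconds=30)
feed_cache.get_or_load("feed:user-42", load_feed_from_db)  # hits the database
feed_cache.get_or_load("feed:user-42", load_feed_from_db)  # served from cache
```

Within the 30-second window, repeated feed loads for the same user cost one database query instead of many. The price is the staleness described above.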

Step 6: Identify Bottlenecks & Scale (3-4 minutes)

The interviewer will ask: “What happens when you hit 10x traffic?”

Walk through potential failure points:

Bottleneck 1: Database

Problem: At 10x traffic, reads grow from ~58K to ~580K per second, far more than a single SQL database can serve.

Solutions:

Read Replicas: Separate read and write traffic
Sharding: Partition users across multiple databases (shard by userId)
Cache Layer: Reduce database hits by 80-90%
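
The first two solutions can be sketched together: hash the userId to pick a shard, then send writes to that shard’s primary and spread reads across its replicas. The node names and the `Shard` class below are hypothetical, and simple modulo hashing is an assumption chosen for brevity.

```python
import hashlib
import itertools
from typing import List


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a userId to a shard index with a stable hash (same input, same shard)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class Shard:
    """One primary for writes plus read replicas served round-robin."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # Fall back to the primary if the shard has no replicas.
        self._replica_cycle = itertools.cycle(replicas or [primary])

    def node_for(self, is_write: bool) -> str:
        return self.primary if is_write else next(self._replica_cycle)


# Hypothetical cluster: 2 shards, each with 1 primary and 2 read replicas.
shards = [
    Shard("db0-primary", ["db0-replica-a", "db0-replica-b"]),
    Shard("db1-primary", ["db1-replica-a", "db1-replica-b"]),
]

shard = shards[shard_for_user("user-42", len(shards))]
print("write goes to:", shard.node_for(is_write=True))
print("read goes to:", shard.node_for(is_write=False))
```

One trade-off worth naming if you draw this: with modulo hashing, changing the number of shards moves most users to a different shard. Consistent hashing is the usual way to limit that reshuffling.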

Bottleneck 2: API Servers

Problem: Traffic spikes during major events (celebrity posts, etc.)

Solutions:

Horizontal Scaling: Add more API servers behind load balancer
Auto-scaling: Use Kubernetes/ECS to scale based on CPU/memory
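
If the interviewer asks how auto-scaling decides on a server count, the core rule is simple proportional scaling. The sketch below follows the formula documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × currentMetric / targetMetric)), but it is a simplification: the function name and the min/max defaults are my assumptions, and real autoscalers add tolerances and cooldown windows.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_percent: float,
                     target_cpu_percent: float,
                     min_replicas: int = 2, max_replicas: int = 50) -> int:
    """Scale the replica count in proportion to how far CPU is from its target."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    # Clamp so a traffic spike cannot scale without bound and a quiet period
    # cannot scale below the redundancy floor.
    return max(min_replicas, min(max_replicas, raw))


# 4 servers running at 90% CPU with a 60% target: scale out
print(desired_replicas(4, current_cpu_percent=90, target_cpu_percent=60))
# 4 servers running at 30% CPU with a 60% target: scale in
print(desired_replicas(4, current_cpu_percent=30, target_cpu_percent=60))
```

With these inputs, the first call asks for 6 servers and the second for 2.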

Bottleneck 3: Photo Storage

Problem: 36PB/year keeps growing.

Solutions:

Cold Storage: Move old photos to cheaper storage (S3 Glacier)
Compression: Optimize images on upload
Smart Serving: Serve different resolutions based on device
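
Two of these solutions reduce to small decision rules, sketched here. The rendition widths, the one-year cold-storage threshold, and the function names are all hypothetical values chosen for illustration; in practice an object store’s lifecycle rules would typically handle the tiering.

```python
from datetime import date

# Hypothetical set of image widths (in pixels) generated at upload time.
RENDITION_WIDTHS = (320, 640, 1080, 2048)


def pick_rendition(device_width_px: int) -> int:
    """Smallest pre-generated width that covers the device, else the largest."""
    for width in RENDITION_WIDTHS:
        if width >= device_width_px:
            return width
    return RENDITION_WIDTHS[-1]


def storage_tier(uploaded_on: date, today: date, cold_after_days: int = 365) -> str:
    """Photos older than the threshold move to cheaper, slower storage."""
    age_days = (today - uploaded_on).days
    return "cold" if age_days > cold_after_days else "hot"


print(pick_rendition(375))                                # phone-sized screen
print(storage_tier(date(2020, 1, 1), date(2024, 1, 1)))   # four-year-old photo
```

Here the phone gets the 640-pixel rendition rather than the full-resolution original, and the old photo is classified as cold.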

Why this matters: You’re showing you can think beyond the happy path. Real systems fail, and you need to anticipate where and how.

Step 7: Wrap Up & Discuss Trade-offs (2-3 minutes)

Summarize your design in 30 seconds. Then mention 1-2 trade-offs.

Example Summary:

“We designed a scalable Instagram architecture using a microservices approach. User data and relationships go into PostgreSQL for consistency. Feeds are stored in Cassandra for horizontal scaling. Photos are stored in S3 and served via CDN for low latency. We cache hot feeds in Redis to handle 58K reads/sec.”

Trade-offs to Mention:

1. Eventual Consistency in Feeds

✅ Gain: Better latency and scalability
❌ Cost: Users might see slightly stale data (a post from 30 seconds ago might not appear immediately)

2. Microservices Architecture

✅ Gain: Independent scaling, team autonomy
❌ Cost: Operational complexity, distributed debugging

3. NoSQL for Feeds

✅ Gain: Massive write throughput
❌ Cost: No multi-row ACID transactions, eventual consistency

Why this matters: Acknowledging trade-offs shows maturity. There’s no perfect architecture—only architectures that match your requirements and constraints.

What Interviewers Are Really Evaluating

Remember, the interviewer isn’t looking for the “correct” architecture (there isn’t one). They’re evaluating:

Can you handle ambiguity? (Step 1: Requirements)
Can you think quantitatively? (Step 2: Estimation)
Can you justify decisions? (Steps 4-7: Design + Trade-offs)
Can you communicate clearly? (All steps)
Do you know when to stop? (Not over-engineering)

How to Practice This Framework

Pick a system (Twitter, Uber, Netflix, URL shortener)
Set a timer for 45 minutes
Walk through all 7 steps out loud (yes, talk to yourself—it matters)
Draw on a whiteboard or Excalidraw
Review: What did you miss? Where did you over-complicate?

Repeat this 10-15 times with different systems. The framework will become second nature.
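
A quick check on pacing: adding up the time ranges from the seven step headings shows how much slack a 45-minute session leaves for interviewer questions and detours. The dictionary below just restates those headings.

```python
# (min, max) minutes per step, taken from the step headings in this guide
STEP_MINUTES = {
    "Clarify Requirements": (3, 5),
    "Back-of-the-Envelope Estimation": (3, 5),
    "Define the API": (2, 3),
    "High-Level Design": (5, 7),
    "Deep Dive into Components": (5, 7),
    "Identify Bottlenecks & Scale": (3, 4),
    "Wrap Up & Discuss Trade-offs": (2, 3),
}
SESSION_MINUTES = 45

fastest = sum(low for low, _ in STEP_MINUTES.values())
slowest = sum(high for _, high in STEP_MINUTES.values())

print(f"Framework takes {fastest}-{slowest} minutes")
print(f"Slack in a {SESSION_MINUTES}-minute session: "
      f"{SESSION_MINUTES - slowest}-{SESSION_MINUTES - fastest} minutes")
```

The steps total 23 to 34 minutes, so even at the slow end you keep about 11 minutes in reserve. If a practice run overshoots 45 minutes, the deep dive is usually where the time went.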

Common Systems to Practice On

Easy: URL Shortener, Pastebin, Rate Limiter
Medium: Twitter, Instagram, Notification System
Hard: Uber, Google Drive, YouTube, Distributed Cache

Start easy. Build muscle memory. Then tackle the hard ones.

Final Thoughts

System design interviews feel intimidating because they’re open-ended. But that’s exactly why this framework works—it gives you structure in ambiguity.

You’re not memorizing architectures. You’re learning a thinking process that applies to any problem.

Master this framework, and you’ll walk into your next system design interview with confidence.

What’s Next?

If you found this helpful, I’m working on a comprehensive course where we’ll apply this exact framework to 15+ real interview problems—complete with diagrams, deep dives, and trade-off discussions.

For now, practice this framework religiously. Print it out. Pin it to your wall. Use it in mock interviews.

You’ve got this.

Want to go deeper? Watch the full video walkthrough where I demonstrate this framework step-by-step using Excalidraw diagrams: [Link to Video]

Questions or feedback? Drop a comment below or reach out—I read everything.

Happy learning, and good luck with your interviews!

— Nitin
