
Chapter 35

Testing in the Interview

An interview-round skill chapter on testing coding solutions live: happy paths, minimal inputs, empty cases, duplicates, negative and extreme values, malformed input, stateful behavior, randomized reasoning, and regression tests after fixes.

Part IV - Coding Interview Mastery

What live testing is really evaluating

Testing in a coding interview is not a ceremonial final step. It is how the interviewer sees whether you understand your own solution.

A candidate who writes code and says “this should work” has left the interviewer to do the verification. A senior candidate chooses tests that target the contract, the invariant, and the most likely failure points. The tests do not need a full framework. They need intent.

Testing evaluates whether you can:

  • translate requirements into observable examples;
  • identify the smallest cases that exercise the invariant;
  • include duplicates, empty inputs, negative values, extremes, and malformed input when the domain allows them;
  • test state across calls or iterations;
  • use randomized or property-style reasoning when exact expected outputs are cumbersome;
  • add regression tests after a bug fix instead of moving on with hope.

The strongest tests are not numerous. They are diagnostic.

What senior-level performance looks like

Weak testing:

“The sample works, so I think we are good.”

Mid-level testing:

“I will test the sample, an empty input, and maybe a big input.”

Senior testing:

“I want one normal case, one smallest valid case, one empty case, one duplicate case because the map stores counts, and one negative-value case because my sentinel must not collide with real data. If a test fails, I will keep that case as a regression after fixing the code.”

Senior candidates do three things differently:

  • They choose tests from the solution’s risk profile.
  • They explain what each test is meant to catch.
  • They use failures to improve the test set, not just the implementation.

The operating model

Use a four-pass testing loop.

| Pass | Purpose | Example question |
| --- | --- | --- |
| Contract tests | Does the function satisfy the stated behavior? | “Does the output shape and order match the prompt?” |
| Boundary tests | Does it handle minimal, empty, duplicate, negative, and extreme inputs? | “What is the smallest input that should work?” |
| Invariant tests | Does the core algorithm preserve its model? | “What case breaks if left moves backward?” |
| Regression tests | Does a discovered bug stay fixed? | “Can I rerun the failing case after the correction?” |

In a 45-minute round, you may not have time to run ten tests. You can still select three to five high-value cases and say which additional cases you would run if this were production code.

Essential testing knowledge

Happy path

The happy path is a representative valid input where the main algorithm should succeed.

It proves only that the implementation can work in ordinary conditions. It does not prove the boundary behavior. Use it to establish the baseline, then move quickly to more diagnostic cases.

Smallest valid input

The smallest valid input often catches initialization bugs:

  • one element in an array;
  • one node in a tree;
  • one interval;
  • one character;
  • one row or one column in a grid;
  • k = 1 for top-k prompts.

If the smallest valid input fails, the main invariant is usually not anchored correctly.

Empty input

Empty input catches missing guards and incorrect default outputs.

Examples:

  • an empty list should return [], 0, or None, or raise an error, depending on the contract;
  • empty string may be a valid answer, not malformed input;
  • empty grid may require checking rows before columns;
  • empty map state may need reset behavior.

Do not assume empty input is in scope. Ask whenever the answer would change the function contract.

Duplicates

Duplicates test whether the solution tracks membership, count, or position correctly.

Examples:

  • two equal values in two-sum;
  • repeated characters in a sliding window;
  • duplicate intervals;
  • equal frequencies in top-k;
  • repeated graph edges.

A set-based solution often fails when multiplicity matters. A map-based solution often fails when update order matters.
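As a minimal sketch of the two-sum duplicate case, assume the prompt asks for a pair of indices and accepts None when no pair exists. Both are assumptions, and the name `two_sum` is illustrative. The point is the update order: the complement is looked up before the current value is stored, so an element can pair with an equal earlier value but never with itself.

```python
def two_sum(nums, target):
    """Return indices (i, j), i < j, with nums[i] + nums[j] == target, else None."""
    index_of = {}  # value -> index of an earlier occurrence
    for i, value in enumerate(nums):
        complement = target - value
        # Look up BEFORE storing: storing first would let value pair with itself.
        if complement in index_of:
            return (index_of[complement], i)
        index_of[value] = i
    return None


# Duplicate case: two equal values must be allowed to form the pair.
assert two_sum([3, 3], 6) == (0, 1)
# Self-pairing case: the single 3 must not be used twice.
assert two_sum([3, 2, 4], 6) == (1, 2)
# Smallest input with no valid pair.
assert two_sum([3], 6) is None
```

If the lookup and the store were swapped, the `[3, 2, 4]` case would pair index 0 with itself, which is the update-order failure described above.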

Negative and extreme values

Negative values and extremes test sentinel choices, ordering, overflow risk, and numeric assumptions.

Use these when the domain includes:

  • negative numbers;
  • zero as a meaningful value;
  • very large integers;
  • min/max timestamps;
  • float precision;
  • k = 0, k = n, or k > n;
  • values at the boundary of comparison logic.

If the language has fixed-width integers, mention overflow risk when constraints approach the limit.
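The sentinel risk can be illustrated with a small sketch. The two function names are hypothetical, and returning None for an empty list is an assumed contract. The first version uses -1 as "nothing seen yet", which collides with real data as soon as every input is below -1.

```python
def max_with_bad_sentinel(values):
    """Buggy on purpose: -1 is treated as 'no value yet' but is also valid data."""
    best = -1
    for v in values:
        if v > best:
            best = v
    return best


def max_with_none(values):
    """Uses None as the sentinel, which cannot collide with any integer."""
    best = None
    for v in values:
        if best is None or v > best:
            best = v
    return best


# An all-negative input exposes the collision.
assert max_with_bad_sentinel([-3, -2]) == -1   # wrong: -1 is not in the input
assert max_with_none([-3, -2]) == -2
# Assumed contract: empty input yields None rather than a fake number.
assert max_with_none([]) is None
```

A happy-path test such as `[1, 5, 2]` passes for both versions, which is why the negative-value case is the diagnostic one here.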

Malformed input where applicable

Malformed input is not always part of an algorithm round. When it is in scope, test it deliberately instead of mixing it into normal cases.

Examples:

  • interval with start > end;
  • grid rows with inconsistent lengths;
  • null input;
  • missing fields in records;
  • invalid characters in a parser;
  • k outside the accepted range.

If the prompt says input is valid, do not spend the round building a validation subsystem. State the assumption and keep moving.

Stateful cases

Stateful tests matter when the code stores data across calls, mutates input, uses class fields, caches, recursion state, or shared collections.

Examples:

  • call the same object twice and verify counts reset or accumulate intentionally;
  • run two test cases after a backtracking solution and confirm paths are not shared;
  • check that the input is unchanged after a call unless mutation is part of the contract;
  • verify visited state does not leak between connected-component searches.

State leaks are bugs that senior candidates are expected to catch, because they often pass the first sample and fail only under repeated use.
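A sketch of a stateful check for a backtracking solution, using subset generation as an assumed example prompt (the name `subsets` and the output order are illustrative). The risk is that every stored result aliases the same mutable `path` list, so the test calls the function twice and confirms that the results are independent objects.

```python
def subsets(items):
    """Return all subsets of items, in depth-first order."""
    results = []
    path = []  # shared, mutated during recursion

    def backtrack(start):
        # Store a copy. Appending `path` itself would leave every stored
        # result pointing at the same list, which is empty once recursion unwinds.
        results.append(list(path))
        for i in range(start, len(items)):
            path.append(items[i])
            backtrack(i + 1)
            path.pop()

    backtrack(0)
    return results


first = subsets([1, 2])
second = subsets([3])

assert first == [[], [1], [1, 2], [2]]
assert second == [[], [3]]          # the second call does not see the first call's state
assert len({id(s) for s in first}) == len(first)  # no two results share one list
```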

Randomized and property-style reasoning

Some outputs are hard to compare exactly. Use properties when they are clearer than one expected value.

Examples:

  • sorting: output is ordered and contains the same multiset as input;
  • merge intervals: output intervals are non-overlapping and cover the same points as input;
  • top-k: every returned item has frequency at least as high as every omitted item;
  • shortest path: returned distance is no greater than any manually constructed path for a tiny graph;
  • randomized small arrays: compare optimized solution against a brute-force implementation.

In an interview, you may have time only to describe the property test. That still shows senior verification judgment.
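The sorting property from the list above can be written as a small checker. This is a sketch; the helper name `is_valid_sort` is hypothetical. It checks the two properties separately: the output is ordered, and it holds the same multiset of items as the input.

```python
from collections import Counter


def is_valid_sort(original, result):
    """True if result is non-decreasing and is a permutation of original."""
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    same_items = Counter(original) == Counter(result)
    return ordered and same_items


data = [3, 1, 2, 1]
assert is_valid_sort(data, sorted(data))
# Ordered, but a duplicate was dropped: the multiset check catches it.
assert not is_valid_sort(data, [1, 2, 3])
# Same items, but not ordered.
assert not is_valid_sort([2, 1], [2, 1])
```

Checking only the ordering would accept an output that silently lost a duplicate, so both properties are needed.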

Worked example

Prompt:

“Return the length of the longest substring without repeating characters.”

Core invariant:

The window from left to right contains no duplicate characters after each iteration.

High-value tests:

| Test | Input | Expected | What it catches |
| --- | --- | --- | --- |
| Happy path | "abcabcbb" | 3 | Basic sliding-window behavior. |
| Empty input | "" | 0 | Default answer and loop guard. |
| Smallest valid | "a" | 1 | Initialization. |
| All duplicates | "bbbbb" | 1 | Repeated shrink/update behavior. |
| Left must not move backward | "abba" | 2 | Last-seen index update bug. |
| Later unique run | "pwwkew" | 3 | Repeated character inside active window. |

The most diagnostic case is "abba". A common bug sets left = last_seen[char] + 1 without checking whether the prior occurrence is inside the active window. On the final a, that moves left backward from 2 to 1, so the window again contains both b characters and the function returns 3 instead of 2.
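The bug can be reproduced with a short sketch. The function name `longest_unique_buggy` is hypothetical, and the code is one plausible shape of the mistake rather than a canonical implementation. It passes the sample and still fails on "abba".

```python
def longest_unique_buggy(s):
    """Sliding window with a deliberate bug in the left-boundary update."""
    last_seen = {}  # char -> index of its most recent occurrence
    left = 0
    best = 0
    for right, char in enumerate(s):
        if char in last_seen:
            # BUG: ignores whether the earlier occurrence is still inside
            # the window, so left can jump backward.
            left = last_seen[char] + 1
        last_seen[char] = right
        best = max(best, right - left + 1)
    return best


assert longest_unique_buggy("abcabcbb") == 3   # the sample passes
assert longest_unique_buggy("abba") == 3       # wrong: the correct answer is 2
```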

A senior testing explanation:

“After the sample, I want to run abba because it specifically tests that left never moves backward. If that fails, the fix is not a random rewrite; it means the update needs left = max(left, last_seen[char] + 1). I would keep abba as the regression case.”

The test is tied to the invariant, not chosen from a generic edge-case list.

Annotated interaction

Testing during a live coding round

Interviewer: Your solution for longest substring looks complete. How would you test it?

Candidate: “I will start with the sample abcabcbb expecting 3, just to confirm the main path. Then I want the smallest valid case, a, expecting 1, and empty string expecting 0 if empty input is allowed.”

Interviewer: Empty is allowed.

Candidate: “The risky part is repeated characters and the left boundary. bbbbb should return 1. More importantly, abba should return 2; if I ever move left backward on the final a, that case catches it.”

Interviewer: Suppose abba returns 3.

Candidate: “Then the bug is in the repeat handling. I am probably using the last seen index even when it is outside the current window. I will update left with max(left, last_seen[char] + 1) and rerun abba as the regression test before moving on.”

The candidate explains why each test exists, diagnoses the likely bug from the failing case, and preserves the failure as a regression test.
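The fix and the regression rerun from the exchange above might look like the following sketch. The function name `length_of_longest_substring` and the `CASES` table are illustrative; the expected values come from the high-value test table in the worked example.

```python
def length_of_longest_substring(s):
    """Length of the longest substring of s with no repeated character."""
    last_seen = {}  # char -> index of its most recent occurrence
    left = 0        # left edge of the current duplicate-free window
    best = 0
    for right, char in enumerate(s):
        if char in last_seen:
            # max() keeps left from moving backward when the earlier
            # occurrence lies outside the active window.
            left = max(left, last_seen[char] + 1)
        last_seen[char] = right
        best = max(best, right - left + 1)
    return best


# (name, input, expected): each case is tied to the risk it targets.
CASES = [
    ("happy path", "abcabcbb", 3),
    ("empty input", "", 0),
    ("smallest valid", "a", 1),
    ("all duplicates", "bbbbb", 1),
    ("regression: left must not move backward", "abba", 2),
    ("later unique run", "pwwkew", 3),
]

for name, text, expected in CASES:
    actual = length_of_longest_substring(text)
    assert actual == expected, f"{name}: got {actual}, expected {expected}"
```

Naming the "abba" row as a regression keeps the reason for the case visible if the function is changed again later.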

Response quality by maturity

Testing maturity

Weak

Runs only the sample or manually eyeballs the code. Edge cases are guessed after prompting, and failures trigger broad rewrites.

Mid-level

Runs several reasonable cases, but the cases are not tied clearly to the invariant or the implementation risks.

Senior

Selects diagnostic tests from the contract and invariant, covers boundary and state risks, explains expected results, and adds regression tests after fixes.

Test matrix

Use this matrix as a selection tool, not a requirement to run every row.

| Category | Use when | Example | Expected signal |
| --- | --- | --- | --- |
| Happy path | Every prompt. | Typical valid input from the prompt. | Main behavior works. |
| Smallest valid input | The domain has a minimum size. | One item, one node, one interval, one character. | Initialization is correct. |
| Empty input | Empty is valid or needs clarification. | [], "", empty grid. | Guard and default output are correct. |
| Duplicates | Membership, counts, ordering, or positions matter. | Repeated value, repeated char, equal priority. | Multiplicity and tie behavior are correct. |
| Negative values | Numeric domain allows negatives. | [-3, -1, 0, 2]. | Sentinels and comparisons are safe. |
| Extreme values | Constraints approach limits. | Large n, max integer, k = n. | Complexity and numeric assumptions hold. |
| Malformed input | Prompt includes validation or real-world parsing. | Bad interval, null record, invalid token. | Contract handles invalid data deliberately. |
| Stateful sequence | Code stores or mutates state. | Two calls on same object, two DFS components. | State does not leak accidentally. |
| Unordered output | Output order is irrelevant. | All valid pairs or grouped anagrams. | Comparison checks sets or multisets correctly. |
| Property-style | Exact output is hard or many outputs are valid. | Sorted result, merged intervals, randomized brute-force comparison. | General correctness properties hold. |
| Regression | A bug was found. | The failing input after the fix. | The fix targets the defect and stays fixed. |

Senior trade-offs

Testing enough without losing the clock

Testing can consume the end of the round if it is unbounded. Select tests by risk:

  1. sample or happy path;
  2. smallest or empty boundary;
  3. one case aimed at the core invariant;
  4. one case aimed at a known implementation risk;
  5. regression case if a bug appeared.

If time is short, say the unrun tests explicitly:

“Given the time, I ran the sample and the duplicate boundary. I would also test empty input and negative values if negatives are in scope.”

That is better than pretending the solution is fully validated.

Testing invalid input without overbuilding

If the interviewer says input is valid, do not turn the solution into a parser or validator. State:

“I am assuming valid intervals as given. If this were a production API, I would validate start <= end before the merge step.”

If malformed input is part of the prompt, isolate validation so it does not obscure the algorithm.
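One way to keep validation separate from the algorithm is sketched below for the merge-intervals case. The function names, the choice of `ValueError`, and the decision to merge intervals that merely touch are all assumptions made for illustration.

```python
def validate_intervals(intervals):
    """Raise ValueError for any interval whose start is after its end."""
    for start, end in intervals:
        if start > end:
            raise ValueError(f"invalid interval: start {start} > end {end}")


def merge_intervals(intervals):
    """Merge overlapping intervals. Assumes input was already validated."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


# Normal cases exercise only the algorithm.
assert merge_intervals([(1, 3), (2, 6), (8, 10)]) == [[1, 6], [8, 10]]
assert merge_intervals([]) == []

# Malformed input is tested on its own, against the validator.
try:
    validate_intervals([(5, 1)])
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for start > end")
```

Because the merge step never checks validity itself, the malformed-input test stays out of the normal cases, and the validator can be dropped entirely when the prompt guarantees valid input.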

Comparing unordered outputs

Some correct answers can appear in multiple orders. Do not write tests that fail because of accidental ordering unless order is required.

Examples:

  • compare sets for unique unordered values;
  • compare sorted lists for unordered pairs;
  • compare multisets when duplicates matter;
  • check properties for graph traversals with multiple valid paths.

Say what equality means before testing.
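Two possible definitions of equality are sketched below as comparison helpers. The names `same_pairs` and `same_groups` are hypothetical, and the sketch assumes the elements are sortable and hashable. The first treats (a, b) and (b, a) as the same pair and respects multiplicity; the second ignores both the order of groups and the order within each group.

```python
from collections import Counter


def _normalize_pairs(pairs):
    # Sort inside each pair, then count, so duplicates still matter.
    return Counter(tuple(sorted(p)) for p in pairs)


def same_pairs(actual, expected):
    """Order-insensitive, multiplicity-sensitive comparison of pairs."""
    return _normalize_pairs(actual) == _normalize_pairs(expected)


def same_groups(actual, expected):
    """Compare groupings while ignoring group order and in-group order."""
    def normalize(groups):
        return sorted(sorted(g) for g in groups)
    return normalize(actual) == normalize(expected)


assert same_pairs([(2, 1), (3, 4)], [(4, 3), (1, 2)])
assert not same_pairs([(1, 2)], [(1, 2), (1, 2)])   # multiplicity differs
assert same_groups([["tea", "eat"], ["bat"]], [["bat"], ["eat", "tea"]])
```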

Using brute force as an oracle

For optimized algorithms, a brute-force version can be a powerful test oracle on small inputs.

Example:

“For random arrays up to length eight, I can compare this O(n log n) implementation against a straightforward O(n²) checker. I would not run that fully in this interview, but it is the property-style test I would use to harden the solution.”

This shows verification depth without derailing the live implementation.
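A sketch of the oracle pattern follows, using this chapter's longest-substring prompt rather than the O(n log n) case in the quote. All function names, the three-letter alphabet, the trial count, and the fixed seed are assumptions chosen to keep inputs small and the run repeatable.

```python
import random


def longest_unique_fast(s):
    """Sliding-window solution under test."""
    last_seen, left, best = {}, 0, 0
    for right, char in enumerate(s):
        if char in last_seen:
            left = max(left, last_seen[char] + 1)
        last_seen[char] = right
        best = max(best, right - left + 1)
    return best


def longest_unique_brute(s):
    """Oracle: check every substring directly. Slow but easy to trust."""
    best = 0
    for i in range(len(s)):
        for j in range(i, len(s)):
            window = s[i:j + 1]
            if len(set(window)) == len(window):
                best = max(best, len(window))
    return best


def find_mismatch(trials=500, seed=0):
    """Return the first random input where the two versions disagree, else None."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        length = rng.randint(0, 8)
        # A tiny alphabet forces frequent repeats, which is where the risk is.
        s = "".join(rng.choice("abc") for _ in range(length))
        if longest_unique_fast(s) != longest_unique_brute(s):
            return s
    return None


assert find_mismatch() is None
```

If `find_mismatch` ever returns a string, that string is a ready-made regression case for the fast version.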

Failure modes and red flags

Common testing failures:

  • treating the sample as proof;
  • listing edge cases without expected outputs;
  • choosing cases unrelated to the algorithm’s risks;
  • ignoring empty input after using items[0];
  • missing duplicates when using sets or maps;
  • missing negative values after choosing -1 as a sentinel;
  • testing malformed input even though the prompt guarantees validity and time is tight;
  • failing to rerun a case after a fix;
  • comparing unordered outputs as ordered outputs;
  • using random tests without a property or oracle;
  • forgetting stateful repeated-call behavior.

Interviewer red flags include:

  • “The candidate did not know what would break their solution.”
  • “They fixed a bug but did not rerun the failing input.”
  • “Their tests were generic and missed the invariant.”
  • “They relied on visual inspection instead of expected results.”

Practice drills

Live testing drills

  • For ten solved prompts, write a five-row test matrix with expected outputs and the bug each case targets.
  • Take three sliding-window problems and identify the one case that catches boundary regression.
  • Take three graph problems and test empty graph, single node, disconnected components, and repeated edges where applicable.
  • Take three heap or sorting prompts and include a tie case.
  • For one optimized solution, write a brute-force checker for small inputs and compare randomized cases.
  • For one backtracking solution, test that stored outputs are not mutated after recursion unwinds.
  • For one class-based prompt, call methods in two different sequences and check state reset or accumulation.
  • After fixing any bug in practice, add the failing input as a named regression case.

For each drill, write the expected output before running the code. If you cannot predict the output, the test is not yet a test.

Self-check rubric

Testing in the interview rubric

| Score | Evidence |
| --- | --- |
| 1 - Weak | Runs only samples, lacks expected outputs, misses obvious boundaries, and treats failures as reasons to rewrite broadly. |
| 3 - Usable | Covers common cases and some edges, but test selection is partly generic and regression behavior is inconsistent. |
| 5 - Senior-ready | Chooses diagnostic tests from the contract, invariant, and implementation risks; explains expected results; tests stateful behavior when relevant; and preserves bug cases as regressions. |

Before ending the round, ask:

  • Did I run or describe a happy path?
  • Did I test the smallest valid input?
  • Did I clarify or test empty input?
  • Did I include duplicates when multiplicity matters?
  • Did I include negative or extreme values when the domain allows them?
  • Did I isolate malformed input only when it is in scope?
  • Did I test state across calls, loops, recursion branches, or shared structures when relevant?
  • Did I compare unordered outputs correctly?
  • Did I add a regression case after every fix?

Field reference


Testing in the interview

  • Start with the contract, then test the invariant.
  • Use a small set of diagnostic cases rather than a long generic list.
  • Cover happy path, smallest valid input, empty input, duplicates, negative/extreme values, malformed input when applicable, and stateful behavior when relevant.
  • Give expected outputs before running.
  • Tie each test to a likely bug.
  • For unordered outputs, define equality before comparing.
  • Use property-style or brute-force checks when exact outputs are hard.
  • After a fix, rerun the failing case and keep it as a regression.
  • If time runs out, state the high-value tests left unrun.