Chapter 35
Testing in the Interview
An interview-round skill chapter on testing coding solutions live: happy paths, minimal inputs, empty cases, duplicates, negative and extreme values, malformed input, stateful behavior, randomized reasoning, and regression tests after fixes.
What live testing is really evaluating
Testing in a coding interview is not a ceremonial final step. It is how the interviewer sees whether you understand your own solution.
A candidate who writes code and says “this should work” has left the interviewer to do the verification. A senior candidate chooses tests that target the contract, the invariant, and the most likely failure points. The tests do not need a full framework. They need intent.
Testing evaluates whether you can:
- translate requirements into observable examples;
- identify the smallest cases that exercise the invariant;
- include duplicates, empty inputs, negative values, extremes, and malformed input when the domain allows them;
- test state across calls or iterations;
- use randomized or property-style reasoning when exact expected outputs are cumbersome;
- add regression tests after a bug fix instead of moving on with hope.
The strongest tests are not numerous. They are diagnostic.
What senior-level performance looks like
Weak testing:
“The sample works, so I think we are good.”
Mid-level testing:
“I will test the sample, an empty input, and maybe a big input.”
Senior testing:
“I want one normal case, one smallest valid case, one empty case, one duplicate case because the map stores counts, and one negative-value case because my sentinel must not collide with real data. If a test fails, I will keep that case as a regression after fixing the code.”
Senior candidates do three things differently:
- They choose tests from the solution’s risk profile.
- They explain what each test is meant to catch.
- They use failures to improve the test set, not just the implementation.
The operating model
Use a four-pass testing loop.
| Pass | Purpose | Example question |
|---|---|---|
| Contract tests | Does the function satisfy the stated behavior? | “Does the output shape and order match the prompt?” |
| Boundary tests | Does it handle minimal, empty, duplicate, negative, and extreme inputs? | “What is the smallest input that should work?” |
| Invariant tests | Does the core algorithm preserve its model? | “What case breaks if left moves backward?” |
| Regression tests | Does a discovered bug stay fixed? | “Can I rerun the failing case after the correction?” |
In a 45-minute round, you may not run ten tests. You can still select three to five high-value cases and say what additional cases you would run if this were production code.
Essential testing knowledge
Happy path
The happy path is a representative valid input where the main algorithm should succeed.
It proves only that the implementation can work in ordinary conditions. It does not prove the boundary behavior. Use it to establish the baseline, then move quickly to more diagnostic cases.
Smallest valid input
The smallest valid input often catches initialization bugs:
- one element in an array;
- one node in a tree;
- one interval;
- one character;
- one row or one column in a grid;
- `k = 1` for top-k prompts.
If the smallest valid input fails, the main invariant is usually not anchored correctly.
Empty input
Empty input catches missing guards and incorrect default outputs.
Examples:
- empty list should return `[]`, `0`, `None`, or raise, depending on the contract;
- empty string may be a valid answer, not malformed input;
- empty grid may require checking rows before columns;
- empty map state may need reset behavior.
Do not assume empty input is in scope. Ask when it changes the function contract.
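The rows-before-columns guard can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper named `grid_shape` and an assumed contract in which an empty grid is valid and reports zero rows and zero columns; a real prompt may define the empty case differently.

```python
def grid_shape(grid):
    """Return (rows, cols) for a rectangular grid given as a list of lists.

    Assumed contract for this sketch: an empty grid is valid and has shape (0, 0).
    """
    rows = len(grid)
    # Check rows first: grid[0] raises IndexError when the grid is empty.
    cols = len(grid[0]) if rows > 0 else 0
    return rows, cols


print(grid_shape([[1, 2, 3], [4, 5, 6]]))  # (2, 3)
print(grid_shape([]))                      # (0, 0)
print(grid_shape([[]]))                    # (1, 0)
```

The third call is worth saying aloud in the round: a grid with one empty row is a different input from a grid with no rows.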
Duplicates
Duplicates test whether the solution tracks membership, count, or position correctly.
Examples:
- two equal values in two-sum;
- repeated characters in a sliding window;
- duplicate intervals;
- equal frequencies in top-k;
- repeated graph edges.
A set-based solution often fails when multiplicity matters. A map-based solution often fails when update order matters.
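Both failure shapes show up in a small two-sum sketch. The function names below are hypothetical, and the first function is deliberately wrong so that the duplicate test has something to catch.

```python
def has_pair_set_based(nums, target):
    """Buggy on purpose: a set forgets how many times each value occurs."""
    seen = set(nums)
    # A single 3 "pairs" with itself when the target is 6.
    return any(target - x in seen for x in nums)


def two_sum_indices(nums, target):
    """Return indices (i, j) with i < j and nums[i] + nums[j] == target, or None."""
    last_index = {}
    for j, value in enumerate(nums):
        # Look up the complement BEFORE recording the current value,
        # so an element can never be paired with itself.
        i = last_index.get(target - value)
        if i is not None:
            return (i, j)
        last_index[value] = j
    return None


print(has_pair_set_based([3, 1], 6))  # True, which is wrong: only one 3 exists
print(two_sum_indices([3, 1], 6))     # None
print(two_sum_indices([3, 3], 6))     # (0, 1)
```

The pair of inputs `[3, 1]` and `[3, 3]` with target 6 separates the two behaviors: the first must fail to find a pair, and the second must succeed.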
Negative and extreme values
Negative values and extremes test sentinel choices, ordering, overflow risk, and numeric assumptions.
Use these when the domain includes:
- negative numbers;
- zero as a meaningful value;
- very large integers;
- min/max timestamps;
- float precision;
- `k = 0`, `k = n`, or `k > n`;
- values at the boundary of comparison logic.
If the language has fixed-width integers, mention overflow risk when constraints approach the limit.
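A sentinel collision is easy to demonstrate. The sketch below uses hypothetical function names and assumes a contract in which empty input returns `None`; the first version is intentionally buggy.

```python
def largest_with_sentinel(values):
    """Buggy on purpose: -1 stands in for 'nothing seen yet'."""
    best = -1
    for v in values:
        if v > best:
            best = v
    return best


def largest(values):
    """Return the largest value, or None for empty input (assumed contract)."""
    best = None  # None cannot collide with any integer in the data
    for v in values:
        if best is None or v > best:
            best = v
    return best


print(largest_with_sentinel([-5, -3, -9]))  # -1, a value that is not in the input
print(largest([-5, -3, -9]))                # -3
print(largest([]))                          # None
```

An all-negative input is the diagnostic case here: any test that includes a non-negative value lets the sentinel bug pass unnoticed.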
Malformed input where applicable
Malformed input is not always part of an algorithm round. When it is in scope, test it deliberately instead of mixing it into normal cases.
Examples:
- interval with `start > end`;
- grid rows with inconsistent lengths;
- null input;
- missing fields in records;
- invalid characters in a parser;
- `k` outside the accepted range.
If the prompt says input is valid, do not spend the round building a validation subsystem. State the assumption and keep moving.
Stateful cases
Stateful tests matter when the code stores data across calls, mutates input, uses class fields, caches, recursion state, or shared collections.
Examples:
- call the same object twice and verify counts reset or accumulate intentionally;
- run two test cases after a backtracking solution and confirm paths are not shared;
- mutate an input after a call only if mutation is part of the contract;
- verify `visited` state does not leak between connected-component searches.
State leaks are senior-level bugs because they often pass the first sample and fail only under repeated use.
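One common leak is a backtracking solution that stores a reference to its working path instead of a snapshot. The sketch below uses hypothetical function names; the first version is intentionally buggy so the test has a failure to expose.

```python
def subsets_leaky(nums):
    """Buggy on purpose: every stored result is the same shared list object."""
    results, path = [], []

    def walk(start):
        results.append(path)  # stores a reference, not a copy
        for i in range(start, len(nums)):
            path.append(nums[i])
            walk(i + 1)
            path.pop()

    walk(0)
    return results


def subsets(nums):
    """Return all subsets; each stored result is an independent snapshot."""
    results, path = [], []

    def walk(start):
        results.append(list(path))  # copy the path at this moment
        for i in range(start, len(nums)):
            path.append(nums[i])
            walk(i + 1)
            path.pop()

    walk(0)
    return results


print(subsets_leaky([1, 2]))  # [[], [], [], []]
print(subsets([1, 2]))        # [[], [1], [1, 2], [2]]
```

The leaky version returns the right number of results, so a test that checks only the count passes. Checking the contents after recursion unwinds is what exposes the shared state.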
Randomized and property-style reasoning
Some outputs are hard to compare exactly. Use properties when they are clearer than one expected value.
Examples:
- sorting: output is ordered and contains the same multiset as input;
- merge intervals: output intervals are non-overlapping and cover the same points as input;
- top-k: every returned item has frequency at least as high as every omitted item;
- shortest path: returned distance is no greater than any manually constructed path for a tiny graph;
- randomized small arrays: compare optimized solution against a brute-force implementation.
In an interview, you may only describe the property test. That still shows senior verification judgment.
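The top-k property can be written as a small checker. This is a sketch under stated assumptions: `check_top_k` is a hypothetical name, and the assumed contract is that the result holds `min(k, number of distinct items)` distinct items, with ties allowed to resolve either way.

```python
from collections import Counter


def check_top_k(items, k, result):
    """Return True if result is a valid top-k-by-frequency answer for items."""
    counts = Counter(items)
    chosen = set(result)
    # Right size, no repeats, and nothing invented.
    if len(result) != min(k, len(counts)) or len(chosen) != len(result):
        return False
    if not chosen.issubset(counts):
        return False
    omitted = set(counts) - chosen
    if not chosen or not omitted:
        return True
    # Every returned item is at least as frequent as every omitted item.
    return min(counts[x] for x in chosen) >= max(counts[x] for x in omitted)


data = ["a", "a", "b", "b", "c"]
print(check_top_k(data, 1, ["a"]))       # True
print(check_top_k(data, 1, ["b"]))       # True: the tie makes both answers valid
print(check_top_k(data, 1, ["c"]))       # False
print(check_top_k(data, 2, ["a", "c"]))  # False: "b" is more frequent than "c"
```

An exact-match test would reject one of the two valid tie answers; the property accepts both and still rejects the wrong ones.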
Worked example
Prompt:
“Return the length of the longest substring without repeating characters.”
Core invariant:
The window from `left` to `right` contains no duplicate characters after each iteration.
High-value tests:
| Test | Input | Expected | What it catches |
|---|---|---|---|
| Happy path | `"abcabcbb"` | `3` | Basic sliding-window behavior. |
| Empty input | `""` | `0` | Default answer and loop guard. |
| Smallest valid | `"a"` | `1` | Initialization. |
| All duplicates | `"bbbbb"` | `1` | Repeated shrink/update behavior. |
| Left must not move backward | `"abba"` | `2` | Last-seen index update bug. |
| Later unique run | `"pwwkew"` | `3` | Repeated character inside active window. |
The most diagnostic case is `"abba"`. A common bug sets `left = last_seen[char] + 1` without checking whether the prior occurrence is inside the active window. On the final `a`, that moves `left` backward from index 2 to index 1, so the function reports 3 instead of 2.
A senior testing explanation:
“After the sample, I want to run `abba` because it specifically tests that `left` never moves backward. If that fails, the fix is not a random rewrite; it means the update needs `left = max(left, last_seen[char] + 1)`. I would keep `abba` as the regression case.”
The test is tied to the invariant, not chosen from a generic edge-case list.
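The following sketch puts the matrix and the bug side by side. The function names are hypothetical, and the buggy version is included only to show which case exposes it; this is one way to write the sliding window, not the only one.

```python
def longest_unique_buggy(s):
    """Buggy on purpose: left can move backward on a stale last-seen index."""
    last_seen = {}
    left = best = 0
    for right, char in enumerate(s):
        if char in last_seen:
            left = last_seen[char] + 1  # ignores whether the index is inside the window
        last_seen[char] = right
        best = max(best, right - left + 1)
    return best


def longest_unique(s):
    """Length of the longest substring without repeating characters."""
    last_seen = {}
    left = best = 0
    for right, char in enumerate(s):
        if char in last_seen:
            left = max(left, last_seen[char] + 1)  # left never moves backward
        last_seen[char] = right
        best = max(best, right - left + 1)
    return best


CASES = [
    ("happy path", "abcabcbb", 3),
    ("empty input", "", 0),
    ("smallest valid", "a", 1),
    ("all duplicates", "bbbbb", 1),
    ("left must not move backward (regression)", "abba", 2),
    ("later unique run", "pwwkew", 3),
]

for name, text, expected in CASES:
    actual = longest_unique(text)
    assert actual == expected, f"{name}: expected {expected}, got {actual}"

print(longest_unique_buggy("abba"))  # 3
print(longest_unique("abba"))        # 2
```

In this sketch the buggy version passes the other five cases and fails only on `abba`, which is why that case earns its place as the regression test.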
Annotated interaction
Testing during a live coding round
Interviewer: Your solution for longest substring looks complete. How would you test it?
Candidate: “I will start with the sample abcabcbb expecting 3, just to confirm the main path. Then I want the smallest valid case, a, expecting 1, and empty string expecting 0 if empty input is allowed.”
Interviewer: Empty is allowed.
Candidate: “The risky part is repeated characters and the left boundary. bbbbb should return 1. More importantly, abba should return 2; if I ever move left backward on the final a, that case catches it.”
Interviewer: Suppose abba returns 3.
Candidate: “Then the bug is in the repeat handling. I am probably using the last seen index even when it is outside the current window. I will update left with max(left, last_seen[char] + 1) and rerun abba as the regression test before moving on.”
The candidate explains why each test exists, diagnoses the likely bug from the failing case, and preserves the failure as a regression test.
Test matrix
Use this matrix as a selection tool, not a requirement to run every row.
| Category | Use when | Example | Expected signal |
|---|---|---|---|
| Happy path | Every prompt. | Typical valid input from the prompt. | Main behavior works. |
| Smallest valid input | The domain has a minimum size. | One item, one node, one interval, one character. | Initialization is correct. |
| Empty input | Empty is valid or needs clarification. | `[]`, `""`, empty grid. | Guard and default output are correct. |
| Duplicates | Membership, counts, ordering, or positions matter. | Repeated value, repeated char, equal priority. | Multiplicity and tie behavior are correct. |
| Negative values | Numeric domain allows negatives. | `[-3, -1, 0, 2]`. | Sentinels and comparisons are safe. |
| Extreme values | Constraints approach limits. | Large `n`, max integer, `k = n`. | Complexity and numeric assumptions hold. |
| Malformed input | Prompt includes validation or real-world parsing. | Bad interval, null record, invalid token. | Contract handles invalid data deliberately. |
| Stateful sequence | Code stores or mutates state. | Two calls on same object, two DFS components. | State does not leak accidentally. |
| Unordered output | Output order is irrelevant. | All valid pairs or grouped anagrams. | Comparison checks sets or multisets correctly. |
| Property-style | Exact output is hard or many outputs are valid. | Sorted result, merged intervals, randomized brute-force comparison. | General correctness properties hold. |
| Regression | A bug was found. | The failing input after the fix. | The fix targets the defect and stays fixed. |
Senior trade-offs
Testing enough without losing the clock
Testing can consume the end of the round if it is unbounded. Select tests by risk:
- sample or happy path;
- smallest or empty boundary;
- one case aimed at the core invariant;
- one case aimed at a known implementation risk;
- regression case if a bug appeared.
If time is short, say the unrun tests explicitly:
“Given the time, I ran the sample and the duplicate boundary. I would also test empty input and negative values if negatives are in scope.”
That is better than pretending the solution is fully validated.
Testing invalid input without overbuilding
If the interviewer says input is valid, do not turn the solution into a parser or validator. State:
“I am assuming valid intervals as given. If this were a production API, I would validate `start <= end` before the merge step.”
If malformed input is part of the prompt, isolate validation so it does not obscure the algorithm.
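One way to keep validation isolated is to put it in its own function and leave the merge step free of checks. The sketch below uses hypothetical names and assumes closed intervals, so intervals that touch at an endpoint are merged; a prompt may define overlap differently.

```python
def validate_intervals(intervals):
    """Raise ValueError on the first interval with start > end."""
    for start, end in intervals:
        if start > end:
            raise ValueError(f"invalid interval: start {start} > end {end}")


def merge_intervals(intervals):
    """Merge overlapping closed intervals; assumes every start <= end."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])  # new list, so the input is not mutated
    return merged


def merge_checked(intervals):
    """Entry point for when malformed input is in scope."""
    validate_intervals(intervals)
    return merge_intervals(intervals)


print(merge_checked([[1, 3], [2, 6], [8, 10], [15, 18]]))  # [[1, 6], [8, 10], [15, 18]]
```

With this split, malformed-input tests target `validate_intervals` alone, and the algorithm tests call `merge_intervals` with valid data.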
Comparing unordered outputs
Some correct answers can appear in multiple orders. Do not write tests that fail because of accidental ordering unless order is required.
Examples:
- compare sets for unique unordered values;
- compare sorted lists for unordered pairs;
- compare multisets when duplicates matter;
- check properties for graph traversals with multiple valid paths.
Say what equality means before testing.
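These comparisons can each be a few lines of helper code. The helper names below are hypothetical, and the sketch assumes the compared items are hashable and sortable.

```python
from collections import Counter


def same_multiset(actual, expected):
    """Order is ignored, but the number of copies of each item must match."""
    return Counter(actual) == Counter(expected)


def same_pairs(actual, expected):
    """Each pair is unordered, and so is the list of pairs."""
    def normalize(pairs):
        return sorted(tuple(sorted(pair)) for pair in pairs)
    return normalize(actual) == normalize(expected)


def same_groups(actual, expected):
    """Groups (for example, grouped anagrams) may appear in any order."""
    def normalize(groups):
        return sorted(sorted(group) for group in groups)
    return normalize(actual) == normalize(expected)


print(same_pairs([(3, 1), (2, 2)], [(2, 2), (1, 3)]))  # True
print(same_multiset([1, 2, 2], [2, 1, 2]))             # True
print(same_multiset([1, 2, 2], [1, 2]))                # False
print(set([1, 2, 2]) == set([1, 2]))                   # True: a set hides the missing duplicate
print(same_groups([["tea", "eat"], ["tan"]], [["tan"], ["eat", "tea"]]))  # True
```

The fourth line is the reason to choose the comparison deliberately: a set comparison passes an answer that dropped a duplicate.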
Using brute force as an oracle
For optimized algorithms, a brute-force version can be a powerful test oracle on small inputs.
Example:
“For random arrays up to length eight, I can compare this O(n log n) implementation against a straightforward O(n²) checker. I would not run that fully in this interview, but it is the property-style test I would use to harden the solution.”
This shows verification depth without derailing the live implementation.
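A brute-force oracle comparison might look like the sketch below. The quoted example does not name a problem, so longest strictly increasing subsequence is used here purely as a stand-in with an O(n log n) solution and an O(n²) checker; all function names are hypothetical.

```python
import random
from bisect import bisect_left


def lis_fast(nums):
    """O(n log n) length of the longest strictly increasing subsequence."""
    tails = []  # tails[i] is the smallest tail of an increasing run of length i + 1
    for x in nums:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)


def lis_slow(nums):
    """O(n^2) dynamic-programming oracle, written for clarity rather than speed."""
    best = [1] * len(nums)
    for j in range(len(nums)):
        for i in range(j):
            if nums[i] < nums[j]:
                best[j] = max(best[j], best[i] + 1)
    return max(best, default=0)


def first_mismatch(trials=500, max_len=8, seed=0):
    """Return the first random input where the two versions disagree, else None."""
    rng = random.Random(seed)  # fixed seed so any failure can be reproduced
    for _ in range(trials):
        # A narrow value range forces duplicates and negatives into small arrays.
        nums = [rng.randint(-5, 5) for _ in range(rng.randint(0, max_len))]
        if lis_fast(nums) != lis_slow(nums):
            return nums
    return None


print(first_mismatch())  # None when the two implementations agree on every trial
```

If `first_mismatch` ever returns an array, that array is a ready-made regression case: small, reproducible, and already known to fail.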
Failure modes and red flags
Common testing failures:
- treating the sample as proof;
- listing edge cases without expected outputs;
- choosing cases unrelated to the algorithm’s risks;
- ignoring empty input after using `items[0]`;
- missing duplicates when using sets or maps;
- missing negative values after choosing `-1` as a sentinel;
- testing malformed input even though the prompt guarantees validity and time is tight;
- failing to rerun a case after a fix;
- comparing unordered outputs as ordered outputs;
- using random tests without a property or oracle;
- forgetting stateful repeated-call behavior.
Interviewer red flags include:
- “The candidate did not know what would break their solution.”
- “They fixed a bug but did not rerun the failing input.”
- “Their tests were generic and missed the invariant.”
- “They relied on visual inspection instead of expected results.”
Practice drills
Live testing drills
- For ten solved prompts, write a five-row test matrix with expected outputs and the bug each case targets.
- Take three sliding-window problems and identify the one case that catches boundary regression.
- Take three graph problems and test empty graph, single node, disconnected components, and repeated edges where applicable.
- Take three heap or sorting prompts and include a tie case.
- For one optimized solution, write a brute-force checker for small inputs and compare randomized cases.
- For one backtracking solution, test that stored outputs are not mutated after recursion unwinds.
- For one class-based prompt, call methods in two different sequences and check state reset or accumulation.
- After fixing any bug in practice, add the failing input as a named regression case.
For each drill, write the expected output before running the code. If you cannot predict the output, the test is not yet a test.
Self-check rubric
Testing in the interview rubric
| Score | Evidence |
|---|---|
| 1 - Weak | Runs only samples, lacks expected outputs, misses obvious boundaries, and treats failures as reasons to rewrite broadly. |
| 3 - Usable | Covers common cases and some edges, but test selection is partly generic and regression behavior is inconsistent. |
| 5 - Senior-ready | Chooses diagnostic tests from the contract, invariant, and implementation risks; explains expected results; tests stateful behavior when relevant; and preserves bug cases as regressions. |
Before ending the round, ask:
- Did I run or describe a happy path?
- Did I test the smallest valid input?
- Did I clarify or test empty input?
- Did I include duplicates when multiplicity matters?
- Did I include negative or extreme values when the domain allows them?
- Did I isolate malformed input only when it is in scope?
- Did I test state across calls, loops, recursion branches, or shared structures when relevant?
- Did I compare unordered outputs correctly?
- Did I add a regression case after every fix?
Field reference
Testing in the interview
- Start with the contract, then test the invariant.
- Use a small set of diagnostic cases rather than a long generic list.
- Cover happy path, smallest valid input, empty input, duplicates, negative/extreme values, malformed input when applicable, and stateful behavior when relevant.
- Give expected outputs before running.
- Tie each test to a likely bug.
- For unordered outputs, define equality before comparing.
- Use property-style or brute-force checks when exact outputs are hard.
- After a fix, rerun the failing case and keep it as a regression.
- If time runs out, state the high-value tests left unrun.