The Goldilocks Zone of Penetration Testing: Balancing Compliance and Real Security
At Agency, we help clients find the penetration testing sweet spot — rigorous enough to find real vulnerabilities but scoped appropriately for compliance requirements.
One of the most common frustrations we hear comes from security teams who feel caught between two extremes: a cheap checkbox pen test that satisfies the auditor but finds nothing real, or an expensive red team exercise that uncovers genuine risks but costs five times more than the compliance budget allows. The truth is, there is a middle ground — and finding it is one of the most impactful decisions a security-conscious compliance team can make.
Penetration testing in the compliance world suffers from a polarization problem. On one end, companies treat pen testing as a pure compliance checkbox — they hire the cheapest vendor, scope the test as narrowly as possible, and file the clean report alongside their other audit evidence. On the other end, security-driven organizations commission exhaustive red team engagements that test every conceivable attack vector, including social engineering and physical access, producing findings that are genuinely useful for security but far exceed what any auditor needs or evaluates.
Both approaches have real costs. The checkbox approach creates false confidence, leaves real vulnerabilities undiscovered, and can actually backfire when an auditor questions the thoroughness of a test that produced zero findings. The red team approach consumes budget that could be deployed across other security controls and often produces findings that overwhelm a team's remediation capacity. What we help clients find is the Goldilocks zone — penetration testing that is rigorous enough to discover genuine vulnerabilities, structured to produce evidence auditors value, and scoped to deliver both security and compliance returns on the investment.
The Penetration Testing Spectrum
From Checkbox to Red Team
| Approach | Description | Compliance Value | Security Value | Typical Cost |
|---|---|---|---|---|
| Automated scan report | Automated vulnerability scanner output repackaged as a "penetration test" report | Very low — most auditors will reject this as a pen test | Very low — identifies only known vulnerability signatures | $2,000-$5,000 |
| Checkbox pen test | Minimal manual testing (1-2 days); tester runs standard tools and documents output; scope is narrow | Low to moderate — may satisfy inattentive auditors but creates risk | Low — superficial testing misses application-layer and logic vulnerabilities | $5,000-$8,000 |
| Standard compliance pen test | Professional testing (5-10 days); covers web apps, APIs, and infrastructure; follows recognized methodology | High — satisfies SOC 2 and ISO 27001 auditor expectations | Moderate — identifies common vulnerabilities and some deeper issues | $12,000-$30,000 |
| Security-focused pen test | Thorough testing (7-15 days); deep application testing, chained attack exploration, business logic assessment | High | High — identifies real-world exploitable vulnerabilities including complex attack paths | $20,000-$45,000 |
| Red team engagement | Adversary simulation (15-30+ days); includes social engineering, physical security, custom exploits, lateral movement | Exceeds compliance requirements | Very high — simulates real-world threat actors | $60,000-$150,000+ |
The Goldilocks zone sits in the "standard compliance pen test" to "security-focused pen test" range. This is where the overlap between compliance value and genuine security value is highest.
What Auditors Actually Want to See
Auditor Expectations vs Common Misconceptions
Understanding what auditors actually evaluate — versus what companies think they evaluate — is critical to finding the right balance.
| What Auditors Evaluate | What Companies Think Auditors Want | The Reality |
|---|---|---|
| Was a qualified tester engaged to perform the test? | The most expensive vendor produces the best audit evidence | The qualification check is pass/fail — a qualified boutique firm satisfies it as well as a Big Four firm |
| Did the test scope cover in-scope systems? | Every system in the organization must be tested | Auditors evaluate whether the test covered systems within the compliance boundary, not every system the company operates |
| Does the report document methodology and findings? | A clean report with zero findings is ideal | Auditors are actually more skeptical of zero-finding reports; a report that identifies and documents remediated findings demonstrates a healthy testing process |
| Were findings remediated or documented with plans? | All findings must be fully remediated before the audit | Auditors expect critical and high-severity findings to be remediated; medium and low findings with documented remediation plans are acceptable |
| Was the test conducted within or near the observation period? | The test must fall exactly within the observation period dates | Most auditors accept testing within 12 months of the observation period end date, with a preference for more recent testing |
The Zero-Findings Problem
In our experience, one of the biggest misconceptions is that a clean pen test report is the best outcome for compliance. We tell clients the opposite: a penetration test that reports zero findings raises more questions than it answers. Auditors may question whether the testing was sufficiently thorough, whether the scope was too narrow, or whether the tester lacked the skill to identify issues. A healthy pen test report identifies a range of findings (typically 5-15 for a standard engagement), with the company demonstrating remediation of critical and high items and documented plans for medium and low items. This evidence pattern tells the auditor the story they want to hear: your organization actively assesses its security, identifies real issues, and remediates them.
| Finding Count | Auditor Perception | Our Assessment |
|---|---|---|
| Zero findings | Suspicion about test thoroughness or scope | Likely indicates insufficient testing depth or overly narrow scope |
| 1-3 findings | Acceptable but may prompt questions about scope | May be legitimate for very small, simple applications |
| 5-15 findings | Expected range; demonstrates thorough testing and healthy security posture | Goldilocks zone — shows the tester looked hard and the organization has a mature remediation process |
| 15-30 findings | Acceptable if most are medium/low severity | Indicates thorough testing; a high number of critical findings may raise auditor concerns about overall security posture |
| 30+ findings | May raise concerns about security program maturity | Typical for first-ever pen tests or major application changes; auditors evaluate remediation response more than raw count |
The Diminishing Returns Curve
Where Additional Testing Investment Stops Paying Off
Penetration testing follows a diminishing returns curve. The first hours of manual testing against an application yield the highest-value findings. As testing continues, findings become increasingly edge-case, harder to exploit, and lower in severity. Understanding where this curve flattens is the key to cost-efficient pen testing.
| Testing Investment | What It Typically Reveals | Compliance Value Added | Security Value Added |
|---|---|---|---|
| Days 1-3 | Critical infrastructure misconfigurations, default credentials, unpatched systems, OWASP Top 10 web vulnerabilities | High — the findings auditors care most about | High — these are the vulnerabilities attackers exploit first |
| Days 4-7 | Authentication bypass edge cases, authorization flaws, API security issues, session management weaknesses | High — demonstrates thorough application-layer testing | High — real-world exploitable vulnerabilities |
| Days 8-12 | Chained attack paths, business logic vulnerabilities, race conditions, complex authorization bypass scenarios | Moderate — exceeds what most auditors evaluate in detail | High — these findings represent real attacker techniques |
| Days 13-20 | Subtle timing attacks, complex multi-step exploits, edge-case data exposure, deeper infrastructure pivoting | Low — auditors rarely evaluate this depth | Moderate to high — valuable for organizations with sophisticated threat models |
| Days 20+ | Custom exploit development, zero-day research, advanced persistent threat simulation | Minimal — far exceeds compliance expectations | Variable — valuable for specific threat models but low probability of occurrence |
We recommend that most compliance-driven pen tests run 5-10 days for a standard SaaS environment. This window captures the highest-value findings from both a compliance and a security perspective. Organizations with specific threat intelligence suggesting advanced persistent threats or nation-state attackers may benefit from extended testing, but that investment should be justified by the threat model rather than the compliance program.
Building a Pen Test Program That Serves Both Masters
The Dual-Purpose Testing Framework
In our experience, the most effective approach is designing a pen testing program that explicitly serves both compliance and security objectives. This does not mean running two separate tests — it means structuring one test to produce both types of value.
| Program Element | Compliance Purpose | Security Purpose | How to Achieve Both |
|---|---|---|---|
| Scope definition | Cover all systems within the compliance boundary | Cover high-risk assets based on threat model | Define scope as the union of compliance boundary and critical assets — in most cases these overlap significantly |
| Methodology | Follow a recognized framework (OWASP, PTES) that auditors accept | Use techniques that reflect real attacker behavior | OWASP and PTES methodologies already incorporate real-world attack techniques; no conflict exists |
| Reporting | Document findings mapped to compliance controls (TSC, Annex A) | Provide actionable remediation guidance with exploitation evidence | Structure the report with a compliance summary section and a detailed technical section — one report serves both audiences |
| Remediation tracking | Demonstrate that findings were identified and addressed | Actually fix the vulnerabilities | These are the same objective — compliance tracking and genuine remediation are identical when the program is well-designed |
| Frequency | Annual minimum; aligned with audit observation period | Risk-based; more frequent for high-change environments | Annual baseline with triggered testing after major changes satisfies both |
Frequency Recommendations
| Scenario | Recommended Frequency | Rationale |
|---|---|---|
| Stable SaaS application, annual SOC 2 audit | Annual, timed to the first half of the observation period | Satisfies auditor expectations while providing current security assessment |
| Rapidly evolving application with frequent releases | Annual comprehensive test + quarterly targeted testing of new features | Major releases introduce new attack surface that should not wait for the annual test |
| Multi-framework compliance (SOC 2 + ISO 27001 + PCI DSS) | Annual comprehensive test scoped to cover all frameworks | One well-scoped test satisfies all frameworks; separate tests are wasteful |
| Post-breach or post-incident | Immediate targeted test of affected systems, regardless of annual schedule | Incident-driven testing validates remediation and identifies additional exposure |
| Major infrastructure change (cloud migration, architecture redesign) | Targeted test of changed components within 90 days of production deployment | Architecture changes can introduce vulnerabilities in unexpected places; waiting for the annual test creates an extended exposure window |
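The annual-baseline-plus-triggers pattern in the table above reduces to simple date arithmetic. A rough sketch, assuming the 90-day post-change window from the last row (`next_test_due` is a hypothetical helper, not a real scheduling tool):

```python
from datetime import date, timedelta
from typing import Optional

def next_test_due(last_annual_test: date,
                  major_change_deployed: Optional[date] = None) -> date:
    """Annual baseline cadence, but a major infrastructure change pulls
    the next targeted test forward to within 90 days of deployment."""
    annual_due = last_annual_test + timedelta(days=365)
    if major_change_deployed is not None:
        change_due = major_change_deployed + timedelta(days=90)
        return min(annual_due, change_due)
    return annual_due

# Stable year: next test due a year after the last one.
print(next_test_due(date(2024, 1, 1)))
# A June cloud migration pulls the next targeted test into late August.
print(next_test_due(date(2024, 1, 1), major_change_deployed=date(2024, 6, 1)))
```

Incident-driven testing sits outside this cadence entirely — it runs immediately, regardless of where the schedule stands.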
Structuring the Engagement for Dual Value
We tell clients that the statement of work is where dual-purpose testing succeeds or fails. Here is how we help structure engagements.
| SOW Element | Checkbox Approach | Goldilocks Approach |
|---|---|---|
| Scope description | "Test the web application at app.example.com" | "Test all customer-facing applications, APIs, and supporting infrastructure within the SOC 2 system boundary, with additional focus on payment processing workflows and multi-tenant isolation" |
| Methodology | "Industry-standard testing methodology" | "OWASP Testing Guide v4.2 for application testing; PTES for infrastructure testing; specific focus areas include authentication, authorization, session management, API security, and tenant isolation" |
| Deliverables | "Penetration test report" | "Executive summary suitable for board and auditor review; technical report with detailed findings, exploitation evidence, and remediation guidance; findings mapped to SOC 2 Trust Service Criteria and ISO 27001 Annex A controls; risk ratings aligned with organizational risk framework" |
| Tester access | "Black box testing — no information provided" | "Gray box testing with architecture documentation, API specifications, and test user accounts at each privilege level — maximizing testing depth within the engagement timeframe" |
| Retesting | Not included | "One round of retesting within 60 days for all critical and high-severity findings, with updated report reflecting remediation status" |
Common Mistakes We See
Mistakes That Undermine Both Compliance and Security
| Mistake | Why It Happens | Impact | What We Recommend Instead |
|---|---|---|---|
| Selecting a vendor purely on price | Budget pressure; pen testing seen as a checkbox | Superficial testing misses real vulnerabilities; report may not satisfy auditor requirements | Evaluate vendor qualifications, report quality, and methodology first; then compare pricing among qualified vendors |
| Scoping the test too narrowly to minimize cost | Desire to keep pen test budget low | Auditor questions scope coverage; real vulnerabilities in excluded systems go undetected | Scope to the compliance boundary at minimum; add high-risk systems identified in the risk assessment |
| Running the test right before the audit | Procrastination or scheduling conflicts | Findings discovered with no time for remediation appear as exceptions | Schedule testing in the first third of the observation period |
| Treating the pen test report as a compliance artifact only | Compliance team manages the pen test; security team is not involved | Findings are filed for the auditor but never actually remediated | Involve the security and engineering teams in scoping, findings review, and remediation from the start |
| Commissioning a red team when a standard test is appropriate | Vendor upselling or internal desire for "the best" testing | Excessive cost with marginal additional compliance value; findings may overwhelm remediation capacity | Match the testing approach to the actual threat model and compliance requirements |
| Not requesting retesting after remediation | Retesting costs extra and seems unnecessary for compliance | Auditor cannot verify that findings were actually fixed; no evidence of closed-loop remediation | Include one round of retesting in every pen test engagement |
The Goldilocks Decision Framework
How to Find Your Right Balance
When clients ask us how to calibrate their pen testing program, we walk them through these questions.
| Question | If Yes | If No |
|---|---|---|
| Is this your first pen test? | Start with a standard compliance pen test (5-10 days) to establish a baseline | Consider whether last year's scope and depth remain appropriate for your current environment |
| Has your application changed significantly since the last test? | Targeted testing of changed components is warranted, potentially in addition to annual testing | Annual testing at similar scope to last year is likely appropriate |
| Do you process highly sensitive data (healthcare, financial, government)? | Lean toward the security-focused end of the spectrum; regulatory expectations may exceed typical compliance requirements | Standard compliance testing is likely sufficient |
| Is your compliance scope limited to a single framework? | Scope the test to that framework's requirements; do not over-invest | Design the test scope to cover all frameworks simultaneously for maximum efficiency |
| Do you have a dedicated security team that can remediate findings? | Deeper testing is appropriate because findings will actually be addressed | Limit testing depth to what your team can realistically remediate; extensive findings with no remediation plan weaken your compliance position |
| Has your organization experienced a security incident in the past 12 months? | More thorough testing is warranted to validate incident response effectiveness and identify residual exposure | Standard risk-based scoping is appropriate |
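The calibration questions above can be sketched as a rough heuristic mapping answers to a suggested engagement length. The day ranges are illustrative figures drawn from the spectrum table earlier in this article, not a formal scoring model:

```python
def recommend_testing_days(first_test: bool,
                           sensitive_data: bool,
                           dedicated_security_team: bool,
                           recent_incident: bool) -> tuple:
    """Map a few of the calibration questions to a (min, max) range
    of engagement days. Illustrative heuristic only."""
    low, high = 5, 10          # standard compliance pen test baseline
    if sensitive_data or recent_incident:
        low, high = 7, 15      # lean toward the security-focused end
    if not dedicated_security_team:
        high = min(high, 10)   # don't out-test your remediation capacity
    if first_test:
        low, high = 5, 10      # establish a baseline before going deeper
    return (low, high)

# Healthcare SaaS, established program, no recent incident:
print(recommend_testing_days(first_test=False, sensitive_data=True,
                             dedicated_security_team=True,
                             recent_incident=False))  # (7, 15)
```

The point of the sketch is the ordering of the checks: remediation capacity caps the depth regardless of how sensitive the data is, and a first test always starts at the standard baseline.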
Key Takeaways
- In our experience, the Goldilocks zone for compliance-driven penetration testing is the standard to security-focused range ($12,000-$45,000), which produces evidence auditors value while identifying genuine vulnerabilities — cheap checkbox tests and expensive red team exercises both miss the mark for most compliance programs
- What we tell clients is that a zero-findings pen test report is not the ideal outcome — auditors are actually more suspicious of clean reports, and the strongest compliance evidence is a report showing 5-15 findings with documented remediation of critical and high items
- We recommend structuring penetration test engagements as dual-purpose from the start — one well-designed test with proper scoping, gray box methodology, and a report that maps findings to compliance controls serves both the auditor and the security team
- In our experience, the diminishing returns curve flattens significantly after 10-12 days of testing for most standard SaaS environments — days 1-7 produce the highest-value findings for both compliance and security, and testing beyond 12 days primarily benefits organizations with sophisticated threat models
- What we recommend is scheduling the annual pen test in the first third of the SOC 2 observation period, which provides evidence within the audit window while leaving adequate time for remediation and retesting before the auditor evaluates controls
- We help clients avoid the most common pen testing mistakes we see: scoping too narrowly to cut costs, running the test too late for meaningful remediation, treating the report as a compliance artifact rather than a security tool, and commissioning expensive engagements that exceed actual compliance and security needs
- The statement of work is where the balance between compliance and security is won or lost — what we recommend is explicit scope aligned to the compliance boundary, gray box methodology for testing efficiency, deliverables mapped to compliance controls, and retesting included as standard
Agency Team
Agency Insights
Expert guidance on cybersecurity compliance from Agency's advisory team.