How much does SOC 2 cost for a startup?

Typically $20-40K all-in for Type I, including penetration test and audit. The enterprise deal it unblocks is usually $100K or more.

How long does SOC 2 take?

Type I takes 6-12 weeks. Type II requires a 6-12 month observation period. Most initial enterprise buyer requirements are satisfied by Type I.

What's the difference between SOC 2 Type I and Type II?

Type I is a point-in-time snapshot that proves your controls exist. Type II covers a period (usually 6-12 months) and proves your controls work over time. More sophisticated enterprise buyers require Type II.

Can I pass enterprise security review without SOC 2?

Sometimes, depending on the buyer. Some enterprise customers accept detailed security questionnaire responses and penetration test reports. Others require SOC 2 as a hard prerequisite.

Do I need a full-time security hire?

Probably not until $50M+ ARR or 200+ employees. Until then, fractional security leadership (virtual CISO) plus project-based work like penetration testing and compliance support covers most needs.

How do I respond to a security questionnaire?

Security questionnaires like SIG, CAIQ, and VSAQ follow common patterns. Building a reusable answer library and having someone experienced review responses significantly reduces turnaround time and improves pass rates.

Field Notes

AI Security Questionnaires: What's Being Asked and How to Answer

RAG architectures, third-party models, GDPR, and the frameworks enterprise buyers care about

December 29, 2025

Enterprise security questionnaires are adding more AI sections. If you’re processing people’s data with LLMs and selling to financial services, healthcare, or large tech companies, you’ve probably seen this already. Questions about training data, customer data handling, prompt injection, and incident response for AI-specific failures.

And most growing SaaS companies are handling it the way they handle all security questionnaires: imperfectly, iteratively, on demand, and with varying degrees of frustrated typing. But if you’re getting these questions repeatedly, having documented answers ready certainly beats improvising or hoping they don’t notice your AI-generated answers.

What’s Being Asked

Based on what we’re seeing in enterprise security reviews:

“Do you train on customer data?”
- They want to know about their liability exposure and data rights
“How long do you keep our data if it’s processed by AI?”
- They want to know how it’s protected and if it will be deleted post relationship
“Which vendors do you use to process our data with AI?”
- They want to know if your vendors will be the cause of a data breach
“How do you handle customer data in AI features?”
- They want to understand basic data governance
“How do you prevent prompt injection?”
- They want to know if you’re aware of AI-specific security weaknesses
“How are you minimizing the data that is processed by AI?”
- They want to know the extent of the breach they’ll have to deal with

Third-Party Model Providers

If your AI features call OpenAI, Anthropic, Google, or use managed services like AWS Bedrock, your data flow story is probably more complicated. “We use Bedrock” is only the start of a longer conversation.

Sub-Processors

When customer data hits a third-party model API, that provider becomes a sub-processor under GDPR and similar regimes. Buyers will want to know:

Which providers, specifically? Bedrock is a wrapper—are you using Anthropic’s Claude, Meta’s Llama, or Amazon’s Titan underneath? Each has different terms.
What’s your DPA with them? AWS has a GDPR DPA, but is it executed? Do you understand what it covers and what your responsibilities are?
Do they train on inputs? This varies by provider, license, and API tier. OpenAI’s API has different defaults than ChatGPT. Anthropic’s API doesn’t train on inputs by default. Bedrock’s managed models don’t, but verify this for each model you use and be able to cite the specific documentation.
Where does processing happen? Bedrock lets you choose regions, which matters for data residency. If you’re using us-east-1 and your customer has EU data residency requirements, that’s going to be a problem that needs to be resolved.

RAG Architectures

The standard answer to “Do you train on customer data?” is “No.” With retrieval-augmented generation, that answer is technically accurate, but as with everything, it’s a little complicated.

In a RAG system, you’re fundamentally augmenting the foundational model’s context and response based on data you’re providing - not training the model. More specifically, you’re chunking data, embedding it into vectors, storing the vectors, and at query time embedding the query, retrieving semantically similar chunks, and injecting them into the model's context to inform its generation.

But there a several things to keep in mind. It does:

Store transformed representations of customer data until it’s deleted (considered the same as the customer data effectively, even though they might just be numbers - see embedding inversion attacks)
Send query data and maybe customer data to the model provider at query time (Bedrock, AI Foundry, etc excluded)
Create a system where customer A’s query could theoretically surface customer B’s data if your tenant or data isolation is weak

Buyers who understand this will ask the following:

Where are embeddings stored? (Provider, region, encryption, who can access)
What’s the retention policy for the vector data?
How do you handle deletion requests, can you remove specific embeddings, or is it a full reindex?
How is tenant isolation or user data authorization enforced?
When context is sent to the model at query time, what data is included?
What embedding model is used - is it hosted? does it also call an external API?
Are queries, context, and output logged? If so, where and for how long? If it’s a 3rd-party service, how is that service protected?

GDPR Specifically

If you have EU customers or process EU personal data, LLMs and RAG architectures create some complications. None of this should be construed as legal advice.

Legal basis for processing

You need a lawful basis for both your primary service and the AI-specific processing. If customers consented to “analytics,” that doesn’t cover “we embed your data and use it to generate AI responses.” Make it transparent. Get consent. Review your legal basis with your counsel.

Right to erasure

Requests for the right to erasure or right to be forgotten (Article 17) get complicated with vector embeddings. If someone requests deletion, can you actually remove their data from your vector database? Will you need full reindexing? Is it simple as deleting a database? The Bavarian State Office for Data Protection Supervision says post-training options may be necessary.

Data transfers

If you’re using US-based model providers, you need valid transfer mechanisms for EU personal data. This means standard contractual clauses and a transfer impact assessment to describe the specifics around circumstances, destination laws, and any safeguards in place. AWS has SCCs, and you should also understand the supplementary measures question.

Sub-processor disclosure

Your customers need to know about your sub-processors who process their data. Your privacy policy probably lists AWS if you’re using Bedrock. Otherwise, list your vendors who see any customer data.

Framework Landscape

Several frameworks cover AI governance. Here’s a current picture

NIST AI RMF — US government-adjacent. Relevant if you’re selling to federal or if your enterprise buyers reference NIST frameworks generally.

EU AI Act — Mandatory for the EU market, and quite complex. The risk-based system means that if your product falls into high-risk categories (healthcare AI, emotion recognition, employment decisions, creditworthiness, certain infrastructure), you’re looking at entire bans, conformity assessments, technical documentation requirements, and ongoing monitoring obligations, far beyond documentation exercises. If you’re selling AI products in the EU, get legal counsel who deeply understands this. Of course, nothing here is meant to be legal advice.

Colorado AI Act — Relevant if you have Colorado customers. Set to come into effect in February 2026, but under some federal scrutiny at the time of publishing. It creates disclosure and risk management requirements for “high-risk AI systems” in consequential decisions (employment, financial services, healthcare, housing, insurance, legal services, education, government services). Likely that similar state-level legislation will be coming elsewhere.

AIUC-1 — A newer framework with detailed, practical requirements. Less market traction than ISO 42001 currently, but the implementation guidance is more specific. Launched in July 2025 by the Artificial Intelligence Underwriting Company. Built around six pillars: data & privacy, security, safety, reliability, accountability, and societal risks and builds on NIST AI RMF, EU AI Act, ISO 42001, and MITRE ATLAS. Early commercial framework - worth understanding, but early to bet heavily on.

ISO 42001 — Likely to stick around for enterprise adoption as it’s from ISO, focused specifically on AI management systems.

These frameworks all aim to answer the questions sophisticated buyers are asking. If you document your AI practices clearly and can speak to your operational processes, you can answer questionnaires regardless of which framework the buyer cares about. As with SOC2, you don’t necessarily need it if you can confidently answer questions and don’t mind completing every organization’s snowflake unique security questionnaire. Don’t overbuild for any single standard until you know which ones your specific buyers are most likely to ask for.

Unsolved Problems in AI Agent Security

It’s worth noting here that both academic and industry research are still working to address agentic security gaps, and no current framework fully addresses them all.

There are no standards for agent identities, there are no established solutions for agentic authorization, there are no foolproof ways to differentiate between instructions and data, and this is a fast-moving and complex space across multiple spheres.

So having thoughtful answers, even if it’s “yes, that’s a gap we’ve thought about and here’s how we’re trying to mitigate it”, will differentiate you from those who haven’t considered these issues.

What to Do

The following is good operational hygiene that will get you started, regardless of the frameworks you use.

Get Baseline Documentation in Place

There are policy templates you can use, but to truly reflect your organization, application, and product, this will take a week or more of focused work. If you’ve already gone down the road of GDPR or other data privacy frameworks, you may already have a head start on this. Reconcile what you write (or tell AI to write) with what engineering is really doing, which means pulling engineers into the conversation. If your documentation says one thing and your implementation does another, that will surface in a serious security review.

Document the following

Data flows - what customer data enters your AI systems, what comes out, where it’s stored, and how long it’s retained
Data flow diagrams - should include AI processing paths, especially for RAG
Data training - whether you train models on customer data (probably no, especially if you’re using foundational models and RAG - but verify with your ML devs)
Accountability - who owns AI risk decisions (that is, a name with actual authority, not a committee)
Sovereignty - where exactly does AI processing happen? (cloud provider, region, any third-party models)
Vendor management - what is your evaluation process for AI/ML providers
Sub-processor lists - including AI/ML providers (which models through which services)
Lawful basis analysis for AI-specific processing (GDPR-specific, Article 6)
Embedding retention policies and deletion procedures
Your threat models - how are attackers targeting your engineers, your application, and users? How are you protecting against those attack vectors?
Basic risk management documentation - per NIST’s AI Risk Management Playbook (Governance 1.4), draft this information in preparation for later stages

Operational Depth

Where sophisticated buyers separate vendors who’ve thought about AI operations from those who are winging it - asking about model maturity and AI-specific incident response.

Model lifecycle management

How do you version models in production?
What’s your rollback procedure if a model update causes problems?
What specific testing happens before AI features ship?
How do you monitor model performance over time?

Incident response (AI-specific)

Extend existing response runbooks to cover AI scenarios like harmful output, hallucinations, data leakage through AI features, or prompt injection issues
Define what “harmful output” means for your product specifically
Document escalation paths and customer communication templates

If you have decent incident response documentation, updating it for AI scenarios should be relatively straight forward.

Use Platform Guardrails (With Caveats)

AWS Bedrock Guardrails, Azure AI Content Safety, and Google Vertex AI safety filters are worth evaluating, but it’s not necessarily as simple as enabling them. They’re good for a few things:

Baseline content filtering (toxicity, PII detection, basic prompt injection attempts)
Checkbox for “do you have guardrails”
Low effort

But what you need to do

Test them against your specific (ab)use cases. Generic protections will miss domain-specific issues
What happens when they block legitimate content?
Review and configure thresholds since defaults might be too aggressive or too permissive.
And don’t assume they solve the malicious user problem

This takes evaluation and testing to deploy, plus ongoing monitoring as models and capabilities change.

Bedrock for Compliance, Not Just Convenience

AWS Bedrock gets positioned as the “enterprise-ready” option, and there’s truth to that, but this is a starting point.

Bedrock gives you a legal and foundational start

Regional deployment options (for data residency concerns)
Integration with AWS’s broader compliance regimes (SOC 2, ISO 27001, FedRAMP, etc)
Guardrails feature for content filtering and PII redaction
Model invocation logging through CloudWatch and CloudTrail
VPC endpoints for private connectivity

But - you still have to

Configure AWS itself correctly (a hefty endeavor across many acronyms)
Understand and document the data flow across the service as mentioned above
Work through tenant and data isolation in your application if appropriate
Define and enforce retention policies for any data you store
Build the monitoring and alerting that makes logging, traceability, and observability useful

“Show me your Bedrock configuration and explain your design decisions” isn’t an unreasonable request. Be ready to walk through your model access policies, guardrail configuration, logging setup, and VPC architecture.

Defer Unless Required

Unless a specific buyer requires these, wait and see where the market goes:

Full AI management system certification
Continuous adversarial testing programs
Custom-built AI safety infrastructure
Third-party AI audits beyond what’s required for specific deals

Now, if you’re selling AI products to highly regulated industries or anything that might fall under the EU AI Act high-risk categories or the Colorado AI Act scope, accelerate this timeline.

Things Negotiated in Contracts

Documentation and questionnaire responses are where conversations start. Contracts are where they finish. A few additional things to think about regarding AI usage:

MSA and DPA language

Your Master Service Agreement and Data Processing Addendum probably weren’t written with AI features in mind. Enterprise buyers may want to negotiate AI-specific terms such as restrictions on training, data handling commitments, and liability allocation for AI-generated outputs. Get legal or your attorney involved before these negotiations.

Liability for AI outputs

If your AI hallucinates in a regulated context (think financial advice, health information, legal guidance) that’s not only a PR problem but a legal and customer retention problem. Contracts should address this and your product should have appropriate guardrails and disclaimers.

We need to talk to your AI/Security team requests

In deals with sophisticated buyers or mature security teams, they’ll want to talk to someone technical who can walk through architecture, explain design decisions, and answer follow-up questions. Documentation doesn’t solve this, especially since so much of it can be AI-generated, killing confidence in the documentation. You either need internal expertise or a partner who can credibly represent your technical approach.

When This Becomes a Bigger Deal

You’ll know when more time and investment is needed. Obvious things like deals delayed and too much time is put into AI governance questions and explicit requests around AI governance and compliance.

Keep an eye on on state level regulation such as Colorado’s AI Act, and other state level requirements. Until then, keep building operational maturity, extend your existing security program, and answer questions as they come. But don’t yet build an entire compliance program around a framework that is still proving market traction.

Tools and Resources

The AIUC-1 Navigator provides searchable implementation guidance across AI governance requirements. Useful for understanding what’s being asked even if you’re not specifically pursuing AIUC-1 compliance since the standard pulls requirements from NIST AI RMF, EU AI Act, ISO 42001, and MITRE ATLAS into specific tasks, which can help you structure your own documentation.

NIST AI RMF is another one to review, though that’s a bit heavier to parse.

‍

Document what you're actually doing, extend your existing security practices to cover AI-specific scenarios, and answer buyer questions as they come. The framework adoption conversations will shake out over the next year or two, hopefully. Until then, operational maturity beats anything else.

‍

Want more resources like this? Let us know!

We'll never share your email.

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form.

Continue Reading

Get Started

Let's Unblock Your Next Deal

Whether it's a questionnaire, a certification, or a pen test—we'll scope what you actually need.

Smiling man with blond hair wearing a blue shirt and dark blazer, with bookshelves in the background.

Noah Potti

Principal

Talk to us

Suspension bridge over a calm body of water with snow-covered mountains and a cloudy sky at dusk.