Cloud is different, but the fundamentals apply
The names changed when everything moved to cloud, but the underlying security disciplines did not. IAM is still identity and access management; it just uses policies and roles instead of Active Directory groups. Security groups are still firewalls; they just live in software instead of a rack. S3 is still network storage, except that a single misconfigured ACL or bucket policy can expose it to the whole internet. The concepts are the same. The attack surface is larger, the blast radius of a mistake is global, and the controls work at the API level rather than the network level.
The critical difference that reshapes everything: in cloud, every resource is managed through an API, and that API is accessible from anywhere in the world. A leaked AWS access key is not a local threat. It is a globally usable credential for everything that key's permissions reach, which in the worst case is your entire cloud estate. A developer who accidentally pushes an ~/.aws/credentials file to a public GitHub repo gives attackers immediate access, and automated tools scan GitHub for exactly these secrets within seconds of a commit. The time from "key leaked" to "account compromised" in the wild is measured in minutes.
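The kind of scanning described above is simple to reproduce, which is part of why it is so fast. The following is a minimal sketch of a pre-commit style check, assuming only the documented shape of AWS access key IDs (a four-letter prefix such as `AKIA` or `ASIA` followed by 16 uppercase letters or digits). The function name `find_access_key_ids` is hypothetical, and the sample uses the placeholder key from AWS documentation, not a real credential.

```python
import re

# AWS access key IDs are 20 characters: a prefix (AKIA for long-lived user keys,
# ASIA for temporary STS keys) followed by 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_access_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line_number, key_id) for every access key ID pattern in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Placeholder credentials in the style used by AWS documentation.
sample = """[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
"""

print(find_access_key_ids(sample))
```

A check like this only catches the key ID, not the secret itself, so it is a tripwire rather than a complete secret scanner. Any hit should be treated as a leaked credential and rotated.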
Shared responsibility: who owns what?
The shared responsibility model defines where the cloud provider's security obligation ends and yours begins. It is the most important mental model for cloud security because misunderstanding it is a common root cause of cloud breaches: customers assume the provider secures something that is actually theirs to configure. The provider is responsible for the security of the cloud: the physical infrastructure, the hardware, the hypervisor, and the availability of managed services. You are responsible for security in the cloud: how you configure everything you deploy on top of it.
What the provider always secures
| Provider responsibility | Scope |
|---|---|
| Physical data centres | Locks, guards, power, cooling, hardware disposal |
| Global network infrastructure | Inter-region backbone, DDoS protection at the edge |
| Hypervisor | VM isolation, escape prevention, patching |
| Managed service availability | RDS, Lambda, ECS, AKS — the platform SLA |
What the customer always owns
| Customer responsibility | What it means in practice |
|---|---|
| IAM configuration | Who has access, what policies are attached, whether MFA is enforced |
| Data encryption choices | Encryption at rest (S3, EBS, RDS) and in transit — opt-in, not default everywhere |
| Network access controls | Security groups, NACLs, VPC configuration, firewall rules |
| Application security | Your code, its dependencies, OWASP vulnerabilities in your app |
| Patch management (IaaS) | Guest OS, application runtime, middleware — the provider does NOT patch EC2 instances |
| Logging configuration | CloudTrail/Activity Log/Cloud Audit Logs must be enabled — not always on by default in all regions |
IAM across AWS, Azure, and GCP
IAM is the single highest-impact control in any cloud environment. Get it right and the blast radius of every other misconfiguration shrinks dramatically. Get it wrong and a single compromised credential gives an attacker the run of the entire cloud estate. The three major clouds have different naming conventions but the underlying model is identical: an identity (user, group, service account, managed identity) is granted permissions via a role/policy and can act on resources within that permission boundary.
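To make the shared model concrete, here is a minimal sketch of how an identity-based policy decides a request, written against the AWS policy document shape. It assumes three simplifications: it ignores `Condition`, `NotAction`, and principals; it ignores organization-level guardrails and permission boundaries; and it uses `fnmatch` as a stand-in for IAM wildcard matching. The function names and the sample bucket are hypothetical. What it does preserve is the core rule that a request is denied by default, an explicit `Deny` always wins, and only then does an `Allow` take effect.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def _matches(patterns, value, ignore_case=False):
    """True if value matches any wildcard pattern (* and ? supported)."""
    if ignore_case:
        value = value.lower()
    for pattern in _as_list(patterns):
        if ignore_case:
            pattern = pattern.lower()
        if fnmatchcase(value, pattern):
            return True
    return False


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Default deny; explicit Deny overrides any Allow."""
    allowed = False
    for stmt in _as_list(policy["Statement"]):
        # Action names are case-insensitive, resource ARNs are not.
        if not _matches(stmt["Action"], action, ignore_case=True):
            continue
        if not _matches(stmt["Resource"], resource):
            continue
        if stmt["Effect"] == "Deny":
            return False
        allowed = True
    return allowed


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::reports/*"},
        {"Effect": "Deny", "Action": "s3:*",
         "Resource": "arn:aws:s3:::reports/secret/*"},
    ],
}

print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::reports/q1.csv"))
print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::reports/secret/keys.txt"))
print(is_allowed(policy, "s3:PutObject", "arn:aws:s3:::reports/q1.csv"))
```

Azure RBAC and GCP IAM differ in the details (Azure has deny assignments, GCP has deny policies), so treat this as an illustration of the AWS-style model rather than a cross-cloud evaluator.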
AWS IAM
```bash
# Find every user, group, and role with AdministratorAccess attached
# (the most common critical misconfiguration):
aws iam list-entities-for-policy \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

# Inspect the managed policies attached to a single user:
aws iam list-attached-user-policies --user-name alice

# List active access keys last rotated more than 90 days ago (rotate or disable them).
# In the credential report, column 9 is access_key_1_active and column 10 is
# access_key_1_last_rotated; columns 14 and 15 hold the same fields for key 2.
# The cutoff below uses GNU date syntax.
CUTOFF=$(date -u -d '90 days ago' +%Y-%m-%d)
aws iam generate-credential-report
aws iam get-credential-report --output text --query Content | base64 -d | \
  awk -F',' -v cutoff="$CUTOFF" 'NR>1 && $9=="true" && $10<cutoff {print $1, $10}'

# Check for root account access keys (should never exist):
aws iam get-account-summary | grep "AccountAccessKeysPresent"
# Should return 0

# Check password policy:
aws iam get-account-password-policy

# List all roles and their trust policies (find over-trusted roles):
aws iam list-roles --query 'Roles[*].[RoleName,AssumeRolePolicyDocument]'
```
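The key-age check is easier to get right when the credential report is parsed by column name instead of position. The sketch below assumes the decoded report is already available as CSV text and uses the report's documented column names (`access_key_1_active`, `access_key_1_last_rotated`, and the matching `access_key_2_*` columns). The sample rows are trimmed to the relevant columns and the users are made up; `stale_access_keys` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def stale_access_keys(report_csv: str, max_age_days: int = 90, now=None):
    """Return (user, key_slot, last_rotated_date) for active keys older than the limit."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        for slot in ("1", "2"):
            # Inactive or absent keys carry "false" and "N/A" and are skipped here.
            if row[f"access_key_{slot}_active"] != "true":
                continue
            rotated = datetime.fromisoformat(row[f"access_key_{slot}_last_rotated"])
            if rotated < cutoff:
                stale.append((row["user"], f"access_key_{slot}", rotated.date().isoformat()))
    return stale


# Trimmed sample: a real report has many more columns.
sample_report = """user,access_key_1_active,access_key_1_last_rotated,access_key_2_active,access_key_2_last_rotated
alice,true,2023-01-15T10:00:00+00:00,false,N/A
bob,true,2025-05-20T08:30:00+00:00,true,2024-11-01T12:00:00+00:00
"""

fixed_now = datetime(2025, 6, 1, tzinfo=timezone.utc)
for finding in stale_access_keys(sample_report, now=fixed_now):
    print(finding)
```

Passing `now` explicitly keeps the check deterministic in tests; in a scheduled job it can be left at its default.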
Azure RBAC
```bash
# List all role assignments at subscription scope:
az role assignment list --scope /subscriptions/<sub-id> --output table

# Find all users/SPs with Owner role (highest risk):
az role assignment list --role Owner --all --output table

# Check for service principals with client secrets (prefer managed identities).
# passwordCredentials is an array, so filter on "non-empty" rather than "not null":
az ad sp list --all --query '[?passwordCredentials].[displayName,appId]'

# List user-assigned managed identities, then review what each one is allowed to do:
az identity list --output table
az role assignment list --assignee <principal-id> --all --output table

# Check PIM (Privileged Identity Management) eligibility vs active assignments:
# az rest --method GET --uri 'https://management.azure.com/subscriptions/{id}/providers/Microsoft.Authorization/roleEligibilitySchedules?api-version=2020-10-01'
```
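Table output is fine for a quick look, but a review usually needs a filter. As a sketch, the function below takes the JSON that `az role assignment list --all --output json` produces and keeps only Owner assignments made directly at subscription scope. It relies on the `roleDefinitionName`, `scope`, `principalName`, and `principalType` fields of that output; the sample assignments and the all-zero subscription ID are invented for illustration.

```python
def subscription_owners(assignments: list[dict]) -> list[tuple[str, str]]:
    """Return (principalName, principalType) for Owner assignments at subscription scope."""
    owners = []
    for assignment in assignments:
        # A subscription-level scope is exactly /subscriptions/<id>, with nothing after it.
        parts = assignment["scope"].strip("/").split("/")
        at_subscription = len(parts) == 2 and parts[0].lower() == "subscriptions"
        if assignment["roleDefinitionName"] == "Owner" and at_subscription:
            owners.append((assignment["principalName"], assignment["principalType"]))
    return owners


SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
sample_assignments = [
    {"principalName": "alice@example.com", "principalType": "User",
     "roleDefinitionName": "Owner", "scope": SUB},
    {"principalName": "ci-deployer", "principalType": "ServicePrincipal",
     "roleDefinitionName": "Contributor", "scope": SUB},
    {"principalName": "bob@example.com", "principalType": "User",
     "roleDefinitionName": "Owner", "scope": SUB + "/resourceGroups/sandbox"},
]

print(subscription_owners(sample_assignments))
```

Owner on a single resource group is deliberately not flagged here; whether that is acceptable depends on what the resource group contains.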
GCP IAM
```bash
# List project-level IAM bindings that grant the Owner or Editor role:
gcloud projects get-iam-policy <project-id> --format=json | \
  jq '.bindings[] | select(.role | test("^roles/(owner|editor)$")) | {role, members}'

# List user-managed service account keys (should be zero; use Workload Identity instead).
# Without --managed-by=user the output also includes Google-managed keys:
gcloud iam service-accounts keys list --managed-by=user \
  --iam-account=<sa>@<project>.iam.gserviceaccount.com

# Check for allUsers or allAuthenticatedUsers on GCS buckets (public access):
gsutil iam get gs://<bucket-name> | grep -E "allUsers|allAuthenticatedUsers"

# Check org policies (prevent public IPs, restrict domains, etc.):
gcloud org-policies list --project=<project-id>
```
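The same two questions (who holds a primitive role, and is anything granted to the public) can be asked of the policy JSON directly. This sketch assumes the standard `bindings` / `role` / `members` shape returned by `gcloud projects get-iam-policy --format=json`; the helper name `risky_bindings` and the sample members are hypothetical.

```python
PRIMITIVE_ROLES = {"roles/owner", "roles/editor"}
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def risky_bindings(policy: dict) -> list[tuple[str, str, str]]:
    """Return (role, member, reason) for public grants and primitive-role grants."""
    findings = []
    for binding in policy.get("bindings", []):
        role = binding["role"]
        for member in binding.get("members", []):
            if member in PUBLIC_MEMBERS:
                findings.append((role, member, "public member"))
            elif role in PRIMITIVE_ROLES:
                findings.append((role, member, "primitive role"))
    return findings


sample_policy = {
    "bindings": [
        {"role": "roles/owner", "members": ["user:alice@example.com"]},
        {"role": "roles/storage.objectViewer", "members": ["allUsers"]},
        {"role": "roles/logging.viewer", "members": ["group:sre@example.com"]},
    ]
}

for finding in risky_bindings(sample_policy):
    print(finding)
```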
The most common IAM mistakes
The most common IAM mistakes are well documented, show up in CISA advisories, and still appear in production environments year after year. That is not because security teams do not know about them, but because cloud estates grow faster than governance does. A developer creates a "temporary" IAM user with AdministratorAccess for a weekend project, and it is still there three years later. An EC2 instance gets an IAM role with S3 full access because it was easier than figuring out the exact bucket permissions. These are the findings that matter most because they are the ones attackers exploit first.
AWS critical misconfigurations
| Misconfiguration | Severity | Remediation |
|---|---|---|
| Root account without MFA | Critical | Enable MFA on the root user; use IAM Identity Center roles for daily work and keep root for break-glass tasks only |
| AdministratorAccess on IAM users | Critical | Use IAM Identity Center (SSO); never attach AdministratorAccess to human users |
| Long-lived access keys for humans | High | Use IAM Identity Center and temporary credentials; no human should have long-lived keys |
| EC2/Lambda role with s3:* | High | Scope down to exactly the buckets and actions needed; add condition keys |
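| No permission boundaries on developer accounts | Medium | Attach a permissions boundary to developer roles; it caps maximum permissions and blocks privilege escalation through IAM (for example, creating a new role with broader access) |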
| VPC Flow Logs disabled | Medium | No network visibility for incident response; enable for all VPCs |
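The `s3:*` row in the table above is the easiest of these to detect mechanically. As a rough sketch, the function below flags `Allow` statements that use a service-wide or global wildcard action, or a wildcard resource, and contrasts an over-broad policy with a scoped one. The bucket name `example-app-uploads` and the helper name are hypothetical, and the check ignores `NotAction` and `Condition` blocks.

```python
def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def broad_statements(policy: dict) -> list[dict]:
    """Return Allow statements that use a wildcard action or a wildcard resource."""
    flagged = []
    for stmt in _as_list(policy["Statement"]):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        wildcard_resource = any(r == "*" for r in resources)
        if wildcard_action or wildcard_resource:
            flagged.append(stmt)
    return flagged


too_broad = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}

# Scoped down to the actions and the single bucket the workload needs.
scoped = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::example-app-uploads/*",
    }],
}

print(len(broad_statements(too_broad)))
print(len(broad_statements(scoped)))
```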
Azure critical misconfigurations
| Misconfiguration | Severity | Remediation |
|---|---|---|
| Owner role at subscription scope (too many users) | Critical | Review all Owner assignments; use PIM for just-in-time activation |
| Service principal with client secret (not managed identity) | High | Migrate to Managed Identity wherever possible; secrets require manual rotation |
| No Conditional Access policy | High | Require MFA for all users, restrict access to managed devices for admin roles |
| Diagnostic settings not configured | Medium | Enable Activity Log export to Storage/Log Analytics; default retention is 90 days |
GCP critical misconfigurations
| Misconfiguration | Severity | Remediation |
|---|---|---|
| allUsers on GCS bucket | Critical | Remove allUsers binding; enable uniform bucket-level access; use Org Policy to prevent |
| Service account keys created and shared | Critical | Delete all SA keys; use Workload Identity Federation instead |
| Owner/Editor primitive roles at project | High | Migrate to predefined roles; primitive roles are far too broad |
| No org-level policy constraints | Medium | Enable: constraints/iam.disableServiceAccountKeyCreation, constraints/compute.disableGuestAttributesAccess |
CSPM: automated posture management
Cloud Security Posture Management (CSPM) tools continuously scan your cloud environment against security benchmarks and flag misconfigurations before (or after) they are exploited. They do not require deploying agents — they use the cloud provider's read-only APIs with a configured set of credentials. Three tools cover the landscape: Prowler for deep AWS coverage, ScoutSuite for multi-cloud assessments, and Checkov for Infrastructure-as-Code scanning before deployment.
```bash
# ---- Prowler (AWS: 500+ checks against CIS, PCI-DSS, NIST, SOC2) ----
pip install prowler

# Full assessment with HTML + JSON + CSV output.
# Format names differ between Prowler versions; confirm with `prowler aws --help`:
prowler aws --output-formats html json-ocsf csv -o output/

# Assess only specific services (faster for CI/CD):
prowler aws --service iam s3 cloudtrail

# Run specific CIS checks:
prowler aws --compliance cis_1.5_aws

# Check only critical/high findings:
prowler aws --severity critical high

# ---- ScoutSuite (multi-cloud: AWS, Azure, GCP, OCI, Alibaba) ----
pip install scoutsuite

# A pip install provides the `scout` command (scout.py is only for a git checkout):
scout aws --report-dir /tmp/scoutsuite-report/
scout azure --cli --tenant <tenant-id> --report-dir /tmp/scoutsuite-report/
scout gcp --user-account --project-id <project-id> --report-dir /tmp/scoutsuite-report/

# Open the self-contained HTML report written to the report directory.

# ---- Checkov (IaC: Terraform, CloudFormation, K8s, Helm, ARM) ----
pip install checkov

# Scan a Terraform directory:
checkov -d terraform/ --compact

# Scan a specific CloudFormation template:
checkov -f cloudformation-template.yaml

# Scan Kubernetes manifests:
checkov -d k8s-manifests/ --framework kubernetes

# Run only checks of HIGH severity and above; exit code 1 if any fail.
# Severity filters need a platform API key; without one, filter by check ID instead:
checkov -d terraform/ --check HIGH --compact --quiet

# Skip a specific check for the whole run. To record a justification, use an inline
# comment on the resource instead:  #checkov:skip=CKV_AWS_18:<reason>
checkov -d terraform/ --skip-check CKV_AWS_18
```
Integrating CSPM into CI/CD
The highest-value integration is to run Checkov as a required CI/CD check on every pull request that modifies infrastructure code. A developer who creates a public S3 bucket in Terraform sees the Checkov failure in their PR, not six months later in a breach notification. Prowler as a scheduled job (daily or weekly) catches the configuration drift that never passes through a pull request: a manual change through the console, a permission added "temporarily", a security group rule that was never removed.
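A PR gate often needs slightly more logic than the scanner's exit code, for example an allow-list of accepted findings. The sketch below shows one way to do that over a JSON report. It assumes the report follows the general shape of `checkov -o json` output (a `results.failed_checks` list whose entries carry `check_id`, `resource`, and `file_path`, or a list of such reports when several frameworks are scanned); verify this against the Checkov version in use. The function names, the sample finding, and the allow-list are hypothetical.

```python
def failed_checks(report) -> list[dict]:
    """Collect failed checks from one report or a list of per-framework reports."""
    reports = report if isinstance(report, list) else [report]
    failed = []
    for single in reports:
        failed.extend(single.get("results", {}).get("failed_checks", []))
    return failed


def gate(report, allowed_ids=frozenset()) -> int:
    """Return a process exit code: 1 if any failed check is not on the allow-list."""
    blocking = [c for c in failed_checks(report) if c["check_id"] not in allowed_ids]
    for check in blocking:
        print(f'{check["check_id"]} {check["resource"]} ({check["file_path"]})')
    return 1 if blocking else 0


# Assumed report shape, reduced to the fields this sketch reads.
sample_report = {
    "check_type": "terraform",
    "results": {
        "failed_checks": [
            {"check_id": "CKV_AWS_18", "resource": "aws_s3_bucket.logs",
             "file_path": "/s3.tf"},
        ]
    },
}

print(gate(sample_report))
print(gate(sample_report, allowed_ids={"CKV_AWS_18"}))
```

In a pipeline, the return value would be passed to `sys.exit()` after loading the report with `json.load`. Keeping the allow-list in version control means every accepted finding is itself reviewed in a pull request.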
Network security in cloud: security groups and VPCs
Cloud networks use abstractions over physical networking but the concepts and the risks are the same: keep the attack surface small, apply least privilege at the network layer, know where your traffic goes, log it all. The three major controls are security groups / network security groups (instance-level firewall), VPCs (network segmentation), and VPC Flow Logs (traffic visibility).
Security groups: the cloud firewall
| Rule | Severity | Risk and remediation |
|---|---|---|
| 0.0.0.0/0 inbound to port 22 (SSH) | Critical | Entire internet can attempt SSH to your EC2 instance; restrict to known IPs or VPN |
| 0.0.0.0/0 inbound to port 3389 (RDP) | Critical | Same as above for Windows; RDP is actively scanned and brute-forced |
| 0.0.0.0/0 inbound to database ports (3306/5432/27017/6379) | Critical | Databases should never be internet-accessible; restrict to application security group only |
| Overly broad egress (all traffic allowed) | Medium | Restrict outbound to known destinations; limits blast radius if a workload is compromised |
| Unused security groups | Low | Delete them to reduce sprawl; an unused group can later be attached to a new instance with unintended scope |
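The critical rows in the table above can be checked automatically. The sketch below assumes security groups in the shape returned by the EC2 `DescribeSecurityGroups` API (`IpPermissions` entries with `IpProtocol`, `FromPort`, `ToPort`, `IpRanges`, and `Ipv6Ranges`) and flags any sensitive TCP port reachable from `0.0.0.0/0` or `::/0`. The port list mirrors the table; the group ID and helper names are made up for illustration.

```python
SENSITIVE_PORTS = {
    22: "SSH", 3389: "RDP", 3306: "MySQL",
    5432: "PostgreSQL", 27017: "MongoDB", 6379: "Redis",
}


def open_to_world(permission: dict) -> bool:
    """True if the rule allows traffic from any IPv4 or IPv6 address."""
    v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in permission.get("IpRanges", []))
    v6 = any(r.get("CidrIpv6") == "::/0" for r in permission.get("Ipv6Ranges", []))
    return v4 or v6


def exposed_ports(group: dict) -> list[tuple[str, int, str]]:
    """Return (group_id, port, service) for sensitive ports open to the internet."""
    findings = []
    for perm in group.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        if protocol not in ("tcp", "-1") or not open_to_world(perm):
            continue
        if protocol == "-1":
            low, high = 0, 65535  # "-1" means all protocols and all ports
        else:
            low, high = perm.get("FromPort", 0), perm.get("ToPort", 65535)
        for port, service in sorted(SENSITIVE_PORTS.items()):
            if low <= port <= high:
                findings.append((group["GroupId"], port, service))
    return findings


sample_group = {
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ],
}

print(exposed_ports(sample_group))
```

Port ranges are handled as ranges, so a rule that opens 0-65535 to the internet is reported once for each sensitive port it covers.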
VPC design for security
```bash
# AWS VPC best practice architecture:
#
#   Internet Gateway
#         |
#   Public subnets   → Load balancers, NAT Gateways, Bastion hosts
#         |
#   Private subnets  → Application servers (EC2, ECS, Lambda in VPC)
#         |
#   Data subnets     → RDS, ElastiCache, OpenSearch (no internet route)
#
# Key rules:
# - Data subnets: NO route to internet gateway
# - Application servers: access internet only via NAT Gateway (outbound only)
# - Separate VPCs (or accounts) for prod/staging/dev
# - VPC Peering / Transit Gateway for controlled cross-VPC access
# - PrivateLink for AWS service access (no internet, no NAT)

# Enable VPC Flow Logs (essential for incident response):
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-12345abc \
  --traffic-type ALL \
  --log-destination-type cloud-watch-logs \
  --log-group-name /aws/vpc/flowlogs \
  --deliver-logs-permission-arn arn:aws:iam::<account>:role/flowlogs-role

# Query VPC Flow Logs in Athena to find unexpected outbound traffic from a compromised instance:
# SELECT srcaddr, dstaddr, dstport, packets, bytes
# FROM vpc_flow_logs
# WHERE srcaddr = '10.0.1.14'
#   AND action = 'ACCEPT'
# ORDER BY bytes DESC
# LIMIT 50;
```
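When Athena is not set up yet, the same question can be answered from raw log lines. This sketch assumes records in the default (version 2) flow log format, whose space-separated fields are version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, and log-status. It totals accepted bytes per destination for one source address. The sample records, including the account and interface IDs, are invented.

```python
from collections import Counter

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def top_destinations(lines, srcaddr: str, limit: int = 5):
    """Sum accepted bytes per (dstaddr, dstport) for traffic sent by srcaddr."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # custom formats or truncated lines are skipped
        record = dict(zip(FIELDS, parts))
        # NODATA/SKIPDATA records carry "-" in these fields and never match.
        if record["srcaddr"] != srcaddr or record["action"] != "ACCEPT":
            continue
        totals[(record["dstaddr"], int(record["dstport"]))] += int(record["bytes"])
    return totals.most_common(limit)


sample_lines = [
    "2 123456789012 eni-0a1b2c3d 10.0.1.14 203.0.113.50 49152 443 6 120 9800000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.1.14 10.0.2.20 49153 5432 6 40 52000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.1.14 203.0.113.50 49154 443 6 30 200000 1700000060 1700000120 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 198.51.100.7 10.0.1.14 44321 22 6 5 300 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0a1b2c3d - - - - - - - 1700000000 1700000060 - NODATA",
]

for destination, total_bytes in top_destinations(sample_lines, "10.0.1.14"):
    print(destination, total_bytes)
```

A large byte count to a single external address from an application server that should only talk to a database is the pattern worth investigating first.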
Zero trust posture and continuous compliance
Zero trust in cloud is not a product you buy. It is a set of principles that, when applied consistently, reduce the blast radius of any single failure — whether that failure is a leaked credential, a compromised instance, a supply chain attack on a dependency, or an insider threat. The principle is simple: never assume a request is legitimate because of where it comes from or what identity it claims. Verify it explicitly, every time, against the minimum necessary permissions, with full logging.
The six pillars of zero trust in cloud
| Pillar | Implementation in cloud | Tools |
|---|---|---|
| Identity | Every API call authenticated; MFA everywhere; short-lived credentials (roles/tokens) not long-lived access keys; anomaly detection on API patterns | IAM Identity Center, PIM, Entra ID Conditional Access, GCP Workload Identity |
| Least privilege | Automated enforcement via SCPs/guardrails; regular access reviews (remove what is not used); permission boundaries cap max permissions | AWS IAM Access Analyzer, Azure AD Access Reviews, GCP IAM Recommender |
| Device | Restrict console and sensitive API access to corporate managed devices; MDM enrollment required for admin roles | Conditional Access (Azure), AWS SSO device trust, BeyondCorp (GCP) |
| Network | VPC segmentation; security groups by application tier; PrivateLink for managed service access; no management ports open to internet | AWS VPC, Azure VNet, GCP VPC; flow logs in all |
| Logging | CloudTrail/Activity Logs/Cloud Audit Logs in EVERY region, EVERY service; immutable log storage; SIEM ingestion | AWS CloudTrail + Security Lake, Azure Sentinel, GCP Chronicle |
| Detection & Response | GuardDuty/Defender for Cloud/Security Command Center for anomaly detection; automated remediation (Lambda auto-disabling public buckets, auto-revoking compromised keys) | EventBridge + Lambda, Security Hub, Azure Policy auto-remediation |
Automated remediation examples
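```python
# AWS Lambda auto-remediating a newly public S3 bucket.
# Trigger: an EventBridge rule on the CloudTrail events PutBucketAcl, PutBucketPolicy,
# and DeleteBucketPublicAccessBlock. Do not trigger on PutBucketPublicAccessBlock
# without a guard, because this function makes that call itself and would re-invoke itself.
import boto3

s3 = boto3.client('s3')


def lambda_handler(event, context):
    bucket = event['detail']['requestParameters']['bucketName']
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            'BlockPublicAcls': True,
            'IgnorePublicAcls': True,
            'BlockPublicPolicy': True,
            'RestrictPublicBuckets': True
        }
    )
    print(f"Blocked public access on {bucket}")

# AWS Security Hub custom action: auto-disable a compromised IAM user's access key.
# EventBridge rule: SecurityHub > Findings > Filter on IAM compromised credential
# Target: Lambda that calls iam.update_access_key(Status='Inactive')
```

```bash
# Azure Policy assignment with a managed identity for remediation (HTTPS-only storage).
# Confirm the definition ID matches the intended built-in policy before assigning:
#   az policy definition show --name <definition-id>
# The identity, role, and location flags only matter for definitions whose effect is
# deployIfNotExists or modify; audit/deny definitions do not remediate.
az policy assignment create \
  --name "enforce-https-storage" \
  --policy "34c877ad-507e-4c82-993e-3452a6e0ad3c" \
  --scope "/subscriptions/<sub-id>" \
  --location eastus \
  --mi-system-assigned \
  --identity-scope "/subscriptions/<sub-id>" \
  --role "Contributor"
```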
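The access-key remediation that is only described in comments above could look roughly like the following. This is a sketch under stated assumptions: the event shape (`userName` and `accessKeyId` at the top level) is hypothetical and would need to be mapped from the real Security Hub or GuardDuty finding, and the IAM client is injectable so the logic can be exercised without AWS access. The only AWS call used is `update_access_key` with `UserName`, `AccessKeyId`, and `Status`.

```python
def disable_access_key(iam_client, user_name: str, access_key_id: str) -> dict:
    """Mark one access key Inactive and return a record for the notification."""
    iam_client.update_access_key(
        UserName=user_name,
        AccessKeyId=access_key_id,
        Status="Inactive",
    )
    return {"user": user_name, "access_key_id": access_key_id, "status": "Inactive"}


def lambda_handler(event, context, iam_client=None):
    # Hypothetical event shape; adapt the two lookups to the actual finding format.
    user_name = event["userName"]
    access_key_id = event["accessKeyId"]
    if iam_client is None:
        import boto3  # imported lazily so the logic can be tested without boto3
        iam_client = boto3.client("iam")
    result = disable_access_key(iam_client, user_name, access_key_id)
    print(f"Disabled {access_key_id} for {user_name}")
    return result


class RecordingIam:
    """Test double that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def update_access_key(self, **kwargs):
        self.calls.append(kwargs)


fake = RecordingIam()
outcome = lambda_handler(
    {"userName": "alice", "accessKeyId": "AKIAIOSFODNN7EXAMPLE"}, None, iam_client=fake
)
print(outcome)
```

Setting the key to Inactive rather than deleting it keeps the action reversible, which matters when a finding turns out to be a false positive.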
The goal of continuous monitoring and automated remediation is to reduce the mean time to remediation (MTTR) for common misconfigurations from weeks (manual review cycle) to seconds (automated fix triggered by the misconfiguration itself). Not everything can be auto-remediated; some fixes need human judgment. But the known, high-confidence findings (public S3 bucket, disabled MFA, compromised credential indicator) should trigger immediate automated action with a notification to the security team.