AWS EC2 Troubleshooting

AWS EC2 Troubleshooting

An Instance Immediately Terminates

  • EBS volume limit was reached. Its a soft limit and can be increased by submitting a support request
  • EBS snapshot is corrupt.
  • Root EBS volume is encrypted and you do not have permission to access the KMS key for decryption.
  • Instance store-backed AMI used to launch the instance is missing a required part
  • Resolution
    • Delete unused volumes
    • Ensure proper permissions to access the AWS keys.

EC2 Instance Connectivity Issues

  • Error connecting to your instance: Connection timed out
    • Route table, for the subnet, does not have a  route that sends all traffic destined outside the VPC to the Internet gateway for the VPC.
    • Security group does not allow inbound traffic from the public IP address on the proper port
    • ACL does not allow inbound traffic from and outbound traffic to the public IP address on the proper port
    • Private key used to connect does not match with key that corresponds to the key pair selected for the instance during the launch
    • Appropriate user name for the AMI is not used for e.g. user name for Amazon Linux AMI is ec2-user, Ubuntu AMI is ubuntu, RHEL5 AMI & SUSE Linux can be either root or ec2-user, Fedora AMI can be fedora or ec2-user
    • If connecting from a corporate network, the internal firewall does not
      allow inbound and outbound traffic on port 22 (for Linux instances) or port 3389 (for Windows instances).
    • Instance does not have the same public IP address, which changes during restarts. Associate an Elastic IP address with the instance
    • CPU load on the instance is high; the server may be overloaded.
  • User key not recognized by the server
    • private key file used to connect has not been converted to the format as required by the server
  • Host key not found, Permission denied (publickey), or Authentication failed, permission denied
    • appropriate user name for the AMI is not used for connecting
    • proper private key file for the instance is not used
  • Unprotected Private Key File
    • private key file is not protected from read and write operations from any other users.
  • Server refused our key or No supported authentication methods available
    • appropriate user name for the AMI is not used for connecting

Failed Status Checks

  • System Status CheckChecks Physical Hosts
    • Lost Network connectivity
    • Loss of System power
    • Software issues on the physical host
    • Hardware issues on the physical host
    • Resolution
      • Amazon EBS-backed AMI instance – stop and restart the instance
      • Instance-store backed AMI – terminate the instance and launch a replacement.
  • Instance Status Check – Checks Instance or VM
    • Possible reasons
      • Misconfigured networking or startup configuration
      • Exhausted memory
      • Corrupted file system
      • Failed Amazon EBS volume or Physical drive
      • Incompatible kernel
    • Resolution
      • Rebooting of the Instance or making modifications in your Operating system, volumes

Instance Capacity Issues

  • InsufficientInstanceCapacity
    • AWS does not currently have enough available capacity to service the request.
    • There is a limit to the number of instances of instance type that can be launched within a region.
    • Issue is mainly from the AWS side and it can be resolved by
      • reducing the request for the number of instances
      • changing the instance type
      • submitting a request without specifying the Availability Zone.
  • InstanceLimitExceeded
    • Concurrent running instance limit, default is 20, has been reached in a region.
    • Request an instance limit increase on a per-region basis

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A user has launched an EC2 instance. The instance got terminated as soon as it was launched. Which of the below mentioned options is not a possible reason for this?
    1. The user account has reached the maximum EC2 instance limit (Refer link)
    2. The snapshot is corrupt
    3. The AMI is missing. It is the required part
    4. The user account has reached the maximum volume limit
  2. If you’re unable to connect via SSH to your EC2 instance, which of the following should you check and possibly correct to restore connectivity?
    1. Adjust Security Group to permit egress traffic over TCP port 443 from your IP.
    2. Configure the IAM role to permit changes to security group settings.
    3. Modify the instance security group to allow ingress of ICMP packets from your IP.
    4. Adjust the instance’s Security Group to permit ingress traffic over port 22 from your IP
    5. Apply the most recently released Operating System security patches.
  3. You try to connect via SSH to a newly created Amazon EC2 instance and get one of the following error messages: “Network error: Connection timed out” or “Error connecting to [instance], reason: -> Connection timed out: connect,” You have confirmed that the network and security group rules are configured correctly and the instance is passing status checks. What steps should you take to identify the source of the behavior? Choose 2 answers
    1. Verify that the private key file corresponds to the Amazon EC2 key pair assigned at launch.
    2. Verify that your IAM user policy has permission to launch Amazon EC2 instances. (there is not need for a IAM user and just need ssh keys)
    3. Verify that you are connecting with the appropriate user name for your AMI. (Although it gives different error seems the only other logical choice)
    4. Verify that the Amazon EC2 Instance was launched with the proper IAM role. (role assigned to EC2 is irrelevant for ssh and only controls what AWS resources EC2 can access to)
    5. Verify that your federation trust to AWS has been established (federation is for authenticating the user)
  4. A user has launched an EBS backed EC2 instance in the us-east-1a region. The user stopped the instance and started it back after 20 days. AWS throws up an ‘Insufficient Instance Capacity’ error. What can be the possible reason for this?
    1. AWS does not have sufficient capacity in that availability zone
    2. AWS zone mapping is changed for that user account
    3. There is some issue with the host capacity on which the instance is launched
    4. The user account has reached the maximum EC2 instance limit
  5. A user is trying to connect to a running EC2 instance using SSH. However, the user gets an Unprotected Private Key File error. Which of the below mentioned options can be a possible reason for rejection?
    1. The private key file has the wrong file permission
    2. The ppk file used for SSH is read only
    3. The public key file has the wrong permission
    4. The user has provided the wrong user name for the OS login
  6. A user has launched an EC2 instance. However, due to some reason the instance was terminated. If the user wants to find out the reason for termination, where can he find the details?
    1. It is not possible to find the details after the instance is terminated
    2. The user can get information from the AWS console, by checking the Instance description under the State transition reason label
    3. The user can get information from the AWS console, by checking the Instance description under the Instance Status Change reason label
    4. The user can get information from the AWS console, by checking the Instance description under the Instance Termination reason label
  7. You have a Linux EC2 web server instance running inside a VPC. The instance is in a public subnet and has an EIP associated with it so you can connect to it over the Internet via HTTP or SSH. The instance was also fully accessible when you last logged in via SSH and was also serving web requests on port 80. Now you are not able to SSH into the host nor does it respond to web requests on port 80, that were working fine last time you checked. You have double-checked that all networking configuration parameters (security groups route tables, IGW, EIP. NACLs etc.) are properly configured and you haven’t made any changes to those anyway since you were last able to reach the Instance). You look at the EC2 console and notice that system status check shows “impaired.” Which should be your next step in troubleshooting and attempting to get the instance back to a healthy state so that you can log in again?
    1. Stop and start the instance so that it will be able to be redeployed on a healthy host system that most likely will fix the “impaired” system status (for system status check impaired status you need Stop Start for EBS and terminate and relaunch for Instance store)
    2. Reboot your instance so that the operating system will have a chance to boot in a clean healthy state that most likely will fix the ‘impaired” system status
    3. Add another dynamic private IP address to me instance and try to connect via that new path, since the networking stack of the OS may be locked up causing the “impaired” system status.
    4. Add another Elastic Network Interface to the instance and try to connect via that new path since the networking stack of the OS may be locked up causing the “impaired” system status
    5. un-map and then re-map the EIP to the instance, since the IGW/NAT gateway may not be working properly, causing the “impaired” system status
  8. A user is trying to connect to a running EC2 instance using SSH. However, the user gets a connection time out error. Which of the below mentioned options is not a possible reason for rejection?
    1. The access key to connect to the instance is wrong (access key is different from ssh private key)
    2. The security group is not configured properly
    3. The private key used to launch the instance is not correct
    4. The instance CPU is heavily loaded
  9. A user is trying to connect to a running EC2 instance using SSH. However, the user gets a Host key not found error. Which of the below mentioned options is a possible reason for rejection?
    1. The user has provided the wrong user name for the OS login
    2. The instance CPU is heavily loaded
    3. The security group is not configured properly
    4. The access key to connect to the instance is wrong (access key is different from ssh private key)

AWS Autoscaling Troubleshooting

Exam Question Scenario

EC2 instances fail to launch with Autoscaling configuration

Description

  • Autoscaling configuration requires the following :-
  • Autoscaling launch configuration which allows you to select an
    • AMI
    • Instance type
    • IAM role (optional)
    • Security group
    • Key pair file
  • Autoscaling group configuration allows you to select AZ to be used to launch the EC2 instances with the selected launch configuration

Troubleshooting key points :-

  • AMI id does not exist or is still pending and cannot be used to launch instances
  • Security group provided in the launch configuration does not exist
  • Key pair associated with the EC2 instance does not exist
  • Autoscaling group not found or is incorrectly configured
  • AZ configured with the Autoscaling group is no longer supported cause it might not be available
  • Invalid EBS block device mappings
  • Instance type is not supported in the AZ
  • Capacity limits reached either cause of the restriction on the number of instance type that can be launched in a region or cause AWS is not able to provision the specified instance type in the AZ (for e.g. no more spot instances or On-demand instances availability)

References

More details @ AWS Autoscaling Developer Guide