AWS EC2 Troubleshooting

An Instance Immediately Terminates

  • EBS volume limit was reached. It's a soft limit and can be increased by submitting a support request.
  • EBS snapshot is corrupt.
  • Root EBS volume is encrypted and you do not have permission to access the KMS key for decryption.
  • Instance store-backed AMI used to launch the instance is missing a required part
  • Resolution
    • Delete unused volumes
    • Ensure you have permission to use the AWS KMS key that encrypts the root volume.
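
One way to narrow down which of these causes applies is to read the instance's state reason from a DescribeInstances response. The following is a minimal sketch, assuming the response has already been parsed into a Python dictionary (for example by an SDK). The helper name `state_reasons` is hypothetical, and the instance ID and reason strings in the sample are illustrative, not captured from a real account.

```python
def state_reasons(response):
    """Collect state-change details for every instance in a parsed
    DescribeInstances response (a plain dict)."""
    details = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            reason = instance.get("StateReason", {})
            details.append({
                "InstanceId": instance.get("InstanceId"),
                "State": instance.get("State", {}).get("Name"),
                "Code": reason.get("Code"),
                "Message": reason.get("Message"),
                "StateTransitionReason": instance.get("StateTransitionReason"),
            })
    return details


# Illustrative sample only: the ID and reason strings are made up for this example.
sample_response = {
    "Reservations": [{
        "Instances": [{
            "InstanceId": "i-0123456789abcdef0",
            "State": {"Name": "terminated"},
            "StateReason": {
                "Code": "Client.VolumeLimitExceeded",
                "Message": "Client.VolumeLimitExceeded: Volume limit exceeded",
            },
            "StateTransitionReason": "",
        }]
    }]
}

for detail in state_reasons(sample_response):
    print(detail["InstanceId"], detail["State"], detail["Message"])
```

The same fields appear in the console's instance description, so the helper only saves a manual lookup when many instances are involved.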

EC2 Instance Connectivity Issues

  • Error connecting to your instance: Connection timed out
    • Route table for the subnet does not have a route that sends all traffic destined outside the VPC to the Internet gateway for the VPC.
    • Security group does not allow inbound traffic from your public IP address on the proper port.
    • Network ACL does not allow inbound traffic from, and outbound traffic to, your public IP address on the proper port.
    • Private key used to connect does not match the key pair selected for the instance at launch.
    • Appropriate user name for the AMI is not used, e.g. the user name for the Amazon Linux AMI is ec2-user, for the Ubuntu AMI is ubuntu, for RHEL5 and SUSE Linux AMIs is either root or ec2-user, and for the Fedora AMI is either fedora or ec2-user.
    • If connecting from a corporate network, the internal firewall does not allow inbound and outbound traffic on port 22 (for Linux instances) or port 3389 (for Windows instances).
    • Instance no longer has the same public IP address, which changes when the instance is stopped and started. Associate an Elastic IP address with the instance.
    • CPU load on the instance is high; the server may be overloaded.
  • User key not recognized by the server
    • Private key file used to connect has not been converted to the format required by the server.
  • Host key not found, Permission denied (publickey), or Authentication failed, permission denied
    • Appropriate user name for the AMI is not used for connecting.
    • Proper private key file for the instance is not used.
  • Unprotected Private Key File
    • Private key file is not protected from read and write operations by other users.
  • Server refused our key or No supported authentication methods available
    • Appropriate user name for the AMI is not used for connecting.
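
For the Unprotected Private Key File case, the file mode can be checked and tightened before connecting. The sketch below approximates the SSH client's rule on POSIX systems (no permission bits set for group or others). The helper names `key_is_protected` and `protect_key` are hypothetical, and the choice of mode 0400 (owner read-only) is an assumption that matches common guidance rather than the only acceptable mode.

```python
import os
import stat


def key_is_protected(path):
    """Return True when group and others have no access to the key file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 masks the group and other permission bits.
    return mode & 0o077 == 0


def protect_key(path):
    """Restrict the key file to owner read-only (assumed mode 0400)."""
    os.chmod(path, 0o400)
```

For example, calling `protect_key("my-key.pem")` on a hypothetical key file and then `key_is_protected("my-key.pem")` should report the file as protected.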

Failed Status Checks

  • System Status Check – Checks Physical Hosts
    • Lost Network connectivity
    • Loss of System power
    • Software issues on the physical host
    • Hardware issues on the physical host
    • Resolution
      • Amazon EBS-backed AMI instance – stop and start the instance, so that it moves to a different physical host
      • Instance-store backed AMI – terminate the instance and launch a replacement.
  • Instance Status Check – Checks Instance or VM
    • Possible reasons
      • Misconfigured networking or startup configuration
      • Exhausted memory
      • Corrupted file system
      • Failed Amazon EBS volume or Physical drive
      • Incompatible kernel
    • Resolution
      • Reboot the instance, or correct the operating system or volume configuration
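
The resolution rules above can be sketched as a small decision function. This is only an illustration, assuming the status values ("ok", "impaired") and root device types ("ebs", "instance-store") have already been read from the instance's status and description. The function name `recommend_action` and the returned strings are hypothetical.

```python
def recommend_action(system_status, instance_status, root_device_type):
    """Map failed status checks to the resolution listed for each case."""
    if system_status == "impaired":
        # Host-level problem: the instance has to move to another host.
        if root_device_type == "ebs":
            return "stop and start the instance"
        return "terminate the instance and launch a replacement"
    if instance_status == "impaired":
        # Problem inside the instance or VM itself.
        return "reboot the instance or fix the operating system or volume configuration"
    if system_status == "ok" and instance_status == "ok":
        return "no action needed"
    # Any other value (for example a check that is still initializing).
    return "wait and re-check the status"


print(recommend_action("impaired", "ok", "ebs"))
```

The system check is tested first because a host-level failure usually makes the instance-level result unreliable.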

Instance Capacity Issues

  • InsufficientInstanceCapacity
    • AWS does not currently have enough available capacity to service the request.
    • There is a limit on the number of instances of a given instance type that can be launched within a region.
    • Issue is mainly on the AWS side and can be resolved by
      • reducing the request for the number of instances
      • changing the instance type
      • submitting a request without specifying the Availability Zone.
  • InstanceLimitExceeded
    • Concurrent running instance limit, default is 20, has been reached in a region.
    • Request an instance limit increase on a per-region basis
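
The difference between the two errors can be sketched as follows: InsufficientInstanceCapacity is worth retrying with a modified request, while InstanceLimitExceeded is not. This is a rough illustration only. The function name `fallback_requests`, the request keys (`instance_type`, `count`, `availability_zone`) and the choice to halve the instance count are all assumptions made for the example, not part of any AWS API.

```python
def fallback_requests(error_code, request, alternative_types=()):
    """Return alternative launch requests to try after a capacity error."""
    if error_code == "InstanceLimitExceeded":
        # Retrying does not help; a limit increase has to be requested.
        return []
    if error_code != "InsufficientInstanceCapacity":
        raise ValueError(f"unhandled error code: {error_code}")

    options = []
    if request["count"] > 1:
        # Reduce the number of instances (halving is an arbitrary choice).
        options.append({**request, "count": request["count"] // 2})
    for instance_type in alternative_types:
        # Try a different instance type.
        options.append({**request, "instance_type": instance_type})
    if request.get("availability_zone") is not None:
        # Let AWS pick the Availability Zone.
        options.append({**request, "availability_zone": None})
    return options


original = {"instance_type": "m5.large", "count": 4, "availability_zone": "us-east-1a"}
for option in fallback_requests("InsufficientInstanceCapacity", original, ["m5a.large"]):
    print(option)
```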

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. A user has launched an EC2 instance. The instance got terminated as soon as it was launched. Which of the below mentioned options is not a possible reason for this?
    1. The user account has reached the maximum EC2 instance limit (Refer link)
    2. The snapshot is corrupt
    3. The AMI is missing a required part
    4. The user account has reached the maximum volume limit
  2. If you’re unable to connect via SSH to your EC2 instance, which of the following should you check and possibly correct to restore connectivity?
    1. Adjust Security Group to permit egress traffic over TCP port 443 from your IP.
    2. Configure the IAM role to permit changes to security group settings.
    3. Modify the instance security group to allow ingress of ICMP packets from your IP.
    4. Adjust the instance’s Security Group to permit ingress traffic over port 22 from your IP
    5. Apply the most recently released Operating System security patches.
  3. You try to connect via SSH to a newly created Amazon EC2 instance and get one of the following error messages: “Network error: Connection timed out” or “Error connecting to [instance], reason: -> Connection timed out: connect,” You have confirmed that the network and security group rules are configured correctly and the instance is passing status checks. What steps should you take to identify the source of the behavior? Choose 2 answers
    1. Verify that the private key file corresponds to the Amazon EC2 key pair assigned at launch.
    2. Verify that your IAM user policy has permission to launch Amazon EC2 instances. (there is no need for an IAM user; only the SSH keys are needed)
    3. Verify that you are connecting with the appropriate user name for your AMI. (Although it gives a different error, it seems the only other logical choice)
    4. Verify that the Amazon EC2 Instance was launched with the proper IAM role. (the role assigned to an EC2 instance is irrelevant for SSH and only controls which AWS resources the instance can access)
    5. Verify that your federation trust to AWS has been established (federation is for authenticating the user)
  4. A user has launched an EBS backed EC2 instance in the us-east-1a region. The user stopped the instance and started it back after 20 days. AWS throws up an ‘Insufficient Instance Capacity’ error. What can be the possible reason for this?
    1. AWS does not have sufficient capacity in that availability zone
    2. AWS zone mapping is changed for that user account
    3. There is some issue with the host capacity on which the instance is launched
    4. The user account has reached the maximum EC2 instance limit
  5. A user is trying to connect to a running EC2 instance using SSH. However, the user gets an Unprotected Private Key File error. Which of the below mentioned options can be a possible reason for rejection?
    1. The private key file has the wrong file permission
    2. The ppk file used for SSH is read only
    3. The public key file has the wrong permission
    4. The user has provided the wrong user name for the OS login
  6. A user has launched an EC2 instance. However, due to some reason the instance was terminated. If the user wants to find out the reason for termination, where can he find the details?
    1. It is not possible to find the details after the instance is terminated
    2. The user can get information from the AWS console, by checking the Instance description under the State transition reason label
    3. The user can get information from the AWS console, by checking the Instance description under the Instance Status Change reason label
    4. The user can get information from the AWS console, by checking the Instance description under the Instance Termination reason label
  7. You have a Linux EC2 web server instance running inside a VPC. The instance is in a public subnet and has an EIP associated with it so you can connect to it over the Internet via HTTP or SSH. The instance was also fully accessible when you last logged in via SSH and was also serving web requests on port 80. Now you are not able to SSH into the host nor does it respond to web requests on port 80, that were working fine last time you checked. You have double-checked that all networking configuration parameters (security groups, route tables, IGW, EIP, NACLs, etc.) are properly configured and you haven’t made any changes to those anyway since you were last able to reach the instance. You look at the EC2 console and notice that system status check shows “impaired.” Which should be your next step in troubleshooting and attempting to get the instance back to a healthy state so that you can log in again?
    1. Stop and start the instance so that it will be able to be redeployed on a healthy host system that most likely will fix the “impaired” system status (for an impaired system status check you need stop and start for EBS-backed instances, and terminate and relaunch for instance store-backed instances)
    2. Reboot your instance so that the operating system will have a chance to boot in a clean healthy state that most likely will fix the “impaired” system status
    3. Add another dynamic private IP address to the instance and try to connect via that new path, since the networking stack of the OS may be locked up causing the “impaired” system status.
    4. Add another Elastic Network Interface to the instance and try to connect via that new path since the networking stack of the OS may be locked up causing the “impaired” system status
    5. un-map and then re-map the EIP to the instance, since the IGW/NAT gateway may not be working properly, causing the “impaired” system status
  8. A user is trying to connect to a running EC2 instance using SSH. However, the user gets a connection time out error. Which of the below mentioned options is not a possible reason for rejection?
    1. The access key to connect to the instance is wrong (access key is different from ssh private key)
    2. The security group is not configured properly
    3. The private key used to launch the instance is not correct
    4. The instance CPU is heavily loaded
  9. A user is trying to connect to a running EC2 instance using SSH. However, the user gets a Host key not found error. Which of the below mentioned options is a possible reason for rejection?
    1. The user has provided the wrong user name for the OS login
    2. The instance CPU is heavily loaded
    3. The security group is not configured properly
    4. The access key to connect to the instance is wrong (access key is different from ssh private key)

20 thoughts on “AWS EC2 Troubleshooting”

    1. Thanks. Good catch, corrected the answer. It's Option A, the access key, which is not relevant to the connection timeout error. The rest of the options are all relevant to the error.

  1. In the 3rd question, can you tell me why answer A is correct, please? I thought that answer is false because if the private key file is incorrect, we would see “Permission denied”, not “Connection timed out.”

    1. Thanks for the comment @johnydeep. It is marked more on the basis of elimination.
      SSH access to an EC2 instance is driven entirely by security groups, ACLs, keys and the status of the server.
      #b is incorrect, as there is no need for an IAM user; only the key is needed
      #d is incorrect, as the role assigned to EC2 is irrelevant for SSH and only controls what EC2 can access
      #e is irrelevant, as federation is for the logged-in user only and is not relevant to SSH

      This might be an old question not updated for changes in AWS

      1. Thank you for your reply.
        I took the exam a couple of days ago and saw this question.
        It was a really odd question, which made it unforgettable 😀
        After finishing the exam, I searched the AWS documentation for the answer and found that nothing in it matched any of the 5 answers.

  2. Hello Jayendra Sir. Hope you are doing well. Thanks a lot for such a nice blog with lots of questions and correct answers. This helps us a lot in cracking a few of the AWS certifications.

    Sir, there is some issue with the link “https://jayendrapatil.com/aws-virtual-private-cloud-vpc/” and this VPC tag is not mentioned on the site either; no tag is available for this option. Could you please check this once and let me know how I can access the topic related to VPC? Thanks in advance.

  3. Hi Patil Sir – Thanks so much for posting this synthesised information regarding AWS. I appreciate that you have it broken by granular topics. It makes learning really very easy.

    I had a question about the exam questions you have posted. Do these apply to the AWS Solutions Architect cert or the Sys Ops cert? Or do they apply to both exams? It would be good if you could clarify that. Thanks.

    1. The topics cover about 80-90% of Solutions Architect – Associate and about 50-60% of SysOps – Associate as some of the topics are quite common.

  4. Hi sir,

    I could not find any questions or topic related to Cloud formation on your blog

    Can you give me link for the same if it is there?

    Thanks in advance

  5. Hi Sir – I have passed the AWS SA-A all because of this website/blog. Thank you so much for taking the time to keep this site.

  6. Jayendra,

    For Error connecting to your instance: Connection timed out:
    I guess your below statement is incorrect because this is true for
    Error: User key not recognized by server

    Appropriate user name for the AMI is not used for e.g. user name for Amazon Linux AMI is ec2-user, Ubuntu AMI is ubuntu, RHEL5 AMI & SUSE Linux can be either root or ec2-user, Fedora AMI can be fedora or ec2-user

    http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstancesConnecting.html#TroubleshootingInstancesConnectionTimeout

    In docs ur statement valid for Error: User key not recognized by server

    but not sure what would be the 2nd answer for Q#3 though

    Can you please clarify?

    1. Hi Pradeep, it is marked more on the basis of elimination than on logic.
      SSH access to an EC2 instance is driven entirely by security groups, ACLs, keys and the status of the server.
      #b is incorrect, as there is no need for an IAM user; only the key is needed
      #d is incorrect, as the role assigned to EC2 is irrelevant for SSH and only controls what EC2 can access
      #e is irrelevant, as federation is for the logged-in user only and is not relevant to SSH

      This might be an old question not updated for changes in AWS

  7. Hi Jayendra,

    For Q9, option “The access key to connect to the instance is wrong” is also a possible reason. I think both options A and D need to be selected.

    Q9. A user is trying to connect to a running EC2 instance using SSH. However, the user gets a Host key not found error. Which of the below mentioned options is a possible reason for rejection?
    a) The user has provided the wrong user name for the OS login
    b) The instance CPU is heavily loaded
    c) The security group is not configured properly
    d) The access key to connect to the instance is wrong

    1. Hi Satish, the only reason D is not selected as an answer is that it refers to the access key and not the private SSH key. Access keys are used to access AWS resources through the CLI, SDKs and others and are not used to log in to EC2 instances.

  8. I’m curious to understand why answer A is better than B on question 7.

    1. Reboot would launch the instance on the same hardware, while Stop and Start would launch it on new hardware.
