AWS Machine Learning Services – Cheat Sheet

AWS Machine Learning Services

Amazon SageMaker

  • Build, train, and deploy machine learning models at scale
  • fully-managed service that enables data scientists and developers to quickly and easily build, train & deploy machine learning models.
  • enables developers and scientists to build machine learning models for use in intelligent, predictive apps.
  • is designed for high availability with no maintenance windows or scheduled downtimes.
  • allows users to select the number and type of instance used for the hosted notebook, training & model hosting.
  • models can be deployed to real-time endpoints or used for batch inference.
  • supports canary deployments by using ProductionVariant to deploy multiple variants of a model to the same SageMaker HTTPS endpoint.
  • supports Jupyter notebooks.
  • Users can persist their notebook files on the attached ML storage volume.
  • Users can modify the notebook instance and select a larger profile through the SageMaker console, after saving their files and data on the attached ML storage volume.
  • includes built-in algorithms for linear regression, logistic regression, k-means clustering, principal component analysis, factorization machines, neural topic modeling, Latent Dirichlet Allocation, gradient boosted trees, seq2seq, time series forecasting, word2vec & image classification.
  • built-in algorithms work best with training data in the optimized protobuf RecordIO format, which enables Pipe mode; Pipe mode streams data directly from S3, giving faster start times and lower storage requirements.
  • lets users choose built-in algorithms, use pre-built container images, extend a pre-built container image, or build a custom container image.
  • supports users' custom training algorithms provided through a Docker image adhering to the documented specification.
  • also provides optimized MXNet, TensorFlow, Chainer & PyTorch containers.
  • ensures that ML model artifacts and other system artifacts are encrypted in transit and at rest.
  • requests to the API and console are made over a secure (SSL) connection.
  • stores code in ML storage volumes, secured by security groups and optionally encrypted at rest.
  • SageMaker Neo is a new capability that enables machine learning models to train once and run anywhere in the cloud and at the edge.
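
The canary deployment bullet above can be sketched as a weighted pair of production variants. This is a minimal illustration, not a full deployment script: the model names, variant names, and instance type are hypothetical placeholders, and it assumes the resulting list is later passed as `ProductionVariants` to the boto3 SageMaker client's `create_endpoint_config` call.

```python
def canary_variants(stable_model, canary_model, canary_share=0.1,
                    instance_type="ml.m5.large"):
    """Build a ProductionVariants list that gives `canary_share` of traffic to the canary."""
    if not 0.0 < canary_share < 1.0:
        raise ValueError("canary_share must be strictly between 0 and 1")

    def variant(name, model, weight):
        return {
            "VariantName": name,           # hypothetical variant name
            "ModelName": model,            # must match an existing SageMaker model
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
            "InitialVariantWeight": weight,
        }

    return [
        variant("stable", stable_model, 1.0 - canary_share),
        variant("canary", canary_model, canary_share),
    ]


def traffic_share(variants):
    """Traffic is split in proportion to each variant's weight."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Hypothetical model names, used only for illustration.
variants = canary_variants("churn-model-v1", "churn-model-v2", canary_share=0.1)
print(traffic_share(variants))
```

Keeping the weights relative (rather than percentages) means the canary share can be raised later by changing a single weight.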

Amazon Textract

  • Textract provides OCR and adds document text detection and analysis to applications.
  • includes simple, easy-to-use API operations that can analyze image files and PDF files.
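
As a rough sketch of text detection, the snippet below pulls the detected lines out of a `DetectDocumentText` response. It assumes `client` is a boto3 Textract client; the bucket and key are supplied by the caller, and the sample response is hand-written for illustration rather than real service output.

```python
def extract_lines(response):
    """Return the text of every LINE block in a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_lines(client, bucket, key):
    """Run text detection on an S3 object; `client` is assumed to be boto3.client("textract")."""
    response = client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return extract_lines(response)


# Hand-written sample in the shape of a Textract response (not real output).
sample = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 001"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "LINE", "Text": "Total: 42.00"},
    ]
}
print(extract_lines(sample))
```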

Amazon Comprehend

  • Comprehend is a managed natural language processing (NLP) service to find insights and relationships in text.
  • identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; analyzes text using tokenization and parts of speech; and automatically organizes a collection of text files by topic.
  • can analyze a collection of documents and other text files (such as social media posts) and automatically organize them by relevant terms or topics.
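
Language detection and sentiment analysis can be chained as in the following sketch. It assumes `client` is a boto3 Comprehend client; the helper name `analyze_review` is made up for this example.

```python
def analyze_review(client, text):
    """Detect the dominant language of `text`, then its sentiment.

    `client` is assumed to be boto3.client("comprehend").
    """
    languages = client.detect_dominant_language(Text=text)["Languages"]
    # Pick the language the service is most confident about.
    language = max(languages, key=lambda item: item["Score"])["LanguageCode"]

    sentiment = client.detect_sentiment(Text=text, LanguageCode=language)
    return {
        "language": language,
        "sentiment": sentiment["Sentiment"],     # e.g. POSITIVE, NEGATIVE, NEUTRAL, MIXED
        "scores": sentiment["SentimentScore"],
    }
```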

Amazon Lex

  • is a service for building conversational interfaces using voice and text.
  • provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text, to enable building applications with highly engaging user experiences and lifelike conversational interactions.
  • common use cases of Lex include: Application/Transactional bot, Informational bot, Enterprise Productivity bot, and Device Control bot.
  • leverages Lambda for Intent fulfillment, Cognito for user authentication & Polly for text-to-speech.
  • scales to customers’ needs and does not impose bandwidth constraints.
  • is a completely managed service so users don’t have to manage the scaling of resources or maintenance of code.
  • uses deep learning to improve over time.
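
Intent fulfillment through Lambda might look like the handler below. This is a sketch that assumes the Lex V2 event and response shapes; the `City` slot and the reply text are hypothetical and would depend on how the bot is defined.

```python
def lambda_handler(event, context):
    """Fulfil an intent and close the conversation (assumes the Lex V2 event format)."""
    intent = event["sessionState"]["intent"]
    slots = intent.get("slots") or {}

    # "City" is a hypothetical slot; unfilled slots arrive as None.
    city_slot = slots.get("City") or {}
    city = (city_slot.get("value") or {}).get("interpretedValue", "your city")

    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {**intent, "state": "Fulfilled"},
        },
        "messages": [
            {"contentType": "PlainText", "content": f"Booked a table in {city}."}
        ],
    }
```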

Amazon Polly

  • converts text into speech.
  • uses advanced deep-learning technologies to synthesize speech that sounds like a human voice.
  • supports Lexicons to customize pronunciation of specific words & phrases
  • supports Speech Synthesis Markup Language (SSML) tags like prosody so users can adjust the speech rate, pitch, pauses, or volume.
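
To illustrate the prosody tag, the helper below wraps plain text in SSML. The function name and defaults are assumptions for this sketch; the resulting string would be passed to the boto3 Polly client's `synthesize_speech` with `TextType="ssml"`.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text, rate="medium", pitch="medium", volume="medium", pause_ms=0):
    """Wrap `text` in an SSML prosody tag, optionally followed by a pause."""
    body = (
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>{escape(text)}</prosody>"
    )
    if pause_ms > 0:
        body += f'<break time="{int(pause_ms)}ms"/>'
    return f"<speak>{body}</speak>"


print(to_ssml("Terms & conditions apply", rate="slow", pause_ms=500))
```

Escaping the text matters because characters such as `&` and `<` would otherwise make the SSML document invalid.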

Amazon Rekognition

  • analyzes images and videos.
  • identifies objects, people, text, scenes, and activities in images and videos, and detects inappropriate content.
  • provides highly accurate facial analysis and facial search capabilities that can be used to detect, analyze, and compare faces for a wide variety of user verification, people counting, and public safety use cases.
  • helps identify potentially unsafe or inappropriate content across both image and video assets and provides detailed labels that help accurately control what you want to allow based on your needs.
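
The moderation labels can drive an allow/block decision, roughly as follows. This sketch assumes `client` is a boto3 Rekognition client; the category names in the sample are illustrative and the blocked set is whatever your own policy requires.

```python
def flagged_labels(response, blocked_categories, min_confidence=80.0):
    """Return names of moderation labels that fall in a blocked category."""
    flagged = []
    for label in response.get("ModerationLabels", []):
        # Top-level labels have an empty ParentName, so fall back to the label itself.
        category = label.get("ParentName") or label["Name"]
        if category in blocked_categories and label["Confidence"] >= min_confidence:
            flagged.append(label["Name"])
    return flagged


def moderate_image(client, bucket, key, blocked_categories):
    """`client` is assumed to be boto3.client("rekognition")."""
    response = client.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=50.0,
    )
    return flagged_labels(response, blocked_categories)


# Hand-written sample response for illustration (not real output).
sample = {
    "ModerationLabels": [
        {"Name": "Violence", "ParentName": "", "Confidence": 97.1},
        {"Name": "Weapon Violence", "ParentName": "Violence", "Confidence": 91.0},
        {"Name": "Alcohol", "ParentName": "", "Confidence": 60.0},
    ]
}
print(flagged_labels(sample, {"Violence", "Alcohol"}))
```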

Amazon Forecast

  • Amazon Forecast is a fully managed time-series forecasting service that uses statistical and machine learning algorithms to deliver highly accurate time-series forecasts and is built for business metrics analysis.
  • automatically tracks the accuracy of the model over time as new data is imported. The model's deviation from its initial quality metrics can be systematically quantified and used to make more informed decisions about keeping, retraining, or rebuilding the model as new data comes in.
  • provides six built-in algorithms which include ARIMA, Prophet, NPTS, ETS, CNN-QR, and DeepAR+.
  • integrates with AutoML to choose the optimal model for the datasets.
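
Letting AutoML pick among the built-in algorithms could be requested as sketched below. The helper only builds the keyword arguments; it assumes they are later passed to the boto3 Forecast client's `create_predictor`, and the predictor name and dataset group ARN shown are placeholders.

```python
def automl_predictor_request(name, dataset_group_arn, horizon, frequency="D"):
    """Build create_predictor arguments that let AutoML choose the algorithm."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1 time step")
    return {
        "PredictorName": name,
        "ForecastHorizon": horizon,            # number of time steps to predict
        "PerformAutoML": True,                 # let the service pick the algorithm
        "InputDataConfig": {"DatasetGroupArn": dataset_group_arn},
        "FeaturizationConfig": {"ForecastFrequency": frequency},  # "D" = daily
    }


# Placeholder ARN and name, for illustration only.
request = automl_predictor_request(
    "demand-predictor",
    "arn:aws:forecast:us-east-1:123456789012:dataset-group/demo",
    horizon=14,
)
print(sorted(request))
```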

Amazon SageMaker Ground Truth

  • helps build highly accurate training datasets for machine learning quickly.
  • offers easy access to labelers through Amazon Mechanical Turk and provides them with built-in workflows and interfaces for common labeling tasks.
  • allows using your own labelers or vendors recommended by Amazon through AWS Marketplace.
  • helps lower labeling costs by up to 70% using automatic labeling, which works by training Ground Truth from data labeled by humans so that the service learns to label data independently.
  • provides annotation consolidation to help improve the accuracy of the data object’s labels.
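
The idea behind annotation consolidation can be shown with a plain majority vote. This is a deliberate simplification for illustration, not the consolidation logic Ground Truth itself uses; the object IDs and labels are invented.

```python
from collections import Counter


def consolidate(annotations):
    """Majority-vote consolidation.

    annotations: {object_id: [label from each worker, ...]}
    returns:     {object_id: (winning_label, share_of_workers_agreeing)}
    """
    result = {}
    for object_id, labels in annotations.items():
        if not labels:
            continue  # nothing to consolidate for this object
        label, votes = Counter(labels).most_common(1)[0]
        result[object_id] = (label, votes / len(labels))
    return result


# Invented worker labels for two images.
votes = {"img-1": ["cat", "cat", "dog"], "img-2": ["dog", "dog", "dog"]}
print(consolidate(votes))
```

The agreement share gives a crude confidence signal: objects with low agreement are candidates for another labeling pass.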

Amazon Translate

  • provides natural and fluent language translation
  • a neural machine translation service that delivers fast, high-quality, and affordable language translation.
  • Neural machine translation is a form of language translation automation that uses deep learning models to deliver more accurate and natural-sounding translation than traditional statistical and rule-based translation algorithms.
  • allows localizing content (such as websites and applications) for international users, and translating large volumes of text efficiently.
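
Localizing a set of UI strings could be sketched like this. It assumes `client` is a boto3 Translate client; the helper name and the string keys are hypothetical.

```python
def localize(client, strings, target_language, source_language="en"):
    """Translate each UI string; `client` is assumed to be boto3.client("translate")."""
    localized = {}
    for key, text in strings.items():
        response = client.translate_text(
            Text=text,
            SourceLanguageCode=source_language,
            TargetLanguageCode=target_language,
        )
        localized[key] = response["TranslatedText"]
    return localized
```

For example, `localize(client, {"greeting": "Welcome"}, "es")` would return a dictionary with the same keys and Spanish values.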

Amazon Transcribe

  • provides speech-to-text capability
  • uses a deep learning process called automatic speech recognition (ASR) to convert speech to text quickly and accurately.
  • can be used to transcribe customer service calls, automate closed captioning and subtitling, and generate metadata for media assets to create a fully searchable archive.
  • adds punctuation and formatting so that the output closely matches the quality of manual transcription at a fraction of the time and expense.
  • processes audio in batch or near real-time.
  • supports automatic language identification.
  • supports custom vocabulary to generate more accurate transcriptions for domain-specific words and phrases like product names, technical terminology, or names of individuals.
  • supports specifying a list of words to remove from transcripts.
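
The options above map onto a batch transcription request roughly as follows. The helper only builds keyword arguments for the boto3 Transcribe client's `start_transcription_job`; the job, vocabulary, and filter names are placeholders, and as a simplifying assumption this sketch attaches vocabularies only when the language is fixed.

```python
def transcription_request(job_name, media_uri, language_code=None,
                          vocabulary=None, vocabulary_filter=None):
    """Build start_transcription_job arguments."""
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
    }
    if language_code is None:
        if vocabulary or vocabulary_filter:
            raise ValueError("this sketch needs language_code to attach vocabularies")
        request["IdentifyLanguage"] = True   # automatic language identification
        return request

    request["LanguageCode"] = language_code
    settings = {}
    if vocabulary:
        settings["VocabularyName"] = vocabulary              # custom vocabulary
    if vocabulary_filter:
        settings["VocabularyFilterName"] = vocabulary_filter
        settings["VocabularyFilterMethod"] = "remove"        # drop filtered words
    if settings:
        request["Settings"] = settings
    return request


# Placeholder names and S3 URI, for illustration only.
print(transcription_request("call-001", "s3://example-bucket/call-001.wav"))
```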

Amazon Kendra

  • is an intelligent search service that uses NLP and advanced ML algorithms to return specific answers to search questions from your data.
  • uses its semantic and contextual understanding capabilities to decide whether a document is relevant to a search query.
  • returns specific answers to questions, giving users an experience that’s close to interacting with a human expert.
  • provides a unified search experience by connecting multiple data repositories to an index and ingesting and crawling documents.
  • can use the document metadata to create a feature-rich and customized search experience for the users, helping them efficiently find the right answers to their queries.
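
Querying an index and keeping only the direct answers might look like this sketch. It assumes `client` is a boto3 Kendra client and that the index ID belongs to an existing index; the helper name is made up.

```python
def top_answers(client, index_id, question, limit=3):
    """Return direct answers for a question; `client` is assumed to be boto3.client("kendra")."""
    response = client.query(IndexId=index_id, QueryText=question)
    answers = []
    for item in response.get("ResultItems", []):
        # Skip plain document matches and keep extracted answers and FAQ matches.
        if item.get("Type") in ("ANSWER", "QUESTION_ANSWER"):
            answers.append({
                "title": item.get("DocumentTitle", {}).get("Text", ""),
                "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            })
    return answers[:limit]
```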

Augmented AI (Amazon A2I)

  • Augmented AI (Amazon A2I) is an ML service that makes it easy to build the workflows required for human review.
  • brings human review to all developers, removing the undifferentiated heavy lifting associated with building human review systems or managing large numbers of human reviewers, whether the ML application runs on AWS or not.
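
A common pattern is to send only low-confidence predictions for human review. The sketch below assumes `client` is a boto3 `sagemaker-a2i-runtime` client and that a flow definition already exists; the `confidence` field and the 0.8 threshold are assumptions about your own prediction format.

```python
import json


def review_if_uncertain(client, flow_definition_arn, loop_name, prediction,
                        threshold=0.8):
    """Start a human loop when the model is not confident enough.

    Returns the human loop ARN, or None if no review was needed.
    """
    if prediction["confidence"] >= threshold:
        return None  # confident enough, accept the model output

    response = client.start_human_loop(
        HumanLoopName=loop_name,
        FlowDefinitionArn=flow_definition_arn,
        HumanLoopInput={"InputContent": json.dumps(prediction)},
    )
    return response["HumanLoopArn"]
```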

Amazon Personalize

  • Personalize is a fully managed machine learning service that uses data to generate item recommendations.
  • can also generate user segments based on the users’ affinity for certain items or item metadata.
  • generates recommendations primarily based on item interaction data that comes from the users interacting with items in the catalog.
  • includes API operations for real-time personalization, and batch operations for bulk recommendations and user segments.
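
Real-time recommendations can be fetched as in this sketch. It assumes `client` is a boto3 `personalize-runtime` client and that a campaign has already been deployed; the helper name is hypothetical.

```python
def recommend(client, campaign_arn, user_id, count=5):
    """Return recommended item IDs for a user.

    `client` is assumed to be boto3.client("personalize-runtime").
    """
    response = client.get_recommendations(
        campaignArn=campaign_arn,
        userId=user_id,
        numResults=count,
    )
    return [item["itemId"] for item in response["itemList"]]
```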

Amazon Panorama

  • brings computer vision to the on-premises camera network.
  • AWS Panorama Appliance or another compatible device can be installed in the data center and registered with AWS Panorama to deploy computer vision applications from the cloud.
  • AWS Panorama Appliance
    • is a compact edge appliance that uses a powerful system-on-module (SOM) that is optimized for ML workloads.
    • can run multiple computer vision models against multiple video streams in parallel and output the results in real-time.
    • is designed for use in commercial and industrial settings and is rated for dust and liquid protection.
  • works with the existing real-time streaming protocol (RTSP) network cameras.

Amazon Fraud Detector

  • Fraud Detector is a fully managed service to identify potentially fraudulent online activities such as online payment fraud and fake account creation.
  • takes care of all the heavy lifting such as data validation and enrichment, feature engineering, algorithm selection, hyperparameter tuning, and model deployment.

AWS IoT Greengrass ML Inference

  • IoT Greengrass helps perform machine learning inference locally on devices, using models that are created, trained, and optimized in the cloud.
  • provides flexibility to use machine learning models trained in SageMaker or to bring your pre-trained model stored in S3.
  • helps get inference results with very low latency to ensure the IoT applications can respond quickly to local events.

Amazon Elastic Inference

  • helps attach low-cost GPU-powered acceleration to EC2 and SageMaker instances or ECS tasks to reduce the cost of running deep learning inference by up to 75%.
  • supports TensorFlow, Apache MXNet, PyTorch, and ONNX models.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day, and both the questions and answers might soon be outdated, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. A company has built a deep learning model and now wants to deploy it using SageMaker Hosting Services. For inference, they want a cost-effective option that guarantees low latency but still comes at a fraction of the cost of using a GPU instance for the endpoint. As a machine learning specialist, what feature should be used?
    1. Inference Pipeline
    2. Elastic Inference
    3. SageMaker Ground Truth
    4. SageMaker Neo
  2. A machine learning specialist works for an online retail company that sells health products. The company allows users to enter reviews of the products they buy from the website. The company wants to make sure the reviews do not contain any offensive or unsafe content, such as obscenities or threatening language. Which Amazon SageMaker algorithm or service will allow scanning users' review text in the simplest way?
    1. BlazingText
    2. Transcribe
    3. Semantic Segmentation
    4. Comprehend
  3. A company develops a tool whose coverage includes blogs, news sites, forums, videos, reviews, images, and social networks such as Twitter and Facebook. Users can search data by using text and image search, and use charting, categorization, sentiment analysis, and other features to provide further information and analysis. They want to provide image and text analysis capabilities to the applications, which include identifying objects, people, text, scenes, and activities, and also provide highly accurate facial analysis and facial recognition. What service can provide this capability?
    1. Amazon Comprehend
    2. Amazon Rekognition
    3. Amazon Polly
    4. Amazon SageMaker