AWS Certification – Machine Learning Services – Cheat Sheet

Amazon SageMaker

  • Build, train, and deploy machine learning models at scale
  • is a fully managed service that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models.
  • enables developers and scientists to build machine learning models for use in intelligent, predictive apps.
  • is designed for high availability with no maintenance windows or scheduled downtimes.
  • allows users to select the number and type of instance used for the hosted notebook, training & model hosting.
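  • models can be deployed to hosted endpoints for real-time inference or run as batch jobs (batch transform).
  • supports canary deployment using ProductionVariant, which deploys multiple variants of a model to the same SageMaker HTTPS endpoint and splits traffic between them by weight.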
  • supports Jupyter notebooks.
  • Users can persist their notebook files on the attached ML storage volume.
  • Users can modify the notebook instance and select a larger profile through the SageMaker console, after saving their files and data on the attached ML storage volume.
  • includes built-in algorithms for linear regression, logistic regression, k-means clustering, principal component analysis, factorization machines, neural topic modeling, Latent Dirichlet Allocation, gradient boosted trees, seq2seq, time series forecasting, word2vec, and image classification.
  • algorithms work best with training data in the optimized protobuf RecordIO format, which allows Pipe mode to stream data directly from S3, giving faster start times and lower storage requirements.
  • lets users choose built-in algorithms, use pre-built container images, extend a pre-built container image, or build a custom container image.
  • supports users' custom training algorithms provided through a Docker image adhering to the documented specification.
  • also provides optimized MXNet, TensorFlow, Chainer, and PyTorch containers.
  • ensures that ML model artifacts and other system artifacts are encrypted in transit and at rest.
  • requests to the API and console are made over a secure (SSL) connection.
  • stores code in ML storage volumes, secured by security groups and optionally encrypted at rest.
  • SageMaker Neo is a capability that enables machine learning models to be trained once and run anywhere in the cloud and at the edge.
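
The canary deployment bullet above can be illustrated with a minimal sketch. It only builds the `ProductionVariants` list that would be passed to the `create_endpoint_config` API call; it does not contact AWS. The model names, variant names, and instance type are hypothetical placeholders, and the traffic split is computed on the assumption that each variant receives its weight divided by the sum of all weights.

```python
from typing import Dict, List


def canary_variants(stable_model: str, canary_model: str,
                    canary_share: float,
                    instance_type: str = "ml.m5.large") -> List[Dict]:
    """Build two production variants; `canary_share` of traffic goes to the canary."""
    if not 0.0 < canary_share < 1.0:
        raise ValueError("canary_share must be strictly between 0 and 1")
    return [
        {
            "VariantName": "stable",          # hypothetical variant name
            "ModelName": stable_model,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,    # hypothetical instance type
            "InitialVariantWeight": 1.0 - canary_share,
        },
        {
            "VariantName": "canary",          # hypothetical variant name
            "ModelName": canary_model,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
            "InitialVariantWeight": canary_share,
        },
    ]


def traffic_shares(variants: List[Dict]) -> Dict[str, float]:
    """Fraction of requests per variant: its weight over the sum of all weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


variants = canary_variants("model-v1", "model-v2", canary_share=0.1)
shares = traffic_shares(variants)
```

Once the canary variant looks healthy, its weight can be raised step by step until it takes all traffic.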

Amazon Comprehend

  • is a managed natural language processing (NLP) service to find insights and relationships in text.
  • identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; analyzes text using tokenization and parts of speech; and automatically organizes a collection of text files by topic.
  • can analyze a collection of documents and other text files (such as social media posts) and automatically organize them by relevant terms or topics.

Amazon Lex

  • is a service for building conversational interfaces using voice and text.
  • provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text, to enable building applications with highly engaging user experiences and lifelike conversational interactions.
  • common use cases of Lex include: Application/Transactional bot, Informational bot, Enterprise Productivity bot, and Device Control bot.
  • leverages Lambda for Intent fulfillment, Cognito for user authentication & Polly for text to speech.
  • scales to customers' needs and does not impose bandwidth constraints.
  • is a completely managed service so users don’t have to manage scaling of resources or maintenance of code.
  • uses deep learning to improve over time.
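
As a rough illustration of Lambda-based intent fulfillment, the sketch below shows a handler that reads the intent and slots from an incoming event and returns a closing response. It assumes the Lex V1 event and response shape (`currentIntent`, `dialogAction`); the intent name `OrderStatus` and the slot `OrderId` are hypothetical and would come from your own bot definition.

```python
def lambda_handler(event, context):
    """Fulfill a (hypothetical) OrderStatus intent and close the conversation."""
    intent = event["currentIntent"]["name"]
    slots = event["currentIntent"].get("slots") or {}

    if intent == "OrderStatus" and slots.get("OrderId"):
        # A real handler would look the order up in a backend system here.
        text = f"Order {slots['OrderId']} is on its way."
        state = "Fulfilled"
    else:
        text = "Sorry, I could not handle that request."
        state = "Failed"

    # "Close" tells Lex that no further user input is expected for this intent.
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": state,
            "message": {"contentType": "PlainText", "content": text},
        }
    }


sample_event = {"currentIntent": {"name": "OrderStatus", "slots": {"OrderId": "1234"}}}
result = lambda_handler(sample_event, None)
```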

Amazon Polly

  • turns text into lifelike speech.
  • uses advanced deep learning technologies to synthesize speech that sounds like a human voice.
  • supports Speech Synthesis Markup Language (SSML) tags like prosody so users can adjust the speech rate, pitch or volume.
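
To make the SSML bullet concrete, here is a small sketch that wraps text in a `prosody` tag. The helper name `prosody_ssml` is made up for this example; the resulting string is the kind of input you would send to Polly with the text type set to SSML. The text is XML-escaped so that characters such as `&` do not break the markup.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text: str, rate: str = "medium",
                 pitch: str = "medium", volume: str = "medium") -> str:
    """Wrap `text` in <speak><prosody ...> with the given rate, pitch and volume."""
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>{escape(text)}</prosody></speak>"
    )


ssml = prosody_ssml("Tom & Jerry", rate="slow", volume="loud")
print(ssml)
# <speak><prosody rate="slow" pitch="medium" volume="loud">Tom &amp; Jerry</prosody></speak>
```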

Amazon Rekognition

  • analyzes images and videos.
  • identifies objects, people, text, scenes, and activities in images and videos, and detects inappropriate content.
  • provides highly accurate facial analysis and facial search capabilities that you can use to detect, analyze, and compare faces for a wide variety of user verification, people counting, and public safety use cases.
  • helps identify potentially unsafe or inappropriate content across both image and video assets and provides detailed labels that let you control precisely what to allow based on your needs.
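
The moderation labels mentioned above can drive a simple allow/block decision. The sketch below filters a response-shaped dictionary with a `ModerationLabels` list (each entry carrying `Name`, `ParentName`, and `Confidence`). The sample response, the blocked categories, and the 80% threshold are illustrative assumptions, not output from the service.

```python
from typing import Dict, List, Set


def blocked_labels(response: Dict, blocked: Set[str],
                   min_confidence: float = 80.0) -> List[str]:
    """Return label names whose top-level category is blocked and confident enough."""
    hits = []
    for label in response.get("ModerationLabels", []):
        # Second-level labels carry their category in ParentName;
        # top-level labels have an empty ParentName.
        category = label.get("ParentName") or label["Name"]
        if category in blocked and label["Confidence"] >= min_confidence:
            hits.append(label["Name"])
    return hits


# Hand-written sample in the shape of a moderation response (illustrative values).
sample_response = {
    "ModerationLabels": [
        {"Name": "Violence", "ParentName": "", "Confidence": 91.2},
        {"Name": "Graphic Violence Or Gore", "ParentName": "Violence", "Confidence": 85.0},
        {"Name": "Alcohol", "ParentName": "", "Confidence": 55.0},
    ]
}
hits = blocked_labels(sample_response, blocked={"Violence", "Alcohol"})
```

Here `hits` contains the two violence-related labels; the alcohol label falls below the confidence threshold and is ignored.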

Amazon SageMaker Ground Truth

  • helps build highly accurate training datasets for machine learning quickly.
  • offers easy access to labelers through Amazon Mechanical Turk and provides them with built-in workflows and interfaces for common labeling tasks.
  • lets you use your own labelers or vendors recommended by Amazon through AWS Marketplace.
  • helps lower labeling costs by up to 70% using automatic labeling, which works by training Ground Truth from data labeled by humans so that the service learns to label data independently.
  • provides annotation consolidation to help improve the accuracy of the data object’s labels.
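
Annotation consolidation combines the labels that several workers gave to the same data object into one label. The sketch below uses plain majority voting to show the idea; this is a deliberately simplified stand-in, not the consolidation method Ground Truth itself applies, and the function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def consolidate(annotations: List[str]) -> Tuple[str, float]:
    """Majority vote over worker labels; returns (label, share of workers agreeing)."""
    if not annotations:
        raise ValueError("at least one annotation is required")
    counts = Counter(annotations)
    # On a tie, the label that was seen first wins.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(annotations)


label, agreement = consolidate(["cat", "dog", "cat"])
```

A low agreement value can be used to flag objects that should be sent to additional labelers.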

Amazon Translate

  • provides natural and fluent language translation
  • is a neural machine translation service that delivers fast, high-quality, and affordable language translation.
  • Neural machine translation is a form of language translation automation that uses deep learning models to deliver more accurate and more natural sounding translation than traditional statistical and rule-based translation algorithms.
  • enables localization of content, such as websites and applications, for international users, and efficient translation of large volumes of text.

Amazon Transcribe

  • provides speech-to-text capability
  • uses a deep learning process called automatic speech recognition (ASR) to convert speech to text quickly and accurately.
  • can be used to transcribe customer service calls, to automate closed captioning and subtitling, and to generate metadata for media assets to create a fully searchable archive.
  • adds punctuation and formatting so that the output closely matches the quality of manual transcription at a fraction of the time and expense.
  • processes audio in batch or in near real time.
  • supports custom vocabulary to generate more accurate transcriptions for domain-specific words and phrases like product names, technical terminology, or names of individuals.
  • supports vocabulary filtering, where users specify a list of words to remove from transcripts.
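
The custom vocabulary and word-removal bullets map to settings on a transcription job. The sketch below only assembles the request parameters that would be passed to the `start_transcription_job` API call, without calling AWS. The job name, S3 URI, vocabulary name, and filter name are hypothetical and are assumed to exist already in your account.

```python
from typing import Dict, Optional


def transcription_job_request(job_name: str, media_uri: str,
                              language_code: str = "en-US",
                              vocabulary_name: Optional[str] = None,
                              vocabulary_filter_name: Optional[str] = None) -> Dict:
    """Build request parameters for a batch transcription job."""
    settings: Dict[str, str] = {}
    if vocabulary_name:
        # Custom vocabulary: improves recognition of domain-specific terms.
        settings["VocabularyName"] = vocabulary_name
    if vocabulary_filter_name:
        # Vocabulary filter: listed words are removed from the transcript.
        settings["VocabularyFilterName"] = vocabulary_filter_name
        settings["VocabularyFilterMethod"] = "remove"

    request: Dict = {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "Media": {"MediaFileUri": media_uri},
    }
    if settings:
        request["Settings"] = settings
    return request


request = transcription_job_request(
    "support-call-001",                       # hypothetical job name
    "s3://example-bucket/calls/call-001.wav",  # hypothetical S3 location
    vocabulary_name="product-names",           # hypothetical vocabulary
    vocabulary_filter_name="blocked-words",    # hypothetical filter
)
```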

Amazon Elastic Inference

  • helps attach low-cost GPU-powered acceleration to EC2 and SageMaker instances or ECS tasks to reduce the cost of running deep learning inference by up to 75%.
  • supports TensorFlow, Apache MXNet, and ONNX models, with more frameworks coming soon.