What is Amazon SageMaker?
If you are new to machine learning development, you may not know where to start with Amazon SageMaker. Here are the top 5 things you should know about it:
- Amazon SageMaker is a cloud-based AI development platform
- You have to give the algorithm examples of right answers
- You can compile your machine learning algorithm for performance
- SageMaker Canvas is a "point-and-click" version of Amazon SageMaker
- SageMaker is a low-level tool, and there are more powerful tools in the AWS ML ecosystem
(If you need help with your machine learning project, don't hesitate to reach out.)
1. Amazon SageMaker is a cloud-based AI development platform
Amazon SageMaker is a cloud-based artificial intelligence (AI) development platform that provides a consolidated build, train, and deploy workflow, including:
- Pre-built "Notebooks" - essentially the IDE (Integrated Development Environment) of the machine learning space
- Built-in high performance algorithms
- One-click training
- Hyperparameter optimization
- One-click deployment
- Fully managed hosting with auto-scaling
If you're new to ML, the value of those elements may not register, but it is a lot easier than building your own environment from the ground up. That said, the greatest value in the AWS machine learning ecosystem often comes from its higher-level machine learning and deep learning services, several of which I cover in more detail below:
- Amazon Rekognition - image/video object detection, facial recognition and more
- Amazon Lex - voice and/or text chatbot which can decode language and identify intent
- Amazon Personalize - product recommendations, product re-ranking, and customized direct marketing at scale
- Amazon Polly - create automated spoken voice from text in over a dozen languages
- Amazon Comprehend - extract insights from large amounts of written text like product reviews or document libraries
- Amazon Translate - accurately convert text from one language to another in real time (can be used in combination with Lex & Polly for speech applications)
- AWS DeepLens - a deep learning enabled video camera
- Amazon Forecast - a machine learning time-series forecasting service
2. You have to give the algorithm examples of right answers
At the simplest level, machine learning is about putting data in and getting a prediction out. For example, based on a user's demographic information, which upsell product are they most likely to buy? The inputs to the algorithm are called features, and a complex model can have a lot of them. The output, or prediction, is called the label, and there should be only one per "row" of data. In the training data, the label is the right answer: an example of what you want the algorithm to produce. For this kind of (supervised) learning, data without a label isn't useful, because the algorithm doesn't know what it is trying to predict. Labeling the data is therefore a core part of training a machine learning algorithm. AWS facilitates this process with several tools:
- Amazon SageMaker Ground Truth - lets you take raw data, including images, text files, and videos, and add informative labels to create high-quality training datasets for your machine learning models.
- Amazon Mechanical Turk - a crowdsourcing marketplace that allows outsourcing of labeling jobs to a distributed real human workforce.
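To make the feature/label distinction concrete, here is a minimal sketch in plain Python. The column names, values, and the `split_features_and_labels` helper are all made up for illustration; they are not part of any AWS tool.

```python
from typing import Dict, List, Tuple

# Hypothetical training rows: demographic features plus one label per row.
rows: List[Dict[str, object]] = [
    {"age": 34, "country": "US", "plan": "basic", "bought_upsell": "storage"},
    {"age": 52, "country": "DE", "plan": "pro", "bought_upsell": "support"},
    {"age": 27, "country": "US", "plan": "basic", "bought_upsell": None},  # not labeled yet
]

LABEL_COLUMN = "bought_upsell"


def split_features_and_labels(
    data: List[Dict[str, object]], label_column: str
) -> Tuple[List[Dict[str, object]], List[object], List[Dict[str, object]]]:
    """Separate labeled rows into (features, labels) and set unlabeled rows aside."""
    features, labels, unlabeled = [], [], []
    for row in data:
        # Everything except the label column is an input feature.
        inputs = {k: v for k, v in row.items() if k != label_column}
        label = row.get(label_column)
        if label is None:
            unlabeled.append(inputs)  # must be labeled before it can train a model
        else:
            features.append(inputs)
            labels.append(label)
    return features, labels, unlabeled


features, labels, unlabeled = split_features_and_labels(rows, LABEL_COLUMN)
print(f"{len(features)} labeled rows, {len(unlabeled)} row(s) still need a label")
```

Rows that end up in `unlabeled` are the kind of data you would send to a labeling workflow before training.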
3. You can compile your machine learning algorithm for performance
Because many trained machine learning models need to operate in real time (100 ms or less), performance is key. If you take even a few seconds to recommend a related product, you may miss your opportunity. SageMaker Neo optimizes (aka "compiles") trained machine learning models for use on cloud instances and edge devices (even IoT and mobile) so that they run up to 25x faster with no loss in accuracy. Neo supports a specific set of ML frameworks and algorithms rather than arbitrary models.
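As a rough sketch of what requesting a Neo compilation involves, the helper below assembles the arguments for the SageMaker `create_compilation_job` API call. The helper name, job name, role ARN, S3 paths, input shape, framework, and target device are all placeholder assumptions; the right values depend on your model and account.

```python
import json
from typing import Dict, List


def build_neo_compilation_request(
    job_name: str,
    role_arn: str,
    model_s3_uri: str,
    output_s3_uri: str,
    input_shapes: Dict[str, List[int]],
    framework: str = "PYTORCH",      # assumed framework of the trained model
    target_device: str = "ml_c5",    # assumed target: a cloud instance family
    max_runtime_seconds: int = 900,
) -> dict:
    """Assemble keyword arguments for a create_compilation_job call."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Neo expects the input names and shapes as a JSON string.
            "DataInputConfig": json.dumps(input_shapes),
            "Framework": framework,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


# All identifiers below are placeholders, not real resources.
request = build_neo_compilation_request(
    job_name="example-neo-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    model_s3_uri="s3://example-bucket/model/model.tar.gz",
    output_s3_uri="s3://example-bucket/compiled/",
    input_shapes={"input0": [1, 3, 224, 224]},
)
print(json.dumps(request, indent=2))
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("sagemaker").create_compilation_job(**request)`.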
4. SageMaker Canvas is a "point-and-click" version of Amazon SageMaker
Announced in December 2021, SageMaker Canvas is a "visual point-and-click" version of Amazon SageMaker designed to let business analysts harness the power of machine learning. Because a lot of machine learning is actually data munging and visual analysis of data, it's possible that this will catch on, but we're still at a very early stage. In my experience, much of the machine learning space is still quite academic and requires an understanding of some fairly nuanced vocabulary. For example, precision, accuracy, and recall each have a very specific meaning in ML, but that nuance is lost on people who haven't had to study statistics or who don't have experience navigating the AWS ecosystem.
[Screenshot: the SageMaker Canvas registration screen]
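To make the precision, accuracy, and recall distinction mentioned above concrete, here is a small worked example in plain Python. The ten predictions are invented for illustration (1 means "will buy", 0 means "won't buy").

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for binary labels (1 = positive)."""
    tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 0)  # true negatives
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / len(y_true),
        # Of everything predicted positive, how much really was positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of everything really positive, how much the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

metrics = binary_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Here the model is right 70% of the time (accuracy), but only half of its positive predictions are correct (precision) and it finds just one of the three real buyers (recall). The three numbers answer different questions.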
5. SageMaker is a low-level tool, and there are more powerful tools in the AWS ML ecosystem
Amazon SageMaker is a powerful, sophisticated tool and a great place to start for people interested in getting their machine learning certification, but it is a low-level, highly customizable service. Depending on your specific needs, it's likely that one of the higher-level deep learning/machine learning services in the Amazon ecosystem will be a better fit. While this ecosystem is always growing, here are some of the more interesting machine learning services available:
- Amazon Rekognition
- Amazon Lex
- Amazon Polly
- Amazon Comprehend
- Amazon Translate
- AWS DeepLens
Technically a "deep learning" service, Amazon Rekognition is a machine learning image and video processing service that was initially trained on Amazon Prime Photos. It offers a lot of out-of-the-box functionality, including:
- Content moderation
- Face detection and analysis
- Face compare and search
- Celebrity recognition
- Text detection
- Object detection
- Custom labeling
- Video segment detection
- Personal Protective Equipment (PPE) Detection
All of this is accessible via an API in which the video is encrypted in transit and at rest, making it an amazing bolt-on service for enriching the metadata of visual content libraries. The algorithm can also be trained for better performance on particular datasets (lesser-known celebrities, etc.). If you want to learn more, I've written about the top 5 things to know about Amazon Rekognition.
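As an illustration of the API-driven workflow, the sketch below wraps the Rekognition `detect_labels` call to pull object labels for an image stored in S3. The function name, bucket, key, and thresholds are assumptions for the example; the client is passed in so the function can be exercised without AWS access.

```python
from typing import Any, List, Tuple


def detect_image_labels(
    client: Any,
    bucket: str,
    key: str,
    min_confidence: float = 80.0,
    max_labels: int = 10,
) -> List[Tuple[str, float]]:
    """Return (label name, confidence) pairs for an image stored in S3."""
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]


# Usage (requires boto3 and AWS credentials; bucket and key are placeholders):
#   import boto3
#   labels = detect_image_labels(
#       boto3.client("rekognition"), "example-bucket", "photos/example.jpg"
#   )
```

The returned pairs could then be stored as searchable metadata alongside each image.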
Built on the same technology that powers Alexa, Amazon's voice assistant, Lex is an API-based service that provides Automatic Speech Recognition (ASR), which identifies the words being used, along with Natural Language Understanding (NLU), which identifies intent, to enable custom voice or text-based chatbots. Lex is extensible, allowing the use of custom vocabularies.
Amazon Polly reverses the speech-to-text flow, converting text back into spoken words. Out of the box there are several different voices, of different genders, and support for more than two dozen languages. Using Lex together with Polly, it is possible to build a completely voice-powered application.
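A minimal sketch of the text-to-speech step, using Polly's `synthesize_speech` call, might look like the following. The helper name, the default voice, and the output path are assumptions for illustration.

```python
from typing import Any


def synthesize_to_file(
    client: Any,
    text: str,
    output_path: str,
    voice_id: str = "Joanna",   # assumed voice; pick one that matches your language
    output_format: str = "mp3",
) -> int:
    """Convert text to speech, write the audio to a file, and return its size in bytes."""
    response = client.synthesize_speech(
        Text=text,
        OutputFormat=output_format,
        VoiceId=voice_id,
    )
    audio = response["AudioStream"].read()  # the audio comes back as a byte stream
    with open(output_path, "wb") as audio_file:
        audio_file.write(audio)
    return len(audio)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   synthesize_to_file(boto3.client("polly"), "Your order has shipped.", "reply.mp3")
```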
Amazon Comprehend is a service designed to consume massive amounts of written text using Natural Language Processing (NLP) and output a number of insights, including:
- Discovering insights and relationships in text
- Identifying the language of a text (is this Spanish or English?)
- Extracting key phrases, places, people, brands, or events
- Understanding sentiment (was this a positive review?)
- Automatically organizing a collection of text files by topic
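Two of the capabilities listed above, language identification and sentiment, can be sketched with Comprehend's `detect_dominant_language` and `detect_sentiment` calls. The `review_sentiment` helper is a made-up name, and this sketch assumes the detected language is one that sentiment analysis supports.

```python
from typing import Any, Dict


def review_sentiment(client: Any, text: str) -> Dict[str, Any]:
    """Detect the dominant language of a text, then its overall sentiment."""
    languages = client.detect_dominant_language(Text=text)["Languages"]
    # Several candidate languages may be returned; keep the most likely one.
    top_language = max(languages, key=lambda lang: lang["Score"])
    sentiment = client.detect_sentiment(
        Text=text,
        LanguageCode=top_language["LanguageCode"],
    )
    return {
        "language": top_language["LanguageCode"],
        "sentiment": sentiment["Sentiment"],    # e.g. POSITIVE or NEGATIVE
        "scores": sentiment["SentimentScore"],  # per-class confidence values
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(review_sentiment(boto3.client("comprehend"), "Great product, fast delivery!"))
```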
Amazon Translate is a neural machine translation service that understands over a dozen languages and enables:
- Fluent translation of text
- Localization for international users
- Efficient translation of large volumes of text
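For the localization use case, a hedged sketch built on Translate's `translate_text` call could look like this. The `localize` helper and the language codes are illustrative assumptions, and the loop makes one API call per target language.

```python
from typing import Any, Dict, Iterable


def localize(
    client: Any,
    text: str,
    target_languages: Iterable[str],
    source_language: str = "en",
) -> Dict[str, str]:
    """Translate one piece of text into several target languages."""
    translations: Dict[str, str] = {}
    for language in target_languages:
        response = client.translate_text(
            Text=text,
            SourceLanguageCode=source_language,
            TargetLanguageCode=language,
        )
        translations[language] = response["TranslatedText"]
    return translations


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(localize(boto3.client("translate"), "Welcome back!", ["es", "de", "fr"]))
```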
AWS DeepLens is a hardware device with a custom on-the-edge inference engine that is capable of running its first inference just 10 minutes after unboxing.
Is the AWS Machine Learning Certification worth it?
For those of you considering getting a Machine Learning certification on one of the big cloud providers (AWS, Google, Azure), this is an overview of some of the ready-made AWS deep learning services, designed to showcase what the AWS machine learning ecosystem is capable of. If you are wondering whether the AWS Machine Learning Certification is worth it, I answer that question by looking at Google search trends, Gartner's Magic Quadrant, and my own professional experience.