Everything you need to know about AWS: Part 3
By Kelly Dent
Welcome to the final instalment in our three-part series on everything you need to know about AWS! In this blog post, we’ll be taking a deep dive into various AWS products and services, looking at some of the most popular expert-level questions on everything from support to EC2, S3, Redshift and beyond. Want to start from the beginning? Head on over to Everything you need to know about AWS: Part one or Part two to catch up before diving into the final batch of questions.
What kind of issues are covered by AWS Support?
AWS Support covers a wide range of performance and production issues for AWS products and services, along with other key stack components, including:
- “How to” questions about AWS services and features
- Best practices to help you successfully integrate, deploy, and manage applications in the cloud
- Troubleshooting API and AWS SDK issues
- Troubleshooting operational or system problems with AWS resources
- Issues with the AWS Management Console or other AWS tools
- Problems detected by Health Checks
- A number of third-party software components, such as operating systems, web servers, email, databases, and storage configuration
AWS Support does not include code development, debugging of custom software, or system administration tasks.
What algorithm does Amazon Machine Learning use to generate models?
Amazon Machine Learning uses an industry-standard logistic regression algorithm to generate models, which can be trained on datasets of up to 100 GB in size. Data can be read from three types of data store:
- Files in Amazon S3
- Results from an Amazon Redshift query
- Results from an Amazon RDS query executed against a database running a MySQL engine
Data from other products can generally be exported for use in Amazon Machine Learning via Amazon S3. The product also offers powerful model evaluation features.
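The class of model Amazon Machine Learning trains can be illustrated with a minimal logistic regression fitted by gradient descent in plain Python. This is a toy sketch of the technique, not Amazon's implementation: the dataset, learning rate, and epoch count below are all illustrative.

```python
import math

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights with per-sample gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid gives P(label = 1)
            err = p - y                     # gradient of the log-loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return the predicted probability that x belongs to class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy binary dataset: label 1 when the feature values are large.
X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
```

In practice the service handles training, evaluation, and scaling for you; the point of the sketch is simply that the output is a weighted sum passed through a sigmoid, yielding a probability you can threshold into a binary prediction.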
Amazon EC2 and Amazon S3
What can developers do now that they couldn’t before thanks to EC2?
Previously, smaller developers wouldn’t have had the capital needed to get their hands on massive compute resources. They certainly wouldn’t have had the resources or capacity required to handle any unexpected load spikes, either. Enter Amazon EC2.
According to the Jefferson Frank AWS salary survey, 89% of respondents reported working with Amazon EC2, making it arguably the most popular AWS product right now. Amazon EC2 makes it possible for any developer to use Amazon’s vast resources with no investment up-front and absolutely no compromise on performance. Developers are free to work and experiment, safe in the knowledge that no matter how fast their businesses may grow, scaling up to meet their needs will always be cost-effective and straightforward.
If there’s a spike in computing requirements, EC2 responds almost instantly, giving you control over how many resources are used at any given time. In contrast, traditional hosting services allocate a fixed number of resources for a pre-determined amount of time, leaving you with only a limited ability to respond when usage levels change unexpectedly or hit significant peaks.
How quickly will systems be running with EC2?
It usually takes less than 10 minutes. The boot time depends on a few factors, including the size of the Amazon Machine Image (AMI), the number of instances being launched, and how recently that AMI was last launched. It’s worth knowing that an AMI being launched for the first time can take a little longer to boot.
What is Amazon Athena?
Amazon Athena is an interactive query service you can use to analyze data in Amazon S3. Athena is serverless and runs standard SQL on the Presto engine. It works with a number of different data formats, including CSV, JSON, ORC, Apache Parquet, and Avro.
What is Amazon Elastic MapReduce?
Amazon EMR is a web service that allows users to process massive amounts of data in an easy and cost-effective way. It uses a hosted Hadoop framework running on the web-scale infrastructure of Amazon EC2 and Amazon S3.
Amazon EMR makes it possible for customers to access exactly the capacity required to carry out data-intensive tasks for applications such as web indexing, data mining, log file analysis, machine learning, financial analysis, scientific simulation, and bioinformatics research.
What is Amazon CloudSearch?
Amazon CloudSearch is a fully-managed service that makes it easy to set up, manage, and scale a search solution for your website or application.
Amazon CloudSearch provides several benefits over running your own self-managed search service including easy configuration, auto-scaling for data and traffic, self-healing clusters, and high availability with Multi-AZ. Through the AWS Management Console, users can create a search domain and upload the data to be made searchable, and Amazon CloudSearch automatically provides the required resources and deploys a highly tuned search index.
What is Amazon Elasticsearch Service?
Amazon Elasticsearch Service helps you with every aspect of domain setup, from provisioning infrastructure capacity in the network environment you request right up to installation of the Elasticsearch software.
Once your domain is up and running, Amazon Elasticsearch Service automates common administrative tasks, such as performing backups, monitoring instances and patching software, ultimately saving you time. It integrates with Amazon CloudWatch, producing metrics that show you the state of your domains, with the option to modify domain instance and storage settings to make tailoring your domain as straightforward as possible.
What is Amazon Kinesis?
Amazon Kinesis Data Streams enables you to build custom applications that process or analyze streaming data for specialized needs. Through Amazon Kinesis, you can add various types of data such as clickstreams, application logs, and social media to a data stream from countless sources, and that data is made available in seconds.
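Under the hood, a Kinesis data stream routes each record to a shard by hashing its partition key, which is what keeps records for the same key in order. The sketch below mimics that idea in plain Python; it is a simplification (real shards own explicit 128-bit hash-key ranges rather than equal buckets), and the event data is made up for illustration.

```python
import hashlib

def shard_for(partition_key, num_shards):
    """Map a partition key to a shard index, Kinesis-style:
    MD5-hash the key, then bucket the 128-bit result into equal ranges.
    (Simplified: real shards own explicit hash-key ranges.)"""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return h * num_shards // 2**128

# Records with the same partition key always land on the same shard,
# which is how per-key ordering is preserved across a stream.
events = [("user-17", "click"), ("user-42", "pageview"), ("user-17", "purchase")]
placements = [(key, shard_for(key, 4)) for key, _ in events]
```

A consequence worth noting: throughput scales with shard count, but a single hot partition key is confined to one shard, so choosing well-distributed keys matters.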
What is Amazon Redshift?
Amazon Redshift is a fast, fully managed data warehouse used to analyze data via standard SQL and any existing Business Intelligence (BI) tools. It allows you to run complex analytic queries against massive amounts of structured data, with most results returning in seconds.
Redshift is praised for its lightning-fast querying, with queries distributed and parallelized across multiple physical resources. It scales easily, and automatically patches and backs up the data warehouse.
What is Amazon QuickSight?
Amazon QuickSight is a cloud-powered business analytics service that allows employees within an organization to build visualizations, perform ad-hoc analysis, and quickly get business insights from their data, anytime, on any device.
What is AWS Data Pipeline?
AWS Data Pipeline is a web service used to schedule regular data movement and data processing activities in the cloud. It integrates with on-premises apps, as well as cloud-based storage, to allow users to access data as required. Based on a schedule you define, your pipeline regularly performs processing activities such as distributed data copy, SQL transforms, MapReduce applications, or custom scripts against destinations such as Amazon S3, Amazon RDS, or Amazon DynamoDB.
What is AWS Glue?
AWS Glue is a fully-managed, pay-as-you-go, extract, transform, and load (ETL) service that automates the time-consuming steps of data preparation for analytics. AWS Glue automatically discovers and profiles data via the Glue Data Catalog, recommends and generates ETL code to transform source data into target schemas, and runs the ETL jobs on a fully managed, scale-out Apache Spark environment to load data into its destination. It also allows users to set up, orchestrate, and monitor complex data flows.
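The extract–transform–load pattern that Glue automates can be sketched in a few lines of plain Python. Everything here is illustrative: the CSV data and schema are made up, and an in-memory SQLite table stands in for the real destination (Glue itself generates and runs Apache Spark code).

```python
import csv
import io
import sqlite3

# Extract: read raw CSV (in-memory here; Glue would crawl files in S3).
raw = "order_id,amount\n1001,19.99\n1002,5.00\n1003,42.50\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: cast types and derive a field to match the target schema.
records = [(int(r["order_id"]), float(r["amount"]), float(r["amount"]) >= 20.0)
           for r in rows]

# Load: write into the destination table (SQLite standing in for a warehouse).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, is_large INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
total = db.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

What Glue adds over a hand-rolled script like this is the automated part: crawling sources to infer the schema, generating the transform code, and running it on managed, scale-out infrastructure.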
How much data can I store in Amazon S3?
You can store an unlimited amount of data, with individual S3 objects ranging from 0 bytes to a maximum of 5 terabytes. The largest object that can be uploaded in a single PUT is 5 gigabytes. For objects larger than 100 megabytes, you should consider using the Multipart Upload capability.
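Those limits dictate how a large upload has to be split. The arithmetic below is a rough sketch; the default 100 MB part size is an illustrative choice, and it assumes S3's documented multipart constraints of at most 10,000 parts per upload and a 5 MB minimum part size (except for the last part).

```python
import math

MAX_PARTS = 10_000       # multipart upload part limit
MIN_PART = 5 * 1024**2   # 5 MB minimum part size (except the last part)

def plan_multipart(object_bytes, part_bytes=100 * 1024**2):
    """Return (part_size, part_count) for a multipart upload, growing
    the part size when 10,000 parts would not cover the object."""
    part_bytes = max(part_bytes, MIN_PART,
                     math.ceil(object_bytes / MAX_PARTS))
    return part_bytes, math.ceil(object_bytes / part_bytes)

# A 5 TB object (the S3 maximum) can't fit in 10,000 parts of 100 MB,
# so the part size is bumped to roughly 525 MB.
size, count = plan_multipart(5 * 1024**4)
```

This is also why multipart upload helps resilience: a failed part can be retried on its own instead of restarting a multi-terabyte transfer.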
What storage classes does Amazon S3 offer?
There are four storage classes:
- Amazon S3 Standard (for general storage of frequently-accessed data)
- Amazon S3 Standard-Infrequent Access (for less frequently accessed data, stored across multiple Availability Zones)
- Amazon S3 One Zone-Infrequent Access (for less frequently accessed data, stored in a single Availability Zone at a lower cost)
- Amazon Glacier (for long-term archiving)
More detailed information regarding the different storage classes can be found on AWS’s official site.
How is Amazon S3 data organized?
Amazon S3 is a simple key-based object store. When you store data, you assign a unique object key that can be used to retrieve the data at a later time. Alternatively, customers can utilize S3 Object Tagging to organize data across all S3 buckets.
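The key-plus-tags model can be pictured with a minimal in-memory sketch: a plain dict stands in for a bucket, and the function names below are invented for illustration (the real S3 API calls would be `PutObject`, `GetObject`, and `PutObjectTagging`).

```python
# A plain dict standing in for an S3 bucket: object key -> body + tags.
bucket = {}

def put_object(key, body, tags=None):
    """Store a body under a unique key, with optional tags."""
    bucket[key] = {"body": body, "tags": tags or {}}

def get_object(key):
    """Retrieve a stored body by its key."""
    return bucket[key]["body"]

def find_by_tag(tag_key, tag_value):
    """Tags allow organizing objects across keys (and, in S3, buckets)."""
    return [k for k, obj in bucket.items()
            if obj["tags"].get(tag_key) == tag_value]

# Keys often use slash-separated prefixes as a folder-like convention.
put_object("logs/2019/01/app.log", b"log bytes", tags={"project": "alpha"})
put_object("images/logo.png", b"png bytes", tags={"project": "beta"})
```

Note that the slashes in a key are purely a naming convention; S3 has no real directory hierarchy, which is why tagging is useful for cutting across prefixes.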
How reliable is Amazon S3?
The S3 Standard storage class is designed for 99.99% availability, while the S3 Standard-IA storage class and S3 One Zone-IA are designed for 99.9% and 99.5% availability respectively.
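Those design targets translate directly into expected downtime per year, which makes the differences easier to compare; a quick back-of-the-envelope calculation (using a 365-day year):

```python
def downtime_minutes_per_year(availability_pct):
    """Expected unavailable minutes per 365-day year for a design target."""
    return (1 - availability_pct / 100) * 365 * 24 * 60

# Design targets for the three storage classes discussed above.
targets = {"S3 Standard": 99.99, "S3 Standard-IA": 99.9, "S3 One Zone-IA": 99.5}
minutes = {name: downtime_minutes_per_year(pct) for name, pct in targets.items()}
```

That works out to roughly 53 minutes a year for S3 Standard, about 8.8 hours for Standard-IA, and about 1.8 days for One Zone-IA; these are design targets rather than contractual guarantees.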
Payments and Billing
How do you pay for AWS?
All AWS services are available on demand without any long-term contracts. AWS operates on a pay-as-you-go pricing model, leaving you free to adapt services to fluctuating business needs no matter how quickly things change. This, in turn, allows you to focus on your business more effectively and efficiently.
Is there a free option?
AWS’s Free Tier is designed to give you hands-on experience with AWS Cloud services before making a more substantial commitment to the provider. The AWS Free Tier includes services that are free for 12 months following your AWS sign-up date, as well as additional service offers that do not automatically expire at the end of your 12-month AWS Free Tier term.