Databases

📚 Part 12 of 25: "AWS Cloud Practitioner" series.

Relational Databases #

A relational database is a type of database that organizes data into rows and columns, which collectively form a table where the data points are related to each other.

Data is typically structured across multiple tables, which can be joined together via primary and foreign keys. A primary key uniquely identifies each row in a table, and a foreign key references the primary key of another table. These keys express the relationships that exist between tables, and these relationships are usually illustrated through different types of data models.
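
As a small illustration of the primary key / foreign key idea, here is a sketch using Python's built-in sqlite3 module rather than an AWS service. The table and column names (customers, orders, customer_id) are made up for the example.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# customers.id is the primary key; orders.customer_id is a foreign key to it.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(id),
           item TEXT NOT NULL
       )"""
)

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "keyboard"), (11, 1, "monitor"), (12, 2, "mouse")],
)

# Join the two tables through the key relationship.
rows = conn.execute(
    """SELECT c.name, o.item
       FROM orders AS o
       JOIN customers AS c ON c.id = o.customer_id
       ORDER BY o.id"""
).fetchall()
print(rows)
```

The same SQL would work largely unchanged against a MySQL or PostgreSQL database, apart from the SQLite-specific PRAGMA line.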

No-SQL Databases #

  • No-SQL = non-relational databases
  • No-SQL databases are purpose built for specific data models and have flexible schemas for building modern applications

Benefits:

  • Flexibility (easy to evolve data model)
  • Scalability (designed to scale out by using distributed clusters)
  • High-Performance (optimized for a specific data model)
  • Highly functional (APIs and data types purpose-built for each data model)

Types: key-value, document, graph, in-memory, search databases
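
To make "flexible schema" a bit more concrete, here is a minimal sketch that uses a plain Python dict as a stand-in for a key-value/document store. The product data and the get_product helper are invented for illustration and are not any real database API.

```python
import json

# Two "documents" in the same collection. The second one carries an extra
# field, and no schema migration was needed to add it.
products = {
    "p1": {"name": "Laptop", "price": 999},
    "p2": {"name": "T-shirt", "price": 15, "sizes": ["S", "M", "L"]},
}

def get_product(product_id):
    """Key-value style lookup: one key returns one document."""
    return products.get(product_id)

# Readers have to tolerate fields that are missing on older documents.
sizes = get_product("p2").get("sizes", [])
missing_sizes = get_product("p1").get("sizes", [])
print(json.dumps(get_product("p2")))
```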

RDS and Aurora #

Amazon RDS #

RDS stands for Relational Database Service. It is a managed DB service.

It allows creating databases in the cloud that are managed by AWS:

  • Postgres
  • MySQL
  • MariaDB
  • Oracle
  • Microsoft SQL Server
  • IBM Db2
  • Aurora (AWS Proprietary)

Advantage of using RDS vs deploying DB on EC2:

  • RDS is a managed service
    • Automated Provisioning and OS patching
    • Continuous backups and restore to specific timestamp (Point in Time Restore)
    • Monitoring dashboards
    • Read replicas for improved read performance
    • Multi-AZ setup for DR
    • Maintenance windows for upgrades
    • Scaling capability (both vertical and horizontal)
    • Storage backed by EBS
  • Not possible to SSH into DB instances (managed service)

Example RDS application architecture #

Source: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Welcome.html

Amazon Aurora #

  • Aurora is proprietary technology from AWS (not open sourced)
  • PostgreSQL and MySQL are both supported
  • Aurora is “AWS Optimized”; AWS claims up to 5x the performance of MySQL on RDS and up to 3x the performance of Postgres on RDS
  • Aurora storage automatically grows in increments of 10 GB (up to 128 TB)
  • Aurora costs about 20% more than RDS

Amazon Aurora Serverless #

  • Automated (on-demand) database with autoscaling based on actual usage
  • PostgreSQL and MySQL are both supported as Aurora Serverless DB
  • No capacity planning required
  • Least management overhead
  • Pay-per-second billing, which can be more cost-effective

Use cases: infrequent, intermittent or unpredictable workloads.

Aurora with no management overhead = Aurora Serverless. #

Create RDS database #

Aurora and RDS > Create a database
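
The same thing can be done from code instead of the console. Below is a hedged sketch using boto3's create_db_instance call. The instance class, storage size, user name and identifier are assumptions picked for the example, and the function that talks to AWS is defined but not called because it needs boto3 and valid credentials.

```python
def build_db_instance_params(identifier, password, engine="postgres", multi_az=False):
    """Return keyword arguments for the RDS CreateDBInstance API."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": "db.t3.micro",  # assumed small instance class
        "AllocatedStorage": 20,             # GiB of EBS-backed storage
        "MasterUsername": "dbadmin",        # hypothetical user name
        "MasterUserPassword": password,
        "MultiAZ": multi_az,                # standby in another AZ for failover
        "BackupRetentionPeriod": 7,         # days of automated backups
    }

def create_database(params):
    """Send the request. Requires boto3 and AWS credentials; not called here."""
    import boto3
    return boto3.client("rds").create_db_instance(**params)

params = build_db_instance_params("demo-db", password="change-me", multi_az=True)
print(sorted(params))
```

In real use the password should come from a secrets store rather than a string literal.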

RDS Deployment options #

  • Read Replicas
    • Scale the read workload of your DB
    • Can create up to 15 replicas
    • Data is only written to the main DB
  • Multi-AZ
    • Failover in case of AZ outage (High-Availability)
    • Data only read/written to the main DB
    • Can only have 1 AZ as a failover
  • Multi-Region
    • Multi-Region (Read Replicas)
    • Writes only to the main database
    • Local performance for global reads
    • Additional replication cost
    • Use case: DR in another region
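
A toy model of the read replica idea: writes go only to the main DB, reads are spread across replicas, and replicas can briefly lag behind. The ReplicatedDatabase class is purely illustrative and says nothing about how RDS implements replication internally.

```python
import itertools

class ReplicatedDatabase:
    """Toy model: one primary for writes, read replicas for reads."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # All writes go to the primary only.
        self.primary[key] = value

    def replicate(self):
        # Replication to read replicas is asynchronous in practice;
        # here it is modelled as an explicit step.
        for replica in self.replicas:
            replica.update(self.primary)

    def read(self, key):
        # Reads are spread across the replicas round-robin.
        index = next(self._next_replica)
        return index, self.replicas[index].get(key)

db = ReplicatedDatabase(replica_count=2)
db.write("user:1", "Alice")
stale = db.read("user:1")   # replica 0 has not caught up yet
db.replicate()
fresh = db.read("user:1")   # replica 1, after replication
print(stale, fresh)
```

The stale read shows why read replicas suit read-heavy workloads that can tolerate slightly out-of-date data.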

More: Configuring and managing a Multi-AZ deployment for Amazon RDS

Other Database Types #

Amazon ElastiCache #

  • Just as RDS provides managed relational databases, ElastiCache provides managed Redis or Memcached.

  • Caches are in-memory databases with high performance and low latency

  • Helps reduce the load on databases for read-intensive workloads

  • AWS takes care of OS maintenance, patching, optimizations, setup, configuration, monitoring, failure recovery and backups
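
How a cache reduces database load can be sketched with the common cache-aside pattern. A plain dict stands in for Redis/Memcached, and slow_db_query is a hypothetical placeholder for a real database call.

```python
import time

cache = {}      # stands in for Redis or Memcached
db_calls = 0    # counts how often the "database" is actually hit

def slow_db_query(user_id):
    """Hypothetical database lookup."""
    global db_calls
    db_calls += 1
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id, ttl_seconds=60):
    entry = cache.get(user_id)
    if entry and entry["expires_at"] > time.monotonic():
        return entry["value"]               # cache hit: no database call
    value = slow_db_query(user_id)          # cache miss: go to the database
    cache[user_id] = {"value": value, "expires_at": time.monotonic() + ttl_seconds}
    return value

first = get_user(42)    # miss, reads from the database
second = get_user(42)   # hit, served from the cache
print(db_calls)
```

Note that the application code has to implement this logic itself; the cache does not populate itself from the database.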

More: https://docs.aws.amazon.com/elasticache/

DynamoDB #

  • Fully managed, Highly Available with replication across 3 AZs
  • No-SQL database - not a relational DB
  • Scales to massive workloads, distributed, “serverless”
  • Millions of requests per second, trillions of rows, hundreds of TB of storage
  • Fast and consistent performance
  • Single-digit millisecond latency
  • Integrated with IAM for security, authorization and administration
  • Low cost and auto scaling capabilities
  • Standard & Infrequent Access (IA) Table Class
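
DynamoDB's low-level API expects items as typed attribute values (for example {"S": "text"} or {"N": "42"}). The sketch below builds a PutItem request by hand for a few simple types. The table name Users and the helper names are assumptions, and only strings, numbers and booleans are handled.

```python
def to_attribute_value(value):
    """Convert a Python value to DynamoDB's typed JSON (small subset of types)."""
    if isinstance(value, bool):           # check bool first: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}          # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported type: {type(value).__name__}")

def build_put_item(table_name, item):
    """Build the keyword arguments for a PutItem request."""
    return {
        "TableName": table_name,
        "Item": {name: to_attribute_value(v) for name, v in item.items()},
    }

request = build_put_item("Users", {"user_id": "u1", "age": 30, "active": True})
print(request)
```

With boto3 installed and credentials configured, a dict like this could be passed as boto3.client("dynamodb").put_item(**request).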

DynamoDB Accelerator (DAX) #

  • Fully Managed in-memory cache for DynamoDB
  • 10x performance improvement when accessing DynamoDB tables

DAX is only used for DynamoDB, whereas ElastiCache can be used for other databases.

DynamoDB Global Tables #

  • Makes a DynamoDB table accessible with low latency in multiple Regions
  • Active-Active replication (read/write to any AWS Region)

Redshift #

  • Redshift is based on PostgreSQL
  • It’s OLAP - Online Analytical Processing (analytics and data warehousing)
  • Load data once every hour, not every second
  • Up to 10x better performance than other data warehouses (AWS claim)
  • Scales to PBs of data
  • Columnar storage of data (instead of rows)
  • Massively Parallel Query Execution (MPP)
  • Pay-as-you-go based on the instances provisioned
  • Has a SQL interface for performing queries
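
Why columnar storage helps analytics can be shown with a tiny example: summing one column touches far fewer values when data is laid out by column. This is only a conceptual sketch with made-up order data, not how Redshift stores blocks on disk.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Column layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# The same aggregate, computed from each layout.
total_from_rows = sum(row["amount"] for row in rows)
total_from_columns = sum(columns["amount"])

# Reading whole rows touches every field; reading one column touches only that column.
values_touched_row_layout = len(rows) * len(rows[0])
values_touched_column_layout = len(columns["amount"])
print(total_from_columns, values_touched_row_layout, values_touched_column_layout)
```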

Redshift Serverless #

  • Auto Scaling
  • Run analytics workload without managing data warehouse infrastructure
  • Pay only for what you use
  • Use cases: Reporting, real-time analytics

Amazon EMR #

  • EMR stands for “Elastic MapReduce”
  • EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
  • The clusters can be made of hundreds of EC2 instances
  • EMR takes care of all the provisioning and configuration
  • Auto-scaling and integrated with Spot instances
  • Use cases: data processing, machine learning, web indexing, big data
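
The MapReduce model behind the name can be sketched in a few lines of single-process Python. On a real cluster the map and reduce phases would run in parallel across many nodes; the function names here are illustrative only.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data on AWS", "big clusters process big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```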

Athena #

  • Serverless query service to perform analytics against S3 objects
  • Uses standard SQL language to query the files
  • Supports CSV, JSON, ORC, Avro, Parquet
  • Pricing: $5 per TB of data scanned
  • Use cases: Business intelligence, analytics, reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail logs, etc.

Exam tip: analyze data in S3 using serverless SQL = Athena #
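
Because Athena is billed per TB scanned, the cost of a query is simple arithmetic. The sketch below uses the $5 per TB figure from the notes above and ignores per-query minimums and regional price differences; the scan sizes are hypothetical.

```python
PRICE_PER_TB_USD = 5.0  # figure quoted in these notes; check current pricing

def query_cost_usd(tb_scanned):
    """Estimated cost of one query, given the amount of data it scans."""
    return tb_scanned * PRICE_PER_TB_USD

full_scan = query_cost_usd(2.0)      # e.g. scanning a 2 TB CSV dataset
pruned_scan = query_cost_usd(0.25)   # hypothetical: columnar format, fewer columns read
print(full_scan, pruned_scan)
```

This is why columnar formats such as Parquet or ORC are usually cheaper to query: less data is scanned per query.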

QuickSight #

Allows creating dashboards for services used in AWS. Per-session pricing.

  • Serverless machine-learning powered business intelligence service to create interactive dashboards
  • Use cases:
    • Business analytics
    • Building visualisations
    • Ad-hoc analysis
    • Get business insights using data
  • Integrated with RDS, Aurora, Athena, Redshift, S3

More: https://docs.aws.amazon.com/quicksight/

DocumentDB #

DocumentDB is the MongoDB-compatible counterpart to Aurora (a NoSQL document database).

  • MongoDB is used to store, query and index JSON data
  • Fully Managed, Highly Available with replication across 3 AZs
  • DocumentDB storage automatically grows in increments of 10 GB

Neptune #

  • Fully managed graph database
  • A popular graph dataset would be a social network
    • Users have friends
    • Posts have comments
    • Comments have likes from users
    • Users share and like posts
  • Highly Available across 3 AZs with up to 15 replicas
  • Build and run applications that work with highly connected datasets; Neptune is optimized for these complex queries
  • Can store up to billions of relations and query the graph with millisecond latency
  • Use cases: knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
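
A typical graph query on the social network example is "friends of friends". Here is a minimal sketch with an adjacency dict; Neptune itself is queried with graph query languages rather than Python dicts, and the user names are made up.

```python
# Each user maps to the set of users they are friends with.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

def friends_of_friends(user):
    """People two hops away who are not already direct friends."""
    direct = friends[user]
    two_hops = set()
    for friend in direct:
        two_hops |= friends[friend]
    return two_hops - direct - {user}

suggestions = sorted(friends_of_friends("alice"))
print(suggestions)
```

In a relational database the same question needs a self-join per hop, which is where graph databases tend to be a better fit.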

Amazon Timestream #

  • Serverless time series database
  • Automatically scales up and down to adjust capacity
  • Store and analyze trillions of events per day
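
A common time series operation is aggregating readings into fixed time windows. The sketch below averages values per 60-second bucket in plain Python; the readings are invented, and a time series database would do this kind of aggregation inside its query engine.

```python
from collections import defaultdict

# (unix_timestamp_seconds, cpu_percent) readings from one hypothetical sensor
readings = [(0, 10.0), (20, 30.0), (59, 20.0), (60, 50.0), (95, 70.0)]

def average_per_bucket(points, bucket_seconds=60):
    """Group readings into fixed-size time buckets and average each bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

averages = average_per_bucket(readings)
print(averages)
```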

Amazon Managed Blockchain #

  • Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority
  • Amazon Managed Blockchain is a managed service that allows you to:
    • Join public blockchain networks
    • Create your own scalable, private network
  • Compatible with:
    • Hyperledger Fabric
    • Ethereum

AWS Glue #

Managed Extract, Transform and Load (ETL) service.

  • Useful to prepare and transform data for analytics
  • Fully serverless service
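
What "extract, transform, load" means in practice can be sketched with the standard library alone. This is not Glue's API; the CSV content and the cleaning rules are assumptions chosen to show the three steps.

```python
import csv
import io
import json

raw_csv = "id,name,amount\n1, alice ,10.5\n2,BOB,\n3,carol,7\n"

def extract(text):
    """Extract: parse the raw CSV into dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: drop incomplete rows, fix types and normalize names."""
    cleaned = []
    for record in records:
        if not record["amount"]:
            continue  # skip rows with a missing amount
        cleaned.append({
            "id": int(record["id"]),
            "name": record["name"].strip().title(),
            "amount": float(record["amount"]),
        })
    return cleaned

def load(records):
    """Load: serialize as JSON Lines, ready to be written to a target store."""
    return "\n".join(json.dumps(r) for r in records)

records = transform(extract(raw_csv))
output = load(records)
print(output)
```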

DMS #

DMS - Database Migration Service

  • Quickly and securely migrate databases to AWS

  • The source database remains available during the migration

  • Homogeneous migrations: e.g. Oracle to Oracle

  • Heterogeneous migrations: e.g. MSSQL to Aurora

Database Summary #

  • Relational Databases - OLTP: RDS & Aurora (SQL)
  • Differences between Multi-AZ, Read Replicas, Multi-Region
  • In-memory Database: ElastiCache
  • Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB)
  • Warehouse - OLAP: Redshift (SQL)
  • Hadoop Cluster: EMR
  • Athena: query data on Amazon S3 (serverless & SQL)
  • QuickSight: dashboards on your data (serverless)
  • DocumentDB: “Aurora for MongoDB” (JSON – NoSQL database)
  • Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable)
  • Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains
  • Glue: Managed ETL (Extract Transform Load) and Data Catalog service
  • Database Migration: DMS
  • Neptune: graph database
  • Timestream: time-series database

» Disclaimer « #

This series draws heavily from Stephane Maarek’s Ultimate AWS Certified Cloud Practitioner course on Udemy.

His content was instrumental in helping me pass the certification.

ℹ️Shared for educational purposes only, no rights reserved.