How to choose the right database and storage for your application

For decades, organizations have used a traditional relational database and tried to fit everything into it, whether key/value-based user session data, unstructured log data, or analytics data for a data warehouse. However, a relational database is designed for transactional data, and it doesn't work very well for other data types.

For each specific data need, you should choose the tool that can do the heavy lifting and scale without compromising performance. Solution architects need to weigh multiple factors when matching data storage to the right technology. Here are the important ones:

Durability requirement: How should data be stored to prevent data loss or corruption?
Data availability: How available must the storage system be to deliver data?
Latency requirement: How fast should the data be available?
Data throughput: How much data needs to be read and written?
Data size: How much data needs to be stored?
Data load: How many concurrent users need to be supported?
Data integrity: How will the accuracy and consistency of data be maintained?
Data queries: What will be the nature of the queries?

In the following table, you can see different types of data with examples and appropriate storage types to use. Technology decisions need to be made based on storage type, as shown here:

Data Type	Data Example	Storage Type	Storage Example
Transactional, structured schema	User order data, financial transactions	Relational database	Amazon RDS, Oracle, MySQL, PostgreSQL, MariaDB, Microsoft SQL Server
Key-value pair, semi-structured, unstructured	User session data, application logs, reviews, comments	NoSQL	Amazon DynamoDB, MongoDB, Apache HBase, Apache Cassandra, Azure Table Storage
Analytics	Sales data, supply chain intelligence, business flow	Data warehouse	Teradata, IBM Netezza, Greenplum, Amazon Redshift, Google BigQuery
In-memory	User home page data, common dashboard	Cache	Redis, Memcached, Amazon ElastiCache
Object	Images, videos	File-based	NAS, Amazon S3, Amazon EFS, Azure Blob Storage, Google Cloud Storage
Block	Installable software	Block-based	SAN, Amazon EBS, Azure Disk Storage
Streaming	IoT sensor data, clickstream data	Temporary storage for streaming data	Apache Kafka, Amazon Kinesis, Spark Streaming, Apache Flink
Archive	Any kind of data	Archive storage	Amazon Glacier, magnetic tape storage, virtual tape library storage
Web storage	Static web content such as images, videos, HTML pages	CDN	Amazon CloudFront, Akamai CDN, Azure CDN, Google Cloud CDN, Cloudflare
Search	Product search, content search	Search index store and query	Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), Apache Solr, Apache Lucene
Data catalog	Table metadata, data about data	Metadata store	AWS Glue, Hive metastore, Informatica data catalog, Collibra data catalog
Monitoring	System logs, network logs, audit logs	Monitor dashboard and alert	Splunk, Amazon CloudWatch, Sumo Logic, Loggly
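
As a rough illustration, the first and third columns of the table can be encoded as a simple lookup so that a design checklist or review script can suggest a storage type for a given kind of data. This is only a sketch: the dictionary keys, the `STORAGE_TYPE_BY_DATA_TYPE` name, and the `suggest_storage_type` function are hypothetical and not part of any library.

```python
# Hypothetical lookup derived from the table above.
# Keys are shortened data-type labels chosen for this example.
STORAGE_TYPE_BY_DATA_TYPE = {
    "transactional": "relational database",
    "key-value": "NoSQL",
    "analytics": "data warehouse",
    "in-memory": "cache",
    "object": "file-based",
    "block": "block-based",
    "streaming": "temporary storage for streaming data",
    "archive": "archive storage",
    "web storage": "CDN",
    "search": "search index store and query",
    "data catalog": "metadata store",
    "monitoring": "monitor dashboard and alert",
}


def suggest_storage_type(data_type: str) -> str:
    """Return the storage type the table pairs with the given data type."""
    key = data_type.strip().lower()
    if key not in STORAGE_TYPE_BY_DATA_TYPE:
        raise ValueError(f"unknown data type: {data_type!r}")
    return STORAGE_TYPE_BY_DATA_TYPE[key]
```

For example, `suggest_storage_type("Analytics")` returns the data warehouse entry. A real decision would also weigh the factors listed earlier, such as latency, throughput, and durability.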

As you can see in the preceding table, data has various properties: it can be structured, semi-structured, unstructured, key-value pair, streaming, and so on. Choosing the right storage improves not only the performance of the application but also its scalability. For example, you can store user session data in a NoSQL database. Because the session no longer lives on any one application server, you can add servers to scale horizontally and any of them can serve a given user's session.
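
The session example can be sketched in a few lines. In the sketch below, `SessionStore` is an in-memory stand-in for an external key-value (NoSQL) table, and `AppServer` is a hypothetical stateless application server; both names and their methods are assumptions made for illustration, not the API of any real database client.

```python
import copy
import uuid


class SessionStore:
    """In-memory stand-in for an external key-value (NoSQL) table."""

    def __init__(self):
        self._items = {}

    def put(self, session_id, data):
        # Store a copy so callers cannot mutate stored state by accident.
        self._items[session_id] = copy.deepcopy(data)

    def get(self, session_id):
        item = self._items.get(session_id)
        return copy.deepcopy(item) if item is not None else None


class AppServer:
    """Hypothetical stateless server: all session state lives in the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        session_id = uuid.uuid4().hex
        self.store.put(session_id, {"user": user, "cart": []})
        return session_id

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        if session is None:
            raise KeyError(f"unknown session: {session_id}")
        session["cart"].append(item)
        self.store.put(session_id, session)
        return session


# Two servers share one store, so either can handle any request.
store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)

sid = server_a.login("alice")                # request 1 lands on server a
session = server_b.add_to_cart(sid, "book")  # request 2 lands on server b
```

Because neither server keeps session state of its own, a load balancer can route each request to any instance, which is what allows the application tier to scale horizontally.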

While choosing storage options, you need to consider the temperature of the data, which could be hot, warm, or cold:

For hot data, you are looking for sub-millisecond latency and require cache data storage. Some examples of hot data are stock trading and making product recommendations at runtime.
For warm data, such as financial statement preparation or product performance reporting, you can live with a moderate amount of latency, from seconds to minutes, and you should use a data warehouse or a relational database.
For cold data, such as 3 years of financial records stored for audit purposes, you can plan for latency in hours and store the data in archive storage.
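
The temperature rule can be expressed as a small classifier on the latency an access pattern can tolerate. This is a minimal sketch under stated assumptions: the text gives only rough bands (sub-millisecond, seconds to minutes, hours), so the exact cutoffs `HOT_MAX_SECONDS` and `WARM_MAX_SECONDS` below, and the function name, are choices made for this example.

```python
# Assumed cutoffs; the bands in the text are approximate.
HOT_MAX_SECONDS = 0.001    # sub-millisecond latency counts as hot
WARM_MAX_SECONDS = 3600.0  # anything tolerating under an hour counts as warm


def classify_data_temperature(tolerable_latency_seconds: float):
    """Map a tolerable latency to a (temperature, storage) pair."""
    if tolerable_latency_seconds <= 0:
        raise ValueError("latency must be positive")
    if tolerable_latency_seconds < HOT_MAX_SECONDS:
        return ("hot", "cache")
    if tolerable_latency_seconds < WARM_MAX_SECONDS:
        return ("warm", "data warehouse or relational database")
    return ("cold", "archive storage")
```

For instance, a report that can wait 30 seconds is classified as warm, while audit records that can wait two hours are classified as cold.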
