Volume 1
The Hitchhiker’s Guide to Cloud Databases
This book is a map of modern data and analytics technologies and a guide to navigating the labyrinthine web of disparate architectures and approaches. It distills and synthesizes the writer’s experience with a multitude of products, architecture choices and solution tradeoffs.
The data management world is a confusing jumble of competing and overlapping technologies. The options for modernizing current infrastructure are numerous and overwhelming, and the noise grows shriller by the day. This book attempts to declutter the mess and provide clarity on the end-to-end data and analytics lifecycle. Volume 1 covers cloud and databases.
Table of Contents
Chapter 1: Introduction
Chapter 2: Hybrid Multi Cloud
Chapter 3: Data Analytics Cloud Capabilities
Chapter 4: Data Stores Classification
Chapter 5: Data Stores Characteristics
Chapter 6: Data Stores Products
Chapter 7: Analytical Data Persistence
Chapter 8: Analytical Data Store Products
Chapter 9: Data Migration
Three things to know about this book:
Heavy emphasis on tactical advice. The best strategy is execution. There is no better alternative than to experiment and execute. Every failure is a step towards eventual success. This book explores methodologies, with their accompanying technical implementations, for experimenting quickly, learning lessons and refining deployments. It favors concrete examples over abstract concepts. Decision-making is being pushed down, and developers and engineers have more power today than at any time in the past.
Liberal use of vendor products. Concepts come to life with real-world examples of their implementations. This book is not a comparison of competing products; it uses the products to demonstrate architectural choices, the landscape and case studies. Good products stagnate and new products are constantly introduced. What used to be cutting edge, such as the use of ML in a product, has become table stakes over time and no longer provides a competitive edge. Hence readers are advised to always check the vendors’ documentation for the latest updates. Use this book as a guide.
A giant list of lists. The goal is to list the most reasonable and well-known technological choices in order to provide a navigational map for the various topics that cut across the end-to-end data and analytics pipeline. But why lists? Lists are everywhere. Even history is an ordered list of events (which is essentially what a streaming engine like Kafka is). Lists help readers discern lessons and patterns so that they can deploy the good ones, reject the bad ones and devise new enhancements. This book follows the MECE (mutually exclusive, collectively exhaustive) principle to capture the reasonable approaches without repetition, although this is not always easy to do.
This volume starts with a look at the challenges of modernizing the data value chain. It then covers guiding principles for building modern data and analytics pipelines. The major focus of this volume is on understanding the key cloud capabilities as they pertain to the data space.
Data stores are the foundation of any data and analytics architecture. The problem is that there are over 300 active databases, and new ones are constantly being added. This book develops a number of taxonomies to parse the vast space of operational databases, analytical offerings, relational databases, NoSQL (non-relational) databases and open source technologies. It analyzes products that offer different data models, such as graphs, time series and JSON documents.
The last chapter provides guidance on strategies for migrating from legacy systems to newer options.