Showing posts with label System Design. Show all posts

Monday, November 2, 2020

Designing Data-Intensive Applications | Notes 1 | Introduction

Data-intensive applications are pushing the boundaries of what is possible by making use of recent technological developments. We call an application data-intensive if data is its primary challenge (the quantity of data, the complexity of data, or the speed at which it is changing), as opposed to compute-intensive, where CPU cycles are the bottleneck.

The tools and technologies that help data-intensive applications store and process data have been rapidly adapting to these changes. New types of database systems (“NoSQL”) have been getting lots of attention, but message queues, caches, search indexes, frameworks for batch and stream processing, and related technologies are very important too. Many applications use some combination of these.

Sometimes, when discussing scalable data systems, people make comments along the lines of, “You’re not Google or Amazon. Stop worrying about scale and just use a relational database.” There is truth in that statement: building for scale that you don’t need is wasted effort and may lock you into an inflexible design. In effect, it is a form of premature optimization. However, it’s also important to choose the right tool for the job, and different technologies each have their own strengths and weaknesses. As we shall see, relational databases are important but not the final word on dealing with data.

Ref: https://github.com/ept/ddia-references

This book is arranged into three parts:

In Part I, we discuss the fundamental ideas that underpin the design of data-intensive applications. We start in Chapter 1 by discussing what we’re actually trying to achieve: reliability, scalability, and maintainability; how we need to think about them; and how we can achieve them. In Chapter 2 we compare several different data models and query languages, and see how they are appropriate to different situations. In Chapter 3 we talk about storage engines: how databases arrange data on disk so that we can find it again efficiently. Chapter 4 turns to formats for data encoding (serialization) and evolution of schemas over time.

In Part II, we move from data stored on one machine to data that is distributed across multiple machines. This is often necessary for scalability, but brings with it a variety of unique challenges. We first discuss replication (Chapter 5), partitioning/sharding (Chapter 6), and transactions (Chapter 7). We then go into more detail on the problems with distributed systems (Chapter 8) and what it means to achieve consistency and consensus in a distributed system (Chapter 9).

In Part III, we discuss systems that derive some datasets from other datasets. Derived data often occurs in heterogeneous systems: when there is no one database that can do everything well, applications need to integrate several different databases, caches, indexes, and so on. In Chapter 10 we start with a batch processing approach to derived data, and we build upon it with stream processing in Chapter 11. Finally, in Chapter 12 we put everything together and discuss approaches for building reliable, scalable, and maintainable applications in the future.

Thursday, June 11, 2020

System Design Basics | Post 2 | System Design Template

  1. ask for requirements - extremely important. DO NOT jump into designing without this first
    • functional/nonfunctional/scope
  2. capacity estimation (was asked to skip b/c not as relevant)
    • bandwidth/storage
  3. high level design
    • client/server/application/database
  4. component design - pick your best area and suggest starting there. Say something like "I could go deeper into client/server/database, but I think {INSERT YOUR BEST AREA} is a good place to start. Do you agree?" (got this from a Byte-by-Byte seminar)
  5. scale it up (I made an acronym here to help me remember, it's a little silly but I'll share)
    • MSCANDaLS
      • MapReduce,
      • Scaling,
      • Caching,
      • Asynchronous Processing,
      • Network metrics,
      • Database denormalization,
      • Load balancing,
      • Sharding
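To make the last item of the acronym a bit more concrete, here is a minimal sketch of hash-based sharding in Python. The function name `shard_for`, the key format, and the shard count are all made up for illustration; this is one simple way to route keys, not the only one.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards).

    A stable hash (SHA-256) is used instead of Python's built-in hash(),
    because hash() of a str is randomized per process by default, which
    would send the same key to different shards after a restart.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take the modulo.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    # Hypothetical keys; the same key always maps to the same shard.
    for user_id in ("user:1", "user:2", "user:3"):
        print(user_id, "-> shard", shard_for(user_id, 8))
```

One caveat worth mentioning in an interview: with plain modulo routing, changing the number of shards remaps most keys, which is why consistent hashing is often brought up as an alternative.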


Sunday, May 3, 2020

System Design Basics | Post 1

These days, system design questions have become an important part of any interview process. Beyond interviews, knowledge of system design is worth having for your own improvement, so that a good foundation can be laid from the very beginning.
In the coming posts I will cover this topic.

Flow

A. Understand the problem and scope

Define the use cases, with the interviewer's help.
Suggest additional features.
Remove items that the interviewer deems out of scope.
Assume high availability is required; add it as a use case.

B. Think about constraints

Ask how many requests per month.
Ask how many requests per second (they may volunteer it or make you do the math).
Estimate reads vs. writes percentage.
Keep 80/20 rule in mind when estimating.
Estimate how much data is written per second.
Estimate the total storage required over 5 years.
Estimate how much data is read per second.
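The arithmetic behind these questions can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function name `estimate`, the 30-day month, the read percentage, and the payload sizes are placeholders to swap for the numbers the interviewer gives you.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds, assuming a 30-day month


def estimate(requests_per_month, read_percent, bytes_per_write,
             bytes_per_read, years=5):
    """Back-of-the-envelope traffic and storage figures.

    read_percent is the share of requests that are reads (0 to 100).
    Storage assumes every write is kept and nothing is deleted or compressed.
    """
    requests_per_sec = requests_per_month / SECONDS_PER_MONTH
    reads_per_sec = requests_per_sec * read_percent / 100
    writes_per_sec = requests_per_sec - reads_per_sec
    write_bytes_per_sec = writes_per_sec * bytes_per_write
    read_bytes_per_sec = reads_per_sec * bytes_per_read
    # 12 months of 30 days each per year, consistent with SECONDS_PER_MONTH.
    storage_bytes = write_bytes_per_sec * SECONDS_PER_MONTH * 12 * years
    return {
        "requests_per_sec": requests_per_sec,
        "reads_per_sec": reads_per_sec,
        "writes_per_sec": writes_per_sec,
        "write_bytes_per_sec": write_bytes_per_sec,
        "read_bytes_per_sec": read_bytes_per_sec,
        "storage_bytes": storage_bytes,
    }


if __name__ == "__main__":
    # Hypothetical workload: 259.2 million requests per month, 80% reads.
    for name, value in estimate(259_200_000, 80, 1000, 500).items():
        print(f"{name}: {value:,.0f}")
```

With these made-up inputs, 259,200,000 requests per month works out to 100 requests per second, which is the kind of round number that is easy to reason about on a whiteboard.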

C. Abstract design

Layers (service, data, caching).
Infrastructure: load balancing, messaging.
Rough overview of any key algorithm that drives the service.
Consider bottlenecks and determine solutions.
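As a rough illustration of how the service, caching, and data layers fit together, here is a small cache-aside sketch in Python. The class names `DataLayer` and `Service` are hypothetical, and the in-memory dicts stand in for a real database and cache.

```python
class DataLayer:
    """Stand-in for a database; counts reads so cache hits are visible."""

    def __init__(self):
        self._rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._rows.get(key)

    def put(self, key, value):
        self._rows[key] = value


class Service:
    """Service layer that checks the cache before falling back to the data layer."""

    def __init__(self, data):
        self.data = data
        self.cache = {}  # stand-in for an external cache

    def read(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: the data layer is not touched
        value = self.data.get(key)  # cache miss: go to the data layer
        if value is not None:
            self.cache[key] = value  # populate the cache for the next read
        return value

    def write(self, key, value):
        self.data.put(key, value)
        self.cache.pop(key, None)  # invalidate so the next read sees fresh data


if __name__ == "__main__":
    service = Service(DataLayer())
    service.write("profile:1", {"name": "Ada"})
    service.read("profile:1")
    service.read("profile:1")
    print("data layer reads:", service.data.reads)
```

Repeated reads of the same key hit the data layer only once, which is where the caching layer helps with read-heavy bottlenecks. Invalidating on write is just one policy; others (write-through, TTL expiry) trade freshness against complexity differently.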

Source: https://github.com/jwasham/coding-interview-university#system-design-scalability-data-handling