In-Memory Database vs. In-Memory Data Grid

1.
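I came across the term In-Memory Data Grid while researching Aerospike. The "grid" I know is the grid control, the most heavily used UI component when building an HTS (home trading system). But what does "grid" mean here? In English a grid is a lattice, so one might guess that a data grid means splitting data up and spreading it across many places. Wikipedia defines a data grid as follows, with the emphasis on distributed data.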

The data grid is an architecture or set of services that gives individuals or groups of users the ability to access, modify and transfer extremely large amounts of geographically distributed data for research purposes.

Since "In-Memory" is attached to "Data Grid", it is naturally a data grid built on DRAM or flash memory. So let's look at how the In-Memory Data Grid itself is defined.

Here is the definition from WhatIs.com. According to Gartner, this is a technology that grew along with the rise of big data.

An in-memory data grid (IMDG) is a data structure that resides entirely in RAM (random access memory), and is distributed among multiple servers. Recent advances in 64-bit and multi-core systems have made it practical to store terabytes of data completely in RAM, obviating the need for electromechanical mass storage media such as hard disks.

According to industry analyst firm Gartner Inc., IMDGs are suited to handle big data’s “big-three V’s”: velocity, variability, and volume. IMDGs can support hundreds of thousands of in-memory data updates per second, and they can be clustered and scaled in ways that support large quantities of data. Specific advantages of IMDG technology include:

• Enhanced performance because data can be written to, and read from, memory much faster than is possible with a hard disk.
• The data grid can be easily scaled, and upgrades can be easily implemented.
• A key/value data structure, rather than a relational structure, provides flexibility for application developers.

The technical advantages provide business benefits in the form of faster decision making, greater productivity, and improved customer service.
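To make "a key/value structure that resides in RAM and is distributed among multiple servers" concrete, here is a toy sketch. It is not how any particular IMDG product works; it assumes a fixed number of partitions, a stable hash of the key, and a naive partition-to-node mapping, and the names `PartitionedStore`, `partition_of` and `node_of` are hypothetical.

```python
import hashlib


class PartitionedStore:
    """Hypothetical toy grid: each 'node' is just a dict held in RAM."""

    def __init__(self, num_nodes: int, num_partitions: int = 16):
        self.num_partitions = num_partitions
        self.nodes = [{} for _ in range(num_nodes)]

    def partition_of(self, key: str) -> int:
        # Stable hash; Python's built-in hash() of str is salted per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.num_partitions

    def node_of(self, key: str) -> int:
        # Simplest possible placement (an assumption): partition p lives on node p % N.
        return self.partition_of(key) % len(self.nodes)

    def put(self, key: str, value) -> None:
        self.nodes[self.node_of(key)][key] = value

    def get(self, key: str):
        return self.nodes[self.node_of(key)].get(key)


store = PartitionedStore(num_nodes=3)
for i in range(100):
    store.put(f"user:{i}", {"id": i})
print([len(n) for n in store.nodes])  # how many entries each node holds
```

Every key lives on exactly one node, and a client finds it by recomputing the hash, so no central lookup table is needed.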

This is the definition given in In Memory Data Grid Technologies.

an IMDG is an ‘off the shelf’ software product that exhibits the following characteristics:

• The data model is distributed across many servers in a single location or across multiple locations. This distribution is known as a data fabric. This distributed model is known as a ‘shared nothing’ architecture.
• All servers can be active in each site.
• All data is stored in the RAM of the servers.
• Servers can be added or removed non-disruptively, to increase the amount of RAM available.
• The data model is non-relational and is object-based.
• Distributed applications written on the .NET and Java application platforms are supported.
• The data fabric is resilient, allowing non-disruptive automated detection and recovery of a single server or multiple servers.
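Adding or removing servers without disruption, and recovering from a server failure, are often handled with consistent hashing plus backup copies. That technique is my assumption here, not something the quoted definition prescribes. The sketch uses a hash ring with virtual nodes (`HashRing` and `owners` are hypothetical names): adding a node moves only the keys the new node takes over, and when a primary is removed its backup becomes the new primary.

```python
import bisect
import hashlib


def _h(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes and backup copies."""

    def __init__(self, nodes, vnodes: int = 64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for n in nodes:
            self.add_node(n)

    def add_node(self, node: str) -> None:
        # Each physical node owns many points on the ring to spread load evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_h(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def owners(self, key: str, copies: int = 2):
        """Primary plus backups: the first `copies` distinct nodes clockwise from the key."""
        copies = min(copies, len({n for _, n in self._ring}))
        idx = bisect.bisect(self._ring, (_h(key),))
        result = []
        while len(result) < copies:
            _, node = self._ring[idx % len(self._ring)]  # wrap around the ring
            if node not in result:
                result.append(node)
            idx += 1
        return result


ring = HashRing(["node-A", "node-B", "node-C"])
print(ring.owners("order:1001"))  # [primary, backup]
```

With a plain `hash % N` scheme, adding a fourth server would reshuffle most keys; on the ring only roughly a quarter move, and all of them move to the new server.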

2.
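In-Memory Data Grid technology tries to meet the big-data-era demand for processing large volumes of data quickly, but in a different way from In-Memory Databases. The following is an excerpt from material published by GridGain, an IMDG vendor. In short, the IMDG overcomes the limited capacity of memory through high-scalability techniques.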

Since memory is a much more limited resource than disk, IMDGs are built from ground up with a notion of horizontal scale and ability to add nodes on demand in real-time. IMDGs are designed to linearly scale to hundreds of nodes with strong semantics for data locality and affinity data routing to reduce redundant data movement.
From the In-Memory Data Grid White Paper
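"Data locality and affinity data routing" can be sketched roughly as follows. Related entries are placed by a shared affinity key, so a customer and that customer's orders land on the same node, and the computation is shipped to that node instead of pulling data across the network. `AffinityGrid` and `affinity_run` are hypothetical names for illustration only; each product exposes this idea through its own API.

```python
import hashlib
from collections import defaultdict


class AffinityGrid:
    """Hypothetical toy grid that places entries by an affinity key."""

    def __init__(self, num_nodes: int):
        # nodes[i] maps a cache name ("customers", "orders") to that node's local entries
        self.nodes = [defaultdict(dict) for _ in range(num_nodes)]

    def node_for(self, affinity_key: str) -> int:
        digest = hashlib.sha256(affinity_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.nodes)

    def put_customer(self, customer_id: str, name: str) -> None:
        self.nodes[self.node_for(customer_id)]["customers"][customer_id] = name

    def put_order(self, order_id: str, customer_id: str, amount: float) -> None:
        # Route by customer_id (the affinity key), not order_id, so all of a
        # customer's orders sit on the same node as the customer record.
        node = self.nodes[self.node_for(customer_id)]
        node["orders"][order_id] = (customer_id, amount)

    def affinity_run(self, customer_id: str, task):
        """Send `task` to the node owning `customer_id`; it reads local data only."""
        return task(self.nodes[self.node_for(customer_id)], customer_id)


def order_total(local_node, customer_id):
    return sum(amt for cid, amt in local_node["orders"].values() if cid == customer_id)


grid = AffinityGrid(num_nodes=4)
grid.put_customer("c1", "Kim")
grid.put_order("o1", "c1", 120.0)
grid.put_order("o2", "c1", 30.5)
print(grid.affinity_run("c1", order_total))  # 150.5
```

Only the small task and its result cross the network; the order rows never leave the node that owns them.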

A similar technology is the distributed cache. How does an IMDG differ from it?

A data grid is an in-memory distributed database that is designed and optimized for fast processing of data. Data bottlenecks are eliminated and application performance improves because the data grid adds an easily accessible and complementary layer alongside the existing databases and the application. Data grids can be used with traditional line of business relational databases, NoSQL databases, or streaming data environments. The data grid architecture uses techniques for data distribution and data partitioning in such a way that data can be processed very quickly with low latency. Data that is needed frequently or quickly can be stored in the data grid and distributed across multiple servers to speed up access. This characteristic of data grids frees up the traditional database from the burden of providing the right level of scalability and performance. As a result, the traditional database can be used to store data that is needed less frequently.

Data grids are intended to break the model of the traditional scale up architectures that require organizations to add servers, additional databases, management software and complex programming techniques to compensate for low latency and decreased throughput. Many organizations try various approaches only to encounter new performance and scalability challenges due to increasing cost, complexity, or additional data bottlenecks. For example, one frequently used approach to remove data bottlenecks and speed up performance is caching. While caching – storing recently used data so that it can be quickly accessed as needed – can improve scalability, there are some limitations. Initially caching provides applications with quicker access to required data and makes sure that the data is in the right place and form. As you continue to scale, however, you may reach the maximum amount of memory available in the cache.

Data grids have more advanced capabilities than a typical distributed cache. These capabilities contribute to improvements in speed, reliability, and consistency.

• Advanced querying. Data grids provide extensive querying capabilities including the ability to search by values using Map Reduce. This allows for continuous querying as needed in high volume transactional systems such as for financial trading or ecommerce. The continuous querying capability means that your computations will always remain up to date.
• Keep data in sync. Data grids support the ability to replicate data across geographically separated data centers and keep the data in sync.
• Support for streaming data. Parallel processing allows a data grid to be an effective platform for stream processing to analyze trending events and fast querying.
• Elastic caching. As your data size increases you can scale out without the need for reprogramming.
• Transaction capabilities. A data grid can increase the number of transactions an organization can handle by storing a discreet set of data for fast access by multiple applications.
From Improving Application Scalability with In-Memory Data Grids
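The "continuous querying" item can be illustrated with a single-process sketch: a listener registers a predicate once and is then pushed every matching update, instead of re-running the query. This ignores distribution entirely (in a real grid the predicate would be evaluated on the nodes that own the data), and `ContinuousQueryMap` is a hypothetical name.

```python
class ContinuousQueryMap:
    """Hypothetical map that pushes matching updates to registered listeners."""

    def __init__(self):
        self._data = {}
        self._queries = []  # list of (predicate, callback) pairs

    def register(self, predicate, callback):
        # Deliver the current matches first, then every future matching update.
        for key, value in self._data.items():
            if predicate(key, value):
                callback(key, value)
        self._queries.append((predicate, callback))

    def put(self, key, value):
        self._data[key] = value
        for predicate, callback in self._queries:
            if predicate(key, value):
                callback(key, value)


prices = ContinuousQueryMap()
prices.put("AAA", 95.0)
prices.put("BBB", 101.0)

alerts = []
prices.register(lambda sym, px: px > 100, lambda sym, px: alerts.append((sym, px)))
prices.put("AAA", 102.5)  # crosses the threshold: listener fires
prices.put("CCC", 50.0)   # does not match: no notification
print(alerts)  # [('BBB', 101.0), ('AAA', 102.5)]
```

This is why the excerpt says the results "always remain up to date": the grid does the work at write time, which suits trading or e-commerce feeds.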

IMDG seems to be a technology that emerged from the convergence of In-Memory Database and distributed cache technologies, though of course there is no direct lineage between them.

[Figure: In-Memory Data Grid]
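The caching limitation mentioned in the excerpt ("you may reach the maximum amount of memory available in the cache") looks like this on a single node: once the budget is full, every new entry pushes an older one out. The sketch assumes LRU eviction and measures capacity in entries rather than bytes; `BoundedLRUCache` is a hypothetical name. An IMDG's answer is to add nodes rather than evict.

```python
from collections import OrderedDict


class BoundedLRUCache:
    """Hypothetical single-node cache with a hard budget, counted in entries."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # ordered from least to most recently used
        self.evictions = 0

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1


cache = BoundedLRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # over budget: "b" is evicted
print(cache.get("b"), cache.evictions)  # None 1
```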

3.
So how do IMDB and IMDG relate? Perhaps it comes down to the difference between a database and a data grid. Red Hat explains it this way:

• Traditionally… Store everything in a DB!
• Modern requirements: DBs not particularly good at horizontal scaling
• One size doesn’t fit all! DBs are not bad, but they’re not the solution to every problem

GridGain, the IMDG vendor, explains it this way:

Speed Only Vs. Speed + Scalability
One of the crucial differences between In-Memory Data Grids and In-Memory Databases lies in the ability to scale to hundreds and thousands of servers. That is the In-Memory Data Grid’s inherent capability for such scale due to their MPP architecture, and the In-Memory Database’s explicit inability to scale due to fact that SQL joins, in general, cannot be efficiently performed in a distribution context.
It’s one of the dirty secrets of In-Memory Databases: one of their most useful features, SQL joins, is also is their Achilles heel when it comes to scalability. This is the fundamental reason why most existing SQL databases (disk or memory based) are based on vertically scalable SMP (Symmetrical Processing) architecture unlike In-Memory Data Grids that utilize the much more horizontally scalable MPP approach.
It’s important to note that both In-Memory Data Grids and In-Memory Database can achieve similar speed in a local non-distributed context. In the end – they both do all processing in memory.
But only In-Memory Data Grids can natively scale to hundreds and thousands of nodes providing unprecedented scalability and unrivaled throughput.
From In-Memory Database Vs. In-Memory Data Grid: Revisited
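Why joins hurt in a distributed setting can be shown with a rough count: each order row must find its customer row, and whenever the two live on different nodes that match costs a network hop. The sketch below (hypothetical names, hash placement assumed) counts those hops. Co-partitioning orders by customer id removes them for this one join key, but an ad hoc join on any other column would cross nodes again, which is the point the excerpt makes.

```python
import hashlib


def node_of(key: str, num_nodes: int) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes


def remote_join_lookups(orders, num_nodes: int, order_partition_key: str) -> int:
    """Count order->customer matches that would need a network hop.

    `orders` is a list of (order_id, customer_id); customers are always
    partitioned by customer_id. `order_partition_key` picks how orders are placed.
    """
    remote = 0
    for order_id, customer_id in orders:
        placement_key = order_id if order_partition_key == "order_id" else customer_id
        if node_of(placement_key, num_nodes) != node_of(customer_id, num_nodes):
            remote += 1
    return remote


orders = [(f"o{i}", f"c{i % 50}") for i in range(1000)]
print(remote_join_lookups(orders, 4, "order_id"))     # typically a large share of the rows
print(remote_join_lookups(orders, 4, "customer_id"))  # 0: every order sits with its customer
```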

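Against this background, GridGain compares IMDB and IMDG as shown below. When adopting in-memory computing, the right choice appears to depend on your current (AS-IS) environment and your requirements.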

[Figure: GridGain's comparison of IMDB and IMDG]

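Finally, if you want to understand all of this in more technical depth, read the article below. It dates from 2012, but a lot of care went into it. And now there is Data Virtualization too, whatever that is. The world changes so fast. (^^)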

What We Need Now: In Memory Data Grid (original Korean title: 이제 필요한 것은 In Memory Data Grid)
