NoSQL and Data Scalability 2.0
Keeping Up With the World of Non-Relational Databases
Introduction
This Refcard provides an introduction to basic NoSQL and data scalability terminology and techniques and presents in-depth examples of popular NoSQL technologies, including architectures, common uses, and more. NoSQL and Data Scalability 2.0 demystifies the latest techniques in high-volume data storage, search, and management by explaining how they work and when to apply them.
Section 2
Scalable Data Architectures
Scalable data architectures have evolved to improve overall system efficiency and reduce operational costs. Specific NoSQL databases may have different topological requirements, but the general architecture is the same.
In general, NoSQL architectures offer:
Cloud readiness describes the database being used as a service and the ability to deploy the storage grid and cluster manager to a cloud provider.
Section 3
NoSQL
NoSQL describes a horizontally scalable, non-relational database with built-in replication support. Applications interact with the database through a simple API, and the data is stored in a schema-free repository as large files or data blocks. The repository is often a custom file system designed to support NoSQL operations with high replication.
NoSQL Databases Classification
Database Type    Uses
Document         Documents, semi-structured data
Column           Read/write raw time series data
Graph            Named entities, semantic queries, associative data sets
Key-Value        Key-value pairs, where the values can be complex and mixed data structures (e.g. a document)
Multi-model      Two or more database types, including relational databases and the types listed above, with a common database manager for all
While all database types are in common use, document stores are most often associated with NoSQL systems due to their pervasiveness in web and mobile content handling applications.
Is NoSQL For You?
Does your app design...
If you checked off four or more items from the list, then NoSQL is a good fit for you. "Always On" just means that users will have access to complete app functionality at all times. In mobile app and gaming contexts, it can mean access to data that is "a bit behind" the effective system state (i.e. eventual consistency is acceptable).
NoSQL Performance and TCO Comparison
Total cost of ownership (TCO) depends on functionality and complexity. A higher TCO may be acceptable when performance (throughput or scalability) is a primary concern.
Document and key-value stores are most popular because of their ease of use, flexibility, and applicability across many problem domains, at a reasonable TCO.
Tip: Graph databases are excellent replacements for complex relational models because traversing relationships between entities (graph edges) is more efficient, and better suited to high-performance applications, than resolving explicit joins and foreign keys.
Which Data Store Model to Use?
The flowchart in Figure 3 describes how to choose the most appropriate database or store for the application.
Section 4
Cloud Databases
Demand-based scaling is an attractive proposition for running NoSQL systems in the cloud; it maximizes the advantages of running the application on cloud-based providers like AWS, Azure, or Google Cloud Computing.
Tip: Billing overruns happen easily when using a database-as-a-service. Use a usage and cost monitoring system to help manage expenses and avoid nasty surprises.
Section 5
Very High-Volume Data Stores
Many applications require the storage of very large binary data sets. Traditional data stores can't handle them because their sheer size makes doing so impractical. Enter the High-Volume Data Store (HVDS). Most very high-volume data applications are used in scientific or financial problem domains. These applications rely on dedicated, optimized binary data formats that allow quick access, manipulation, and data format description within a single scope.
High-Volume Data Characteristics
HVDS Characteristics
Rule of Thumb: High-Volume Data Stores handle very few I/O operations, but each consists of very large amounts of data; NoSQL handles lots of I/O operations on small amounts of data.
Data is stored in HVDS prior to initial processing, where the HVDS API provides more efficient access than the file system or a database system. The intermediate or final results move to NoSQL or relational stores for end-user reporting and manipulation. HVDS maps onto local files, like Hadoop's HDFS, allowing volumes to move between file systems (e.g. HFS+ to NTFS) without issues. Most HVDS originate in scientific research organizations, and portability is a primary design concern.
HVDS Workflow
Section 6
Document Database: Couchbase Server
Couchbase Server is a document-based database that bridges the gaps between scalable key-value stores, relational database querying, and robustness capabilities. Its characteristics include:
Couchbase Server provides datacenter consistency and partition tolerance. The database is based on the independent scaling and replication model shown in Figure 5. Data is handled across three different service zones: indexing, querying, and data. The service zones have different scalability requirements according to their function. The Couchbase Server software and general documentation are available at http://docs.couchbase.com/admin/admin/Couchbase-intro.html. Each database service node can replicate data to its peer, and each cluster can replicate to other clusters. Couchbase Server provides facilities for cross datacenter replication (XDCR), simplifying disaster recovery, high availability, and data locality scenarios.
In-Memory Cache == Higher Performance
Couchbase Server performs very well during writes because it uses a memory-first mechanism. Data is written to the in-memory cache with a fast response to the caller. Couchbase Server then asynchronously replicates the data to other nodes or clusters, updates the indices, and persists the data to disk. Database clients may override any of these operations to make them synchronous. A read request is guaranteed to return the most recent result as of the moment the request began.
Document Format
Couchbase Server handles JSON documents. Because it is a schemaless database, any valid document can be committed to the database. Couchbase Server assigns two additional attributes to each document upon creation for tracking the document’s unique ID (_id) and revision number (_rev). These attributes are required for all operations other than creation. A typical document and its cross-language representation could be:
{
"type" : "Person",
"name" : "Tom",
"age" : 42
}
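As one illustration of the cross-language representation, the sketch below parses the same document in JavaScript and serializes it back. It is only a sketch: the variable names are illustrative, and nothing here calls a Couchbase API.

```javascript
// The document from the text, as it might arrive over the wire.
const json = '{ "type": "Person", "name": "Tom", "age": 42 }';

// In a dynamic language the JSON text maps directly onto a native object.
const person = JSON.parse(json);

// Attributes are ordinary properties; no class definition is needed.
const summary = `${person.name} is a ${person.type}, age ${person.age}`;

// Serializing the object yields equivalent JSON again.
const roundTrip = JSON.stringify(person);

console.log(summary);
console.log(roundTrip);
```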
Dynamic languages offer a closer object mapping to JSON than compiled languages.
Tip: "type" is just a JSON attribute in this example. It's good practice to define a document type to simplify queries, but it isn't required.
Views
Views are the primary query and reporting tool in Couchbase Server. A view is just a JavaScript function that maps view keys to values. Views are stored on the server and used when needed. They are only updated upon request (query or report), not upon document creation or updates. For example, in a database that contains Person and Animal objects, a view for listing all the instances of “Person” could be:
function (d) { // d ::= document
  if (d.type == "Person")
    emit(d.name, { name: d.name, age: d.age });
}
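To see how such a map function turns documents into view rows, the following sketch runs an equivalent function over two in-memory documents. The `emit` stand-in, the `docs` array, and the second (Animal) document are hypothetical and exist only for illustration; on a real server, `emit` is provided by the view engine.

```javascript
// Sample documents: two document types share one schemaless store.
// The Animal document is invented for this sketch.
const docs = [
  { _id: "6921", type: "Person", name: "Tom", age: 42 },
  { _id: "6922", type: "Animal", name: "Rex", species: "dog" },
];

// Rows collected while mapping; currentId tracks the document being mapped.
const rows = [];
let currentId = null;

// Hypothetical stand-in for the server-provided emit(key, value).
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

// Map function: emit one row per Person document, keyed by name.
function map(d) {
  if (d.type == "Person") {
    emit(d.name, { name: d.name, age: d.age });
  }
}

// Apply the map function to every document in the store.
for (const d of docs) {
  currentId = d._id;
  map(d);
}

const result = { total_rows: rows.length, offset: 0, rows };
console.log(JSON.stringify(result, null, 2));
```

Only the Person document produces a row; the Animal document is skipped because the map function never calls `emit` for it.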
The output will be something like this:
{
  "total_rows": 1,
  "offset": 0,
  "rows": [
    {
      "id": "6921",
      "key": "Tom",
      "value": { "name": "Tom", "age": 42 }
    }
  ]
}
View operations are defined in terms of MapReduce techniques.
Stream-Based Views
Couchbase Server also introduced stream-based views based on the Database Change Protocol (DCP). A stream-based view submits the query to the managed cache. The managed cache asynchronously updates the disk queue, the actual disk, or replicates the query to another node. View queries may include the stale data freshness flag with one of these settings:
A configurable, automatic process updates the indices at configurable intervals based on whether changes within a threshold have occurred.
Spatial Views
Couchbase Server enables the creation of multi-dimensional spatial indices containing geometry data that can express information based on geometries within a multidimensional range. Some examples include:
The spatial views reference, which covers geospatial and arbitrary data collections, is available from http://docs.couchbase.com/4.0/admin/Views/spatial-views.html
N1QL – A SQL-Like Language for Documents
While views are powerful, they are somewhat cumbersome to manage. Couchbase Server introduced N1QL (pronounced “nickel”) to ease integration with legacy reporting systems and to help programmers write more efficient, maintainable, and robust queries. Its main features include:
The N1QL example in Figure 6 shows the language’s flexibility in dealing with schemaless documents. Check out a Couchbase Server N1QL quick reference at http://query.couchbase.com
Couchbase Server Common Applications
Couchbase Server Drawbacks
Section 7
Graph Database: Neo4j
Neo4j is an embeddable database with transactional capabilities that stores data in graphs. Entities are stored as graph nodes, and relationships between nodes are stored as edges connecting them. Its main features include:
The Neo4j downloads and documentation are available from: http://neo4j.com. Figure 7 shows how Neo4j may be embedded in a JVM-based application, where Neo4j exposes a set of Java packages to make direct calls to the database.
The Neo4j stand-alone configuration in Figure 8 exposes a RESTful API and runs on dedicated servers to optimize memory usage since objects and indices are memory-mapped.
Tip: Neo4j is available under commercial and open-source licenses. Commercial options include a high availability cluster configuration.
Caching
Neo4j excels at content delivery and query speed because it offers two different caches:
Both caches have a number of configuration options; consult the Neo4j web documentation for details.
Core: Property Graph
The property graph is made up of nodes, relationships, and properties. A graph database manages all the storage and searching aspects of property graph traversal.
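As a rough illustration of the node, relationship, and property structure, here is a minimal in-memory sketch. The `PropertyGraph` class and its method names are hypothetical and are not part of any Neo4j API; a real graph database adds storage, indexing, and transactional traversal on top of this basic shape.

```javascript
// Minimal in-memory property graph: nodes and relationships both carry properties.
class PropertyGraph {
  constructor() {
    this.nodes = new Map();   // node id -> { id, labels, properties }
    this.relationships = [];  // { type, start, end, properties }
  }

  addNode(id, labels, properties = {}) {
    this.nodes.set(id, { id, labels, properties });
  }

  // Relationships are directed: they run from a start node to an end node.
  addRelationship(type, start, end, properties = {}) {
    if (!this.nodes.has(start) || !this.nodes.has(end)) {
      throw new Error("both endpoints must exist");
    }
    this.relationships.push({ type, start, end, properties });
  }

  // Follow outgoing relationships of one type from a node.
  neighbors(id, type) {
    return this.relationships
      .filter((r) => r.start === id && r.type === type)
      .map((r) => this.nodes.get(r.end));
  }
}

const graph = new PropertyGraph();
graph.addNode(1, ["Person"], { name: "Tom", age: 42 });
graph.addNode(2, ["Person"], { name: "Ann" });
graph.addRelationship("KNOWS", 1, 2, { since: 2014 });

// Traversal respects direction: node 1 knows node 2, but not the reverse.
const known = graph.neighbors(1, "KNOWS").map((n) => n.properties.name);
console.log(known);
```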
Relationships connect nodes in a directed graph with a start and an end node:
Source: Graph Databases, Robinson, Webber, & Eifrem, O'Reilly, 2014
Querying Neo4j With Cypher
Cypher is a declarative query language specific to Neo4j that describes database operation patterns. It's loosely based on SQL, though it features ASCII text constructs to represent patterns and directionality.
Cypher enables users to describe what to create, update, delete, or select from the graph without requiring an explicit description of how to do it. It describes the nodes, attributes, and relationships in the property graph.
Check out the DZone Refcard Querying Graphs with Neo4j for more details on writing queries with Cypher, available at http://refcardz.dzone.com/refcardz/querying-graphs-neo4j.
Cypher and Java
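To give a feel for the ASCII-style patterns, the sketch below holds a few Cypher statements as plain JavaScript strings. It does not connect to a database; a driver would normally send each statement together with its parameters. The `Person` label, the `KNOWS` relationship type, the property names, and the parameter names are all illustrative assumptions, and the `$name` parameter syntax assumes a reasonably recent Neo4j version.

```javascript
// Cypher statements held as plain strings; nothing here is executed against Neo4j.
// Labels, relationship types, and parameter names are illustrative only.
const statements = {
  // Create two nodes and a directed relationship between them.
  create: `
    CREATE (a:Person {name: $from})-[:KNOWS {since: $since}]->(b:Person {name: $to})
  `,
  // Select: the arrow in the pattern gives the relationship's direction.
  select: `
    MATCH (a:Person {name: $from})-[:KNOWS]->(b:Person)
    RETURN b.name AS name
  `,
  // Update a property on a matched node.
  update: `
    MATCH (p:Person {name: $from})
    SET p.age = $age
  `,
  // Delete a node together with its relationships.
  remove: `
    MATCH (p:Person {name: $from})
    DETACH DELETE p
  `,
};

// Parameters that would accompany the statements when sent through a driver.
const params = { from: "Tom", to: "Ann", since: 2014, age: 42 };

for (const [name, text] of Object.entries(statements)) {
  console.log(`-- ${name}\n${text.trim()}`);
}
```

Each statement says what to match or change, such as a `Person` who `KNOWS` another `Person`, and leaves the traversal strategy to the database.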
Common Use Cases
Neo4j Drawbacks
Section 8
Staying Current
Do you want to know about specific projects and use cases where NoSQL and data scalability are the hot topics? Follow the author's data science and scalability feed: http://twitter.com/ciurana
Section 9
Publications
By Eugene Ciurana
From: https://dzone.com/refcardz/nosql-and-data-scalability-20