Top 5 Considerations When Evaluating NoSQL Databases: #2 the Query Model

Query Model

Each application has its own query requirements. In some cases, it may be acceptable to have a very basic query model in which the application only accesses records based on a primary key. For most applications, however, it is important to have the ability to query based on several different values in each record. For instance, an application that stores data about customers may need to look up not only specific customers, but also specific companies, customers of a certain size, or aggregations of customer sales value by zip code or state.

It is also common for applications to update records, including one or more individual fields. To satisfy these requirements, the database needs to be able to query data based on secondary indexes. In these cases, a document database will often be the most appropriate solution.

Document Database

Document databases provide the ability to query on any field within a document. Some products, such as MongoDB, provide a rich set of indexing options to optimize a wide variety of queries, including text indexes, geospatial indexes, compound indexes, sparse indexes, time to live (TTL) indexes, unique indexes, and others. Furthermore, some of these products provide the ability to analyze data in place, without it needing to be replicated to dedicated analytics or search engines. MongoDB, for instance, provides both the Aggregation Framework for real-time analytics (along the lines of the SQL GROUP BY functionality), and a native MapReduce implementation for other types of sophisticated analyses. To update data, MongoDB provides a find and modify method (findAndModify) so that values in documents can be updated in a single statement sent to the database, rather than through multiple round trips.
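As a rough illustration of the GROUP BY-style aggregation described above, the sketch below sums customer sales by state. The pipeline uses MongoDB's $group and $sum operators. The sample documents, the field names (state, sales), and the group_sales_by_state helper are assumptions made for this example. The helper only mimics in plain Python what the server would compute, so the code runs without a database.

```python
from collections import defaultdict

# Hypothetical customer documents; the field names are assumptions for this sketch.
customers = [
    {"_id": 1, "name": "Acme", "state": "NY", "sales": 1200},
    {"_id": 2, "name": "Globex", "state": "CA", "sales": 800},
    {"_id": 3, "name": "Initech", "state": "NY", "sales": 300},
]

# Aggregation pipeline as it could be handed to a driver's aggregate() call.
# Roughly: SELECT state, SUM(sales) AS total_sales FROM customers GROUP BY state
pipeline = [
    {"$group": {"_id": "$state", "total_sales": {"$sum": "$sales"}}},
]


def group_sales_by_state(docs):
    """Pure-Python stand-in for the $group stage above (illustration only)."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["state"]] += doc["sales"]
    return dict(totals)


totals_by_state = group_sales_by_state(customers)
print(totals_by_state)
```

With a real deployment, the grouping would run inside the database, so only the per-state totals travel back to the application rather than every customer document.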

Graph Database

These systems tend to provide rich query models where simple and complex relationships can be interrogated to make direct and indirect inferences about the data in the system. Relationship-type analysis tends to be very efficient in these systems, whereas other types of analysis may be less optimal. As a result, graph databases are rarely used for more general-purpose operational applications.

To tame the complexity that would come from using a multitude of storage technologies, the industry is moving towards the concept of “multi-model” databases. Such designs are based on the premise of presenting multiple data models within the same platform, thereby serving diverse application requirements. For example, MongoDB 3.4 introduces graph computing natively within the database, enabling efficient traversals across graphs, trees, and hierarchical data to uncover patterns and surface previously unidentified connections.
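To make the idea of traversing hierarchical data concrete, here is a minimal sketch of a breadth-first traversal over documents that reference one another. It is not MongoDB's implementation. The employees data, the reports_to field, and the all_reports function are hypothetical names chosen for illustration, and the traversal runs in application memory rather than inside a database.

```python
from collections import deque

# Hypothetical documents keyed by employee name; "reports_to" links each
# document to its manager (None marks the top of the hierarchy).
employees = {
    "ana": {"reports_to": None},
    "ben": {"reports_to": "ana"},
    "cy": {"reports_to": "ben"},
    "dee": {"reports_to": "ben"},
    "eli": {"reports_to": "ana"},
}


def all_reports(docs, manager):
    """Return everyone who reports to `manager`, directly or indirectly."""
    # Invert the reports_to links: manager -> list of direct reports.
    direct = {}
    for name, doc in docs.items():
        direct.setdefault(doc["reports_to"], []).append(name)

    found = []
    seen = {manager}          # guards against cycles in malformed data
    queue = deque([manager])
    while queue:
        current = queue.popleft()
        for name in direct.get(current, []):
            if name not in seen:
                seen.add(name)
                found.append(name)
                queue.append(name)
    return found


print(sorted(all_reports(employees, "ana")))
```

A database that supports this kind of traversal natively can follow the links server-side, which avoids fetching each level of the hierarchy in a separate query.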

Key-Value and Wide Column Databases

These systems provide the ability to retrieve and update data based only on a single primary key or a limited range of primary keys. For querying on other values, users are encouraged to build and maintain their own indexes. Some products provide limited support for secondary indexes, but with several caveats. To perform an update in these systems, multiple round trips may be necessary: first find the record, then update it, then update the index. In these systems, the update may be implemented as a complete rewrite of the entire record, regardless of whether a single attribute or the entire record has changed.
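The find, update, then re-index sequence described above can be sketched with two plain dictionaries standing in for a key-value store and an application-maintained secondary index. Everything here is an assumption for illustration: the records and zip_index structures, the zip field, and the put_customer and find_by_zip helpers do not correspond to any particular product's API.

```python
# Hypothetical in-memory stand-ins for a key-value store and a secondary
# index that the application has to maintain by itself.
records = {}    # primary key -> full record
zip_index = {}  # zip code -> set of primary keys


def put_customer(key, record):
    """Write a record and keep the zip-code index consistent with it."""
    old = records.get(key)       # step 1: find the existing record
    records[key] = dict(record)  # step 2: rewrite the whole record
    # step 3: update the index, removing the stale entry if the zip changed
    if old is not None and old["zip"] != record["zip"]:
        zip_index[old["zip"]].discard(key)
    zip_index.setdefault(record["zip"], set()).add(key)


def find_by_zip(zip_code):
    """Look up records through the application-maintained index."""
    return [records[k] for k in sorted(zip_index.get(zip_code, ()))]


put_customer("c1", {"name": "Acme", "zip": "10001"})
put_customer("c2", {"name": "Globex", "zip": "10001"})
put_customer("c1", {"name": "Acme", "zip": "94105"})  # c1 moves to a new zip

print([r["name"] for r in find_by_zip("10001")])
```

In a real distributed store each of the three steps would typically be a separate network round trip, and the application would also have to handle a failure between steps that leaves the index out of sync with the data.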


TAKEAWAYS:

  • The biggest difference between non-relational databases lies in the ability to query data efficiently.
  • Document databases provide the richest query functionality, which allows them to address a wide variety of operational and real-time analytics applications.
  • Key-value stores and wide column stores provide a single means of accessing data: by primary key. This can be fast, but they offer very limited query functionality and may impose additional development costs and application-level requirements to support anything more than basic query patterns.