As a data analyst professional with interests in databases, I became captivated by the emergence of distributed databases, also known as NoSQL. NoSQL has been around for awhile but not until now did it gain increasing popularity among medium and large enterprises. To help myself gain better understanding of NoSQL, I have read several YouTube tutorial videos as well as database articles on NoSQL and MongoDB, one of the most popular NoSQL databases. A lot of the literature and blogs delve into the technical aspects of NoSQL, but very few provide introductory explanations that beginners will understand. That is exactly the purpose of this blog, to give simple, laymen-term explanations of NoSQL vs. SQL anyone with interest on database can follow.
Below I will share my thoughts on various characteristics of databases and for each I compare SQL with NoSQL.
Partition tolerance: When a network or one of the nodes of a database is down, it is known as network partition. Because most SQL databases use either a local storage or shared storage for all database instances (See picture called “SQL structure” below), the occurrence of a partition will cause the database to be unavailable. If the network is down while a query is being executing, all the partial updates will be undone and rolled back to the time before the query is executed in order to ensure atomicity. Fortunately, many of the relational database vendors are large entities (e.g. Microsoft, IBM, Oracle) and have ample resources to help clients cope with server outage and quickly get back online.
NoSQL, on the other hand, uses multiple data centers across different geographical regions to store data with no shared resources. (See picture called “NoSQL structure” below) So when the server is down, NoSQL database will automatically move to a nearby server that is in working condition to preserve the data’s continuity. Thus users can still perform reads or writes to the database. As business trips become more and more common for data professionals these days, the need to deploy data centers across multiple regions or even nations is magnified. Queries can be executed on the nearest data center, which reduces time lag. Also, the replication of data to various data centers does not require separate software like SQL relational database does. It’s simply built-in!
SQL structure:
NoSQL structure:
Technical Support: SQL is arguably better in this aspect. More established. NoSQL is relatively new. Many of them start-ups.
Analytics: I believe traditional SQL databases is better with respect to analytics. NoSQL databases are geared towards simple data manipulation (e.g. insert, read, update, delete) on web and cloud-based applications. To go beyond that purpose requires higher programming expertise. In addition, a lot of popular BI tools, for example Tableau, do not offer direct connection to NoSQL databases. SQL databases, on the other hand, can connect to BI tools more easily. My previous project as a Data Analyst relies in part on Tableau, and I was able to quickly gather data from the SQL database. Below link shows the ease of connecting Tableau to MySQL database:
https://onlinehelp.tableau.com/current/pro/online/mac/en-us/examples_mysql.html
In addition, a lot of NoSQL developers are still in learning mode to become proficient with these new databases.
Ease to Use: Most SQL databases shared a standard and straight-forward syntax protocol so their coding are very similar, if not identical in many cases. Even rookies can pick up simple SQL procedures easily by reading online tutorials such as W3Schools.com. On the contrary, NoSQL syntaxes do not have a standardized set of query syntax and they tend to require steeper learning curve. Terminologies also differ slightly between the 2 schemes. In SQL databases, a row of data is known as a record and a collection of records is known as a table. In NoSQL databases, a row of data is called document and a combination of documents is called a collection as opposed to table. To create a collection (table) in NoSQL, users do not explicitly write the column names and types in the beginning like SQL databases. Instead they create documents in blocks of codes and within each document block the column names are specified. Let’s compare the syntax of some commonly used procedures on SQL with NoSQL:
To create a table/collection:
MySQL-
- First specify the table name and create the columns:
- Insert records into the columns:
NoSQL (MongoDB)–To create a collection (table) in NoSQL, users do not explicitly write the column names and types in the beginning like SQL databases. Instead they create documents in blocks of codes and within each document block the column names are specified:
Db.inventory.insert ([
{
“stock_ID”: 708926,
“product_name”: “Canon PowerShot M7900”,
“product_type”: “electronics”,
“price”: 699
},
{
“stock_ID”: 708933,
“product_name”: “Microsoft Surface Book”,
“product_type”: “electronics”,
“price”: 1699
}
])
To update records/documents:
MySQL-
To update the price of the second item to $1999, run the following query:
NoSQL (MongoDB):
To do the same procedure, run the following codes:
Db.inventory.update(
{“_id”: ObjectID{“557990er4188aa99ab”},
{“stock_ID”: 708933,
“product_name”: “Microsoft Surface Book”,
“product_type”: “electronics”,
“price”: 1999
} )
As you can see, to update a single entry (price) in MongoDB you have to include the Object ID generated automatically by the database, as well as pasting all other column records from that row, even though you only need to update one value in the price column. The field-value pair characteristic of NoSQL is very similar to JSON concept.
In my humble opinion, NoSQL syntaxes seem more complex and less coherent than SQL syntaxes. My previous 7-month project as a Data Analyst contractor at Google required writing a lot of SQL queries to pull global logistics data on Android devices. My team uses a forked version of MySQL called plx SQL. The syntaxes are similar to most other SQL databases and the loading speed once I executed the query was instantaneous no matter how complicated my queries were. Another downside of NoSQL is that its update queries likely require more time to write than update queries for SQL because the former requires inclusion of the system-generated Object ID as well as values from all columns for the row that contains the value you want to update.
Maintenance: Due to the novelty of NoSQL databases, they require a lot of skills to get installed and connect successfully to a corporate server initially. For the same reason, keen attentions are also required for the maintenance of such databases.
In summary, I think NoSQL has added considerable values to the database sector with a very innovative approach on data storage. However, it’s still relatively new and does have a lot of limitations not present in SQL. There are even databases out there that combines the features of SQL and NoSQL. Nonetheless, they are more of rarity than the norm and one of the most successful Hybrid SQL/NoSQL company (Xeround) has even gone bankrupt. As information technology evolves, I have no doubt NoSQL will become more and more mainstream, and serve as a complement, rather than substitute, to SQL in the foreseeable future.