It was only after I heard murmurs of Big Data in database circles that I came across NoSQL databases. Almost all the Web 2.0 companies and the big guns of the industry are turning their radar towards RDBMS alternatives. NoSQL is an unplanned product of all this research and exploration: people set out to solve one problem, tried something, and ended up with something new in the form of NoSQL. I am not writing in a critical tone; rather, I want to defend the capabilities of the RDBMS, which will probably not be overshadowed by the scale-out features of NoSQL.
I thought it might help the community if I penned down my findings on the topic. They are not complete, and the investigation is still alive.
NoSQL: The Concept
NoSQL is a newcomer in the database world. The concept took shape in 2009 as the outcome of brainstorming on high-volume storage. NoSQL provides a database model that complies neither with the relational model nor, in general, with the ACID properties of transactions. From the name it appears to be a counter to the SQL language, but that is a myth. SQL and NoSQL can coexist in a system, and neither depends on the other. NoSQL stands for Not Only SQL.
Because it departs from these basic database features, a NoSQL system is not a database at heart; it behaves more like a data store or repository whose model is largely need oriented. To date, there are more than 160 NoSQL databases available in the market. Major users of the NoSQL model include Facebook, LinkedIn, Twitter, Google and Amazon.
NoSQL offers a flexible database model that can be accessed and monitored from the middle tier. It has no specific query language of its own (unless UnQL arrives). One of the best-known models is the key-value pair model. Other models are document-centric, graph, tabular, column-oriented and object databases.
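To make the key-value model concrete, here is a minimal in-memory sketch in Python. The `KeyValueStore` class and its `put`/`get`/`delete` methods are hypothetical names chosen for illustration and do not belong to any particular NoSQL product; real stores add persistence, networking and replication on top of this idea.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (illustrative only, not a real product's API)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The value is opaque to the store: no schema or column definitions.
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Lookup is by key only; there are no joins or ad hoc queries.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user:42", {"name": "Asha", "follows": ["user:7"]})
# A second record with different fields needs no schema change.
store.put("user:43", {"name": "Ben", "city": "Pune"})
```

The application code in the middle tier decides what a value means; the store itself only maps keys to values.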
The basic idea behind the NoSQL evolution is to design a distributed data store for large-scale data flow. The Web 2.0 platform introduced new patterns of data access: web data is no longer read-only, and readers are allowed to interact with it. Consequently, web data generates huge traffic and its size increases steeply. This exponential growth of data (mostly from social media and search engines) requires massive scalability, low latency and data on demand in a simplified database model.
Relational databases worked well with traditional information storage philosophies but struggled to keep up with the explosive growth of data in recent times. In addition, a conventional relational database system is typically a non-distributed, vertically scalable, schema-oriented and often commercially licensed platform for data management. NoSQL, on the other hand, is typically an open source, non-relational, distributed and horizontally scalable database system that can handle high volumes of mixed data with low latency and high availability.
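Horizontal scaling usually means spreading keys across several server nodes. The sketch below shows one simple way to do that, hash-based placement with a modulo; the function names `shard_for` and `distribute` are made up for this example, and the "nodes" are just integer indexes rather than real servers.

```python
import hashlib
from typing import Dict, List


def shard_for(key: str, node_count: int) -> int:
    """Map a key to a node index using a stable hash and a modulo."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def distribute(keys: List[str], node_count: int) -> Dict[int, List[str]]:
    """Group keys by the node index that would hold them."""
    placement: Dict[int, List[str]] = {n: [] for n in range(node_count)}
    for key in keys:
        placement[shard_for(key, node_count)].append(key)
    return placement


keys = [f"user:{i}" for i in range(1000)]
three_nodes = distribute(keys, 3)
four_nodes = distribute(keys, 4)  # "scale out" by adding one more node

# Count how many keys land on a different node after the fourth node is added.
moved = sum(1 for k in keys if shard_for(k, 3) != shard_for(k, 4))
```

With plain modulo placement a large share of the keys change node whenever the node count changes, which is why systems such as Amazon's Dynamo use consistent hashing instead; the modulo version is shown here only because it is the shortest way to convey the idea.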
The major accomplishments of NoSQL databases are:
- Distributed architecture allows replication mechanisms to be implemented, ensuring a continuous and uninterrupted data flow
- Horizontal scalability ensures that new server nodes can be added when required to improve performance and capacity. Note that a conventional RDBMS is not designed to scale horizontally in this way.
- Not schema dependent and non-relational. The data storage paradigm is flexible and left to the developer. In addition, there are no tables, constraints, joins or relations to deal with. It behaves purely as a data store.
- Compliance with BASE (Basically Available, Soft state, Eventually consistent). The BASE model emphasizes data availability and the consistency of replicas with the master servers. “Basically available” implies that the data store keeps responding, even if only partially, while a change propagates. Data consistency in NoSQL is not as stringent as in an RDBMS, i.e. the data remains in a soft state: an update may not be visible everywhere as soon as the transaction completes, and there is scope for a small delay before it becomes visible. Thus, the data is eventually consistent but not instantaneously so. Such a degree of consistency suits social media better than the finance sector.
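- Position in the CAP theorem. The CAP theorem states that a distributed data store can guarantee at most two of Consistency, Availability and Partition tolerance at the same time. Distributed NoSQL stores keep Partition tolerance and pair it with either Consistency or Availability, depending on the product. Note that a conventional single-server RDBMS complies with Consistency and Availability.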
- Ability to store and retrieve huge amounts of structured or semi-structured data
- Not much technical expertise required
- Lower maintenance and administration overhead
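The "soft state" and "eventually consistent" behaviour described in the BASE item above can be sketched with a toy replicated store. Everything here is an assumption made for illustration: the `Replica` and `EventuallyConsistentStore` classes are hypothetical, and the explicit `sync()` call stands in for the background replication that a real system would run on its own.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional, Tuple


class Replica:
    """A read-only copy that lags behind the primary until it is synced."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def read(self, key: str) -> Optional[Any]:
        return self.data.get(key)


class EventuallyConsistentStore:
    """The primary accepts writes at once; replicas catch up when sync() runs."""

    def __init__(self, replica_count: int) -> None:
        self.primary: Dict[str, Any] = {}
        self.replicas: List[Replica] = [Replica() for _ in range(replica_count)]
        self._pending: Deque[Tuple[str, Any]] = deque()

    def write(self, key: str, value: Any) -> None:
        self.primary[key] = value
        self._pending.append((key, value))  # replication is deferred

    def sync(self) -> None:
        # Apply the queued writes to every replica in the order they arrived.
        while self._pending:
            key, value = self._pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


store = EventuallyConsistentStore(replica_count=2)
store.write("status:42", "at the airport")
stale = store.replicas[0].read("status:42")  # None: the replica has not caught up
store.sync()
fresh = store.replicas[0].read("status:42")  # "at the airport"
```

A reader who hits a replica between the write and the sync sees old data. That is tolerable for a status update, but not for an account balance.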
As a new entrant in the industry, NoSQL received mixed responses. Database users who were loyal to the relational model rejected its need-oriented approach. On the other hand, people who were facing capacity issues with RDBMS readily adopted NoSQL. It was in 2009 that NoSQL started competing with RDBMS.
NoSQL databases are commonly grouped into four categories:
- Key-value stores – based on Amazon's Dynamo
- Column family stores
- Document databases – inspired by Lotus Notes; mainly for document-centric and semi-structured data
- Graph databases
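To give a feel for how the four categories above differ, the sketch below models the same hypothetical user record in each style using only plain Python structures. The field names, the sample values and the `followed_by` helper are invented for the example; real products expose their own APIs and storage formats.

```python
import json

# Key-value: an opaque value (here a JSON string) looked up by a single key.
key_value = {"user:42": '{"name": "Asha", "city": "Pune"}'}
profile = json.loads(key_value["user:42"])  # the application decodes the value

# Column family: row key -> column family -> column -> value.
column_family = {
    "user:42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2013-01-15"},
    }
}

# Document: a self-describing nested document; fields may vary per document.
document = {
    "_id": "user:42",
    "name": "Asha",
    "address": {"city": "Pune"},
    "follows": ["user:7"],
}

# Graph: nodes plus labelled edges, so relationships are first-class data.
nodes = {"user:42": {"name": "Asha"}, "user:7": {"name": "Ben"}}
edges = [("user:42", "FOLLOWS", "user:7")]


def followed_by(user_id):
    """Return the ids of the users that user_id follows."""
    return [dst for src, label, dst in edges
            if src == user_id and label == "FOLLOWS"]
```

The more structure the store understands, the more it can do for you: a key-value store can only fetch the whole value, while a graph store can walk relationships such as `followed_by("user:42")` directly.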
Here are some pioneering and well-known NoSQL databases used by companies dealing with huge volumes of data.
- Amazon Dynamo (Used in ecommerce applications)
- Amazon SimpleDB (Used in web services)
- Google BigTable
- Cassandra (Used by Facebook)
I shall be back with more findings as they find me.