class: inverse, center, middle # NoSQL No Problems? .footnote.pull-right[github/[tanelih](https://github.com/tanelih)] --- class: center, middle # What is NoSQL? Let's examine some of the defining features of both **Relational** and **NoSQL** databases. --- .left-column[ # What is NoSQL? ## Relational Databases ] .right-column[ Traditional databases like MySQL and PostgreSQL are considered **relational** in the way they model data. ] -- count: false .right-column[ **SQL** is the language used to query and update the database. ```sql SELECT * FROM users WHERE id = 1; ``` ] -- count: false .right-column[ Strong **transaction** model ensures the data is correct and up to date. Transactions that go wrong can be rolled back in **isolation**, without affecting other transactions. Databases, which implement these properties can be considered **ACID**. ] -- count: false .right-column.footnote[ **ACID** stands for **Atomicity**, **Consistency**, **Isolation** and **Durability**. ] ??? **Atomicity** means that transactions are "all or nothing", meaning that if one part of the transaction fails, it will leave the database unchanged. **Consistency** means that transactions will bring the database from one state to another, including the effects set by constraints, cascades and triggers. **Isolation** means that transactions will "appear" to execute serially, so that the state of the database will also transact from one valid state to another "serially". **Durability** simply means that the data is durable, once a transaction is complete, the resulting state is stored in *non-volatile memory*. --- .left-column[ # What is NoSQL? ## Relational Databases ## NoSQL Databases ] .right-column[ Top notch **scalability** and **performance**. Weak data model ensures you can change your **schema** on the fly without breaking anything. ] ??? Changing the schema when you're already running a production application can however introduce changes where you must run something akin to migrations. -- count: false .right-column[ Performance tends to come with a cost... ] -- count: false .right-column[ Different NoSQL databases define their own way of accessing the data, there isn't really a standard like **SQL**. ] ??? Most however offer some form of a HTTP-based API to interact with the data, exception being MongoDB where connections are simple TCP connections. -- count: false .right-column[ Lack of **joins** and **collapses** quickly leads to having to implement these in your application code or using a third party library. ```javascript users.remove({ id: user.id }, function(err, user) { books.remove({ author: user.id }, function(err, book) { notes.remove({ book: book.id }, function(err, note) { // ... you can see where this is going })})}) ``` ] -- count: false .right-column.center[ ![](http://emojipedia-us.s3.amazonaws.com/cache/7e/d0/7ed0401e63825aa8be667a293776b498.png) ] --- class: center, middle # Categories of NoSQL --- .left-column[ ## Key-value ] .right-column[ Probably the simplest form of data storage. You can assign values to keys, that's it. It really is that simple. ```bash > SET juukelis "puukelis" OK > GET juukelis "puukelis" ``` This simplicity leads to extreme ease of scaling the system: where the traditional SQL based systems scale vertically, the NoSQL systems tend to scale horizontally. Check out [Redis](http://redis.io). ] ??? Horizontal scaling means that we can scale by adding several cheap servers, to run the system in "scale", whereas vertical scaling means to add more power to a single machine. --- .left-column[ ## Key-value ## Document ] .right-column[ Most document oriented databases store the data in the **JSON** format, with some using the **Binary JSON (BSON)**. This means that it's quite easy to visualize the data. ```json { "users": [ { "id": 12, "name": "Andrew Hunt", "books": [ { "title": "The Pragmatic Programmer", "authors": [ 12, 37 ] } ] } ] } ``` Being a derivative of the key-value database, it inherits the ease of scalability, while retaining the more advanced features. Check out [MongoDB](https://www.mongodb.org). ] --- .left-column[ ## Key-value ## Document ## Graph ] .right-column[ ### Graph Graph databases are basically a set of nodes, with n-amount of connections in between them. Even complex queries are highly performant since these connections are stored as first class citizens. You can often find graph databases in use when dealing with highly connected data. For example, social and logistic data is usually modeled as a graph. Check out [Neo4J](http://neo4j.com). ] --- .left-column[ ## Key-value ## Document ## Graph ## Column Oriented ] .right-column[ ### Column Oriented Column oriented databases process data by the **column**. This means increased performance for queries, which only touch a handful of columns. If you need to run operations, which touch all of the stored documents but only a specific column in them, this might be your choice. Check out [Cassandra](http://cassandra.apache.org). ] --- class: inverse, middle, center # Relational data in the land of NoSQL [ case contriboard ] --- .left-column[ #### contriboard ] .right-column[ Let's define some of the basic entities in our application through some simple use cases. ] -- count: false .right-column[ - **User** creates a **Board** - **User** creates a **Ticket** on the **Board** - **User** moves a **Ticket** around on the **Board** ] -- count: false .right-column[ Given the above, our data might look like this: ```json { "users": [{ "id": 10, "boards": [{ "id": 1, "tickets": [{ "id": 20, "position": { "x": 0, "y": 0 } }] }] }] } ``` ] --- .left-column[ #### contriboard ] .right-column[ Let's introduce another use case. ] -- count: false .right-column[ - **User** invites another **User** to the board. ] -- count: false .right-column[ ```json { "users": [ { "id": 10 }, { "id": 11 } ], "boards": [{ "id": 1, "users": [ 10, 11 ], "tickets": [{ "id": 20, "position": { "x": 0, "y": 0 } }] }] } ``` ] -- count: false .right-column[ Uh oh. We've introduced the concept of references to our database. ] --- .left-column[ #### contriboard ] .right-column[ - **User** wants to have the **Ticket** exist in another **Board** at the same time. ] -- count: false .right-column[ ```json { "users": [ { "id": 10 }, { "id": 11 } ], "boards": [{ "id": 1, "users": [ 10, 11 ], "tickets": [ 20 ] }, { "id": 1, "users": [ 10 ], "tickets": [ 20 ] }], "tickets": [{ "id": 20, "position": { "x": 0, "y": 0 } }] } ``` ] -- count: false .right-column[ That's a lot of **relations** for a **non-relational** database... ] --- .left-column[ #### contriboard ] .right-column[ Let's fetch some data and populate any references. ] -- count: false .right-column[ ```javascript collection('boards').find({}, (err, boards) => { var userIDs = boards.map(board => board.users) .flatten().unique() var ticketIDs = boards.map(board => board.tickets) .flatten().unique() collection('users').find({id: {$in: userIDs}}, (err, users) => { collection('tickets').find({id: {$in: ticketIDs}}, (err, tickets) => { var populated = boards.map(board => { board.users = board.users .map(user => users.find(user)) board.tickets = board.tickets .map(ticket => tickets.find(ticket)) return board }) // finally !! }) }) }) ``` ] -- count: false .right-column[ Send help. ] --- .left-column[ #### contriboard ] .right-column[ Let's use a popular library for MongoDB, [Mongoose](http://mongoosejs.com). ] -- count: false .right-column[ ```javascript boards.find({}).populate('users', 'tickets') .exec((err, boards) => { // done !! }) ``` ] -- count: false .right-column[ There is a reason for the popularity of third party libraries such as Mongoose. ] -- count: false .right-column[ Interestingly Mongoose also allows us to define **schemas** for our data. ```javascript const UserSchema = Mongoose.Schema({ id: Mongoose.Schema.Types.ObjectID, name: { type: String, maxlength: 8 }, books: [{ ref: 'book', type: Mongoose.Schema.Types.ObjectID }] }) ``` ] --- .left-column[ #### contriboard ] .right-column[ Let's delete a ticket. ```javascript collection('tickets').remove({ id: 20 }, function(err) { // ... }) ``` ] -- count: false .right-column[ Easy, right? What's our database looking like? ] -- count: false .right-column[ ```json { "users": [ { "id": 10 }, { "id": 11 } ], "boards": [{ "id": 1, "users": [ 10, 11 ], "tickets": [ 20 ] }, { "id": 1, "users": [ 10 ], "tickets": [ 20 ] }], "tickets": [ ] } ``` ] ??? Dangling references, all around... --- class: center, middle ![](http://emojipedia-us.s3.amazonaws.com/cache/7e/d0/7ed0401e63825aa8be667a293776b498.png) --- .left-column[ #### contriboard ] .right-column[ Let's delete a ticket, and any dangling references. ```javascript collection('tickets').remove({ id: 20 }, function(err) { if(err) { throw err } var query = { tickets: 20 } var update = { $pull: { tickets: 20 } } collection('boards').update(query, update, function(err) { if(err) { throw err } // ... }) }) ``` Image if we had referenced that ticket in multiple other collections. ] --- # Right Tool for the Right Job -- count: false Visualize your application's data and operations. Think about what sort of a database fits your use case the best. -- count: false > Note that if you're still in the MVP stage, where you don't exactly know > what sort of forms your data will be taking, consider NoSQL for minimal > amount of time spent configuring things. -- count: false Your application might require multiple databases, for example an instance of PostgreSQL for the data that is queried and operated upon often, something like a MongoDB for statistics and Redis for session storage. -- count: false Remember there are no silver bullets... -- count: false but there are ways of making things seem more sane. --- class: inverse, middle, center # Questions? --- class: middle, center # An Exercise --- Connect to your MongoDB instance via the `mongo` shell. Replace the address with the address of your running `mongod` instance. ```bash $ mongo 127.0.0.1:27017/nosql-no-problems ``` ??? If the MongoDB instance requires authentication, here's how we do it: ```javascript > use admin > db.auth(admin_user, admin_pwd) > use nosql-no-worries > db.createUser({ user: 'huojo', pwd: 'huojo', roles: [{ role: 'readWrite', db: 'nosql-no-worries' }] }) ``` -- count: false Now that you have an open shell, check that you're using the `nosql-no-problems` database by running the following command. ```bash > db nosql-no-problems ``` -- count: false Notice how we didn't actually create the database, instead they're created on the fly as they're needed. Try changing to another database by running the following, and then back to the `nosql-no-problems` database to which we initially connected to. ```bash > use mysql-my-problems switched to db mysql-my-problems > use nosql-no-problems switched to db nosql-no-problems ``` --- Run the following command: ```bash > show dbs ``` Notice how you won't see the databases we used just now. This is because we didn't add any content yet, so let's do just that. -- count: false Let's add a user to the `users` collection. You can run a `help` command in order to find out how. ```javascript > help > db.help() > db.collection.help() ``` -- count: false ```javascript > db.users.insert({ nick: 'huojo' }) WriteResult({ "nInserted" : 1 }) ``` Once again we didn't need to create anything in order to insert data into a collection. --- Fetching data is fairly straightforward. ```javascript > db.users.find() { "_id" : ObjectId("55fbea124db1df100bb36a68"), "nick" : "huojo" } ``` -- count: false Notice how MongoDB uses BSON **ObjectIds** internally. They're basically the equivalent to SQL database's primary keys. They have a few interesting properties. ```javascript > ObjectId().valueOf() 55fbf09e4db1df100bb36a6a > ObjectId().getTimestamp() ISODate("2015-09-18T11:08:38Z") ``` -- count: false Let's query by the `_id` attribute. ```javascript > db.users.find({ _id: "55fbea124db1df100bb36a68" }) > ``` -- count: false The `_id` attribute is an **ObjectId**. ```javascript > db.users.find({ _id: ObjectId("55fbea124db1df100bb36a68") }) { "_id" : ObjectId("55fbea124db1df100bb36a68"), "nick" : "huojo" } ``` --- Let's update our user. ```javascript > db.users.update({ _id: ObjectId("55fbea124db1df100bb36a68") }, { firstName: "Jouni", lastName: "Huotari" }) WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 }) > db.users.find({ _id: ObjectId("55fbea124db1df100bb36a68") }).pretty() { "_id" : ObjectId("55fbea124db1df100bb36a68"), "firstName" : "Jouni", "lastName" : "Huotari" } ``` -- count: false Looks like we overwrote our user. In order to just update the user, we can use the `$set` operator. ```javascript > db.users.update({ _id: ObjectId("55fbea124db1df100bb36a68") }, { $set: { nick: 'huojo' } }) WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 }) > db.users.find({ _id: ObjectId("55fbea124db1df100bb36a68") }).pretty() { "_id" : ObjectId("55fbea124db1df100bb36a68"), "firstName" : "Jouni", "lastName" : "Huotari", "nick" : "huojo" } ``` --- Finally we'll remove the user. ```javascript > db.remove({ _id: ObjectId("55fbea124db1df100bb36a68") }) WriteResult({ "nRemoved" : 1 }) > db.users.find().count() 0 ``` -- count: false As you've probably guessed by now, the `mongo` shell is actually a JavaScript shell. This means we can do stuff like... -- count: false Pluck attributes from a collection. ```javascript > db.users.find().map(function(user) { return user._id }) [ ObjectId("55fbfaa74db1df100bb36a6c") ] ``` Ghetto population. ```javascript > db.users.find().map(function(user) { user.books = db.books.find({ author: user._id }).toArray() return user }) [{ "_id" : ObjectId("55fbfaa74db1df100bb36a6c"), "nick" : "huojo", "books" : [{ "_id" : ObjectId("55fbfbe04db1df100bb36a6d"), "name" : "The Good Book", "author" : ObjectId("55fbfaa74db1df100bb36a6c") }] }] ``` --- class: inverse, middle ## References and Reading Material - PostgreSQL - http://www.postgresql.org - Redis - http://redis.io - Cassandra - http://cassandra.apache.org - MongoDB - https://www.mongodb.org - Neo4j - http://neo4j.com MongoDB Use Cases - http://docs.mongodb.org/ecosystem/use-cases/ Why you should never use MongoDB - Sarah Mei - http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/