Azure Cosmos DB Overview

Introduction To Cosmos DB

Cosmos DB is a fully managed, globally distributed NoSQL database for managing data at a large scale.

Cosmos DB supports a range of APIs for querying data, such as SQL, MongoDB and Cassandra. SDKs are also available for several programming languages, including Java, .NET and Python.

As Cosmos DB is fully managed, you do not have to worry about maintenance tasks such as hardware upgrades and security patching, and can focus on how best to manage your data.

Use Cases

Cosmos DB is ideal for high-performance applications that need to handle large amounts of data or require fast response times. Generally speaking, if your application needs to be always online, respond quickly from any location globally and scale easily, then Cosmos DB is a suitable option.

Consistency Levels

Cosmos DB has five consistency levels: eventual, consistent prefix, session, bounded staleness and strong. Each level trades off consistency against performance and availability.

The strong consistency level offers the strongest level of consistency but at the expense of availability, latency and throughput. The eventual consistency level offers the weakest consistency, but has higher availability, higher throughput and lower latency. The other consistency levels fall along this spectrum.
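The spectrum described above can be sketched as an ordered enumeration. This is only an illustration of the ordering from weakest to strongest; the class and function names are made up for this example and are not part of any Cosmos DB SDK.

```python
from enum import IntEnum


class ConsistencyLevel(IntEnum):
    """Hypothetical enum: higher value means stronger consistency."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def is_stronger(a: ConsistencyLevel, b: ConsistencyLevel) -> bool:
    """Return True if level a gives stronger consistency than level b."""
    return a > b


# Levels listed from weakest to strongest consistency.
weakest_to_strongest = sorted(ConsistencyLevel)
```

Moving towards the end of `weakest_to_strongest` buys stronger consistency at the cost of availability, latency and throughput, as described above.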

Request Units

Request units (RUs) are a kind of “credit” that measures the cost of database operations. Every read and write operation incurs a cost in request units. If you consume more request units than you have available, your requests get throttled.

There are three modes for managing throughput in Cosmos DB:
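To make the throttling behaviour concrete, here is a toy model of a per-second request unit budget. It is a simplified sketch, not how the service accounts for RUs internally; the class name, the 400 RU/s budget and the 150 RU cost per operation are all assumptions chosen for illustration.

```python
class RequestUnitBudget:
    """Toy model of a per-second request unit (RU) budget."""

    def __init__(self, rus_per_second: float):
        self.rus_per_second = rus_per_second
        self.current_second = None
        self.used = 0.0

    def try_consume(self, cost_rus: float, now_second: int) -> bool:
        """Return True if the operation fits in this second's budget,
        or False if it would be throttled."""
        if now_second != self.current_second:
            # A new second starts with a fresh budget.
            self.current_second = now_second
            self.used = 0.0
        if self.used + cost_rus > self.rus_per_second:
            return False
        self.used += cost_rus
        return True


budget = RequestUnitBudget(rus_per_second=400)  # hypothetical budget
# Three operations costing 150 RUs each within the same second.
same_second = [budget.try_consume(150, now_second=0) for _ in range(3)]
# One more operation in the following second.
next_second = budget.try_consume(150, now_second=1)
```

In this model the third operation in the same second is rejected because it would push usage to 450 RUs, while the same operation succeeds once the next second begins.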

Provisioned Throughput Mode

In provisioned throughput mode, you provision the number of request units per second (RU/s). The number of RUs is provisioned in increments of 100. You can increase or decrease this number via the Azure Portal, or programmatically.
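Because throughput is provisioned in increments of 100, an estimated requirement has to be rounded to a valid value. A minimal sketch of that arithmetic follows; the helper name is hypothetical and the function does not call any Azure API.

```python
RU_INCREMENT = 100  # increment size stated in the text above


def round_up_to_increment(desired_rus: int, increment: int = RU_INCREMENT) -> int:
    """Round a desired RU/s figure up to the next multiple of the increment."""
    return (desired_rus + increment - 1) // increment * increment


estimated = 1234  # hypothetical estimate of required RU/s
to_provision = round_up_to_increment(estimated)
```

Rounding up rather than down keeps the provisioned figure at or above the estimate, which avoids throttling at the estimated load.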

Serverless Mode

In this mode, you don’t need to provision the throughput. At the end of the billing period, you get charged for the number of request units that you have used.

Autoscale Mode

This mode is suitable for mission-critical workloads that have unpredictable usage or traffic. The throughput of the database scales automatically based on actual usage.

Partitions and Choosing a Partition Key

To meet the performance needs of the application, Cosmos DB uses partitioning to scale the containers within a database. Cosmos DB has two kinds of partitions: logical and physical. A logical partition has a 20 GB limit. A physical partition is an internal implementation detail of Cosmos DB and has a size limit of 50 GB. Each physical partition can hold many logical partitions. Every item within the same logical partition has the same partition key value.

A partition key consists of two components: the partition key path and the partition key value. As an example, if we had a record:
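The relationship between partition key values, logical partitions and physical partitions can be sketched as follows. This is a deliberate simplification: the hash function (SHA-256) and the modulo placement are stand-ins chosen for illustration, since the actual placement logic is internal to Cosmos DB, and the function name is hypothetical.

```python
import hashlib


def physical_partition_for(partition_key_value, physical_partition_count: int) -> int:
    """Map a partition key value to a physical partition index.

    Items sharing a partition key value form one logical partition,
    so they always map to the same physical partition.
    """
    digest = hashlib.sha256(str(partition_key_value).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer (stand-in scheme).
    return int.from_bytes(digest[:8], "big") % physical_partition_count


# Two items with the same partition key value land on the same partition.
first = physical_partition_for(1234567890, physical_partition_count=4)
second = physical_partition_for(1234567890, physical_partition_count=4)
```

Several different key values may map to the same index, which mirrors the point that one physical partition can hold many logical partitions.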

{
    "customerId": 1234567890,
    "firstName": "John",
    "lastName": "Doe",
    ...
}

and the partition key customerId, then the partition key path would be “/customerId” and the partition key value would be 1234567890.
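The path and value relationship above can be illustrated with a short Python sketch. The helper below is hypothetical and only shows how a path such as "/customerId" points at a property of the item; it is not SDK code.

```python
def partition_key_value(item: dict, partition_key_path: str):
    """Follow a partition key path such as "/customerId" into an item."""
    value = item
    for segment in partition_key_path.strip("/").split("/"):
        value = value[segment]
    return value


# The record from the example above, without the elided fields.
customer = {"customerId": 1234567890, "firstName": "John", "lastName": "Doe"}
value = partition_key_value(customer, "/customerId")
```

Here `value` holds the `customerId` of the record, which is the partition key value for that item.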

The partition key path may contain alphanumeric and underscore characters. The partition key value can be either a string or a number.
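The character rule for the path can be checked with a regular expression. This sketch encodes only the rule as stated in this article (a leading slash followed by alphanumeric and underscore characters); the function name is hypothetical and the service may apply further rules that are not covered here.

```python
import re

# One or more "/segment" parts, each made of letters, digits or underscores.
_PATH_PATTERN = re.compile(r"(/[A-Za-z0-9_]+)+")


def is_valid_partition_key_path(path: str) -> bool:
    """Check a partition key path against the character rule described above."""
    return _PATH_PATTERN.fullmatch(path) is not None


ok = is_valid_partition_key_path("/customerId")
has_hyphen = is_valid_partition_key_path("/customer-id")
no_slash = is_valid_partition_key_path("customerId")
```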

Your choice of partition key affects the scalability and the performance of your database. A good partition key should distribute throughput consumption evenly across the logical partitions. For example, if you are storing details about customers in your database, you might choose the customer ID as the partition key, since it has many distinct values to spread the load across.
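One rough way to compare candidate partition keys is to look at how items would spread across key values. The sketch below uses item counts as a proxy for throughput, which is an assumption; the sample data, the `country` field and the function name are invented for illustration.

```python
from collections import Counter


def largest_partition_share(items, key: str) -> float:
    """Return the fraction of items that fall in the most populated
    logical partition for the given candidate key (lower is more even)."""
    counts = Counter(item[key] for item in items)
    return max(counts.values()) / sum(counts.values())


# Hypothetical sample data.
customers = [
    {"customerId": 1, "country": "UK"},
    {"customerId": 2, "country": "UK"},
    {"customerId": 3, "country": "UK"},
    {"customerId": 4, "country": "FR"},
]

by_customer = largest_partition_share(customers, "customerId")
by_country = largest_partition_share(customers, "country")
```

With this sample, `customerId` spreads the items evenly while `country` concentrates most of them in a single logical partition, which would make it the weaker choice.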

APIs Provided By Cosmos DB

SQL API

MongoDB API

Cassandra API

Table API

Gremlin API

Comparisons To AWS DynamoDB