Finally we are getting to the point where we are calling into Azure Document Db.
In the last post we built the application so that it talked to the Web API, which stored the invoices as in-memory objects.
So Azure Document Db?
Right then, let’s talk about Azure Document Db. I found the following resources really useful when learning about it:
- Microsoft Virtual Academy – Azure Document Db – https://mva.microsoft.com/en-us/training-courses/developing-solutions-with-azure-documentdb-10554?l=nqc8Zy97_1404984382
- Azure Document Db Getting Started Guide – https://azure.microsoft.com/en-gb/documentation/articles/documentdb-get-started/
- Azure Document Db Performance Tips – https://azure.microsoft.com/en-gb/blog/performance-tips-for-azure-documentdb-part-1-2/
Azure Document Db Architecture
The diagram below shows the architecture of Azure Document Db. At the top level there is an Azure Document Db account. Within the account there are databases, and within each database there are collections, which contain the JSON documents.
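To make that hierarchy concrete, here is a minimal Python sketch (not the project's C#, and not the DocumentDB SDK) that models it as nested in-memory containers. The class names and ids are hypothetical; the point is only that every JSON document lives in one collection, inside one database, inside one account.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Collection:
    id: str
    # JSON documents, keyed by their "id" property.
    documents: Dict[str, Dict[str, Any]] = field(default_factory=dict)


@dataclass
class Database:
    id: str
    collections: Dict[str, Collection] = field(default_factory=dict)


@dataclass
class Account:
    id: str
    databases: Dict[str, Database] = field(default_factory=dict)

    def find_document(self, db_id: str, coll_id: str, doc_id: str) -> Dict[str, Any]:
        # Walk account -> database -> collection -> document.
        return self.databases[db_id].collections[coll_id].documents[doc_id]


# Hypothetical names, purely for illustration.
invoices = Collection("invoices", {"INV-001": {"id": "INV-001", "total": 120.0}})
account = Account("my-account", {"invoicedb": Database("invoicedb", {"invoices": invoices})})
```

Calling `account.find_document("invoicedb", "invoices", "INV-001")` walks down the levels and returns the invoice document.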
Azure DocumentDb Account
This is the guide that I used for creating my Azure Document Db account.
The approach is to log in through the Azure Portal and then:
- Click on DocumentDB Accounts
- Click Add
- Provide an ID
- Resource Group
- Location (suggest choosing the nearest data centre)
- Click Create
In a few moments your Document Db will be created!
Install Azure Document Db NuGet Package
Before I could get coding I needed to add support for DocumentDb to my MVC project.
This was achieved by adding the Azure Document Db NuGet package, Microsoft.Azure.DocumentDB.
Once installed, this added all the appropriate references and assemblies to my project.
Connecting to the Azure Document Db Database with DocumentClient
The DocumentClient is the gateway into Azure Document Db; it provides the connection to the Azure Document Db instance.
The DocumentClient constructor takes the following parameters:
- Uri – represents the endpoint for the Azure Document Db
- SecureString – the authorisation key (the account's primary or secondary key) used to access the Document Db account.
- ConnectionPolicy – this lets you decide how to connect to Azure Document Db. I used ConnectionPolicy.Default; however, there are performance gains from using Direct connection mode. More information can be found here: https://azure.microsoft.com/en-gb/blog/performance-tips-for-azure-documentdb-part-1-2/
- ConsistencyLevel – this parameter controls how data becomes consistent across the database. Azure Document Db is a distributed database, so there is a trade-off between availability, performance and data consistency. There are four consistency levels available; I generally use Session consistency. To find out more about the different consistency levels and how they work, see: https://azure.microsoft.com/en-gb/documentation/articles/documentdb-consistency-levels/
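As a rough illustration of what those four constructor arguments carry, here is a Python sketch that bundles them into one validated settings object. It is not the SDK: `ClientSettings` and its field names are hypothetical. The enum values mirror the four consistency levels mentioned above and the Gateway/Direct connection modes; as I understand it, ConnectionPolicy.Default uses Gateway mode, so that is the default here.

```python
from dataclasses import dataclass
from enum import Enum


class ConsistencyLevel(Enum):
    # The four levels, from strongest to weakest.
    STRONG = "Strong"
    BOUNDED_STALENESS = "BoundedStaleness"
    SESSION = "Session"
    EVENTUAL = "Eventual"


class ConnectionMode(Enum):
    GATEWAY = "Gateway"  # requests are routed through the gateway
    DIRECT = "Direct"    # client talks to the data nodes directly (the performance tip)


@dataclass(frozen=True)
class ClientSettings:
    """Hypothetical bundle of the values passed when creating a client."""
    endpoint: str
    auth_key: str
    connection_mode: ConnectionMode = ConnectionMode.GATEWAY
    consistency_level: ConsistencyLevel = ConsistencyLevel.SESSION

    def __post_init__(self) -> None:
        # Fail fast on obviously bad configuration instead of on the first request.
        if not self.endpoint.startswith("https://"):
            raise ValueError("the endpoint should be an HTTPS URL")
        if not self.auth_key:
            raise ValueError("an authorisation key is required")
```

Validating in one place like this means a typo in the endpoint or a missing key shows up when the application starts, rather than as a confusing failure on the first query.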
Anyway, let’s create the Azure DocumentDb Database.
The Azure Document Db Database is stored in an Azure Document Db account and it is a storage container which contains Collections and Users.
We create a database by calling the CreateDatabaseAsync method on the DocumentClient.
This function is called with an Id parameter which provides the name of the database.
Once the database has been created, subsequent calls access it using the DocumentClient method ReadDatabaseAsync().
This method requires a Uri, which is created with UriFactory.CreateDatabaseUri() by passing in the id of the database.
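UriFactory builds id-based links with a fixed shape: as I understand it, CreateDatabaseUri, CreateDocumentCollectionUri and CreateDocumentUri produce `dbs/{database}`, `dbs/{database}/colls/{collection}` and `dbs/{database}/colls/{collection}/docs/{document}`. The Python helpers below are a hypothetical sketch of that shape, not a port of UriFactory. Rejecting ids that contain "/" is my own precaution, because such an id would break the path.

```python
def _segment(resource_id: str) -> str:
    # A "/" inside an id would be read as a path separator, so reject it.
    if not resource_id or "/" in resource_id:
        raise ValueError(f"invalid resource id: {resource_id!r}")
    return resource_id


def database_link(database_id: str) -> str:
    return f"dbs/{_segment(database_id)}"


def collection_link(database_id: str, collection_id: str) -> str:
    return f"{database_link(database_id)}/colls/{_segment(collection_id)}"


def document_link(database_id: str, collection_id: str, document_id: str) -> str:
    return f"{collection_link(database_id, collection_id)}/docs/{_segment(document_id)}"
```

Each level simply extends the link of its parent, which mirrors the account, database, collection and document hierarchy.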
Now that we have the database, we need to create the collection where our documents will be stored within the database.
The last piece of the puzzle when it comes to Azure Document DB is the DocumentDB Collection. Collections are the containers that store the JSON documents.
When we create a DocumentDB Collection we need to think about a few things:
- IndexingPolicy – this describes the type of data being stored and therefore the type of indexing that is possible on it. The IndexingPolicy also controls how consistent the index is with the documents. We will go into this in more detail later in the article.
- RequestOptions – these options set how much throughput the DocumentDb Collection can support, which also affects the cost of the collection. There are a number of different options, each with a maximum number of request units per second (RPS). Currently I am using the smallest option, 400 RPS.
- Data Structure – if we need serious performance we may need to shard our data to get the required throughput, because each DocumentDb Collection has a maximum throughput. If that throughput is exceeded, the call is rejected with HTTP status 429 (Too Many Requests), and the response includes an x-ms-retry-after-ms header giving the number of milliseconds to wait before retrying the call. This needs to be factored into the design if the maximum of 5000 RPS is going to be a bottleneck.
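Handling the 429 case usually means waiting for the period the service asks for and then trying again. Here is a minimal, SDK-agnostic Python sketch of that loop. `ThrottledError` is a hypothetical stand-in for a throttled response, and the one-second fallback when the header is missing is my own assumption.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from a collection."""

    def __init__(self, headers: Dict[str, str]):
        super().__init__("429 Too Many Requests")
        self.status = 429
        self.headers = headers


def call_with_retry(operation: Callable[[], T], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation()
        except ThrottledError as err:
            if attempt >= max_attempts:
                raise  # give up and let the caller see the 429
            # Wait as long as the service asked (assumed 1000 ms if the header is absent).
            wait_ms = int(err.headers.get("x-ms-retry-after-ms", "1000"))
            sleep(wait_ms / 1000.0)
```

Passing `sleep` in as a parameter keeps the sketch testable: a test can record the requested waits instead of actually pausing.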
Anyway, with all this in mind, let’s look at the process of creating a DocumentDb Collection. The code is similar in structure to the code that creates the DocumentDb Database.
Indexing policies are really interesting and have an impact on a number of areas, including performance and storage costs.
The type of index will denote the type of actions that can be performed on data:
- hash index – supports equality queries
- range index – supports equality and range filters such as greater than or less than
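The rule in those two bullets can be written down as a small lookup. This Python sketch only encodes what the list says (both kinds serve equality, and only a range index serves comparisons). The function name and operator sets are hypothetical and are not part of any SDK.

```python
from enum import Enum


class IndexKind(Enum):
    HASH = "Hash"
    RANGE = "Range"


EQUALITY_OPS = {"="}
RANGE_OPS = {"<", "<=", ">", ">="}


def index_supports(kind: IndexKind, op: str) -> bool:
    """Can a filter using `op` be served by an index of this kind?"""
    if op in EQUALITY_OPS:
        return True  # both hash and range indexes handle equality
    if op in RANGE_OPS:
        return kind is IndexKind.RANGE  # comparisons need a range index
    raise ValueError(f"unknown operator {op!r}")
```

So a query such as `c.total > 100` on a property that only has a hash index would not be served by that index. That is why the choice of index kind matters for the queries you plan to run.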
Another capability of an indexing policy is to configure when the index is updated after a change is made to the document collection: either the write that changes the document waits for the index update (synchronous) or it does not (asynchronous).
- Consistent – the index is updated synchronously, and the update completes before control returns to the writer
- Lazy – the index is updated asynchronously, and control is returned to the process that performed the update without waiting for it
- None – no index is maintained, so documents can only be accessed via their id
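An indexing policy is sent to the service as JSON. The Python sketch below builds one for each of the three modes. The key names (`indexingMode`, `automatic`, `includedPaths`, `excludedPaths`, and `kind`/`dataType`/`precision` on each index) follow the DocumentDB indexing policy format as I understand it. The specific indexes mirror what I believe was the default policy (hash on strings, range on numbers), so treat the values as assumptions rather than recommendations. Turning `automatic` off for mode "none" is a choice made here because there is no index to maintain.

```python
import json
from typing import Any, Dict

VALID_MODES = {"consistent", "lazy", "none"}


def build_indexing_policy(mode: str = "consistent") -> Dict[str, Any]:
    mode = mode.lower()
    if mode not in VALID_MODES:
        raise ValueError(f"indexing mode must be one of {sorted(VALID_MODES)}")
    if mode == "none":
        # No index is maintained: documents can only be read by id.
        return {"indexingMode": "none", "automatic": False}
    return {
        "indexingMode": mode,  # "consistent" = synchronous, "lazy" = asynchronous
        "automatic": True,
        "includedPaths": [{
            "path": "/*",  # index every property
            "indexes": [
                {"kind": "Hash", "dataType": "String", "precision": 3},
                {"kind": "Range", "dataType": "Number", "precision": -1},
            ],
        }],
        "excludedPaths": [],
    }


def to_json(policy: Dict[str, Any]) -> str:
    # The policy is serialised as JSON when the collection is created.
    return json.dumps(policy, indent=2)
```

Changing `Hash` to `Range` for strings would trade some index storage for the ability to run comparison and ordering queries on string properties.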
There is a lot more to indexing policies; more information can be found here:
The following section discusses how we bring all of the above together to implement our Azure Document Db access. As I write this, I should say that this was my first attempt and it will need some refactoring. We will pick that up in a subsequent post, as it will be good to explain that process!
So the first step was to create the classes to connect to the Azure Document DB Account.
A folder called DataAccess was created in the Invoice WebApi project, containing four classes:
- DatabaseConfig – used to load the configuration used to connect to the Azure Document DB account
- InvoiceDatabase – this is used to create and connect to the Azure Document Db Database
- InvoiceDocumentCollection – this is used to create and connect to the Azure Document Collection within the Database
- InvoiceDataContext – this wraps all the Azure Document DB entities into one class. It holds the InvoiceDatabase and InvoiceDocumentCollection and ensures that they are created and initialised.
All three of the Invoice data classes (InvoiceDatabase, InvoiceDocumentCollection and InvoiceDataContext) follow the same pattern: each implements a singleton that is accessed via a GetCurrent() function, which returns the singleton instance wrapped in a Task<> object.
This approach gives a pattern that works well with asynchronous calls. All the Azure Document Db functions, such as ReadDatabaseAsync, are, as the names suggest, asynchronous.
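The same idea can be sketched in Python's asyncio: cache the *task* that creates the instance rather than the instance itself, so every caller (even concurrent ones) awaits the same single initialisation. This is an analogy to the Task<> returned by GetCurrent(), not the project's code. `InvoiceDatabaseSketch` is hypothetical, and the sleep stands in for ReadDatabaseAsync/CreateDatabaseAsync.

```python
import asyncio
from typing import Optional


class InvoiceDatabaseSketch:
    _instance_task: Optional[asyncio.Task] = None
    init_count = 0  # only here to show that initialisation runs once

    @classmethod
    def get_current(cls) -> asyncio.Task:
        # Cache the task, not the result, so concurrent callers share one initialisation.
        if cls._instance_task is None:
            cls._instance_task = asyncio.ensure_future(cls._create())
        return cls._instance_task

    @classmethod
    async def _create(cls) -> "InvoiceDatabaseSketch":
        cls.init_count += 1
        await asyncio.sleep(0.01)  # stands in for reading or creating the database
        return cls()
```

One design point to be aware of: if initialisation fails, the failed task stays cached and every later caller sees the same exception. A production version might clear `_instance_task` on failure so the next call can try again.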
In my initial implementation I blocked on the async calls. Because the thread sat waiting for the result, the async method never got the chance to resume and return its result, so the thread hung. I had no idea what was going on and thought it was some issue with Azure DocumentDb.
I found the following articles useful in helping me to understand what was happening and why these deadlocks were occurring:
- Stephen Cleary: Don’t block async calls – http://blog.stephencleary.com/2012/07/dont-block-on-async-code.html
- Stephen Cleary: MSDN Article – All about the Synchronization Context – https://msdn.microsoft.com/en-us/magazine/gg598924.aspx
The key takeaway I got from these articles is that if you are going to call into async functions, you should try to make everything asynchronous!
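The deadlock Stephen Cleary describes can be reproduced outside .NET. In this Python sketch, a one-thread executor plays the part of the ASP.NET request context. A handler running on that context blocks waiting for a result whose continuation has been queued on the same context, so the result can never be produced. The names are hypothetical, and the timeout exists only so the demo terminates instead of hanging for real. The asyncio version underneath shows the "async all the way" alternative.

```python
import asyncio
import concurrent.futures as cf

# A single-thread executor stands in for the request context:
# work posted to it runs one item at a time, on the same thread.
request_context = cf.ThreadPoolExecutor(max_workers=1)


def read_database_async() -> cf.Future:
    # The continuation that produces the result must run on the request context.
    return request_context.submit(lambda: "invoicedb")


def blocking_handler() -> str:
    # Runs on the request context, then blocks on the result (like calling .Result).
    pending = read_database_async()
    try:
        return pending.result(timeout=0.5)
    except cf.TimeoutError:
        # The continuation is queued behind this very handler, so it cannot run.
        return "deadlocked"


async def read_database() -> str:
    await asyncio.sleep(0)  # yields control instead of blocking the thread
    return "invoicedb"


async def async_handler() -> str:
    # Async all the way down: nothing ever blocks waiting on itself.
    return await read_database()
```

Submitting `blocking_handler` to `request_context` returns "deadlocked", while `asyncio.run(async_handler())` returns "invoicedb".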
Anyway, now it is time to talk about each of the implementations.
Database Config Class
This class provides the mechanism to load in the Azure Document Db configuration used by the application.
I will be looking at how we can move the configuration out of the web.config so that it is centralised and more secure. Plus, it will make life easier for keeping my code in GitHub too.
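One common way to move secrets out of a checked-in file is to let environment variables override it. Here is a Python sketch of a config loader along those lines, where a plain dictionary stands in for the web.config appSettings. The setting names (`DocumentDbUrl`, `DocumentDbKey`, `DatabaseName`, `CollectionName`) are hypothetical, and so is the environment-variable override; this is one possible direction, not what the project currently does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical setting names, used only for illustration.
REQUIRED_KEYS = ("DocumentDbUrl", "DocumentDbKey", "DatabaseName", "CollectionName")


@dataclass(frozen=True)
class DatabaseConfigSketch:
    url: str
    key: str
    database_name: str
    collection_name: str


def load_config(app_settings: Mapping[str, str],
                environ: Optional[Mapping[str, str]] = None) -> DatabaseConfigSketch:
    environ = os.environ if environ is None else environ
    values = {}
    for name in REQUIRED_KEYS:
        # An environment variable wins over the file, so secrets need not live in source control.
        value = environ.get(name) or app_settings.get(name)
        if not value:
            raise KeyError(f"missing setting: {name}")
        values[name] = value
    return DatabaseConfigSketch(values["DocumentDbUrl"], values["DocumentDbKey"],
                                values["DatabaseName"], values["CollectionName"])
```

With this approach the key can be left out of the file entirely and supplied by the hosting environment.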
Invoice Database Class
This class ensures that the Database is created if it does not already exist. It also ensures that the Azure DocumentClient is created and set up as a singleton, so the connection to Azure Document Db is created once and exists for as long as the Web API is loaded.
The implementation of the InvoiceDatabase object is shown below:
As mentioned previously, the InvoiceDatabase class provides a static method, GetCurrent(), which returns a Task<InvoiceDatabase> because the calls to the Azure DocumentDB Database layer are asynchronous.
The code attempts to read the database; if this fails, the database is created.
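In the .NET SDK, a missing database surfaces as a DocumentClientException with a NotFound (404) status code. The Python sketch below shows the read-or-create pattern against a hypothetical in-memory fake client. The important detail is that only a "not found" result triggers a create; any other failure still propagates.

```python
import asyncio
from typing import Dict


class NotFoundError(Exception):
    status = 404


class FakeClient:
    """Hypothetical in-memory stand-in for DocumentClient."""

    def __init__(self) -> None:
        self.databases: Dict[str, dict] = {}
        self.create_calls = 0

    async def read_database(self, link: str) -> dict:
        db_id = link.split("/", 1)[1]  # "dbs/invoicedb" -> "invoicedb"
        if db_id not in self.databases:
            raise NotFoundError(link)
        return self.databases[db_id]

    async def create_database(self, db_id: str) -> dict:
        self.create_calls += 1
        self.databases[db_id] = {"id": db_id}
        return self.databases[db_id]


async def get_or_create_database(client: FakeClient, db_id: str) -> dict:
    try:
        return await client.read_database(f"dbs/{db_id}")
    except NotFoundError:
        # Only "not found" means create it; other errors are not swallowed.
        return await client.create_database(db_id)
```

If two instances of the Web API start at the same time, both might see "not found" and try to create the database. The second create would then fail with a conflict (409), so a real implementation may want to treat that case as "someone else created it" and read again.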
The InvoiceDocumentCollection class is implemented as shown below:
Following the same approach, the pattern is a static method called GetCurrent() which creates an instance of the object if it does not already exist. The constructor calls into an Initialize() method, which ensures that the Document Collection exists, creating it if necessary.
It is within the DocumentCollection that we define the IndexingPolicy for string content and also the throughput, in request units per second. In this code base we are using 400 RPS, which is the minimum.
The InvoiceDataContext class is implemented as shown below:
This object is the wrapper for all things Azure Document Db in this solution. The code follows the same pattern as before: it implements a GetCurrent() function which returns a Task<InvoiceDataContext>.
The function gets the current database configuration, the current InvoiceDatabase instance and, using that database, the current InvoiceDocumentCollection.
Please notice that the Initialise() functions, which have no data to return, return plain Task objects with no result type rather than void. Using void seemed to cause issues with the code locking; an async void method cannot be awaited, so the caller has no way to wait for it to finish or to observe its exceptions.
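Python has no async void, but starting a coroutine without awaiting it has the same effect: the caller carries on before initialisation has finished. This hypothetical sketch contrasts an awaited initialise with a fire-and-forget one. The sleep stands in for the database and collection calls.

```python
import asyncio


class DataContextSketch:
    def __init__(self) -> None:
        self.ready = False

    async def initialise(self) -> None:
        # Returns nothing, but is still awaitable (the equivalent of returning a plain Task).
        await asyncio.sleep(0.01)  # stands in for reading the database and collection
        self.ready = True


async def awaited() -> bool:
    ctx = DataContextSketch()
    await ctx.initialise()  # the caller waits, so the context is ready afterwards
    return ctx.ready


async def fire_and_forget() -> bool:
    ctx = DataContextSketch()
    task = asyncio.create_task(ctx.initialise())  # like async void: nobody waits
    ready_now = ctx.ready  # read before initialisation has had a chance to run
    await task  # tidy up so the event loop does not complain
    return ready_now
```

`asyncio.run(awaited())` returns `True`, while `asyncio.run(fire_and_forget())` returns `False`: the caller would be using a context that is not ready yet.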
Now that we have covered the core Azure Document Db classes in this project, let’s talk about the implementation of the system.
The Invoice Repository has been changed to make its calls into Azure DocumentDb.
Please note: the implementation for the Update function has not been completed yet. This is so that I can show you later an issue which I had when trying to update a document in Azure.
So let’s have a look at how the Invoice Repository is now set up. First of all, the repository has a few new private member variables:
- InvoiceRepository – this is our singleton instance of the repository
- InvoiceDataContext – this is an instance of the object InvoiceDataContext which will be our gateway into the Azure DocumentDb.
- List<InvoiceForm> – this is now redundant and will be removed later on
Let’s talk about how we get data. The GetAll() function returns IOrderedQueryable<InvoiceForm>. This lets us decide when the query executes, and the Azure Document Db LINQ provider, Microsoft.Azure.Documents.Linq, provides additional operators to narrow down the data we would like.
Being able to decide when the query executes is really important. If we called .ToList() on the entire dataset, Azure Document Db would return every document in the collection, which could be rather slow!
The Document Collection is accessed by creating a System.Uri from the DocumentCollection’s AltLink property and passing it to CreateDocumentQuery, which creates the DocumentQuery object, as the code below shows.
All of the other data retrieval functions, such as Get(string reference), reuse the GetAll() function.
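The deferred-execution idea can be shown with a small Python sketch: `get_all()` returns a query object that only records filters, and nothing is fetched until the results are enumerated. `LazyQuery` and `InvoiceRepositorySketch` are hypothetical. One real difference from the LINQ provider: the provider translates the filters into a query that runs on the server, whereas this sketch filters on the client, so it only illustrates *when* execution happens.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


@dataclass
class InvoiceForm:
    reference: str
    total: float


class LazyQuery:
    """Records filters; runs nothing until it is iterated."""

    def __init__(self, source: Callable[[], Iterable[InvoiceForm]],
                 filters: Tuple[Callable[[InvoiceForm], bool], ...] = ()):
        self._source = source
        self._filters = filters

    def where(self, predicate: Callable[[InvoiceForm], bool]) -> "LazyQuery":
        # Returns a new query; the data source is still untouched.
        return LazyQuery(self._source, self._filters + (predicate,))

    def __iter__(self) -> Iterator[InvoiceForm]:
        # Execution happens only here, when the results are enumerated.
        for item in self._source():
            if all(f(item) for f in self._filters):
                yield item


class InvoiceRepositorySketch:
    def __init__(self, documents: List[InvoiceForm]):
        self._documents = documents
        self.fetches = 0  # counts how often the "database" is actually queried

    def _fetch(self) -> Iterable[InvoiceForm]:
        self.fetches += 1
        return iter(self._documents)

    def get_all(self) -> LazyQuery:
        return LazyQuery(self._fetch)

    def get(self, reference: str) -> Optional[InvoiceForm]:
        # Reuses get_all() and narrows it down, as Get(string reference) does.
        return next(iter(self.get_all().where(lambda i: i.reference == reference)), None)
```

Building `repo.get_all().where(...)` leaves `repo.fetches` at 0. The fetch only happens when something like `list(...)` or `get()` enumerates the query.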
The Invoice Controller represents the WebApi for accessing the data through into Azure. The implementation is shown below:
The InvoiceController has been changed to now include a private member field to hold the InvoiceRepository object.
The InvoiceController WebApi calls have been changed to return Task<> objects so that they can be used asynchronously. The InvoiceController needs to call Initialise() before each call into the repository, to make sure the DataContext object in the InvoiceRepository is valid and created correctly. In testing, we found that the DataContext object would become invalid when using the API. I will be looking into this in more detail to see why this might be happening.
Setting up the web.config
I made some changes to the web.config to incorporate the new settings required for the Azure Document Db system. These include the following settings:
- Document Db Url
We will be looking to move these out of web.config and into some other configuration store, so that we can change these settings without redeploying our code. It also makes this blog series easier, as I will not have to keep removing the configuration from the code base before I upload the source code to GitHub.
The configuration information can be found here:
Primary and Secondary Keys
The Document Db Url can be found on the Azure Portal, https://portal.azure.com, in the Document Db Account section.
The Microsoft team provide some great tips for improving performance with Azure Document Db.
One of them relates to DocumentClient: they suggest creating a single DocumentClient instance as a singleton and reusing it throughout the lifetime of your application’s instance.
Azure Portal Tools for Document Db
Some of the tooling in the Azure Portal is very nice, and the Azure Document Db tooling is a good example.
I found the ability to browse the data via the Azure Portal very useful. To access your DocumentDb data, do the following:
- browse to https://portal.azure.com
- log in and use the new search feature at the top middle of the page
- type Document
- Select DocumentDB Accounts
- Choose your DocumentDb Account
- Select your database
- Select a Collection
- Click on Document Explorer
- Select a document
- You can now see the JSON object in the viewer
- It is possible to make changes to the data via this window too
There are a number of other features but this should get you started and help you explore the data in the database.
The source code that accompanies this blog post for the Web API project can be found here:
Conclusion and Next Episode
I did have a few problems, and by far the biggest, which I have already mentioned, was the async and await issue. I would definitely recommend reading Stephen Cleary’s articles.
The other thing that took me some time to get my head around was the different links. Each link is a URI, and every object in Azure Document Db has a path to reference it by. It is important that you learn about these; I will talk about them in the next post.
At the time of implementing the Azure Document Db side I did have some problems with updating documents. However, I have talked enough in this post, so next time I will take you through those issues and also introduce you to Application Insights. I hope that you can join me.