DynamoDB for blogging
Let me cover one of the drawbacks of using AWS Free Tier for blogging. Specifically, I want to talk about the DynamoDB database.
What is DynamoDB
DynamoDB is a NoSQL database that is highly available, durable, and scalable. It stores data as key-value items or documents.
When you create a table in DynamoDB, you must specify the hash (partition) key and, optionally, a sort key. Together they form the primary key the table uses for data access.
You can create a secondary index that uses a different hash key and sort key. A global secondary index effectively behaves like a second table with the same data copied over (or a projected subset of it), but keyed differently for extra searching and fetching.
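As a rough illustration of the key setup described above, here is a sketch of the parameters you could pass to boto3's `create_table`. The table name `BlogPosts`, the attribute names (`blog_id`, `published_at`, `slug`), the index name `by_slug`, and the capacity numbers are all made up for this example, not taken from a real setup.

```python
def blog_table_definition(table_name="BlogPosts"):
    """Build keyword arguments for a DynamoDB create_table call.

    All names and capacity values here are hypothetical placeholders.
    """
    throughput = {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1}
    return {
        "TableName": table_name,
        # Only attributes used in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "blog_id", "AttributeType": "S"},
            {"AttributeName": "published_at", "AttributeType": "S"},
            {"AttributeName": "slug", "AttributeType": "S"},
        ],
        # Composite primary key: hash key plus sort key.
        "KeySchema": [
            {"AttributeName": "blog_id", "KeyType": "HASH"},
            {"AttributeName": "published_at", "KeyType": "RANGE"},
        ],
        "ProvisionedThroughput": throughput,
        # Secondary index keyed by slug, for fetching a single post.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "by_slug",
                "KeySchema": [{"AttributeName": "slug", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": throughput,
            }
        ],
    }


# With boto3 installed and credentials configured, this could be used as:
#   boto3.client("dynamodb").create_table(**blog_table_definition())
```

Note that the sort key is optional in general. It is included here only because listing posts by date is one of the things a blog needs.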
What’s the point?
Since the database is NoSQL, data is organized differently from what people are used to with SQL databases and normal forms.
Why is this important? Because the data schema depends on the data access patterns. It is not enough to store data in any form and then try to query it. It just won't work that way. So, to store the data correctly, you need to know how you'll get it back.
General example access patterns for a blog are:
- Get a single blog post
- Get a list of blog posts on a specific page
- List tags ever used in the blog
- List blog posts by a specific tag
How data is accessed
So, when you read data, DynamoDB either looks it up by key (a get or a query on the hash key) or runs a table scan.
Getting a piece of data by hash key is quick and nice. In terms of blogging, fetching post info using some key works perfectly well.
The table scan, on the other hand, is obviously slow, especially if the table is big and you must apply some filtering along the way: filters are applied after the items are read, so they don't reduce the amount of data scanned.
Sorting does not work during a scan either. You still get all your data back, but it will not be ordered by the sort key as you might expect (there is no `ORDER BY` like in SQL, remember?). The sort key only orders items that share the same hash key. Yes, the hash key can be non-unique if you use a composite primary key.
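To make the difference concrete, here is a toy in-memory model of a table with a composite key. It is not the DynamoDB API, just a sketch of the behavior: `ToyTable`, its methods, and the choice of the year as a hash key are all assumptions made for this example.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a hash key and a sort key."""

    def __init__(self):
        # hash key -> {sort key: item}
        self._partitions = defaultdict(dict)

    def put(self, hash_key, sort_key, item):
        self._partitions[hash_key][sort_key] = item

    def query(self, hash_key, descending=False):
        """Items of one partition, ordered by sort key."""
        part = self._partitions.get(hash_key, {})
        return [part[k] for k in sorted(part, reverse=descending)]

    def scan(self):
        """Every item, partition by partition.

        Items are ordered inside each partition only. The order of the
        partitions themselves has nothing to do with the sort key
        (insertion order is used here as a stand-in).
        """
        items = []
        for part in self._partitions.values():
            items.extend(part[k] for k in sorted(part))
        return items


table = ToyTable()
table.put("2023", "2023-05-01", "spring post")
table.put("2022", "2022-11-10", "autumn post")
table.put("2023", "2023-01-15", "winter post")

print(table.query("2023"))  # ['winter post', 'spring post']
print(table.scan())         # ['winter post', 'spring post', 'autumn post']
```

The query comes back sorted by date within the 2023 partition, while the scan returns the oldest post last, because there is no ordering across hash keys.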
Pagination? Well, it is there. When a scan or query has more results to return, the response includes the key of the last processed item (`LastEvaluatedKey`). To get the second page, you pass that key along with the next request (as `ExclusiveStartKey`), and DynamoDB picks up where it left off and effectively returns the second page. That is nice, but it only gives you Next-style pagination, with no way to jump to an arbitrary page number, which is inconvenient for the end user. Supporting Previous requires additional development effort: you have to remember the start key of every page the user has visited (in case the user wants to go to Previous several times in a row).
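Here is a minimal sketch of that bookkeeping. `fake_query` only imitates the shape of a paginated response over a plain list, and the `Pager` class is a hypothetical helper, not part of any SDK. One simplification to keep in mind: the real service can return a `LastEvaluatedKey` even when the following page turns out to be empty.

```python
def fake_query(items, limit, exclusive_start_key=None):
    """Imitate a paginated Query response over an already sorted list."""
    start = 0
    if exclusive_start_key is not None:
        wanted = exclusive_start_key["id"]
        start = next(i for i, it in enumerate(items) if it["id"] == wanted) + 1
    page = items[start:start + limit]
    response = {"Items": page}
    if start + limit < len(items):
        response["LastEvaluatedKey"] = {"id": page[-1]["id"]}
    return response


class Pager:
    """Remembers the start key of every visited page to support Previous."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable(exclusive_start_key) -> response
        self._start_keys = [None]  # the first page has no start key
        self._index = -1           # nothing loaded yet
        self._has_next = True

    def _load(self, index):
        response = self._fetch(self._start_keys[index])
        self._index = index
        last_key = response.get("LastEvaluatedKey")
        # Record the start key of the following page the first time we see it.
        if last_key is not None and index + 1 == len(self._start_keys):
            self._start_keys.append(last_key)
        self._has_next = last_key is not None
        return response["Items"]

    def next(self):
        if not self._has_next:
            return []
        return self._load(self._index + 1)

    def previous(self):
        if self._index <= 0:
            return []
        return self._load(self._index - 1)


posts = [{"id": n} for n in range(1, 6)]
pager = Pager(lambda key: fake_query(posts, limit=2, exclusive_start_key=key))
```

Calling `pager.next()` three times walks through all five posts, and `pager.previous()` replays the stored start keys in reverse. In a web application the list of start keys would have to live somewhere between requests, for example in the session or in the page URL.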
Anything else?
Well, yes. Like in most NoSQL databases, we don't have joins, foreign keys, etc. So once you introduce tags (blog post comments are quite easy, by the way), you are in trouble. You can still save tags, that is for sure. But analytics and filtering are where it does not work well. For example, you may want tags ordered by popularity (number of posts). Or when a user clicks on a tag, the user expects a search by that tag to happen, right? That is a problem.
What if you made a typo or want to rename a tag and have it applied everywhere? This all adds up quite a lot.
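One possible way to model tags, sketched here in plain Python rather than against the real API, is to store one extra item per (tag, post) pair, with the tag as the hash key. The `TAG#` and `POST#` prefixes and every function name below are assumptions made for this example. The sketch also shows why a rename is costly: key attributes cannot be updated in place, so each affected item has to be deleted and written again.

```python
from collections import Counter

TAG_PREFIX = "TAG#"
POST_PREFIX = "POST#"


def tag_items(post_id, tags):
    """One item per (tag, post) pair: tag is the hash key, post the sort key."""
    return [{"pk": TAG_PREFIX + tag, "sk": POST_PREFIX + post_id} for tag in tags]


def posts_by_tag(items, tag):
    """Equivalent of a query on the hash key of a single tag."""
    return sorted(
        it["sk"][len(POST_PREFIX):] for it in items if it["pk"] == TAG_PREFIX + tag
    )


def tag_popularity(items):
    """Count posts per tag. On a real table this means reading every tag item."""
    return Counter(it["pk"][len(TAG_PREFIX):] for it in items)


def rename_tag(items, old, new):
    """Rewrite every item of a tag; returns the new items and the rewrite count."""
    renamed, rewrites = [], 0
    for it in items:
        if it["pk"] == TAG_PREFIX + old:
            renamed.append({"pk": TAG_PREFIX + new, "sk": it["sk"]})
            rewrites += 1
        else:
            renamed.append(it)
    return renamed, rewrites


items = (
    tag_items("hello-world", ["aws", "dynamdb"])
    + tag_items("free-tier", ["aws"])
    + tag_items("pagination", ["aws", "dynamdb"])
)
fixed, rewrites = rename_tag(items, "dynamdb", "dynamodb")
```

Listing posts for a tag stays a cheap key lookup, but popularity needs a pass over all tag items (or a separately maintained counter), and fixing the `dynamdb` typo touches one item for every post that used it.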
I didn't have all of this in place when I wrote this post, so these are my thoughts on how I'll manage this kind of stuff one day.
Recap
Accessing a single post by a key is a great workflow.
Rendering the posts page is a bit more complex because of sorting and pagination (page numbers). But you're OK as long as you have fewer than 500-1000 posts to manage. Beyond that, there may be performance issues.
Do you need any advanced features? Comments are OK and easy. Tagging is OK as well, but with some additional details around it. Do you want some searching? You'd better look for another solution or use a search engine to do the job for you.