All About Database Partitioning
Database partitioning is the process of taking a larger table and splitting it into smaller parts or chunks. Most enterprise level database systems including Oracle, MySQL, PostgreSQL, Cassandra, and Microsoft SQL support partitioning. Since smaller sets of information are easier to work with, partitioning will improve query response times and it will potentially make your database easier to work with. Partitions can be stored on different files and hard disk drives, or they may be in a completely different location. If partitioning is implemented correctly, an end user will not even be aware that data has been partitioned in the database backend.
The procedure for creating partitions will vary depending on the database system you are using. In Microsoft SQL Server Management Studio, partitions an be easily created the same way you create new tables and indexes. Other databases will offer similar tools.
Database partitioning is not to be confused with database replication, where an entire database is copied (i.e., master/slave partitioning), or database clustering where a database, instead of an individual table, is spread among computers. And a partition is also different from an index, which simply points to data in an existing table. In fact, each partition will have its own indexes, views and transaction logs.
For the purposes of queries and other database operations, different partitions are often treated like they are one table.
Horizontal vs Vertical Partitioning
There are to ways to partition data, horizontal and vertical. In horizontal partitioning entire rows of data are stored in different tables. In vertical partitioning different fields are split into different tables. Horizontal tables will have the same structure as they track the same fields. Vertical tables have different structures as they track different fields.
Data Sharding and Data Partitioning
Sometimes if data is spread across different computers (or different server instances on the cloud), the process is called “data sharding” instead of data partitioning. Data sharding only works well for horizontal partitioning. Other then the stipulation that data be stored on different computers, sharding and horizontal partitioning are the same.
You may consider sharding if your computer has run out of memory, and you are unable to add more memory. In this case it may make sense to add another database node on a different computer, so you can keep up with your data processing needs. Sharding can also assist with load balancing, where you want to intentionally split database load among different computers or servers.
When to use Database Partitioning?
Partitioning can add complexity to your system and it an also involve significant changes to your database architecture, so it only should be used if you really need it. Before deciding to partition you should try understand exactly where you are experiencing performance bottlenecks.
Having said that, if you are experiencing slow database queries, slow data loads, slow backups, deadlocks and longer running maintenance jobs, partitioning may offer solutions. Also, if you know that your application will require future scaling as your company grows, partitioning will be much easier to setup from the beginning.
What are the Advantages of Partitioning?
Partitioning can provide operational flexibility by allowing you to schedule maintenance functions (like backup and restore) and long run reporting jobs by partition. This distribution can distribute operational load across your system, leading to smoother performance and less bottlenecks.
Adding Data to Partitioned Tables
There are several techniques to adding information in a partitioned database:
- Round Robin – new data is added to partitioned table sequentially. This approach will only work if all tables have available space, but since no data parsing required for table assignment, it can be very quick. Distribution of data will also be even, resulting in “balanced partitioning”.
- List – new data is added based on a top-level identifier and data is placed into a table depending on the value of the identifier.
- Range Insertion - new data is added to a table depending on the value of a field. For example, you could setup a table for each letter of the alphabet. When you need to add new data, data that starts with a particular letter is inserted into a partitioned table. Other examples of range insertion could be a geographic insert, where data is added to a table depending on an address, zip code or location it contains. These approaches only works well if data is evenly distributed in ranges. Otherwise, some tables could end up with a lot more data than others, resulting in “unbalanced partitioning”. And again, this approach will only work if all tables have available space.
- Hash / Expression Based – new data is added based on the value of a mathematical function on data. For example, if the data represents a sale price, the function could be dividing by 10 and the remainder value (i.e. modulus) could be used to assign a table. The advantage of this approach is the expression used to assign the table can be tweaked to ensure equal data distribution. Ideally this tweaking will prevent one table from filling up before others.
- Hybrid Partitioning – this can be a combination of any of the above techniques (round robin, range or expression).
How to Search Database Partitions
Database systems that support partitioning will automatically figure out how to search different partitions based on the query. For best performance you should ensure that the partition key (or an index key) is used by your common queries. Intelligent databases will be able to use this key to determine which partition to search, speeding up performance.
For example, if you always search for names, and your partitions are alphabetic range based, your system will know which partition to search based on the name you are looking for.
Tracker Ten Software
While our Tracker Ten windows desktop database software does not support automatic partitioning, it is a file-based system. This means that you can manually put your data into separate files. And if needed we can custom create utilities that support querying from multiple files at once. Please contact us for details.