The MySQL relational database management system (RDBMS) supports a wide range of data types. The data type that you choose for your table columns is important because it determines how MySQL will store your data. Often any one of several data types could be used to store a piece of data, but which one is the best or optimal data type to use?
I would define optimal as taking the least amount of storage space and/or being the fastest in terms of search or data retrieval. The MySQL manual says:
“For optimum storage, you should try to use the most precise type in all cases. For example, if an integer column is used for values in the range from 1 to 99999, MEDIUMINT UNSIGNED is the best type. Of the types that represent all the required values, this type uses the least amount of storage.”
MySQL Data Types
MySQL supports a number of data types in several categories: numeric types, date and time types, and string (character) types. Each data type has been created to serve a specific purpose: some are optimal for storing small amounts of text, others large amounts of text, others dates, others numbers, etc. When creating your tables, it is best practice to understand what type of data each column will store and choose the appropriate data type. This is a large and complex area, so I aim to give a brief overview with some useful examples and tips.
Example of Incorrect Data Type Usage
The TEXT data type is designed to store strings and has a maximum length of 65,535 bytes (65,535 characters with a single-byte character set), so it’s intended for storing relatively large strings. The VARCHAR data type stores variable-length strings; before MySQL 5.0.3 its maximum length was 255 characters, so it’s intended for storing relatively small strings. A TEXT value requires the length of the data plus 2 bytes of storage, whereas a VARCHAR value of up to 255 bytes requires the length of the data plus 1 byte. It’s not much of a saving, but when you have a table with millions of records every little helps!
When storing a number you could theoretically use TEXT or VARCHAR data types and you’d still be able to add, edit, delete and search your data, although ORDER BY would sort the values as strings (putting ‘10’ before ‘9’). However, it’s much better to store the number in one of MySQL’s numeric data types. Storing a number using the optimal data type means that it uses less storage space and your table can be searched faster.
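To see why storing numbers as strings is a false economy, here’s a small Python sketch, with Python standing in for the database. It assumes a single-byte character set and a short VARCHAR column, and the varchar_bytes() helper is my own illustrative name. It shows that string ordering puts ‘10’ before ‘9’, and that a 10-digit number costs 11 bytes as a VARCHAR but a fixed 4 bytes as an INT.

```python
# Python strings compare character by character, much as a VARCHAR
# column of digits would be sorted, so the order is not numeric.
values = [9, 10, 100, 25]

as_strings = sorted(str(v) for v in values)  # string comparison
as_numbers = sorted(values)                  # numeric comparison

print(as_strings)  # ['10', '100', '25', '9']
print(as_numbers)  # [9, 10, 25, 100]


def varchar_bytes(text: str) -> int:
    """Storage for a short VARCHAR value in a single-byte character set:
    the data itself plus a 1-byte length prefix."""
    return len(text.encode("latin-1")) + 1


INT_BYTES = 4  # an INT column always takes 4 bytes, whatever the value

print(varchar_bytes("1234567890"), INT_BYTES)  # 11 4
```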
Examples and Best Practices
Numbers should usually be stored using one of MySQL’s numeric data types. The numeric data types that you’ll probably use most often are TINYINT, SMALLINT, MEDIUMINT, INT and DECIMAL.
The first four numeric data types are used to store integers (whole numbers including negative numbers) with varying minimum and maximum values. For example, TINYINT columns can store integers ranging from -128 to 127 and INT can store integers ranging from -2147483648 to 2147483647. The storage requirement for TINYINT columns is 1 byte and for INT columns it is 4 bytes. (SMALLINT and MEDIUMINT, which require 2 and 3 bytes respectively, cover the ranges in between.) Therefore, if you know that a column will hold integers ranging from 1 to 99, you should use the TINYINT data type, as it requires less storage.
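These ranges follow directly from the storage size: an n-byte signed integer holds -2^(8n-1) to 2^(8n-1)-1. The short Python sketch below derives them. The INT_TYPES table and the signed_range() helper are illustrative names of my own, and BIGINT (8 bytes) is included only for completeness.

```python
# Storage size in bytes for each of MySQL's integer types.
INT_TYPES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def signed_range(type_name: str) -> tuple[int, int]:
    """Range of a signed integer column: -2**(bits-1) .. 2**(bits-1) - 1."""
    bits = 8 * INT_TYPES[type_name]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, size in INT_TYPES.items():
    low, high = signed_range(name)
    print(f"{name:<9} {size} byte(s): {low} to {high}")
```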
Unsign Your Integers!
You can extend the positive range of these integer types by giving them the UNSIGNED attribute. Negative values are then not permitted, which allows TINYINT columns to store integers ranging from 0 to 255 and INT columns to store integers ranging from 0 to 4294967295. So if negative values are not needed, you can often store larger numbers in a data type with a smaller storage requirement.
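Putting the manual’s advice and the UNSIGNED attribute together, a hypothetical helper could pick the smallest integer type for a known range of values. This is a sketch that assumes you know the column’s minimum and maximum in advance. smallest_int_type() is my own name, not a MySQL feature, and BIGINT is included only as a final fallback.

```python
# Integer types in order of storage size (bytes).
INT_TYPES = [("TINYINT", 1), ("SMALLINT", 2), ("MEDIUMINT", 3), ("INT", 4), ("BIGINT", 8)]


def smallest_int_type(min_value: int, max_value: int) -> str:
    """Return the smallest integer type that can hold min_value..max_value,
    using UNSIGNED whenever no negative values are needed."""
    unsigned = min_value >= 0
    for name, size in INT_TYPES:
        bits = 8 * size
        if unsigned:
            low, high = 0, 2 ** bits - 1
        else:
            low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if low <= min_value and max_value <= high:
            return f"{name} UNSIGNED" if unsigned else name
    raise ValueError("range does not fit in a 64-bit integer column")


print(smallest_int_type(1, 99999))  # MEDIUMINT UNSIGNED (the manual's example)
print(smallest_int_type(1, 200))    # TINYINT UNSIGNED
print(smallest_int_type(-1, 200))   # SMALLINT
```

Choosing UNSIGNED whenever the minimum isn’t negative never costs extra storage, because each signed/unsigned pair uses the same number of bytes.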
Storing Money in MySQL
Or more accurately, storing monetary values! Whilst you could, of course, store monetary values as strings (e.g. VARCHAR), the optimal data type is DECIMAL. This numeric data type stores exact fixed-point values (numbers with a set number of digits after the decimal point), unlike the approximate floating-point types FLOAT and DOUBLE. The MySQL manual explains that DECIMAL is “used to store values for which it is important to preserve exact precision, for example with monetary data”.
When declaring a DECIMAL column the precision and scale are specified, for example DECIMAL(5,2). The MySQL manual explains:
“In this example, 5 is the precision and 2 is the scale. The precision represents the number of significant digits that are stored for values, and the scale represents the number of digits that can be stored following the decimal point. If the scale is 0, DECIMAL and NUMERIC values contain no decimal point or fractional part.”
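As a rough analogy, Python’s decimal module behaves much like DECIMAL, and Python’s float is a binary floating-point type like MySQL’s FLOAT and DOUBLE. In the sketch below, the hypothetical decimal_max() helper shows that DECIMAL(5,2) tops out at 999.99 (its range is -999.99 to 999.99), and the last two lines show why binary floating point is a poor fit for money.

```python
from decimal import Decimal


def decimal_max(precision: int, scale: int) -> Decimal:
    """Largest value a DECIMAL(precision, scale) column can hold:
    `precision` nines, with `scale` of them after the decimal point."""
    return Decimal(10 ** precision - 1).scaleb(-scale)


print(decimal_max(5, 2))  # 999.99

# Binary floating point cannot represent 0.10 or 0.20 exactly...
print(0.10 + 0.20 == 0.30)  # False
# ...whereas exact decimal arithmetic, the idea behind DECIMAL, can.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
```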
Strings (characters, text etc…) can also be stored using a number of different data types. The main types that you’ll come across are CHAR, VARCHAR and TEXT. As we saw previously, the TEXT data type is used to store large strings (up to 65,535 bytes), so you’d usually use it for larger amounts of text.
So what about storing short strings? Should you use CHAR or VARCHAR? Both types can be declared with a length of 0 to 255 characters (as of MySQL 5.0.3, VARCHAR can be declared up to 65,535, subject to the maximum row size), so how do you choose? Each data type has its advantages and disadvantages.
The length of a CHAR column is fixed to the length that you declare when you create the table, and stored values are right-padded with spaces to that length. So if you define a column as CHAR(255), the storage requirement will be 255 bytes (with a single-byte character set) even if you only insert a few characters. However, CHAR columns can also have a speed advantage. For MyISAM tables, a table containing only fixed-length columns uses the static (fixed-length) row format, which MySQL can search faster than variable-length rows. The caveat is that the whole table must be fixed-length to benefit: if the table has any VARCHAR or TEXT columns, the rows become variable-length and there is no speed increase.
Values in VARCHAR columns are variable-length strings. In contrast to CHAR, VARCHAR values are stored as a 1-byte length prefix plus data (the prefix is 2 bytes if the column can hold more than 255 bytes). So, with a single-byte character set, if you define a column as VARCHAR(255) and only insert 5 characters, the storage requirement will be only 6 bytes. Therefore, if your strings will be of variable length, the VARCHAR data type will require less storage.
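One way to compare the two is to work out the bytes each would need for a given value. This sketch assumes a single-byte character set such as latin1 (multi-byte character sets such as utf8 change the arithmetic), and char_bytes() and varchar_bytes() are illustrative helpers of my own rather than anything MySQL provides.

```python
def char_bytes(declared_length: int, value: str) -> int:
    """CHAR(M), single-byte character set: always M bytes,
    because shorter values are right-padded with spaces."""
    if len(value) > declared_length:
        raise ValueError("value too long for column")
    return declared_length


def varchar_bytes(declared_length: int, value: str) -> int:
    """VARCHAR(M), single-byte character set: the data plus a length
    prefix of 1 byte (2 bytes when M is greater than 255)."""
    if len(value) > declared_length:
        raise ValueError("value too long for column")
    prefix = 1 if declared_length <= 255 else 2
    return len(value.encode("latin-1")) + prefix


print(char_bytes(255, "hello"), varchar_bytes(255, "hello"))  # 255 6
print(char_bytes(2, "GB"), varchar_bytes(2, "GB"))            # 2 3
```

The second line shows the flip side: for a genuinely fixed-length value such as a two-letter country code, CHAR is the smaller option because it needs no length prefix.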
CHAR vs. VARCHAR
If all the columns in your table will be fixed-length, use CHAR to take advantage of faster searching (on MyISAM tables). However, if any of your columns will be variable-length, use VARCHAR for its smaller storage requirement.
Whilst you could store a date as a text string, MySQL has more optimal ways of storing dates!
One perfectly valid way to store dates is as a Unix timestamp: the number of seconds since the Unix epoch, 00:00:00 UTC on 1st January 1970. Present-day timestamps are 10 digits long, so one could be stored in a CHAR(10) column requiring 10 bytes of storage, but the optimal data type is the numeric INT UNSIGNED type, requiring just 4 bytes.
Unix timestamps are particularly useful if you’re using PHP, as the date() function will display them in your chosen format and you can calculate dates by adding or subtracting a number of seconds (86400 is equivalent to 24 hours etc…).
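PHP’s date() isn’t reproduced here, but as a rough Python equivalent, this sketch converts a timestamp to a readable UTC date and moves it forward one day by adding 86400 seconds. The to_utc() helper is my own name. It works in UTC; in a local time zone with daylight saving, adding 86400 seconds gives “24 hours later”, which isn’t always “the same time tomorrow”.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86400  # 24 * 60 * 60


def to_utc(ts: int) -> datetime:
    """Convert a Unix timestamp (seconds since the epoch) to a UTC datetime."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=ts)


ts = 1262986313
next_day = ts + SECONDS_PER_DAY  # date arithmetic is plain integer addition

print(to_utc(ts).strftime("%Y-%m-%d %H:%M:%S"))        # 2010-01-08 21:31:53
print(to_utc(next_day).strftime("%Y-%m-%d %H:%M:%S"))  # 2010-01-09 21:31:53
```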
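One drawback of using Unix timestamps is that they’re not human-readable straight from the database. Would you know what date 1262986313 refers to? Another is the so-called Year 2038 problem: a timestamp held in a signed 32-bit integer overflows at 03:14:07 UTC on 19th January 2038, when the value 2147483647 is reached. An INT UNSIGNED column can hold values up to 4294967295 (early 2106), but any code that treats timestamps as signed 32-bit values will still break in 2038.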
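You can check where the 2038 date comes from by adding the largest 32-bit integer values to the epoch. This Python sketch adds a timedelta to the epoch rather than calling a platform time function, so the result doesn’t depend on the size of the system’s time_t.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

SIGNED_INT_MAX = 2**31 - 1    # 2147483647, largest value of a signed INT
UNSIGNED_INT_MAX = 2**32 - 1  # 4294967295, largest value of INT UNSIGNED

for label, limit in [("INT", SIGNED_INT_MAX), ("INT UNSIGNED", UNSIGNED_INT_MAX)]:
    last_moment = EPOCH + timedelta(seconds=limit)
    print(label, limit, last_moment.strftime("%Y-%m-%d %H:%M:%S"))

# INT 2147483647 2038-01-19 03:14:07
# INT UNSIGNED 4294967295 2106-02-07 06:28:15
```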
Until recently I used Unix timestamps to store dates in MySQL, but I’ve now started using the DATETIME data type. This is designed specifically to store both the date and time as 2010-01-08 15:00:00 (for example).
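One disadvantage is that DATETIME requires 8 bytes of storage, double that of a Unix timestamp (this applies to MySQL versions before 5.6.4; from 5.6.4, DATETIME takes 5 bytes plus any fractional-seconds storage). But I believe that the advantages of using DATETIME rather than Unix timestamps outweigh this disadvantage.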
Firstly, dates stored using DATETIME are human-readable straight out of the database. So if you’re browsing your data with phpMyAdmin, you can see what your dates actually mean!
Secondly, MySQL has a whole host of built-in functions for working with DATETIME values. You can add and subtract intervals (DATE_ADD(), DATE_SUB()), calculate the difference between dates (DATEDIFF()), output the date in your preferred format (DATE_FORMAT()) and much more using MySQL alone, without involving additional PHP (for example) scripting.
And finally, if you really want to output your DATETIME column as a Unix timestamp, MySQL’s UNIX_TIMESTAMP() function will do that, too (and FROM_UNIXTIME() converts the other way)!
Note that MySQL can also store the year alone (YEAR), the date alone (DATE), the time alone (TIME), and timestamps (TIMESTAMP).
ENUM columns provide an efficient way to define columns that can contain only a given set of values. So if a column will only ever contain ‘var 1’, ‘var 2’ or ‘var 3’ (for example), you should use the ENUM data type. MySQL stores each ENUM value as an index number into the list of permitted values (1 byte for up to 255 values, 2 bytes for up to 65,535), which is much more compact than storing the string itself.
An example of when this is useful is setting a switch to specify whether a user account is active or disabled. Instead of storing ‘active’ or ‘disabled’ as VARCHAR values, use ENUM instead.
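Roughly speaking, MySQL stores an ENUM value as its 1-based position in the list of members, with index 0 reserved for the empty-string error value. The Python sketch below models that for a hypothetical ENUM(‘active’, ‘disabled’) status column. The helper names are my own, and the VARCHAR figure assumes a single-byte character set.

```python
# Hypothetical members of an ENUM('active', 'disabled') column.
STATUS_MEMBERS = ("active", "disabled")


def enum_index(members: tuple[str, ...], value: str) -> int:
    """Members are numbered from 1 in declaration order (0 is the error value '')."""
    return members.index(value) + 1  # raises ValueError for unknown values


def enum_bytes(members: tuple[str, ...]) -> int:
    """An ENUM needs 1 byte for up to 255 members and 2 bytes for up to 65,535."""
    if len(members) > 65535:
        raise ValueError("an ENUM can have at most 65,535 members")
    return 1 if len(members) <= 255 else 2


print(enum_index(STATUS_MEMBERS, "disabled"))  # 2
print(enum_bytes(STATUS_MEMBERS))              # 1
print(len("disabled".encode("latin-1")) + 1)   # 9 (the same value as a short VARCHAR)
```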
Although data can often be stored in various data type columns, there’s usually one data type that is optimal for your data. Using that optimal data type will bring benefits in terms of storage and possibly speed, and is best practice for any website developer.