How to Choose the Optimal MySQL Data Type

The MySQL relational database management system (RDBMS) supports a wide range of data types. The data type that you choose for your table columns is important because it determines how MySQL will store your data. Often more than one data type could be used to store a piece of data, but which one is the optimal data type to use?

What’s Optimal?

I would define optimal as taking the least amount of storage space and/or being the fastest in terms of search or data retrieval. The MySQL manual says:

“For optimum storage, you should try to use the most precise type in all cases. For example, if an integer column is used for values in the range from 1 to 99999, MEDIUMINT UNSIGNED is the best type. Of the types that represent all the required values, this type uses the least amount of storage.”

MySQL Data Types

MySQL supports a number of data types in several categories: numeric types, date and time types, and string (character) types. Each data type has been created to serve a specific purpose. Some are optimal for storing small amounts of text, others for large amounts of text, others for dates and others for numbers. When creating your tables it is best practice to understand what type of data each column will be storing and choose the appropriate data type. This is a large and complex area, so I aim to give a brief overview with some useful examples and tips.

Example of Incorrect Data Type Usage

The TEXT data type is designed to store strings and has a maximum length of 65,535 bytes (roughly the same number of characters in a single-byte character set). It is therefore intended for storing relatively large strings. The VARCHAR data type stores variable-length strings with a maximum length of 255 before MySQL 5.0.3 and 65,535 from MySQL 5.0.3 onwards. Declared with a small length, it is intended for storing relatively small strings. The storage required for a TEXT column is the length of the data plus 2 bytes, whereas a VARCHAR column requires the length of the data plus 1 byte (2 bytes if the column can hold more than 255 bytes). It’s not much of a saving, but when you have a table with millions of records every little helps!

When storing a number you could theoretically use TEXT or VARCHAR data types and you’d still be able to add, edit, delete and search your data as you would expect. However, it’s much better to store the number in one of MySQL’s numeric data types. Storing a number using the optimal data type means that it uses less storage space and your table can be searched faster.
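As a rough illustration of why the column type matters, the sketch below stores the same numbers twice, once as VARCHAR and once as INT. The table and column names (type_demo, qty_text, qty_int) are made up for this example. Sorting the string column compares values character by character, so '10' sorts before '9', whereas the integer column sorts numerically.

```sql
-- Hypothetical demo table: the same values stored as text and as an integer.
CREATE TABLE type_demo (
    qty_text VARCHAR(10) NOT NULL,
    qty_int  INT NOT NULL
);

INSERT INTO type_demo (qty_text, qty_int) VALUES
    ('9', 9),
    ('10', 10),
    ('100', 100);

-- String comparison: returns '10', '100', '9'
SELECT qty_text FROM type_demo ORDER BY qty_text;

-- Numeric comparison: returns 9, 10, 100
SELECT qty_int FROM type_demo ORDER BY qty_int;
```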

Examples and Best Practices

Numbers

Numbers should usually be stored using one of MySQL’s numeric data types. The numeric data types that you’ll probably use most often are TINYINT, SMALLINT, MEDIUMINT, INT and DECIMAL.

The first four numeric data types are used to store integers (whole numbers including negative numbers) with varying minimum and maximum values. For example, TINYINT columns can store integers ranging from -128 to 127 and INT can store integers ranging from -2147483648 to 2147483647. The storage requirement for TINYINT columns is 1 byte and for INT columns it is 4 bytes. (SMALLINT and MEDIUMINT cover the numbers in between these values.) Therefore if you know that a column will hold integers ranging from 1 to 99 you should use the TINYINT data type, as it requires less storage.

Unsign Your Integers!

You can extend the positive range of these numeric data types by giving them the UNSIGNED attribute. This means that negative values are not permitted, which allows TINYINT columns to store integers ranging from 0 to 255 and INT columns to store integers ranging from 0 to 4294967295. As a result, you can use a data type with a smaller storage requirement for larger numbers when negative values are not needed.
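A minimal sketch of how these choices might look in a table definition follows. The table and column names are invented for illustration. The sizes and ranges in the comments are MySQL’s documented limits for each integer type.

```sql
-- Hypothetical table; each column uses the smallest integer type that fits.
CREATE TABLE product_stats (
    rating        TINYINT UNSIGNED   NOT NULL,  -- 1 byte,  0 to 255
    stock_level   SMALLINT UNSIGNED  NOT NULL,  -- 2 bytes, 0 to 65535
    page_views    MEDIUMINT UNSIGNED NOT NULL,  -- 3 bytes, 0 to 16777215
    total_clicks  INT UNSIGNED       NOT NULL,  -- 4 bytes, 0 to 4294967295
    temperature_c TINYINT            NOT NULL   -- 1 byte,  -128 to 127 (signed)
);
```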

Storing Money in MySQL

Or more accurately, storing monetary values! Whilst you could, of course, store monetary values as strings (e.g. VARCHAR), the optimal data type is DECIMAL. This numeric data type is used to store exact numeric data values (i.e. numbers with a decimal point). The MySQL manual explains that DECIMAL is “used to store values for which it is important to preserve exact precision, for example with monetary data”.

When declaring a DECIMAL column the precision and scale are specified, for example DECIMAL(5,2). The MySQL manual explains that:

“In this example, 5 is the precision and 2 is the scale. The precision represents the number of significant digits that are stored for values, and the scale represents the number of digits that can be stored following the decimal point. If the scale is 0, DECIMAL and NUMERIC values contain no decimal point or fractional part.”
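To make precision and scale concrete, here is a small sketch using DECIMAL(5,2), which can hold values from -999.99 to 999.99. The order_items table and its contents are hypothetical.

```sql
-- Hypothetical table: unit_price keeps exactly two digits after the decimal point.
CREATE TABLE order_items (
    item_name  VARCHAR(50)  NOT NULL,
    unit_price DECIMAL(5,2) NOT NULL
);

INSERT INTO order_items (item_name, unit_price) VALUES
    ('Notebook', 4.99),
    ('Pen',      0.10),
    ('Backpack', 120.00);

-- DECIMAL arithmetic is exact, so this total is 125.09
SELECT SUM(unit_price) AS order_total FROM order_items;
```

For larger amounts you would simply raise the precision, for example DECIMAL(10,2).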

Strings

Strings (characters, text etc…) can also be stored using a number of different data types. The main types that you’ll come across are CHAR, VARCHAR and TEXT. As we saw previously, the TEXT data type can store large strings (up to 65,535 bytes), so you’d usually use it for larger amounts of text.

So what about storing short strings? Should you use CHAR or VARCHAR? Both data types can hold between 0 and 255 characters (VARCHAR can hold up to 65,535 bytes as of MySQL 5.0.3), so how do you choose? Each data type has its advantages and disadvantages.

CHAR

The length of a CHAR column is fixed to the length that you declare when you create the table, and stored values are right-padded with spaces to the specified length. So if you define a column as CHAR(255), the storage requirement will be 255 bytes (in a single-byte character set) even if you only insert a few characters. However, CHAR columns also have a speed advantage: because they are fixed-length, MySQL can search them faster than variable-length columns. The caveat is that the whole table must be fixed-length in order to benefit from the increased performance. If the table has any VARCHAR or TEXT columns then the rows become variable-length and there is no speed increase.

VARCHAR

Values in VARCHAR columns are variable-length strings. In contrast to CHAR, VARCHAR values are stored as a one-byte length prefix plus data. So as with the example above, if you define a column as VARCHAR(255) and only insert 5 characters the storage requirement will only be 6 bytes. Therefore if your strings will be of variable length, the VARCHAR data type will require less storage.

CHAR vs. VARCHAR

If your table will only contain fixed-length columns then use CHAR to take advantage of MySQL’s faster searching. However, if any of your columns will be variable length then use VARCHAR for its smaller storage requirement.
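As an illustrative sketch, the table below uses CHAR for a value that is always the same length and VARCHAR for one that varies. The countries table and its column names are assumptions made for this example, not part of any real schema.

```sql
-- Hypothetical table mixing a fixed-length code with variable-length text.
CREATE TABLE countries (
    iso_code CHAR(2)      NOT NULL,  -- always exactly two characters, e.g. 'GB'
    name     VARCHAR(100) NOT NULL   -- length varies from row to row
);

INSERT INTO countries (iso_code, name) VALUES
    ('GB', 'United Kingdom'),
    ('FR', 'France');

-- CHAR_LENGTH() reports the number of characters in each stored name.
SELECT iso_code, name, CHAR_LENGTH(name) AS name_length FROM countries;
```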

Dates

Whilst you could store a date as a text string, MySQL has more optimal ways of storing dates!

Unix Timestamp

One perfectly valid way to store dates would be as a Unix timestamp. A Unix timestamp is the number of seconds since the Unix epoch of 1st January 1970, which is currently a 10-digit number. This could be stored as a CHAR column, requiring 10 bytes of storage, but the optimal data type is the numeric INT UNSIGNED type, requiring just 4 bytes of storage.

Unix timestamps are particularly useful if you’re using PHP, as the date() function will display them in your chosen format and you can calculate dates by adding or subtracting a number of seconds (86400 is equivalent to 24 hours etc…).

One drawback of using Unix timestamps is that they’re not human-readable straight from the database. Would you know what date 1262986313 refers to? Another is the year 2038 problem: a timestamp held in a signed 32-bit integer overflows at 3:14:07AM UTC on 19th January 2038, when the value 2147483647 is reached!
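The following sketch shows one way a Unix timestamp column might be written and read back using MySQL’s UNIX_TIMESTAMP() and FROM_UNIXTIME() functions. The page_hits table is hypothetical, and the session time zone is set to UTC so that the final conversion is predictable.

```sql
-- Hypothetical table storing creation time as seconds since the Unix epoch.
CREATE TABLE page_hits (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    created_at INT UNSIGNED NOT NULL
);

-- UNIX_TIMESTAMP() with no argument returns the current time as an integer.
INSERT INTO page_hits (created_at) VALUES (UNIX_TIMESTAMP());

-- Use UTC for this session so conversions do not depend on the server's time zone.
SET time_zone = '+00:00';

-- Convert stored integers back into readable date and time values.
SELECT id, FROM_UNIXTIME(created_at) AS created_readable FROM page_hits;

-- The largest signed 32-bit value: returns 2038-01-19 03:14:07
SELECT FROM_UNIXTIME(2147483647);
```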

DATETIME

Until recently I used Unix timestamps to store dates in MySQL, but I’ve now started using the DATETIME data type. This is designed specifically to store both the date and time as 2010-01-08 15:00:00 (for example).

One disadvantage is that this requires 8 bytes of storage, double that of a Unix timestamp (MySQL 5.6.4 and later reduce this to 5 bytes plus any fractional-seconds storage). But I believe that the advantages of using DATETIME rather than Unix timestamps outweigh this disadvantage.

Firstly, dates stored using DATETIME are human-readable straight out of the database. So if you’re browsing your data with phpMyAdmin you can see what your dates actually mean!

Secondly, MySQL has a whole host of built-in functions to work with native DATETIME formats. You can add and subtract dates, calculate the difference between dates, output the date in your preferred format and much more using MySQL alone, without involving additional scripting in PHP (for example).

And finally, if you really want to output your DATETIME column as a Unix timestamp, MySQL has a built-in function to do that, too: UNIX_TIMESTAMP().

Note that MySQL also has related types for storing the date alone (DATE), the time alone (TIME) and the year alone (YEAR).
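Here is a short sketch of a few of those built-in functions at work. The bookings table is made up for the example. DATE_ADD(), DATEDIFF(), DATE_FORMAT() and UNIX_TIMESTAMP() are standard MySQL functions.

```sql
-- Hypothetical table using a native DATETIME column.
CREATE TABLE bookings (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    starts_at DATETIME NOT NULL
);

INSERT INTO bookings (starts_at) VALUES ('2010-01-08 15:00:00');

SELECT
    starts_at,
    DATE_ADD(starts_at, INTERVAL 7 DAY) AS one_week_later,  -- 2010-01-15 15:00:00
    DATEDIFF('2010-02-01', starts_at)   AS days_until_feb,  -- 24
    DATE_FORMAT(starts_at, '%D %M %Y')  AS formatted,       -- 8th January 2010
    UNIX_TIMESTAMP(starts_at)           AS as_unix_time     -- depends on the session time zone
FROM bookings;
```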

Miscellaneous

ENUM

ENUM columns provide an efficient way to define columns that can contain only a given set of values. So if a column will only ever contain ‘var 1’, ‘var 2’ or ‘var 3’ (for example), you should use the ENUM data type. Due to the way that MySQL stores the data it is much more efficient than storing the data as a string.

An example of when this is useful is setting a switch to specify whether a user account is active or disabled. Instead of storing ‘active’ or ‘disabled’ as VARCHAR values, use ENUM instead.
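A minimal sketch of the account status example might look like this. The user_accounts table and the usernames are hypothetical.

```sql
-- Hypothetical table: status may only be one of the listed values.
CREATE TABLE user_accounts (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    status   ENUM('active', 'disabled') NOT NULL DEFAULT 'active'
);

-- status is omitted here, so it takes the default value 'active'.
INSERT INTO user_accounts (username) VALUES ('alice');
INSERT INTO user_accounts (username, status) VALUES ('bob', 'disabled');

-- ENUM values can be compared like strings in queries.
SELECT username FROM user_accounts WHERE status = 'active';
```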

Conclusion

Although data can often be stored in various data type columns, there’s usually one data type that is optimal for your data. Using that optimal data type will bring benefits in terms of storage and possibly speed, and is best practice for any website developer.

9 Comments

  1. sathish

    how to store multi line text in mysql?
    I’m not getting any idea yet… please help!!!

  2. Barnaby Knowles

    Hi Sathish. You can store multi-line text in a column that uses the text data type.

    It may also be possible to store multi-line text in a varchar column if you inserted the appropriate line break characters, but I’ve never tried that.

  3. Deadpan110

    Thanks for providing a nice and simple reference.

    I like to think I am an adept coder, but I always turn to Google for reference – I was looking for the optimal method to store a Unix timestamp and found the answer here.

    I am now wondering if my code will still be in use after the 19th January 2038 and your explanation of MySQL Dates has given me food for thought.

  4. tejashree

    you can use longtext or mediumtext datatype to store large data in mysql

  5. mauricio

    TINYINT can be between -128 and 127, not 1 to 99. “if you know that a column will hold integers ranging from 1 to 99 “.

    I have a question:
    If I have a varchar(255) column in a table but it only ever holds between 5 and 10 characters, is performance the same whether I define the column as varchar(255) or varchar(10)?

    Thanks

  6. Barnaby Knowles

    TINYINT can be between -128 and 127, not 1 to 99. “if you know that a column will hold integers ranging from 1 to 99 “.

    That’s correct, 1-99 was just an example of a range of numbers that you might be wanting to store.

    As for VARCHAR(10) vs. VARCHAR(255) I’m not sure. I would expect that a smaller definition would offer better performance, but it’s likely to be negligible unless your table contains a LOT of rows.

    If your table contains only fixed-length fields you might want to consider using CHAR(10) instead, because that will perform better due to it having a fixed length. MySQL will pad out strings of fewer than 10 characters in the background.

  7. Satheeh Kumar

    name     | Friend's name
    ---------|--------------
    John     | George
    Jenny    | John
    Mat      | George
    George   | John

    Id | name   | Friend's name
    ---|--------|--------------
    1  | John   | 4
    2  | Jenny  | 1
    3  | Mat    | 4
    4  | George | 1

    Which is the better and faster structure to query?

  8. Barnaby Knowles

    Hi Satheeh,

    It really depends on what you are trying to achieve. This looks more like a database normalisation consideration rather than data type.

    If speed is your most important consideration, then the first example would be most suitable because all of the information is in one row. Therefore you only need to perform one SELECT with no JOINs to get the desired data. However, your table will use more space because the “Friend’s Name” column will be a VARCHAR data type rather than some form of numeric type. Also bear in mind that errors can occur when storing the name in two different columns (e.g. user types a name incorrectly, or a name needs changing).

    If speed is not your number one consideration then the second example would be more optimal. The “Friend’s Name” column could be a numeric type, thus using less space. And because that column is a foreign key, errors in the data are reduced. However, your query would be slightly slower because you would have to add a JOIN into your SELECT query.

  9. I use BIGINT UNSIGNED for the Unix timestamp to make it safe for 19th January 2038 at 3:14:07AM.

    A few years ago I searched for a better solution, but storing the values in another base to make them smaller isn’t a good idea: it uses less storage but costs too much time. A bit of a paradox, heh?
