The storage requirements for each of the data types supported by MySQL are listed here by category.
The maximum size of a row in a MyISAM table is 65,535 bytes. (However, each BLOB or TEXT column contributes only 9–12 bytes toward this size.) This limitation may be shared by other storage engines as well. See Chapter 13, Storage Engines, for more information.
For tables using the NDBCLUSTER storage engine, there is the factor of 4-byte alignment to be taken into account when calculating storage requirements. This means that all NDB data storage is done in multiples of 4 bytes. Thus, a column value that would take 15 bytes in a table using a storage engine other than NDB requires 16 bytes in an NDB table. This requirement applies in addition to any other considerations that are discussed in this section. For example, in NDBCLUSTER tables, the TINYINT, SMALLINT, MEDIUMINT, and INTEGER (INT) column types each require 4 bytes of storage per record due to the alignment factor.
An exception to this rule is the BIT type, which is not 4-byte aligned. In MySQL Cluster tables, a BIT(M) column takes M bits of storage space. However, if a table definition contains 1 or more BIT columns (up to 32 BIT columns), then NDBCLUSTER reserves 4 bytes (32 bits) per row for these. If a table definition contains more than 32 BIT columns (up to 64 such columns), then NDBCLUSTER reserves 8 bytes (that is, 64 bits) per row.
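The alignment and BIT rules above can be sketched in a few lines of Python. This is an illustrative reader-style sketch, not part of MySQL or NDB; the function names are ours:

```python
def ndb_aligned_size(raw_bytes):
    """Round a column value's storage up to the next multiple of 4 bytes,
    as NDBCLUSTER does for non-BIT columns (e.g. 15 bytes -> 16 bytes)."""
    return (raw_bytes + 3) & ~3

def ndb_bit_columns_overhead(num_bit_columns):
    """Per-row bytes NDBCLUSTER reserves for BIT columns:
    4 bytes for 1-32 such columns, 8 bytes for 33-64."""
    if num_bit_columns == 0:
        return 0
    if num_bit_columns <= 32:
        return 4
    if num_bit_columns <= 64:
        return 8
    raise ValueError("a table may have at most 64 BIT columns")
```

The same 4-or-8-byte reservation pattern applies to NULL columns, as described next.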
In addition, while a NULL itself does not require any storage space, NDBCLUSTER reserves 4 bytes per row if the table definition contains any columns defined as NULL, up to 32 NULL columns. (If a MySQL Cluster table is defined with more than 32 NULL columns, up to 64 NULL columns, then 8 bytes per row is reserved.)
When calculating storage requirements for MySQL Cluster tables, you must also remember that every table using the NDBCLUSTER storage engine requires a primary key; if no primary key is defined by the user, then a “hidden” primary key is created by NDB. This hidden primary key consumes 31–35 bytes per table record.
You may find the ndb_size.pl utility useful for estimating NDB storage requirements. This Perl script connects to a current MySQL (non-Cluster) database and creates a report on how much space that database would require if it used the NDBCLUSTER storage engine. See ndb_size.pl, for more information.
Storage Requirements for Numeric Types
Data Type | Storage Required |
TINYINT | 1 byte |
SMALLINT | 2 bytes |
MEDIUMINT | 3 bytes |
INT, INTEGER | 4 bytes |
BIGINT | 8 bytes |
FLOAT(p) | 4 bytes if 0 <= p <= 24, 8 bytes if 25 <= p <= 53 |
FLOAT | 4 bytes |
DOUBLE [PRECISION], REAL | 8 bytes |
DECIMAL(M,D), NUMERIC(M,D) | Varies; see following discussion |
BIT(M) | approximately (M+7)/8 bytes |
Values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value is determined separately. Each multiple of nine digits requires four bytes, and the “leftover” digits require some fraction of four bytes. The storage required for excess digits is given by the following table.
Leftover Digits | Number of Bytes |
0 | 0 |
1 | 1 |
2 | 1 |
3 | 2 |
4 | 2 |
5 | 3 |
6 | 3 |
7 | 4 |
8 | 4 |
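The packing rule can be turned into a small calculator. The following Python sketch (the function name is ours, not a MySQL API) combines the nine-digits-per-four-bytes rule with the leftover-digit table above:

```python
# Bytes needed for 0..8 "leftover" decimal digits, per the table above.
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]

def decimal_storage(m, d):
    """Bytes required for a DECIMAL(M, D) column value in the
    packed binary format (nine decimal digits per four bytes)."""
    def part(digits):
        # Full groups of nine digits take 4 bytes each;
        # the remaining digits take 0-4 bytes per the table.
        return (digits // 9) * 4 + LEFTOVER_BYTES[digits % 9]
    return part(m - d) + part(d)  # integer part + fractional part
```

For example, DECIMAL(18,9) has nine integer and nine fractional digits, so it takes 4 + 4 = 8 bytes.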
Storage Requirements for Date and Time Types
The storage requirements shown in the table arise from the way that MySQL represents temporal values:
DATE: A three-byte integer packed as DD + MM×32 + YYYY×16×32
TIME: A three-byte integer packed as DD×24×3600 + HH×3600 + MM×60 + SS
DATETIME: Eight bytes:
A four-byte integer packed as YYYY×10000 + MM×100 + DD
A four-byte integer packed as HH×10000 + MM×100 + SS
TIMESTAMP: A four-byte integer representing seconds in UTC since the epoch ('1970-01-01 00:00:00' UTC)
YEAR: A one-byte integer
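As an illustration of these packing formulas, here is a small Python sketch (the helper names are ours, not MySQL internals) that packs values exactly as described above:

```python
def pack_date(year, month, day):
    # DATE: a three-byte integer, DD + MM*32 + YYYY*16*32
    return day + month * 32 + year * 16 * 32

def pack_time(days, hours, minutes, seconds):
    # TIME: a three-byte integer, DD*24*3600 + HH*3600 + MM*60 + SS
    return days * 24 * 3600 + hours * 3600 + minutes * 60 + seconds

def pack_datetime(year, month, day, hour, minute, second):
    # DATETIME: two four-byte integers, one for the date part
    # (YYYY*10000 + MM*100 + DD) and one for the time part
    # (HH*10000 + MM*100 + SS)
    return (year * 10000 + month * 100 + day,
            hour * 10000 + minute * 100 + second)
```

Note that the packed DATE and TIME values fit comfortably in three bytes (2^24 possible values), which is why those types need only three bytes each.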
Storage Requirements for String Types
In the following table, M represents the declared column length in characters for nonbinary string types and bytes for binary string types. L represents the actual length in bytes of a given string value.
Data Type | Storage Required |
CHAR(M) | M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set |
BINARY(M) | M bytes, 0 <= M <= 255 |
VARCHAR(M), VARBINARY(M) | L + 1 bytes if column values require 0–255 bytes, L + 2 bytes if values may require more than 255 bytes |
TINYBLOB, TINYTEXT | L + 1 bytes, where L < 2^8 |
BLOB, TEXT | L + 2 bytes, where L < 2^16 |
MEDIUMBLOB, MEDIUMTEXT | L + 3 bytes, where L < 2^24 |
LONGBLOB, LONGTEXT | L + 4 bytes, where L < 2^32 |
ENUM('value1','value2',...) | 1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum) |
SET('value1','value2',...) | 1, 2, 3, 4, or 8 bytes, depending on the number of set members (64 members maximum) |
Variable-length string types are stored using a length prefix plus data. The length prefix requires from one to four bytes depending on the data type, and the value of the prefix is L (the byte length of the string). For example, storage for a MEDIUMTEXT value requires L bytes to store the value plus three bytes to store the length of the value.
To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 Unicode character set, you must keep in mind that not all utf8 characters use the same number of bytes; a character can require up to three bytes. For a breakdown of the storage used for different categories of utf8 characters, see Section 9.1.10, “Unicode Support”.
VARCHAR, VARBINARY, and the BLOB and TEXT types are variable-length types. For each, the storage requirements depend on these factors:
The actual length of the column value
The column's maximum possible length
The character set used for the column, because some character sets contain multi-byte characters
For example, a VARCHAR(255) column can hold a string with a maximum length of 255 characters. Assuming that the column uses the latin1 character set (one byte per character), the actual storage required is the length of the string (L), plus one byte to record the length of the string. For the string 'abcd', L is 4 and the storage requirement is five bytes. If the same column is instead declared to use the ucs2 double-byte character set, the storage requirement is 10 bytes: the length of 'abcd' is eight bytes and the column requires two bytes to store lengths because the maximum length is greater than 255 (up to 510 bytes).
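The worked example above can be expressed as a short Python helper (an illustrative sketch; the function and parameter names are ours):

```python
def varchar_storage(length_chars, max_chars, bytes_per_char=1):
    """Bytes needed to store one VARCHAR value:
    the encoded string plus a 1- or 2-byte length prefix.
    The prefix width depends on the column's maximum byte length,
    not on the particular value stored."""
    max_bytes = max_chars * bytes_per_char
    prefix = 1 if max_bytes <= 255 else 2
    return length_chars * bytes_per_char + prefix
```

With latin1 (one byte per character), 'abcd' in a VARCHAR(255) costs 4 + 1 = 5 bytes; with ucs2 (two bytes per character), the same value costs 8 + 2 = 10 bytes, because the 510-byte maximum forces a two-byte prefix.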
The effective maximum number of bytes that can be stored in a VARCHAR or VARBINARY column is subject to the maximum row size of 65,535 bytes, which is shared among all columns. For a VARCHAR column that stores multi-byte characters, the effective maximum number of characters is less. For example, utf8 characters can require up to three bytes per character, so a VARCHAR column that uses the utf8 character set can be declared to be a maximum of 21,844 characters.
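The 21,844-character figure follows directly from the row-size limit; a quick Python sketch (names are ours) reproduces it:

```python
MAX_ROW_BYTES = 65535  # maximum row size, shared among all columns

def max_varchar_chars(bytes_per_char):
    """Largest M declarable for VARCHAR(M) with a given maximum
    character width, ignoring storage taken by other columns."""
    prefix = 2  # values that may exceed 255 bytes need a 2-byte length prefix
    return (MAX_ROW_BYTES - prefix) // bytes_per_char
```

For utf8 (up to three bytes per character) this gives (65535 - 2) // 3 = 21844 characters.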
The NDBCLUSTER storage engine in MySQL 5.1 supports variable-width columns. This means that a VARCHAR column in a MySQL Cluster table requires the same amount of storage as it would using any other storage engine, with the exception that such values are 4-byte aligned. Thus, the string 'abcd' stored in a VARCHAR(50) column using the latin1 character set requires 8 bytes (rather than 6 bytes for the same column value in a MyISAM table). This represents a change in behavior from earlier versions of NDBCLUSTER, where a VARCHAR(50) column would require 52 bytes of storage per record regardless of the length of the string being stored.
TEXT and BLOB columns are implemented differently in the NDB Cluster storage engine, wherein each row in a TEXT column is made up of two separate parts. One of these is of fixed size (256 bytes), and is actually stored in the original table. The other consists of any data in excess of 256 bytes, which is stored in a hidden table. The rows in this second table are always 2,000 bytes long. This means that the size of a TEXT column is 256 if size <= 256 (where size represents the size of the row); otherwise, the size is 256 + size + (2000 − (size − 256) % 2000).
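The TEXT-size rule above can be transcribed literally into Python (an illustrative sketch; the function name is ours, and the body implements the formula exactly as stated):

```python
def ndb_text_storage(size):
    """NDB storage for one TEXT value of the given byte size:
    256 bytes inline in the original table, with any excess kept
    in 2,000-byte rows of a hidden table."""
    if size <= 256:
        return 256
    # Formula from the manual text above:
    # 256 + size + (2000 - (size - 256) % 2000)
    return 256 + size + (2000 - (size - 256) % 2000)
```

For instance, a 100-byte value still occupies the full 256-byte inline part.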
The size of an ENUM object is determined by the number of different enumeration values. One byte is used for enumerations with up to 255 possible values. Two bytes are used for enumerations having between 256 and 65,535 possible values. See Section 10.4.4, “The ENUM Type”.
The size of a SET object is determined by the number of different set members. If the set size is N, the object occupies (N+7)/8 bytes, rounded up to 1, 2, 3, 4, or 8 bytes. A SET can have a maximum of 64 members. See Section 10.4.5, “The SET Type”.
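The SET sizing rule can be sketched as follows (illustrative Python; the function name is ours):

```python
def set_storage(n_members):
    """Bytes used for a SET column: one bit per member,
    with the byte count rounded up to 1, 2, 3, 4, or 8."""
    if not 1 <= n_members <= 64:
        raise ValueError("a SET can have 1 to 64 members")
    raw = (n_members + 7) // 8  # (N+7)/8 bytes of bit flags
    for allowed in (1, 2, 3, 4, 8):
        if raw <= allowed:
            return allowed
```

So a SET with 9 members needs 2 bytes, and one with 33 to 64 members jumps straight to 8 bytes (5, 6, and 7 are not used).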
User Comments
Had a lot of trouble finding the maximum table size in bytes for capacity planning. More specifically it was InnoDB tables that I had a problem with. Average row size is good, but I wanted maximum row size.
I checked several products and could not find what I wanted. Some of the tables I deal with are 300+ fields and so manual calculation was not practical.
So I wrote a little perl script that does it. Thought it might be of some use, so I include it here...it does all field types except enum/set types. It does not calculate anything regarding index size.
Just do a mysqldump -d (just the schema) of your DB to a file, and run this perl script specifying the schema file as the only argument.
----------------------------------------------------------------
#!/usr/bin/perl
use Data::Dumper;
use strict;
$| = 1;
my %DataType =
("TINYINT"=>1,
"SMALLINT"=>2,
"MEDIUMINT"=>3,
"INT"=>4,
"BIGINT"=>8,
"FLOAT"=>'if ($M <= 24) {return 4;} else {return 8;}',
"DOUBLE"=>8,
"DECIMAL"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"NUMERIC"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"DATE"=>3,
"DATETIME"=>8,
"TIMESTAMP"=>4,
"TIME"=>3,
"YEAR"=>1,
"CHAR"=>'$M',
"VARCHAR"=>'$M+1',
"TINYBLOB"=>'$M+1',
"TINYTEXT"=>'$M+1',
"BLOB"=>'$M+2',
"TEXT"=>'$M+2',
"MEDIUMBLOB"=>'$M+3',
"MEDIUMTEXT"=>'$M+3',
"LONGBLOB"=>'$M+4',
"LONGTEXT"=>'$M+4');
my $D;
my $M;
my $dt;
my $fieldCount = 0;
my $byteCount = 0;
my $fieldName;
open (TABLEFILE,"< $ARGV[0]");
LOGPARSE:while (<TABLEFILE>)
{
chomp;
if ( $_ =~ s/create table[ ]*([a-zA-Z_]*).*/$1/i )
{
print "Fieldcount: $fieldCount Bytecount: $byteCount\n" if $fieldCount;
$fieldCount = 0;
$byteCount = 0;
print "\nTable: $_\n";
next;
}
next if $_ !~ s/(.*)[ ]+(TINYINT[ ]*\(*[0-9,]*\)*|SMALLINT[ ]*\(*[0-9,]*\)*|MEDIUMINT[ ]*\(*[0-9,]*\)*|INT[ ]*\(*[0-9,]*\)*|BIGINT[ ]*\(*[0-9,]*\)*|FLOAT[ ]*\(*[0-9,]*\)*|DOUBLE[ ]*\(*[0-9,]*\)*|DECIMAL[ ]*\(*[0-9,]*\)*|NUMERIC[ ]*\(*[0-9,]*\)*|DATE[ ]*\(*[0-9,]*\)*|DATETIME[ ]*\(*[0-9,]*\)*|TIMESTAMP[ ]*\(*[0-9,]*\)*|TIME[ ]*\(*[0-9,]*\)*|YEAR[ ]*\(*[0-9,]*\)*|CHAR[ ]*\(*[0-9,]*\)*|VARCHAR[ ]*\(*[0-9,]*\)*|TINYBLOB[ ]*\(*[0-9,]*\)*|TINYTEXT[ ]*\(*[0-9,]*\)*|BLOB[ ]*\(*[0-9,]*\)*|TEXT[ ]*\(*[0-9,]*\)*|MEDIUMBLOB[ ]*\(*[0-9,]*\)*|MEDIUMTEXT[ ]*\(*[0-9,]*\)*|LONGBLOB[ ]*\(*[0-9,]*\)*|LONGTEXT[ ]*\(*[0-9,]*\)*).*/$2/gix;
$fieldName=$1;
$_=uc;
$D=0;
($D = $_) =~ s/.*\,([0-9]+).*/$1/g if ( $_ =~ m/\,/ );
$_ =~ s/\,([0-9]*)//g if ( $_ =~ m/\,/ );
($M = $_) =~ s/[^0-9]//g;
$M=0 if ! $M;
($dt = $_) =~ s/[^A-Za-z_]*//g;
print "$fieldName $_:\t".eval($DataType{"$dt"})." bytes\n";
++$fieldCount;
$byteCount += eval($DataType{"$dt"});
}
print "Fieldcount: $fieldCount Bytecount: $byteCount\n";
Here's a modification of Marc's script above that also handles ENUM's. Enjoy.
#!/usr/bin/perl
use Data::Dumper;
use strict;
$| = 1;
my %DataType =
("TINYINT"=>1, "SMALLINT"=>2, "MEDIUMINT"=>3,
"INT"=>4, "BIGINT"=>8,
"FLOAT"=>'if ($M <= 24) {return 4;} else {return 8;}',
"DOUBLE"=>8,
"DECIMAL"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"NUMERIC"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"DATE"=>3, "DATETIME"=>8, "TIMESTAMP"=>4, "TIME"=>3, "YEAR"=>1,
"CHAR"=>'$M', "VARCHAR"=>'$M+1',
"ENUM"=>1,
"TINYBLOB"=>'$M+1', "TINYTEXT"=>'$M+1',
"BLOB"=>'$M+2', "TEXT"=>'$M+2',
"MEDIUMBLOB"=>'$M+3', "MEDIUMTEXT"=>'$M+3',
"LONGBLOB"=>'$M+4', "LONGTEXT"=>'$M+4');
my ($D, $M, $dt);
my $fieldCount = 0;
my $byteCount = 0;
my $fieldName;
open (TABLEFILE,"< $ARGV[0]");
LOGPARSE:while (<TABLEFILE>) {
chomp;
if ( $_ =~ s/create table[ ]*`?([a-zA-Z_]*)`?.*/$1/i ) {
print "Fieldcount: $fieldCount Bytecount: $byteCount\n" if $fieldCount;
$fieldCount = 0;
$byteCount = 0;
print "\nTable: $_\n";
next;
}
next if $_ !~ s/(.*)[ ]+(TINYINT[ ]*\(*[0-9,]*\)*|SMALLINT[ ]*\(*[0-9,]*\)*|MEDIUMINT[ ]*\(*[0-9,]*\)*|INT[ ]*\(*[0-9,]*\)*|BIGINT[ ]*\(*[0-9,]*\)*|FLOAT[ ]*\(*[0-9,]*\)*|DOUBLE[ ]*\(*[0-9,]*\)*|DECIMAL[ ]*\(*[0-9,]*\)*|NUMERIC[ ]*\(*[0-9,]*\)*|DATE[ ]*\(*[0-9,]*\)*|DATETIME[ ]*\(*[0-9,]*\)*|TIMESTAMP[ ]*\(*[0-9,]*\)*|TIME[ ]*\(*[0-9,]*\)*|YEAR[ ]*\(*[0-9,]*\)*|CHAR[ ]*\(*[0-9,]*\)*|VARCHAR[ ]*\(*[0-9,]*\)*|TINYBLOB[ ]*\(*[0-9,]*\)*|TINYTEXT[ ]*\(*[0-9,]*\)*|ENUM[ ]*\(*['A-Za-z_,]*\)*|BLOB[ ]*\(*[0-9,]*\)*|TEXT[ ]*\(*[0-9,]*\)*|MEDIUMBLOB[ ]*\(*[0-9,]*\)*|MEDIUMTEXT[ ]*\(*[0-9,]*\)*|LONGBLOB[ ]*\(*[0-9,]*\)*|LONGTEXT[ ]*\(*[0-9,]*\)*).*/$2/gix;
$fieldName=$1;
$_=uc;
$D=0;
($D = $_) =~ s/.*\,([0-9]+).*/$1/g if ( $_ =~ m/\,/ );
$_ =~ s/\,([0-9]*)//g if ( $_ =~ m/\,/ );
($M = $_) =~ s/[^0-9]//g;
$M=0 if ! $M;
($dt = $_) =~ s/\(.*\)//g;
$dt =~ s/[^A-Za-z_]*//g;
print "$fieldName $_:\t".eval($DataType{"$dt"})." bytes\n";
++$fieldCount;
$byteCount += eval($DataType{"$dt"});
}
print "Fieldcount: $fieldCount Bytecount: $byteCount\n";
The above scripts do not take into account several important pieces of information (so they are outdated):
1. The database/table encoding.
If you use utf8 encoding for a varchar(100), it can take up to 300 bytes (3 bytes per utf8 character).
"[...]As of MySQL 4.1, to calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 Unicode character set, you must keep in mind that not all utf8 characters use the same number of bytes and can require up to three bytes per character."
2. enum can have either 1 or 2 bytes
"[...]The size of an ENUM object is determined by the number of different enumeration values. One byte is used for enumerations with up to 255 possible values. Two bytes are used for enumerations having between 256 and 65,535 possible values."
Here I wrote another script based on Marc's that takes into account what Alex wrote, and more.
It calculates VARCHAR/CHAR/TEXT sizes taking CHARSET or COLLATION into account, properly calculates SET and ENUM sizes, and computes DECIMAL/NUMERIC according to the packed format used as of 5.0.3.
It also calculates the least row byte size for dynamic-row-length tables.
It uses the "mysql" and "mysqldump" tools internally.
Any argument to this script is passed through to mysqldump.
Example: {scriptname} --all-databases
Please report any bug, especially when it comes to size calculations. Enjoy.
----------- copy here --------------
#!/usr/bin/perl
use strict;
$| = 1;
my %DataType = (
"TINYINT"=>1, "SMALLINT"=>2, "MEDIUMINT"=>3, "INT"=>4, "INTEGER"=>4, "BIGINT"=>8,
"FLOAT"=>'$M<=24?4:8', "DOUBLE"=>8,
"DECIMAL"=>'int(($M-$D)/9)*4+int(((($M-$D)%9)+1)/2)+int($D/9)*4+int((($D%9)+1)/2)',
"NUMERIC"=>'int(($M-$D)/9)*4+int(((($M-$D)%9)+1)/2)+int($D/9)*4+int((($D%9)+1)/2)',
"BIT"=>'($M+7)>>3',
"DATE"=>3, "TIME"=>3, "DATETIME"=>8, "TIMESTAMP"=>4, "YEAR"=>1,
"BINARY"=>'$M',"CHAR"=>'$M*$CL',
"VARBINARY"=>'$M+($M>255?2:1)', "VARCHAR"=>'$M*$CL+($M>255?2:1)',
"ENUM"=>'$M>255?2:1', "SET"=>'($M+7)>>3',
"TINYBLOB"=>9, "TINYTEXT"=>9,
"BLOB"=>10, "TEXT"=>10,
"MEDIUMBLOB"=>11, "MEDIUMTEXT"=>11,
"LONGBLOB"=>12, "LONGTEXT"=>12
);
my %DataTypeMin = (
"VARBINARY"=>'($M>255?2:1)', "VARCHAR"=>'($M>255?2:1)'
);
my ($D, $M, $S, $C, $L, $dt, $dp ,$bc, $CL);
my $fieldCount = 0;
my $byteCount = 0;
my $byteCountMin = 0;
my @fields = ();
my $fieldName;
my $tableName;
my $defaultDbCL = 1;
my $defaultTableCL = 1;
my %charsetMaxLen;
my %collationMaxLen;
open (CHARSETS, "mysql -B --skip-column-names information_schema -e 'select CHARACTER_SET_NAME,MAXLEN from CHARACTER_SETS;' |");
%charsetMaxLen = map ( ( /^(\w+)/ => /(\d+)$/ ), <CHARSETS>);
close CHARSETS;
open (COLLATIONS, "mysql -B --skip-column-names information_schema -e 'select COLLATION_NAME,MAXLEN from CHARACTER_SETS INNER JOIN COLLATIONS USING(CHARACTER_SET_NAME);' |");
%collationMaxLen = map ( ( /^(\w+)/ => /(\d+)$/ ), <COLLATIONS>);
close COLLATIONS;
open (TABLEINFO, "mysqldump -d --compact ".join(" ",@ARGV)." |");
while (<TABLEINFO>) {
chomp;
if ( ($S,$C) = /create database.*?`([^`]+)`.*default\scharacter\sset\s+(\w+)/i ) {
$defaultDbCL = exists $charsetMaxLen{$C} ? $charsetMaxLen{$C} : 1;
print "Database: $S".($C?" DEFAULT":"").($C?" CHARSET $C":"")." (bytes per char: $defaultDbCL)\n\n";
next;
}
if ( /^create table\s+`([^`]+)`.*/i ) {
$tableName = $1;
@fields = ();
next;
}
if ( $tableName && (($C,$L) = /^\)(?:.*?default\scharset=(\w+))?(?:.*?collate=(\w+))?/i) ) {
$defaultTableCL = exists $charsetMaxLen{$C} ? $charsetMaxLen{$C} : (exists $collationMaxLen{$L} ? $collationMaxLen{$L} : $defaultDbCL);
print "Table: $tableName".($C||$L?" DEFAULT":"").($C?" CHARSET $C":"").($L?" COLLATION $L":"")." (bytes per char: $defaultTableCL)\n";
$tableName = "";
$fieldCount = 0;
$byteCount = 0;
$byteCountMin = 0;
while ($_ = shift @fields) {
if ( ($fieldName,$dt,$dp,$M,$D,$S,$C,$L) = /\s\s`([^`]+)`\s+([a-z]+)(\((\d+)(?:,(\d+))?\)|\((.*)\))?(?:.*?character\sset\s+(\w+))?(?:.*?collate\s+(\w+))?/i ) {
$dt = uc $dt;
if (exists $DataType{$dt}) {
if (length $S) {
$M = ($S =~ s/(\'.*?\'(?!\')(?=,|$))/$1/g);
$dp = "($M : $S)"
}
$D = 0 if !$D;
$CL = exists $charsetMaxLen{$C} ? $charsetMaxLen{$C} : (exists $collationMaxLen{$L} ? $collationMaxLen{$L} : $defaultTableCL);
$bc = eval($DataType{$dt});
$byteCount += $bc;
$byteCountMin += exists $DataTypeMin{$dt} ? $DataTypeMin{$dt} : $bc;
} else {
$bc = "??";
}
$fieldName.="\t" if length($fieldName) < 8;
print "bytes:\t".$bc."\t$fieldName\t$dt$dp".($C?" $C":"").($L?" COLL $L":"")."\n";
++$fieldCount;
}
}
print "total:\t$byteCount".($byteCountMin!=$byteCount?"\tleast: $byteCountMin":"\t\t")."\tcolumns: $fieldCount\n\n";
next;
}
push @fields, $_;
}
close TABLEINFO;