On one end of the spectrum, some languages appear to be “type-less,” meaning that they don’t have explicit data types. Examples include COBOL, Python, and JavaScript. However, these languages usually do have types; they just hide them from us, and the type can change dynamically as the program runs, depending upon how the data is being used.
On the other end of the spectrum there are both loosely and strictly, statically typed languages (like C/C++ and Java), which require you to define the type of a variable, and that type cannot change during the execution of the program, nor can it be redefined later on. Loosely typed languages often allow you to change a data type as long as the value goes into a larger data type. So an int can be cast into a long, or a float into a double, but not the other way around. Strictly typed languages do not allow you to move between data types.
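As a rough sketch of that rule in C++ (the exact rules vary by language), widening happens implicitly, while going the other way needs an explicit cast:

// C++ sketch: widening conversions happen implicitly, narrowing needs a cast
int i = 42;
long l = i;                    // fine: int widens to long
float f = 2.5f;
double d = f;                  // fine: float widens to double
int j = static_cast<int>(l);   // the other direction requires an explicit cast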
Regardless of where your language falls on the spectrum, you can be sure that certain types of variables are being used.
How the Computer Sees the Variable
When we create a variable, we do so by declaring it and defining it. In Python we simply use a variable by giving it a value. In languages like C/C++ or Java, you will need to declare the variable with a data type.
# Python Example
x = 10
// C++ or Java
int x = 10;
When you define a variable name, that name is for the developer(s) who will be working with it. The computer cares about the memory location, and it maps the name to that location. In C/C++ you will sometimes hear about an lvalue, which is the memory location.
In comparison, the rvalue is the value that is stored at that location.
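A minimal C++ illustration of that distinction:

// C++: a name refers to a memory location (lvalue); a literal is a value (rvalue)
int x = 10;    // x is the lvalue (the location), 10 is the rvalue (the value stored)
x = x + 5;     // left of '=', x means "the location"; right of '=', "the value read from it"
int* p = &x;   // &x exposes the actual memory address the name x maps to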
The computer will also need to know what type of data the variable holds, so it knows how many bytes to read from memory. Each variable type requires a different amount of memory to store it. All data is numeric in nature; it just may be converted into another usable type.
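In C++ you can ask how many bytes each type occupies with sizeof. The sizes below are typical for a 64-bit system, but they are implementation-defined, so treat this as a sketch:

// C++ sketch: byte sizes are implementation-defined; these are typical values
#include <iostream>
int main() {
    std::cout << "bool:   " << sizeof(bool)   << " byte(s)\n";  // usually 1
    std::cout << "char:   " << sizeof(char)   << " byte(s)\n";  // always 1
    std::cout << "int:    " << sizeof(int)    << " byte(s)\n";  // usually 4
    std::cout << "double: " << sizeof(double) << " byte(s)\n";  // usually 8
}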
You can see a simple example of how bits are turned off and on on the page “Turning Bits into Bytes.” Bits are the smallest form of data for a computer, with 8 bits making up a single byte in most modern computers. As bits are turned on, they change the value of the byte based upon their position. The more bits used, the larger the number that can be represented. Some numbers used to be stored in a single byte (8 bits), so they were limited to small values, -128 to +127, whereas now it’s not uncommon for them to use 4 or more bytes (32 or more bits).
Primitive
Primitive data types are simple containers for storing data. They do not provide methods to perform tasks, or attributes that describe them.
Most primitives are numeric in nature.
Most people don’t think twice about types of numbers. To them, a number is a number is a number. It doesn’t matter how large a number is, or whether it is positive or negative. They don’t care if it has a decimal place in it, or is even imaginary.
Computers, on the other hand, are very concerned about all of those things, so most languages will break numbers down into several specific types.
Boolean Values
A Boolean value may seem like the simplest value, since you only need a single bit: a 0 for false or a 1 for true. However, computers cannot read a single bit, so a Boolean is often stored in one or more bytes, depending upon what is efficient for the computer to store and use.
In the “old” days, we’d manually combine multiple Boolean values into a single variable and use a mask to read them. C had bitwise operators to make this easier, and they kept us from using a lot of memory for simple Boolean values. However, most modern languages let you just use a Boolean data type, because memory is relatively cheap and a dedicated type makes coding faster, easier, and less error prone.
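Here is a sketch of that old-style approach in C++, packing several flags into one byte with bitwise operators (the flag names are made up for illustration):

// C++ sketch: packing several Boolean flags into one byte (hypothetical flag names)
unsigned char flags = 0;
const unsigned char FLAG_VISIBLE = 1 << 0;   // bit 0
const unsigned char FLAG_ENABLED = 1 << 1;   // bit 1
flags |= FLAG_VISIBLE;                       // turn a bit on
flags &= ~FLAG_VISIBLE;                      // turn it back off
bool enabled = (flags & FLAG_ENABLED) != 0;  // mask to read a single bit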
In some languages, a value of true might be 1, while false is 0. The C language was known for zero (0) being false, and any other value being true. In other languages you might see 1 and -1. In others, true and false don’t reference a numeric value at all, even though you could determine one by looking at the raw bits. It all depends upon the language and how it was designed.
Characters
Depending upon your language, a single character may take 1 or 2 bytes. A single byte can be used to store an ASCII character; however, it cannot represent many non-English characters, so a second byte might be used to allow for more characters from more languages. Unicode goes further, defining well over 100,000 characters from different languages.
In both cases, a number is assigned, and then a simple lookup table is used to determine which character should be displayed.
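You can see that lookup directly in C++: a char is stored as its numeric code, and casting reveals the mapping (these are ASCII values):

// C++: a char is just a small number; the lookup table maps 65 -> 'A'
#include <iostream>
int main() {
    char c = 'A';
    std::cout << static_cast<int>(c) << "\n";    // prints 65, the code for 'A'
    std::cout << static_cast<char>(66) << "\n";  // prints B, the character for 66
}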
Integers
Integers are whole numbers that may be positive or negative. Generally, one bit will be used to identify if the number is positive or negative, and the remaining bits are used to determine the size of the number.
The maximum size of a number is therefore limited by the number of bits available: an n-bit signed integer can store values from -2^(n-1) to 2^(n-1) - 1. Note that this isn’t the number of bits available for overall storage in your computer, but the size of the data type.
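In C++, numeric_limits reports these ranges directly, and they match the 2^(n-1) arithmetic; a small sketch:

// C++: each type's range follows directly from its bit count
#include <iostream>
#include <limits>
#include <cstdint>
int main() {
    // 8 bits: -2^7 to 2^7 - 1 (the unary + prints the values as numbers, not characters)
    std::cout << +std::numeric_limits<std::int8_t>::min() << " to "
              << +std::numeric_limits<std::int8_t>::max() << "\n";   // -128 to 127
    // 32 bits: -2^31 to 2^31 - 1
    std::cout << std::numeric_limits<std::int32_t>::min() << " to "
              << std::numeric_limits<std::int32_t>::max() << "\n";   // -2147483648 to 2147483647
}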
In older computer systems, you could have a 1-byte integer, which allowed a number between -128 and 127. Some languages referred to this data type as a short, or short integer.
However, starting in the late ’80s/early ’90s, most computers used 2 bytes for integers, which gives you a range of -32,768 to 32,767.
In both of these cases, it was quite easy to go past the value that the integer could hold. This was sometimes called an overflow error.
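A sketch of what that looks like in C++ using a fixed 2-byte type (the exact wrap-around behavior is implementation-defined on older standards, but this is what you typically observe):

// C++ sketch: pushing past a 2-byte integer's maximum wraps around
#include <iostream>
#include <cstdint>
int main() {
    std::int16_t biggest = 32767;                                    // the 2-byte maximum
    std::int16_t wrapped = static_cast<std::int16_t>(biggest + 1);   // 32768 won't fit
    std::cout << wrapped << "\n";                                    // prints -32768 on typical systems
}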
To combat that, and because memory was getting cheaper, the ANSI C standard (section 3.1.2.5) led to integers being four bytes long on most systems. This gives you a range of -2,147,483,648 to 2,147,483,647, which is much less likely to be overflowed.
However, you can still overflow the number with too large a value, so you have to be careful how you work with numbers. For example, you might notice that the maximum value is 10 digits long, just like a US phone number. However, you cannot use an integer to store most phone numbers, as the area code would have to be 214 or less to fit in a signed integer.
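A quick check of that point, using a hypothetical 10-digit phone number:

// C++ sketch: a 10-digit phone number vs. a 32-bit signed integer (number is hypothetical)
#include <iostream>
#include <cstdint>
int main() {
    // 2,147,483,647 is the 32-bit maximum, so any area code above 214 overflows
    std::int64_t phone = 5551234567LL;   // needs 64 bits; a 32-bit int cannot hold it
    std::cout << phone << "\n";
}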
Integers tend to be very fast to use, because addition and subtraction reduce to simple bit mathematics. We also don’t have to worry about the accuracy of the number, since all of the numbers are whole.
That, of course, causes an issue. What if I want to use a number that is not a mathematical integer, but a mathematical real number, i.e. one that has a decimal place or fraction in it? This is where I would use a floating point number.
Floating Point Numbers
A floating point number can be simply described as a number with a decimal place. For example, 2.5, or 3 3/4.
Such numbers are often used in the real world, and therefore a method of representing them is found in floating point numbers, or floats, as most languages reference them. (I have used a language that called them real, in reference to the mathematics term.)
Floating point numbers allow for much larger, and smaller, numbers to be used, often while using only four bytes for a single precision floating point number, or eight bytes in the case of what most languages call a double, which is shorthand for a double precision floating point number.
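A small C++ comparison of the two sizes (the digit counts in the comments are the usual IEEE 754 values):

// C++: float is single precision (~7 digits), double is double precision (~15-16 digits)
#include <iostream>
#include <iomanip>
int main() {
    float  f = 1.0f / 3.0f;
    double d = 1.0  / 3.0;
    std::cout << std::setprecision(17) << f << "\n";  // accurate to only ~7 digits
    std::cout << std::setprecision(17) << d << "\n";  // accurate to ~16 digits
}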
However, despite all of their great and common uses, they suffer from one huge issue. They are an approximation. While we won’t get into how a floating point number is calculated or stored, since it is fairly complex, we can say that floating point numbers are not always accurate.
A floating point number has what’s known as a precision. If you enter 2.5, you can be fairly confident that the computer is representing it accurately enough as 2.5; however, it also means a calculation might give you back 2.499999.
Generally, your precision will be good for the first several digits. The further a digit is from the front of the number, the less accurate it becomes.
To get more accuracy, you could use a double, or find another way of storing the number.
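The classic demonstration of this approximation in C++ (0.1 has no exact binary representation):

// C++: 0.1 and 0.2 are binary approximations, so their sum is not exactly 0.3
#include <iostream>
#include <iomanip>
int main() {
    double sum = 0.1 + 0.2;
    std::cout << std::setprecision(17) << sum << "\n";          // 0.30000000000000004
    std::cout << (sum == 0.3 ? "equal" : "not equal") << "\n";  // not equal
}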
Let’s look at two examples of accuracy in numbers and how, while floats can store larger and smaller numbers than an integer, they cannot solve all of our problems in working with numbers.
For a large number, consider a company attempting to send a craft to Mars. Mars is approximately 54.6 million kilometers from Earth at its closest point, and an average of 225 million km away.
If we write out 225,000,000 km, you will notice that the last two digits are not significant. Even if the first seven digits were accurate and not estimated, the last two digits would be off, and that could drastically change where you land, or even whether you enter orbit or skip off its atmosphere.
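You can see this significant-digit limit directly in C++: a single precision float keeps only about seven significant digits, so a kilometre-scale adjustment simply vanishes at that distance:

// C++: a float keeps ~7 significant digits, so a small offset is lost at 225 million
#include <iostream>
#include <iomanip>
int main() {
    float distance = 225000000.0f;      // 225 million km
    float adjusted = distance + 3.0f;   // try to add 3 km
    std::cout << std::setprecision(12) << adjusted << "\n";          // still 225000000
    std::cout << (distance == adjusted ? "lost" : "kept") << "\n";   // lost
}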
On a smaller numeric scale, think about the angle needed to direct a craft to Mars. Being off by a fraction of a degree could lead to huge changes over the course of a 225 million km journey.
Some languages are designed to work specifically with scientific applications, numerical calculations, and the like. They often have larger-sized number types which allow for more precision when dealing with very large, or very small, numbers.
But those are special-case situations. Most of the time, we are using general purpose languages like C/C++, C#, Java, and Python. We need to understand their limits, and know if those limits will affect us when we are building systems. In most cases, a general purpose language will not cause us undue issues, but we need to understand if/when it will, and know when to look at other options.
Review of Data Types was originally found on Access 2 Learn