Data transformation is often simple, like splitting a full name into first and last names, or joining them back together. Sometimes, however, it is much more complicated. I once wrote a data converter that had to read a file in which each data type was stored in a fixed number of bytes and write it out in a different format that used different byte sizes for those data types.
Whether it is simple or complex, you need to verify that the process is correct and doesn’t fail on any edge cases. For example, you might have to convert a series of full names into two fields, one for the first name and one for the last name. When you look closer, however, you find that some of the full names include middle names, others have nicknames, and others have a suffix like Jr., or some combination of these. While these edge cases may make up less than 2% of your total names, if you just split on a space, you can still end up with a large number of faulty data transforms.
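To make the edge cases concrete, here is a minimal Python sketch that compares a plain split on a space with a slightly more careful version. The function names, the suffix list, and the rule that a nickname is a single token starting with a quote or parenthesis are all assumptions made for illustration. Real name data would need rules checked against the actual records.

```python
# Assumed suffix list for illustration; real data will need a longer one.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def naive_split(full_name):
    """Split on a space and keep only the first two pieces."""
    parts = full_name.split(" ")
    return (parts[0], parts[1] if len(parts) > 1 else "")


def split_name(full_name):
    """Return (first, last), skipping middle names, nicknames, and suffixes."""
    parts = full_name.replace(",", " ").split()
    # Drop a trailing suffix such as "Jr." or "III".
    if len(parts) > 1 and parts[-1].lower().rstrip(".") in SUFFIXES:
        parts.pop()
    # Drop single-token nicknames written as "Bill" or (Bill).
    parts = [p for p in parts if not p.startswith(('"', "("))]
    if not parts:
        return ("", "")
    first = parts[0]
    last = parts[-1] if len(parts) > 1 else ""
    return (first, last)


for name in ["John Smith", "Martin Luther King Jr.", 'William "Bill" Gates']:
    print(name, "->", naive_split(name), "vs", split_name(name))
```

Even this version will get some names wrong, for example multi-word last names such as "van der Berg", which is why checking the results against real samples matters.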
While most transformations deal with proprietary data, others can use built-in data conversions, for example converting decimal data to hexadecimal and vice versa, or converting data from Big Endian to Little Endian.
Note: Big Endian and Little Endian describe the order in which the bytes of a multi-byte value are stored. Big Endian stores the most significant byte first; Little Endian stores the least significant byte first.
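As an illustration of these built-in conversions, the sketch below uses only Python's standard `format`, `int`, `int.to_bytes`, and `int.from_bytes`. The helper names `to_hex`, `from_hex`, and `swap_endian` are made up for this example, and the byte swap assumes an unsigned value that fits in the given width.

```python
def to_hex(number):
    """Decimal integer to a lowercase hexadecimal string (no 0x prefix)."""
    return format(number, "x")


def from_hex(text):
    """Hexadecimal string back to an integer."""
    return int(text, 16)


def swap_endian(value, width=4):
    """Reverse the byte order of an unsigned integer stored in `width` bytes."""
    big = value.to_bytes(width, byteorder="big")   # most significant byte first
    return int.from_bytes(big, byteorder="little")  # reread as least significant first


print(to_hex(255))                    # ff
print(from_hex("ff"))                 # 255
print(hex(swap_endian(0x12345678)))   # 0x78563412
```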
Roman Numeral Conversion
Since the data transformations you use most often are either proprietary or common enough that libraries already exist, let’s look at another option: converting Arabic decimal (base 10) numbers to Roman Numerals.
Knowing that I = 1, IV = 4, V = 5, IX = 9, X = 10, etc., can you build a data conversion tool to convert standard Arabic numbers, like the ones we use, into Roman Numerals?
The answer is actually a little more confusing than one might expect. Historians have said that there are some discrepancies in how the Romans wrote some numbers. For example, 499 could be written as ID (one less than 500), but also as CDIC (500 - 100 + 100 - 1 => 400 + 99), or even CDXCIX (500 - 100 + 100 - 10 + 10 - 1 => 400 + 90 + 9). The last of these is the form generally treated as standard today.
However, with the right rules, you can easily work out how the numbers should be converted. In most languages this can be done with a series of if statements and/or switches, breaking the number down into its place components to convert the standard form into Roman Numerals.
To do that, you take the number given, look at the thousands place and calculate that value, then look at the hundreds place, and so on.
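The place-by-place approach can be sketched in Python as follows. Instead of a chain of if statements, this version swaps in one lookup table per place, which amounts to the same decisions written as data. The function name `to_roman` and the 1 to 3999 limit are assumptions for this example, and it always produces the CDXCIX-style form rather than shortcuts like ID.

```python
# One table per decimal place; the index is the digit in that place.
THOUSANDS = ["", "M", "MM", "MMM"]
HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
TENS = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
ONES = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]


def to_roman(number):
    """Convert an integer from 1 to 3999 into a Roman numeral string."""
    if not 1 <= number <= 3999:
        raise ValueError("number must be between 1 and 3999")
    return (
        THOUSANDS[number // 1000]          # thousands place
        + HUNDREDS[number % 1000 // 100]   # hundreds place
        + TENS[number % 100 // 10]         # tens place
        + ONES[number % 10]                # ones place
    )


print(to_roman(499))   # CDXCIX
print(to_roman(1994))  # MCMXCIV
```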
If instead you wanted to convert Roman Numerals into standard form, you would need to read in the string and then look at each character, working from right to left. If the character is the same as or larger than the one to its right, you add its value. If it is smaller, you need to adjust your value down by subtracting it. At each delimiter (e.g. V, X, L, C, D, or M), you add up the items to its right and then add that to the running total.
For example, if you have VII, you would read from right to left: I, then I, then V. Since each character is the same or greater in value than the one before it, you add them together: first the two I’s, then the 5 for the V is added to the 2, since V is a different delimiter in the numbering system. The result is 7.
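Reading the numeral from right to left can be sketched in Python like this. The name `from_roman` is made up for the example, and the function does no validation: it assumes every character is a Roman numeral letter and accepts non-standard forms such as ID without complaint.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def from_roman(numeral):
    """Convert a Roman numeral string to an integer by scanning right to left."""
    total = 0
    previous = 0  # value of the character just to the right
    for character in reversed(numeral.upper()):
        value = VALUES[character]
        if value >= previous:
            total += value   # same or larger than its right neighbor: add
        else:
            total -= value   # smaller than its right neighbor: adjust down
        previous = value
    return total


print(from_roman("VII"))     # 7
print(from_roman("CDXCIX"))  # 499
print(from_roman("ID"))      # 499
```

Because the scan only compares each character with its right-hand neighbor, all three spellings of 499 mentioned earlier (ID, CDIC, and CDXCIX) come out as the same value.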
Data Transformations was originally found on Access 2 Learn