Introduction to XML - Access 2 Learn

When we transfer data between systems, we have to decide how to format that data. A common, simple file format lets one system write the data and another system read it easily. But we didn’t always have a simple format that could be used for this purpose.

We could use comma-delimited files, database exports, or something similar, where the field names are listed in the first row. But those weren’t necessarily easy to read, especially with complex data like you would find in an object-oriented environment, where an object may have other objects as its children.

In older days, we would share a description of the file format, especially if the file was made up of complicated data, but that limits who can read it. The person reading the file had to know what the fields were and in what order they appeared. The developer had to work with that exact format, and any change to it could cause the data transfer to fail.

A solution to that was the XML file format. It is a self-describing format, which means that by reading the file you can see which fields make up the data. As the file format changes, you can still read the file.

You can simply ignore data fields that you don’t need, and start using new fields as you expand your project.
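As a minimal sketch of that idea, assuming Python and its built-in xml.etree.ElementTree module, the reader below only asks for the fields it knows about. The isbn and publisher tags, and the read_book function, are made up for this illustration; they stand in for fields that a newer version of a file might add.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: a newer export added <isbn> and <publisher>,
# which this reader was never written to handle.
NEW_FORMAT = """<book>
    <title>This is my Book</title>
    <isbn>000-0-00-000000-0</isbn>
    <author>John Smith</author>
    <publisher>Example Press</publisher>
</book>"""

def read_book(xml_text):
    """Return only the fields this program cares about."""
    book = ET.fromstring(xml_text)
    return {
        "title": book.findtext("title"),    # looked up by name, not position
        "author": book.findtext("author"),
    }

print(read_book(NEW_FORMAT))
```

Because the fields are looked up by name, the extra tags are skipped without any change to the reader.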

While it seemed like a perfect solution, the end developer still often needs to know how the data is structured and what it means. That way they can convert the data into their internal format and use it. A system that only converts data can work with the XML format as-is, but outside of that, you will need to understand the data.

XML stands for eXtensible Markup Language, and it looks a bit like HTML. The thing to remember is that it is for storing data. It doesn’t, and can’t, do anything on its own, such as running code or converting information. The data just sits there.

However, there are some differences between HTML and XML. HTML is designed to display data, whereas XML is designed to transport data between systems or to store it. HTML is made up of predefined tags, and those are the only ones you are supposed to use. XML has no predefined tags; you define your own.

The first line in an XML file, called the XML declaration, is optional, but should be the following:

<?xml version="1.0" encoding="UTF-8"?>

If it does exist, it must be the very first line; nothing, not even a blank line, may come before it. If it doesn’t, you should use UTF-8 for your file encoding. Note, however, that some parsers expect to see this line and will not read the file if it is missing. Yes, it is optional and they shouldn’t do that, but it does happen.
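If you write XML from code, most libraries can add this line for you. Here is a small sketch, assuming Python 3.8 or newer and its built-in xml.etree.ElementTree module. The books and book names are just sample data.

```python
import xml.etree.ElementTree as ET

root = ET.Element("books")
ET.SubElement(root, "book", title="This is a Book")

# xml_declaration=True asks the serializer to put the <?xml ...?> line first.
# (This argument to tostring() needs Python 3.8 or newer.)
data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

# Show only the first line of the generated bytes.
first_line = data.split(b"\n", 1)[0]
print(first_line.decode("utf-8"))
```

Python writes the declaration with single quotes around the values, which XML treats the same as double quotes.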

This is like the doctype declaration that starts an HTML file. It lets the program that is reading or writing the file know what the format is before it tries to parse it.

So what does the data look like? One can look at it as a tree structure with a single root node and many child nodes, some of which have children of their own. The root node is required, and it gives the parser a place to start.
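To make the tree idea concrete, here is a minimal sketch, again assuming Python’s built-in xml.etree.ElementTree. The outline function is a helper written for this example; it starts at the root and visits each child in turn, indenting one level per generation.

```python
import xml.etree.ElementTree as ET

DOC = """<books>
    <book>
        <title>This is my Book</title>
        <author>John Smith</author>
    </book>
</books>"""

def outline(node, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + node.tag]
    for child in node:   # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(DOC)   # fromstring() hands back the root node
print("\n".join(outline(root)))
```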

Each node is made up of tags and attributes. The rules of the format are fairly strict, and a file must follow them to be a proper (well-formed) XML file that different systems can read.

Generally, a file has an outer tag, which contains all other tags. For example, a books tag might contain a series of book tags. This common layout has a plural tag name that contains one or more singular tags. This is by no means a requirement, just a practice you might see.

Tags can hold information in three ways: by containing child tags, by containing data directly, or by using attributes.

Every opening tag must have a matching closing tag. If a tag has no content, it can instead close itself by putting the slash at the end of the tag.

Attributes on a tag must be written as name/value pairs. The name stands on its own, but the value must be enclosed in quotes.
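A quick way to see these rules in action is to hand a parser a few snippets and see which ones it accepts. This is only a sketch, assuming Python’s built-in xml.etree.ElementTree; the is_well_formed helper and the sample snippets are made up for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "closed tag":        '<book><title>A</title></book>',
    "self-closing tag":  '<book title="A" />',
    "missing close tag": '<book><title>A</book>',    # <title> is never closed
    "unquoted value":    '<book edition=1 />',       # value needs quotes
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

The first two snippets follow the rules; the last two break them, so the parser refuses to read them.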

<books>
    <book>
        <title>This is my Book</title>
        <author>John Smith</author>
        <edition>1</edition>
    </book>
</books>

<books>
    <book title="This is a Book" author="John Smith" edition="1"></book>
</books>

<books>
    <book title="This is a Book" author="John Smith" edition="1" />
</books>

All three of these examples are practically the same. They store the same kind of data; however, they are read slightly differently. In the first, each value is a child tag; in the other two, each value is an attribute.
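To show what “read slightly differently” means in code, here is a minimal sketch assuming Python’s built-in xml.etree.ElementTree. The two strings repeat the examples above, with the same title used in both so the results can be compared. The function names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

AS_ELEMENTS = """<books>
    <book>
        <title>This is a Book</title>
        <author>John Smith</author>
        <edition>1</edition>
    </book>
</books>"""

AS_ATTRIBUTES = """<books>
    <book title="This is a Book" author="John Smith" edition="1" />
</books>"""

FIELDS = ("title", "author", "edition")

def from_elements(xml_text):
    book = ET.fromstring(xml_text).find("book")
    # Child tags: look each one up by tag name and take its text.
    return {field: book.findtext(field) for field in FIELDS}

def from_attributes(xml_text):
    book = ET.fromstring(xml_text).find("book")
    # Attributes: ask the element itself for each named value.
    return {field: book.get(field) for field in FIELDS}

print(from_elements(AS_ELEMENTS) == from_attributes(AS_ATTRIBUTES))
```

Either way the values come back as text, so the edition is the string "1" until you convert it yourself.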

In the examples you can see that child tags and attributes belong to their parent. Tags cannot overlap: a child tag must be closed before its parent is closed. Luckily, this makes it easy to determine what data goes with what entity.

Because the XML file type is well known, there are a lot of libraries that programming languages can use. Some are built into the language itself, while others have to be imported or installed as a separate library.

Different languages will have different libraries and different rules for how to work with them. Many languages might provide two different libraries or sets of classes. This way the developer can choose how to read the file.

Some libraries read the entire file into memory. While this is fine for small files, it can take a lot more memory than reading and parsing the file piece by piece as you process it, especially if your file is very large.
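As a rough sketch of the piece-by-piece approach, Python’s built-in xml.etree.ElementTree offers iterparse(), which hands you each element as soon as its closing tag has been read. The count_books function and the generated sample data are made up for this example.

```python
import io
import xml.etree.ElementTree as ET

def count_books(source):
    """Count <book> elements while the file is still being read."""
    count = 0
    # With events=("end",), each element is yielded once its closing tag is seen.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":
            count += 1
            elem.clear()   # discard the element's text, attributes, and children
    return count

# A small in-memory stand-in for a large file on disk.
big_file = io.BytesIO(b"<books>" + b'<book title="x" />' * 1000 + b"</books>")
print(count_books(big_file))
```

With a real file you would pass its path or an open file object instead of the BytesIO stand-in.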

While each library is different, you will often find methods like getChild(), getNext(), getParent(), getAttribute(), and getValue(). These are all used to access different nodes and attributes.
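The exact names vary from library to library. As one example, Python’s built-in xml.dom.minidom module uses firstChild, nextSibling, parentNode, and getAttribute() for the same jobs. This sketch assumes a small document written without whitespace between the tags, so that every child node is a tag rather than a run of blank text.

```python
from xml.dom import minidom

# No whitespace between tags, so every child node here is an element.
doc = minidom.parseString(
    '<books><book edition="1"><title>This is my Book</title>'
    '<author>John Smith</author></book></books>'
)

book = doc.documentElement.firstChild   # first child of the root <books>
title = book.firstChild                 # first child of <book>
author = title.nextSibling              # the node that follows <title>

print(book.getAttribute("edition"))     # attribute value, looked up by name
print(title.firstChild.data)            # the text sits in a child text node
print(author.parentNode.tagName)        # walk back up to the parent
```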

If you know the layout of the file, you may specify which child node or nodes you are interested in by the index. However, if you don’t know the layout, you may search by name.

Searching by a numeric index requires that the format remain constant, which isn’t something you can always count on. Consider the following example from the file: https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml

<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
<gesmes:subject>Reference rates</gesmes:subject>
<gesmes:Sender>
  <gesmes:name>European Central Bank</gesmes:name>
</gesmes:Sender>
<Cube>
  <Cube time="2020-04-03">
    <Cube currency="USD" rate="1.0785"/>
    <Cube currency="JPY" rate="117.10"/>
    <Cube currency="BGN" rate="1.9558"/>
    <Cube currency="CZK" rate="27.539"/>
    <Cube currency="DKK" rate="7.4689"/>
    <Cube currency="GBP" rate="0.87850"/>
    <Cube currency="HUF" rate="365.15"/>
  </Cube>
</Cube>
</gesmes:Envelope>

If we load that into an XML parser we can access the Cube elements to get the values with something like:

xmlDoc.DocumentElement.ChildNodes[2].ChildNodes[0].ChildNodes

But that can be a bit challenging to write and maintain, especially if the publisher adds, moves, or removes a node somewhere ahead of the last set of nodes.
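Searching by name avoids that problem. Here is a minimal sketch, assuming Python’s built-in xml.etree.ElementTree and a shortened copy of the file above. One wrinkle is that the Cube tags belong to the file’s default namespace, so the search has to name it; the ecb prefix and the rates function are choices made for this example, not part of the file.

```python
import xml.etree.ElementTree as ET

ECB = """<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
<gesmes:subject>Reference rates</gesmes:subject>
<gesmes:Sender>
  <gesmes:name>European Central Bank</gesmes:name>
</gesmes:Sender>
<Cube>
  <Cube time="2020-04-03">
    <Cube currency="USD" rate="1.0785"/>
    <Cube currency="JPY" rate="117.10"/>
  </Cube>
</Cube>
</gesmes:Envelope>"""

# Map a prefix of our choosing to the file's default namespace.
NS = {"ecb": "http://www.ecb.int/vocabulary/2002-08-01/eurofxref"}

def rates(xml_text):
    """Return {currency: rate} for every Cube that has a currency attribute."""
    root = ET.fromstring(xml_text)
    # ".//" searches at any depth, so the position of the node does not matter.
    cubes = root.iterfind(".//ecb:Cube[@currency]", NS)
    return {c.get("currency"): float(c.get("rate")) for c in cubes}

print(rates(ECB))
```

Since the search does not depend on position, it keeps working if a new node shows up ahead of the Cube tags, as long as the Cube name and its currency attribute stay the same.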

Introduction to XML was originally found on Access 2 Learn

Walter Wimberly

