Databases Explained To My Grandma

Ghislain Fourny
May 26, 2022

When can you tell that you really understand something? A very good test is to try and explain it to people with no knowledge in the area. Indeed, this forces you to stick to simple words and concepts. I thought I would try it out on databases. If you are not a technical person: this post is for you.

(Cuneiform writing, 2600 BC — Public Domain)

An early history of databases

To be fair, databases have existed ever since we have been able to speak, that is, for almost 100,000 years. Of course, back then, Homo sapiens did not have any cutting-edge data centers in which to store and share their knowledge.

Still, language was a necessary step. In antiquity, tales of epic battles were told and sung from generation to generation, relying on oral tradition. People would learn them all by heart and pass them on to the younger generations, which in turn would pass them on to the next, and so on.

Then three major revolutions changed this dramatically:

  • The invention of Writing: The first writings known to date were found in Mesopotamia and contained inventories and sales. Yes: a database! And if you look carefully at the horizontal and vertical lines in the picture at the top, you may even see that it is organized in some kind of “prototable”. For the first time, data did not have to fit in human memory and could be passed down more reliably through the centuries.
  • The invention of the Printing Press: Johannes Gutenberg’s invention suddenly scaled up the production and diffusion of books, and hence of knowledge: no need to hire monks to copy them manually any more!
  • The invention of Computers: Computers marked the last necessary step towards modern databases, providing a way to retrieve and process data much faster, but also to validate it and ensure that it is consistent.

The modern era

Back in the sixties, data was typically stored as files on disks. That is, files just like a Word or an Excel file, but less fancy. Files were organized in hierarchies, a bit like the way you store your documents in folders and drawers in your office. Thanks to hierarchies, data could be fetched reliably. Yet this was not really natural and, above all, not usable by anybody other than computer-savvy people.

Edgar Codd (IBM) founded the modern era of databases with a conceptual revolution. He exposed the data to the users in a more abstract way: as tables. Rows and columns. Everybody can understand tables. Maybe this is why spreadsheets are so popular and heavily used by business users and consultants. The most notable use case is, I think, that of the librarians, who previously relied on hand-written files and folders for the inventory of their books, and who could progressively switch to computerized databases.

Databases relying on tables are well established nowadays, the three main providers being Oracle, Microsoft, and IBM. IT people call them relational databases. You may also have heard of them as SQL databases.

The shapes of data

When you, as a human, interact with data, you need to have a conceptual model in your mind, a data model. This model shields you from unnecessary details on how the data is stored. A data model is very much about the shape of the data. And tables are not the only shape data can take. I’ve come to think that data comes in five main shapes:

  • Tables: imagine a list of students attending a class, organized in columns (last name, first name, city, etc). Or think of a spreadsheet.
  • Trees (hierarchies): imagine a genealogical tree that shows all your ancestors. As the name indicates, such data is organized in a tree structure.
  • Graphs: imagine a public transportation map with all stations linked together with trains, trams, buses and ships.
  • Cubes: popular with business analytics: slice, dice, roll up, cross-tabulate…
  • Unstructured data: text, images… they gave birth to the whole field of information retrieval.

In other words, relational databases support some of the shapes of data (tables), but do not support trees or graphs — at least not without hitting them with a hammer to force them to fit into a table.
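For readers who are curious what these shapes look like to a programmer, here is a minimal Python sketch. Every name and value in it (`students`, `family_tree`, `transport_map`, `flatten_tree`, the people and stations) is made up for illustration. It shows a table, a tree and a graph, and then what "hitting a tree with a hammer" might look like: the nested family tree is flattened into plain rows.

```python
# Table: every row has the same columns (illustrative data).
students = [
    {"last_name": "Doe", "first_name": "Jane", "city": "Palo Alto"},
    {"last_name": "Roe", "first_name": "Rick", "city": "Zurich"},
]

# Tree: each person nests their own parents (illustrative data).
family_tree = {
    "name": "You",
    "parents": [
        {"name": "Mother", "parents": [{"name": "Grandma", "parents": []}]},
        {"name": "Father", "parents": []},
    ],
}

# Graph: each station lists the stations it is directly linked to.
transport_map = {
    "Central": ["Airport", "Harbour"],
    "Airport": ["Central"],
    "Harbour": ["Central"],
}


def flatten_tree(node, descendant=None):
    """Force a tree into table rows: one row per person, noting whose parent they are."""
    rows = [{"person": node["name"], "parent_of": descendant}]
    for parent in node["parents"]:
        rows.extend(flatten_tree(parent, node["name"]))
    return rows


rows = flatten_tree(family_tree)
for row in rows:
    print(row)
```

The flattened rows still hold all the information, but the nesting is gone: to find Grandma's grandchild you now have to hop from row to row instead of simply walking down the tree.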

BigData and NoSQL: so, what’s new?

The 2000s brought three main revolutions to the database world. The first one is that all five shapes mentioned above are now supported: in addition to the tables of the 70s (also supported by so-called column stores), there are databases that natively support trees (document stores), graphs (triple stores), cubes (OLAP) and text/images (information retrieval systems).

The second revolution is that new technologies can store and retrieve data in all these shapes efficiently at unprecedented scales: Terabytes, Petabytes, with Exabytes on the way (hello, CERN!).

The third revolution is that you can throw heterogeneous data into the same bag. It is a bit like putting books of various shapes, colours, topics on the same shelf and dealing with sorting them later.
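As a rough illustration of the "same bag" idea, the Python sketch below puts records with different fields on one shelf and only sorts them afterwards. The records, the `type` field and the `sort_later` helper are all hypothetical; real systems do this at a much larger scale and in their own ways.

```python
from collections import defaultdict

# One "shelf" holding items that do not share the same fields (illustrative data).
shelf = [
    {"type": "book", "title": "Databases for Grandma", "pages": 300},
    {"type": "photo", "caption": "Cuneiform tablet", "width": 800, "height": 600},
    {"type": "note", "text": "Call the library"},
    {"title": "Mystery item"},  # no "type" field at all
]


def sort_later(items):
    """Group items by their 'type' field; items without one go under 'unknown'."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("type", "unknown")].append(item)
    return dict(groups)


by_type = sort_later(shelf)
print(sorted(by_type))
```

Nothing had to be decided about the structure of the items before they were stored; the sorting rule is applied only when someone needs it.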

Talking with data

In order to interact with data, you need a language. However, the languages you use to talk with data (query languages) are slightly different from typical programming languages. Put simply, these languages allow you to filter the data in order to get only what you need. The most prominent filters are called selection (with the class list example: say, “give me all people living in Palo Alto”) and projection (say, “give me only their first and last names”). New generations of databases are excellent at performing these very efficiently and quickly.
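To make the two filters concrete, here is a small Python sketch on a made-up class list. The helper names `select` and `project` and all the student data are assumptions chosen for illustration; a real database would do the same work through its query language.

```python
# A made-up class list: one row per student.
class_list = [
    {"last_name": "Doe", "first_name": "Jane", "city": "Palo Alto"},
    {"last_name": "Roe", "first_name": "Rick", "city": "Zurich"},
    {"last_name": "Poe", "first_name": "Ann", "city": "Palo Alto"},
]


def select(rows, predicate):
    """Selection: keep only the rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Projection: keep only the requested columns of every row."""
    return [{column: row[column] for column in columns} for row in rows]


# "Give me all people living in Palo Alto" ...
in_palo_alto = select(class_list, lambda row: row["city"] == "Palo Alto")
# ... "and give me only their first and last names."
names = project(in_palo_alto, ["first_name", "last_name"])
print(names)
```

In SQL, the same request would typically be written in one sentence: `SELECT first_name, last_name FROM class_list WHERE city = 'Palo Alto'`. The `WHERE` part is the selection and the list of columns after `SELECT` is the projection.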

Finally, efforts are being made on graphical interfaces to databases, so that everybody can work with data without knowledge of the query language. Some call this interactive data.

So… what’s next?

It is always a tricky endeavour to tell what the future will be made of. Who would have said, 20 years ago, that we would all be connected and reachable with portable devices regardless of our location? For the interested reader though, I made some guesses about databases in an earlier post.

Originally published at https://www.linkedin.com.

Ghislain Fourny

Ghislain Fourny is a senior scientist at ETH Zurich with a focus on databases and game theory.