I recently fielded a question from a former co-worker who works as a senior Oracle Database Administrator (DBA). I spent much of my career with Oracle, and I’ve asked myself the same question:
I am one of those DBAs who can see Oracle’s market share gradually being eroded and replaced by a whole bunch of new database vendors. I would like to transition from Oracle to “Big Data” but am struggling to highlight a path to do so. Which technologies should I focus on? Which programming languages?
To answer this question, we need to take a step back and see things from a bigger, historical perspective.
How did we end up with DBAs?
Before we can answer this question, we need a workable definition of ‘DBA’. Is a DBA a form of system administrator, a business knowledge guru, or something in between? The answer lies in how the DBA role evolved – with the advent of relational database management systems (RDBMSs).
The RDBMS solved a whole series of problems that prevented people from getting value from their data. This initial success then led to the concept of the ‘Enterprise Data Model’, which in turn implied a need for someone to represent the enterprise, not the needs of one application. This is fundamentally what a DBA is – a curator of the enterprise’s data.
Questioning the need for DBAs
Developers didn’t always see the need for DBAs. They have traditionally been perceived as ‘expensive.’ But downtime is ‘expensive,’ and the only thing worse than being down is being up and spouting the wrong answers. Preventing either of these things from happening is the DBA’s job.
Over the last decade the major database vendors have added more and more functionality to ‘automate’ the DBA role, but in practice this makes the database less predictable. In some cases, more people are needed to control the ‘automated’ functionality’s side effects. A good analogy here is the introduction of ‘Fly By Wire’ in the aviation industry, which eliminated some hazards but replaced them with new ones.
Can we actually live without them?
Being older, I am in a position to answer this question with a ‘yes’. I started my career before the ‘DBA’ role was invented and worked with file-based systems. The only fundamental difference between Hadoop and 9-track tapes is that a human has to hang a tape, whereas you can tell a computer to process a Hadoop Distributed File System (HDFS) file. The problem we faced then was not that tapes performed poorly (at least when compared to our goals), but that there was no single, live repository of the business’s data, and that what passed for a ‘data model’ was in fact a loose constellation of mini-schemas from individual applications. Building a single application is easy – building fifty and having them use the same data without errors is pretty much impossible. This is what the NoSQL community is seeing now.
What worries me is that the RDBMS became so ubiquitously successful that two negative things happened. The first was that people started using it in applications for which it wasn’t well suited, simply because ‘everything’ was in the database: XML, CLOBS and object layers tacked on top of an otherwise blameless RDBMS, for example. The second was that over time people tended to forget the value it provided and instead focused on the visible deficiencies – a bit like people who object to measles vaccinations because they’ve never encountered measles themselves. A wave of database innovation resulted, but not all of the innovators have experienced life before the RDBMS and thus don’t understand the world they are entering when they abandon concepts such as ACID. While not all applications need ACID, many do, and what might be perceived as an acceptable limitation for a standalone application might be totally unacceptable in an enterprise context.
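To make the ACID point concrete, here is a minimal sketch in Python using the standard-library sqlite3 module (the table names and amounts are purely illustrative). Atomicity means a multi-step update either happens completely or not at all – exactly the guarantee a standalone key-value store may not give you, and exactly the one an enterprise application quietly depends on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as one atomic unit of work."""
    try:
        # 'with conn' runs the body in a single transaction:
        # commit on success, automatic rollback on any exception.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
    except ValueError:
        pass  # the rollback already happened; neither balance changed

transfer(conn, 1, 2, 150)  # overdraws account 1, so the whole transfer is undone
```

Without the transaction, a failure between the two UPDATEs would leave money debited but never credited – the ‘up and spouting the wrong answers’ failure mode, which is invisible until someone reconciles the books.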
The real value in a database is what you prevent from happening…
This sounds perverse but think about it: If I allow people to store anything they want in an unstructured key-value store, I am betting that every single developer who ever works with this data will write code that can successfully interpret the contents. Adding a column in a conventional database borders on the trivial. Adding an extra attribute to an unstructured JSON object creates all sorts of issues about how the new code will co-exist with old data. A key but overlooked aspect of the DBA job description is that ‘curation’ involves forcing data to follow rules and standards so it is actually possible to process it with a computer program. It’s not about what you make possible. It’s about what you prevent. Curation also involves controlling access. Failing to control access used to be merely embarrassing, but the costs can now be measured in hundreds of millions of US$ and are getting higher every day. In some cases, data breaches represent an existential threat to the companies involved.
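The column-versus-attribute asymmetry above can be sketched in a few lines of Python using the standard-library sqlite3 and json modules (the `customers` table and `tier` field are hypothetical examples, not from any real system). On the relational side, one ALTER TABLE makes every existing row valid under the new schema; on the JSON side, old blobs simply lack the new attribute, and every reader from now on must cope with both shapes:

```python
import json
import sqlite3

# --- Relational side: adding a column borders on the trivial.
# Existing rows immediately report the declared default.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Alice')")
conn.execute("ALTER TABLE customers ADD COLUMN tier TEXT DEFAULT 'standard'")
row = conn.execute("SELECT name, tier FROM customers").fetchone()
# The pre-existing row is already valid under the new schema.

# --- Key-value side: the 'schema' lives in every reader's code.
old_blob = json.dumps({"name": "Alice"})                # written by v1 code
new_blob = json.dumps({"name": "Bob", "tier": "gold"})  # written by v2 code

def customer_tier(blob):
    # Every consumer must remember this defensive read, forever;
    # a plain ["tier"] lookup raises KeyError on v1 data.
    return json.loads(blob).get("tier", "standard")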
Without DBAs history will repeat itself
Though the DBA role won’t vanish, we are in a period of chaos where people may believe it isn’t needed. Yet it is clear the future will have enterprises using multiple DB platforms to manage data, and there will still be a need for ‘curators’ of that data. Companies can’t afford the learning curve of every developer becoming an expert in every new technology. They also need a long-term plan to create and manage an appropriate ‘zoo’ of DB technologies, instead of allowing industry fashions and the blank spaces in developers’ résumés to dictate how and where the enterprise stores its most valuable asset – its data.