Robert’); DROP TABLE students;: The Importance of Input Sanitization in Databases

By Staff Contributor on November 27, 2019


Databases run the internet these days. Forums, online shopping, and payment processors all sit on top of databases, most of them some form of SQL. That’s all well and good, since very little would get done on the internet without them, but whenever you deal with any form of user input, it’s paramount to clean it up before it’s sent to your database application.

To borrow an example from an XKCD comic, what would happen if an attacker entered the following into a form field: “Robert’);DROP TABLE CUSTOMERS;ALTER USER ‘db’@’localhost’ IDENTIFIED BY ‘passwordChange’;”?

If you shuddered at the results, you’re entirely right to do so. SQL injection through unsanitized inputs is a common way for an attacker to gain unauthorized access to a database. In some cases, this can have cross-database ramifications (especially in multi-tenant or multi-database environments), though the prevalence of containerization and virtualization technologies has reduced at least part of this risk. Not all organizations use these technologies, so it’s still important to make sure an attacker can’t jump to an unrelated database through any kind of unsanitized input.
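To make the danger concrete, here is a minimal sketch using Python’s built-in sqlite3 module and an in-memory database. The students table and the add_student_unsafe helper are hypothetical names chosen for illustration, and executescript() stands in for any backend that will run several stacked statements from one string.

```python
import sqlite3


def add_student_unsafe(conn: sqlite3.Connection, name: str) -> None:
    # DELIBERATELY VULNERABLE: the input is pasted straight into the SQL text.
    # executescript() runs every statement in the string, which mimics a
    # backend that accepts stacked queries.
    conn.executescript(f"INSERT INTO students (name) VALUES ('{name}');")


def table_exists(conn: sqlite3.Connection, table: str) -> bool:
    # sqlite_master is SQLite's catalog of tables and indexes.
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row is not None


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")

    add_student_unsafe(conn, "Alice")  # a well-behaved input
    before = table_exists(conn, "students")

    # The quote and parenthesis close the INSERT early, the DROP runs as a
    # second statement, and "--" comments out the leftover "');".
    add_student_unsafe(conn, "Robert'); DROP TABLE students;--")
    after = table_exists(conn, "students")

    conn.close()
    return before, after
```

Calling demo() returns (True, False): the table is there after the harmless insert and gone after the hostile one.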

In this blog post, I’ll explore the ways your organization can improve its input sanitization, from traditional regular expressions and error handling to the more modern approach of using server-side sanitization libraries and gateways. These take the hassle out of the job and let the set of sanitization rules be updated rapidly as new vulnerabilities appear. I’ll also touch on the importance of testing before your database sanitization setup goes live.

The Basics

The first lesson anyone learns when setting up a web-to-database gateway, or any gateway to a database where untrusted user input is concerned, is to always, always sanitize every input. Failure to sanitize inputs lets attackers include SQL code in form inputs and do any number of interesting things, ranging from deleting information from a database to injecting information into it.

Injecting information into a database can not only cause records in the database to be incorrect, it can also lead to further compromise. A classic example is a web server that allows dangerous and powerful functions to be executed, such as the PHP system() call. Here, a data sanitization failure could lead to remote code execution on the server itself, an altogether more serious problem.

Many database attacks have been carried out by exploiting poorly coded, or too relaxed, input sanitization rules to obtain remote command execution against the SQL server. However, this is by no means the only way to achieve code execution through SQL injection. Data sanitization is also important where system commands are executed with user-specified parameters, as in a router’s web interface. This opens the door to an attacker storing partial commands in a database that are only completed when predictable user input arrives later. These partial commands are often harder to detect.
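For the system-command case, a rough sketch of the idea might look like the following. It assumes a hypothetical router-style “ping this host” feature; build_ping_argv is an invented name, and the ping flags shown are the common Linux/macOS ones, used purely for illustration. The function only builds the argument list and never runs anything.

```python
import ipaddress


def build_ping_argv(host: str) -> list:
    # ip_address() raises ValueError for anything that is not a literal
    # IPv4 or IPv6 address, so "192.0.2.1; reboot" never gets past this line.
    address = ipaddress.ip_address(host.strip())

    # Returning a list (not one concatenated string) means that, when it is
    # handed to subprocess.run() without shell=True, no shell parses the value.
    return ["ping", "-c", "1", str(address)]
```

The design choice here is to parse the input into a strict type first and rebuild the command from the parsed value, rather than trying to strip dangerous characters out of the raw string.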

There are many approaches to data sanitization, but most people start by blocking the most commonly abused characters, like “;” and “)”. Developers writing their own input sanitization routines will usually rely on regular expressions to help filter out unwanted inputs, and indeed this is one of the first skills a web developer is expected to learn.
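As a small illustration of the regular expression approach, here is a sketch of an allowlist check in Python. The rule itself (1 to 32 letters, digits, underscores, dots, or hyphens) and the is_valid_username name are assumptions made up for this example, not a recommendation for any particular field.

```python
import re

# Hypothetical rule: 1 to 32 characters drawn from a small, known-safe set.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,32}")


def is_valid_username(value: str) -> bool:
    # fullmatch() requires the whole string to match, which avoids the
    # classic mistake of "$" also matching just before a trailing newline.
    return USERNAME_PATTERN.fullmatch(value) is not None
```

Stating what is allowed, instead of listing what is forbidden, means a character nobody thought about is rejected by default.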

Whitelists and blacklists of commands, or partial commands, are also common, as is the use of extensive error handling. In fact, people in computer science circles say a well-designed application is approximately 50% error handling, to put things in perspective.

But security threats evolve at a rapid rate, and a manual list of banned strings won’t keep up with new attacks. It’s hard for any individual developer or administrator to know everything that needs to be watched for. It isn’t uncommon to see input sanitization efforts cover a wide range of potential attacks but then leave out a seemingly obvious character like the single quote (’), as well as character strings that, due to the vagaries of encoding, might somehow be interpreted as database escape sequences.
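Here is a minimal sketch of that failure mode, again using Python’s sqlite3 module with an in-memory database. The users table, the blocklist contents, and the helper names are all hypothetical. The blocklist strips “;” and “)” but forgets the single quote.

```python
import sqlite3

# An incomplete blocklist: the single quote is missing.
BLOCKED = (";", ")")


def strip_blocked(value: str) -> str:
    for ch in BLOCKED:
        value = value.replace(ch, "")
    return value


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # The "cleaned" value is still concatenated into the SQL text.
    cleaned = strip_blocked(name)
    query = f"SELECT name FROM users WHERE name = '{cleaned}'"
    return [row[0] for row in conn.execute(query)]


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

    # No ";" or ")" anywhere, so the blocklist changes nothing, yet the
    # quote turns the WHERE clause into: name = 'nobody' OR '1'='1'
    rows = find_user_unsafe(conn, "nobody' OR '1'='1")
    conn.close()
    return sorted(rows)
```

Calling demo() returns ['alice', 'bob']: a lookup for a user that doesn’t exist hands back every row in the table.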

Beyond Do-It-Yourself

Increasingly, organizations are turning to sanitization gateways or libraries to solve this problem. These highly tested, rapidly updated pieces of software sit between the untrusted user input and the database server, ensuring what gets to the database is either cleaned up, harmless, and correct; or cleaned up, harmless, and malformed in a way the database itself will reject, preventing remote code execution.
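This post doesn’t name a specific product, so as a stand-in, here is one widely available library-level mechanism: the parameter binding built into the database driver itself. It is a swapped-in technique rather than a sanitization gateway, and the sketch below assumes Python’s sqlite3 module with a hypothetical students table and add_student helper.

```python
import sqlite3


def add_student(conn: sqlite3.Connection, name: str) -> None:
    # The "?" placeholder sends the value to the driver separately from the
    # SQL text, so the input is always treated as data, never as SQL.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")

    hostile = "Robert'); DROP TABLE students;--"
    add_student(conn, hostile)

    # The table survives and the hostile string is stored verbatim.
    stored = [row[0] for row in conn.execute("SELECT name FROM students")]
    conn.close()
    return stored
```

Here demo() returns a one-item list containing the hostile string exactly as typed, which is the “cleaned up, harmless, and correct” outcome: odd data, but only data.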

However, these gateways or libraries are not fire-and-forget affairs. Test everything, repeatedly, and don’t believe the use of third-party error-handling products removes the need for proper error handling.

The best way to test security is to try to break it. It’s crucial to test every corner case you can think of, repeatedly, from buffer overflows that could cause a denial of service to non-ASCII characters being interpreted as escape sequences somewhere down the line.
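A starting point for that kind of testing could be a small table of hostile inputs run against whatever validator you use. The sketch below is only that: the validator, the input list, and the find_accepted helper are hypothetical, and a real suite would be far longer and would also exercise the full path to the database.

```python
import re

# Hypothetical validator under test: 1 to 32 known-safe characters.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,32}")


def is_valid_username(value: str) -> bool:
    return USERNAME_PATTERN.fullmatch(value) is not None


HOSTILE_INPUTS = [
    "Robert'); DROP TABLE students;--",  # stacked statement
    "' OR '1'='1",                       # always-true condition
    "A" * 100_000,                       # oversized input
    "admin\u02bc",                       # non-ASCII apostrophe look-alike
    "admin\uff07",                       # fullwidth apostrophe
    "bob\x00",                           # embedded NUL byte
    "",                                  # empty input
]


def find_accepted(validator, inputs) -> list:
    # Return every hostile input the validator wrongly lets through.
    return [value for value in inputs if validator(value)]
```

An empty result from find_accepted(is_valid_username, HOSTILE_INPUTS) is the passing condition. Every newly reported attack string should be appended to the list and the check re-run.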

Database languages and engines are moving targets, so it’s of paramount importance to keep re-testing. Additionally, new vulnerabilities are discovered all the time, often daily. Organizations need to have a plan for keeping both their database servers and their database processing gateways up to date on a regular schedule.

As the internet grows, and more and more of our personal, professional, and governmental information is stored in web-queryable databases, organizations need to investigate their input sanitization processes as soon as possible. With GDPR in force in the EU and similar privacy rules taking shape elsewhere, database administrators need to ensure an errant “;” won’t result in the contents of their databases being spewed down an HTTP connection to a bad actor, or worse, the bad actor then having remote execution capabilities on the database server. The consequences for, say, a government identity service are unpleasant to consider.

But these attacks are real, and organizations need to be prepared. From web forms to forums, text message inputs, and beyond, data sanitization needs to be universal.
