code review – DiabloHorn

In the previous blog post you could read about my experiment with using Microsoft Application Inspector as a tool to:

- Scan a code base
- Identify technology components
- Visualize & determine risk

What we learned was that using a pretty coarse tool to establish a guesstimate of risk seems to be doable. We also learned that the output format lends itself very well to transfer knowledge about a code base.

But, how do we go from “seems doable” to “I’ve got a transferrable guesstimate on code risk”? My first idea was to just continue with merging the previous kibana visualizations into a nice interactive one and be done.

After some playing around I noticed that it wasn’t that easy! Further experimentation revealed, that the main reason I was struggling to visualize risk is the fact that I had no clue what I really meant by ‘code risk’. Sure, I know how to spot vulnerabilities in code and reason about it, but is that really a risk? I asked other pentesters and it was interesting that finding the vuln was easy, defining the risk was not. For example, a code base with clearly observable vulnerabilities is often viewed as a risk. What if those vulnerabilities are fixed, just enough, do we then still consider that the risk is present? If you are wondering what ‘just enough’ means, here is an example:

Vulnerability: SQL injection in ID parameter of an URL
Root cause: Untrusted user input used in a non-parametrized SQL query
Fix: Regex to only allow numbers

So, would this after implementing the fix: a) Be a vulnerability? b) Could we consider this a risk? I think that at the very least the query should also be rewritten to be parametrized. Why do we still hammer on changing the code if the vulnerability has been fixed? Because, speaking for myself, the used function or implementation of the functionality is flawed by design. So even if the exploitable vulnerability has been fixed, I still consider this risky code.

Yes, you could argue that it is the ^{use of the function} and not the ^{function itself} that carries the risk. For this blog and the purpose of this experiment, I’m not going to dive into those semantics.

For now, let’s dive a bit deeper into understanding risk then defining risk and hopefully visualizing risk in an interactive manner. The transferrable aspect is, of course, the fact that the knowledge is bundled into easy and structured file formats. The github for the POC files can be found here.

The benefits of being exposed to new subjects are that you tinker with them and start experimenting. Hopefully, this blog leads to some new ideas or at best revisits some established ideas and attempt to show that a less perfect approach might just work. Also keep in mind I’m by no means an expert in advanced automatic code / data flow analysis.

So at my current company, one of our units is doing some pretty cool work with ensuring that security operates at agile speeds, instead of being slow and blocking. One of their areas of focus is the automation of code reviews augmented with human expertise. One of my former colleagues Remco and I got chatting about the subject and he brought me up to speed on the subject. The promising developments in this area (as far as I understood it) concerns the ability to grasp, understand, process the language structure (AST), but also the ability to follow code flows, data types and values and of course lately the practical application of machine learning to these subjects. In a way mimicking how code-reviewers go through code, but using data flow techniques to for example track untrusted (external) input.

What is it good for? That was my first question. Turns out that if you have the above-described ability, you can more easily and precisely spot potential security flaws in an automated manner. It also enables you to create repeatable queries that enable you to quickly weed out security vulnerabilities and detect them if they or variants somehow creep back into the source.

Because just as with regular ‘user security awareness’, an automated and fool-proof process will beat ‘awareness’ every time. Having security aware developers is not bad, but having automated processes and process-failure detection is even better.

However, the above is pretty complex, so I decided to tinker with a less optimal and perfect solution and see what I could do with it. My main goal was to achieve the following:

Enable a guesstimate on what part of a code base could be considered ‘risky’ security wise. Capture the code reviewers knowledge and improve the guesstimate of ‘risky’ parts of a code base.

The above would result in an improved ability to process codebases according to a more risk-based approach, without continuously needing expensive experts. It would however not be fully precise, generation of false positives. The lack of precision I accept in return for the ability to at least have a guesstimate on where to spend focus and effort.

If you are still interested, keep on reading. Just don’t forget that I’m a big fan of: better to progress a bit than no progress at all.

Perfect is the enemy of good

Continue reading “[Part 1] Experimenting with visualizations and code risk overview”

Tag: code review

[Part 2] Interactive and transferrable code risk visualization

[Part 1] Experimenting with visualizations and code risk overview