OWASP Automated Threat Handbook

Some quick thoughts after reading the OWASP Automated Threat Handbook

These kinds of attacks are carried out by seemingly legitimate but actually malicious users of your application. The crux of it is: spend time thinking about how your application can be probed, scanned, scraped, flooded or, most commonly, have its otherwise normal functionality subverted.

The next step is to build in countermeasures. This is where a prioritized checklist would lend itself well to laying out which things to build or plan for. There is a huge variety of actionable steps to take: obfuscating URLs so that your site can’t be spidered, adding page and session-specific tokens, purposely load-testing/flooding your app, performing user-agent fingerprinting to weed out some automated requests, writing test cases for abuse scenarios, monitoring for anomalous requests and dozens of other things. The priority depends on your infrastructure inventory and your risk tolerance, which should hopefully have been clarified after your risk assessment (which you did, right?).

Automated threats kind of fall into two attacker use cases: application recon and unfair resource usage. In the context of e-commerce (where these threats are most common), this can mean sniping/hoarding concert tickets, scraping product prices, enumerating valid users or validating stolen credit cards. So when creating legitimate features, take the time to think about how they can be subverted or have unintended uses. From a technical point of view, ensure you’ve performed security assessments and vulnerability scans, and have monitoring and instrumentation in place, along with real-time detection and alerting from sources such as logs, DNS and computing resources. If you have health-checks you should have security-checks too, and spikes in CPU usage should be as concerning as spikes in unusual requests.
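As a rough illustration of the "spikes in unusual requests" idea, here is a minimal sketch of a sliding-window counter that flags a client sending too many requests. The class name, the limits and the client address are all made up for the example; a real setup would feed something like this from logs or middleware and tune the thresholds to its own traffic.

```python
from collections import defaultdict, deque


class RequestSpikeDetector:
    """Hypothetical helper: flags a client that exceeds max_requests within window_seconds."""

    def __init__(self, max_requests=5, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # client key -> timestamps of recent requests

    def record(self, client, timestamp):
        """Record one request; return True if the client is now over the limit."""
        recent = self._seen[client]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_requests


detector = RequestSpikeDetector(max_requests=5, window_seconds=10.0)
# Six requests within about one second from the same (made-up) client address.
flags = [detector.record("203.0.113.7", t * 0.2) for t in range(6)]
print(flags)
```

Only the sixth request trips the flag. Timestamps are passed in explicitly, so the same logic can be replayed over old log lines or exercised in an abuse-scenario test case.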

VPNs for all

Hey so it’s taken way longer than it should have but I finally got around to hosting my own VPN. The setup itself was pretty easy thanks to the Ansible scripts and documentation provided by the Algo project.

Although it works fine with all major cloud providers, I was hoping to host it on a Raspberry Pi. Unfortunately, due to the sorry state of ISPs here in Germany, my 2-in-1 modem/router lacks basic functionality like port forwarding, bridge mode, or the ability to use other firmware. PiVPN is another excellent project that simplifies this task, assuming you have the ability to port forward.

Cryptopals!

I’ve been meaning to 1. Learn more Python, 2. Increase my understanding of cryptography and 3. Prune my backlog of ‘interesting stuff to check out someday maybe’. To that end, I’m gonna jump into the https://cryptopals.com challenges.

Python solutions over yonder: https://github.com/rpavlov/cryptopals

Encoding: Hexadecimal, base64

The difference between Base64 and hex is really just how bytes are represented. Hex is another way of saying “Base16”. Hex takes two characters for each byte; Base64 takes 4 characters for every 3 bytes, so it’s more efficient than hex. For example, a 100K file will take 200K to encode in hex, or about 133K in Base64. Of course it may well be that you don’t care about the space efficiency; in many cases it won’t matter. If it does matter, then clearly Base64 is better on that front. (There are alternatives which are even more efficient, but they’re not as common.)
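A quick way to sanity-check those numbers: the sketch below encodes 100,000 random bytes both ways and prints the encoded lengths. The payload is arbitrary; only its size matters.

```python
import base64
import binascii
import os

payload = os.urandom(100_000)  # 100K of arbitrary bytes

hex_encoded = binascii.hexlify(payload)
b64_encoded = base64.b64encode(payload)

print(len(hex_encoded))  # two characters per byte
print(len(b64_encoded))  # four characters per three bytes, rounded up by padding
```

Base64 comes out slightly above 133K because the output is padded to a multiple of four characters.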

There is more to base selection than efficiency, however.

Base64 uses more than just letters and numbers. Different implementations use different punctuation characters for indicating padding and for the last two characters of the set of 64. These can include plus “+” and equals “=”, both of which are problematic in HTTP query strings.

So one reason to favour base16 over base64 is that base16 values can be composed directly into HTTP query strings without requiring additional encoding. Is that important to you?
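To make the query-string point concrete, here is a small sketch using two bytes picked on purpose so that the standard Base64 output contains “+” and “/”. It compares hex, standard Base64 and Python's URL-safe Base64 variant, each before and after percent-encoding with urllib.parse.quote.

```python
import base64
from urllib.parse import quote

raw = bytes([0xFB, 0xFF])  # chosen so the base64 output contains '+' and '/'

hex_value = raw.hex()
b64_value = base64.b64encode(raw).decode("ascii")
b64_urlsafe = base64.urlsafe_b64encode(raw).decode("ascii")

# quote(..., safe="") percent-encodes every character that is not unreserved.
print(hex_value, quote(hex_value, safe=""))
print(b64_value, quote(b64_value, safe=""))
print(b64_urlsafe, quote(b64_urlsafe, safe=""))
```

The hex value passes through untouched, while the standard Base64 value needs escaping. The URL-safe alphabet swaps “+” and “/” for “-” and “_”, but the “=” padding still gets escaped.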

Notice that this is an additional concern, over and above efficiency. Neither base is inherently better or worse; they’re just two different points on a scale, at which you’ll find different properties that will be more or less attractive in different situations.

For example, consider base32. It’s 20% less efficient than base64, but is still suitable for use in HTTP query strings. Most of its inefficiency comes from being case-insensitive and avoiding zero “0” and one “1”, to avoid mistakes in reproduction by humans.

So base32 introduces a new concern: ease of reproduction for humans. Is that a concern for you? If it’s not, you could go for something like base62, which is still convenient in HTTP query strings, but is case sensitive and includes zero “0” and one “1”.
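A short sketch of the base32 trade-off, using Python's standard base64 module. The 15-byte input is arbitrary, chosen only because it divides evenly into both encodings so padding does not muddy the comparison.

```python
import base64

data = bytes(range(15))  # 15 arbitrary bytes: no padding in either encoding

b32 = base64.b32encode(data).decode("ascii")
b64 = base64.b64encode(data).decode("ascii")
print(len(b32), len(b64))  # base32 needs 8 characters per 5 bytes, base64 needs 4 per 3

# The standard base32 alphabet is A-Z plus 2-7, so '0' and '1' never appear.
print("0" in b32, "1" in b32)

# A value retyped in lowercase still decodes when casefold=True is passed.
round_trip = base64.b32decode(b32.lower(), casefold=True)
print(round_trip == data)
```

The two lengths are 24 and 20 characters, which is where the 20% figure comes from.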

Why is XOR important in cryptography?

XOR represents the inequality function, i.e., the output is true if the inputs are not alike otherwise the output is false. A way to remember XOR is “one or the other but not both”.

Imagine you have a string of binary digits, 10101, and you XOR the string 10111 with it: you get 00010.

Now your original string is encoded and the second string becomes your key. If you XOR your key with your encoded string, you get your original string back.
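The same example in Python, as a minimal sketch. The variable names are just for illustration.

```python
plaintext = 0b10101
key = 0b10111

ciphertext = plaintext ^ key
print(format(ciphertext, "05b"))

# XOR-ing with the same key a second time undoes the first XOR.
recovered = ciphertext ^ key
print(format(recovered, "05b"))
```

This works because x ^ k ^ k == x for any values, which is what makes XOR so handy as a building block for ciphers.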

Python sidenotes

1: Statements vs functions

In Python we have statements and functions, which is a little confusing. Certain things that are functions in other languages, in this case assert, are statements in Python. Crazy!

So assert(False, "Oh no, something is wrong") will deceptively not trigger the error message: the parentheses turn the condition and the message into a non-empty tuple, which is always true. Python even warns about it:

/home/rpavlov/Projects/cryptopals/set1.py:30: SyntaxWarning: assertion is always true, perhaps remove parentheses?
  assert(len(byte_str1) == len(byte_str2), 'Byte arrays are of different lengths!')

Without the parentheses, assert False, "Oh no..." will behave as expected.
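A small sketch of why the parenthesised form never fails. The tuple is built separately here so the example runs without triggering the SyntaxWarning itself.

```python
# What assert(False, "...") really tests is a two-element tuple,
# and any non-empty tuple is truthy, so that assertion can never fail.
condition_and_message = (False, "Oh no, something is wrong")
print(bool(condition_and_message))

# The statement form evaluates the condition and attaches the message.
caught = None
try:
    assert False, "Oh no, something is wrong"
except AssertionError as err:
    caught = str(err)
print(caught)
```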

In the switch from Python 2 to Python 3 this is even more confusing, because print went from being a statement to a function. For comparison, in Ruby puts 'hey' and puts('hey') behave exactly the same.

2: zip()

The zip() function in Python 3 returns an iterator. It’s typically used to pair up the elements of two or more iterables, stopping as soon as the shortest one runs out.

numbersList = [1, 2, 3]
numbersTuple = ('ONE', 'TWO', 'THREE', 'FOUR')

result = zip(numbersList, numbersTuple)

resultSet = set(result)
print(resultSet)

{(2, 'TWO'), (3, 'THREE'), (1, 'ONE')}

strList = ['one', 'two']
result = zip(numbersList, strList, numbersTuple)

resultSet = set(result)
print(resultSet)

{(2, 'two', 'TWO'), (1, 'one', 'ONE')}

3: bytearray([ord(char)] * len(hextobyte_array(buff)))

The purpose of this line is to expand a single character into a byte string of the same length as the one we want to XOR against.
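Here is a minimal sketch of how that expansion fits into a single-byte XOR. The function name and the sample input are made up for the example, and bytes.fromhex stands in for the hextobyte_array helper, whose implementation is not shown here.

```python
def single_byte_xor(data: bytes, char: str) -> bytes:
    """XOR every byte of data with the single character char."""
    # Expand the one-character key to the same length as the data.
    key = bytearray([ord(char)] * len(data))
    # zip() pairs each data byte with the key byte at the same position.
    return bytes(d ^ k for d, k in zip(data, key))


# Assumption: bytes.fromhex plays the role of hextobyte_array.
buff = "68656c6c6f"  # hex for b"hello"
ciphertext = single_byte_xor(bytes.fromhex(buff), "X")
print(ciphertext.hex())
print(single_byte_xor(ciphertext, "X"))
```

Applying the same key twice returns the original bytes, which is what makes brute-forcing all 256 possible key bytes practical.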

Misc notes

  • Ciphertext: the result of running a piece of text through a cipher, i.e. an encryption algorithm.
  • In cryptanalysis, frequency analysis (also known as counting letters) is the study of the frequency of letters or groups of letters in a ciphertext. The method is used as an aid to breaking classical ciphers.
  • Summarize the example in
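The frequency analysis mentioned in the notes above can be sketched as a simple letter count. The function name and the sample text are made up for the example; an actual attack would compare these frequencies against the expected distribution for the language.

```python
from collections import Counter


def letter_frequencies(text: str) -> dict:
    """Return the relative frequency of each letter in text, ignoring case."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {letter: count / total for letter, count in counts.items()}


sample = "Attack at dawn"
freqs = letter_frequencies(sample)
most_common = max(freqs, key=freqs.get)
print(most_common, round(freqs[most_common], 2))
```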

GitHub Pages domain takeover

Recently my partner’s website was defaced, in that it was serving new, different content at the old URL. It is a simple static site hosted on GitHub, which made the severity of this exploit initially puzzling. How could this happen? The site accepts no input and the DNS settings looked normal.

Luckily, this was one of those easily searchable and already known issues with GitHub Pages. The short of it is, if you don’t have a CNAME file at the root of your project and someone else beats you to it, they can claim your custom domain and GitHub will happily serve their index.html instead! Other providers often require you to have a TXT record in your DNS records, but GitHub does not require proof of ownership in order to set a custom domain in the repository settings. The final take-away is you can’t trust even the big players to make sound technical decisions.

Here’s a great and even more detailed breakdown.
