The Cryptographer's Codex
An opinonated introduction and general resource for those interested in cryptography
Contents
Q | A |
---|---|
Not sure if cryptography is your thing? | Start with the introductory beginner series of challenges, they're a short series of 4 challenges that go from nothing to solving some real cryptography challenges |
Wanting to learn more about cryptography? | Work on cryptopals and follow the companion guide! |
Done both and looking for more? | Take a look at cryptohack for more challenges that cover additional topics like elliptical curves |
Want more low level systems-like cryptography? | Take a look at Block Breakers for a guided series on attacking block ciphers themselves |
Prefer lecture style lessons instead of getting hands on with challenges? | RPISEC has a bunch of streamed lectures covering a number of cryptography concepts. (If anyone has any textbook recommendations feel free to open a PR on the repo) |
Interested in building out your cryptography toolkit? | Check out some of the assorted guides |
Want to test your skills against a set of beginner/intermediate challenges? | Try out picoCTF, an excellent beginner CTF put on by CMU |
Already built up a foundation with cryptohack/cryptopals? | Dive straight into CTFs! Check out ctftime to get a list of upcoming events |
Tutorials
Resources
- Setting up a programming environment for solving CTF challenges
- Python basics, tips, and tricks for the beginner cryptographer
About
The Cryptographer's Codex is primarily consisted of challenges and documents written up by Arctic of Maple Bacon
The codex was an idea that I had when we (Maple Bacon) first hosted our beginner CTF for UBC students, SaplingCTF. The name was catchy so it stuck around and over time I built up more and more stuff that I had written up for various things so I thought I should just collect them together under site.
The end goal with the site is to have a place for us cryptographers to point interested CTF players towards, with a curated set of challenges, guides, and setup resources.
Contributing
If you have a guide/resource that you would like to see added to the page, feel free to open a PR on the github repo and ping @rctcwyvrn
Frequently asked questions
- Why cryptography?
- What does attacking cryptography teach you?
- Is cryptography just math?
- What skills do I need to get started?
Why cryptography?
It's a lot of fun! It feels great to think something is impossible to break, spend hours starting at it, and then eventually pry it apart and break it into pieces to make it decrypt out your flag.
The way I like to think about it is that solving CTF challenges usually involves trying to do something under a set of constraints. In binary exploitation the constraints are things like the state of the program, the stack, and the heap. In web the constraints are the various protections that websites implement.
In cryptography the constraints that we have to figure out how to weave our way through are the protocols themselves, the mathematics behind the operations and/or the implementation of them in the real world. I find crypto a lot of fun because these rules are arbitrary, they're just whatever the challenge author or protocol author thought of, leading to each challenge having it's own unique flavor rather than always having to deal with things like ASLR.
Cryptography is also uniquely vast in the kinds of ways that people try (and fail) to make connections secure, and the ways that we find to attack these protocols are also uniquely diverse. Having so many different ideas underpinning cryptography means there's a lot of things to learn and it's likely you'll find something that clicks for you and something you'll enjoy attacking, whether that's high level math-based attacks or low level cpu side-channel based attacks.
TLDR: Crypto challenges feel fresh and new because of the freedoms challenge authors have when writing their challenges. There's also an extremely wide range of cryptographic attacks and techniques so you can really look for what you find fun
What does attacking cryptography teach you?
Unfortunately, since not many people will ever have to write their own crypto code, cryptography as a CTF category is less useful for a typical CS career than things like pwn or web which teach you directly applicable skills for memory and website security respectively.
However this doesn't mean that cryptography doesn't teach you anything! Crypto and other CTF categories teach you one extremely important thing, which is arguably the most important thing you can learn by doing CTFs, an attacker's mindset.
What is an "attacker's mindset"? It's a way of thinking where you pick and pry and try to break whatever bit of code you're looking at. It's a way of reading code where you carefully examine each part and consider what assumptions each function makes and how that might go wrong. It's understanding code not just for what it wants to do, but trying to prove in your head that it only does what it wants to do and can't be manipulated otherwise.
Experience doing this is highly useful in security obviously, but is also just useful for writing correct code in general. Most modern programming languages don't provide many guarantees (rip dependent types) on correctness or state and it often comes down to the programmer to be able to mentally check the preconditions and postconditions of your code. If you're highly used to doing this in an attacker's mindset then you end up being quicker at finding how things can go wrong, instead of being focused on the happy paths where everything goes right
Outside of practical skills, for me a big general lesson was that shit can go wrong really fast. Cryptography breaks with even the smallest of mistakes and learning some crytographic attacks is a good way of really hammering that lesson in. Cryptography is extremely important but there's a reason why there's a constant stream of new algorithms and papers attacking existing ones. There's a reason why the NIST standardization process takes years to go from new algorithm to established standard.
Is cryptography just math?
No! There's plenty to do that won't touch any math at all!
Things with a lot of math:
- Most asymmetric cryptography (RSA, DH, ECC)
- Key exchanges, signing, asymmetric encryption
- Random number generation (LCG, LFSR, MT)
- Some post quantum cryptography (Lattices)
- Quantum cryptography (Lots of physics)
Things with no math:
- Most symmetric cryptography (AES)
- Ciphers, block cipher modes
- Some attacks on cipher construction start to involve some matricies (affine, linear attacks)
- Hashing (SHA)
- Some post quantum cryptography (SPHINCS+)
- Side channel attacks (Power, timing, cache)
Disclaimer: You may find that a lot of CTFs have challenges skewed towards the math categories due to it being easier to write challenges for things like RSA and ECC. That doesn't mean that cryptography is all math, CTF stuff just tends to be skewed that way.
What skills do I need to get started?
Generally across the board the requirements to do easy challenges in each category is the same
- A machine set up with your programming language of choice (typically
python
)- Needs to be able to do basic operations and communicate with servers (
pwnlib
for python)
- Needs to be able to do basic operations and communicate with servers (
- A willingness to learn
- This is extremely important! No matter how good you get at any CTF category, you'll always be banging your head feeling like a challenge is impossible and having no idea how to solve it. Being "good" at cryptography doesn't mean you instantly know how to solve something, it means you know how to learn how to figure out to attack something.
For the general "types" of challenges you find at CTFs, there are some requirements, but these really are quite minimal. I'm not trying to make these categories seem easier than they are, they really do just don't particularly require any other CS knowledge. You really can just sit down and learn how they work.
Attacking block ciphers (AES)
- The current standard block cipher, AES, is constructed with 4 steps. Substitution, XOR, shifting rows, and shifting columns
- It turns out that some of these have mathematical representations, but at their core these operations are extremely simple. This is by design and is why AES is so fast
Attacking block cipher modes (ECB, CBC, CTR, GCM)
- Block cipher modes usually involve simple operations like XORs
- GCM mode is the main exception, since it creates an authentication code with a signature as part of it's encryption. This neccessitates it being more mathematical and complicated.
Attacking hashing algorithms (SHA)
- Similarly to block ciphers, SHA2 is made up of a number of rounds that involve a number of simple steps (bit shifts and XORs only!)
- All hashes that use the Merkle–Damgård construction are very similar (MD5)
- Common attacks usually involve a similar set of skills, lots of bit manipulations and XORS
- Some statistical stuff maybe
Attacking RSA and Diffie Hellman
- RSA and DH rely on modular arithmetic. Understanding the basic algorithms is easy and just requires knowledge of how
mod
works - Understanding why RSA works is more involved and requires learning some number theory (Fermat's little theorem)
- Basic attacks on RSA and DH involve basic math
- For example, all the challenges in the
cryptopals
andbeginner asymmetric
sets of challenges don't require any of the more complicated math below
- For example, all the challenges in the
- More and more advanced attacks on RSA start to involve more and more complex mathematics
- RSA uses prime numbers and primes have a ton of unique mathematical properties that number theorists study. These sometimes become CTF challenges (eg: smooth primes)
- Both algorithms use modular integers, or in math terms, finite Galois fields. Galois fields are extremely well studied and therefore have a lot of interesting properties that can lead to CTF challenges (eg: small subgroup confinement)
- Neither of these are required at the start, but if you plan to really get into RSA and DH this is kinda where you're headed
Attacking ECC
- As the name implies (elliptical curve cryptography), you'll need to know about elliptical curves to learn about how ECC works
- Elliptical curves are just a certain type of curve (like a weird sideways parabola)
- The idea is to use points on this curve as a mathematical group because they have nice properties, so what ECC does is extend the algorithms for RSA and DH (which use Galois fields) to this new elliptical curve field. So similar algorithms, just points on a curve instead of numbers (which is weird if you haven't done this kinda math before)
Knock on wood
Beginner series
Basics
Let's start with what you'll need to solve this set of challenges
- Basic programming ability
- An environment with docker and python installed
What exactly will you be doing in this series of challenges?
Included are 4 beginner challenges attacking symmetric cryptography, ie cryptography where both you and your partner share a secret key. This type of encryption is what handles most secure connections between you and a server because it's fast and you only need to exchange a key to get started.
This series will attack a number of (relatively) modern symmetric encryption methods, showcasing how minor flaws or mistakes in implementations of these methods can result in a perfectly secure encryption becoming trivial to break!
These challenges were originally featured in a local CTF Maple Bacon put on for UBC students, if you want similar stuff for other categories, check out our challenge repo at ubcctf/sapling-ctf-2022
Note: The later challenges have spoilers for earlier challenges, so be careful! Don't skip ahead!!
Copilot my savior
Challenge
Copilot is great! Look at this great encryption function it wrote for me
Getting started
To solve this challenge you'll have to do some byte manipulations in python. If you haven't written python before, don't worry! Python is notoriously simple and easy to pick up, granted that you've written in an imperative language before.
- If you haven't set up a programming environment with python installed, start here
- To get started with some byte manipulations in python, read this
All in all this challenge is really straightforward, the main goal is to get comfortable with python and doing byte manipulations
Ok but how do I solve this thing???
You're given a program that takes a secret file (flag.txt
) and "encrypts" it, changing it to something that is unreadable (flag.txt.enc
). Your goal is to reverse this process
Some advice then:
- Look carefully at
encrypt.py
. What information do we not know? What does theencrypt
function do? What could you do to undo it's effects if you knew the???
value? - What happens if you XOR a byte by the same byte twice?
- Think about the
???
value. What are the possible values that it could be? Could you try to guess it? How many guesses would you need to make? How would you know if your guess was correct?
Cut and paste
Challenge
Let's say you're making a video conferencing software and you need an encryption algorithm, but which one to pick?
There's so many options? Let's just choose the first one we see.
There's no way a real company would do something like that... right?
- https://securityboulevard.com/2020/04/simple-illustration-of-zoom-encryption-failure/
- https://citizenlab.ca/2020/04/move-fast-roll-your-own-crypto-a-quick-look-at-the-confidentiality-of-zoom-meetings/
Files:
- client.py
- server.py
- Note: This file contains the flag hidden as base64, you could just decode it from there but that's no fun
- Dockerfile
- start.sh
Getting set up
This is a hosted challenge, meaning you need to communicate with a server that's running server.py
. How do you set this up?
You could
- A: (Recommended) Run the provided dockerfile by running the provided
./start.sh
- B: Try to run the script directly, this only makes sense to do if you don't have
docker
installed and don't want to set it up for whatever reason
Guide
Welcome to challenge 2, where we'll be jumping straight into a modern encryption algorithm! Now you may think that just because this is a real encryption algorithm that Zoom really did use in their software that this challenge is going to be hard to solve.
But that isn't the case! Cryptography can often times be mathematically unbreakable, but present many flaws and pitfalls when brought to the real world.
So let's take a look at how that can happen
How does modern encryption work?
What do you do when you want to make a message secret? The most common solutions use something that's called a block cipher.
What is a block cipher? It's a function that takes 16 bytes and a key, and encrypts it. The standard block cipher, AES, is extremely extremely good at doing this. It's been analyzed and tested to hell and back and no one has been able to break it. How it works is a bit complicated so you don't need to worry about it.
As a quick example. With the key "catscatscatscats"
- The encryption of "mymessageiscool!" is 7560e6979e56348cfe27616e0ca8567b
- The encryption of "mymessageiscool?" is 59acdc0dbe8c2432c673bb05c2034c62
- See how the results are completely different even though the inputs are very similar? That's a security feature of AES!
- AES is secure in that if I gave you just "7560e6979e56348cfe27616e0ca8567b", you would have no idea what my original message was without the key. But if you had the key, you can recover the original message just by running the decryption algorithm
You might be wondering, if we have this perfect block encryption function, how could we possibly try to break any encryption? Didn't you say AES is unbreakable1?
Well let me ask you, you have this function that encrypts 16 bytes exactly, how do you encrypt more? How do you, for example, encrypt 100 bytes?
Block cipher modes
Block cipher modes are what allows you to do variable size encryption. Let's consider the most simple block cipher mode
Here's how you encrypt a variable number of bytes
- Add bytes to the end of the message in a special format so you know that you should remove it later
- Keep adding bytes until the message is a multiple of 16, this is called "padding"
- Split it into a bunch of 16 byte messages and encrypt them all
- Append it all together, and send it off
This is called Electronic Code Book, or ECB mode for short. It has a few glaring weaknesses, which you'll have to discover to solve the challenge!
Some tips
- Take a look at this ECB mode visualizer and play around with it, it gives a simple UI for visualizing blocks of a message and the resulting blocks of the plaintext
- What happens when you edit half of a ciphertext? What does it decrypt to? What if you edit more/less of a block?
- What happens if you change the order of blocks?
- What exactly do you have to do to get the flag? What do you need to trick the server into thinking?
- Look at the challenge title! CTF challenges often hide little hints in the challenge titles
- In CTF challenges you should be constantly thinking about what you have control over, and how you can abuse this control. In this setup you have three things you can control
- The username and password you send when you register a user
- The bytes you send to the server when it asks for a token
- What happens if your token is incomplete? Will the server accept it? How badly can you mangle your token and still have the server be ok with it?
- The server uses
input()
to get your username and password, so you're limited to normal alphanumeric and punctuation characters, you can't input arbitrary bytes in there. How can you work around this?
What can you do with this? How can you trick the server into encrypting something that you can use?
At least we sure hope so
One two three
Challenge
Ok so ECB mode was a bust, what else can we try? What if don't use blocks at all?
Guide
Good job figuring out the last challenge!
You saw that ECB mode has one crucial weakness, each block gets encrypted and decrypted exactly the same way, no matter where it's located in the ciphertext. This led you to be able to move blocks around to wherever you wanted and have the plaintext change in the same way.
So now that we know ECB is bad, how can we circumvent this problem? How can we make our cipher mode not have this problem?
Primer: The one time pad
Let's say I want to encrypt a one byte message, and I decide to do so by XORing it with a random byte. I then happily send this off and you happen to be able to intercept it.
So you have this one byte encrypted message, which you know is something that I encrypted via XOR. Is it possible for you to figure out what the original message was?
It turns out that the answer is no! If the byte I used for the XOR was random, then the resulting encrypted byte is also completely random and unrelated in any way to my original message.
This is what is called "information theoretically secure". Even if you had infinite time, you could never figure out which message my original byte was. This encryption scheme is called the One-time pad
Now back to AES
Counter mode
Let's say we want to bring that idea to reality, how can we generate a bunch of random bytes to xor our message against?
This stream of random bytes needs to be
- Able to be generated from the key, so we can regenerate it when we want to decrypt the message
- But still random enough that it's secure
Here's how Counter (CTR) mode achieves this
- Start a counter at c = 1
- Pad c out to 16 bytes and encrypt it with your AES key
- Then you have 16 random bytes, use that to XOR with your message
- Then increment the counter and continue until you have enough random bytes to encrypt your entire message
So the resulting ciphertext is like
- The first 16 bytes are the first 16 bytes of the plaintext XORd with the aes encryption E of the byte 1, E(pad(1))
- The next 16 bytes are plaintext_2 XOR E(pad(2))
- then plaintext_3 XOR E(pad(3)) and so on
The stream of random bytes, ie E(pad(1)) | E(pad(2)) | E(pad(3)) ... is called the keystream
This evidently solves the issue with ECB we saw earlier, because if you move a block around then it'll be something completely different when you decrypt it, since it was encrypted with a different part of the keystream.
The twist
Actually I lied, the description I gave is not the real CTR mode, because it has a crucial weakness!
Your job is to find this weakness and decrypt the messages found in secret.enc
To solve this you'll need to figure out what this weakness is, use some googling to figure out why this weakness is so critical, and learn how you can exploit it.
Some tips:
secret.txt
contains many lines of english text and, of course, the flag- You may find that your code from the first challenge, "Copilot my savior", will come in handy
- There's no shame in buying hints if you're stuck!
Aside
As is the pattern with the earlier challenges, this is actually a mistake that you can make when using CTR mode. Here's an excerpt from a paper about a very closely related cipher mode, GCM mode.
"We investigate (redacted) issues with the GCM block cipher mode as used in TLS and focus in particular on AES-GCM, the most widely deployed variant. With an Internet-wide scan we identifed 184 HTTPS servers (redacted), which fully breaks the authenticity of the connections. Affected servers include large corporations, financial institutions, and a credit card company"
Hints
If you really need some hints, I've left some base64 encoded here
TG9vayBjYXJlZnVsbHkgYXQgdGhlIGVuY3J5cHQgZnVuY3Rpb24uIEl0IGdlbmVyYXRlcyB0aGUga2V5c3RyZWFtIGZ1bGx5IGluZGVwZW5kZW50bHkgb2YgdGhlIHBsYWludGV4dCEgVGhpcyBtZWFucyB0aGF0IHRoZSBrZXlzdHJlYW0gaXMgaWRlbnRpY2FsIGZvciBlYWNoIHBsYWludGV4dC4gS25vd2luZyB0aGlzLCB3aGF0IGNhbiB5b3UgZG8/Cg==
V2Uga25vdyB0aGF0IHRoZSBrZXlzdHJlYW0gaXMgdGhlIHNhbWUgZm9yIGVhY2ggcGxhaW50ZXh0LCBidXQgd2hhdCBkb2VzIHRoaXMgbWVhbj8gVGhpcyBtZWFucyB0aGF0IHRoZSBmaXJzdCBieXRlIG9mIGVhY2ggbGluZSBpcyBYT1JkIGFnYWluc3QgdGhlIHNhbWUgYnl0ZSwgaWUgdGhlIGZpcnN0IGJ5dGUgb2YgdGhlIGtleXN0cmVhbS4gU2FtZSBmb3IgdGhlIHNlY29uZCBieXRlIG9mIGVhY2ggbGluZSBhbmQgc28gb24uIFRoaXMgaXMgcmVhbCBzaW1pbGFyIHRvIHRoZSBjb3BpbG90IGNoYWxsZW5nZSBpc24ndCBpdCEgWW91IGNhbiBndWVzcyBieXRlcyBpbiB0aGUgc2FtZSB3YXksIGJ1dCB5b3UgbmVlZCB0byBiZSBzbWFydGVyIGFib3V0IGZpZ3VyaW5nIG91dCB3aGVuIGEgZ3Vlc3MgaXMgJ2NvcnJlY3QnCg==
SG93IGNhbiB3ZSB0ZWxsIHdoZW4gYSBndWVzcyBpcyBjb3JyZWN0PyBXZWxsIHdlIGtub3cgdGhhdCB0aGUgdGV4dCBpcyBlbmdsaXNoLiBTbyBjYW4gYXNzaWduIGVhY2ggZ3Vlc3MgYSAnc2NvcmUnIGJhc2VkIG9uIGhvdyAnZW5nbGlzaCcgdGhlIHJlc3VsdCBpcy4gRm9yIGV4YW1wbGUsIGFkZGluZyBwb2ludHMgZm9yIGxldHRlcnMsIG51bWJlcnMsIHB1bmN0dWF0aW9uLCBhbmQgd2hpdGVzcGFjZSwgd2hpbGUgc3VidHJhY3RpbmcgcG9pbnRzIGZvciBlYWNoIGJ5dGUgdGhhdCBkb2Vzbid0IGZpdC4K
Propagation
Challenge
Ok so ECB mode was a bust, what else can we try? What about trying to chain blocks?
Guide
Good job figuring out the last challenge!
You saw that ECB mode has one crucial weakness, each block gets encrypted and decrypted exactly the same way, no matter where it's located in the ciphertext. This led you to be able to move blocks around to wherever you wanted and have the plaintext change in the same way.
So now that we know ECB is bad, how can we circumvent this problem? How can we make our cipher mode not have this problem?
Block chaining
Blocks should be encrypted differently based on where they are in the plaintext, so how about we encrypt a block differently based on the previous block?
If we try to move the block like we did with ECB, it'll fail because the previous block will be different, so this seems promising
Let's do it like this
- Assume we've encrypted the previous block to
C_0
, and now we want to encrypt plaintext blockP_1
- Let the next ciphertext block be
AES(C_0 XOR P_1)
, basically xor the new plaintext block with the last ciphertext block before encrypting with AES
But then what about the first block? It has no previous block to be XORd against. Well for that we introduce whats called an Initialization Vector, or IV for short. We basically just decide on some 16 bytes of random data, and this ensures each message is unique.
This cipher mode is called Cipher Block Chaining, or CBC for short. Wikipedia has an excellent diagram showing how exactly the encryption and decryption for CBC work.
The challenge
CBC mode was quite widespread for a long time and lacks any glaring weaknesses like ECB. However it's design leads it to have various pitfalls, one of which you'll need to exploit.
To find weaknesses in older cryptographic methods, it's often useful to look at their successors. For example CBC being ECB's successor makes it evident ECB's weakness in being too deterministic. One successor to CBC mode is Galois/Counter mode, or GCM mode for short.
One main feature that GCM provides is a built in message authentication code, which can basically be thought of as a tag that says "yes this message is exactly what I sent, and has not been modified in any way". In essence it prevents attackers from being able to modify the ciphertext.
Knowing that this is something that CBC lacks, try to solve the challenge!
Tips
- This challenge is structured extremely similarly to cut and paste, so take a look at some of the tips from that challenge. What still applies?
- Remember the given
client.py
is just a starting point. It's possible you'll need to make modifications to it in order to return some information that it's just printing right now - Remember that the AES key will be different each time you connect to the server, you have to do everything in one connection
- The code to solve this challenge is very short (you could probably golf it down to one or two lines), the difficulty lies in realizing what kind of manipulations you can make to the ciphertext. Once you realize the trick, you'll see that you have a lot of power over the resulting plaintext
- Stare at the wikipedia diagram for CBC. Stare at it the entire time while writing your solution. Trust me, it'll help
Hints
If you really need some hints, I've left some base64 encoded here
V2hhdCBoYXBwZW5zIGlmIHlvdSBtb2RpZnkgdGhlIGZpcnN0IGNpcGhlcnRleHQgYmxvY2sgc2xpZ2h0bHk/IFRoZSBBRVMgZGVjcnlwdGlvbiBvZiB0aGF0IGJsb2NrIHdpbGwgYmUgY29tcGxldGVseSBzY3JhbWJsZWQsIGJ1dCB3aGF0IGhhcHBlbnMgdG8gdGhlIGRlY3J5cHRpb24gb2YgdGhlIF9uZXh0XyBibG9jaz8gU2ltaWxhcmx5LCB3aGF0IGhhcHBlbnMgYWZ0ZXIgeW91IG1vZGlmeSB0aGUgSVY/Cg==
UmVtZW1iZXIgdGhlIFhPUiBvZiB0d28gaWRlbnRpY2FsIHZhbHVlcyBpcyB6ZXJvLiBJZiB5b3Uga25vdyB0aGF0IEEgWE9SIEIgPSBDIGFuZCB5b3Uga25vdyBDLCB5b3Uga25vdyB0aGF0IHRoaXMgKEEgWE9SIEMpIFhPUiBCIHdpbGwganVzdCBiZSBudWxsIGJ5dGVzLiBOb3cgd2hhdCBpZiB5b3Ugd2FudCBzb21ldGhpbmcgb3RoZXIgdGhhbiBudWxsIGJ5dGVzPyBXaGF0IGRvIHlvdSBuZWVkIHRvIGRvPwo
Cryptopals companion
THIS IS A WORK IN PROGRESS! THE FORMATTING WILL BE LESS GARBAGE AFTER IM DONE WITH IT
Preparations before we start
- This companion in general will not provide any direct solutions, I instead mostly give leading questions. If you need solutions, plenty of solutions and writeups are just a google away, though remember that looking at the answers at the back of the book leads to understanding less.
- Almost all ctfers use python for solving challenges across basically all categories, so that’s what I would recommend
- For python, all you’ll need is
- A working python installation
- pycryptodome
- Something to write code with,
vscode
is nice
- I’ll be assuming you’ll be writing in python for this guide
- For python, all you’ll need is
- Make sure you read the homepage, everything it says is true
- You don’t need (and aren’t expected to) know any cryptography beforehand
- They do unfortunately skip out on defining some terms, but that’s what this companion is for
- You only need simple math
- The hard math is left to the professionals and we just copy their equations :)
- No math shows up until set 5 anyway
- These challenges are attacks, not puzzles with tricks. The challenge authors make an effort to make the challenges clear and "doable", instead of obfuscated like you'd see in competitions. This companion guide also helps clear up the occasional ambiguity and confusing wording
- I like to put an extra little disclaimer whenever I recommend someone take a look at cryptopals, which is that after solving these challenges most people feel a strong feeling of dread. Dread when they realize how fragile cryptography can be and how most people writing code that uses cryptography are completely unaware that these fragile points exist
- You don’t need (and aren’t expected to) know any cryptography beforehand
- The general structure of the sets is
- Set 1 is preparation and “the qualifier set”
- Sets 2-4 are various attacks on symmetric ciphers and a few hashes in set 4
- Sets 5-6 are attacks on various asymmetric primitives, all using primes and numbers
- In case you’re wondering how well cryptopals will prepare you for CTFs, take a look at CSAW CTF 2020. Three challenges are basically directly taken from cryptopals and can be solved by copying over the code
- Authy is a length extension sha attack, covered in challenge 29
- modus operandi is a ciphertext detection challenge, covered in challenge 11
- adversarial is a fixed nonce CTR, covered in challenges 6 and 20
Some quick definitions
Encryption
- Encryption is turning a message (like “cryptography is cool”) and changing into a message that appears to be a garbled mess (like “falkjdior30vjs”) based on some “key”
- The method of encrypting the message and what the key is vastly depends on the algorithm, for example in rsa the key is a number, in caesar ciphers the key is a table, and in aes the key is 16 bytes
- Plaintext is used to refer to the message before encryption and ciphertext is used for the garbled message after encryption
Cryptopals Set 1
- Intro
- Challenge 1: Some useful functions and notes about python bytes
- Challenge 2: XOR in python
- Challenge 3
- Challenge 4
- Challenge 5
- Challenge 6
- Challenge 7
- Challenge 8
Intro
- A lot of this set comes down to how comfortable you are writing code in python
- There is a little hitch in that byte representations in python changed a lot between python2 and python3, meaning stackoverflow answers and such will be outdated and might not work
Challenge 1: Some useful functions and notes about python bytes
int(x, 16)
will parse the string x as a hex string, and return a number- The
Crypto.Util.number.long_to_bytes
andCrypto.Util.number.bytes_to_long
functions from pycryptodome convert between numbers and bytes - Bytes in python are really just an array of ints between 0 and 255
b”hi”
is really just[104, 105]
- the
bytes()
constructor takes an int array and returns a bytes object - using a for loop or a list comprehension like
[x for x in my_bytes]
will return the int array - you can also index into bytes objects to get the integers,
b”hi”[0] == 104
- Remember that bytes are just numbers with base 255
ord()
takes a single character and returns its integer valuechr()
does the opposite, integer to char- the
base64
package has functions to go from bytes <-> base64 strings, namelyb64decode
andb64encode
Challenge 2: XOR in python
- The xor operator in python is
^
and it takes two integers and returns the result of xor - A note about XOR in case you aren’t familiar: The operation cancels out with itself if you do it twice
- Ie:
A xor B xor B == A
- If you aren’t sure why this is the case, try writing out the table for XOR
- This is why “encryption” and “decryption” are actually the same action when using XOR, you end up doing the exact same thing
- Ie:
- How do we use
^
with bytes objects? Remember that we can index into bytes objects to get the underlying integers, so try a for loop and then combining the results afterwards withbytes()
back into a bytes object
Challenge 3
- This general idea comes up sometimes in cryptography attacks, but the idea is that we know the original must have been some english sentence, so we can bruteforce and see which one is most like an english sentence
- This idea also can be generalized to cases where we know some "structure" that the original message must be in and use that information to inform our bruteforce
- Note: Do not be lazy and do it by eye, the “scoring” function will be needed in the next challenge
- The
sorted()
function will probably come in handy - https://en.wikipedia.org/wiki/Letter_frequency
Challenge 4
- Basically the same as challenge 3 but at a larger scale, same principle applies
- We can assume that anything that isn’t encrypted with a single character XOR will result in a garbled mess when we perform an XOR on it
Challenge 5
- Just coding, good luck!
- The modulo operator
%
will probably come in handy here
Challenge 6
- Hamming distance function tips
- To format an integer as a binary string, use
“{0:b}”.format(x)
- Use in combination with
bytes_to_long
to convert the bytes to the bit string - If the lengths differ, the difference in length should be added to the distance
- To format an integer as a binary string, use
- Solve tips
- Try to reuse as much code from earlier challenges as possible
- Transposing means taking
[[a,b,c], [1,2,3]]
and making it into[[a,1], [b,2], [c,3]]
- An aside
- I’ve actually used my code for this challenge quite a few times. You might be wondering what real world algorithm uses a repeating key xor and the answer is none, but many ciphers do actually use XORs. This challenge may seem like a toy challenge, but when we get to challenge 20 in set 3, you’ll see how this attack actually works very well on a popular block cipher mode.
Some definitions
- AES (Advanced Encryption Standard) is a block cipher, it takes a block of 16 bytes and garbles it based on a 16 byte key
- However this presents a problem, what do you do if your message is longer than 16 bytes, for example a 32 byte message?
- If you’re curious about what AES does under the hood (not needed for these challenges, but fun to learn about) https://www.youtube.com/watch?v=O4xNJsjtN6E
- ECB stands for Electronic Code Book, and is a block cipher mode of operation. Block cipher modes are ways of making block ciphers (which can only encrypt a single block) usable for larger messages
- ECB is the most straightforward, cut the message into blocks and encrypt each of them separately.
- This turns out to be a terrible way of doing things, and you’ll see why in the next few challenges
- https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Electronic_codebook_(ECB)
Challenge 7
- pycryptodome provides an AES implementation in Crypto.Cipher.AES
- https://pycryptodome.readthedocs.io/en/latest/src/cipher/aes.html
- You should make both encryption and decryption functions, you’ll need it in a bit
Challenge 8
- There is a hidden assumption in this challenge that I think really should have been mentioned. The challenge assumes that the plaintext must have a repeated block.
- The important point here is that ECB is the only cipher mode that has this property, that the same plaintext blocks will result in the same ciphertext blocks
- This weakness of ECB turns out to cause many many problems, which you’ll see in set 2
- The worst part is ECB is seen as the “default” setting for block ciphers, despite it being almost always a terrible idea
Cryptopals Set 2
- Challenge 9
- Challenge 10: A new block cipher mode, CBC
- Challenge 11
- Challenge 12
- Challenge 13: Cut and paste
- Challenge 14
- Challenge 15
- Challenge 16
Challenge 9
- Fairly straightforward, just some more programming
Challenge 10: A new block cipher mode, CBC
- CBC or Cipher block chaining is another block cipher mode
- The goal of CBC is to not be deterministic like ECB, ie two identical plaintext blocks should encrypt to different ciphertext blocks
- How does it accomplish this?
- CBC requires an extra initialization vector or IV for short, which is just random bytes the size of a block
- For each plaintext block we want to xor our plaintext with something first, and then encrypt it with our cipher
- For the first block we xor it with the IV before encrypting
- For every other block we xor it with the last ciphertext block
- Essentially “chaining” the different blocks together, so the resulting ciphertext of a block depends on the “sum” of all the previous blocks and the IV
- https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_block_chaining_(CBC) has some very useful diagrams
- Implementing it is fairly straightforward once you understand whats going on
- Decryption is just running this in reverse, block cipher decryption first and then xor
- Note: ECB mode with one block is the same as just running the cipher, so you can reuse that as long as you’re encrypting/decrypting only one block
Challenge 11
- This is a bit of a weird challenge, I’m not really sure what it’s accomplishing?
- I don’t think there’s a good way to detect CBC mode, so I just detected ECB mode and then guessed CBC otherwise
- Again, it only works if you send a plaintext with repeated blocks
- Idk, maybe skip this one, it kinda comes back in challenge 12 but as an “optional” thing
Challenge 12
- Now don’t skip this one though, because it’s really cool (and also shows up in real CTFs!)
- The explanation is pretty good for how to get the first byte of the unknown-string, but the next step and automating it can be a bit tricky
- For the next byte you would first send 14 A’s + the first byte of the unknown string
- The first block would then be aes(“A”s + first byte + second byte)
- Then you want to start sending 13 A’s + first byte + guesses for the second byte
- And then like last time stop when you find a match
- For the next byte you would first send 14 A’s + the first byte of the unknown string
- The congratulations at the bottom is pretty accurate for CTFs too, every once in awhile I see another one of these pop up
Challenge 13: Cut and paste
- This one is fun to figure out, it’s the first time the authors really let you out and try to figure out how to do what it wants
- The title is a big hint, if we could copy paste things, what would we want to cut and replace?
- Given that this is ECB mode, what exactly can we cut and replace? (bits? bytes? characters? blocks? messages?)
Challenge 14
- Some clarification on this challenge, I originally thought the random prefix would be generated each time, a new random prefix of random length each time you touch the oracle
- I honestly don’t really know how one would go about solving that, so I and everyone else who has write ups for cryptopals instead assumed that the random prefix would be generated once and be reused for all later oracle calls
- So now the trouble is really just one thing, how long is that random prefix? How can you figure that out?
Challenge 15
- More programming stuff, this is just setup for challenge 17, which is gonna be a big one
Challenge 16
- Another fun one to figure out
- Try to think about what happens in CBC decryption with the user data block and the block after it
- How can we completely replace the second ciphertext block in a way to make the third block decrypt to what we want, namely “;admin=true;”? What happens to the second plaintext block in that case?
Cryptopals Set 3
Intro
- For me, this is the set where things really started to get fun
- This note at the start made me really hyped and I hope it makes you excited too
- “We've also reached a point in the crypto challenges where all the challenges, with one possible exception, are valuable in breaking real-world crypto.”
Challenge 17
- This is a great challenge
- CBC is (unfortunately) still a relatively popular block cipher mode, and is seen as "the good alternative" to ECB
- This directly builds off of what you learned about CBC decryption in challenge 16, how bit flips in one ciphertext block affect the resulting decryption in the next block
- This shows off how useful these side channel leaks are. A side channel leak is when a system unintentionally reveals information about it's inner workings
- For this challenge, the "decryption server" is leaking the fact that the padding is invalid
- For later challenges you'll see that even leaking how long the code takes to run can be enough to cryptography
- Some other side channels you won't see in cryptopals:
- The amount of power the CPU is using can be used to retrieve AES keys from smart cards
- Using information about the CPU cache to break OpenSSL's carefully implemented RSA algorithm https://web.eecs.umich.edu/~genkin/cachebleed/index.html
- CBC padding oracles also show up from time to time in CTFs, though more rarely due to how famous this attack is
- Here is a good explanation of the attack: https://robertheaton.com/2013/07/29/padding-oracle-attack/
- From my experience this challenge is not necessarily conceptually difficult, but can be a bit tricky to get right
- Start with just finding one byte, then work towards automating it for the rest of the bytes in the block, and then finally for multiple blocks
- Made sure the padding functions from set 2 are fully correct
- A plaintext that is all padding should be accepted
- A plaintext with no padding should be rejected
- Aside: Anyone else feeling that dread after realizing that all it takes to break a perfectly secure algorithm like AES and a reasonably sensible mode like CBC is something as small as having two different error messages for wrong password vs bad padding? Because I definitely was
Assorted guides
Various guides for useful techniques when solving cryptography challenges
z3
z3
is a useful tool for solving sets of constraints, for example when you know the state of the random number generator after n steps and want to easily recover the state from the start
Rust
python
is very convinient for communicating with services and rapidly prototyping but is often too slow for anything beyond a quick bruteforce. rust
is very very fast, supports very easy python interop using pyo3
, and has easy multithreaded iteration using rayon
.
Coming soon
Coming soon
Resources
Basic python tips and tricks
This should cover most of the python api you'll commonly use when solving cryptography problems (except pycryptodome
but we don't need to worry about that for now)
I would recommend having the python REPL open (type python
in the command line) and following along with the code examples
The basics
# The bytes type is usually defined like this
my_bytes = b"this is not a string, but a bytes object!"
list(my_bytes) # converts my_bytes to a list of numbers
bytes(my_list) # converts a list of numbers to a bytes object
A Bytes object is really just a list of numbers, and you can see that by running list(b"my bytes")
or b"my bytes"[0]
. There is one important detail, which is that a number in a byte array is always between 0 and 255, which is 2**8-1. For details, wait til cpsc 121!
In general
- You get bytes objects from reading stuff (ie
open("data","rb").read()
) or converting from other data formats (hex, base64) - You usually want to work with the list of numbers (because you can then map and iterate over it easily)
- You want to convert your final result to a bytes object to print it (because python will then format it nicely for you)
Two more important functions
chr(x) # converts a decimal number x between 0 and 255 to it's character
ord(c) # converts a character c to it's corresponding byte number
ord("a") == 97
chr(97) == "a"
Since all bytes are numbers and we sometimes want bytes to represent characters instead of random numbers, some smart people figured out the ascii table and that's what ord
and chr
use.
Decode and encode
Very similar to chr
and ord
"my string".encode() # converts from a string to bytes
b"my bytes".decode() # converts from bytes to a string
These can be thought of as shorthand for calling ord
/chr
on each character/number in a string/bytes.
How to work with bytes
Your best friends will probably be good ol' for loops
For the imperatively minded, this will print the numbers in the array, so 101
, 121
, 32
...
for b in b"my bytes":
print(b)
For the functionally minded (and those who took 110), python does have maps and filters and such
list(map(lambda x: x+1, b"my bytes"))
My preferred method is using python list comprehensions since they're basically less clunky maps
[x+1 for x in b"my bytes"]
XOR
If you've taken 121 you've seen it before, but it turns out that xor is actually quite important for various parts of cryptography. If you haven't taken 121, you'll want to understand how xor works so google "bitwise XOR". Good googling skills are important!
XOR in python
python has a xor operator, ^
x ^ y # this is x xor y
# x and y have to be numbers
Since ^
only operates on numbers, you have to use one of the above strategies for actually xoring your entire bytes object.
Files in python
This might be a bit tricky if you haven't seen this type of interface before, but here's how you work with files in python
mode = "r" # Read my file as a big string
mode = "rb" # Read my file as a big bytes object
mode = "w" # Write strings to my file
mode = "wb" # Write bytes to my file
f = open("my_filename", mode) # Open "./my_filename" with the given mode
# if you don't input a mode, the mode is assumed to be "r"
# You have to give a filename that already exists to use either r or rb
# But you can use w or wb with a file that doesnt exist, python will create the file for you
# f is a file object, the functions you can call on it are:
f.close() # Closes the file, must always be called or else bad things happen
# =============================================
# if you opened as mode = "r"
f.read() # Returns the entire contents of the file as a big string
f.readlines() # Returns an array of strings where each string is a line from the file
# =============================================
# if you opened as mode "rb"
f.read() # Returns the entire contents of the file as a big bytes object
# =============================================
# if you opened as mode "w"
f.write(my_string) # Writes the string into the file
# =============================================
# if you opened as mode "wb"
f.write(my_bytes) # Writes the bytes into the file
This is only a basic summary, but hopefully it gives you the basic gist for how files work in python. You can read more about them here. If you're curious about why files work like this and why you need to seek around a file then wait til 313! You'll learn all about them there.
Bash tricks
Did you know that you can automatically make any program write to a file instead of printing out it's results?
$ python my_script.py > result.txt
Anything that my_script.py
prints out will be put into result.txt
instead of being shown on the terminal.
For those who really want to write racket in python
Here's some functions to get you started
def cons(x, xs):
return [x] + xs
def first(b):
return b[0]
def rest(b):
return b[1:] # <-- python slice notation
def fn_for_bytes(b):
if len(b) == 0:
return # ...
else:
first_res = # ... fn_for_byte(first(b))
rest_res = # ... fn_for_bytes(rest(b))
result = # ... first_res, rest_res
return result
(PS: Though all the code you'll be presented with will be in an imperative style, so you should get used to the whole imperative thing)
Environment setup
For solving CTF challenges in general, there's a few things that you'll need. Specific instructions for how to install these tools are in their respective OS pages
A Linux shell
While not a strict requirement technically, a linux shell will almost inevetiably become a requirement as you do more CTF challenges. Installing tools, running binaries, debugging, these are all much easier to do with a fully fledged linux shell vs using a shell emulator or something like powershell
The good part is that installing a proper linux environment is now quite easy
- MacOS comes with one pre-installed
- Windows now supports Windows Subsystem for Linux (WSL) v2, which is easy to install and gives you everything you need
Basic tools
A basic suite of tools you'll want are
python
, the programming language of choice for most CTFers. Comes preinstalled on all linux shells- You'll want some additional libraries though, namely
pycryptodome
which implements all the cryptographic algorithms you'll need, andpwntools
which makes connecting to servers programatically easy
- You'll want some additional libraries though, namely
netcat
, a tool for connecting to ports on servers, which is what you'll need to do to connect to challenge instancescurl
, a tool for making http/https requests, useful for challenges that are on websites
Docker
Docker is a tool that lets you package up a "mini vm" of sorts as a list of instructions. This lets challenge authors give you a Dockerfile
and let you rebuild the challenge server for your own local testing, which you'll definitely want to be able to do
Environment setup: Windows
TLDR
- Install Ubuntu WSL2
sudo apt install netcat curl python3
python -m ensurepip --upgrade
python -m pip install pycryptodome pwntools
- Install Docker for windows and enable the wsl2 backend
- Install vscode and enable the
Remote: WSL
extension
Step 1: Install WSL
Microsoft provides very in depth installation instructions on https://docs.microsoft.com/en-us/windows/wsl/install. Ubuntu
is the OS you want if you have no prior experience with Linux
There's additional steps for setting up your installation at https://docs.microsoft.com/en-us/windows/wsl/setup/environment
- This includes information on setting up
docker
andvscode
, both of which you should install in a bit - Optional: WSL can be a bit sluggish with it's default settings, check out https://jade.fyi/blog/development-in-wsl/ for some more tuning
Step 2: Install basic tools
apt
is the package manager for Ubuntu, and it's what you'll use to install and update software for your WSL installation
Run sudo apt install netcat curl python3
in the terminal to install basic tools
In the future, run sudo apt update
and sudo apt upgrade
occasionally to keep everything up to date
Step 3: Install python libraries
pip
is the package manager for Python and you'll use it to install libraries and keep them up to date
virtualenv
creates small environments that you can safely install your tools into without having to worry about it affecting the rest of your system
Run mkdir ctf
to make a folder to put all your ctf related stuff in and cd ctf
to move into it
Run python3 -m venv ctf
to create a virtual environment (this may take awhile) and run source ctf/bin/activate
to activate it. You should now see a little (ctf)
on your terminal prompt
Run python -m ensurepip --upgrade
to install pip
Run python -m pip install pycryptodome pwntools
to install the two libraries you'll need (this may take awhile)
Run deactivate
after you're done to leave the virtual environment. Remember to run source ctf/bin/activate
each time you restart your terminal to enter the venv
Step 4: Install docker and vscode
Visual studio code is a versatile editor with great integration with WSL, I recommend it if you don't have a particular preference for editor https://code.visualstudio.com/
Instructions for installing Docker with the WSL2 backend can be found on https://docs.docker.com/desktop/windows/wsl/
Details for enabling the WSL features
- https://docs.microsoft.com/en-us/windows/wsl/tutorials/wsl-vscode
- https://docs.microsoft.com/en-us/windows/wsl/tutorials/wsl-containers