Faster Poly1305 key multicollisions

It is well known by now that encryption without authentication is insufficient, and many chosen-ciphertext attacks on improperly authenticated ciphertexts are now commonplace. Authenticated encryption—constructions that both encrypt and authenticate plaintexts in one sitting—are widespread at this point, with the two most common instances being AES-GCM and ChaChaPoly1305.

One property that the usual definitions of authenticated encryption do not capture is key commitment: a ciphertext is tied to a particular key, and it should not be possible to create ciphertexts that successfully decrypt under more than one key. Some systems will fail, or have unexpected properties, if their authenticated encryption is not committing; this was the case for Facebook’s message franking, the OPAQUE authenticated key exchange, some AWS and Google services, and more.

Recently, an attack surfaced for some systems that unwittingly relied on key-committing encryption: partitioning oracle attacks. The setup is simple: if the keyspace is known to be relatively small—say, if keys are derived from low-entropy passwords—and it is possible to detect whether a ciphertext was successfully validated or not, it becomes possible to recover the key/password using this oracle. This attack requires being able to generate ciphertexts that successfully decrypt under many different keys. One system where this kind of attack was possible was Shadowsocks proxy servers.

The Len et al. paper introducing partitioning oracle attacks considers generating these “key-colliding” ciphertexts for both AES-GCM and ChaChaPoly1305. In the case of GCM, generating such ciphertexts is asymptotically easy—it reduces to fast polynomial interpolation, which finds a ciphertext colliding for $l$ different keys in $O\left(l \log^2 l\right)$ finite field operations. The authors successfully generated ciphertexts decrypting under $262144$ different keys, and it is practical to go much higher. The case of ChaChaPoly1305 is more problematic, however. The authors, using linear algebra and some light bruteforce, generated ciphertexts that decrypt under up to $10$ different keys.

This post describes how to generate ChaChaPoly1305 ciphertexts (more precisely, Poly1305 inputs) that validate under at least hundreds of keys, reducing it to a well-studied lattice problem. We also argue that it is unlikely to be able to go much further without massive computing power due to the hardness of the very same lattice problems.

Colliding Poly1305 tags

The Poly1305 one-time authenticator takes two $16$-byte keys $(r, s)$1 and given a padded message $(m_0, m_1, \dots, m_{n-1})$2 produces \begin{align}\label{eq:definition} \text{Poly1305}(r, s, m) = \left(\left( \sum_{i=0}^{n-1} m_i r^{n-i+1} \bmod \left(2^{130}-5\right) \right) + s\right) \bmod 2^{128}. \end{align} The goal here is, given $l$ known key pairs $((r_0, s_0), (r_1, s_1), \dots, (r_{l-1}, s_{l-1}))$, to produce a message $m = (m_0, m_1, \dots)$ such that $\text{Poly1305}(r_0, s_0, m) = \dots = \text{Poly1305}(r_{l-1}, s_{l-1}, m)$.

This problem essentially reduces to polynomial interpolation. There are, however, a few complications that must be resolved in order to obtain valid ciphertexts:

  • There are a few ciphertext block choices that are not arbitrary. Namely, the last block must be the encoding of the message length, and cannot be arbitrary.
  • Every input message block is $16$ bytes long3, and must be padded with a 0x01 byte at the 17th byte. This effectively means that every coefficient $m_i$ in (\ref{eq:definition}) must lie in the interval $\left[2^{128}, 2^{129}-1\right]$.

We will now deal with both of these obstructions.

The first obstruction is rather easy to deal with. We begin with the congruence system, directly translating from the definition (\ref{eq:definition}): $$ \begin{align*} \left(\left( \sum_{i=0}^{n-1} m_i r_0^{n-i+1} \bmod (2^{130}-5) \right) + s_0\right) \bmod 2^{128} &= t_{0} \newline \dots \newline \left(\left( \sum_{i=0}^{n-1} m_i r_{l-1}^{n-i+1} \bmod (2^{130}-5) \right) + s_{l-1}\right) \bmod 2^{128} &= t_{l-1}. \end{align*} $$ Since the last block, corresponding to $r_i m_{n-1}$, is fixed, as it is solely dependent on the message length, we can move it to the right side, and obtain the system $$ \begin{align} \label{eq:system} \begin{split} m(r_0) &= \frac{\left(t_{0} - s_{0} \bmod 2^{128}\right) - r_0m_{n-1}}{r_0^2} \pmod{2^{130}-5} \newline \dots \newline m(r_{l-1}) &= \frac{\left(t_{l-1} - s_{l-1} \bmod 2^{128}\right) - r_{l-1}m_{n-1}}{r_{l-1}^2} \pmod{2^{130}-5}, \end{split} \end{align} $$ which is more suitable for interpolation4. As such, we are now ready to move on to the second obstruction—the Poly1305 padding. We split this process into 3 phases: interpolation, redundancy, and optimization.

Interpolation

Solving (\ref{eq:system}) can be accomplished in $O(l \log^2 l)$ field operations with standard interpolation algorithms, and yields a unique solution $m(x)$ of degree $\le l-1$. The chance that $m(x)$ has all of its coefficients in the required interval $\left[2^{128}, 2^{129}-1\right]$ is approximately $\left(\frac{2^{128}}{2^{130}-5}\right)^l \approx 2^{-2l}$. Except for very short degrees, simply hoping for a valid solution is untenable.

Redundancy

The above situation forces us to add redundancy to the solution $m(x)$. Using the correspondence $m(r) = m(x) \bmod (x - r)$ we have that the solution $m$ is the smallest polynomial satisfying $$ \begin{align*} m(x) \bmod (x - r_0) &= b_0 \newline \dots \newline m(x) \bmod (x - r_{l-1}) &= b_{l-1}, \end{align*} $$ where $b_i$ are the right sides of (\ref{eq:system}).

Let $p(x) = (x - r_0)(x - r_1)\dots(x - r_{l-1})$. To add redundancy, we look for a polynomial $m'(x)$ such that $$ \begin{align*} m'(x) \bmod p(x) &= m(x) \newline m'(x) \bmod f(x) &= g(x), \end{align*} $$ where $f(x)$ is an arbitrary monic polynomial of degree $d$, $\gcd(f(x), p(x)) = 1$, and $g(x)$ is an arbitrary polynomial of degree $d-1$. This is directly given by interpolation or the Chinese remainder theorem: $$ \begin{align}\label{eq:split} \begin{split} m'(x) &= m(x)\cdot f(x) \cdot \left(f(x)^{-1} \bmod p(x)\right) \bmod p(x)f(x)\newline &+ g(x)\cdot p(x) \cdot \left(p(x)^{-1} \bmod f(x)\right) \bmod p(x)f(x). \end{split} \end{align} $$ The arbitrary choices of $f(x)$ and $g(x)$ give us essentially unlimited freedom to manipulate the original solution until a suitable modification is reached5.

Optimization

We now have a way to generate an arbitrary number of higher-degree polynomials from a single solution, but making that solution fit the padding constraint remains difficult. Bruteforcing our way through this problem requires exponential time on the number of keys $l$. Our solution sidesteps this problem by using lattice reduction.

Let $m'_a(x)$ be the left side of the sum in (\ref{eq:split}), and $m'_b(x)$ be its right side. Both have degrees bounded by $l + d - 1$. Suppose we fix $f(x)$ to an arbitrary constant value; this results in $m'_a(x)$ being constant, and $m'_b(x)$ being of the form $g(x) \cdot h(x) \bmod p(x)f(x)$, where $h(x)=p(x) \cdot \left(p(x)^{-1} \bmod f(x)\right)$ only depends on $f(x)$ which, again, remains constant. Unpacking the multiplication, we have $$ \begin{align*} m'_b(x) &= g_0\cdot\left( h(x) \bmod p(x)f(x) \right) \newline & + g_1\cdot\left( xh(x) \bmod p(x)f(x) \right) \newline & \dots \newline & + g_{d-1}\cdot\left( x^{d-1}h(x) \bmod p(x)f(x) \right). \end{align*} $$ This set of linear combinations forms a $q$-ary lattice6 generated by the coefficient vectors of $h(x) \bmod p(x)f(x), xh(x) \bmod p(x)f(x), \dots, x^{d-1}h(x) \bmod p(x)f(x)$, that is, $$ \begin{align*} \Lambda_q(g) = \{ z \in \mathbb{Z}^{l+d} : z = A^t\cdot s \pmod{2^{130}-5} \text{ for some } s \in \mathbb{Z}^{d} \}, \end{align*} $$ where, abusing notation, $$ \begin{align*} A =% \begin{pmatrix} h(x) \bmod p(x)f(x) \newline xh(x) \bmod p(x)f(x)\newline \dots \newline x^{d-1}h(x) \bmod p(x)f(x) \end{pmatrix}. \end{align*} $$

This can be turned into a standard integer lattice by adding $l+d$ rows encoding the modulo $q$ equivalence class:

$$ \begin{pmatrix} h(x) \bmod p(x)f(x) \newline xh(x) \bmod p(x)f(x)\newline \dots \newline x^{d-1}h(x) \bmod p(x)f(x) \newline q\cdot I_{l+d} \end{pmatrix}. $$

After echelonization, this lattice basis will be of the form

$$ \begin{align}\label{eq:lattice} \begin{pmatrix} I_{d} & A' \newline 0 & q\cdot I_{l} \end{pmatrix}, \end{align} $$ with $A'$ an $l\times l$ matrix. This form is quite useful to compute the determinant, as will be necessary below.

The problem of finding a properly padded $m'(x)$ then consists of finding a lattice vector (respectively, a polynomial $m'_b$) close to $\left(2^{128}+2^{127}, \dots, 2^{128}+2^{127}\right) - m'_a(x)$, such that $m'_b + m'_a$ is close to the center of the $[2^{128}, 2^{129}-1]$ interval.

It should be pointed out that there is a connection to coding theory here. The basis of the lattice $\Lambda_q(A)$ can also be understood as the generator matrix for a $q$-ary linear code in $\mathbb{F}_q^{l+d}$. Finding the nearest codeword to $\left(2^{128}+2^{127}, \dots, 2^{128}+2^{127}\right) - m'_a(x)$ in the $L_{\infty}$ metric is the coding-theoretic formulation of the same problem. Like finding closest vectors, this problem is very hard in the general case.

The maximum error we are able to tolerate is a distance of $2^{127}$ in each coordinate (i.e., $L_{\infty} \le 2^{127}$). However, when dealing with lattices the $L_2$ norm is most common, and we can bound it instead by $\sqrt{l+d}2^{127}$. Which leads to an important question…

Do close enough vectors exist?

Yes, almost certainly, for sensible choices of $d$.

The volume of the $q$-ary lattice $\Lambda_q(A)$ is given by the determinant of its basis, which from (\ref{eq:lattice}) is easily determined to be $q^{l}$, with $q=2^{130}-5$. By Minkowski’s theorem, the shortest vector in this lattice is bounded by $\sqrt{l+d}q^{l/(l+d)}$, and the Gaussian heuristic tells us that for a random lattice most vectors will be at a distance $\approx \sqrt{\frac{l+d}{2\pi e}} q^{l/(l+d)} $ of the nearest lattice point.

Thus, as long as $\sqrt{l+d}q^{l/(l+d)}$ is much smaller than $\sqrt{l+d}2^{127}$, there is almost certainly a suitable vector close by.

Are such close vectors easy to find?

Mostly, as long as $l+d$ is not too large.

For dimensions $l+d$ up to $60$ or so, it is easy to exactly find closest vectors using, say, fplll or g6k. But for dimensions in the hundreds it quickly becomes unfeasible to find closest vectors, and we need to rely on approximation algorithms. Specifically, we use LLL or BKZ combined with Babai’s algorithm or Kannan’s embedding.

One important parameter of the lattice basis reduction algorithm being used is its root Hermite factor: a lattice basis reduction algorithm with root Hermite factor $\delta$ will yield vectors of length $\le \delta^{l+d} q^{l/(l+d)}$ in the reduced basis. LLL has a proven $\delta = 1.075,$ but in practice it’s closer to $1.021$, according to the experiments of Gama and Nguyen. The following table lists some of the practical $\delta$ factors for various lattice reduction methods:

LLL1.021
BKZ-201.013
BKZ-501.012
BKZ-851.001
BKZ-1101.009

In practice we observed that embedding usually works better than Babai. Embedding reduces a CVP problem to SVP by extending the lattice to $$ \begin{pmatrix} A & 0 \newline t & M \end{pmatrix}, $$ where $A$ is the basis (\ref{eq:lattice}), $t$ is the vector corresponding to $\left(2^{128}+2^{127}, \dots, 2^{128}+2^{127}\right) - m'_a(x)$, and the embedding factor is $M = \frac{q^{l/(l+d)}}{2\delta^{l+d}}$ to maximize the decoding radius, since we are not guaranteed that there is a particularly close vector relatively to the shortest vector in the lattice7.

The most important parameters here are the choice of $d$ and the root Hermite factor $\delta$. While increasing $d$ increases the lattice dimension, it also shortens the expected distance to the closest vector. And the lower the $\delta$, the better the approximation to the closest vector is going to be, but lower $\delta$ generally comes at a computational cost. So a balance must be reached.

For example, let $l = 1000$, and set $d = 300$. Using BKZ-20, we have an expected $1.013^{l+d}q^{l/(l+d)} \approx 2^{124.22}$, much lower than the error margin $\sqrt{l+d}2^{127} \approx 2^{132.17}$. Thus, it is likely that we are able to find a solution under this combination of parameters. Using plain LLL, however, gives us an expected $1.021^{l+d}q^{l/(l+d)} \approx 2^{138.98}$, making it unlikely to succeed in this case.

But running BKZ-20 on such a high dimension is going to be very expensive, so it might be a better choice to switch to LLL and increase $d$ to, say, $600$, in which case we have $1.021^{l+d}q^{l/(l+d)} \approx 2^{129.22}$, which is tight, but within range of being solved.

The following graph plots the (heuristically) minimum necessary degree $d$ to achieve a solution depending on the CVP-approximation algorithm used:

Necessary extra degree for various lattice reduction techniques.

It is clear that with an exact CVP solver the degree $d$ remains low8, but solving the CVP problem is NP-hard. Instead, using LLL or BKZ the approximation factor grows exponentially, and at some point it is no longer possible to find a solution, regardless of how high we set $d$. For LLL, this limit seems to be after $l \approx 1100$; for BKZ-20, around $l \approx 1800$.

It is also important to point out that even LLL, the fastest basis reduction algorithm, is pretty slow: the best variants take time $O((l+d)^4(l+d+\log q)\log q)$, so while it runs in polynomial time, the degree is not very good and reducing, say, a dimension-1000 lattice will take a long time9.

Practical experiments

Now we give a worked Sagemath example of the above method by creating a ciphertext valid for $8$ different keys. First, some setup and parameters:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
from cryptography.hazmat.primitives.ciphers.algorithms import ChaCha20
from cryptography.hazmat.primitives.poly1305 import Poly1305
from cryptography.hazmat.primitives.ciphers import Cipher
from fpylll import IntegerMatrix, CVP

q = 2^130-5
R = GF(q)
P.<x> = R[]
l = 8
d = 1

Then, we recreate the key derivation of $r, s$ within ChaChaPoly1305:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
def generate_keys(l):
	result = []
	for i in range(l):
		k = os.urandom(32)
		chacha = Cipher(ChaCha20(k, b'\x00' * 16), mode=None).encryptor()
		rs = chacha.update(b'\x00'*32)
		r = int.from_bytes(rs[:16], 'little') & 0x0ffffffc0ffffffc0ffffffc0fffffff
		s = int.from_bytes(rs[16:], 'little')
		result.append((k, r, s))
	return result

k, r, s = zip(*generate_keys(l))

Now, generate some arbitrary tag that the ciphertext will have, and obtain the interpolation targets:

1
2
3
4
5
# target common tag 
tag = randint(0, 2^128)
# final polynomial has l+d blocks
E = ((16 * (l+d)) << 64) + 2^128 
t = [ (((tag - s[i]) % 2^128) - E*r[i])/r[i]^2 % q for i in range(l) ]

Now we are ready for the algorithmic part of the process. First, find $m$ by interpolation10:

1
2
# Interpolate
m = P.lagrange_polynomial(zip(r, t))
Having this solution, lift it to a bigger solution we can work with:
1
2
3
4
5
6
7
# CRT
p = prod([ x - ri for ri in r ])
f = x^d + P.random_element(d-1)
assert(gcd(f, p) == 1)
# g = P.random_element(d-1)
ma = m * f * inverse_mod(f, p) % (p*f)
mb =     p * inverse_mod(p, f) % (p*f)
Armed with $m'_a$ and $m'_b$, build the q-ary lattice:
1
2
3
4
5
6
7
# build q-ary lattice
qL = matrix(R, [ list(x^i * mb % (p*f)) for i in range(d) ])
zL = block_matrix(ZZ, 2, 1, [
  qL.echelon_form().change_ring(ZZ), 
  block_matrix(ZZ, 1, 2, [zero_matrix(l, d), identity_matrix(ZZ, l) * q])
])
target = vector([ ZZ((2^128+2^127) - ma[i]) for i in range(l+d) ]) 
Because this is dimension $9$, we can easily find the closest vector exactly, and will do so below.
1
2
L = IntegerMatrix.from_matrix(zL.LLL())
v = CVP.closest_vector(L, target)
Finally, having the closest vector, we now have the solution and may confirm that it does, in fact, work.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
final = vector(ZZ, (vector(v)+vector(ma))%q)
assert( (final - vector([2^128+2^127]*(l+d))).norm() <= sqrt(l+d)*2^127 )
assert( all([ 2^128 <= block < 2^129 for block in final ]) )
finalp = sum([ x^i * final[i] for i in range(l+d)])
assert( all([ finalp(ri) == ti for ri, ti in zip(r, t) ]) )
assert( all([ (ZZ((finalp * x^2 + E*x)(ri)) + si) % 2^128 == tag for ri,si in zip(r,s)]) )
ciphertext = b''.join([ int(block % 2^128).to_bytes(16, 'little') for block in reversed(final) ]) + tag.to_bytes(16, 'little')
for ki in k:
	aead = ChaCha20Poly1305(ki)
	aead.decrypt(b'\x00'*12, ciphertext, None)

While this is a toy example, it highlights all of the necessary components of the process, and generalizes easily to higher dimensions, where the only difference is the usage of LLL and the embedding method.

For a more elaborate example, the following Base64-encoded ChaChaPoly1305 ciphertext, which takes around 15 minutes to generate, most of which spent inside LLL, successfully decrypts under the 256 keys bytes([i])*32 for $0 \le i < 256$ and all-zero nonce:

I4HE6QnUAakXV6eicS1Jbtv5enbEPjymUBwz9b2cXm5NxjW/ld9r7UU9W+aGvXZtQ564Nobw67+x
AiX/t+ynmESCx+GyFD6nn/bMnYJ9o6ra9SXl7NwCHMWRMaBRNPN/i6W6I4Ywiln3+De3O1QCmKSh
0ikxlGdfBwZsxhLuxIF8kgjalYXGsJtHUJ2vUwGEoAkYDrCfimqKHjBCEWP/ig8DgSj3HONPLzq9
KJ2+C2A/eBjhqNfTF1CI8rA66xF7sxT1Kg7YUqhwS8V5bFvXZY/Mfv2cNjLLKPfNNAImkIPTHxav
NuyJo6/zbv4mvjJzVbMYM/MJi7y+kFNdfz8HcSml26NwU/ftyrpR9DnTYmXbC/W50OqG/VpxFpvR
CIee+o7B9f0jW3hZqX9/F/bfZpDYzl9AS6V5Q92ZGxT1YIpqai70ZMo7SnVUXJWOUDJraQlUX4ol
oerj318EHo0gkGY1ag39qk+t/kZXVJI2PJ6GIqlCG3OumxoNGqvq3wmXck2zvzwvmgjuvHPQRV9Z
eWK68X3J6ueqsdiSyCNoRnIea+rRkuW63/5ypXzXhyl2dJIyQlx6xiOp6ZhXxp9tkCwaPIIT36xy
OxaBioW254LYleWlGr9qDo2KhrKR8CR5NUknq7nNYwIU7gyMl74CaTK8RsRf8dfcSXsrg8XGyYqC
hYMv2P/FncMD3CH6PL9q9RXkm80VW51+etdUjMvtkWS76a0TZT3VxToQx5Li0Yg/j7FditPRcSmW
Erj+YDp6BudA5bter4EEqlZpsApyjIUOvf93QeW3qhLfp9V7h3WN260e9+7Pqxv4Gl6T5YiLU+z6
XLqFZiBJJP/BD8qojQCKfYnJiAU5WbF9hARfco7QERkGj9e8gKWgZQwWqch2WtC+rjrdUoKa9YDA
btEiksKGGSxECBCGSoeOYkCioYj3yvsVL/VybHmSeqjhnNV+CIbTC8+0VGks1T99CdAAo01FO4mT
xVywQ4u2QZz0vlv1DaP68WOWSUm811/8EqB7xNIGltL824TDWxzUEO3iimpWEqtFTKpuW15SY/gT
j4hs0O8SmFX27AmxRXE3yT1/ON4dzl+I1rhfTmoKlrzMgWHvhp3navVElF7uzM0HtYHVf/w8bIWK
cySP9EuzsVqWZd5MxUDwT1CmYrP7i7cjqlnwUeIq0ILpdLUWs8/MJHtAWyKNvarJB6or5yfB7G2e
kwFa5mz3K9ghENjo9N6bgZTUhAnb1gboOLYzy/2bJIRimWV436bc46A3oJtAWIlxO2LWgnxcwD6U
SBYThr94gPf4Dg9UbxNBQDTbO1Ng04Ih8dU097SGOGV9/n3Y05N65ruPz3+Ly1E7F4GpojUtcfdf
KuOVWK+vRB/CnmEtGHxJFPR49A2eINaijqdKzA6fpGcc0jRM2gxEn0JotCLPc/Q7ATIQ9/o+KLnW
nXymVo5PrKOaIp2L6MTzw9q0BqZ8R79GgdFYYWzZKmB7G7DQbh/0FPm7gknvznITLRkiPoTL+ImV
8csEfItR5HTnweCeZRvCMjW+cs+COX8UIkUCly3/uATMAla1BmqB3Gxw5I1HHr2INZtCpn8KIkoF
Ja6Fuj1VuoQiBZaiSSp7ynGMYAyyrWoPSQW6sxGgIFy+oZuLwPBV1fEEVuNx9hWlzPCAqpMRKvYP
f18dFhzLav8DbKwxvCf/4agvYPIASpn6e3tofaeT5WYj3kBcjqmbn5CFDtUwBL93EGPkOrCzU1qW
dgdPl/pcPoK6FdqXKQ7MOXSCwC6hzF79vyPhI12qaumF7t+clRyDOgSiy9yYwOe9i8de7p4aX+7D
aZlHInLlqozjaqQc1zxoNFcvLNskxwp4YM3OaHPb2Q1kzce++6onb/sIBWhd44v72qM/IA6cLH9w
KjRVcMEzOUkZrQBFo/uAIdVEo5YRdqzM5mWoJzJnegAFILsdEdbvc0HMpqs4oYdkiKbZP7AnjQ70
pn6qH2SBk0SJMlDrGa5A+9EIqdSVjH6cyoDhkaV6xIaU53DbOXQFJAgS5gaFOr7OrZHdcKdeOpkx
g7MFYFyDIt9cBeCrbyrc+J6FOEzuteh650G1cXK4jQGjO1fgdvISmob10PeKAhQf2e5P6+DPHSZ8
836Bju3Yx37CiftDHiuQwODJR33S6rHLsaVC+rWSlvJ2P1+Nk7gLcki1TmWEoeoQ4VFXeQDsrC9H
O7JGQQCMylL4MKUFGk43bZ6WEWH4S4a2DQx41KvAx6D/ZfaiI9ML7G3ni3MAJ/FCMW363m+KnAp4
HojA9Zi7c/Hn67Ibht7qnKJt+f1mqyW2MmOUyuBsJzw+guRKnywoS8M/7so6U17tbYTpe0FB8pb6
w8b8GnWDYWaagqhdVIAVpdDBaj0PaymBTpbX1/Fw9lXnXRPiKhSob3EkBeK5nE6prIbq0ZxU0oqD
z1R9OPFgwhAMihKtoGT1h13FNxI34Vx9z1i1PFX6N2N1Dp/gIBQ9/FK2wQi9jraHjneeTu1aiRXP
bbvhQg87e+BodmwHCLVeWTz2VbMEwHaB/TvdCLERYvkrsdngQf5cDbkp7B5OOgweyS2jabVNW6hw
KdeK5X6mrR9btBuuO4J1DtkFZ87J2thNwnza6h6BTaBlrmWrSZP1KE/+OdfgeZ65Y4LFcZOYeeEX
eoErc2rgZ+pwSpoTx6D+Zxazi0+HYS8RC6ndOU7uUujIejRFdEFsR/f/FGoc54K7Wi3g9opSHkDK
ab2pcEyfNuak/xKM6EsjgHLEdiHaI7+mIiLld97Jz9p+0pzigHL14whp7oMxeHEDBTyTzcqDIGrc
CSN6NWxOtBDnMlD5ClFIjKnXd9BBn18sDWSIAJ0E7PXbIpOB5HJLHOAQAGB7NvlEXQKOACG+ZA7Q
IXVzNNR2BRcIhQ5K3hbHEwJcH9RoBn2WXJD+ek45Xu0jbHkxSXrUvPiDkEuFSQPbMnlxu6/JklYP
bt9yBog//mtxu9kwhM2NZYjkI36zEqrpmn+97K6GbEeh0S0g+UWupPnvDSwJ1RibX+MOI//8jE77
t7vitNK2moMgACdkPluNqgPF5JMFiJxnTAN8pIY6fY0f01lpFRiol9WbRdN7Fu5O3V5xhtLw9JGk
6Y2n4YREEiryFeqKH4Nfl/+zLdYUaSJ+PILeqYzVbfXyKMXIs3W3JhzDMWQaRXH+oM5UEso6iHyF
2obXb8p7u20xr7i79CigLqQxGLJQluXq6nrJX3odAoqWNYAVZ2s8jATQuLo7PRzFVMpAnHRnICnt
ezj/h2zA7bjtgyOUf0XW9CIuYeiD+dU/9ad3FIl677Pxe+ZJpTwkOuLccIdguWBE6gZu71Q6M1Qt
84OhfbWLi4o3wyfrmuWjGKeNCJsa7F8XN2RIBboPgioU7b5yJTLyrFspNAaUA/7ag0+GewKXc79P
iKl64WiUqe4FzW/EbPwt3Kuoyi4hcqgGIwioUQdHDrv93W0B7XlROPuagCjEC4KR9DmBh4u8HbPM
RqXSb9P2odME2Hc4g76TzZJ09PFFojBIP1dlmod6a13ajWfZvsv50nGg+ItbYTfpKnfy/pHvdZri
5SChGYzhl0Z7vj9xMvLREWpWVF4rOrcQlPqdx/gs+MYUQv0JTIh1c224PvS7LqXDoWINoIxBgdVx
qAnyrk3jvVXDDF3aqflDni/o2albZmuLOruipps0gIoG93FHKggg9Bh6a33C9zmkl1yiKI/FUgM7
/HdikJVMjFeC5PlY1gC502NLIY1Lg4PqyAJOcwHx4P8L+1CRgd9pGC/fBLnMvPCyC/kLOTqxbddQ
kNEchEKiSGsvJFfwN3A+MYg+4trGt9c43fDlp8J1n6d3ejORJARUoXrMMB3Wc8+PviR0btzJvQZB
gctvOXwmZck8gqQGliImjIbs5y+T6aQQ0D/7JDSCW+B2+vadgBOZyoHMNQVRZUYMF3JYDWhFJor9
YGGCmFbhNCgv61N8BRJDzb9e3x/qVPAHcg9EeAY2HJBl0Sc3R2R+zDakSm2oxv1B0dpLzNyKjFP0
/s2FVA1deRiF1mpaEqn6ZIxJgFRs3oP5zlj7F5VUzik7nYJI+j3hIEPDHp/56PW3xehvCFzq4BbD
VZKpetargiQqg1zyFfx0TEkxDO7ySel2qowXXdUU3lYRr9KkeMfXVlaaxTx+QzsBZBZTb12dgBVe
hvNAKo+Fj2Ng0akQCB51/4OfWTZSux38W569Y3YI+iJdB1/FC/21T1B+0HeQLzFUgpziDzBHf3an
PvMgnVFlsrEWZDEdw8iHg8+aXnCm24qKkqzRd+EiM5jURso1ZP1qf7qnIWKRcA+zmiYlH4Vht4f5
M5kw8Sl8+56EPXHCOHls2ml+o9+VvH8w/3USRlMMrCSlJQpxcFFW5uOWafcV5IQfk7OEijZbb9ZZ
bHEi2Hlj1jWSyCr9decjyvxo8u5pgIlVyzCp1SCBcWeOcBNiM5FnI1CHKh5fWczoAWKHjNaMevVn
xMNANvXdZvmnwKZLcU7TwLCNrkLsOsk0mqly3o1vu9OlADjxniAAOi0AONqgDEMM0XRpVLXltZcQ
j9xAdcwdguDAiEgCYOjzmvFvI3h6oevbi5ZEpEFAin9S9BiUZ3GLjalJpF9rHuPTqiOmfFOKLBjf
KfRxmNay8yWxJHmKhZzvmbUDoDSIr7U1HzmKz9VJ+5qE6rKAVir9g9MjoFZ8Jin8bKrtKVdtfF2F
mowluZXDKO5EsmYoHcKIQoB6dzSnb8Iju0LPHxIqmMBwavgag0UVW33ZyAH1mFdQZ2H3Uo6b7aGD
8d6NtKQu+EuUc5Anb63YTKCp0g0jqR5lf+QjOS8d1c33wdzzYUr81GPZQhoL9FxMM5T/D0vKWbZ5
cdK28a50Hgw+Hc1QkOKuZ3C88L9movKe+W9RdBGARn1psZ55jN/LTYp6TRKHoZR3apl71t4LF9wf
ZBVbdvsBjCijTwmlGo+pH4awTIiWYHrm2zxipJ6mZCL90QwTXFltOzeAonEsOYODSe8O/B9clwuW
/fKMjkI3HBIWun2HZZEwryAbQab4RshQxQUGM/WPY0H7cjbYVGOwUbMhWqYkjaMc8u/5/GJhrIhX
Xx0esm0leO4xgX2rUiA77rE/QHlZDm/ShxHdYaVNbrp8aQ5HXeCx7ywowvrff/PRGwMGDXXtQ+A/
cLcWqurrqaGq8WuP3dzGLogxTtUG4j/AuHyjc0csDF9d2vdWDwkCXzccd6wTR250G9nAmVsnXgcT
KtCdwl12CuyIHcPQRz/Qql90ZyLdvDmPIUWGSsUeuGFR7JI51ZNRxX8iH1TvEYZEmR+SyCG7Ay83
4RIDE4MuNUGhVJsYg+Rxw22dc+6V4dlVY2Ey+XV67Uoa9NhZQ7bE2HCJ5zthKjdvm5frUo1MuhJC
ccpjJhF7QqJMPgdIMpkGM3wnnj9dEEaDqCDBWWjVg1qc1eX3ARzCkBEOjPlSpsuvc7nx2vnS/lz0
Vy8BJp3cDGqzAUsR5IKe4etSAnyANcqLPbK6sUuUGQHIcbeMgjeYbnRZSZYZUBSrZQdwF1BzaYq1
F0pHlfqpEztVtUtIVC6LwJL5XGtO3bLC7dumnylwfY3Ghd1nXlYJcT4NOdPzC2/EKk8yNpDs2w0R
GeeKAgmEzEOrPbZRLXDkSxZ6nBCsh347pBmrwgdRu7/1N7BWHI0zRRB4bxtw2D/6gSnTO+OJWxLD
3zMM2JpsmHGH2lYudJ4uHrRfWhpYJjgTNag1l4ArkwKqrleXz5RIhmId+2yFg4i7PV0jPH3JXA7U
II/4isLAwMHW1sbYYseavHetN4lOe+pVn6uqfvi0fG9RgiN8nTDEww+g315EX045APXmeDDoLhoM
EVfwhlnW5nqAbV6nCIe4X8rB+9ZoHN3IkVKcOUHp7zHR9t4bPMI1ZhivdtsLvPfAW3372wBDEyZ3
uGvyxFigFnKwLXbRg33Q+NeGMBgWR6REuDpKl/3Qg+bqyA==

Going even further, we also generated a longer message validating under 512 keys11 and the all-zero nonce. The poor scaling of LLL is more noticeable here: we went from from $l = 256, d = 23$ in 15 minutes to $l = 512, d = 94$ in approximately 48 hours, both on a single core.

Conclusion

It is not every day that we stumble upon a problem in symmetric cryptography where lattices will be of use. Lattices and lattice reduction are, of course, more often associated with either breaking public-key cryptography, such as various textbook RSA implementation mistakes, (EC)DSA nonce leaks, or building public-key primitives, most notably the NTRU and (Ring-)LWE public-key encryption, both of which being now featured in round-3 of NIST’s post-quantum competition.

As you can see, figuring out how to efficiently—or, rather, as efficiently as possible—generate constrained polynomials usable by a partitioning oracle attack on Poly1305 can be a fun and enlightening exercise.


  1. In ChaChaPoly1305 the keys $(r, s)$ are derived using Chacha20; we skip past these details, as they are not too important for the exposition. But the proof of concept code does take this into account. ↩︎

  2. In the case of ChaChaPoly1305 the message is of the form $\text{pad}(a) | \text{pad}(m) | \text{pad}(\text{encode}(|a|,|m|))$, where $\text{pad}(\cdot)$ splits its input into 16-byte blocks and extends each block to 17 bytes by appending one 0x01 byte and as many 0x00 bytes as necessary. Here we ignore the existence of the associated data $a$; it’s assumed to be empty. ↩︎

  3. The last message block is an exception to this, but to keep things simple we will assume that the message length is always a multiple of $16$ bytes. ↩︎

  4. There is an implicit switch in coefficient order here, since the most significant coefficient was $m_0$ before, and is now $m_{n-2}$, but this is purely a matter of encoding. ↩︎

  5. In hindsight, a polynomial of the form $m'(x) = m(x) + p(x)f(x)$ for arbitrary $f(x)$ of degree $d$ would have been simpler, and would have resulted in the same lattices as here. ↩︎

  6. This lattice is quite similar to the typical LWE lattice. More precisely, it is reminiscent of Ring-LWE, with the caveat that our modulus is random, whereas the likes of Ring-LWE and NTRU use nice structured moduli of the form $x^{2^n} + 1$ or $x^n - 1$. ↩︎

  7. In the LWE setting this is usually framed as a reduction from BDD to uSVP, that is, a bounded-distance decoding problem into the unique SVP problem. But since we don’t have a promise here that there is a disproportionately close vector to our target (relative to the length of the shortest vectors), this sort of reduction does not apply. ↩︎

  8. Indeed, the necessary $d$ required with an exact CVP oracle is very close to the $d = \lceil l/64 \rceil$ that would be expected from a bruteforce search. ↩︎

  9. The recent asymptotically faster LLL-type reduction of Kirchner et al. show promise to enable large-dimensional LLL reductions. ↩︎

  10. We use Sage’s quadratic interpolation algorithm here, as the number of points is negligible. ↩︎

  11. Base64-encoded, one per line. ↩︎