It is well known by now that encryption without authentication is insufficient, and many chosenciphertext attacks on improperly authenticated ciphertexts are now commonplace. Authenticated encryption—constructions that both encrypt and authenticate plaintexts in one sitting—are widespread at this point, with the two most common instances being AESGCM and ChaChaPoly1305.
One property that the usual definitions of authenticated encryption do not capture is key commitment: a ciphertext is tied to a particular key, and it should not be possible to create ciphertexts that successfully decrypt under more than one key. Some systems will fail, or have unexpected properties, if their authenticated encryption is not committing; this was the case for Facebook’s message franking, the OPAQUE authenticated key exchange, some AWS and Google services, and more.
Recently, an attack surfaced for some systems that unwittingly relied on keycommitting encryption: partitioning oracle attacks. The setup is simple: if the keyspace is known to be relatively small—say, if keys are derived from lowentropy passwords—and it is possible to detect whether a ciphertext was successfully validated or not, it becomes possible to recover the key/password using this oracle. This attack requires being able to generate ciphertexts that successfully decrypt under many different keys. One system where this kind of attack was possible was Shadowsocks proxy servers.
The Len et al. paper introducing partitioning oracle attacks considers generating these “keycolliding” ciphertexts for both AESGCM and ChaChaPoly1305. In the case of GCM, generating such ciphertexts is asymptotically easy—it reduces to fast polynomial interpolation, which finds a ciphertext colliding for $l$ different keys in $O\left(l \log^2 l\right)$ finite field operations. The authors successfully generated ciphertexts decrypting under $262144$ different keys, and it is practical to go much higher. The case of ChaChaPoly1305 is more problematic, however. The authors, using linear algebra and some light bruteforce, generated ciphertexts that decrypt under up to $10$ different keys.
This post describes how to generate ChaChaPoly1305 ciphertexts (more precisely, Poly1305 inputs) that validate under at least hundreds of keys, reducing it to a wellstudied lattice problem. We also argue that it is unlikely to be able to go much further without massive computing power due to the hardness of the very same lattice problems.
Colliding Poly1305 tags
The Poly1305 onetime authenticator takes two $16$byte keys $(r, s)$^{1} and given a padded message $(m_0, m_1, \dots, m_{n1})$^{2} produces \begin{align}\label{eq:definition} \text{Poly1305}(r, s, m) = \left(\left( \sum_{i=0}^{n1} m_i r^{ni+1} \bmod \left(2^{130}5\right) \right) + s\right) \bmod 2^{128}. \end{align} The goal here is, given $l$ known key pairs $((r_0, s_0), (r_1, s_1), \dots, (r_{l1}, s_{l1}))$, to produce a message $m = (m_0, m_1, \dots)$ such that $\text{Poly1305}(r_0, s_0, m) = \dots = \text{Poly1305}(r_{l1}, s_{l1}, m)$.
This problem essentially reduces to polynomial interpolation. There are, however, a few complications that must be resolved in order to obtain valid ciphertexts:
 There are a few ciphertext block choices that are not arbitrary. Namely, the last block must be the encoding of the message length, and cannot be arbitrary.
 Every input message block is $16$ bytes long^{3}, and must be padded with a
0x01
byte at the 17th byte. This effectively means that every coefficient $m_i$ in (\ref{eq:definition}) must lie in the interval $\left[2^{128}, 2^{129}1\right]$.
We will now deal with both of these obstructions.
The first obstruction is rather easy to deal with. We begin with the congruence system, directly translating from the definition (\ref{eq:definition}): $$ \begin{align*} \left(\left( \sum_{i=0}^{n1} m_i r_0^{ni+1} \bmod (2^{130}5) \right) + s_0\right) \bmod 2^{128} &= t_{0} \newline \dots \newline \left(\left( \sum_{i=0}^{n1} m_i r_{l1}^{ni+1} \bmod (2^{130}5) \right) + s_{l1}\right) \bmod 2^{128} &= t_{l1}. \end{align*} $$ Since the last block, corresponding to $r_i m_{n1}$, is fixed, as it is solely dependent on the message length, we can move it to the right side, and obtain the system $$ \begin{align} \label{eq:system} \begin{split} m(r_0) &= \frac{\left(t_{0}  s_{0} \bmod 2^{128}\right)  r_0m_{n1}}{r_0^2} \pmod{2^{130}5} \newline \dots \newline m(r_{l1}) &= \frac{\left(t_{l1}  s_{l1} \bmod 2^{128}\right)  r_{l1}m_{n1}}{r_{l1}^2} \pmod{2^{130}5}, \end{split} \end{align} $$ which is more suitable for interpolation^{4}. As such, we are now ready to move on to the second obstruction—the Poly1305 padding. We split this process into 3 phases: interpolation, redundancy, and optimization.
Interpolation
Solving (\ref{eq:system}) can be accomplished in $O(l \log^2 l)$ field operations with standard interpolation algorithms, and yields a unique solution $m(x)$ of degree $\le l1$. The chance that $m(x)$ has all of its coefficients in the required interval $\left[2^{128}, 2^{129}1\right]$ is approximately $\left(\frac{2^{128}}{2^{130}5}\right)^l \approx 2^{2l}$. Except for very short degrees, simply hoping for a valid solution is untenable.
Redundancy
The above situation forces us to add redundancy to the solution $m(x)$. Using the correspondence $m(r) = m(x) \bmod (x  r)$ we have that the solution $m$ is the smallest polynomial satisfying $$ \begin{align*} m(x) \bmod (x  r_0) &= b_0 \newline \dots \newline m(x) \bmod (x  r_{l1}) &= b_{l1}, \end{align*} $$ where $b_i$ are the right sides of (\ref{eq:system}).
Let $p(x) = (x  r_0)(x  r_1)\dots(x  r_{l1})$. To add redundancy, we look for a polynomial $m’(x)$ such that $$ \begin{align*} m’(x) \bmod p(x) &= m(x) \newline m’(x) \bmod f(x) &= g(x), \end{align*} $$ where $f(x)$ is an arbitrary monic polynomial of degree $d$, $\gcd(f(x), p(x)) = 1$, and $g(x)$ is an arbitrary polynomial of degree $d1$. This is directly given by interpolation or the Chinese remainder theorem: $$ \begin{align}\label{eq:split} \begin{split} m’(x) &= m(x)\cdot f(x) \cdot \left(f(x)^{1} \bmod p(x)\right) \bmod p(x)f(x)\newline &+ g(x)\cdot p(x) \cdot \left(p(x)^{1} \bmod f(x)\right) \bmod p(x)f(x). \end{split} \end{align} $$ The arbitrary choices of $f(x)$ and $g(x)$ give us essentially unlimited freedom to manipulate the original solution until a suitable modification is reached.
Optimization
We now have a way to generate an arbitrary number of higherdegree polynomials from a single solution, but making that solution fit the padding constraint remains difficult. Bruteforcing our way through this problem requires exponential time on the number of keys $l$. Our solution sidesteps this problem by using lattice reduction.
Let $m’_a(x)$ be the left side of the sum in (\ref{eq:split}), and $m’_b(x)$ be its right side. Both have degrees bounded by $l + d  1$. Suppose we fix $f(x)$ to an arbitrary constant value; this results in $m’_a(x)$ being constant, and $m’_b(x)$ being of the form $g(x) \cdot h(x) \bmod p(x)f(x)$, where $h(x)=p(x) \cdot \left(p(x)^{1} \bmod f(x)\right)$ only depends on $f(x)$ which, again, remains constant. Unpacking the multiplication, we have $$ \begin{align*} m’_b(x) &= g_0\cdot\left( h(x) \bmod p(x)f(x) \right) \newline & + g_1\cdot\left( xh(x) \bmod p(x)f(x) \right) \newline & \dots \newline & + g_{d1}\cdot\left( x^{d1}h(x) \bmod p(x)f(x) \right). \end{align*} $$ This set of linear combinations forms a $q$ary lattice^{5} generated by the coefficient vectors of $h(x) \bmod p(x)f(x), xh(x) \bmod p(x)f(x), \dots, x^{d1}h(x) \bmod p(x)f(x)$, that is, $$ \begin{align*} \Lambda_q(g) = \{ z \in \mathbb{Z}^{l+d} : z = A^t\cdot s \pmod{2^{130}5} \text{ for some } s \in \mathbb{Z}^{d} \}, \end{align*} $$ where, abusing notation, $$ \begin{align*} A =% \begin{pmatrix} h(x) \bmod p(x)f(x) \newline xh(x) \bmod p(x)f(x)\newline \dots \newline x^{d1}h(x) \bmod p(x)f(x) \end{pmatrix}. \end{align*} $$
This can be turned into a standard integer lattice by adding $l+d$ rows encoding the modulo $q$ equivalence class:
$$ \begin{pmatrix} h(x) \bmod p(x)f(x) \newline xh(x) \bmod p(x)f(x)\newline \dots \newline x^{d1}h(x) \bmod p(x)f(x) \newline q\cdot I_{l+d} \end{pmatrix}. $$
After echelonization, this lattice basis will be of the form
$$ \begin{align}\label{eq:lattice} \begin{pmatrix} I_{d} & A’ \newline 0 & q\cdot I_{l} \end{pmatrix}, \end{align} $$ with $A'$ an $l\times l$ matrix. This form is quite useful to compute the determinant, as will be necessary below.
The problem of finding a properly padded $m’(x)$ then consists of finding a lattice vector (respectively, a polynomial $m’_b$) close to $\left(2^{128}+2^{127}, \dots, 2^{128}+2^{127}\right)  m’_a(x)$, such that $m’_b + m’_a$ is close to the center of the $[2^{128}, 2^{129}1]$ interval.
It should be pointed out that there is a connection to coding theory here. The basis of the lattice $\Lambda_q(A)$ can also be understood as the generator matrix for a $q$ary linear code in $\mathbb{F}_q^{l+d}$. Finding the nearest codeword to $\left(2^{128}+2^{127}, \dots, 2^{128}+2^{127}\right)  m’_a(x)$ in the $L_{\infty}$ metric is the codingtheoretic formulation of the same problem. Like finding closest vectors, this problem is very hard in the general case.
The maximum error we are able to tolerate is a distance of $2^{127}$ in each coordinate (i.e., $L_{\infty} \le 2^{127}$). However, when dealing with lattices the $L_2$ norm is most common, and we can bound it instead by $\sqrt{l+d}2^{127}$. Which leads to an important question…
Do close enough vectors exist?
Yes, almost certainly, for sensible choices of $d$.
The volume of the $q$ary lattice $\Lambda_q(A)$ is given by the determinant of its basis, which from (\ref{eq:lattice}) is easiy determined to be $q^{l}$, with $q=2^{130}5$. By Minkowski’s theorem, the shortest vector in this lattice is bounded by $\sqrt{l+d}q^{l/(l+d)}$, and the Gaussian heuristic tells us that for a random lattice most vectors will be at a distance $\approx \sqrt{\frac{l+d}{2\pi e}} q^{l/(l+d)} $ of the nearest lattice point.
Thus, as long as $\sqrt{l+d}q^{l/(l+d)}$ is much smaller than $\sqrt{l+d}2^{127}$, there is almost certainly a suitable vector close by.
Are such close vectors easy to find?
Mostly, as long as $l+d$ is not too large.
For dimensions $l+d$ up to $60$ or so, it is easy to exactly find closest vectors using, say, fplll or g6k. But for dimensions in the hundreds it quickly becomes unfeasible to find closest vectors, and we need to rely on approximation algorithms. Specifically, we use LLL or BKZ combined with Babai’s algorithm or Kannan’s embedding.
One important parameter of the lattice basis reduction algorithm being used is its root Hermite factor: a lattice basis reduction algorithm with root Hermite factor $\delta$ will yield vectors of length $\le \delta^{l+d} q^{l/(l+d)}$ in the reduced basis. LLL has a proven $\delta = 1.075,$ but in practice it’s closer to $1.021$, according to the experiments of Gama and Nguyen. The following table lists some of the practical $\delta$ factors for various lattice reduction methods:
LLL  1.021 
BKZ20  1.013 
BKZ50  1.012 
BKZ85  1.001 
BKZ110  1.009 
In practice we observed that embedding usually works better than Babai. Embedding reduces a CVP problem to SVP by extending the lattice to $$ \begin{pmatrix} A & 0 \newline t & M \end{pmatrix}, $$ where $A$ is the basis (\ref{eq:lattice}), $t$ is the vector corresponding to $\left(2^{128}+2^{127}, \dots, 2^{128}+2^{127}\right)  m’_a(x)$, and the embedding factor is $M = \frac{q^{l/(l+d)}}{2\delta^{l+d}}$ to maximize the decoding radius, since we are not guaranteed that there is a particularly close vector relatively to the shortest vector in the lattice^{6}.
The most important parameters here are the choice of $d$ and the root Hermite factor $\delta$. While increasing $d$ increases the lattice dimension, it also shortens the expected distance to the closest vector. And the lower the $\delta$, the better the approximation to the closest vector is going to be, but lower $\delta$ generally comes at a computational cost. So a balance must be reached.
For example, let $l = 1000$, and set $d = 300$. Using BKZ20, we have an expected $1.013^{l+d}q^{l/(l+d)} \approx 2^{124.22}$, much lower than the error margin $\sqrt{l+d}2^{127} \approx 2^{132.17}$. Thus, it is likely that we are able to find a solution under this combination of parameters. Using plain LLL, however, gives us an expected $1.021^{l+d}q^{l/(l+d)} \approx 2^{138.98}$, making it unlikely to succeed in this case.
But running BKZ20 on such a high dimension is going to be very expensive, so it might be a better choice to switch to LLL and increase $d$ to, say, $600$, in which case we have $1.021^{l+d}q^{l/(l+d)} \approx 2^{129.22}$, which is tight, but within range of being solved.
The following graph plots the (heuristically) minimum necessary degree $d$ to achieve a solution depending on the CVPapproximation algorithm used:
It is clear that with an exact CVP solver the degree $d$ remains low^{7}, but solving the CVP problem is NPhard. Instead, using LLL or BKZ the approximation factor grows exponentially, and at some point it is no longer possible to find a solution, regardless of how high we set $d$. For LLL, this limit seems to be after $l \approx 1100$; for BKZ20, around $l \approx 1800$.
It is also important to point out that even LLL, the fastest basis reduction algorithm, is pretty slow: the best variants take time $O((l+d)^5(l+d+\log q)\log q)$, so while it runs in polynomial time, the degree is not very good and reducing, say, a dimension1000 lattice will take a long time.
Practical experiments
Now we give a worked Sagemath example of the above method by creating a ciphertext valid for $8$ different keys. First, some setup and parameters:


Then, we recreate the key derivation of $r, s$ within ChaChaPoly1305:


Now, generate some arbitrary tag that the ciphertext will have, and obtain the interpolation targets:


Now we are ready for the algorithmic part of the process. First, find $m$ by interpolation^{8}:










While this is a toy example, it highlights all of the necessary components of the process, and generalizes easily to higher dimensions, where the only difference is the usage of LLL and the embedding method.
For a more elaborate example, the following Base64encoded ChaChaPoly1305 ciphertext, which takes around 15 minutes to generate, most of which spent inside LLL, successfully decrypts under the 256 keys bytes([i])*32
for $0 \le i < 256$ and allzero nonce:
I4HE6QnUAakXV6eicS1Jbtv5enbEPjymUBwz9b2cXm5NxjW/ld9r7UU9W+aGvXZtQ564Nobw67+x AiX/t+ynmESCx+GyFD6nn/bMnYJ9o6ra9SXl7NwCHMWRMaBRNPN/i6W6I4Ywiln3+De3O1QCmKSh 0ikxlGdfBwZsxhLuxIF8kgjalYXGsJtHUJ2vUwGEoAkYDrCfimqKHjBCEWP/ig8DgSj3HONPLzq9 KJ2+C2A/eBjhqNfTF1CI8rA66xF7sxT1Kg7YUqhwS8V5bFvXZY/Mfv2cNjLLKPfNNAImkIPTHxav NuyJo6/zbv4mvjJzVbMYM/MJi7y+kFNdfz8HcSml26NwU/ftyrpR9DnTYmXbC/W50OqG/VpxFpvR CIee+o7B9f0jW3hZqX9/F/bfZpDYzl9AS6V5Q92ZGxT1YIpqai70ZMo7SnVUXJWOUDJraQlUX4ol oerj318EHo0gkGY1ag39qk+t/kZXVJI2PJ6GIqlCG3OumxoNGqvq3wmXck2zvzwvmgjuvHPQRV9Z eWK68X3J6ueqsdiSyCNoRnIea+rRkuW63/5ypXzXhyl2dJIyQlx6xiOp6ZhXxp9tkCwaPIIT36xy OxaBioW254LYleWlGr9qDo2KhrKR8CR5NUknq7nNYwIU7gyMl74CaTK8RsRf8dfcSXsrg8XGyYqC hYMv2P/FncMD3CH6PL9q9RXkm80VW51+etdUjMvtkWS76a0TZT3VxToQx5Li0Yg/j7FditPRcSmW Erj+YDp6BudA5bter4EEqlZpsApyjIUOvf93QeW3qhLfp9V7h3WN260e9+7Pqxv4Gl6T5YiLU+z6 XLqFZiBJJP/BD8qojQCKfYnJiAU5WbF9hARfco7QERkGj9e8gKWgZQwWqch2WtC+rjrdUoKa9YDA btEiksKGGSxECBCGSoeOYkCioYj3yvsVL/VybHmSeqjhnNV+CIbTC8+0VGks1T99CdAAo01FO4mT xVywQ4u2QZz0vlv1DaP68WOWSUm811/8EqB7xNIGltL824TDWxzUEO3iimpWEqtFTKpuW15SY/gT j4hs0O8SmFX27AmxRXE3yT1/ON4dzl+I1rhfTmoKlrzMgWHvhp3navVElF7uzM0HtYHVf/w8bIWK cySP9EuzsVqWZd5MxUDwT1CmYrP7i7cjqlnwUeIq0ILpdLUWs8/MJHtAWyKNvarJB6or5yfB7G2e kwFa5mz3K9ghENjo9N6bgZTUhAnb1gboOLYzy/2bJIRimWV436bc46A3oJtAWIlxO2LWgnxcwD6U SBYThr94gPf4Dg9UbxNBQDTbO1Ng04Ih8dU097SGOGV9/n3Y05N65ruPz3+Ly1E7F4GpojUtcfdf KuOVWK+vRB/CnmEtGHxJFPR49A2eINaijqdKzA6fpGcc0jRM2gxEn0JotCLPc/Q7ATIQ9/o+KLnW nXymVo5PrKOaIp2L6MTzw9q0BqZ8R79GgdFYYWzZKmB7G7DQbh/0FPm7gknvznITLRkiPoTL+ImV 8csEfItR5HTnweCeZRvCMjW+cs+COX8UIkUCly3/uATMAla1BmqB3Gxw5I1HHr2INZtCpn8KIkoF Ja6Fuj1VuoQiBZaiSSp7ynGMYAyyrWoPSQW6sxGgIFy+oZuLwPBV1fEEVuNx9hWlzPCAqpMRKvYP f18dFhzLav8DbKwxvCf/4agvYPIASpn6e3tofaeT5WYj3kBcjqmbn5CFDtUwBL93EGPkOrCzU1qW dgdPl/pcPoK6FdqXKQ7MOXSCwC6hzF79vyPhI12qaumF7t+clRyDOgSiy9yYwOe9i8de7p4aX+7D aZlHInLlqozjaqQc1zxoNFcvLNskxwp4YM3OaHPb2Q1kzce++6onb/sIBWhd44v72qM/IA6cLH9w KjRVcMEzOUkZrQBFo/uAIdVEo5YRdqzM5mWoJzJnegAFILsdEdbvc0HMpqs4oYdkiKbZP7AnjQ70 pn6qH2SBk0SJMlDrGa5A+9EIqdSVjH6cyoDhkaV6xIaU53DbOXQFJAgS5gaFOr7OrZHdcKdeOpkx g7MFYFyDIt9cBeCrbyrc+J6FOEzuteh650G1cXK4jQGjO1fgdvISmob10PeKAhQf2e5P6+DPHSZ8 836Bju3Yx37CiftDHiuQwODJR33S6rHLsaVC+rWSlvJ2P1+Nk7gLcki1TmWEoeoQ4VFXeQDsrC9H O7JGQQCMylL4MKUFGk43bZ6WEWH4S4a2DQx41KvAx6D/ZfaiI9ML7G3ni3MAJ/FCMW363m+KnAp4 HojA9Zi7c/Hn67Ibht7qnKJt+f1mqyW2MmOUyuBsJzw+guRKnywoS8M/7so6U17tbYTpe0FB8pb6 w8b8GnWDYWaagqhdVIAVpdDBaj0PaymBTpbX1/Fw9lXnXRPiKhSob3EkBeK5nE6prIbq0ZxU0oqD z1R9OPFgwhAMihKtoGT1h13FNxI34Vx9z1i1PFX6N2N1Dp/gIBQ9/FK2wQi9jraHjneeTu1aiRXP bbvhQg87e+BodmwHCLVeWTz2VbMEwHaB/TvdCLERYvkrsdngQf5cDbkp7B5OOgweyS2jabVNW6hw KdeK5X6mrR9btBuuO4J1DtkFZ87J2thNwnza6h6BTaBlrmWrSZP1KE/+OdfgeZ65Y4LFcZOYeeEX eoErc2rgZ+pwSpoTx6D+Zxazi0+HYS8RC6ndOU7uUujIejRFdEFsR/f/FGoc54K7Wi3g9opSHkDK ab2pcEyfNuak/xKM6EsjgHLEdiHaI7+mIiLld97Jz9p+0pzigHL14whp7oMxeHEDBTyTzcqDIGrc CSN6NWxOtBDnMlD5ClFIjKnXd9BBn18sDWSIAJ0E7PXbIpOB5HJLHOAQAGB7NvlEXQKOACG+ZA7Q IXVzNNR2BRcIhQ5K3hbHEwJcH9RoBn2WXJD+ek45Xu0jbHkxSXrUvPiDkEuFSQPbMnlxu6/JklYP bt9yBog//mtxu9kwhM2NZYjkI36zEqrpmn+97K6GbEeh0S0g+UWupPnvDSwJ1RibX+MOI//8jE77 t7vitNK2moMgACdkPluNqgPF5JMFiJxnTAN8pIY6fY0f01lpFRiol9WbRdN7Fu5O3V5xhtLw9JGk 6Y2n4YREEiryFeqKH4Nfl/+zLdYUaSJ+PILeqYzVbfXyKMXIs3W3JhzDMWQaRXH+oM5UEso6iHyF 2obXb8p7u20xr7i79CigLqQxGLJQluXq6nrJX3odAoqWNYAVZ2s8jATQuLo7PRzFVMpAnHRnICnt ezj/h2zA7bjtgyOUf0XW9CIuYeiD+dU/9ad3FIl677Pxe+ZJpTwkOuLccIdguWBE6gZu71Q6M1Qt 84OhfbWLi4o3wyfrmuWjGKeNCJsa7F8XN2RIBboPgioU7b5yJTLyrFspNAaUA/7ag0+GewKXc79P iKl64WiUqe4FzW/EbPwt3Kuoyi4hcqgGIwioUQdHDrv93W0B7XlROPuagCjEC4KR9DmBh4u8HbPM RqXSb9P2odME2Hc4g76TzZJ09PFFojBIP1dlmod6a13ajWfZvsv50nGg+ItbYTfpKnfy/pHvdZri 5SChGYzhl0Z7vj9xMvLREWpWVF4rOrcQlPqdx/gs+MYUQv0JTIh1c224PvS7LqXDoWINoIxBgdVx qAnyrk3jvVXDDF3aqflDni/o2albZmuLOruipps0gIoG93FHKggg9Bh6a33C9zmkl1yiKI/FUgM7 /HdikJVMjFeC5PlY1gC502NLIY1Lg4PqyAJOcwHx4P8L+1CRgd9pGC/fBLnMvPCyC/kLOTqxbddQ kNEchEKiSGsvJFfwN3A+MYg+4trGt9c43fDlp8J1n6d3ejORJARUoXrMMB3Wc8+PviR0btzJvQZB gctvOXwmZck8gqQGliImjIbs5y+T6aQQ0D/7JDSCW+B2+vadgBOZyoHMNQVRZUYMF3JYDWhFJor9 YGGCmFbhNCgv61N8BRJDzb9e3x/qVPAHcg9EeAY2HJBl0Sc3R2R+zDakSm2oxv1B0dpLzNyKjFP0 /s2FVA1deRiF1mpaEqn6ZIxJgFRs3oP5zlj7F5VUzik7nYJI+j3hIEPDHp/56PW3xehvCFzq4BbD VZKpetargiQqg1zyFfx0TEkxDO7ySel2qowXXdUU3lYRr9KkeMfXVlaaxTx+QzsBZBZTb12dgBVe hvNAKo+Fj2Ng0akQCB51/4OfWTZSux38W569Y3YI+iJdB1/FC/21T1B+0HeQLzFUgpziDzBHf3an PvMgnVFlsrEWZDEdw8iHg8+aXnCm24qKkqzRd+EiM5jURso1ZP1qf7qnIWKRcA+zmiYlH4Vht4f5 M5kw8Sl8+56EPXHCOHls2ml+o9+VvH8w/3USRlMMrCSlJQpxcFFW5uOWafcV5IQfk7OEijZbb9ZZ bHEi2Hlj1jWSyCr9decjyvxo8u5pgIlVyzCp1SCBcWeOcBNiM5FnI1CHKh5fWczoAWKHjNaMevVn xMNANvXdZvmnwKZLcU7TwLCNrkLsOsk0mqly3o1vu9OlADjxniAAOi0AONqgDEMM0XRpVLXltZcQ j9xAdcwdguDAiEgCYOjzmvFvI3h6oevbi5ZEpEFAin9S9BiUZ3GLjalJpF9rHuPTqiOmfFOKLBjf KfRxmNay8yWxJHmKhZzvmbUDoDSIr7U1HzmKz9VJ+5qE6rKAVir9g9MjoFZ8Jin8bKrtKVdtfF2F mowluZXDKO5EsmYoHcKIQoB6dzSnb8Iju0LPHxIqmMBwavgag0UVW33ZyAH1mFdQZ2H3Uo6b7aGD 8d6NtKQu+EuUc5Anb63YTKCp0g0jqR5lf+QjOS8d1c33wdzzYUr81GPZQhoL9FxMM5T/D0vKWbZ5 cdK28a50Hgw+Hc1QkOKuZ3C88L9movKe+W9RdBGARn1psZ55jN/LTYp6TRKHoZR3apl71t4LF9wf ZBVbdvsBjCijTwmlGo+pH4awTIiWYHrm2zxipJ6mZCL90QwTXFltOzeAonEsOYODSe8O/B9clwuW /fKMjkI3HBIWun2HZZEwryAbQab4RshQxQUGM/WPY0H7cjbYVGOwUbMhWqYkjaMc8u/5/GJhrIhX Xx0esm0leO4xgX2rUiA77rE/QHlZDm/ShxHdYaVNbrp8aQ5HXeCx7ywowvrff/PRGwMGDXXtQ+A/ cLcWqurrqaGq8WuP3dzGLogxTtUG4j/AuHyjc0csDF9d2vdWDwkCXzccd6wTR250G9nAmVsnXgcT KtCdwl12CuyIHcPQRz/Qql90ZyLdvDmPIUWGSsUeuGFR7JI51ZNRxX8iH1TvEYZEmR+SyCG7Ay83 4RIDE4MuNUGhVJsYg+Rxw22dc+6V4dlVY2Ey+XV67Uoa9NhZQ7bE2HCJ5zthKjdvm5frUo1MuhJC ccpjJhF7QqJMPgdIMpkGM3wnnj9dEEaDqCDBWWjVg1qc1eX3ARzCkBEOjPlSpsuvc7nx2vnS/lz0 Vy8BJp3cDGqzAUsR5IKe4etSAnyANcqLPbK6sUuUGQHIcbeMgjeYbnRZSZYZUBSrZQdwF1BzaYq1 F0pHlfqpEztVtUtIVC6LwJL5XGtO3bLC7dumnylwfY3Ghd1nXlYJcT4NOdPzC2/EKk8yNpDs2w0R GeeKAgmEzEOrPbZRLXDkSxZ6nBCsh347pBmrwgdRu7/1N7BWHI0zRRB4bxtw2D/6gSnTO+OJWxLD 3zMM2JpsmHGH2lYudJ4uHrRfWhpYJjgTNag1l4ArkwKqrleXz5RIhmId+2yFg4i7PV0jPH3JXA7U II/4isLAwMHW1sbYYseavHetN4lOe+pVn6uqfvi0fG9RgiN8nTDEww+g315EX045APXmeDDoLhoM EVfwhlnW5nqAbV6nCIe4X8rB+9ZoHN3IkVKcOUHp7zHR9t4bPMI1ZhivdtsLvPfAW3372wBDEyZ3 uGvyxFigFnKwLXbRg33Q+NeGMBgWR6REuDpKl/3Qg+bqyA==
Going even further, we also generated a longer message validating under 512 keys^{9} and the allzero nonce. The poor scaling of LLL is more noticeable here: we went from from $l = 256, d = 23$ in 15 minutes to $l = 512, d = 94$ in approximately 48 hours, both on a single core.
Conclusion
It is not every day that we stumble upon a problem in symmetric cryptography where lattices will be of use. Lattices and lattice reduction are, of course, more often associated with either breaking publickey cryptography, such as various textbook RSA implementation mistakes, (EC)DSA nonce leaks, or building publickey primitives, most notably the NTRU and (Ring)LWE publickey encryption, both of which being now featured in round3 of NIST’s postquantum competition.
As you can see, figuring out how to efficiently—or, rather, as efficiently as possible—generate constrained polynomials usable by a partitioning oracle attack on Poly1305 can be a fun and enlightening exercise.

In ChaChaPoly1305 the keys $(r, s)$ are derived using Chacha20; we skip past these details, as they are not too important for the exposition. But the proof of concept code does take this into account. ↩︎

In the case of ChaChaPoly1305 the message is of the form $\text{pad}(a)  \text{pad}(m)  \text{pad}(\text{encode}(a,m))$, where $\text{pad}(\cdot)$ splits its input into 16byte blocks and extends each block to 17 bytes by appending one
0x01
byte and as many0x00
bytes as necessary. Here we ignore the existence of the associated data $a$; it’s assumed to be empty. ↩︎ 
The last message block is an exception to this, but to keep things simple we will assume that the message length is always a multiple of $16$ bytes. ↩︎

There is an implicit switch in coefficient order here, since the most significant coefficient was $m_0$ before, and is now $m_{n2}$, but this is purely a matter of encoding. ↩︎

This lattice is quite similar to the typical LWE lattice. More precisely, it is reminiscent of RingLWE, with the caveat that our modulus is random, whereas the likes of RingLWE and NTRU use nice structured moduli of the form $x^{2^n} + 1$ or $x^n  1$. ↩︎

In the LWE setting this is usually framed as a reduction from BDD to uSVP, that is, a boundeddistance decoding problem into the unique SVP problem. But since we don’t have a promise here that there is a disproportionately close vector to our target (relative to the length of the shortest vectors), this sort of reduction does not apply. ↩︎

Indeed, the necessary $d$ required with an exact CVP oracle is very close to the $d = \lceil l/64 \rceil$ that would be expected from a bruteforce search. ↩︎

We use Sage’s quadratic interpolation algorithm here, as the number of points is negligible. ↩︎

Base64encoded, one per line. ↩︎