Specification of the .anpx format

1. Binary structure

An .anpx file is a self-describing binary of variable size. A fixed 37-byte header is followed by the ciphertext and its authentication tag.

Offset  Length  Field         Description
0       4       magic         ASCII "ANPX"  (0x41 0x4E 0x50 0x58)
4       1       version       0x01
5       16      salt          PBKDF2 salt (random)
21      12      iv            AES-GCM nonce (random)
33      4       iterations    PBKDF2 iterations, uint32 big-endian
                              (Anoply writes >= 600 000; readers reject < 100 000)
37      N       body          AES-256-GCM(JSON map, key)
                              N = ciphertext + 16 bytes auth tag
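
The layout above maps onto a single struct format string. The following is a minimal sketch rather than part of the reference script; the helper names pack_header and unpack_header are hypothetical and exist only for illustration.

```python
from __future__ import annotations

import struct

# Big-endian, no padding: magic(4s) version(B) salt(16s) iv(12s) iterations(I)
HEADER_FMT = ">4sB16s12sI"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 4 + 1 + 16 + 12 + 4 = 37


def pack_header(salt: bytes, iv: bytes, iterations: int) -> bytes:
    """Serialize the fixed v1 header (hypothetical helper)."""
    # struct silently pads or truncates "Ns" fields, so check lengths first.
    if len(salt) != 16 or len(iv) != 12:
        raise ValueError("salt must be 16 bytes and iv must be 12 bytes")
    return struct.pack(HEADER_FMT, b"ANPX", 0x01, salt, iv, iterations)


def unpack_header(buf: bytes) -> tuple[bytes, bytes, int]:
    """Parse the fixed v1 header and return (salt, iv, iterations)."""
    magic, version, salt, iv, iterations = struct.unpack_from(HEADER_FMT, buf)
    if magic != b"ANPX" or version != 0x01:
        raise ValueError("not a v1 .anpx header")
    return salt, iv, iterations
```

Everything after the first HEADER_LEN bytes is the body (ciphertext followed by the 16-byte tag).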

2. Key derivation

The AES key is derived with PBKDF2-HMAC-SHA256 from the user's passphrase and the file's salt, using the iteration count declared in the header. Files generated by Anoply use at least 600,000 iterations, following the 2023 OWASP recommendation.

key = PBKDF2_HMAC_SHA256(
  password = passphrase.encode("utf-8"),
  salt     = file.salt,
  iter     = file.iterations,
  dkLen    = 32  # AES-256
)
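
The pseudocode above can be reproduced with the Python standard library alone. This is a minimal sketch; derive_key is a hypothetical helper name, and the reference script in section 5 uses the cryptography package instead.

```python
import hashlib


def derive_key(passphrase: str, salt: bytes, iterations: int) -> bytes:
    """Derive the 32-byte AES-256 key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        salt,
        iterations,
        dklen=32,
    )
```

The same passphrase, salt, and iteration count always produce the same key, which is why all three inputs other than the passphrase are stored in the header.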

3. Body encryption

The body is the result of encrypting the substitution map, serialized as UTF-8 JSON, with AES-256-GCM. No associated data is used (empty AAD). The 16-byte tag is appended to the end of the ciphertext, as the standard GCM primitive outputs it.

body = AES_GCM_ENCRYPT(
  key        = derived key (32 bytes),
  nonce      = file.iv (12 bytes),
  plaintext  = JSON.stringify(substitutionMap).encode("utf-8"),
  aad        = b""  # not used
)
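
For the writer side, the step above could look like the following sketch using the cryptography package (the same dependency as the reference script). The name encrypt_body is hypothetical, and the compact json.dumps separators are an assumption meant to approximate JSON.stringify; any valid JSON serialization decrypts and parses the same way.

```python
import json

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt_body(key: bytes, iv: bytes, substitution_map: dict) -> bytes:
    """Return ciphertext || 16-byte tag for the serialized map."""
    plaintext = json.dumps(
        substitution_map, ensure_ascii=False, separators=(",", ":")
    ).encode("utf-8")
    # AESGCM.encrypt appends the tag to the ciphertext; None means no AAD.
    # The iv must be freshly random for every file written with a given key.
    return AESGCM(key).encrypt(iv, plaintext, None)
```

The returned bytes are written directly after the 37-byte header.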

4. Structure of the substitution map

Decrypting the body yields a JSON document whose entries map each token back to its original value. Canonical v1 format:

{
  "version": 1,
  "createdAt": "2026-05-13T12:00:00Z",
  "entries": {
    "DNI_001": { "original": "12345678Z", "type": "dni", "country": "ES" },
    "IBAN_001": { "original": "ES66 2100 0418…", "type": "iban", "country": null },
    "EMAIL_001": { "original": "juan@example.com", "type": "email", "country": null }
  }
}

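Reversal consists of replacing each token in the anonymized file with its original value. Tokens must be replaced from longest to shortest to avoid overlaps: if one token is a prefix of another (say, DNI_001 and a hypothetical DNI_0010), the shorter one must not be replaced first.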
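
As an alternative to sequential replacement, the substitution can be done in a single regular-expression pass, which also prevents a restored original value from being rewritten if it happens to contain another token. This is a sketch under the assumption that entries follows the canonical v1 shape shown above; revert_single_pass is a hypothetical name, not the Anoply implementation.

```python
import re


def revert_single_pass(text: str, entries: dict) -> str:
    """Replace every token with its original value in one pass over the text."""
    originals = {tok: row["original"] for tok, row in entries.items()}
    if not originals:
        return text
    # Alternation is tried left to right, so list longer tokens first:
    # DNI_0010 must be attempted before its prefix DNI_001.
    ordered = sorted(originals, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(tok) for tok in ordered))
    return pattern.sub(lambda m: originals[m.group(0)], text)
```

Because the pattern scans the anonymized text only once, replaced values are never scanned again.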

5. Reference implementation (Python)

A standalone script that reads an anonymized file plus an .anpx file and produces the original file. It requires only the cryptography package. Distribute this script alongside your files so that any auditor can verify the reversal without depending on Anoply.

#!/usr/bin/env python3
"""
Anoply .anpx — reference reverter.

Reads a .anpx file produced by Anoply, decrypts the substitution map with
the user's passphrase, applies the inverse substitution to an anonymized
text file (.csv, .txt) and writes the original content out.

Dependencies: cryptography (pip install cryptography).

Usage:
    python anpx_revert.py clientes_anonimizado.csv clientes.anpx \
        --passphrase "your-passphrase-here" \
        --output clientes_revertido.csv
"""
from __future__ import annotations

import argparse
import json
import struct
import sys

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

MAGIC = b"ANPX"
VERSION = 0x01
SALT_BYTES = 16
IV_BYTES = 12
ITER_BYTES = 4
KEY_BYTES = 32  # AES-256
HEADER_LEN = len(MAGIC) + 1 + SALT_BYTES + IV_BYTES + ITER_BYTES


def parse_anpx(buf: bytes) -> tuple[bytes, bytes, int, bytes]:
    if len(buf) < HEADER_LEN + 16:
        raise ValueError(".anpx file too short")
    if buf[: len(MAGIC)] != MAGIC:
        raise ValueError("invalid ANPX magic header")
    if buf[len(MAGIC)] != VERSION:
        raise ValueError(f"unsupported .anpx version: {buf[len(MAGIC)]}")
    salt = buf[5 : 5 + SALT_BYTES]
    iv = buf[5 + SALT_BYTES : 5 + SALT_BYTES + IV_BYTES]
    iterations = struct.unpack(">I", buf[5 + SALT_BYTES + IV_BYTES : HEADER_LEN])[0]
    if iterations < 100_000:
        raise ValueError("insufficient PBKDF2 iterations (must be ≥ 100 000)")
    ciphertext = buf[HEADER_LEN:]
    return salt, iv, iterations, ciphertext


def decrypt_map(buf: bytes, passphrase: str) -> dict:
    salt, iv, iterations, ciphertext = parse_anpx(buf)
    kdf = PBKDF2HMAC(
        algorithm=hashes.SHA256(),
        length=KEY_BYTES,
        salt=salt,
        iterations=iterations,
    )
    key = kdf.derive(passphrase.encode("utf-8"))
    plaintext = AESGCM(key).decrypt(iv, ciphertext, None)
    return json.loads(plaintext.decode("utf-8"))


def revert_text(anonymized: str, mapping: dict) -> str:
    """
    The decrypted map has the form {"entries": {token: {"original": ...}}}.
    Replace each token by its original value. Longest tokens first to avoid
    overlaps.
    """
    entries = mapping.get("entries") or mapping
    by_token: dict[str, str] = {}
    for key, row in entries.items():
        if isinstance(row, dict):
            # Canonical v1: the key is the token. An explicit "token" field,
            # if present, takes precedence over the key.
            by_token[row.get("token", key)] = row["original"]
        elif isinstance(row, str):
            # Flat {token: original} maps are tolerated as well.
            by_token[key] = row

    tokens = sorted(by_token.keys(), key=len, reverse=True)
    out = anonymized
    for tok in tokens:
        out = out.replace(tok, by_token[tok])
    return out


def main() -> int:
    p = argparse.ArgumentParser(description="Revert an Anoply-anonymized file")
    p.add_argument("anonymized", help="anonymized input file (.csv, .txt)")
    p.add_argument("anpx", help="encrypted map file (.anpx)")
    p.add_argument("--passphrase", required=True, help="passphrase used to encrypt")
    p.add_argument("--output", required=True, help="output path for the reverted file")
    args = p.parse_args()

    with open(args.anpx, "rb") as fh:
        anpx_bytes = fh.read()
    with open(args.anonymized, "r", encoding="utf-8") as fh:
        text = fh.read()

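    try:
        mapping = decrypt_map(anpx_bytes, args.passphrase)
    except Exception as e:
        # A failed GCM tag check (wrong passphrase or modified file) usually
        # carries no message, so fall back to the exception class name.
        print(f"decryption failed: {str(e) or type(e).__name__}", file=sys.stderr)
        return 2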

    out = revert_text(text, mapping)
    with open(args.output, "w", encoding="utf-8") as fh:
        fh.write(out)
    print(f"wrote {args.output}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

This script is published under the MIT license. Copy it, modify it, redistribute it.

6. Compatibility and versioning

The current version is 1. Future versions will keep the fixed header (magic + version + salt + iv + iterations) and will be documented here. Anoply guarantees that any .anpx file generated by an earlier version can be reverted with the current version for as long as that version chain remains supported on this page.

7. Reporting vulnerabilities

If you find a cryptographic or protocol flaw, write to us at security@anoply.eu. We respond within 48 hours. We reward responsible reports.

If Anoply disappears, this keeps working.
