Overview
Cryptonets System Description
Background
Neural network cryptography with homomorphic tokenization (HT), also known as Cryptonets, enables organizations to significantly reduce privacy and security risk and improve accuracy, speed, and efficiency by operating in the encrypted space. Specifically, Cryptonets allows encrypted match and search operations on HT data (anonymized data), without any requirement to store, transmit or use PII, plaintext images or biometric templates.
Cryptonets was first described by Microsoft in 2013 [1], and Microsoft and IBM led Cryptonets research from 2016 to 2019 [2, 3, 4]. Private Identity was the first to describe HT transformations using machine learning in 2017 [5] and the first to solve and patent Cryptonets in 2019 [6]. Private Identity collaborated with Google in 2019 to bring Cryptonets to the edge.
Homomorphic tokenization (HT) was first described by Rivest and colleagues at MIT in 1978 [7] and remains one of the most important cryptography problems in computer science. In 2015, MIT Lincoln Laboratory first demonstrated HT to control data sharing [8]. In 2019, the US Intelligence Advanced Research Projects Activity (IARPA) launched the Homomorphic Encryption Computing Techniques with Overhead Reduction (HECTOR) program [9], and the US Office of the Director of National Intelligence reported that law enforcement agencies and private sector companies may “leverage these new forms of Personal Identifying Information (PII) without encroaching on civil liberties…” [10]
System Description
Cryptonets supports continuous 1:N search in 50ms constant time for an unlimited user base by combining the speed, accuracy, and security of Cryptonets HT with an extremely fast, accurate, and scalable machine learning (ML) vector classification service.
The Cryptonets Edge AI client encrypts PII and biometric data on-device in 20ms constant time. It is a small C++ shared object (.so) composed of immutable compiled native C++ binary object code, DNNs compiled into C++ runnable libraries, and three layers of cryptography. Additional controls are present in the container. The client transforms, stores, and transmits HT payloads and runs on modern browsers, phones, devices, embedded devices, and platforms. The client is wrapped in WebAssembly, C++, Android, or Python, and operates in low- or no-network environments.
The Cryptonets Edge AI client contains complex, pre-trained embedding models that generate harmonic, distance-measurable 1-way HT payloads. This patented technique was first described by Streit and colleagues in 2017. HT payloads output by the Cryptonets client are compact, real-valued feature-vector representations of unstructured data.
Specifically, these HT payloads are anonymized, keyless, 1-way, and globally unique (i.e., no two HT payloads are ever the same): positional arrays of 512 floating-point numbers that contain no biological or behavioral characteristics, imagery, or template of any physiological, biological, or behavioral trait. HT payloads cannot be used to recreate the initial input data but are distance measurable, such that similarity between embeddings can be classified by a DNN.
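To make the distance-measurable property concrete, the following minimal sketch (an illustration, not the Cryptonets implementation; the function name and sample payloads are assumptions) compares two 512-float payloads by Euclidean distance:

```python
# Minimal sketch: payloads from the same subject cluster closer together
# than payloads from different subjects, so a classifier (or a simple
# threshold) can separate them without recovering the original biometric.
import numpy as np

def euclidean_distance(payload_a: np.ndarray, payload_b: np.ndarray) -> float:
    """L2 distance between two 512-dimensional HT payloads."""
    assert payload_a.shape == payload_b.shape == (512,)
    return float(np.linalg.norm(payload_a - payload_b))

a = np.random.randn(512).astype(np.float32)             # stand-in HT payload
b = a + 0.05 * np.random.randn(512).astype(np.float32)  # same-subject-like payload
print(euclidean_distance(a, b) < euclidean_distance(a, -a))  # True
```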
Search and match operations in the encrypted space offer the highest level of security and privacy and exempt Cryptonets from privacy law obligations. The Cryptonets ML vector classification service returns identity in 50ms constant time with high accuracy and very few false positives for any size gallery.
Cryptonets irreversibly encrypts (anonymizes) data while enabling match and search operations without the requirement to decrypt to plaintext. This mitigates the regulatory and legal risk of PII or biometric data. Indeed, the GDPR, CCPA, BIPA, and HIPAA privacy laws specifically exclude and do not regulate anonymized data. Building on this, the IEEE 2410-2021 Standard for Biometric Privacy provides that conforming HT systems “do not incur GDPR, CCPA, BIPA or HIPAA privacy law obligations.” This Standard ensures that payloads are always 1-way homomorphically tokenized and that no PII is received by the SBP server.
Helper Networks
Cryptonets uses an ensemble of DNNs (“helper networks”) to find and validate the signal, detect presentation attacks (spoofing) and blur, and augment data prior to HT transformation. These helper networks are fully integrated into the Cryptonets Edge AI client to prevent bad inputs. Helper networks are further described in US Patents 10,938,852, 11,489,866, 11,502,841, 11,789,699, 11,210,375, and 11,170,084.
Landmark model. This geometry detection model accurately locates face(s), finger(s), document(s), and other data in an image by transforming each image into geometric primitives and measuring the relative position, width, and other parameters of eyes, mouth(s), nose(s), chin(s), and finger(s). The Landmark DNN returns the (x,y) coordinates of each biometric factor in an image, video frame, or video stream, uses YOLO architecture, and is 100KB to 1.7MB. The DNN is on (true) by default.
Validation models. The face, fingerprint, and document validation models accurately validate frontalized face input images, fingerprint input images, and document images. These Validation DNNs return a validation score between 0 and 100, where 100 is a perfect image. The Validation DNNs use MobileNetV2 architecture and are approximately 1.5MB. The DNNs are on (true) by default.
Voice validation model. The voice validation model discriminates between a quality human voice and external noise. It accepts a sound wave as input and returns a validation score between 0 and 100, where 100 is perfect audio in which the voice is isolated well enough to create a valid embedding. It uses YOLO architecture and is 100KB.
Three-class validation model. Wearing glasses or a face mask during biometric enrollment lowers subsequent prediction performance. This model determines whether the face is unobstructed, obstructed with glasses (sunglasses or eyeglasses), or obstructed with a face mask. The model accepts one frontalized face input image and outputs three values summing to 100; the largest value determines the predicted class. This model uses YOLO architecture and is 100KB. The DNN is on (true) by default during enrollment.
Presentation attack models. Three presentation attack detection (PAD) models provide passive facial liveness to ensure the system does not process spoofed biometric data. The models detect image spoofing, video spoofing, and eye state (open/closed).
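As a hypothetical illustration of how these helper networks gate input before augmentation and HT transformation, the sketch below combines their outputs into a single accept/reject decision. All names, thresholds, and types here are assumptions, not the actual Cryptonets client internals:

```python
# Hypothetical gating sketch: only valid, unobstructed, live inputs
# reach the augmentation and HT-transformation stages.
from dataclasses import dataclass

@dataclass
class HelperResults:
    landmarks: list          # (x, y) coordinates from the landmark DNN
    validation_score: float  # 0..100, where 100 is a perfect image
    obstruction: str         # "none" | "glasses" | "mask"
    is_live: bool            # passive presentation-attack detection result

def accept_for_enrollment(r: HelperResults, min_score: float = 80.0) -> bool:
    """Reject frames that any helper network flags as a bad input."""
    return (
        bool(r.landmarks)
        and r.validation_score >= min_score
        and r.obstruction == "none"
        and r.is_live
    )
```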
Data augmentation. Once the data is located and validated, and prior to embedding, a procedural program augments it to generalize enrollment and improve the accuracy and performance of subsequent predictions. Enrollment operations augment the original data. Image augmentations include rotations, flips, and color and lighting homogenization to increase the distance metric between embeddings without exceeding class boundaries.
To augment voice data, the Cryptonets Edge AI client modulates and transforms each voice audio sample with pulse code modulation (PCM) and the Fast Fourier Transform (FFT). PCM reduces the input to twice the frequency range, allowing the smallest possible Fourier transform without computational loss. The FFT moves the PCM-modulated audio signal from the time domain to a representation in the frequency domain; the transformed output is a two-dimensional array of frequencies. Audio augmentations broaden the digital signal to emulate different microphones, noise-canceling algorithms, and normalized speaker distance from the microphone; to emulate various human physiological changes including lack of sleep, alcohol consumption, and smoking; and to add random background noise. Each augmentation increases the distance between embeddings without surpassing class boundaries.
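A minimal numpy sketch of this PCM-to-frequency-domain step follows; the frame and hop sizes are illustrative assumptions rather than the production parameters:

```python
# Frame the 1-D PCM samples and move each frame to the frequency domain,
# yielding the two-dimensional array of frequencies described above.
import numpy as np

def pcm_to_spectrogram(pcm: np.ndarray, frame: int = 512, hop: int = 256) -> np.ndarray:
    """Return a 2-D array (frames x frequency bins) from 1-D PCM audio."""
    n_frames = 1 + (len(pcm) - frame) // hop
    frames = np.stack([pcm[i * hop : i * hop + frame] for i in range(n_frames)])
    window = np.hanning(frame)  # windowing reduces spectral leakage
    return np.abs(np.fft.rfft(frames * window, axis=1))

audio = np.random.randn(16000).astype(np.float32)  # stand-in: 1 s at 16 kHz
print(pcm_to_spectrogram(audio).shape)             # (61, 257)
```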
Document geometry detection model. To acquire data from a photo ID card or passport, the document geometry detection model locates identity documents, text, and face or fingerprint samples by transforming each image into geometric primitives and measuring the relative position, width, and other parameters of the document and text, eyes, mouth(s), nose(s), chin(s), and finger(s). The model accepts image input and outputs the (x,y) coordinates of the document and of the data or biometric samples it contains. It uses YOLO architecture and is 100KB. The DNN is on (true) by default.
HT Transformation
The Cryptonets Edge AI client uses complex, pre-trained embedding models trained on a large corpus to convert unstructured, dense, raw input (biometric pixel intensities from images, or text) into real-valued, lower-dimensional, distance-measurable dense vectors, also known as embeddings. These embedding models use “one-shot learning” techniques to avoid the need to retrain to recognize new subjects. One-shot learning saves the compute and power that retraining would otherwise require.
Cryptonets uses four embedding DNNs, one each for face, voice, fingerprint, and text, that accept plaintext input and produce distance-measurable embeddings. The embedding DNNs train using techniques first taught by Schroff and colleagues [11] at Google Research and further developed by Private Identity in collaboration with Google Brain.
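The technique taught by Schroff and colleagues [11] is the triplet loss, which pulls an anchor embedding toward a positive sample (same subject) and pushes it away from a negative sample (different subject) by at least a margin. A minimal sketch, with an assumed margin value:

```python
# Triplet loss per Schroff et al. [11]; the margin alpha is an
# illustrative assumption, not the value used in training Cryptonets.
import numpy as np

def triplet_loss(anchor, positive, negative, alpha: float = 0.2) -> float:
    """Encourage d(anchor, positive) + alpha <= d(anchor, negative)."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance, same subject
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance, other subject
    return float(max(d_pos - d_neg + alpha, 0.0))
```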
The embedding DNNs use the MobileNetV2 architecture and train on privacy-exempt synthetic datasets and open-source datasets (e.g., VoxCeleb, Asian-Celeb). After each training iteration (~60 days), we use the “No Reference” data quality methodology [12] to repair errors and class overlaps in the training data.
To further generalize training and accommodate a wide range of boundary conditions during operation, the training dataset for each embedding DNN is randomly augmented during training to add obfuscations. The face dataset is randomly augmented with eyeglasses, sunglasses, image distortions, different camera positioning, facial expressions, image rotations, facial hair, scars, makeup, colored lenses and filters, image abrasions, variable backgrounds, variable poses, variable distances, and variable hue, saturation and lighting (HSL).
Similarly, the fingerprint recognition embedding DNN’s training dataset is randomly augmented during training to add image distortions, different camera positioning, image rotations, scars, colored lenses and filters, image abrasions, variable backgrounds, variable poses, variable distances, and variable hue, saturation and lighting (HSL). Finally, the voice audio dataset is randomly augmented during training to include signal variations (8-48kHz), background noise, variable microphones, variable noise-canceling algorithms, and variable human physiological conditions that affect voice including lack of sleep, smoking and alcohol.
PII Data and Biometric Privacy Law Obligations
Cryptonets allows match and 1:N search operations in the encrypted space. HT data output by the Cryptonets Edge AI client are anonymized, keyless, 1-way HT payloads that are globally unique (i.e., no two HT payloads are ever the same): positional arrays of 512 floating-point numbers that contain no biological or behavioral characteristics, imagery, or template of any physiological, biological, or behavioral trait. These HT payloads are a form of 1-way encryption in that they cannot be used to recreate the initial input data but are distance measurable, so the similarity between payloads can be calculated and classified by a neural network.
The IEEE 2410-2021 Standard for Biometric Privacy provides technical and policy standards for Cryptonets identity systems. The standard ensures that: (1) biometric payloads are always 1-way homomorphically tokenized; (2) no PII is received by the SBP server; and (3) conforming 1-way HT systems “do not incur GDPR, CCPA, BIPA or HIPAA privacy law obligations.” Private Identity is certified compliant with the IEEE 2410-2021 Standard for Biometric Privacy, ISO 27001, and ISO 9001. GMS Registrar, a third-party assessment organization (3PAO), provides the accredited certification.
Cryptonets Edge AI Client Security
The Cryptonets Edge AI client is a C++ shared object (.so) composed of immutable compiled native C++ binary object code, helper networks and embedding DNNs compiled using TensorFlow Lite into C++ runnable libraries, three layers of cryptography to protect communication with the backend, uninterruptible business functions, and a one-way authentication mechanism on the backend that returns only a UUID. The decentralized client operates with or without a network.
Client Cryptography
The Cryptonets Edge AI client uses three levels of data confidentiality. At the Web tier, Cryptonets uses declarative transport-layer security (TLS) enforced by the container. These payloads are further encrypted with AES-256 or PKI, and inside each payload are the HT transformations, which are themselves one-way encryptions. The server executes automatic, declarative transport decryption, followed by AES-256 decryption, to reach the 1-way HT payload. The 1-way HT payload is then used in RESTful operations. Time salts and additional safeguards are present in the container.
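A hedged sketch of this layering, using the pyca/cryptography library: the innermost layer is the 1-way HT payload itself, wrapped in AES-256 before the container adds TLS in transit. Key handling and framing here are illustrative assumptions, not the production scheme:

```python
# Illustrative layering only; TLS is assumed to be terminated by the
# container, so only the AES-256 layer is shown explicitly.
import os
import numpy as np
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

ht_payload = np.random.randn(512).astype(np.float32)  # stand-in 1-way HT payload

key = AESGCM.generate_key(bit_length=256)  # AES-256 session key (assumed handling)
nonce = os.urandom(12)                     # must be unique per message
ciphertext = AESGCM(key).encrypt(nonce, ht_payload.tobytes(), None)

# Server side: after transport (TLS) decryption, AES-256 decryption
# recovers the 1-way HT payload for RESTful operations. The HT payload
# itself is never reversed to plaintext biometrics.
recovered = np.frombuffer(AESGCM(key).decrypt(nonce, ciphertext, None),
                          dtype=np.float32)
assert np.array_equal(recovered, ht_payload)
```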
Decentralized Operation (“Airplane Mode”)
The Cryptonets client caches AES-256-encrypted HT payloads locally (on-device) to operate in low- or no-bandwidth environments and to achieve massive horizontal scalability. A phone can authenticate more than 100 users without added latency. A GPU or TPU on the phone increases speed by 70X but is not required.
Cryptonets W3C WebAssembly SDK (Wasm)
The Cryptonets Edge AI shared object, wrapped in W3C WebAssembly (Wasm), runs on major browsers using immutable compiled native C++ binary object code in a stack-based, user-space VM that is not observable at runtime and is protected from control-flow hijacking and direct code injection attacks. Irrespective of the security framework and the isolated environment of the code, the solution seamlessly adheres to the normal DOM interface, is memory sandboxed, and is capability constrained.
Cryptonets C++, Android & Python SDKs
The Cryptonets Edge AI client is available with C++, Android, and Python wrappers to support phones, devices, embedded devices, and platforms.
Cryptonets Client Business Functions
The Cryptonets Edge AI client provides five uninterruptible business functions: is_valid(), predict(), enroll(), compare(), and delete(). The client maintains full control of the identification, verification, and authentication journey to prevent control-flow hijacking and code injection. It validates the biometric, transforms the sample to an embedding, calls local storage, secures the transport layer with three layers of cryptography, calls the Web tier, and returns the encrypted result (UUID and GUID).
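A hypothetical usage sketch follows. Only the five function names come from this document; the module name, signatures, and return types are assumptions:

```python
# Hypothetical SDK usage; cryptonets_client is an assumed module name.
import cryptonets_client as cn

frame = open("face.png", "rb").read()    # raw image from the camera

if cn.is_valid(frame):                   # helper networks validate the input
    uuid = cn.enroll(frame)              # HT-transform and register; returns a UUID
    match = cn.predict(frame)            # 1:N search; returns a UUID in constant time
    same = cn.compare(frame, frame)      # 1:1 verification between two samples
    cn.delete(uuid)                      # remove the enrolled HT payloads
```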
Cryptonets 1:N Search
Modern vector search systems usually require indexing the costs (distances) between every pair of vectors. These solutions require large compute infrastructures and are often bounded by gallery size. Facebook’s Faiss and Google’s Vertex AI are examples of such index-based vector search solutions.
Cryptonets 1:N search, on the other hand, solves 1:N vector classification without indexes to achieve an O(1) solution by training a feed-forward, fully connected neural network (FCNN) with embeddings and labels (UUIDs). This FCNN accurately infers the correct label when provided never-before-seen embeddings in 50ms constant time, irrespective of gallery size. See US Patents 10,419,221 and 10,721,070.
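A minimal Keras sketch of this idea, with assumed layer sizes (not the patented architecture): a small FCNN maps a 512-float embedding directly to one of N enrolled UUID labels, so each query is a single fixed-size forward pass whose cost does not grow with the gallery.

```python
# Illustrative FCNN classifier over HT embeddings; layer sizes and the
# number of identities are assumptions for the sketch.
import tensorflow as tf

NUM_IDENTITIES = 10_000  # assumed number of enrolled UUID labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu", input_shape=(512,)),
    tf.keras.layers.Dense(NUM_IDENTITIES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Training pairs each enrollment embedding with the integer index of its
# UUID label; at inference, argmax over the softmax output recovers the
# predicted identity:
#   uuid_index = int(model.predict(embedding[None, :]).argmax())
```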
Cryptonets Unbiased Algorithms
Cryptonets uses a balanced facial training dataset and homogenized lighting algorithms to prevent discrimination based on race, age, gender, or ethnicity. A recent evaluation found the Cryptonets algorithm performed without bias across all racial subgroups, generating no false positive (Type I) errors and 0.25% false negative (Type II) errors. The study relied on Private Identity’s Diversity in Faces (DIF-CELEB-1M) evaluation dataset, in which the subclass distribution closely resembled the US population. DIF-CELEB-1M contained 2,049 celebrity classes (461,322 facial images) distributed 14% Black, 6% Asian, 53% White, 23% Hispanic, and 4% Other.
Additional Background Information
Open Standards - IEEE 2410 Standard for Biometric Privacy
Cryptonets is certified compliant with the IEEE 2410-2021 Standard for Biometric Privacy, ISO 9001:2015, and ISO 27001:2013, and is compliant with W3C WebAssembly, FIPS 140-3, TLS, IPSEC, and SSL. Third-party assessment (3PAO) is provided by GMS Registrar.
ISO 19785 Biometric Data Exchange
Customers may alternatively configure Cryptonets to comply with NIST Cybersecurity Framework, Common Biometric Exchange Formats Framework (CBEFF, ISO/IEC 19785-1:2015), ISO 19794, NIST Type 10, NIST SAP 32, NIST SAP 42 or NIST SAP 52 exchange formats.
Cryptonets Data Privacy
Cryptonets allows organizations to use PII and 1:N biometric search and match operations without incurring GDPR, CCPA, BIPA, or HIPAA privacy law obligations. To accomplish this, Cryptonets irreversibly anonymizes PII and biometric data using a 1-way cryptographic hash algorithm (1-way HT) and then immediately discards the plaintext data. The HT payload, or anonymized data, ceases to be personal data because no decryption key exists and the loss of privacy by decryption is not reasonably likely (indeed, it is mathematically impossible). As a result, 1-way HT is not “Personal Data” under the General Data Protection Regulation (EU) 2016/679 (“GDPR”) or the California Consumer Privacy Act (“CCPA”) and is not “Biometric Information” under the Biometric Information Privacy Act (“BIPA”). Additionally, the identification system is exempt from data breach notification requirements.
Cryptonets provides asynchronous face enrollment and identification with unlimited throughput, using elastic, load-balanced, fault-tolerant Kubernetes clusters for high-demand identification tasks through a RESTful interface. The Encryption Engine maintains full accuracy and performance through real-world boundary conditions and accepts Base-64 PNG, MPEG, JPEG, or MJPEG plaintext images via a RESTful API or by directory scanning. The Engine can identify one or multiple faces per frame, provides data augmentation, creates a 1-way hash (HT transformation) for subsequent processing, and then immediately discards the original biometrics. The HT transformation computes in 20ms constant time.
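A hypothetical sketch of an enrollment call against such a RESTful interface; the endpoint URL and JSON field names are assumptions, with only the Base-64 image input and UUID result drawn from this document:

```python
# Illustrative REST client; the URL and payload schema are assumed.
import base64
import requests

with open("face.png", "rb") as f:
    b64_png = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "https://api.example.com/v1/enroll",       # hypothetical endpoint
    json={"image": b64_png, "format": "PNG"},  # hypothetical field names
    timeout=5,
)
print(resp.json())  # e.g. {"uuid": "..."} once enrollment completes
```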
Facial Recognition Algorithm
The Cryptonets facial recognition algorithm provides high accuracy (99.71%) with very few false positives (FPIR=0.0001%). The algorithm recognizes faces using Webcams, phones, or specialized cameras, enrolls an unlimited number of users (“unlimited gallery size”), and operates in constant time.
Ensemble of models (helper networks):
- Face landmark model: 300KB, 10ms, YOLO architecture
- Face validation model: 1.7MB, 10ms, YOLO architecture
- Facemask/eyeglasses detect model: 100KB, 10ms, YOLO architecture
- Eye geometry detection model: 100KB, 10ms, YOLO architecture
- Eye blink detection model: 100KB, 10ms, YOLO architecture
- Age estimation model: 800KB, 10ms, MobileNetV2 architecture
- Face embedding model: 1.7MB, 20ms, MobileNetV2 architecture
- Minimum requirements: face ≥224×224 pixels, camera ≥256kB
- Photo ID capture requires ≥ 2MP Webcam or mobile camera (≥2560x1440)
- Accommodates facemasks, glasses, distortions, rotations, blur, facial hair, scars, makeup, filters, abrasions, and variable hue, saturation, and lighting.
- Identification Rate (IR) ≥99.71%, FPIR=0.001%, FNIR=0.025%
- Multithreaded, massively scalable using Kubernetes Clusters
Voice Recognition Algorithm
The Cryptonets text-, language- and accent-independent speaker identification model achieves 99.50% identification rate (IR) using a 1-second voice sample at 16kHz with very few false positives (0.0001%), and 93.00% IR using a 3-second voice sample at 8kHz. It enrolls an unlimited number of users (“unlimited gallery size”), and operates in constant time.
Ensemble of models (helper networks):
- Voice Validation Model: 100KB, YOLO architecture
- Voice embedding model: 3.5MB, MobileNetV2 architecture
- Requires 1 second of voice (text- and language-independent) @ 16kHz
- Identification Rate (IR) ≥99.50%, FPIR=0.001%, FNIR=0.050%
- Requires 3 seconds of voice (text- and language-independent) @ 8kHz
- IR ≥93.00%, FPIR=0.001%, FNIR=6.999%
- On-device validation and encryption
- Accommodates background noise, noise-canceling microphones, and voice changes from sleepiness, smoking, and alcohol.
Fingerprint Recognition Algorithm
The fingerprint recognition algorithm provides high accuracy (94.50%) with very few false positives (0.0001%). The model recognizes fingerprints using any general-purpose Webcam or phone camera (≥2MP), enrolls an unlimited number of users (“unlimited gallery size”), and operates in constant time.
Ensemble of models (helper networks):
- Fingerprint landmark model: 300KB, 10ms, YOLO architecture
- Fingerprint validation model: 1.7MB, 10ms, YOLO architecture
- Fingerprint embedding model: 900KB, 20ms, MobileNetV2 architecture
- Minimum requirements: fingerprint ≥224×224 pixels, camera ≥2MP
- Accommodates distortions, rotations, blur, scars, filters, abrasions, and variable hue, saturation, and lighting.
- Identification Rate (IR) ≥94.50%, FPIR=0.001%, FNIR=6.50%
- Multithreaded, massively scalable using Kubernetes Clusters