Technical Report
Encrypted Data Analytics
An in-depth exploration of privacy-enhancing technologies, examining different architectures and weighing the trade-offs involved in contemporary data protection.
Executive summary
Encrypted data analytics is a collection of privacy-enhancing technologies (PETs) that let analysts, cloud services, or counterparties compute on sensitive data without exposing it in plaintext. The core set includes homomorphic encryption, secure multi-party computation (MPC), structured/searchable encryption, private set operations, and functional encryption. In industry practice, the umbrella term also covers trusted execution environments (TEEs) and confidential computing, which protect data in use through hardware isolation rather than cryptographic opacity throughout. NIST, the ICO, and major cloud providers all treat these tools as parts of the broader PET and confidential-computing landscape. [1]
No single technique dominates. TEEs and confidential VMs are currently the simplest route to broad functionality and near-native performance for SQL, joins, and ML, but they require trust in hardware, attestation, and side-channel assumptions. Searchable encryption and ORE/OPE are often the most efficient way to support equality, range, and search workloads, but they carry structured leakage that can be exploited. MPC is often the strongest option when multiple organizations need joint analytics without a trusted intermediary, but its performance depends heavily on interactivity and network quality. FHE offers the strongest trust reduction for outsourced single-server computation, but it is still expensive and most mature in selected areas such as aggregation, linear algebra, similarity search, and low-depth ML inference, rather than general-purpose OLAP. [2]
The most important design lesson is that "encrypted analytics" is not a binary property. Systems differ in which threats they defend against (cloud operators, collaborating parties, database administrators, output inference, side channels, frequency leakage, collusion among compute nodes) and in which operations they support (sums, linear models, approximate neural inference, equality search, range search, joins, sorting, or arbitrary code). Selection should therefore be threat-model-first, not feature-first. [3]
In deployment terms, the prevailing production trend in 2026 is hybridization: confidential computing as the general execution layer, cryptographic protocols for the most sensitive steps, queryable encryption for narrow query capability, and privacy controls on released results. This pattern appears in products such as SecretFlow, Duality on AWS Nitro Enclaves, Decentriq on Azure Confidential Computing, Google Confidential Space, MongoDB Queryable Encryption, AWS Clean Rooms Differential Privacy, and Prio/DAP private aggregation systems. [4]
Definitions and scope
NIST's Privacy-Enhancing Cryptography (PEC) project describes the core primitives clearly: MPC lets multiple mutually distrustful parties compute on private inputs; FHE lets a server evaluate supported functions on ciphertexts such that decryption yields the function output; PIR retrieves a database item without revealing which item was queried; and structured encryption enables private search over encrypted data structures. NIST also lists functional encryption among PEC tools. Together these give a solid foundation for defining encrypted data analytics. [5]
For this report, encrypted data analytics covers any architecture that enables analysis, search, joins, or machine learning on sensitive data without exposing the raw data to the processing environment. This includes cryptographic computation on ciphertexts or secret shares, structured and searchable queries over protected indexes, computation inside attested hardware environments, and output controls such as differential privacy combined with encrypted or secret-shared processing. Merely encrypting data at rest or in transit does not qualify, because the application or analytics engine still processes plaintext in ordinary memory. [6]
A useful practical distinction is between strict cryptographic opacity and reduced plaintext exposure. FHE, MPC, PIR/PSI, and many FE constructions keep data encrypted or secret-shared during computation, whereas TEEs allow plaintext only inside an attested enclave or confidential VM, with the operator, hypervisor, and surrounding platform kept outside the trust boundary. TEEs offer different guarantees than pure cryptography, but they play a crucial operational role in real-world deployments that rely on existing SQL engines, model runtimes, or data clean-room workflows. [7]
The scope of "analytics" here is broad: aggregation (count, sum, average, histogram, group-by); ML inference and, where feasible, training; SQL-like filtering and selected joins; search over encrypted databases or documents; and cross-party linkage or overlap analysis such as private join and compute. Different PETs support different subsets of this workload family, and mapping techniques to workloads is a key focus of the report. [8]
Deployment Models Synthesis Schematic
This schematic synthesizes the primary deployment models described by NIST, MongoDB, AWS, Azure, Google Cloud, and SecretFlow. [9]
Threat models and regulatory constraints
The first question in encrypted analytics is who the adversary is. Candidates include an honest-but-curious cloud provider; a malicious cloud operator or hypervisor; colluding input parties in an MPC protocol; a database administrator with access to storage and logs; a side-channel attacker who can observe memory and microarchitectural effects; and an analyst who sees only aggregate outputs but may attempt membership-inference or reconstruction attacks through repeated queries. Each PET is designed to address a different part of this range. [10]
For most cryptographic techniques, the primary assumptions concern the hardness of lattice or number-theoretic problems and the leakage the scheme permits by design. For MPC, the key assumptions in NIST's PEC descriptions and in widely used frameworks are the corruption model and the collusion threshold: passive versus malicious adversaries, honest versus dishonest majority, and protocol behavior under abort. Modern frameworks such as MP-SPDZ, MPyC, and MOTION let developers choose among multiple security models. [11]
For TEEs, the threat model is narrower and more operational: Azure, Google, and AWS all use hardware-based TEEs to protect code and data in use against access by the cloud provider and other unauthorized actors. If your risks include hardware or firmware bugs, side channels, or supply-chain trust in the platform vendor, TEEs alone may not be enough. [12]
Regulatory treatment is similarly nuanced. Under the GDPR, pseudonymisation is one measure that can support Article 25 (data protection by design) and Article 32 (security of processing), but it is not a substitute for the full set of measures those articles require, and pseudonymised data remains personal data. PETs are not a silver bullet: organizations still need lawful, fair, and transparent processing, as well as a DPIA tailored to each case. [13]
Encrypted analytics therefore typically helps with risk reduction, processor minimization, breach resilience, and cross-organizational data sharing, but it usually does not remove legal obligations related to purpose limitation, transparency, data-subject rights, retention, or international-transfer analysis, especially if a controller can still identify individuals by decrypting outputs or linking results back to them. This inference follows from the GDPR definition of pseudonymisation and from the ICO/EDPB position that PETs are additional safeguards within a comprehensive compliance strategy. [14]
The message is consistent under U.S. sectoral regulation. HHS's HIPAA security guidance points covered entities to NIST security controls and encryption guidance; the FTC's health-data guidance stresses understanding data flows, implementing strong safeguards, and avoiding misleading privacy claims; and the FTC Safeguards Rule under GLBA centers on risk management and on protecting the confidentiality, integrity, and security of customer information. Encrypted analytics is most defensible when integrated with risk assessments, access control, attestation, key management, and output governance, not when it is offered in isolation as a compliance argument. [15]
Technique catalog
Homomorphic encryption
Partially homomorphic encryption (PHE) is the mature, low-functionality end of HE. Paillier-based libraries provide additive homomorphism plus multiplication by plaintext scalars, which is enough for counts, sums, weighted sums, private billing, and secure aggregation. PHE offers public-key semantic security under number-theoretic assumptions and performs well relative to FHE because it needs no bootstrapping and does not attempt deep circuits. In practice, PHE is usually embedded in larger protocols rather than used as a standalone analytic engine; common tools include CSIRO/Data61's python-paillier and newer lightweight packages such as LightPHE. A typical deployment encrypts on the client, accumulates on the server, and decrypts at the end by the key owner or a threshold group of key holders. The main limitation is simplicity: no arbitrary comparisons, joins, or general SQL without combining PHE with other primitives. [16]
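The client-encrypt, server-accumulate, owner-decrypt pattern described above can be sketched with a toy Paillier implementation written from the textbook construction (public modulus n with generator g = n + 1). This is not the API of python-paillier or LightPHE. The fixed Mersenne primes 2^31 - 1 and 2^61 - 1 are an assumption chosen so the example runs instantly; they provide no security, and a real deployment would use a vetted library with randomly generated primes and a modulus of at least 2048 bits.

```python
import math
import secrets

# Toy key from two fixed Mersenne primes: fast to run, NOT secure.
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lambda = lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                                 # valid because g = n + 1

def encrypt(m: int) -> int:
    # Enc(m) = g^m * r^n mod n^2 with g = n + 1, so g^m = 1 + m*n (mod n^2).
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (1 + m * N) * pow(r, N, N2) % N2

def decrypt(c: int) -> int:
    # Dec(c) = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) / n.
    return (pow(c, LAM, N2) - 1) // N * MU % N

def add(c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts.
    return c1 * c2 % N2

def scale(c: int, k: int) -> int:
    # Raising a ciphertext to a plaintext scalar k multiplies the plaintext by k.
    return pow(c, k, N2)

# Client: encrypt per-record values (hypothetical amounts) before upload.
values, weights = [120, 45, 300, 75], [1, 2, 1, 3]
cts = [encrypt(v) for v in values]

# Server: accumulate a sum and a weighted sum without the private key.
enc_sum, enc_wsum = encrypt(0), encrypt(0)
for c, w in zip(cts, weights):
    enc_sum = add(enc_sum, c)
    enc_wsum = add(enc_wsum, scale(c, w))

# Key owner: decrypt only the aggregates.
print(decrypt(enc_sum), decrypt(enc_wsum))  # 540 735
```

Note that the server can compute sums and plaintext-weighted sums, but nothing in this interface lets it compare two ciphertexts or multiply two encrypted values, which is exactly the functionality ceiling described above.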
Fully homomorphic encryption (FHE) is the strongest mainstream method for outsourcing single-server computation on encrypted data. NIST defines FHE as the ability to evaluate functions on encrypted data such that decryption yields the function output, and the HomomorphicEncryption.org community's 2024 security guidelines are widely used to parameterize modern FHE systems. The practical library ecosystem is extensive: OpenFHE, Microsoft SEAL, HElib, Lattigo, TFHE-rs, Concrete, and Concrete ML are all active and support schemes such as BFV, BGV, CKKS, and TFHE-style variants. Current analytics capabilities include aggregation, vector operations, similarity search, low-depth arithmetic circuits, and selected ML inference; multiparty HE variants extend this toward collaborative analytics. General SQL engines, joins, and broad OLAP, however, remain mainly in research and prototyping: recent surveys and systems such as ArcEDB and FHE-SQL show progress toward production readiness rather than fully operational products. [17]
Performance is the primary consideration. The 2026 FHE Benchmarking Suite tracks latency, throughput, memory usage, storage expansion, communication complexity, and accuracy loss. Bootstrapping remains a major bottleneck: the HE Standard emphasizes that bounded-depth schemes are practical while bootstrapping is expensive. Concrete ML's documentation reflects the same operational reality: its current focus is inference, and supported models must respect quantization and precision constraints rather than arbitrary floating-point training pipelines. One important security caveat is that approximate schemes such as CKKS require analysis beyond plain IND-CPA in certain scenarios, for example when decrypted results are shared with other parties. [18]
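Why depth and bootstrapping dominate FHE cost can be seen in a toy "somewhat homomorphic" scheme over the integers, after the DGHV construction (van Dijk, Gentry, Halevi, Vaikuntanathan). It is a swapped-in illustration: production schemes such as BFV, BGV, CKKS, and TFHE are lattice-based, but they share the same noise-budget behavior. The parameters are tiny, insecure assumptions chosen so the arithmetic is easy to follow. Each ciphertext hides a bit in the parity of a small noise term; addition adds noise, multiplication multiplies it.

```python
import secrets

# Toy symmetric somewhat-homomorphic encryption over the integers (after DGHV).
# Insecure parameters, chosen only to make the noise growth visible.
P_BITS, NOISE_BITS, Q_BITS = 120, 8, 240

def keygen() -> int:
    # Secret key: a random odd integer with exactly P_BITS bits (so p >= 2**119).
    return secrets.randbits(P_BITS) | (1 << (P_BITS - 1)) | 1

def encrypt(p: int, bit: int) -> int:
    # c = p*q + 2r + m: the message is the parity of a small, hidden noise term.
    q = secrets.randbits(Q_BITS)
    r = secrets.randbits(NOISE_BITS)
    return p * q + 2 * r + bit

def decrypt(p: int, c: int) -> int:
    # Correct only while the accumulated noise stays below p.
    return (c % p) % 2

def noise_bits(p: int, c: int) -> int:
    # Bit length of the hidden noise term; meaningful only while it is below p.
    return (c % p).bit_length()

# Homomorphic XOR is ciphertext addition (noise adds);
# homomorphic AND is ciphertext multiplication (noise multiplies).
sk = keygen()
level = [encrypt(sk, 1) for _ in range(8)]
depth = 0
print(f"depth {depth}, noise bits:", max(noise_bits(sk, c) for c in level))
while len(level) > 1:
    # Multiply neighbouring pairs: one more level of AND gates.
    level = [level[i] * level[i + 1] for i in range(0, len(level), 2)]
    depth += 1
    print(f"depth {depth}, noise bits:", max(noise_bits(sk, c) for c in level))
and_all = level[0]
print(decrypt(sk, and_all))  # 1: the AND of eight encrypted 1-bits
```

Fresh noise here is below 2^9, so the noise bound doubles in bit length at each AND level (9, 18, 36, 72 bits). A fourth level could exceed the 120-bit key, at which point decryption is no longer guaranteed. The options are then larger parameters for every ciphertext or bootstrapping, which homomorphically refreshes the noise, and that refresh is the expensive step the benchmarks above measure.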
Current best practice for FHE is to encrypt data on the client, send ciphertexts and evaluation keys to an untrusted compute service for homomorphic evaluation, and return encrypted or threshold-decryptable outputs to the data owner or consortium. Common deployment patterns include single-owner outsourced computation and hybrid pipelines in which FHE protects the most sensitive steps. Commercial activity is growing, with IBM HElayers/FHE services, Duality, Zama, and hardware-acceleration initiatives from several vendors all playing a role. [19]
Secure multi-party computation
MPC is the natural choice when several organizations keep their raw data locally but want a joint result. According to NIST, MPC enables several mutually distrustful parties to compute on private inputs while revealing only what can be derived from each party's own input and the output. MPC systems can provide passive or malicious security under honest-majority or dishonest-majority assumptions, and they combine secret sharing, oblivious transfer, garbled circuits, and occasionally homomorphic encryption. Well-known open-source frameworks include MP-SPDZ, MPyC, MOTION, EMP, ABY3, and SecretFlow. [20]
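The most basic MPC building block, additive secret sharing, is already enough to compute a joint sum. In this sketch (hypothetical hospitals and counts; passive, semi-honest compute parties assumed), each input is split into random shares modulo a public prime, so any two of the three compute parties together learn nothing about an individual input, yet the parties' local totals reconstruct the exact aggregate. Frameworks such as MP-SPDZ or MPyC add malicious security, multiplication protocols, and networking that this sketch omits.

```python
import secrets

MOD = 2**61 - 1   # public prime modulus for share arithmetic

def share(value: int, n_parties: int) -> list:
    # n additive shares: n - 1 uniformly random, the last one fixes the sum mod MOD.
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def reconstruct(shares) -> int:
    return sum(shares) % MOD

# Hypothetical inputs: each hospital secret-shares its count to three compute parties.
inputs = {"hospital_a": 120, "hospital_b": 87, "hospital_c": 301}
n_parties = 3
held = [[] for _ in range(n_parties)]
for value in inputs.values():
    for party, s in enumerate(share(value, n_parties)):
        held[party].append(s)

# Each compute party sums its own shares locally; only these partial sums are opened.
partial_sums = [sum(h) % MOD for h in held]
print(reconstruct(partial_sums))  # 508
```

Addition is "free" here because it needs no communication; multiplication and comparison do, which is where the round and bandwidth costs discussed below come from.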
Functionality is broad but topology-sensitive. MPC is strong for aggregations, histograms, PSI, private joins, secure overlap-and-sum, federated analytics, and classical ML training and inference on partitioned data. Google's Private Join and Compute privately sums values over overlapping identifiers, and ABY3 was designed specifically as a mixed-protocol framework for machine learning. Honest-majority protocols continue to improve: recent work reports better performance over high-latency links and 50% fewer basic instructions per gate than the prior state of the art in certain 3PC/4PC settings. [21]
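Private Join and Compute combines commutative (DDH-style) masking of identifiers with Paillier encryption of the associated values. The sketch below is a simplified, semi-honest rendition of that idea under stated assumptions: a toy group modulo the Mersenne prime 2^127 - 1 instead of an elliptic curve, SHA-256 hashed directly into that group, and a toy Paillier keypair built from small fixed primes. Party names and data are hypothetical, and none of this is the API of Google's open-source PJC library.

```python
import hashlib
import math
import random
import secrets

P = 2**127 - 1   # modulus of the toy DDH-style group (not a vetted choice)

def hash_to_group(identifier: str) -> int:
    h = int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big") % P
    return h if h else 1

# Party B's toy Paillier keypair: public N, secret LAM and MU (insecure sizes).
PB, QB = 2**31 - 1, 2**61 - 1
N, N2 = PB * QB, (PB * QB) ** 2
LAM = (PB - 1) * (QB - 1) // math.gcd(PB - 1, QB - 1)
MU = pow(LAM, -1, N)

def enc(m: int) -> int:
    # Anyone holding the public N can encrypt; g = N + 1 so g^m = 1 + m*N mod N^2.
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (1 + m * N) * pow(r, N, N2) % N2

def dec(c: int) -> int:
    # Only party B, who knows LAM and MU, can decrypt.
    return (pow(c, LAM, N2) - 1) // N * MU % N

a_ids = ["u1", "u2", "u3", "u7"]                  # party A: identifiers only
b_rows = {"u2": 40, "u3": 15, "u5": 99, "u7": 5}  # party B: identifier -> value
a_key = secrets.randbelow(P - 2) + 1              # A's secret exponent
b_key = secrets.randbelow(P - 2) + 1              # B's secret exponent

# Round 1 (A -> B): A's identifiers, masked with a_key and shuffled.
a_masked = [pow(hash_to_group(x), a_key, P) for x in a_ids]
random.shuffle(a_masked)

# Round 2 (B -> A): A's items re-masked with b_key (as an unordered set),
# plus B's own items masked with b_key and paired with encrypted values.
a_double = {pow(v, b_key, P) for v in a_masked}
b_masked = [(pow(hash_to_group(x), b_key, P), enc(v)) for x, v in b_rows.items()]
random.shuffle(b_masked)

# Round 3 (A): finish masking B's items, match, and add matching ciphertexts.
# Starting from a fresh enc(0) re-randomizes the result before it returns to B.
enc_sum, matches = enc(0), 0
for masked, ct in b_masked:
    if pow(masked, a_key, P) in a_double:
        enc_sum = enc_sum * ct % N2
        matches += 1

# A learns the intersection size; B decrypts only the intersection sum.
print(matches, dec(enc_sum))  # 3 60
```

Matching works because (H(x)^a)^b = (H(x)^b)^a, so neither party ever sees the other's raw identifiers. The intersection size is still revealed, which is consistent with the intersection-sum-with-cardinality functionality PJC targets.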
For rich multi-party computation, MPC typically outperforms FHE, but it falls short of TEEs for simple lift-and-shift analytics. The main drivers of this gap are communication rounds, bandwidth, and the number of independent compute parties. On a low-latency network with a well-chosen protocol, MPC can scale effectively, but tail latency can rise quickly over WANs or when the corruption threshold forces conservative protocols. Typical deployments therefore use 2-4 coordinated compute parties with formal collusion assumptions and strict operational controls on party independence and output release. The main challenges are complexity, difficult debugging, fairness and abort behavior, and the loss of security guarantees if too many parties collude. [22]
Trusted execution environments
TEEs and confidential-computing platforms protect data by constraining where it is decrypted and executed, rather than keeping it opaque throughout the computation. Common production examples include Intel SGX enclaves, AWS Nitro Enclaves, AMD SEV/SEV-SNP, Intel TDX, Azure confidential VMs, Google Confidential Space, and confidential GPUs such as the NVIDIA H100. Their core assurance is usually narrower than it sounds: only an attested workload running in the protected environment can access the keys or plaintext. [23]
Their main advantage is running existing analytics code with minimal modification, which makes TEEs the leading choice for functional expressiveness: SQL joins, application logic, and ML training or inference can run largely unchanged. The DuckDB-SGX2 paper reports running a TPC-H scale-factor-30 analytical workload with under 2x overhead compared with unprotected execution, while noting that enclave execution remains sensitive to cache-miss cost, NUMA effects, and enclave paging. Common tooling includes Gramine, Open Enclave, and Confidential Containers. [24]
Performance comes at the cost of a larger and more fragile trust base. SGX has an extensive documented attack literature, including work from 2024, and the SGX.Fail systematization examines well-known SGX attacks and how they carry over to other architectures. AMD SEV-SNP has recently come under scrutiny as well: the 2026 Fabricked paper reports a routing-misconfiguration attack that enables arbitrary read/write access and forged attestation, and AMD's security bulletin acknowledges the integrity impact and outlines mitigations for affected products. In practice, TEEs are strongest when paired with attestation-based key release, disciplined patch management, a minimal trusted computing base, and secure deletion of secrets, and weakest when treated as “set-and-forget encryption in use.” [25]
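Attestation-based key release can be sketched as a simulation. Everything here is hypothetical: HMAC with a shared "vendor" key stands in for the hardware vendor's certificate-chain signature, a SHA-384 hash of an image stands in for launch measurements such as PCR or MRENCLAVE values, and `KeyReleaseService` is an illustrative stand-in for a KMS key policy, not the AWS, Azure, or Google API.

```python
import hashlib
import hmac
import secrets

# Stand-in for the hardware vendor's attestation signing key (hypothetical).
# Real reports are signed via vendor certificate chains, not a shared HMAC key.
VENDOR_KEY = secrets.token_bytes(32)

def measure(image: bytes) -> str:
    # Launch measurement: a hash of the loaded code (cf. PCR or MRENCLAVE values).
    return hashlib.sha384(image).hexdigest()

def _sign(measurement: str, nonce: bytes) -> str:
    return hmac.new(VENDOR_KEY, measurement.encode() + nonce, hashlib.sha256).hexdigest()

def attest(image: bytes, nonce: bytes) -> dict:
    # The platform reports what is running, bound to the verifier's fresh nonce.
    m = measure(image)
    return {"measurement": m, "nonce": nonce, "sig": _sign(m, nonce)}

class KeyReleaseService:
    """Hypothetical KMS-like gate: releases a data key only to allowlisted, freshly attested code."""

    def __init__(self, allowed_measurements):
        self.allowed = set(allowed_measurements)
        self.data_key = secrets.token_bytes(32)
        self.pending = set()

    def challenge(self) -> bytes:
        nonce = secrets.token_bytes(16)
        self.pending.add(nonce)
        return nonce

    def release(self, report: dict):
        if not hmac.compare_digest(_sign(report["measurement"], report["nonce"]), report["sig"]):
            return None  # forged or tampered report
        if report["nonce"] not in self.pending:
            return None  # stale or replayed report
        self.pending.discard(report["nonce"])
        if report["measurement"] not in self.allowed:
            return None  # attested, but not code the parties agreed to run
        return self.data_key

approved = b"analytics-job-v1"                    # hypothetical enclave image
kms = KeyReleaseService({measure(approved)})
good = attest(approved, kms.challenge())
bad = attest(b"analytics-job-v1-modified", kms.challenge())
released = kms.release(good)
rejected = kms.release(bad)
replayed = kms.release(good)
print(released is not None, rejected is None, replayed is None)  # True True True
```

The gate is only as strong as its signature check: an attack that forges attestation, such as the SEV-SNP result described in the paragraph above, defeats exactly this step, which is why patch management belongs alongside key release rather than after it.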
Searchable encryption and encrypted indexes
Searchable encryption covers a range of methods for querying encrypted indexes or ciphertext-linked structures. NIST's description of structured encryption emphasizes searching encrypted data structures without exposing the whole database. In practice this is realized through blind indexes, inverted indexes, secure token/query protocols, and field-level queryable encryption. Representative tools include OpenSSE, CipherSweet, Cosmian Findex, MongoDB Queryable Encryption, and similar platforms. [26]
The security model intentionally diverges from that of FHE or MPC: efficient searchable systems typically leak some combination of search pattern, access pattern, frequency, result size, update pattern, or index structure. Recent work stresses that this leakage is not a superficial concern; efficient structured/searchable encryption is explicitly defined in terms of permitted leakage, and contemporary leakage-abuse research shows how that leakage can be exploited. The compromise buys large speed gains, which makes searchable encryption a strong fit for equality search, document retrieval, keyword search, and selected range/prefix/suffix queries. It is not a straightforward solution for arbitrary joins or for analytics that require strong semantic security. [27]
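Blind indexes, one of the mechanisms listed above, can be sketched in a few lines: a keyed HMAC of a normalized field value is stored next to the ciphertext so the server can match equality queries without learning the values. This is a generic illustration, not CipherSweet's API; the encrypted payload column is omitted (a real system would encrypt it with an AEAD cipher), and key handling is simplified. The last line makes the leakage concrete: equal plaintexts produce equal tokens, so the server learns value frequencies.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict

# Client-held index key (hypothetical); the server never sees it.
INDEX_KEY = secrets.token_bytes(32)

def blind_index(value: str) -> str:
    # Deterministic keyed hash of a normalized value: equal inputs give equal tokens.
    normalized = value.strip().lower().encode()
    return hmac.new(INDEX_KEY, normalized, hashlib.sha256).hexdigest()

# Client writes rows; the server stores only token -> row ids.
# (The encrypted email column itself, e.g. AEAD ciphertexts, is omitted here.)
rows = {1: "alice@example.com", 2: "bob@example.com", 3: "Alice@Example.com ", 4: "carol@example.com"}
server_index = defaultdict(list)
for row_id, email in rows.items():
    server_index[blind_index(email)].append(row_id)

def lookup(value: str) -> list:
    # Client computes the token; the server matches it without learning the value.
    return server_index.get(blind_index(value), [])

print(lookup("alice@example.com"))                        # [1, 3]
# Leakage: the server still sees how often each (unknown) value occurs.
print(sorted(len(ids) for ids in server_index.values()))  # [1, 1, 2]
```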
MongoDB Queryable Encryption is a key production example. Equality queries are available from version 7.0, and equality and range queries are supported in production, while prefix, suffix, and substring queries are still in preview in version 8.2 and not recommended for production use. MongoDB also documents the real operational costs: higher storage requirements, slower queries, and reduced observability because logs and diagnostics for encrypted collections are redacted. An independent security analysis published at USENIX noted that operational logs could potentially compromise security and that no full public security proof existed at the time of the study. [28]
Order-preserving and order-revealing encryption
OPE and ORE are specialized tools for range predicates, sorting, thresholding, and ORDER BY-like semantics. ORE, as described by Stanford's Applied Crypto Group, supports efficient range queries, sorting, and threshold filtering on encrypted data. CipherStash's ore.rs is an example of a production-oriented Rust Block-ORE implementation used in a searchable-encryption platform. These schemes are favored for their speed and easy integration into database indexes. [29]
The security compromise, however, is fundamental: these schemes reveal order, and that leakage enables powerful inference attacks, especially when the attacker has auxiliary distribution information or public reference data. The literature on inference attacks against property-preserving encrypted databases is an important reminder. OPE/ORE can be appropriate for latency-sensitive range search, but only if the leakage is acknowledged and controlled through careful domain design and access controls. Describing them as "standard encryption, but with query capabilities" is misleading. [30]
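The toy below follows the "practical ORE" construction of Chenette, Lewi, Weis, and Wu, in which each plaintext bit is masked by a PRF of its position and preceding bits, reduced mod 3. It is not the ore.rs API: HMAC-SHA256 is assumed as the PRF, inputs are limited to 16-bit integers, and the key is generated locally. Comparison needs no key, which is what makes database indexing easy, and `first_difference` exposes this construction's extra leakage: the position of the first differing bit.

```python
import hashlib
import hmac
import secrets
from functools import cmp_to_key

N_BITS = 16                      # toy message space: 16-bit unsigned integers
KEY = secrets.token_bytes(32)    # hypothetical secret key

def _prf_mod3(key: bytes, i: int, prefix: str) -> int:
    # PRF of (bit position, preceding bits), reduced mod 3 (slight bias; fine for a toy).
    digest = hmac.new(key, f"{i}|{prefix}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % 3

def ore_encrypt(key: bytes, m: int) -> list:
    bits = format(m, f"0{N_BITS}b")  # most significant bit first
    return [(_prf_mod3(key, i, bits[:i]) + int(bits[i])) % 3 for i in range(N_BITS)]

def ore_compare(ct1: list, ct2: list) -> int:
    # Keyless comparison: at the first differing position the PRF terms are equal,
    # so the difference mod 3 reveals which plaintext had the 1-bit there.
    for u1, u2 in zip(ct1, ct2):
        if u1 != u2:
            return 1 if u1 == (u2 + 1) % 3 else -1
    return 0

def first_difference(ct1: list, ct2: list):
    # Extra leakage of this construction: where the plaintexts first differ.
    return next((i for i, (a, b) in enumerate(zip(ct1, ct2)) if a != b), None)

salaries = [52000, 31000, 64999, 31000, 48000]   # hypothetical column
cts = [ore_encrypt(KEY, s) for s in salaries]
order = sorted(range(len(cts)), key=cmp_to_key(lambda a, b: ore_compare(cts[a], cts[b])))
print([salaries[i] for i in order])  # [31000, 31000, 48000, 52000, 64999]
```

Equal plaintexts also produce identical ciphertexts, so the frequency leakage of blind indexes applies here too, on top of full order.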
Functional encryption
IBM describes functional encryption (FE) as bridging encryption and access control: a key holder learns a selected function of the encrypted data rather than the data itself. In FE, secret keys are tied to specific functions instead of granting full decryption, so an analyst holding the key for f learns f(x) and nothing more, at least in the ideal model. The Fentec libraries put this into practice with FE schemes for linear, inner-product, and quadratic functionalities. [31]
FE is intellectually powerful for analytics, but its practical reach is currently limited to inner products, certain linear-algebra operations, a few scoring functions, and specialized ML components, which recent work continues to extend toward scalable federated learning and DP-augmented variants. It is not a mainstream platform for general SQL, rich joins, or unrestricted ML training. Its tooling ecosystem is small, mainstream cloud support is lacking, and key management is operationally demanding because a master authority must issue function keys. The honest assessment is therefore that FE remains promising but low-maturity for enterprise encrypted analytics beyond specialized research or niche high-value workflows. [32]
Differential privacy with encryption
Differential privacy (DP) is not a technique for computing on encrypted data; it is a precise way to limit what can be inferred from released outputs. That is why it pairs naturally with encrypted analytics. OpenDP describes DP as restricting what the output reveals about any individual, and Google's documentation on distributed DP in federated learning explains its complementary role alongside secure aggregation, where the server should receive only an aggregate model update rather than individual user updates. [33]
The strongest practical pattern is therefore to protect inputs during collection and computation with encryption, secret sharing, or TEEs, and then protect the released statistics or model with DP. Production examples include Google's federated learning with secure aggregation and distributed DP, AWS Clean Rooms Differential Privacy, and Prio/DAP systems that split or aggregate client reports before anything is disclosed. The computational cost of the DP step is small compared with the cryptography; the hard parts are privacy accounting, contribution bounding, sampling assumptions, and managing utility loss. DP addresses what cryptography cannot solve on its own: leakage through the output itself. [34]
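The DP release step can be sketched as contribution bounding plus the Laplace mechanism on a sum. Assumptions: per-user values are non-negative, each user's total is clipped to `max_per_user` (which is therefore the sum's sensitivity when one user is added or removed), and Laplace noise is drawn as the difference of two exponentials. Names and data are hypothetical. In the hybrid pattern, the true sum would be computed under encryption, secret sharing, or inside a TEE, with noise added before release. Naive floating-point noise sampling like this has known attacks, so production systems should rely on vetted libraries such as OpenDP.

```python
import random
from typing import Dict, List

def clip_contributions(per_user: Dict[str, List[float]], max_per_user: float) -> List[float]:
    # Contribution bounding: each user's non-negative total is capped at max_per_user,
    # so adding or removing one user changes the sum by at most max_per_user.
    return [min(max(sum(vals), 0.0), max_per_user) for vals in per_user.values()]

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_sum(per_user: Dict[str, List[float]], max_per_user: float,
           epsilon: float, rng: random.Random) -> float:
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    clipped = clip_contributions(per_user, max_per_user)
    return sum(clipped) + laplace_noise(max_per_user / epsilon, rng)

per_user = {"alice": [3.0, 4.0], "bob": [50.0], "carol": [1.0]}   # hypothetical data
print(clip_contributions(per_user, 10.0))                          # [7.0, 10.0, 1.0]
print(dp_sum(per_user, max_per_user=10.0, epsilon=1.0, rng=random.Random()))
```

Every call spends privacy budget, so repeated releases of the same sum must be tracked by the privacy accounting mentioned above; the clipping bound is a utility decision, since Bob's true contribution of 50 is deliberately lost.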
Hybrid architectures
Hybrid designs are becoming the standard in production because they match technique to task. SecretFlow explicitly follows this approach by abstracting MPC, HE, and TEE behind a single privacy-preserving data-analysis and machine-learning platform. Duality's AWS case study combines Nitro Enclaves with FHE, federated learning, and differential privacy rather than relying on a single PET. Decentriq's Azure materials similarly describe clean-room architectures that combine confidential computing with other privacy technologies such as differential privacy. [35]
The architectural value is straightforward. A hybrid stack can use searchable encryption for narrow lookups, TEE execution for general SQL or model serving, MPC for cross-party joins and aggregation without revealing plaintext to any single operator, FHE/PHE for the most sensitive arithmetic subroutines, and DP for anything released beyond the trust boundary. Such a stack often meets the combined goals of security, functionality, and cost better than any single primitive. The downside is equally clear: security proofs become compositional rather than monolithic, and each added layer brings new assumptions, observability requirements, and failure modes, so operational complexity rises significantly. [36]
Hybrid Architecture Layer Pattern
This layered pattern synthesizes the production architectures described by SecretFlow, cloud confidential-computing services, searchable-encryption systems, and DP release frameworks. [37]
Comparative tradeoffs
The table below is a qualitative synthesis, not a formal standard: "security level" indicates how much trust is removed from the operating environment when the stated assumptions hold. [38]
| Technique | Security level | Supported analytics | Performance | Dev complexity | Best fit | Primary caveat |
|---|---|---|---|---|---|---|
| Partial HE | High cryptographic protection for narrow arithmetic | Counts, sums, weighted sums, secure aggregation | High relative to PET alternatives | Low/Med | Simple outsourced arithmetic | Functionality too narrow for rich queries |
| Full HE (FHE) | Very high trust reduction for outsourced computation | Aggregation, vector ops, selected SQL-like ops, ML inference | Low to medium; often the slowest option | High | Single-owner outsourced compute | Blow-up, tuning, slow bootstrapping |
| MPC | Very high within explicit collusion thresholds | Aggregation, joins, PSI/PJC, partitioned ML | Medium; network- and round-bound | High | Cross-org collaboration without trusted hardware | Operational complexity and collusion assumptions |
| TEE / Confidential | High if hardware, firmware, and attestation assumptions hold | Broadest coverage: SQL, joins, arbitrary code, ML | High; often closest to native | Medium | Lift-and-shift confidential analytics | Side channels, larger TCB, hardware vulns |
| Searchable Encryption | Medium to high, but leakage-prone by design | Equality search, keyword search, some range/prefix/suffix | High | Medium | Queryable encrypted databases and search | Search/access/frequency leakage |
| OPE / ORE | Low to medium because order leakage is explicit | Sorting, range filters, thresholding, ORDER BY | Very high | Low/Med | Fast range search when leakage is acceptable | Inference attacks can recover structure |
| Functional Encryption | High for supported function families | Inner products, selected linear/quadratic analytics | Medium for narrow tasks | High | Fine-grained delegated analytics | Narrow functionality, low ecosystem maturity |
| DP + Encryption | High against output inference if well tuned | Aggregate analytics, telemetry, federated learning | High for DP; PET dominates cost | Medium | Sharing results safely after processing | Utility/privacy tradeoff and budget accounting |
| Hybrid Stack | Potentially strongest overall fit | Broadest practical coverage | Med to high if well partitioned | Very High | Real-world enterprise deployments | Security composition & operational complexity |
Deployments, case studies, and vendor landscape
The clearest production maturity today is in confidential-computing and clean-room deployments. Google documents Confidential Space as a TEE for agreed-upon workloads, built on the same foundation as Google Ads confidential matching. Microsoft positions Azure confidential computing as protection against access by the cloud operator, and Decentriq uses Azure confidential computing to run enterprise data clean rooms. AWS documents Nitro Enclaves with KMS-integrated attestation, and Duality's AWS case study uses Nitro Enclaves to create isolated processing environments for sensitive-data analysis such as cross-border cancer research. These examples show that broad-functionality encrypted analytics is currently led in practice by TEE-centric and hybrid architectures. [48]
On the pure-cryptography side, the market is real but increasingly selective. IBM continues to offer HElayers and public FHE materials, including a deployment with Intesa Sanpaolo to secure digital transaction workflows. Duality focuses on secure data collaboration for healthcare, finance, and government using PETs and open-source FHE. Zama has built an active FHE ecosystem around TFHE-rs, Concrete, and Concrete ML, with a primary focus on blockchain and confidential smart-contract infrastructure rather than traditional SQL analytics. Inpher is a prominent MPC/HE/federated-learning vendor serving healthcare, finance, and IoT. [49]
For queryable encrypted databases, MongoDB Queryable Encryption is the leading mainstream example, supporting equality and range queries while documenting its storage, performance, and observability costs. CipherSweet, OpenSSE, Cosmian Findex, and CipherStash are alternative software options for searchable encryption and encrypted index building. For workloads dominated by exact-match, range, or search predicates with an acceptable leakage profile, they offer a simpler adoption path than FHE. [50]
For privacy-preserving aggregation and telemetry, Prio and its successors are among the most credible real-world deployments. Mozilla has publicly discussed its work to deploy Prio-based DAP in Firefox, and Divvi Up is a production system for aggregate statistics using Prio3. Google's federated-learning blog describes secure aggregation and distributed differential privacy in model-training pipelines, and AWS Clean Rooms Differential Privacy is an example of a cloud product built around privacy-controlled sharing of aggregate results. These examples show that encrypted analytics extends beyond databases and model serving to safe measurement and telemetry at scale. [51]
Selection criteria, deployment checklist, and evaluation metrics
The first decision criterion is not the vendor or the algorithm but the trust boundary you are trying to move. If you do not fully trust the cloud operator but do trust a hardware root of trust and need to run substantial existing software, start with TEEs. If several organizations require that no single operator sees the data, start with MPC or PJC. If a data owner wants to outsource computation without trusting the server, start with FHE or PHE. If the primary workload is equality or range retrieval in a database, searchable or queryable encryption may suffice. If the concern extends beyond input secrecy to sensitive outputs, add DP on top. [52]
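The trust-boundary rules above can be encoded as a first-pass shortlist helper. The `Requirements` fields and the returned labels are hypothetical, chosen to mirror the prose; the output is a starting point for the technique-shortlist stage, not a decision, and workload shape still has to be checked against each candidate.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Requirements:
    # Hypothetical yes/no answers to the trust-boundary questions in the text.
    distrust_cloud_operator: bool = False
    trust_hardware_root: bool = False
    needs_existing_software: bool = False
    multiple_organizations: bool = False
    outsource_without_trusting_server: bool = False
    lookup_only_workload: bool = False
    sensitive_outputs: bool = False

def shortlist(req: Requirements) -> List[str]:
    # First-pass candidates only; workload shape and prototyping still decide.
    picks = []
    if req.distrust_cloud_operator and req.trust_hardware_root and req.needs_existing_software:
        picks.append("TEE / confidential computing")
    if req.multiple_organizations:
        picks.append("MPC / private join and compute")
    if req.outsource_without_trusting_server:
        picks.append("FHE or PHE")
    if req.lookup_only_workload:
        picks.append("searchable / queryable encryption")
    if req.sensitive_outputs:
        picks.append("differential privacy on released outputs")
    return picks

print(shortlist(Requirements(multiple_organizations=True, sensitive_outputs=True)))
# ['MPC / private join and compute', 'differential privacy on released outputs']
```

The rules are deliberately additive rather than exclusive, which mirrors the hybrid stacks described earlier: a cross-organization workload with sensitive outputs yields two layers, not one winner.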
The second criterion is workload shape. Rich joins, arbitrary UDFs, and model training tend to favor TEEs or hybrid setups. Cross-party federated features, overlap analysis, and private record linkage usually fit MPC/PJC better. Low-depth inference, vector similarity, and specific arithmetic pipelines are increasingly viable with FHE. Encrypted indexes are generally preferable for equality/range lookups over application data. Specify the operators, query selectivity, data sizes, cardinalities, and latency SLOs up front; without them, teams tend either to under-secure or to over-engineer the workload. [53]
Practical Deployment Stage-Gate Checklist
The failure rate of PET projects increases when teams neglect adversary modeling and realistic prototyping. [54]
| Stage | What to do | Pass condition | Why it matters |
|---|---|---|---|
| Problem framing | Classify data, outputs, parties, and exact operators | Task classified as aggregation, search, join, inference, or training | PET choice is workload-specific, not generic |
| Threat model | Write down adversaries, collusion assumptions, and unacceptable leakages | Named threat model approved by security/legal | Techniques differ mainly in assumptions |
| Technique shortlist | Map workload to 2–3 candidate architectures | At least one cryptographic option and one operationally efficient option evaluated | Prevents premature lock-in |
| Key and identity design | Define key custody, enclave attestation flow, or share-holder governance | Keys or shares are never ad hoc | Most failures are operational, not mathematical |
| Prototype | Benchmark on representative data and queries at realistic security levels | Meets p95 latency, throughput, and cost guardrails | PET performance is extremely workload-sensitive |
| Leakage review | Document what metadata, patterns, or outputs remain observable | Explicit acceptance or rejection of leakage profile | Searchable encryption and TEEs especially need this |
| Privacy release controls | Add DP, quotas, or query review if outputs leave the trust boundary | Output policy defined and testable | Encryption alone does not solve output inference |
| Red-team / compliance | Test side channels, patching, logging, and legal claims | Findings resolved before rollout | PETs are not a silver bullet under GDPR/FTC/GLBA/HIPAA |
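The gates are sequential: a later stage should not start until the earlier pass condition holds. This is a minimal sketch, assuming project evidence is collected into a plain dictionary. The evidence keys and the `Gate` class are hypothetical, and real pass conditions involve human sign-off that code can only record, not replace.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Gate:
    stage: str
    passed: Callable[[dict], bool]  # check against collected evidence (hypothetical keys)


GATES = [
    Gate("Problem framing", lambda e: e.get("workload") in {"aggregation", "search", "join", "inference", "training"}),
    Gate("Threat model", lambda e: e.get("threat_model_approved", False)),
    Gate("Technique shortlist", lambda e: e.get("crypto_options", 0) >= 1 and e.get("operational_options", 0) >= 1),
    Gate("Key and identity design", lambda e: e.get("key_custody_defined", False)),
    # Only the p95 guardrail is modeled here; throughput and cost checks would follow the same pattern.
    Gate("Prototype", lambda e: e.get("p95_ms", float("inf")) <= e.get("p95_slo_ms", 0.0)),
    Gate("Leakage review", lambda e: e.get("leakage_profile_decision") in {"accepted", "rejected"}),
    Gate("Privacy release controls", lambda e: e.get("output_policy_tested", False)),
    Gate("Red-team / compliance", lambda e: e.get("open_findings", 1) == 0),
]


def first_blocking_gate(evidence: dict) -> Optional[str]:
    """Return the first stage whose pass condition fails, or None if every gate passes."""
    for gate in GATES:
        if not gate.passed(evidence):
            return gate.stage
    return None


evidence = {
    "workload": "join",
    "threat_model_approved": True,
    "crypto_options": 1,
    "operational_options": 1,
    "key_custody_defined": False,
}
print(first_blocking_gate(evidence))  # Key and identity design
```

Missing evidence defaults to failing the gate, so an incomplete record blocks rollout instead of silently passing.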
The benchmarking program needs to be just as explicit. The FHE Benchmarking Suite is a useful model because it focuses on latency, throughput, memory, storage expansion, communication complexity, and quality loss. For TEE-based SQL, add enclave-specific metrics such as attestation time, EPC or enclave-paging behavior, cache-miss amplification, and observed overhead on realistic OLAP workloads. For searchable encryption, add index size, query selectivity, token-generation cost, and leakage-profile documentation. For DP-based releases, add epsilon, delta, contribution bounding, privacy-budget burn rate, and utility loss. [55]
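Privacy-budget burn rate is only measurable if every release is charged against an explicit budget. This is a minimal sketch, assuming basic sequential composition (epsilons and deltas simply add), which is a conservative bound; production DP libraries typically use tighter accountants. The class and release names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacyBudget:
    """Track (epsilon, delta) spend under basic sequential composition (hypothetical helper)."""
    epsilon_total: float
    delta_total: float
    spent: list[tuple[str, float, float]] = field(default_factory=list)

    def remaining(self) -> tuple[float, float]:
        eps = self.epsilon_total - sum(e for _, e, _ in self.spent)
        delta = self.delta_total - sum(d for _, _, d in self.spent)
        return eps, delta

    def charge(self, release: str, epsilon: float, delta: float = 0.0) -> bool:
        """Record a release if it fits in the remaining budget; refuse it otherwise."""
        eps_left, delta_left = self.remaining()
        if epsilon > eps_left or delta > delta_left:
            return False
        self.spent.append((release, epsilon, delta))
        return True

    def burn_rate(self) -> float:
        """Fraction of the epsilon budget already consumed."""
        return sum(e for _, e, _ in self.spent) / self.epsilon_total


budget = PrivacyBudget(epsilon_total=1.0, delta_total=1e-6)
budget.charge("weekly_counts", 0.5)   # accepted
budget.charge("regional_sums", 0.3)   # accepted
budget.charge("ad_hoc_query", 0.3)    # refused: would exceed epsilon_total
print(round(budget.burn_rate(), 2))   # 0.8
```

Refused releases are not recorded, so the spend log doubles as an audit trail of what was actually published.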
A sound cross-technique benchmark suite typically covers at least five workload families: aggregations on wide tables; private join or PSI-plus-sum on skewed identifiers; search with equality and range predicates; SQL analytics on a TPC-H-like subset with one or two joins; and ML workloads on one traditional model and one smaller neural model. For each, measure p50/p95 latency, throughput, ciphertext or share expansion, network bytes, RAM/VRAM, accuracy degradation, deployment time, and operator effort. Hidden parameter tuning or hand-crafted circuits that your team cannot sustain should be counted as a significant cost, not a minor detail. [56]
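A small per-workload harness keeps these numbers comparable across techniques, including against a plaintext baseline. This is a minimal sketch, assuming each workload family is wrapped as a zero-argument callable and timed in-process with `time.perf_counter`. It uses nearest-rank percentiles and wall-clock time only; network bytes, RAM/VRAM, and accuracy need separate instrumentation. The function names are hypothetical.

```python
import math
import time
from typing import Callable


def percentile(sorted_samples: list[float], q: float) -> float:
    """Nearest-rank percentile (q in (0, 100]) over pre-sorted samples."""
    rank = math.ceil(q / 100 * len(sorted_samples))
    return sorted_samples[max(rank, 1) - 1]


def benchmark(workload: Callable[[], object], runs: int = 200) -> dict[str, float]:
    """Time repeated runs of one workload; report p50/p95 latency (ms) and throughput (runs/s)."""
    latencies_ms = []
    start = time.perf_counter()
    for _ in range(runs):
        t0 = time.perf_counter()
        workload()
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    elapsed_s = time.perf_counter() - start
    latencies_ms.sort()
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "throughput_qps": runs / elapsed_s,
    }


def expansion_ratio(plaintext_bytes: int, ciphertext_bytes: int) -> float:
    """Ciphertext (or share) size relative to plaintext size."""
    return ciphertext_bytes / plaintext_bytes


# Plaintext baseline for the aggregation family; a PET variant would be wrapped the same way.
data = list(range(100_000))
baseline = benchmark(lambda: sum(data), runs=50)
print(sorted(baseline))  # ['p50_ms', 'p95_ms', 'throughput_qps']
print(expansion_ratio(plaintext_bytes=8, ciphertext_bytes=16))  # 2.0, e.g. two 8-byte shares per value
```

Reporting PET results as ratios to the plaintext baseline makes overheads comparable across machines.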
A concise decision rule: choose the least powerful tool that actually mitigates the threat you are concerned about. In practice, this often means searchable encryption for specific fields, TEE-based confidential computing for fast analytics, MPC for cross-organization collaboration without a trusted runtime, FHE for untrusted servers, and DP for aggregate outputs that leave the secure environment. The strongest production systems combine these methods rather than relying on one primitive for every task. [57]
Open questions and limitations
Parts of this field are changing quickly, which makes them hard to capture in a static report. General-purpose FHE for SQL and large-model training is improving, but the most reliable current evidence still favors selective inference and narrow analytics over drop-in encrypted datastores. The new benchmarking ecosystem is promising but still in its early stages. [58]
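An encrypted sum is a good example of the narrow analytics that work well today, and it does not even require full FHE. This minimal sketch uses textbook Paillier, an additively homomorphic (PHE) scheme, as a stand-in. It is not FHE: it supports adding encrypted values but not multiplying them together. The tiny primes make it insecure, and the function names are hypothetical; real deployments should use a vetted library with large keys.

```python
import math
import secrets

# Toy primes for readability only; real keys use primes of 1024 bits or more.
PRIME_P, PRIME_Q = 1_000_003, 1_000_033
N = PRIME_P * PRIME_Q
N_SQUARED = N * N
LAM = math.lcm(PRIME_P - 1, PRIME_Q - 1)
MU = pow(LAM, -1, N)  # valid because g = N + 1, so L(g^LAM mod N^2) = LAM mod N


def encrypt(m: int) -> int:
    """Paillier encryption with g = N + 1: c = (1 + m*N) * r^N mod N^2."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N_SQUARED) % N_SQUARED


def decrypt(c: int) -> int:
    """m = L(c^LAM mod N^2) * MU mod N, where L(u) = (u - 1) // N."""
    u = pow(c, LAM, N_SQUARED)
    return ((u - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return c1 * c2 % N_SQUARED


# The server sums ciphertexts without the key; only the data owner can decrypt the total.
values = [120, 340, 95]
total = encrypt(0)
for c in (encrypt(v) for v in values):
    total = add_encrypted(total, c)
print(decrypt(total))  # 555
```

Sums, counts, and means (with a plaintext divisor) fit this pattern; joins or comparisons on encrypted values do not, which is part of why general-purpose encrypted SQL remains hard.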
Searchable encryption leakage remains an open design problem. Structured encryption makes leakage explicit, but defining "acceptable leakage" is context-dependent and an active research area. Vendor positioning and academic caution often diverge on this point. [59]
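The leakage problem is easiest to see in the simplest searchable-encryption design, a deterministic token index. This minimal sketch assumes keyed HMAC-SHA256 as the token function, an illustrative choice rather than any specific product's scheme. Equality search works, but the server learns how often each (unknown) value occurs and, across queries, which requests repeat. That structured leakage is what frequency-analysis attacks exploit.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict

KEY = secrets.token_bytes(32)  # client-held key; the server never sees it


def token(value: str) -> str:
    """Deterministic keyed token: equal plaintexts give equal tokens, which enables equality search."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()


# The client builds the index; the server stores only token -> row ids.
rows = {1: "oncology", 2: "cardiology", 3: "oncology", 4: "oncology", 5: "dermatology"}
index: dict[str, list[int]] = defaultdict(list)
for row_id, dept in rows.items():
    index[token(dept)].append(row_id)

# Equality query: the client sends token("oncology"); the server returns matching row ids.
matches = index[token("oncology")]
print(matches)  # [1, 3, 4]

# Leakage: without the key, the server still sees the frequency profile of the column.
frequency_profile = sorted(len(ids) for ids in index.values())
print(frequency_profile)  # [1, 1, 3]
```

If an attacker knows the public distribution of departments, matching it against `[1, 1, 3]` can reveal which token is "oncology", even though no plaintext was ever stored.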
TEE risk is not static either. Recent SGX and SEV-SNP findings show that confidential computing remains exposed to ongoing microarchitectural, firmware, and attestation-chain issues. Decisions that rely heavily on TEEs should therefore be reassessed regularly as hardware, cloud attestation services, and vendor patch guidance evolve. [25]
Finally, functional encryption remains less proven in practice. The theory is robust and libraries are available, but evidence of large-scale FE deployment lags behind that for FHE, MPC, searchable encryption, and confidential computing, and recent evidence is limited. Targeted pilots are therefore more prudent than broad enterprise commitments unless the function family fits the application exceptionally well. [45]
References
- [1, 6, 9] https://www.dataproduct.net/