14 Legal Formats, 1 Model: E-Invoicing Architecture

How we solved the normalization problem in e-invoicing systems. Learn from 5 years and 40M invoices: why semantic classification beats format conversion.

Workflow diagram: RPA classifies invoices & bank data from bank transactions, ERP, tax-crawler, & orders for analytics.

Summary

E-invoicing systems face a fundamental challenge that extends beyond format conversion and transmission: the normalization problem. Across five years and 40 million Italian invoices processed in production, about 70% of documents followed standard patterns that parsers handle easily; the critical challenge lies in semantic interpretation rather than technical transmission. The Italian fiscal system alone requires handling 28 document types, 7 VAT nature codes with subcategories, and multiple reporting obligations that interact across different government systems. Two invoices with identical XML structure can have completely different fiscal implications based on classification fields. For example, VAT nature code N6.3 for construction subcontracting and N6.7 for cleaning services both appear as reverse charge transactions but require different deductibility treatments and accounting classifications. The normalization problem becomes particularly acute with threshold-based obligations like Intrastat declarations, which trigger at €350,000 in quarterly intra-EU purchases and therefore require monitoring transaction patterns over time rather than processing documents one by one. Handling this complexity means moving beyond parser accuracy to classification systems that understand fiscal semantics, incorporate human feedback loops for the 30% long-tail cases, and track cross-system fiscal obligations that emerge from transaction patterns rather than individual documents.

Same Transaction, 14 Different Legal Formats. How We Built One Model to Handle All of Them.

Architecture lessons from five years of Italian fiscal data — and why the normalization problem is harder than the transmission problem

Paolo Messina | CEO, Mentally Digital LLC — San Jose, California
PhD Physics (EPFL), MBA (Michigan Ross)


Everything was working.

The parser was handling FatturaPA XML correctly. The Sistema di Interscambio was accepting every invoice without rejection. The general ledger was updating in real time. VAT was being applied at the correct rate for every transaction type. The Italian subsidiary’s finance team hadn’t received a single error notification in months.

Then, mid-year, one number crossed a threshold.

The company had been expanding its EU supplier relationships — German machinery components, French raw materials, Dutch logistics providers. Gradually, quarterly intra-EU purchase volumes reached €350,000. At that point, Italian law requires a monthly statistical Intrastat declaration for inbound EU purchases. Not a VAT filing. Not a tax payment. A statistical report to the customs authority — separate from SDI, with different data fields, on a different submission calendar, to a different government agency.

No system flagged it. No ERP workflow was monitoring cumulative cross-border purchase volumes against this threshold. The transmission layer had operated flawlessly for months. The intelligence layer — the one that would have recognized a fiscal obligation accumulating silently from a pattern of transactions over time — didn’t exist.

This is the problem nobody talks about when they talk about e-invoicing. Transmission is solved. Normalization is not.


The Taxonomy Problem

When engineers design e-invoicing systems, they tend to frame the problem as a format conversion challenge. FatturaPA is XML. UBL is XML. XRechnung is XML. Build a parser for each, map the fields, done.

This framing is wrong, and the wrongness becomes visible at scale.

Over five years and 40 million Italian invoices classified in production, we found that the hard problem is not format — it’s semantics. Two invoices with identical XML structure can have completely different fiscal implications depending on a handful of classification fields.

Consider two invoices both marked with VAT nature code N6, the reverse charge category. N6.3 applies to subcontracting in construction. N6.7 applies to cleaning and other building-related services. Both are reverse charge. Both are issued without VAT, which the buyer accounts for instead. But the deductibility rules, the DSCR impact, and the analytical accounting treatment differ. A parser that correctly reads the XML of both invoices has done perhaps 30% of the work. The remaining 70% is knowing what N6.3 means for this company’s cost structure versus what N6.7 means, and applying that interpretation consistently across thousands of documents per month.

Italian fiscal taxonomy as of 2026 includes: 28 document types (TD01 through TD28), 7 VAT nature codes (N1 through N7, each with subcategories), corrispettivi from certified cash registers in a different XML schema entirely, RT daily receipt aggregations, cross-border e-invoices that must be reported through SDI using specific document codes to indicate they are reporting-only transactions, Intrastat declarations that reference the same underlying transactions but require different data fields, and F24 tax payments that are linked to but not contained in any invoice. Each of these streams interacts with the others. The Intrastat threshold problem described above emerged precisely because the invoice stream and the aggregate cross-border purchase calculation lived in separate systems with no layer monitoring their relationship.
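A monitoring layer of the kind that was missing can be sketched in a few lines of Python. This is a minimal illustration, not the production system: the Purchase record, the abbreviated EU_COUNTRIES set, and the function names are assumptions made for the example, and only the €350,000 quarterly figure comes from the article.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Quarterly threshold in EUR, as stated in the article.
INTRASTAT_QUARTERLY_THRESHOLD = Decimal("350000")

# Hypothetical, abbreviated set; a real deployment would list every EU member state.
EU_COUNTRIES = {"DE", "FR", "NL", "ES", "PT"}


@dataclass(frozen=True)
class Purchase:
    """Hypothetical normalized purchase record."""
    invoice_date: date
    supplier_country: str  # ISO 3166-1 alpha-2 code
    amount_eur: Decimal


def quarter_of(d: date) -> tuple:
    """Map a date to (year, quarter)."""
    return d.year, (d.month - 1) // 3 + 1


def quarters_over_threshold(purchases):
    """Sum intra-EU purchases per quarter and return the quarters at or above the threshold."""
    totals = defaultdict(Decimal)
    for p in purchases:
        if p.supplier_country in EU_COUNTRIES:
            totals[quarter_of(p.invoice_date)] += p.amount_eur
    return {q: t for q, t in totals.items() if t >= INTRASTAT_QUARTERLY_THRESHOLD}
```

The point of the sketch is that the check runs over the accumulated stream, not over any single invoice: no individual document in the example above would ever trigger it.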

The distribution of complexity across a real production dataset is approximately this: 70% of documents fall into roughly 15 standard patterns that a well-designed parser handles without difficulty. The remaining 30% is long tail — partial credit notes referencing partially paid invoices, mixed-rate invoices where different line items carry different VAT codes, split payment transactions for public administration clients, self-billing invoices for certain agricultural and publishing categories, transactions involving non-resident entities that require a specific document type (TD17, TD18, or TD19 depending on the nature of the supply). This long tail is not solvable with a better parser. It is solvable only with a human feedback loop — tax professionals correcting classification errors, those corrections becoming training data, the model improving its handling of edge cases iteratively.
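As a small example of a decision that sits above the parser, the choice among TD17, TD18, and TD19 can be written as a lookup on the nature of the supply. The Supply enum and function name are hypothetical, and the mapping reflects our reading of the published FatturaPA document type descriptions; it should be checked against the current specification before use.

```python
from enum import Enum


class Supply(Enum):
    """Hypothetical labels for the nature of a purchase from a non-resident supplier."""
    SERVICE_FROM_ABROAD = "service_from_abroad"
    INTRA_EU_GOODS = "intra_eu_goods"
    GOODS_ART_17_C2 = "goods_art_17_c2"  # goods under art. 17(2) DPR 633/72


# Assumed mapping, to be verified against the current FatturaPA specification.
DOC_TYPE_BY_SUPPLY = {
    Supply.SERVICE_FROM_ABROAD: "TD17",
    Supply.INTRA_EU_GOODS: "TD18",
    Supply.GOODS_ART_17_C2: "TD19",
}


def document_type_for(supply: Supply) -> str:
    """Return the document type code for a classified supply."""
    return DOC_TYPE_BY_SUPPLY[supply]
```

The lookup itself is trivial. The hard part, and the part that needs the feedback loop, is deciding which Supply value a given invoice actually represents.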

After five years, our classification accuracy on standard patterns exceeds 95%. On long-tail cases, we run multi-model validation and surface low-confidence classifications for human review rather than auto-resolving them. This distinction — knowing when not to be confident — turns out to be more valuable than raw accuracy.
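The idea of knowing when not to be confident can be sketched as a simple gate. This is an illustration under stated assumptions: the Prediction record, the route function, and the 0.9 cut-off are hypothetical, and the article does not describe how the production system combines model outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One model's classification of a document (hypothetical record)."""
    model: str
    label: str
    confidence: float


def route(predictions, min_confidence=0.9):
    """Auto-accept only when every model agrees and each is confident; otherwise send to review.

    The 0.9 cut-off is an illustrative assumption, not a production value.
    """
    labels = {p.label for p in predictions}
    unanimous = len(labels) == 1
    confident = all(p.confidence >= min_confidence for p in predictions)
    if predictions and unanimous and confident:
        return "auto", predictions[0].label
    return "review", None
```

A disagreement between models or a single low score is enough to put the document in front of a tax professional, whose correction can then be stored as training data.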


The Canonical Model

The architectural decision that made everything else possible was this: don’t build 14 parsers. Build one canonical fiscal model and 14 connectors.

The canonical model has six core entities: company, counterparty, invoice, tax breakdown, payment, and fiscal period. Every fiscal data source — regardless of country, format, or government portal — maps into these six entities. The connectors handle the format-specific parsing. The canonical model handles everything above that layer: classification, reconciliation, analytics, compliance monitoring, Q&A.
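A minimal Python sketch of this split follows. The six entity names come from the article; every field, the Connector protocol, and the ToyJsonConnector (which reads a made-up JSON layout, not any real invoice format) are assumptions chosen to keep the example short.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional, Protocol


@dataclass(frozen=True)
class Company:
    vat_id: str
    name: str


@dataclass(frozen=True)
class Counterparty:
    vat_id: str
    name: str
    country: str


@dataclass(frozen=True)
class TaxBreakdown:
    taxable_amount: Decimal
    rate: Decimal
    nature_code: Optional[str] = None


@dataclass(frozen=True)
class Payment:
    due_date: date
    amount: Decimal


@dataclass(frozen=True)
class FiscalPeriod:
    year: int
    quarter: int


@dataclass(frozen=True)
class Invoice:
    number: str
    issue_date: date
    company: Company
    counterparty: Counterparty
    period: FiscalPeriod
    taxes: tuple
    payments: tuple = ()


class Connector(Protocol):
    """Each source format implements this one method; everything downstream sees Invoice only."""
    def to_canonical(self, raw: bytes) -> Invoice: ...


class ToyJsonConnector:
    """Stand-in for a real format connector; parses a made-up JSON layout."""

    def __init__(self, company: Company):
        self.company = company

    def to_canonical(self, raw: bytes) -> Invoice:
        doc = json.loads(raw)
        issued = date.fromisoformat(doc["date"])
        return Invoice(
            number=doc["number"],
            issue_date=issued,
            company=self.company,
            counterparty=Counterparty(**doc["supplier"]),
            period=FiscalPeriod(issued.year, (issued.month - 1) // 3 + 1),
            taxes=tuple(
                TaxBreakdown(Decimal(t["taxable"]), Decimal(t["rate"]), t.get("nature"))
                for t in doc["taxes"]
            ),
        )
```

Adding a country then means writing another class with a to_canonical method. Classification, reconciliation, and analytics code never imports a format-specific type.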

The practical consequence: analytical accounting operates on the canonical model, not on the source format. Margins by project or client, fixed versus variable cost structure, break-even calculation, P&L approximation — none of this requires country-specific logic. It works on any structured invoice data that has been normalized into the canonical model. This is why the analytical accounting engine achieves approximately 85% accuracy in real time without ERP integration, regardless of whether the source data is FatturaPA, UBL, XRechnung, or NF-e. The remaining 15% — accruals, depreciation schedules, detailed cost center allocation — requires ERP data. When an ERP connects via a bidirectional adaptor, accuracy rises to approximately 98%. But 85% accuracy updated continuously is operationally more useful than 98% accuracy delivered 90 days after the fact for most decisions a CFO or controller actually makes.
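To illustrate why this layer needs no country-specific logic, here is a sketch of margin by project computed from already-normalized lines. The tuple layout and the function name are assumptions for the example; the real engine is certainly richer than this.

```python
from collections import defaultdict
from decimal import Decimal


def margin_by_project(lines):
    """Compute revenue minus cost per project.

    lines: iterable of (project, direction, amount), where direction is
    "sale" or "purchase". The layout is a hypothetical simplification.
    """
    totals = defaultdict(lambda: {"revenue": Decimal("0"), "cost": Decimal("0")})
    for project, direction, amount in lines:
        key = "revenue" if direction == "sale" else "cost"
        totals[project][key] += amount
    return {p: t["revenue"] - t["cost"] for p, t in totals.items()}
```

Nothing in the function knows whether a line started life as FatturaPA, UBL, XRechnung, or NF-e.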

The effort to build a new country connector varies significantly based on one dimension: does the country have a modern API, or does it require authenticated portal access?

France (Chorus Pro / Factur-X) and Germany (XRechnung) both conform to EN 16931 — the European standard. Their government portals have modern APIs. Estimated connector effort: 2–4 weeks. Spain (Facturae), Portugal (CIUS-PT via Peppol), and Brazil (NF-e via SEFAZ) have web services with structured responses. Moderate complexity. Mexico (CFDI via SAT) uses a PAC intermediary model — different but API-based.

Italy was structurally different. FatturaPA predates EN 16931 and uses a proprietary XML schema. The Cassetto Fiscale — the government’s fiscal data repository containing the complete fiscal history of every Italian taxpayer — has no API. Access requires PIN authentication to a web portal that was designed for human users, not programmatic access. Building a stable crawler on top of this system took approximately two years — not because the scraping technology was particularly complex, but because the portal changes its layout silently, introduces new document types without announcement, and modifies download paths without versioning. Maintaining the crawler in production requires continuous monitoring and periodic re-stabilization.
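One generic way to notice silent layout changes is to fingerprint the tag structure of a page and compare it with the last known value. The sketch below uses only the Python standard library and is an assumption about how such monitoring could work, not a description of the crawler in production.

```python
import hashlib
from html.parser import HTMLParser


class _TagPathCollector(HTMLParser):
    """Record the path of every opening tag, ignoring text and attribute values."""

    VOID = {"br", "img", "input", "meta", "link", "hr"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = []

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching tag, tolerating unclosed children.
            while self.stack.pop() != tag:
                pass


def layout_fingerprint(html: str) -> str:
    """Return a hash that changes when the tag structure changes, but not when text does."""
    collector = _TagPathCollector()
    collector.feed(html)
    return hashlib.sha256("\n".join(collector.paths).encode()).hexdigest()
```

A stored fingerprint that no longer matches is a cue to pause downloads and re-check selectors before bad data reaches the canonical model.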

This asymmetry — Italy hard, EN 16931 markets much easier — is the correct frame for understanding the global rollout of e-invoicing mandates. Italy proved the model. The markets following Italy in 2026–2030 are structurally simpler to integrate. The canonical model and the intelligence layer built for Italy transfer almost entirely. The connectors are new, but they represent the smaller fraction of the engineering work.


What the Government Portal Data Contains That Invoices Don’t

The canonical model has a second input stream that matters as much as the invoice stream: government portal data.

In Italy, the Cassetto Fiscale contains four categories of data that never transit through SDI: F24 tax payments (the actual amounts paid, by date, by tax type — not the amounts declared), Certificazioni Uniche for employees (wage certificates), declarations filed across all tax types, and enforcement data from the collection agency. This data is essential for two capabilities that invoice data alone cannot support.

First, tax compliance monitoring. The difference between taxes declared and taxes actually paid is not visible in any invoice. A company can declare IRES correctly and pay F24 in installments, partial payments, or with compensation. The compliance health score — the metric that tells you whether the company’s actual tax behavior matches its declared position — requires the F24 payment stream. Without it, you can verify that the invoices are correct. You cannot verify that the obligations derived from those invoices were actually settled.
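The declared-versus-paid comparison can be sketched as follows. The function name and input shapes are hypothetical, and the sketch deliberately ignores installment schedules and compensation, which a real compliance score would have to model.

```python
from collections import defaultdict
from decimal import Decimal


def outstanding_by_tax(declared, payments):
    """Return the unpaid amount per tax type.

    declared: {tax_code: amount_due}
    payments: iterable of (tax_code, amount_paid), e.g. rows from the F24 stream.
    Both shapes are illustrative assumptions.
    """
    paid = defaultdict(Decimal)
    for tax_code, amount in payments:
        paid[tax_code] += amount
    return {
        code: due - paid[code]
        for code, due in declared.items()
        if due - paid[code] > 0
    }
```

The declared side can be derived from filings; the paid side exists only in the government portal, which is why the invoice stream alone cannot answer the question.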

Second, crisis indicator monitoring. The D.Lgs 14/2019 framework — Italy’s business crisis early warning system — requires monitoring 13 KPIs including the DSCR (Debt Service Coverage Ratio). Several of these KPIs are only calculable when you have both the invoice data (for revenue and cost flows) and the F24 payment data (for actual tax obligations). Compliance with adeguati assetti — the Italian requirement for companies to maintain adequate organizational structures for crisis detection — depends on this combined dataset. 96.5% of Italian SMBs are currently non-compliant with this requirement. The primary reason is that no standard accounting system pulls both data streams and computes the required indicators automatically.
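For reference, the textbook form of the DSCR is cash available for debt service divided by principal plus interest due in the same period. The sketch below implements only that generic ratio; the exact definition and thresholds used under D.Lgs 14/2019 are not reproduced here, and the parameter names are assumptions.

```python
from decimal import Decimal
from typing import Optional


def dscr(cash_available: Decimal, principal_due: Decimal, interest_due: Decimal) -> Optional[Decimal]:
    """Generic Debt Service Coverage Ratio; returns None when there is no debt service."""
    debt_service = principal_due + interest_due
    if debt_service == 0:
        return None
    return cash_available / debt_service
```

Tax obligations actually settled through F24 reduce the cash available in the numerator, which is where the combined dataset becomes necessary.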


Multi-Model Routing and the Q&A Layer

When the canonical model is populated — invoices classified, bank transactions reconciled, government portal data integrated — the next engineering challenge is making the data queryable by non-technical users.

Natural language Q&A on fiscal data sounds like a standard RAG problem. It isn’t.

The difficulty is that fiscal questions are precise and fiscal terminology is ambiguous. A CFO asking “what were our transport costs last quarter” may mean: invoices with supplier ATECO codes in the transport sector, or invoices with the service description containing transport-related terms, or invoices classified under specific cost center codes, or some combination. The answer differs by €40,000 depending on interpretation. A general-purpose LLM will pick one interpretation and return a confident answer. In a fiscal context, a confident wrong answer is worse than no answer.

Our approach uses multi-model routing: three or more LLMs process the question in parallel, each with the same context about the canonical model’s structure. A disambiguation layer identifies where the models disagree — this disagreement is the signal that the question is ambiguous. Instead of arbitrating between models, we surface the ambiguity to the user: “This could mean X (€380,000) or Y (€340,000) — which interpretation did you intend?” The user’s clarification then becomes a resolved query that returns a verified number from the structured database, not a generated approximation.
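The disagreement-as-signal idea can be sketched with plain callables standing in for the LLM calls. Everything named here (Resolved, NeedsClarification, answer, and the interpretation labels) is hypothetical; the sketch only shows the control flow, in which amounts always come from a lookup on structured data and never from a model.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Resolved:
    interpretation: str
    amount: Decimal


@dataclass(frozen=True)
class NeedsClarification:
    options: dict  # interpretation label -> amount computed from structured data


def answer(question, interpreters, compute):
    """Ask every interpreter for its reading of the question.

    interpreters: callables (stand-ins for LLM calls) returning an interpretation label.
    compute: callable that returns the amount for a label from the structured database.
    """
    if not interpreters:
        raise ValueError("at least one interpreter is required")
    chosen = sorted({interpret(question) for interpret in interpreters})
    if len(chosen) == 1:
        return Resolved(chosen[0], compute(chosen[0]))
    # The models disagree, so the question is treated as ambiguous and returned to the user.
    return NeedsClarification({label: compute(label) for label in chosen})
```

When the user picks one of the options, the clarified label can be passed straight to compute, so the final number is again a database result.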

The verification step is non-negotiable. Every answer to a fiscal Q&A query is cross-referenced against the underlying structured data before delivery. The response format is always: natural language explanation + the data point extracted from the canonical model + the source documents that support it. This means the CFO can audit any answer by tracing it back to the specific invoices, bank transactions, or government portal records that generated it. In a tax context, auditability is not a nice-to-have. It’s the condition under which the output is usable at all.
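A sketch of that response shape, with the cross-check applied before anything is returned, might look like this. The AuditableAnswer record and build_answer function are illustrative assumptions; the check shown is the simplest possible one, recomputing a total from the cited source documents.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class AuditableAnswer:
    explanation: str   # natural language explanation
    value: Decimal     # data point taken from structured data
    source_ids: tuple  # documents that support the value


def build_answer(explanation, claimed_value, source_ids, amounts_by_id):
    """Recompute the value from the cited sources and refuse to answer on a mismatch.

    amounts_by_id is a hypothetical lookup into the canonical model.
    """
    recomputed = sum((amounts_by_id[s] for s in source_ids), Decimal("0"))
    if recomputed != claimed_value:
        raise ValueError(
            f"claimed {claimed_value} but sources sum to {recomputed}; answer withheld"
        )
    return AuditableAnswer(explanation, recomputed, tuple(source_ids))
```

Because the returned value is the recomputed one and the source identifiers travel with it, a reader can trace any figure back to the documents behind it.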


What Remains Unsolved

Intellectual honesty requires naming the parts that are not yet solved.

Long-tail classification in new markets requires local production data. The training corpus for Italian fiscal taxonomy took five years to build, with continuous corrections from 70+ CPA firms. Entering France in September 2026 means building a new corpus from scratch for French fiscal taxonomy — a different set of VAT codes, different document types, different treatment of intra-community supplies under French implementation of ViDA rules. The canonical model transfers. The classification accuracy does not, until the training data exists.

Legal RAG (retrieval-augmented generation over tax legislation and case law) requires a local legal corpus for each jurisdiction. Italian legal RAG covers the TUIR (the consolidated income tax act), the IVA decree, D.Lgs 14/2019, circulars from the Agenzia delle Entrate, and relevant case law from the Corte di Cassazione. Each of these sources has different update frequencies, different authority levels, and different relationships to the canonical model’s classification logic. Building the equivalent for France or Germany requires 8+ weeks of corpus construction and validation per country, significantly more than the 2–4 weeks required for a format connector.

The feedback loop is itself a dependency. The human correction mechanism (tax professionals flagging classification errors, those corrections updating the model) is the component that makes production accuracy possible. It is also the component that cannot be replaced by synthetic data generation or zero-shot prompting. In markets where we don’t yet have a network of local practitioners reviewing outputs, production accuracy starts lower and improves more slowly. This is a distribution problem as much as a technical one.


The Architecture Bet

The core architectural bet behind everything described here is this: the intelligence layer — classification, reconciliation, analytics, compliance monitoring, Q&A — should be independent of the transmission layer. Transmission platforms connect companies to government portals and handle format compliance. The intelligence layer sits on top, operating on normalized data regardless of where it came from.

This separation matters for two reasons. First, it makes the intelligence layer extensible without re-architecting the transmission layer. Second, it means the intelligence layer can operate on data from multiple sources simultaneously — SDI invoices, bank transactions, government portal data, ERP feeds — and produce a unified analytical output that no single-source system can match.

Italy was the test case. The mandates arriving in 2026–2030 are the scale case.

The companies building the intelligence layer now will have five years of production data — classified, corrected, and validated by real finance professionals — by the time ViDA’s Digital Reporting Requirements become mandatory across the EU. The companies that wait will be building from zero in a market where structured fiscal data is no longer a differentiated asset, because everyone will have it.

The transmission problem is solved. The normalization problem is where the next five years will be decided.


Paolo Messina is CEO of Mentally Digital, an AI fiscal intelligence engine in production with 70+ Italian CPA firms and 40M+ classified invoices. The platform is built on a country-agnostic canonical fiscal model, with a production connector for Italy and connectors for France, Germany, and Spain in development.

Live production demo with real Italian fiscal data: https://saluteimpresa.mentally.ai/en/tax-demo

For architecture discussions: info@mentally.ai

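Data and Statistics

40M+ invoices classified in production

14 legal formats mapped into one canonical model

95%+ classification accuracy on standard patterns

70% of documents fall into roughly 15 standard patterns

30% of documents are long-tail cases

€350K quarterly intra-EU purchase threshold for monthly Intrastat declarations

28 Italian document types (TD01 through TD28)

6 core entities in the canonical model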

Frequently Asked Questions

What is the difference between transmission and normalization in e-invoicing systems?
Transmission refers to the technical process of sending and receiving invoices in various formats like XML, which is now largely solved. Normalization is the harder problem of understanding the semantic meaning and fiscal implications of each invoice. Two invoices with identical XML structure can have completely different tax treatments, deductibility rules, and accounting implications depending on classification fields. For example, both VAT nature codes N6.3 and N6.7 are reverse charge categories, but N6.3 applies to construction subcontracting while N6.7 applies to cleaning services, requiring different analytical accounting treatments despite appearing identical in format.
What is a canonical fiscal model and why is it better than building separate parsers?
A canonical fiscal model is a unified data structure with six core entities: company, counterparty, invoice, tax breakdown, payment, and fiscal period. Instead of building 14 different parsers for different countries and formats, this approach uses one canonical model and 14 connectors that map each format into the same structure. This architectural decision allows analytical accounting, compliance monitoring, and reconciliation to operate on a single normalized data layer regardless of whether the source is FatturaPA, UBL, XRechnung, or NF-e, significantly reducing complexity and improving consistency.
What accuracy does the analytical accounting engine achieve without ERP integration?
The analytical accounting engine achieves approximately 85% accuracy in real time without ERP integration, operating solely on normalized invoice data in the canonical model. This accuracy applies regardless of whether the source data is FatturaPA, UBL, XRechnung, or NF-e. When an ERP connects via a bidirectional adaptor, accuracy rises to approximately 98%. The remaining 15% gap without ERP integration involves accruals, depreciation schedules, and detailed cost center allocation that require ERP-specific data.
What triggers the requirement for monthly Intrastat declarations in Italy?
Italian law requires a monthly statistical Intrastat declaration for inbound EU purchases when quarterly intra-EU purchase volumes reach €350,000. This is not a VAT filing or tax payment, but a statistical report submitted to the customs authority separately from the Sistema di Interscambio (SDI). The threshold is cumulative across transactions over time, requiring systems to monitor aggregate cross-border purchase volumes, not just process individual invoices. This obligation operates on a different submission calendar and goes to a different government agency than standard e-invoices.
Why is building an Italian e-invoicing connector harder than for other European countries?
Italy's FatturaPA predates the EN 16931 European standard and uses a proprietary XML schema. More critically, the Cassetto Fiscale (the government's fiscal data repository) has no API and requires PIN authentication to a web portal designed for human users, not programmatic access. Building a stable crawler took approximately two years because the portal changes its layout silently, introduces new document types without announcement, and modifies download paths without versioning. In contrast, France and Germany conform to EN 16931 and have modern APIs, requiring only 2-4 weeks to build connectors.
What percentage of invoice documents fall into standard patterns versus long-tail complexity?
Approximately 70% of documents fall into roughly 15 standard patterns that a well-designed parser handles without difficulty. The remaining 30% represents long-tail complexity including partial credit notes referencing partially paid invoices, mixed-rate invoices with different VAT codes per line item, split payment transactions for public administration, self-billing invoices for agricultural and publishing categories, and transactions with non-resident entities requiring specific document types (TD17, TD18, or TD19). This long tail requires human feedback loops where tax professionals correct classification errors that become training data for model improvement.
What classification accuracy has been achieved after five years of processing Italian invoices?
After five years and 40 million Italian invoices classified in production, classification accuracy on standard patterns exceeds 95%. On long-tail cases, the system uses multi-model validation and surfaces low-confidence classifications for human review rather than auto-resolving them. This approach of knowing when not to be confident and requesting human review for edge cases proves more valuable than pursuing raw accuracy alone, as it prevents systematic errors on complex fiscal scenarios.
How many document types and VAT codes does Italian fiscal taxonomy include?
As of 2026, Italian fiscal taxonomy includes 28 document types (TD01 through TD28) and 7 VAT nature codes (N1 through N7, each with subcategories). Additionally, the system must handle corrispettivi from certified cash registers in a different XML schema, RT daily receipt aggregations, cross-border e-invoices reported through SDI, Intrastat declarations, and F24 tax payments. These streams interact with each other, creating complexity that cannot be solved by simple format parsing alone.