This project conducts a multidimensional audit of the current food system, integrating nutritional, industrial, environmental, and market dimensions. The objective is to unravel the "information paradox" faced by European consumers in 2026.
Through the analysis of ~10,000 products extracted from Open Food Facts and FAO production data, this work applies Data Science techniques to validate whether current quality labels reflect the biological reality of food products, or whether they are vulnerable to nutritional marketing.
- H1 (Nutri-Score vs. NOVA): The Nutri-Score algorithm is susceptible to manipulation, allowing ultra-processed products to achieve high ratings.
- H2 (The 'Bio' Halo Effect): The "BIO" label leads consumers to perceive a product as healthier, but it is not necessarily more nutritious or less processed.
- H3 (Private Label vs. Major Corporations): Private label brands offer more balanced nutritional profiles and a lower additive load than major corporations.
- H4 (Meat Market vs. Plant-Based): In countries with a strong livestock culture, plant-based offerings show higher technological processing and lower protein density in an attempt to mimic the meat-eating experience.

- Logical failure: 24% of ultra-processed products achieve a Nutri-Score of A or B.
- Manipulable algorithm: The Nutri-Score can be "hacked" through technical adjustments in formulation to artificially improve the rating without necessarily raising real nutritional quality.
- Clean 'Bio': The Bio seal reduces the chemical additive load by more than half in the majority of categories.
- The private label rebellion: Private label products contain 60% less sugar and three times more fibre than products from major corporations.
- Cleaner private labels: 57% of the private label catalogue contains zero additives, compared to 44% of the major corporation catalogue.
- Nutritional erosion: In Spain, meat analogues have a median of 3 additives, compared to 0 additives observed in more mature markets such as the United Kingdom.
- Nutritional quality vs. sensory experience: Two distinct approaches emerge in the meat analogue market: proposals that prioritise nutritional density, versus those focused on replicating the sensory experience of animal meat, at the expense of the product's nutritional value.
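
A figure such as the share of ultra-processed products graded A or B can be computed with a simple filter and mean. The snippet below is a minimal sketch on toy rows, not the project's data or code: the column names `nova_group` and `nutriscore_grade` follow the Open Food Facts export and are assumed here, and the helper `share_upf_with_good_grade` is hypothetical.

```python
import pandas as pd

# Toy rows standing in for the cleaned dataset (NOT real project data).
# Column names mirror the Open Food Facts export and are an assumption.
df = pd.DataFrame({
    "nova_group": [4, 4, 4, 4, 1, 3],
    "nutriscore_grade": ["a", "d", "b", "e", "a", "c"],
})

def share_upf_with_good_grade(frame: pd.DataFrame) -> float:
    """Fraction of NOVA 4 (ultra-processed) products graded A or B."""
    upf = frame[frame["nova_group"] == 4]
    if upf.empty:
        return float("nan")
    good = upf["nutriscore_grade"].str.lower().isin(["a", "b"])
    return float(good.mean())

share = share_upf_with_good_grade(df)
print(f"{share:.0%} of ultra-processed products score A or B")
```

The same pattern (filter a group, then take the mean of a boolean mask) would apply to the zero-additive shares, given a column holding the additive count.
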

This audit reveals a critical disconnect between marketing labels and the technical reality of food products.
- Nutri-Score "hacking": Because the algorithm is blind to industrial processing, ultra-processed products can reach an "A" or "B" rating simply by adding isolated fibre or protein.
- The limitations of the 'BIO' seal: While it guarantees sustainability and fewer additives, it does not correct a poor nutritional formulation; a product can be organic and nutritionally poor at the same time.
- Private label leadership: Contrary to common preconceptions, private label brands prove to be more honest health vectors than large multinationals, offering cleaner labels, less sugar, and more fibre.
- The plant-based paradox: In livestock-oriented cultures like Spain, plant-based alternatives prioritise sensory mimicry, sacrificing protein density and increasing chemical complexity.

- Smart consumer guide: Development of a scan-based recommendation system that ranks alternatives based on real nutritional balance and reduction of chemical additives.
- Strategic industry optimisation: "Clean Label" reformulation models that prescribe precise adjustments to improve quality without compromising technological viability.
- Composite scoring metric: Design of a new Nutri-Score-style index that integrates NOVA processing level and additive load alongside nutritional composition, specifically engineered to flag ultra-processed products that game the current algorithm through isolated nutrient fortification. (ML project)
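
To make the composite scoring idea concrete, here is a purely illustrative sketch of how a letter grade could be downgraded by processing level and additive load. Everything in it is hypothetical: the function `composite_grade`, the one-step penalty for NOVA 4, and the "one step per three additives" rule are placeholder choices, not a fitted or validated index.

```python
GRADE_TO_RANK = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
RANK_TO_GRADE = "abcde"

def composite_grade(nutriscore_grade: str, nova_group: int, additives_n: int,
                    nova_penalty: int = 1, additives_per_step: int = 3) -> str:
    """Downgrade a Nutri-Score letter using NOVA group and additive count.

    The penalty sizes are placeholders (assumptions), not calibrated values.
    """
    rank = GRADE_TO_RANK[nutriscore_grade.lower()]
    if nova_group == 4:                          # ultra-processed
        rank += nova_penalty
    rank += additives_n // additives_per_step    # one step per N additives
    return RANK_TO_GRADE[min(rank, 4)]           # never worse than "e"

# An ultra-processed product rated "a" with 3 additives drops two steps.
print(composite_grade("a", nova_group=4, additives_n=3))  # c
# A minimally processed product with no additives keeps its grade.
print(composite_grade("a", nova_group=1, additives_n=0))  # a
```

In an ML version of this index, the penalties would presumably be learned from data rather than fixed by hand.
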
food-labelling-audit/
├── src/
│   ├── data/
│   │   ├── processed/   # Final normalised dataset (foods_cleaned.csv)
│   │   ├── raw/         # Datasets (FAO & Open Food Facts)
│   │   └── txt/         # Extracted texts for Regex
│   ├── notebooks/
│   │   └── main.ipynb   # Main summarised notebook
│   └── utils/
│       ├── cleaning.py  # Full cleaning pipeline
│       └── plots.py     # Full visualisation pipeline
├── outputs/
│   ├── plots/           # Exported visualisations
│   ├── presentation/    # Presentation
│   └── report/          # Report
├── README.md            # Documentation
└── requirements.txt     # Library list
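
The cleaning pipeline relies on regular expressions to collapse multilingual spellings into one canonical value. The snippet below is a minimal sketch of that general technique, not the contents of `cleaning.py`: the pattern table `COUNTRY_PATTERNS`, the helper `normalise_country`, and the handful of spellings covered are all assumptions chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical table: canonical name -> pattern covering a few known spellings.
# An optional "en:" prefix is allowed, as used in Open Food Facts country tags.
COUNTRY_PATTERNS = {
    "Spain": re.compile(
        r"^(?:en:)?(?:spain|espa[ñn]a|espagne|spanien)$", re.IGNORECASE
    ),
    "United Kingdom": re.compile(
        r"^(?:en:)?(?:united[\s-]kingdom|uk|reino[\s-]unido|royaume[\s-]uni)$",
        re.IGNORECASE,
    ),
}

def normalise_country(raw: str) -> Optional[str]:
    """Return the canonical country name, or None if no pattern matches."""
    token = raw.strip()
    for canonical, pattern in COUNTRY_PATTERNS.items():
        if pattern.match(token):
            return canonical
    return None

for value in [" España ", "en:united-kingdom", "Reino Unido", "Atlantis"]:
    print(repr(value), "->", normalise_country(value))
```

Returning `None` for unmatched values keeps unknown variants visible so that new patterns can be added, rather than silently mapping them to a wrong country.
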
- Language: Python 3.11+
- Data Analysis: Pandas, NumPy.
- Advanced Cleaning: Regex (for multilingual normalisation of >2,000 variants of countries and brands).
- Visualisation: Seaborn, Matplotlib (focused on heatmaps and bar charts for clear, straightforward visualisation).
- Modelling: Multiple Linear Regression (interpretation of $\beta$ coefficients for reverse-engineering the Nutri-Score).

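
Reading $\beta$ coefficients from a multiple linear regression can be illustrated with ordinary least squares in NumPy. This is a sketch on synthetic data only: the three features and the "true" coefficients are made-up placeholders, not the Nutri-Score's actual weights or this project's fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic per-100 g features (placeholders): sugar, fibre, protein.
X = rng.uniform(0, 30, size=(n, 3))

# Made-up generating coefficients: intercept, then one slope per feature.
true_beta = np.array([2.0, 0.5, -1.0, -0.3])

# Design matrix with a leading column of ones for the intercept.
design = np.column_stack([np.ones(n), X])
y = design @ true_beta  # noise-free score, so OLS should recover true_beta

# Ordinary least squares: minimise ||design @ beta - y||^2.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

for name, b in zip(["intercept", "sugar", "fibre", "protein"], beta_hat):
    print(f"{name:>9}: {b:+.3f}")
```

Each slope reads as the change in score per unit of that nutrient with the others held fixed, which is what makes the coefficients useful for inspecting how a scoring rule weighs each input.
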
Mikel Añibarro Ortega | Data Science Bootcamp The Bridge (Bilbao)