About
I'm a computer science researcher and a Wikimedian passionate about making open data more efficient, accessible, and sustainable. My work sits at the intersection of lossless compression algorithms, open data infrastructures, and green computing.
By day, at the Sant'Anna School of Advanced Studies, I research compressed data structures and indexing and retrieval techniques for the Software Heritage archive, the "Library of Alexandria" of code. On weekends, I contribute to collaborative open-data projects like Wikidata.
My core research area is lossless compression. I completed my PhD at the University of Pisa under the supervision of Professors P. Ferragina and G. Manzini, focusing on computation-friendly compression: techniques that operate directly on compressed data, without decompression overhead. Compression is not just about saving space on disk: the real challenge is to design compression schemes that let us work directly in main memory, without decompressing first, and in time proportional to the size of the compressed representation. A further goal is to build tools that make data processing more energy-efficient, too.
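To make the idea of computing in time proportional to the compressed size concrete, here is a toy sketch. It uses run-length encoding purely as an illustrative stand-in (it is not one of the schemes from my research), and the function names `rle_encode` and `rle_sum` are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a sequence into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def rle_sum(runs):
    """Sum the original sequence by visiting each run once.

    The cost depends on the number of runs, not on the length of the
    uncompressed sequence, and nothing is ever expanded.
    """
    return sum(value * length for value, length in runs)


data = [7] * 1000 + [3] * 500 + [0] * 2000
runs = rle_encode(data)

# Same answer as the plain sum, computed on the compressed form only.
assert rle_sum(runs) == sum(data)
print(len(data), "values stored as", len(runs), "runs")
```

Real computation-friendly schemes are far more sophisticated, but the principle is the same: the query touches the compressed representation only.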
I'm actively involved in:
- Software Heritage - Making source code archival more efficient and accessible
- Wikimedia projects - Technical contributions to Wikidata and Meta-Wiki
- Green algorithms - Developing energy-aware compression techniques
Outreach and Public Engagement
I occasionally contribute to Diff, the official Wikimedia Foundation blog, with articles on open data and free software.
When source-code archival is recognised as Digital Public Good: Insights from Software Heritage's 10-year journey at UNESCO
Reflections from the UNESCO symposium for Software Heritage's 10th anniversary: the recognition of source code archival as a global Digital Public Good and its role in cultural preservation and ethical AI.
Read the article
Lightening the robotic scraping: Insights for a 'green' cache from the Software Heritage archive
Strategies to lighten massive robotic scraping (e.g. for AI training) using a compressed cache inspired by Software Heritage, significantly reducing energy, storage, and environmental impact.
Read the article
Images licensed CC BY-SA 4.0 via Wikimedia Commons. View author profile on Diff.
Research Activity
As an algorithmist, I specialise primarily in lossless data compression. Since July 2024, I have been working on optimising the compression and efficient indexing of large code archives in collaboration with the Software Heritage team.
Current Research Focus
- Compressed formats for matrices and trie structures
- Sparse matrix formats supporting matrix-vector multiplications (SpMV) in the compressed domain
- Energy-efficient computation on compressed data
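As a rough illustration of what matrix-vector multiplication on a compact sparse format looks like, the sketch below uses the textbook CSR (compressed sparse row) layout. CSR is an assumption chosen for familiarity, not one of the compressed formats under study, and the helper names `csr_from_dense` and `csr_spmv` are hypothetical.

```python
def csr_from_dense(rows):
    """Build CSR arrays (values, col_idx, row_ptr) from a dense list of rows."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, entry in enumerate(row):
            if entry != 0:
                values.append(entry)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends inside `values`.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x while visiting only the stored non-zero entries."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


dense = [
    [0, 0, 3],
    [4, 0, 0],
    [0, 0, 0],
    [1, 2, 0],
]
values, col_idx, row_ptr = csr_from_dense(dense)
y = csr_spmv(values, col_idx, row_ptr, [1, 2, 3])
print(y)  # [9, 4, 0, 5]
```

The multiplication performs one multiply-add per stored non-zero, so its cost follows the size of the compact representation and not the full matrix dimensions.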
Pronunciation: For those familiar with the IPA, my name is pronounced [fraŋˈt͡ʃesko toˈzoːni].
Figures: Research Topic Distribution; Co-author Network; Collaboration Geography.
Contact & Location
francesco🔴tosoni🐌santannapisa🔴it
(obfuscated for spam protection)
Location
Sant'Anna School of Advanced Studies
L'EMbeDS room
p.zza Martiri della Libertà 33
56127 Pisa PI
Italy