Decentralized Adaptive Ranking using Transformers
2025. Marcel Gregoriadis, Quinten Stokkink, Johan Pouwelse.
EuroMLSys '25: Proceedings of the 5th Workshop on Machine Learning and Systems
Centralized platforms like TikTok are cause for significant concerns over information control, censorship, and bias. Decentralized systems offer a promising alternative, but their adoption is hindered by the lack of effective relevance ranking of search results. Existing decentralized approaches rely on heuristics that do not adapt to user behavior. This paper presents DART, the first decentralized ranking algorithm to leverage machine learning over users' search activities. DART adapts its rank…
A Thorough Investigation of Content-Defined Chunking Algorithms for Data Deduplication
2024. Marcel Gregoriadis, Leonhard Balduf, Björn Scheuermann, Johan Pouwelse.
arXiv preprint
Data deduplication emerged as a powerful solution for reducing storage and bandwidth costs by eliminating redundancies at the level of chunks. This has spurred the development of numerous Content-Defined Chunking (CDC) algorithms over the past two decades. Despite advancements, the current state-of-the-art remains obscure, as a thorough and impartial analysis and comparison is lacking. We conduct a rigorous theoretical analysis and impartial experimental comparison of several leading CDC algorit…
De-DSI: Decentralised Differentiable Search Index
2024. Petru Neague, Marcel Gregoriadis, Johan Pouwelse.
EuroMLSys '24: Proceedings of the 4th Workshop on Machine Learning and Systems
This study introduces De-DSI, a novel framework that fuses large language models (LLMs) with genuine decentralization for information retrieval, particularly employing the differentiable search index (DSI) concept in a decentralized setting. Focused on efficiently connecting novel user queries with document identifiers without direct document access, De-DSI operates solely on query-docid pairs. To enhance scalability, an ensemble of DSI models is introduced, where the dataset is partitioned into…
Analysis and Comparison of Deduplication Strategies in IPFS
2022. Marcel Gregoriadis.
Master's Thesis at Humboldt University of Berlin
IPFS has recently risen in popularity, as it represents the backbone for file sharing in a decentralized web. As the amount of files exchanged on IPFS grows, and both storage and network bandwidth are expensive, the discussion around deduplication strategies becomes pressing. This discussion is largely founded on the execution of chunking algorithms. To this end, we analyzed and compared FastCDC and AE, as two state-of-the-art chunking algorithms, with Rabin, Buzhash, and fixed-size chunking, w…
Analysis of Arbitrary Content on Blockchain-Based Systems using BigQuery
2022. Marcel Gregoriadis, Robert Muth, Martin Florian.
WWW '22: Companion Proceedings of the Web Conference 2022
Blockchain-based systems have gained immense popularity as enablers of independent asset transfers and smart contract functionality. They have also, since as early as the first Bitcoin blocks, been used for storing arbitrary contents such as texts and images. On-chain data storage functionality is useful for a variety of legitimate use cases. It does, however, also pose a systematic risk. If abused, for example by posting illegal contents on a public blockchain, data storage functionality can le…