LeBonCoin partners with Spark, Amazon Web Services
30 Apr 2017
A number of tech vendors occupy key roles in the big-data strategy of LeBonCoin.fr, France’s dominant horizontal site with more than 7 billion pageviews per month. “Spark does the data streaming, Kafka handles data messaging, and Amazon S3 the data storage,” LeBonCoin chief data officer Aissa Belaid told French media recently.
Together, these three services form an infrastructure that allows the company to process 45 terabytes of data in 15 hours, Belaid told Le Monde Informatique in a French-language article.
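To put the quoted figure in perspective, the short calculation below converts 45 terabytes in 15 hours into an average rate. It is only a back-of-the-envelope sketch: it assumes decimal terabytes (10^12 bytes) and a perfectly even load over the 15 hours, neither of which the article states.

```python
# Average throughput implied by "45 terabytes in 15 hours".
# Assumption: 1 TB = 10**12 bytes and the load is spread evenly.
TERABYTE = 10**12
MEGABYTE = 10**6

total_terabytes = 45
hours = 15

tb_per_hour = total_terabytes / hours
mb_per_second = (total_terabytes * TERABYTE) / (hours * 3600) / MEGABYTE

print(f"{tb_per_hour:.1f} TB per hour")
print(f"{mb_per_second:.0f} MB per second")
```

Under those assumptions the pipeline averages about 3 terabytes per hour.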
Spark Streaming enables high-throughput processing of live data streams. It is also used by Uber, Netflix and Pinterest. Typical tasks include data cleaning and aggregation, detection of anomalous behavior, data enrichment, and the grouping and analysis of live session events, such as a user's activity after logging into a website or application.
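The session grouping mentioned above can be illustrated in a few lines of plain Python. This is a simplified sketch of the idea, not Spark code and not LeBonCoin's pipeline: the `sessionize` function, the event tuples and the 30-minute inactivity threshold are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed inactivity threshold: a gap longer than this starts a new session.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp, action) events into per-user sessions."""
    sessions = defaultdict(list)
    # Sort by user, then by time, so each user's events arrive in order.
    for user_id, ts, action in sorted(events, key=lambda e: (e[0], e[1])):
        user_sessions = sessions[user_id]
        # Timestamp of the last event in the user's most recent session.
        if user_sessions and ts - user_sessions[-1][-1][0] <= gap:
            user_sessions[-1].append((ts, action))
        else:
            user_sessions.append([(ts, action)])
    return dict(sessions)


# Hypothetical events: timestamps are seconds since an arbitrary start.
events = [
    ("alice", 0, "login"),
    ("alice", 120, "search"),
    ("bob", 60, "login"),
    ("alice", 4000, "pageview"),
]
result = sessionize(events)
```

Here the third event for "alice" arrives more than 30 minutes after her previous one, so it opens a second session. A streaming engine applies the same logic continuously to events as they arrive rather than to a finished list.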
Kafka is a messaging system that lets applications publish and subscribe to streams of data, process them in real time and store them in a cluster. Website activity tracking is one of its most common uses, with events such as pageviews and searches processed as they occur.
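The publish-and-subscribe model can be sketched with a small in-memory log. This is a toy illustration of the concept only: the `TopicLog` class and its methods are hypothetical and are not the Kafka client API, and the sample events are invented.

```python
class TopicLog:
    """Toy append-only log in which each consumer keeps its own read position."""

    def __init__(self):
        self._records = []   # stored events, in publication order
        self._offsets = {}   # consumer name -> index of the next record to read

    def publish(self, record):
        """Append a record and return its position (offset) in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next unread records for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = TopicLog()
log.publish({"type": "pageview", "url": "/annonces"})
log.publish({"type": "search", "query": "velo"})

# Two independent subscribers read the same stream at their own pace.
analytics_batch = log.poll("analytics")
audit_first = log.poll("audit", max_records=1)
audit_rest = log.poll("audit")
```

Because records stay in the log after being read, several consumers can process the same activity stream independently, which is the property that makes this model suited to website activity tracking.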
Finally, to store its data, LeBonCoin chose Redshift, the data-warehouse-as-a-service offering of Amazon Web Services (AWS), provided and managed in the cloud. It offers data warehouse functions optimized for analytic queries and can run multiple queries in parallel. Data is stored in Amazon Simple Storage Service (S3).
Aissa Belaid has been with Schibsted-owned LeBonCoin for nearly six years, first as CRM manager and, for the last four years, as chief data officer. In his current position, Belaid is responsible for data management, data strategy, and cloud migration strategy.