The leak contains the source code for most of the company’s services, from mail and taxis to music and the cloud.

🔎 Yandex services source code leak

On January 25, 2023, source code and accompanying data for many Yandex services and programs appeared online. The distribution consists of separate archives (.tar.bz2) whose names identify the corresponding Yandex services.
The total compressed size of the archives is more than 44.7 GB.

On January 26, 2023, Yandex confirmed that the source code of some projects from its internal repository had been published.

The hackers released the archive publicly and claim that in July 2022 they downloaded the source code of the company’s projects, in addition to the anti-spam rules.

😀 “There was no Yandex hack. The Yandex security service discovered code fragments from an internal repository that had been made publicly available. However, their content differs from the current version of the repository used in Yandex services.

A repository is a development tool that most companies make available to their developers. Repositories are needed to work with code and are not intended to store users’ personal data. We are conducting an internal investigation,” the company’s press service told Habr.

Developer Arseniy Shestakov explained that the archive contains only the contents of the git repositories and no personal data. There are several API keys, but they were most likely used only for test deployments. Some of the archives contain source code for part of the company’s services, as well as documentation pointing to real intranet URLs.

Just a few hours ago I found a mention on Twitter that proprietary source code of the Russian giant Yandex had been leaked on an online community called BreachForums. In this post I’ll share the results of my friend digging into said archives.

Important details about the torrent:
It contains only the contents of the repository and nothing else.
All files are dated 24 February 2022.
It does not contain git history, mostly just code.
There are no pre-built binaries for most of the software, with only a few exceptions.
There are no pre-trained ML models, with some exceptions.
This post is a work-in-progress and will be updated with more details.
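
Details like the file dates and the absence of git history can be checked by reading an archive's member metadata without extracting anything. The following is a minimal sketch using only Python's standard `tarfile` module; the function name `summarize_archive` and the archive name in the usage note are hypothetical, and this is not necessarily how the archives were actually examined.

```python
import tarfile
from collections import Counter
from datetime import datetime, timezone


def summarize_archive(path):
    """Return (file_count, Counter of modification dates, has_git_dir)."""
    dates = Counter()
    file_count = 0
    has_git_dir = False
    # "r:bz2" opens a bzip2-compressed tar archive for reading
    with tarfile.open(path, "r:bz2") as archive:
        for member in archive:
            # A ".git" path component would suggest bundled git history
            if ".git" in member.name.split("/"):
                has_git_dir = True
            if member.isfile():
                file_count += 1
                # Convert the stored modification time to a UTC calendar date
                day = datetime.fromtimestamp(member.mtime, tz=timezone.utc).date()
                dates[day.isoformat()] += 1
    return file_count, dates, has_git_dir
```

Calling `summarize_archive("example.tar.bz2")` on a local archive (a placeholder file name) returns the number of regular files, how many files fall on each modification date, and whether any `.git` directory is present. Only metadata is read, so nothing from the archive is written to disk.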

Why is this big?
Yandex is one of the largest IT companies in Russia. Within the country it provides a wider range of services than Google. Imagine one company that replaces Google, Uber, Amazon, Netflix and Spotify.

Is this leak real?
I personally never worked at Yandex, but I know several people who worked there at different times or still work there. I verified that at least some of the archives definitely contain modern source code for company services, as well as documentation pointing to real intranet URLs.

What’s inside
It looks like the source code for at least all major Yandex services has been leaked:

Search Engine and Indexing Bot
Maps – Like Google Maps and Street View
Alice – AI assistant like Siri / Alexa
Taxi – Uber-like taxi service
Direct – Ads service like Google Ads / AdWords
Mail – Mail service like Gmail
Disk – File storage service like Google Drive
Market – Marketplace like Amazon
Travel – Like a plus Airplane, Train and Bus tickets
Yandex360 – Like Google Workspace for services on your own domain
Cloud – Probably not all infrastructure code was leaked.
Pay – Payment processing like Stripe, but with a limited set of features
Metrika – Like Google Analytics
At least the backend part of the majority of the company's other services is there as well. The largest archive, called “frontend”, is yet to be explored.

Security implications
Since this leak contains only the contents of git repositories, there is no personal data. There are at least some API keys, but they have likely been used only for test deployments.

