Legend has it that Gary Kildall, forefather of the modern personal operating system, created the first video game ever to run on the first commercial microprocessor, the 4004, manufactured by a then-fledgling company called Intel. He ran the game on a computer he had built around the chip and mounted in a briefcase. Kildall would lug his briefcase computer to meetings (along with a 30-kilogram teletype), run the game, and demo the hardware's features to prospective clients.
This happened back in the early seventies, within living memory for many of us. The chip has been preserved for posterity as the iconic device it is. But what happened to Kildall's game? Kildall himself died in 1994, and the game, along with much of his early code, has since been lost. We don't even know what the gameplay was like.
What could we have learnt from the work of one of the most brilliant minds in the history of computing? What other valuable code has disappeared: carelessly overwritten, summarily erased, or left to fade away as the magnetic layer on an old floppy disk gradually degrades into dust?
The loss of software is by no means rare. Quite the contrary: it is the norm. Proprietary code, more often than not, is preserved only as long as it can be sold or used as leverage to extract compensation in IP licensing lawsuits. Otherwise, its owners consider it expendable.
Free Software is often created by small groups, sometimes even by individuals, working with limited means and on minuscule budgets. When life gets in the way, even ordinary events such as leaving college for a 9-to-5 job or having a baby can lead to a project being abandoned and going dark.
That's why Software Heritage is so necessary. From roughly the mid-20th century until today, software has played an ever-increasing role in technological development, scientific endeavour, and the creation of new art forms. According to its creators, "software embodies our technical and scientific knowledge and humanity cannot afford the risk of losing it", and they're right. From the analysis of data from the Large Hadron Collider, to Pixar films, to advances in medical diagnostics, software has been key in most walks of modern life. Why not, then, treat code as what it is: a reflection of humanity's greatest contemporary accomplishments?
Software Heritage collects programs, applications and snippets of code distributed under free licenses from several sources, including GitHub, Debian historical releases, and GNU projects. Software Heritage has also rescued code from defunct services such as Gitorious and Google Code, and in doing so has already achieved what its creators set out to accomplish: protecting code from sinking into oblivion on platforms marked for deletion.
The work of Software Heritage goes beyond preserving and storing code, though. Software Heritage is working on features that will allow educators, researchers and scientists to check out and study the code. Provenance information will let you trace where the code was originally stored, and the Full-text search service will let you search for strings across the whole archive or within a subset of files. The Content download option will let you copy code from Software Heritage's vaults to your local hard disk. Through the Increase coverage section, you will even be able to suggest new sites for Software Heritage's engine to explore in search of new code to archive.
Software Heritage fulfils a key role in preserving what has for too long been an undervalued resource. Software is crucial to our technological and cultural advancement, and Software Heritage helps ensure that today's code will still be around for everybody in the future.