Blogs: Web Archiving
Blog posts filtered by the Web Archiving subject tag.
By Iris Geldermans (Intern June – August 2018 at the Koninklijke Bibliotheek | National Library of the Netherlands) Original text: https://www.kb.nl/blogs/duurzame-toegang/nl-blogosfeer-burgers-verslaan-een-ramp For the past three months, I have worked as an intern at the KB | National Library to form a collection of Dutch weblogs for the web archive, called the NL-blogosfeer (Dutch Blogosphere). During this […]
By teszelszky, posted in teszelszky's Blog
In a previous blog post I showed how we resurrected NL-menu, the first Dutch web index. It explains how we recovered the site’s data from an old CD-ROM, and how we subsequently created a local copy of the site by serving the CD-ROM’s contents on the Apache web server. This follow-up post covers the final […]
By johan, posted in johan's Blog
NL-menu was the first Dutch web index. The site was originally founded by a consortium of SURFnet, Dutch universities and the KB. From the mid-nineties onwards it was maintained solely by the KB. NL-menu was discontinued in 2004, after which the site was taken offline. In 2006 the domain name was sold to a private […]
By johan, posted in johan's Blog
Related to my work exploring hyperlinks in documentary heritage – something I feel we’ll be taking care of for a long time – I created a hyperlink extract tool called tikalinkextract. Put simply – the tool will take your collection of files, extract the intellectual content using Apache Tika, and then analyse that content for […]
By ross-spencer, posted in ross-spencer's Blog
Much of the inspiration for this blog came from this source here. According to UNESCO, the authenticity of a record can be jeopardized by: Threats to integrity. Changes to the content of the object itself also potentially damage authenticity. Most such changes stem from threats to the object at a data level. A hyperlink is data. […]
By ross-spencer, posted in ross-spencer's Blog
We recently posted an article on the UK Web Archive blog that may be of interest here, User-Driven Digital Preservation, where we summarise our work with the SCAPE Project on a little prototype application that explores how we might integrate user feedback and preservation actions into our usual discovery and access processes. The idea is […]
By Andy Jackson, posted in Andy Jackson's Blog
I would like to draw your attention to the new QA tool for finger detection on scans: https://github.com/openplanets/finger-detection-tool. This tool was developed by AIT within the scope of the SCAPE project. Manually checking scans for fingers is a very time-consuming and error-prone process. You need a tool to help you: Fingerdet. Fingerdet is […]
By Roman Graf, posted in Roman Graf's Blog
This blog post continues a series of posts about the web archiving topic “ARC to WARC migration”; namely, it is a follow-up to the posts “ARC to WARC migration: How to deal with de-duplicated records?” and “Some reflections on scalable ARC to WARC migration”. Especially the last one of these posts, which described how SCAPE […]
By shsdev, posted in shsdev's Blog
Well over a year ago I wrote the “A Year of FITS” (http://www.openpreservation.org/blogs/2013-01-09-year-fits) blog post describing how we, during the course of 15 months, characterised 400 million harvested web documents using the File Information Tool Set (FITS) from Harvard University. I presented the technique and the technical metadata and basically concluded that FITS didn’t fit […]
By Per Møldrup-Dalum, posted in Per Møldrup-Dalum's Blog
In my last blog post about ARC to WARC migration I did a performance comparison of two alternative approaches for migrating very large sets of ARC container files to the WARC format using Apache Hadoop, and I said that resolving contextual dependencies in order to create self-contained WARC files was the next point to investigate […]
By shsdev, posted in shsdev's Blog