[OPLIN 4cast] OPLIN 4Cast #301: Tackling the big jobs

Wed Sep 26 10:30:12 EDT 2012

OPLIN 4Cast #301: Tackling the big jobs
September 26th, 2012

In this age of user-generated Internet content (Web 2.0, if you 
want to call it that), astounding amounts of information are being 
generated in very short time spans. It has been pointed out 
<http://blogs.loc.gov/digitalpreservation/2012/05/a-vision-of-the-role-and-future-of-web-archives-the-web-archive-in-todays-world/>, 
for example, that every 24 hours more than twice as many words are 
posted to Twitter as were printed in the entire New York Times over the 
last 60 years. If you are an archivist, the sheer scale of the task of 
saving even a little of this Internet material for future research seems 
overwhelming, and when you factor in other non-print information it 
seems even worse. Yet a few organizations, most notably the Internet 
Archive and the Library of Congress in the United States, have tackled 
portions of the job. Below are some recent news stories about the 
problem and about these organizations' latest efforts to capture and 
provide effective access to huge amounts of information that might 
otherwise be beyond the reach of many.

  * The disappearing web: Decay is eating our history
    <http://www.businessweek.com/articles/2012-09-20/the-disappearing-web-decay-is-eating-our-history>
    (Businessweek/Mathew Ingram) "They took a number of recent major
    news events over the past three years-including the Egyptian
    revolution, Michael Jackson's death, the elections and related
    protests in Iran, and the outbreak of the H1N1 virus-and tracked the
    links that were shared on Twitter about each. Following the links to
    their ultimate source showed that an alarming number of them had
    simply vanished. In fact, the researchers said that within a year of
    these events, an average of 11 percent of the material that was
    linked to had disappeared completely (and another 20 percent had
    been archived), and after two-and-a-half years, close to 30 percent
    had been lost altogether and 41 percent had been archived."
  * Launch of TV news search & borrow with 350,000 broadcasts
    <http://blog.archive.org/2012/09/17/launch-of-tv-news-search-borrow-with-350000-broadcasts/>
    (Internet Archive Blog/Brewster Kahle) "Like library collections of
    books and newspapers, this accessible archive of TV news enables
    anyone to reference and compare statements from this influential
    medium. The collection now contains 350,000 news programs collected
    over 3 years from national U.S. networks and stations in San
    Francisco and Washington D.C. The archive is updated with new
    broadcasts 24 hours after they are aired. Older materials are also
    being added."
  * Congress.gov unveiled today
    <http://blogs.loc.gov/loc/2012/09/congress-gov-unveiled-today/>
    (Library of Congress Blog/Erin Allen) "The Congress.gov site
    includes bill status and summary, bill text and member profiles and
    other new features like comprehensive searching across bill text,
    summary and statuses; persistent URLs for search results; Members'
    legislative history and biographical profiles; and maintenance of
    existing features such as links to video of the House and Senate
    floor, top searched bills and the save/share feature."
  * So, is the Library of Congress still archiving Twitter?
    <http://www.buzzfeed.com/jwherrman/so-is-the-library-of-congress-still-archiving-twi>
    (BuzzFeed/John Herrman) "Serving up billions upon billions of tweets
    in even the most basic way is a hard job for a technology company,
    much less for a government agency whose requested budget
    <http://www.loc.gov/about/reports/budget/fy2013.pdf> [pdf] for
    'Digital Initiatives' in 2013 - all of them, including web
    archiving, historic newspapers, the online American history archive,
    the veteran's history project, early sound recordings - is under
    $50m, and actually /lower/ than it was in 2011."

*/Tweet fact:/*

When the Library of Congress announced in April 2010 that it was going 
to archive Twitter, there were 50 million tweets a day. Now there are 
400 million a day.
------------------------------------------------------------------------
The */OPLIN 4cast/* is a weekly compilation of recent headlines, topics, 
and trends that could impact public libraries. You can subscribe to it 
in a variety of ways, such as:

  * *RSS feed.* You can receive the OPLIN 4cast via RSS feed by
    subscribing to the following URL:
    http://www.oplin.org/4cast/index.php/?feed=rss2.
  * *Live Bookmark.* If you're using the Firefox web browser, you can go
    to the 4cast website (http://www.oplin.org/4cast/) and click on the
    orange "radio wave" icon on the right side of the address bar. In
    Internet Explorer 7, click on the same icon to view or subscribe to
    the 4cast RSS feed.
  * *E-mail.* You can have the OPLIN 4cast delivered via e-mail (à la
    OPLINlist and OPLINtech) by subscribing to the 4cast mailing list at
    http://mail.oplin.org/mailman/listinfo/OPLIN4cast.
