Internet Archive

The Internet Archive is a non-profit corporation founded by Brewster Kahle in 1996 to archive the Internet. It is dedicated to obtaining, preserving, and curating digital knowledge so that information is not lost to time.

The operation often runs afoul of commercial publishing corporations, which continually fight to remove preserved works from the Internet Archive despite the ephemeral nature of both the profitability of newly released works and their own corporate integrity.^[1]

The Internet Archive hosts the Open Library, Wayback Machine, and purl.org services.^[2]

Stats

History

1996: Internet Archive started by Brewster Kahle.
1997-01-26: First Internet Archive snapshot of the Internet Archive's own website.^[3]
2013-11-06: A fire occurred at the Internet Archive's San Francisco scanning center.^[4]^[5]^[6]
2023-05-29: The Internet Archive suffered an outage caused by massive numbers of automated requests. VentureBeat quotes Ross Anderson, a security engineering professor at Cambridge University and the University of Edinburgh, who said that AI startups were hammering the Internet Archive for data to train deep learning neural networks.^[7]^[8]
2025-07-25: California Senator Alex Padilla announced the Internet Archive's designation as a federal depository library.^[9]

References

↑ Brittan, Blake. (2023-03-20). “Internet Archive faces skeptical judge in publishers' copyright lawsuit”. Reuters.com. Accessed 2023-03-22. Archived from the original on 2023-03-21.
↑ “OCLC and Internet Archive work together to ensure future sustainability of Persistent URLs”. (2016-09-27). OCLC. Dublin, Ohio. Accessed 2023-04-10. Archived from the original on 2023-02-02. “OCLC and Internet Archive today announced the results of a year-long cooperative effort to ensure the future sustainability of purl.org. The organizations have worked together to build a new sustainable service hosted by Internet Archive that will manage persistent URLs and sub-domain redirections for purl.org, purl.com, purl.info and purl.net.”
↑ “Internet Archive”. (1997). Archived from the original on 1997-01-26.
↑ “Internet Archive building damaged by fire”. (2013-11-07). BBC.com. Accessed 2023-03-22. Archived from the original on 2013-11-07.
↑ Kahle, Brewster. (2013-11-06). “Fire Update: Lost Many Cameras, 20 Boxes. No One Hurt”. Accessed 2023-03-22. Archived from the original on 2013-11-07.
↑ B., Sarah. (2013-11-06). “Part of Internet Archive building badly burned in early morning fire”. The Richmond District of San Francisco Blog. Accessed 2023-03-22. Archived from the original on 2013-11-06.
↑ Carl Franzen. (2023-06-12). “The AI feedback loop: Researchers warn of ‘model collapse’ as AI trains on AI-generated content”. venturebeat.com. Accessed 2024-09-05.
↑ Ross Anderson. (2023-06-06). “Will GPT models choke on their own exhaust?”. lightbluetouchpaper.org. Accessed 2024-09-05. “Just as we’ve strewn the oceans with plastic trash and filled the atmosphere with carbon dioxide, so we’re about to fill the Internet with blah. This will make it harder to train newer models by scraping the web, giving an advantage to firms which already did that, or which control access to human interfaces at scale. Indeed, we already see AI startups hammering the Internet Archive for training data.”
↑ Chris Benedetto. (2025-07-25). “The Internet Archive just became an official U.S. federal library”. mashable.com. Accessed 2025-07-26. “According to a new designation announced by California Senator Alex Padilla, the website will join a network of more than 1,000 libraries around the country tasked with archiving government documents for public view. Unlike other designated federal depository libraries, as they are known, the Archive is entirely online.”
