How web standards shape the internet's governing framework

A behind-the-scenes look at the consensus-driven process that ensures web reliability.

The hidden bureaucracy of web standards development shapes our entire online experience.

In a recent episode of the Search Off the Record podcast released on April 17, 2025, Google Search team members Martin Splitt and Gary Illyes provided rare insights into the complex world of web standards development, revealing the meticulous processes that govern how internet technologies are formalized and adopted globally.

The discussion offers a timely examination of internet governance at a moment when web technologies continue to evolve rapidly. While many internet users interact with web standards daily through browsers and websites, few understand the deliberate, consensus-driven processes that ensure these technologies work consistently across the digital landscape.

According to Gary Illyes, who has directly participated in standards development with the Internet Engineering Task Force (IETF), the journey from concept to official standard typically spans years of rigorous debate, implementation testing, and extensive peer review. "It probably takes years for something to become a standard. When I say probably, I'm underselling it. Basically it just takes years for something to become a standard," Illyes explained during the podcast.

The podcast reveals that standards development is intentionally methodical because of the significant implications for internet security, compatibility, and longevity. Standards bodies must consider potential security vulnerabilities that malicious actors might exploit. Illyes provided a concrete example from his work standardizing robots.txt, explaining that including a 500-kilobyte parsing limit was a deliberate security measure: "If we hadn't put 500 kilobyte limit on robots.txt file, then people would be able to cause a buffer overflow... and then, once you have that buffer overflow, then you have access to memory blocks that you could exploit to your advantage."
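
As an illustration of how such a size cap works in practice, the sketch below fetches a robots.txt file but refuses to buffer more than the 500-kilobyte limit Illyes describes. It is a minimal, hypothetical example using Python's standard library, not Google's actual crawler code.

```python
# Minimal sketch: fetch robots.txt while enforcing a 500 KiB parsing limit,
# so an oversized or malicious file cannot force unbounded buffering.
from urllib.request import urlopen

MAX_ROBOTS_BYTES = 500 * 1024  # 500-kilobyte limit discussed in the podcast

def fetch_robots_txt(url: str) -> str:
    """Read at most MAX_ROBOTS_BYTES from a robots.txt URL."""
    with urlopen(url) as response:
        # read(n) stops after n bytes; anything beyond the cap is ignored.
        raw = response.read(MAX_ROBOTS_BYTES)
    return raw.decode("utf-8", errors="replace")

if __name__ == "__main__":
    rules = fetch_robots_txt("https://www.example.com/robots.txt")
    print(rules[:200])
```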

The discussion identified several major standards organizations that oversee different aspects of internet technology. The IETF focuses primarily on fundamental internet protocols like HTTP and TCP/IP, while the World Wide Web Consortium (W3C) traditionally governed markup languages such as HTML. Other organizations include Ecma International (formerly the European Computer Manufacturers Association), which oversees JavaScript under its standardized name, ECMAScript, and specialized bodies like the RSS Advisory Board.

One particularly notable aspect of standards development is its democratic nature. According to Illyes, many standards bodies maintain remarkably open processes: "Everything is public. Also our meetings are public. Technically, anyone can join in and listen to what we are talking about or even just say words in the meeting. There's no formal membership. You can just show up and contribute to standards."

The path to standardization typically begins when innovators identify a technology that would benefit from formal standardization. They then approach the most appropriate standards body based on subject matter expertise and relevance. For instance, when Google sought to standardize robots.txt, they selected the IETF because of its focus on internet protocols.

The standards development process includes several distinct phases. First, a draft document is created and submitted to the relevant working group. This draft undergoes extensive review and iteration, with particular attention paid to security considerations, implementation feasibility, and precise language.

Language precision is crucial in standards documents. Illyes explained the significance of specific terminology: "Especially in IETF standards, we use certain keywords that have weight. And that would be stuff like 'should' or 'may' or 'must.' If you're reading RFCs, those are capitalized... When you're reading it as an implementer, then you have to understand that if something says MUST, then you actually have to do it."
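
To show how those requirement levels translate into implementer behavior, here is a small, hypothetical validator that treats a MUST violation as a hard error and a SHOULD violation as a warning. The specific rules checked are invented for illustration and are not taken from any particular RFC.

```python
# Hypothetical illustration of RFC-style requirement levels:
# a MUST violation is fatal, a SHOULD violation only produces a warning.
import warnings

def check_response(headers: dict) -> None:
    # Invented rule: "the response MUST include Content-Type"
    if "Content-Type" not in headers:
        raise ValueError("MUST violation: Content-Type header is required")

    # Invented rule: "the response SHOULD include Cache-Control"
    if "Cache-Control" not in headers:
        warnings.warn("SHOULD violation: Cache-Control is recommended")

check_response({"Content-Type": "text/html"})  # passes, with a SHOULD warning
# check_response({}) would raise, because a MUST requirement failed
```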

After addressing all feedback and concerns, the draft enters a "last call" phase where final objections can be raised. Various directorates within the standards organization then conduct specialized reviews before final approval. The IETF, for example, classifies standards as either "proposed standards," which remain somewhat flexible, or definitive "internet standards" that become fundamental, largely immutable components of internet architecture.

The podcast highlighted that not all widely adopted technologies have undergone formal standardization. Some remain "de facto" or "informal" standards that gain acceptance through widespread adoption rather than formal approval. Sitemap.xml files, which help search engines discover content on websites, exemplify this category. Despite being created around 2005-2006 and extensively used across the web, sitemaps have never been formally standardized.
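
For readers unfamiliar with the format, the snippet below builds a minimal sitemap.xml using the commonly used sitemaps.org namespace. The URLs and dates are placeholders for illustration only.

```python
# Minimal sitemap.xml generator in the informal sitemaps.org format.
from xml.etree.ElementTree import Element, SubElement, tostring

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages: list[tuple[str, str]]) -> bytes:
    """pages is a list of (url, last_modified_date) tuples."""
    urlset = Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = SubElement(urlset, "url")
        SubElement(url, "loc").text = loc
        SubElement(url, "lastmod").text = lastmod
    return tostring(urlset, encoding="utf-8", xml_declaration=True)

print(build_sitemap([("https://www.example.com/", "2025-04-17")]).decode())
```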

This distinction between formal and informal standards raises questions about when formalization is worthwhile. According to Illyes, the decision to standardize robots.txt was justified because different implementations were parsing the files inconsistently: "With robots.txt, there was benefit because we knew that different parsers tend to parse robots.txt files differently. And then, if you have a standard, then at least you fix that."
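
One example of the kind of ambiguity a formal specification resolves is rule precedence: the published robots.txt standard directs parsers to apply the most specific (longest) matching rule, with ties going to allow. The sketch below is a simplified illustration of that longest-match idea, not a complete parser (no wildcards or user-agent grouping).

```python
# Simplified sketch of longest-match precedence for robots.txt rules:
# the most specific matching rule wins, and a tie goes to allow.

def is_allowed(path: str, allow: list[str], disallow: list[str]) -> bool:
    best_allow = max((len(p) for p in allow if path.startswith(p)), default=-1)
    best_disallow = max((len(p) for p in disallow if path.startswith(p)), default=-1)
    return best_allow >= best_disallow  # tie goes to allow

# /shop/sale is disallowed, but the longer /shop/sale/featured allow rule
# makes that subsection crawlable.
print(is_allowed("/shop/sale/featured/item", ["/shop/sale/featured"], ["/shop/sale"]))  # True
print(is_allowed("/shop/sale/other", ["/shop/sale/featured"], ["/shop/sale"]))          # False
```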

The standards process, while seemingly bureaucratic, serves critical functions for internet stability and security. By requiring extensive peer review and consensus-building, standards bodies ensure that technologies are robust, secure, and technically sound before they become foundational components of internet infrastructure.

For those interested in contributing to web standards development, most standards bodies maintain open processes that welcome public participation. Resources for further information include the official websites of the IETF, W3C, and other standards organizations, which provide detailed documentation on their processes and current initiatives.

Timeline

  • 1994: Robots.txt emerges as an informal standard through the Robots Exclusion Protocol
  • 2005-2006: Sitemap.xml format created and informally adopted
  • 2022: After nearly three decades as a de facto standard, robots.txt is formally standardized by the IETF as RFC 9309
  • Present day (April 2025): Many foundational web technologies continue to evolve through consensus-driven standardization processes