Airbnb does not have a team of writers producing pages about “log cabins in the Austrian Alps” or “apartments with a terrace in Barcelona’s Eixample district”. Those pages exist because there is a template, a dataset, and an engine that combines them automatically. The result — multiplied across millions of geographic and accommodation-type combinations — is one of the most thoroughly documented programmatic SEO projects in history: organic traffic that no editorial team could generate manually.
Programmatic SEO is not new. What has changed since Google’s Helpful Content Update (HCU) of September 2023 is the cost of getting it wrong. Before HCU, a project of 50,000 thin pages could generate modest traffic with limited impact on the rest of the domain. After HCU, Google applies a site-level signal: when a relatively high share of a domain’s pages is judged “unhelpful”, the high-quality pages on the same domain also lose visibility. Google has not published the threshold, so the only safe assumption is that every thin page counts against the domain. Programmatic SEO in 2026 operates under stricter rules: more unique data, less variable text, and demand validation before generating a single URL.
What programmatic SEO is (and what it is not)
Programmatic SEO is the methodology of creating web pages at scale by combining templates with structured data. The template defines the visual structure and textual framework; the dataset supplies the variable data that makes each page unique. Each row in the dataset produces an independent URL that answers a specific search query.
What distinguishes programmatic SEO from the “auto-generated content spam” that Google penalises is not the technology but the usefulness of the output. Google’s spam policies prohibit pages generated primarily to manipulate search rankings rather than to help users: text produced at scale with no value for the user, where the template is 95% of the content and the variable data is just a city name. That is not programmatic SEO; it is what those policies call scaled content abuse.
Legitimate programmatic SEO produces pages with unique, verifiable data per dataset row. Airbnb pages expose average local prices, real inventory availability, and verified guest reviews. Zapier’s pages document functional step-by-step instructions for configuring the integration between two specific applications, with real use cases. Idealista, Spain’s leading real estate portal, surfaces price per square metre in the neighbourhood, price variation history, and number of available properties. Each data point is functional: users can make a decision from it without leaving the page.
The practical test for distinguishing your project from spam: remove the variable data so that only the template remains, then ask whether the page still has value. If it does, because the informational text is substantial, the project may work even with modest data. If what remains is an empty skeleton, the project depends entirely on the quality of that data.
Companies doing it well: Airbnb, Zapier, and Idealista
All three reference cases share the same pattern: a dataset with real, up-to-date data; a template that presents it in a functional format; and a URL strategy that captures specific long-tail queries.
Zapier built approximately 75,000 integration pages, each describing how to connect two specific applications: “Connect Slack with Google Sheets”, “Automate Trello with Gmail”, and tens of thousands of other combinations. Each page contains real functional instructions: available triggers, possible actions, configuration steps, documented use cases. The format is always the same (the template); the data is always unique (the technical specifications of each integration). According to Ahrefs analysis, these pages account for the majority of the 4 million-plus monthly organic visits Zapier generates.
Airbnb operates at a different scale: millions of pages combining accommodation type, number of guests, features (with a pool, near the airport, pet-friendly) and geographic location. The “helpfulness” signal that saves these pages from HCU filtering is real-time data: live availability, updated prices, verified reviews from users who have actually stayed there. Without that dynamic data, the pages would be identical save for the place name.
In Spain, Idealista applies the same model to the property sector: pages at /[property-type]/[city]/[neighbourhood]/ with updated average prices per square metre, number of available listings, and functional search filters. The combination of proprietary market data, structured templates, and semantic URLs is the pattern that generates traffic for searches like “flats for sale in Gràcia Barcelona” without any manual copywriting.
The dataset as competitive advantage
The dataset is the differential asset in programmatic SEO. Not the template, which can be copied; not the URL architecture, which can be replicated. It is the unique data source that no competitor can duplicate exactly at the same quality and freshness.
The most effective dataset types for programmatic SEO fall into three categories. First, proprietary business data: product inventories with prices, availability, and technical specifications; histories of services delivered with verifiable results; operational data owned by the business. These are the most valuable because only you have them.
Second, licensed third-party data: market price APIs (property, financial, travel), open government data (employment statistics, registered company data), sector reference databases. These are accessible to anyone who pays the licence fee, which reduces the competitive advantage but preserves data quality.
Third, co-created data: user-generated content (reviews, verified FAQs, testimonials), product usage data. These require an active community but are high quality and difficult to replicate.
The most frequent mistake in failed programmatic projects is using variable text as a substitute for real data. Replacing “Barcelona has strong demand in the property market” with “Madrid has strong demand in the property market” across 50 city pages does not produce unique data — it produces 50 variations of the same generic assertion. Google detects this through engagement signals: if users from all cities abandon the page within the same eight seconds with no interaction, the content quality signal is identical for all of them.
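The difference between variable text and real data can be checked mechanically before launch. The following is a minimal sketch, assuming you can obtain the rendered body text of two pages from the same template; the `RenderedPage` shape, the function names, and the listing figures in the sample are hypothetical and purely illustrative.

```typescript
// Hypothetical shape for this sketch: one rendered page of a template.
interface RenderedPage {
  modifier: string; // the variable value, e.g. a city name
  text: string; // visible body text of the rendered page
}

// Mask the modifier with a placeholder, then normalise whitespace and case.
function withoutModifier(page: RenderedPage): string {
  const masked =
    page.modifier === "" ? page.text : page.text.split(page.modifier).join("{modifier}");
  return masked.replace(/\s+/g, " ").trim().toLowerCase();
}

// True when two pages are identical once the modifier is masked out,
// i.e. the only "unique" content is the name itself.
function onlyModifierDiffers(a: RenderedPage, b: RenderedPage): boolean {
  return withoutModifier(a) === withoutModifier(b);
}

const thinBarcelona: RenderedPage = {
  modifier: "Barcelona",
  text: "Barcelona has strong demand in the property market",
};
const thinMadrid: RenderedPage = {
  modifier: "Madrid",
  text: "Madrid has strong demand in the property market",
};

// Made-up figures, only to show what a data-backed page looks like.
const dataBarcelona: RenderedPage = {
  modifier: "Barcelona",
  text: "Barcelona: 4210 listings available",
};
const dataMadrid: RenderedPage = {
  modifier: "Madrid",
  text: "Madrid: 6020 listings available",
};

const thinPairIsDuplicate = onlyModifierDiffers(thinBarcelona, thinMadrid);
const dataPairIsDuplicate = onlyModifierDiffers(dataBarcelona, dataMadrid);
```

Here the two generic assertions collapse to the same string once the city name is masked, while the two data-backed pages do not. An exact-match comparison is deliberately strict; a real audit would probably use a similarity score rather than equality.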
HCU: when Google rewards and when it penalises programmatic content
The Helpful Content Update of September 2023 and its subsequent reinforcements have established two categories of response for programmatic SEO. Projects that survived or grew have in common: unique data per page, functional depth (the user can complete a task), verifiable E-E-A-T signals, and documented real demand for each combination. Projects that lost traffic massively have in common: variable text without real data, combinations with no search volume, and zero signals of experience or authority.
The domain signal is the most dangerous mechanism for programmatic projects. Google does not evaluate only page by page — it evaluates the proportion of “helpful” pages across the entire domain. A site that generates 100,000 thin pages alongside 10 high-quality blog articles may see those 10 quality pages lose positions due to contamination from the 100,000 poor-quality pages. This mechanism, documented in the Google Search Central blog, explains why poorly executed programmatic projects can damage the SEO of pre-existing, high-quality content.
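The mitigation strategy is selective noindex combined with a staged rollout. A page served with <meta name="robots" content="noindex"> is excluded from search results, so it cannot accumulate impressions in Search Console; demand has to be observed on indexable pages. Launch a first batch of validated combinations as indexable, keep the rest under noindex, and monitor the indexed batch for 4–8 weeks in Search Console. Combinations with zero impressions in that period are candidates for removal from the sitemap or permanent noindex status, and the results tell you which of the remaining batches deserve to be opened to indexing.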
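The decision at the end of the observation window can be sketched as follows. This is a minimal illustration, assuming per-URL impression counts have been exported from Search Console; the `PageStats` shape, the function names, the default threshold, and the sample URLs and numbers are all hypothetical.

```typescript
// Hypothetical shape of one exported row; field names are assumptions.
interface PageStats {
  url: string;
  impressions: number; // impressions during the observation window
}

type RobotsDirective = "index, follow" | "noindex, follow";

// Pages below the impression threshold become noindex candidates.
function robotsDirectiveFor(stats: PageStats, minImpressions = 1): RobotsDirective {
  return stats.impressions >= minImpressions ? "index, follow" : "noindex, follow";
}

// Render the meta tag the template would place in <head>.
function robotsMetaTag(stats: PageStats): string {
  return `<meta name="robots" content="${robotsDirectiveFor(stats)}">`;
}

// Illustrative sample, not real data.
const observed: PageStats[] = [
  { url: "/flats/barcelona/gracia/", impressions: 340 },
  { url: "/flats/barcelona/zero-demand-example/", impressions: 0 },
];

const robotsTags = observed.map((stats) => ({ url: stats.url, tag: robotsMetaTag(stats) }));
```

Keeping `follow` on the noindexed pages is a design choice in this sketch, so that internal links on them can still be crawled while the page itself stays out of the index.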
Technical implementation: from dataset to indexed URL
The technical architecture of a programmatic project has four layers that must work correctly in coordination.
The first is the dataset: the data source in JSON, CSV, or relational database format. Each row corresponds to one page. The minimum fields are: slug (for the URL), title, unique variable data, and SEO metadata (description, image). The dataset must have update mechanisms — static data that is never refreshed generates pages that lose relevance over time.
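A row with the minimum fields listed above, plus a freshness check, might look like the sketch below. The field names, the `validateRow` helper, and the 30-day freshness window are assumptions made for illustration, not a prescribed schema.

```typescript
// Assumed field names mirroring the minimum fields: slug, title,
// unique variable data, and SEO metadata.
interface DatasetRow {
  slug: string;
  title: string;
  data: Record<string, string | number>; // unique variable data for the page
  description: string;
  image: string;
  updatedAt: string; // ISO 8601 date of the last data refresh
}

const MS_IN_A_DAY = 86400000;

// Return a list of problems; an empty list means the row can produce a page.
function validateRow(row: DatasetRow, now: Date, maxAgeDays = 30): string[] {
  const problems: string[] = [];
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(row.slug)) problems.push("invalid slug");
  if (row.title.trim() === "") problems.push("missing title");
  if (Object.keys(row.data).length === 0) problems.push("no variable data");
  if (row.description.trim() === "") problems.push("missing description");
  const ageDays = (now.getTime() - new Date(row.updatedAt).getTime()) / MS_IN_A_DAY;
  // An unparseable date yields NaN, which is treated as stale.
  if (Number.isNaN(ageDays) || ageDays > maxAgeDays) problems.push("stale data");
  return problems;
}
```

Running this check in the build and refusing to generate pages for rows that report problems is one way to keep unrefreshed or empty rows from ever becoming URLs.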
The second is the templating engine. For projects built with Astro (like this site), getStaticPaths() consumes the dataset and generates one route per row, with correct slugification and trailing slash. The template defines the structure that receives the dataset data as props. Clear separation between data and presentation is fundamental: if you modify the template, all pages regenerate automatically with the new design.
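As a rough sketch of that step, the function below turns dataset rows into the array of `{ params, props }` objects that Astro expects `getStaticPaths()` to return. The `CityRow` shape, the `slugify` helper, and the `buildStaticPaths` name are assumptions for this example; in an Astro page the exported `getStaticPaths()` would simply return the result.

```typescript
// Hypothetical dataset row for a city page.
interface CityRow {
  city: string;
  avgPricePerSqm: number;
}

// Lowercase, strip accents, and collapse anything non-alphanumeric into hyphens.
function slugify(text: string): string {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// One route per dataset row: `params` fills the dynamic URL segment,
// `props` carries the whole row into the template.
function buildStaticPaths(rows: CityRow[]) {
  return rows.map((row) => ({
    params: { city: slugify(row.city) },
    props: { row },
  }));
}
```

With a page file named `[city].astro`, each returned `params.city` becomes one generated route. The trailing slash itself is not produced by this function; in Astro it is governed by the `trailingSlash` configuration option.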
The third is the URL structure. The /[type]/[modifier]/ pattern is the most widely used for programmatic projects. URLs must be semantic, descriptive, and avoid query parameters for indexable pages. The canonical must auto-generate for each page pointing to itself, preventing duplicates if the same data combination can be accessed via multiple routes.
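A self-referencing canonical for the /[type]/[modifier]/ pattern can be sketched as below. The origin `https://example.com` and the function names are placeholders chosen for illustration, and the segments are assumed to be slugified already.

```typescript
// Assumed site origin for this sketch; replace with the real domain.
const SITE_ORIGIN = "https://example.com";

// Build the /[type]/[modifier]/ path from two already-slugified segments.
function pagePath(typeSlug: string, modifierSlug: string): string {
  return `/${typeSlug}/${modifierSlug}/`;
}

// Self-referencing canonical: drop query string and fragment, then
// normalise to exactly one trailing slash so every variant of the
// requested path maps to a single URL.
function canonicalFor(requestedPath: string): string {
  const pathOnly = requestedPath.split(/[?#]/)[0] ?? "";
  const trimmed = pathOnly.replace(/\/+$/, "");
  return `${SITE_ORIGIN}${trimmed}/`;
}

function canonicalLinkTag(requestedPath: string): string {
  return `<link rel="canonical" href="${canonicalFor(requestedPath)}">`;
}
```

For example, a request for `/flats/madrid?utm_source=newsletter` and one for `/flats/madrid/` would both resolve to the same canonical, which is the behaviour the paragraph above asks for.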
The fourth is the segmented sitemap. For projects with more than 10,000 URLs, a single sitemap is inefficient. The recommended structure is a sitemap index with child sitemaps per page type (e.g. sitemap-ciudades.xml, sitemap-servicios.xml), each with a maximum of 50,000 URLs. Sitemaps must update automatically with each build or deploy.
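The segmentation can be sketched in a few lines. This is an illustrative outline only: the numbered file-name scheme (`sitemap-<type>-1.xml`) and the function names are assumptions, while the 50,000-URL ceiling comes from the sitemaps protocol.

```typescript
const MAX_URLS_PER_SITEMAP = 50000; // ceiling set by the sitemaps protocol

// Split a list into chunks no larger than `size`.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Name child sitemaps per page type: sitemap-<type>-1.xml, sitemap-<type>-2.xml, ...
function childSitemapNames(pageType: string, urls: string[]): string[] {
  return chunk(urls, MAX_URLS_PER_SITEMAP).map((_, i) => `sitemap-${pageType}-${i + 1}.xml`);
}

// Build the sitemap index that points at every child sitemap.
function sitemapIndexXml(origin: string, childNames: string[]): string {
  const entries = childNames
    .map((name) => `  <sitemap><loc>${origin}/${name}</loc></sitemap>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</sitemapindex>",
  ].join("\n");
}
```

Calling these from the build script, rather than maintaining the files by hand, is what keeps the sitemaps in step with each deploy.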
At Ighenatt, this architecture already powers the web architecture and structural SEO pages we generate for 49 cities and 10 combined services. The same pattern extends to any proprietary dataset.
Validating demand and measuring success
The most avoidable mistake in programmatic SEO is generating pages for combinations with no search volume. Before launching the project, demand validation is the most important quality filter: for each [type] + [modifier] combination in the dataset, verify in Ahrefs, Google Keyword Planner, or Google Trends whether real search volume exists.
The minimum threshold depends on the project’s scale. For projects generating 500–2,000 pages, a volume of 50 monthly searches per combination is sufficient. For projects generating more than 10,000 pages, even 10 monthly searches per combination can be viable if the generation cost is low. For projects generating more than 100,000 pages, validation is especially critical: at that scale, zero-volume combinations make up a large enough share of the domain to risk triggering the HCU site-level signal.
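These thresholds translate into a simple pre-generation filter. The sketch below assumes search volumes have already been exported from a keyword tool into a `monthlySearches` field; the `Combination` shape and function names are hypothetical. The text does not specify a threshold for projects between 2,000 and 10,000 pages, so this sketch assumes the stricter value of 50 applies there.

```typescript
// Hypothetical shape: one [type] + [modifier] combination with its volume.
interface Combination {
  type: string;
  modifier: string;
  monthlySearches: number; // taken from a keyword tool export
}

// 10 searches for projects above 10,000 pages, 50 otherwise.
// Applying 50 to the 2,000-10,000 range is an assumption of this sketch.
function minMonthlySearches(plannedPages: number): number {
  return plannedPages > 10000 ? 10 : 50;
}

// Split combinations into those worth generating and those to drop.
function validateDemand(
  combos: Combination[],
  plannedPages: number = combos.length,
): { keep: Combination[]; drop: Combination[] } {
  const threshold = minMonthlySearches(plannedPages);
  return {
    keep: combos.filter((c) => c.monthlySearches >= threshold),
    drop: combos.filter((c) => c.monthlySearches < threshold),
  };
}
```

Only the `keep` list should reach the dataset that feeds the templating engine; the `drop` list is worth storing so that combinations can be re-checked when volumes change.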
The success metrics for a programmatic project go beyond aggregate traffic. The most important are: indexation rate (pages generated versus pages indexed in Search Console), ranking by template (average position of all pages of the same type), engagement signals by collection (time on page, bounce rate, conversion events), and crawl budget evolution (Googlebot requests in server logs divided by the number of pages generated).
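Three of these metrics are plain ratios and can be computed per collection as in the sketch below. The `CollectionStats` shape and its field names are assumptions; the inputs would come from Search Console exports and server logs. Engagement signals are left out because they depend on the analytics tool in use.

```typescript
// Hypothetical per-template input; field names are assumptions.
interface CollectionStats {
  template: string;
  pagesGenerated: number;
  pagesIndexed: number; // from Search Console
  positions: number[]; // average position of each ranking page
  googlebotRequests: number; // counted in server logs for the period
}

// Division that returns 0 instead of NaN or Infinity for empty collections.
function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

function summarise(stats: CollectionStats) {
  const positionSum = stats.positions.reduce((sum, p) => sum + p, 0);
  return {
    template: stats.template,
    indexationRate: ratio(stats.pagesIndexed, stats.pagesGenerated),
    averagePosition: ratio(positionSum, stats.positions.length),
    crawlRequestsPerPage: ratio(stats.googlebotRequests, stats.pagesGenerated),
  };
}
```

Tracking these values per template rather than for the whole site is what makes it possible to see which collection is dragging the averages down.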
Content pruning applied to programmatic SEO means reviewing the indexation rate by collection on a quarterly basis and removing or consolidating pages that have gone more than 90 days without any impressions in Search Console. A well-maintained programmatic project does not only grow — it actively prunes the combinations that are not working to protect the domain’s quality signal.
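The quarterly review can be reduced to a filter over last-impression dates. This is a minimal sketch, assuming each page's most recent impression date has been derived from Search Console data and that every page in the input has been live for longer than the window; the `PageActivity` shape and the function name are hypothetical.

```typescript
// Hypothetical shape: when a page last earned an impression, if ever.
interface PageActivity {
  url: string;
  lastImpressionAt: string | null; // ISO 8601 date, or null if never
}

const MS_PER_DAY = 86400000;

// URLs that have gone longer than the window without impressions.
// Assumes all pages passed in have been live for more than the window.
function pruneCandidates(
  pages: PageActivity[],
  now: Date,
  maxDaysWithoutImpressions = 90,
): string[] {
  return pages
    .filter((page) => {
      if (page.lastImpressionAt === null) return true;
      const days = (now.getTime() - new Date(page.lastImpressionAt).getTime()) / MS_PER_DAY;
      return days > maxDaysWithoutImpressions;
    })
    .map((page) => page.url);
}
```

The output is a list of candidates, not an automatic delete list: whether each URL is removed, consolidated, or set to noindex remains an editorial decision.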
The connection with Entity SEO closes the loop: high-quality programmatic pages are precisely the ones that LLM retrieval bots prioritise when citing sources for specific searches. Unique data + clear structure + recognised entity = the citability formula for 2026. And before publishing at scale, methodical keyword research determines exactly which dataset combinations have real demand.