
How QueryBear Caches Your Schema

Spencer Pauly
2 min read

For QueryBear's AI to write accurate SQL, it needs to know your database schema — table names, column names, types, and relationships. Here's how that works under the hood.

Initial sync. When you connect a database, we query the information schema to pull metadata. For Postgres, that's information_schema.tables, information_schema.columns, and constraint metadata for foreign keys. For MySQL, the equivalent information_schema tables. We're reading metadata, not your data.
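To make the shape of this concrete, here's a minimal sketch of a Postgres sync. The query text and the grouping helper are illustrative (not QueryBear's actual code); in practice the SQL would run through a driver like psycopg2 and the rows would be fed into the grouping step.

```python
# Illustrative metadata query against Postgres's information_schema.
# Note it only reads column metadata — never row data.
COLUMNS_SQL = """
    SELECT table_schema, table_name, column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
    ORDER BY table_schema, table_name, ordinal_position
"""

def rows_to_schema(rows):
    """Group (schema, table, column, type, nullable) rows into a table -> columns map."""
    schema = {}
    for table_schema, table_name, column_name, data_type, is_nullable in rows:
        key = f"{table_schema}.{table_name}"
        schema.setdefault(key, []).append({
            "name": column_name,
            "type": data_type,
            "nullable": is_nullable == "YES",  # information_schema reports "YES"/"NO"
        })
    return schema
```

A sync would then be roughly `rows_to_schema(cursor.execute(COLUMNS_SQL).fetchall())`, with similar queries for constraints and indexes.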

What we store:

  • Table names and which schema they belong to
  • Column names, types, and nullability
  • Primary keys and foreign key relationships
  • Index information

What we don't store:

  • Actual row data
  • Column statistics or data distributions
  • Query results
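The two lists above can be read as a data model. This is a hypothetical shape (not QueryBear's storage format), but it captures what's in the cache and, just as importantly, what isn't:

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str
    type: str            # e.g. "integer", "text"
    nullable: bool
    primary_key: bool = False

@dataclass
class Table:
    schema: str
    name: str
    columns: list[Column] = field(default_factory=list)
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> "table.column"
    indexes: list[str] = field(default_factory=list)

# Notice what has no field here: row data, column statistics, query results.
```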

Why cache at all? Sending your full schema to the AI with every query would be slow and expensive for large databases. Some databases have hundreds of tables. We cache the schema so we can quickly select the relevant tables for each query and include only those in the AI prompt.

Schema refresh. Your schema changes over time — new tables, renamed columns, added fields. You can trigger a manual refresh from the QueryBear dashboard. We also detect common errors (like referencing a column that doesn't exist) and suggest a refresh.
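The stale-schema detection could be as simple as pattern-matching the database's error text. This is a sketch, not QueryBear's implementation; the regexes below target Postgres's standard "does not exist" error messages:

```python
import re

# Error patterns that usually mean the cached schema is out of date
# (Postgres wording; MySQL would need its own patterns).
STALE_HINTS = [
    re.compile(r'column "?[\w.]+"? does not exist'),
    re.compile(r'relation "?[\w.]+"? does not exist'),
]

def suggests_schema_refresh(error_message: str) -> bool:
    """True if the error looks like a stale-schema symptom rather than a syntax error."""
    return any(p.search(error_message) for p in STALE_HINTS)
```

When this fires, the app can prompt the user to refresh rather than silently retrying.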

Smart table selection. For databases with many tables, we don't send everything to the AI. We use the question to identify which tables are likely relevant and include only those. This improves both speed and accuracy.
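One simple way to do that selection is token overlap between the question and the cached table/column names, keeping the top-scoring tables. This is a deliberately crude sketch (a real system might use embeddings or foreign-key expansion), with a hypothetical `select_tables` helper:

```python
def select_tables(question: str, schema: dict[str, list[str]], k: int = 5) -> list[str]:
    """Rank tables by word overlap with the question; return up to k relevant ones."""
    words = set(question.lower().split())

    def score(item):
        table, columns = item
        names = {table.split(".")[-1].lower(), *(c.lower() for c in columns)}
        # Crude substring overlap so "orders" matches the orders table, etc.
        return sum(1 for w in words if any(w in n or n in w for n in names))

    ranked = sorted(schema.items(), key=score, reverse=True)
    return [t for t, cols in ranked[:k] if score((t, cols)) > 0]
```

Only the selected tables' schemas go into the AI prompt, which keeps the prompt small even for databases with hundreds of tables.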

The goal is simple: give the AI enough context to write correct SQL, without ever touching your actual data.


Want to try what I'm building?

Ask your database in plain English with permissions and auditing built in.

Learn more about QueryBear