SQL Style Guide
Table of Contents
- Consistency
- Reserved Words
- Variable Names
- Aliasing
- Left Align Root Keywords
- Code Blocks
- Parentheses
- Boolean at the Beginning of Line
- Nested Queries
- About this Document
Consistency
From Pep8:
A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important.
However, know when to be inconsistent -- sometimes style guide recommendations just aren't applicable. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!
Reserved Words
Always use uppercase for reserved keywords like SELECT
, WHERE
, or AS
.
Variable Names
- Use consistent and descriptive identifiers and names.
- Use lower case names with underscores, such as
first_name
. Do not use CamelCase. - Presto functions, such as
cardinality
,approx_distinct
, orsubstr
, are identifiers and should be treated like variable names. - Names must begin with a letter and may not end in an underscore.
- Only use letters, numbers, and underscores in variable names.
Aliasing
Always include the AS
keyword when aliasing a variable,
it's easier to read when explicit.
Good
SELECT
substr(submission_date, 1, 6) AS month
FROM
main_summary
LIMIT
10
Bad
SELECT
substr(submission_date, 1, 6) month
FROM
main_summary
LIMIT
10
Left Align Root Keywords
Root keywords should all start on the same character boundary. This is counter to the common "rivers" pattern described here.
Good:
SELECT
client_id,
submission_date
FROM
main_summary
WHERE
sample_id = '42'
AND submission_date > '20180101'
LIMIT
10
Bad:
SELECT client_id,
submission_date
FROM main_summary
WHERE sample_id = '42'
AND submission_date > '20180101'
Code Blocks
Root keywords should be on their own line. For example:
Good:
SELECT
client_id,
submission_date
FROM
main_summary
WHERE
submission_date > '20180101'
AND sample_id = '42'
LIMIT
10
It's acceptable to include an argument on the same line as the root keyword, if there is exactly one argument.
Acceptable:
SELECT
client_id,
submission_date
FROM main_summary
WHERE
submission_date > '20180101'
AND sample_id = '42'
LIMIT 10
Do not include multiple arguments on one line.
Bad:
SELECT client_id, submission_date
FROM main_summary
WHERE
submission_date > '20180101'
AND sample_id = '42'
LIMIT 10
Bad
SELECT
client_id,
submission_date
FROM main_summary
WHERE submission_date > '20180101'
AND sample_id = '42'
LIMIT 10
Parentheses
If parentheses span multiple lines:
- The opening parenthesis should terminate the line.
- The closing parenthesis should be lined up under the first character of the line that starts the multi-line construct.
- The contents of the parentheses should be indented one level.
For example:
Good
WITH sample AS (
SELECT
client_id,
FROM
main_summary
WHERE
sample_id = '42'
)
Bad (Terminating parenthesis on shared line)
WITH sample AS (
SELECT
client_id,
FROM
main_summary
WHERE
sample_id = '42')
Bad (No indent)
WITH sample AS (
SELECT
client_id,
FROM
main_summary
WHERE
sample_id = '42'
)
Boolean at the Beginning of Line
AND
and OR
should always be at the beginning of the line.
For example:
Good
...
WHERE
submission_date > 20180101
AND sample_id = '42'
Bad
...
WHERE
submission_date > 20180101 AND
sample_id = '42'
Nested Queries
Do not use nested queries. Instead, use common table expressions to improve readability.
Good:
WITH sample AS (
SELECT
client_id,
submission_date
FROM
main_summary
WHERE
sample_id = '42'
)
SELECT *
FROM sample
LIMIT 10
Bad:
SELECT *
FROM (
SELECT
client_id,
submission_date
FROM
main_summary
WHERE
sample_id = '42'
)
LIMIT 10
About this Document
This document was heavily influenced by https://www.sqlstyle.guide/
Changes to the style guide should be reviewed by at least one member of both the Data Engineering team and the Data Science team.