行转列、列转行这两个技巧在校招面试中的出现率仅次于连续登录问题,特别是做报表开发和数据分析的岗位,基本上是必考内容。
今天就来彻底搞定这两个"数据变形"技巧,让你在面试中游刃有余!
为什么要学这两个技巧?
想象一下你刚入职的第一周,领导给你一个Excel文件,说:"小王,把这个学生成绩表处理一下,每个学生一行,科目做列,然后给我一个排名报告。"
如果你只会简单的SELECT,那就歇菜了。但掌握了行转列技巧,几分钟搞定!
再比如,数据库里存的是这样的宽表格式:
student_id | 数学 | 英语 | 语文
-----------|------|------|------
1001 | 85 | 92 | 78
但是做分析需要这样的长表格式:
student_id | subject | score
-----------|---------|------
1001 | 数学 | 85
1001 | 英语 | 92
1001 | 语文 | 78
这就需要列转行了!
第一绝技:行转列 - 让数据横向展开
场景1:学生成绩报表
假设我们有这样的成绩数据:
sql
-- 创建成绩表
CREATE TABLE student_scores (
student_id INT,
student_name VARCHAR(50),
subject VARCHAR(20),
score INT,
exam_date DATE
);
-- 插入示例数据
INSERT INTO student_scores VALUES
(1001, '张三', '数学', 85, '2024-01-15'),
(1001, '张三', '英语', 92, '2024-01-15'),
(1001, '张三', '语文', 78, '2024-01-15'),
(1002, '李四', '数学', 90, '2024-01-15'),
(1002, '李四', '英语', 88, '2024-01-15'),
(1002, '李四', '语文', 95, '2024-01-15'),
(1003, '王五', '数学', 76, '2024-01-15'),
(1003, '王五', '英语', 82, '2024-01-15');
方法一:CASE WHEN大法(推荐,兼容性最好)
sql
-- 基础行转列:每个学生一行,科目做列
SELECT
student_id as 学号,
student_name as 姓名,
SUM(CASE WHEN subject = '数学' THEN score ELSE 0 END) as 数学,
SUM(CASE WHEN subject = '英语' THEN score ELSE 0 END) as 英语,
SUM(CASE WHEN subject = '语文' THEN score ELSE 0 END) as 语文,
-- 计算总分和平均分
SUM(score) as 总分,
ROUND(AVG(score), 1) as 平均分,
-- 排名
RANK() OVER (ORDER BY SUM(score) DESC) as 总分排名
FROM student_scores
GROUP BY student_id, student_name
ORDER BY 总分 DESC;
结果:
学号 | 姓名 | 数学 | 英语 | 语文 | 总分 | 平均分 | 总分排名
------|------|------|------|------|------|--------|----------
1002 | 李四 | 90 | 88 | 95 | 273 | 91.0 | 1
1001 | 张三 | 85 | 92 | 78 | 255 | 85.0 | 2
1003 | 王五 | 76 | 82 | 0 | 158 | 79.0 | 3
进阶技巧:处理NULL值和缺考情况
sql
-- 更智能的行转列:区分0分和缺考
SELECT
student_id as 学号,
student_name as 姓名,
COALESCE(SUM(CASE WHEN subject = '数学' THEN score END), 0) as 数学,
COALESCE(SUM(CASE WHEN subject = '英语' THEN score END), 0) as 英语,
COALESCE(SUM(CASE WHEN subject = '语文' THEN score END), 0) as 语文,
-- 参考科目数(判断是否有缺考)
COUNT(DISTINCT subject) as 参考科目数,
-- 只计算参考科目的平均分
ROUND(AVG(score), 1) as 平均分,
-- 标记是否有缺考
CASE
WHEN COUNT(DISTINCT subject) < 3 THEN '有缺考'
ELSE '全部参考'
END as 考试状态
FROM student_scores
GROUP BY student_id, student_name
ORDER BY 平均分 DESC;
方法二:SQL Server的PIVOT函数
sql
-- SQL Server原生PIVOT语法(看起来更优雅)
SELECT
student_id,
student_name,
ISNULL([数学], 0) as 数学,
ISNULL([英语], 0) as 英语,
ISNULL([语文], 0) as 语文
FROM (
SELECT student_id, student_name, subject, score
FROM student_scores
) AS source_data
PIVOT (
SUM(score)
FOR subject IN ([数学], [英语], [语文])
) AS pivot_result
ORDER BY student_id;
实际业务场景:销售报表
sql
-- 销售数据按月份行转列(常见的业务需求)
SELECT
sales_person as 销售员,
SUM(CASE WHEN MONTH(sale_date) = 1 THEN amount ELSE 0 END) as 一月,
SUM(CASE WHEN MONTH(sale_date) = 2 THEN amount ELSE 0 END) as 二月,
SUM(CASE WHEN MONTH(sale_date) = 3 THEN amount ELSE 0 END) as 三月,
SUM(CASE WHEN MONTH(sale_date) = 4 THEN amount ELSE 0 END) as 四月,
-- 统计信息
SUM(amount) as 总销售额,
COUNT(*) as 总订单数,
ROUND(AVG(amount), 2) as 平均订单金额,
-- 排名和等级
RANK() OVER (ORDER BY SUM(amount) DESC) as 销售排名,
CASE
WHEN SUM(amount) >= 100000 THEN 'A级'
WHEN SUM(amount) >= 50000 THEN 'B级'
WHEN SUM(amount) >= 20000 THEN 'C级'
ELSE 'D级'
END as 销售等级
FROM sales_data
WHERE YEAR(sale_date) = 2024
GROUP BY sales_person
ORDER BY 总销售额 DESC;
动态行转列(高级技巧)
当列数不固定时,需要动态生成SQL:
sql
-- MySQL的动态行转列示例
SET @sql = NULL;
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'SUM(CASE WHEN subject = ''',
subject,
''' THEN score ELSE 0 END) AS ',
REPLACE(subject, ' ', '_')
)
) INTO @sql
FROM student_scores;
SET @sql = CONCAT('SELECT student_id, student_name, ', @sql, '
FROM student_scores
GROUP BY student_id, student_name');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
第二绝技:列转行 - 让数据纵向展开
场景:Excel导入的宽表数据
假设从Excel导入了这样的数据:
sql
-- 宽表格式的成绩数据
CREATE TABLE student_grades_wide (
student_id INT,
student_name VARCHAR(50),
math_score INT,
english_score INT,
chinese_score INT,
physics_score INT,
chemistry_score INT
);
INSERT INTO student_grades_wide VALUES
(1001, '张三', 85, 92, 78, 88, 90),
(1002, '李四', 90, 88, 95, 85, 87),
(1003, '王五', 76, 82, NULL, 79, NULL);
方法一:UNION ALL大法(最通用)
sql
-- 把宽表变成长表,便于分析
SELECT student_id, student_name, '数学' as subject, math_score as score
FROM student_grades_wide
WHERE math_score IS NOT NULL
UNION ALL
SELECT student_id, student_name, '英语' as subject, english_score as score
FROM student_grades_wide
WHERE english_score IS NOT NULL
UNION ALL
SELECT student_id, student_name, '语文' as subject, chinese_score as score
FROM student_grades_wide
WHERE chinese_score IS NOT NULL
UNION ALL
SELECT student_id, student_name, '物理' as subject, physics_score as score
FROM student_grades_wide
WHERE physics_score IS NOT NULL
UNION ALL
SELECT student_id, student_name, '化学' as subject, chemistry_score as score
FROM student_grades_wide
WHERE chemistry_score IS NOT NULL
ORDER BY student_id, subject;
结果:
student_id | student_name | subject | score
-----------|--------------|---------|-------
1001 | 张三 | 化学 | 90
1001 | 张三 | 数学 | 85
1001 | 张三 | 物理 | 88
1001 | 张三 | 英语 | 92
1001 | 张三 | 语文 | 78
...
方法二:SQL Server的UNPIVOT函数
sql
-- SQL Server原生UNPIVOT语法
SELECT
student_id,
student_name,
subject,
score
FROM (
SELECT
student_id,
student_name,
math_score,
english_score,
chinese_score,
physics_score,
chemistry_score
FROM student_grades_wide
) AS source_data
UNPIVOT (
score FOR subject IN (
math_score,
english_score,
chinese_score,
physics_score,
chemistry_score
)
) AS unpivot_result
ORDER BY student_id, subject;
方法三:PostgreSQL的UNNEST方法
sql
-- PostgreSQL的优雅解法
SELECT
student_id,
student_name,
unnest(array['数学', '英语', '语文', '物理', '化学']) as subject,
unnest(array[math_score, english_score, chinese_score, physics_score, chemistry_score]) as score
FROM student_grades_wide;
实际应用:多字段转换
sql
-- 员工多个联系方式列转行
SELECT employee_id, name, '手机号' as contact_type, mobile as contact_value
FROM employees
WHERE mobile IS NOT NULL AND mobile != ''
UNION ALL
SELECT employee_id, name, '邮箱' as contact_type, email as contact_value
FROM employees
WHERE email IS NOT NULL AND email != ''
UNION ALL
SELECT employee_id, name, '座机' as contact_type, phone as contact_value
FROM employees
WHERE phone IS NOT NULL AND phone != ''
UNION ALL
SELECT employee_id, name, '微信' as contact_type, wechat as contact_value
FROM employees
WHERE wechat IS NOT NULL AND wechat != ''
ORDER BY employee_id, contact_type;
高级应用:时间序列数据转换
sql
-- 季度销售数据列转行
SELECT
sales_person,
product_category,
'第一季度' as period,
q1_sales as sales_amount
FROM quarterly_sales
WHERE q1_sales IS NOT NULL
UNION ALL
SELECT
sales_person,
product_category,
'第二季度' as period,
q2_sales as sales_amount
FROM quarterly_sales
WHERE q2_sales IS NOT NULL
UNION ALL
SELECT
sales_person,
product_category,
'第三季度' as period,
q3_sales as sales_amount
FROM quarterly_sales
WHERE q3_sales IS NOT NULL
UNION ALL
SELECT
sales_person,
product_category,
'第四季度' as period,
q4_sales as sales_amount
FROM quarterly_sales
WHERE q4_sales IS NOT NULL
ORDER BY sales_person, product_category, period;
组合技巧:行转列 + 列转行
有时候需要先列转行,再行转列:
sql
-- 先把宽表变长表
WITH normalized_scores AS (
SELECT student_id, student_name, '数学' as subject, math_score as score
FROM student_grades_wide WHERE math_score IS NOT NULL
UNION ALL
SELECT student_id, student_name, '英语' as subject, english_score as score
FROM student_grades_wide WHERE english_score IS NOT NULL
UNION ALL
SELECT student_id, student_name, '语文' as subject, chinese_score as score
FROM student_grades_wide WHERE chinese_score IS NOT NULL
),
-- 再按班级行转列
class_performance AS (
SELECT
CASE
WHEN student_id BETWEEN 1001 AND 1020 THEN '一班'
WHEN student_id BETWEEN 1021 AND 1040 THEN '二班'
ELSE '三班'
END as class_name,
subject,
AVG(score) as avg_score
FROM normalized_scores
GROUP BY
CASE
WHEN student_id BETWEEN 1001 AND 1020 THEN '一班'
WHEN student_id BETWEEN 1021 AND 1040 THEN '二班'
ELSE '三班'
END,
subject
)
SELECT
class_name as 班级,
ROUND(AVG(CASE WHEN subject = '数学' THEN avg_score END), 1) as 数学平均分,
ROUND(AVG(CASE WHEN subject = '英语' THEN avg_score END), 1) as 英语平均分,
ROUND(AVG(CASE WHEN subject = '语文' THEN avg_score END), 1) as 语文平均分,
ROUND(AVG(avg_score), 1) as 总体平均分
FROM class_performance
GROUP BY class_name
ORDER BY 总体平均分 DESC;
不同数据库的实现差异
支持情况对比
数据库 | PIVOT | UNPIVOT | CASE WHEN | UNION ALL |
---|---|---|---|---|
MySQL | ❌ | ❌ | ✅ | ✅ |
PostgreSQL | crosstab函数 | ❌ | ✅ | ✅ |
SQL Server | ✅ | ✅ | ✅ | ✅ |
Oracle | ✅ | ✅ | ✅ | ✅ |
PostgreSQL特殊语法
sql
-- PostgreSQL的crosstab函数(需要tablefunc扩展)
CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT * FROM crosstab(
'SELECT student_id, subject, score FROM student_scores ORDER BY 1,2',
'VALUES (''数学''), (''英语''), (''语文'')'
) AS final_result(student_id int, 数学 int, 英语 int, 语文 int);
性能优化技巧
1. 索引优化
sql
-- 为行转列查询建索引
CREATE INDEX idx_scores_student_subject ON student_scores(student_id, subject);
-- 为列转行查询建索引
CREATE INDEX idx_grades_student ON student_grades_wide(student_id);
2. 避免不必要的UNION ALL
sql
-- ❌ 低效:每个UNION ALL都扫描全表
SELECT student_id, '数学' as subject, math_score as score FROM student_grades_wide WHERE math_score IS NOT NULL
UNION ALL
SELECT student_id, '英语' as subject, english_score as score FROM student_grades_wide WHERE english_score IS NOT NULL;
-- ✅ 高效:一次扫描,多个输出
SELECT student_id, subject, score
FROM student_grades_wide
CROSS JOIN (
SELECT '数学' as subject, 1 as col_order
UNION ALL SELECT '英语', 2
UNION ALL SELECT '语文', 3
) subjects
CROSS APPLY (
VALUES
(CASE WHEN subjects.subject = '数学' THEN math_score END),
(CASE WHEN subjects.subject = '英语' THEN english_score END),
(CASE WHEN subjects.subject = '语文' THEN chinese_score END)
) scores(score)
WHERE scores.score IS NOT NULL
ORDER BY student_id, col_order;
3. 大数据量处理
sql
-- 分批处理大表的列转行
WITH batch_data AS (
SELECT * FROM student_grades_wide
WHERE student_id BETWEEN 1000 AND 2000 -- 分批处理
)
SELECT student_id, '数学' as subject, math_score as score
FROM batch_data WHERE math_score IS NOT NULL
UNION ALL
SELECT student_id, '英语' as subject, english_score as score
FROM batch_data WHERE english_score IS NOT NULL;
常见面试问题
Q1: 行转列和列转行有什么实际应用场景? A:
- 行转列:报表展示、数据透视分析、对比不同维度的指标
- 列转行:数据规范化、便于统计分析、ETL数据清洗
Q2: PIVOT和CASE WHEN哪个性能更好? A: 通常CASE WHEN性能更好,因为只需要一次表扫描。PIVOT在某些数据库中可能需要多次扫描。
Q3: 如何处理动态列数的行转列? A: 需要使用动态SQL,先查询出所有可能的列值,然后拼接成CASE WHEN语句执行。
Q4: UNION ALL和UNPIVOT哪个效率高? A: UNPIVOT通常效率更高,因为它只扫描一次表。UNION ALL需要多次扫描,但兼容性更好。
总结
行转列和列转行是数据处理中的两大基本功:
行转列核心要点:
- CASE WHEN + GROUP BY:最通用的方法,所有数据库都支持
- PIVOT函数:代码简洁,但只有部分数据库支持
- 性能关键:合适的索引和避免重复扫描
列转行核心要点:
- UNION ALL:最通用,兼容性最好
- UNPIVOT函数:性能较好,但支持有限
- 注意NULL值:要正确处理空值情况
实际应用建议:
- 报表开发:多用行转列,让数据更直观
- 数据分析:多用列转行,便于统计计算
- 性能优化:大数据量时要考虑分批处理
- 兼容性:优先选择CASE WHEN和UNION ALL
这两个技巧看似简单,但在实际工作中用处很大。特别是做BI开发、数据分析师的同学,几乎天天都要用到。
建议大家多练习,可以找一些真实的业务数据来练手,比如学校的成绩数据、公司的销售数据等。记住,熟能生巧!
面试的时候,除了会写代码,还要能说出什么时候用哪种方法,这样才能体现你的技术深度!💪
Comments