行转列、列转行这两个技巧在校招面试中的出现率仅次于连续登录问题,特别是做报表开发和数据分析的岗位,基本上是必考内容。

今天就来彻底搞定这两个"数据变形"技巧,让你在面试中游刃有余!

为什么要学这两个技巧?

想象一下你刚入职的第一周,领导给你一个Excel文件,说:"小王,把这个学生成绩表处理一下,每个学生一行,科目做列,然后给我一个排名报告。"

如果你只会简单的SELECT,那就歇菜了。但掌握了行转列技巧,几分钟搞定!

再比如,数据库里存的是这样的宽表格式:

student_id | 数学 | 英语 | 语文
-----------|------|------|------
1001       | 85   | 92   | 78

但是做分析需要这样的长表格式:

student_id | subject | score
-----------|---------|------
1001       | 数学    | 85
1001       | 英语    | 92  
1001       | 语文    | 78

这就需要列转行了!

第一绝技:行转列 - 让数据横向展开

场景1:学生成绩报表

假设我们有这样的成绩数据:

sql

-- 创建成绩表
CREATE TABLE student_scores (
    student_id INT,
    student_name VARCHAR(50),
    subject VARCHAR(20),
    score INT,
    exam_date DATE
);

-- 插入示例数据
INSERT INTO student_scores VALUES
(1001, '张三', '数学', 85, '2024-01-15'),
(1001, '张三', '英语', 92, '2024-01-15'),  
(1001, '张三', '语文', 78, '2024-01-15'),
(1002, '李四', '数学', 90, '2024-01-15'),
(1002, '李四', '英语', 88, '2024-01-15'),
(1002, '李四', '语文', 95, '2024-01-15'),
(1003, '王五', '数学', 76, '2024-01-15'),
(1003, '王五', '英语', 82, '2024-01-15');

方法一:CASE WHEN大法(推荐,兼容性最好)

sql

-- 基础行转列:每个学生一行,科目做列
SELECT 
    student_id as 学号,
    student_name as 姓名,
    SUM(CASE WHEN subject = '数学' THEN score ELSE 0 END) as 数学,
    SUM(CASE WHEN subject = '英语' THEN score ELSE 0 END) as 英语,
    SUM(CASE WHEN subject = '语文' THEN score ELSE 0 END) as 语文,
    -- 计算总分和平均分
    SUM(score) as 总分,
    ROUND(AVG(score), 1) as 平均分,
    -- 排名
    RANK() OVER (ORDER BY SUM(score) DESC) as 总分排名
FROM student_scores
GROUP BY student_id, student_name
ORDER BY 总分 DESC;

结果:

学号  | 姓名 | 数学 | 英语 | 语文 | 总分 | 平均分 | 总分排名
------|------|------|------|------|------|--------|----------
1002  | 李四 | 90   | 88   | 95   | 273  | 91.0   | 1
1001  | 张三 | 85   | 92   | 78   | 255  | 85.0   | 2
1003  | 王五 | 76   | 82   | 0    | 158  | 79.0   | 3

进阶技巧:处理NULL值和缺考情况

sql

-- 更智能的行转列:区分0分和缺考
SELECT 
    student_id as 学号,
    student_name as 姓名,
    COALESCE(SUM(CASE WHEN subject = '数学' THEN score END), 0) as 数学,
    COALESCE(SUM(CASE WHEN subject = '英语' THEN score END), 0) as 英语,
    COALESCE(SUM(CASE WHEN subject = '语文' THEN score END), 0) as 语文,
    -- 参考科目数(判断是否有缺考)
    COUNT(DISTINCT subject) as 参考科目数,
    -- 只计算参考科目的平均分
    ROUND(AVG(score), 1) as 平均分,
    -- 标记是否有缺考
    CASE 
        WHEN COUNT(DISTINCT subject) < 3 THEN '有缺考'
        ELSE '全部参考'
    END as 考试状态
FROM student_scores
GROUP BY student_id, student_name
ORDER BY 平均分 DESC;

方法二:SQL Server的PIVOT函数

sql

-- SQL Server原生PIVOT语法(看起来更优雅)
SELECT 
    student_id,
    student_name,
    ISNULL([数学], 0) as 数学,
    ISNULL([英语], 0) as 英语,
    ISNULL([语文], 0) as 语文
FROM (
    SELECT student_id, student_name, subject, score
    FROM student_scores
) AS source_data
PIVOT (
    SUM(score)
    FOR subject IN ([数学], [英语], [语文])
) AS pivot_result
ORDER BY student_id;

实际业务场景:销售报表

sql

-- 销售数据按月份行转列(常见的业务需求)
SELECT 
    sales_person as 销售员,
    SUM(CASE WHEN MONTH(sale_date) = 1 THEN amount ELSE 0 END) as 一月,
    SUM(CASE WHEN MONTH(sale_date) = 2 THEN amount ELSE 0 END) as 二月,
    SUM(CASE WHEN MONTH(sale_date) = 3 THEN amount ELSE 0 END) as 三月,
    SUM(CASE WHEN MONTH(sale_date) = 4 THEN amount ELSE 0 END) as 四月,
    -- 统计信息
    SUM(amount) as 总销售额,
    COUNT(*) as 总订单数,
    ROUND(AVG(amount), 2) as 平均订单金额,
    -- 排名和等级
    RANK() OVER (ORDER BY SUM(amount) DESC) as 销售排名,
    CASE 
        WHEN SUM(amount) >= 100000 THEN 'A级'
        WHEN SUM(amount) >= 50000 THEN 'B级'
        WHEN SUM(amount) >= 20000 THEN 'C级'
        ELSE 'D级'
    END as 销售等级
FROM sales_data
WHERE YEAR(sale_date) = 2024
GROUP BY sales_person
ORDER BY 总销售额 DESC;

动态行转列(高级技巧)

当列数不固定时,需要动态生成SQL:

sql

-- MySQL的动态行转列示例
SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'SUM(CASE WHEN subject = ''',
      subject,
      ''' THEN score ELSE 0 END) AS ',
      REPLACE(subject, ' ', '_')
    )
  ) INTO @sql
FROM student_scores;

SET @sql = CONCAT('SELECT student_id, student_name, ', @sql, ' 
                   FROM student_scores 
                   GROUP BY student_id, student_name');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

第二绝技:列转行 - 让数据纵向展开

场景:Excel导入的宽表数据

假设从Excel导入了这样的数据:

sql

-- 宽表格式的成绩数据
CREATE TABLE student_grades_wide (
    student_id INT,
    student_name VARCHAR(50),
    math_score INT,
    english_score INT,
    chinese_score INT,
    physics_score INT,
    chemistry_score INT
);

INSERT INTO student_grades_wide VALUES
(1001, '张三', 85, 92, 78, 88, 90),
(1002, '李四', 90, 88, 95, 85, 87),
(1003, '王五', 76, 82, NULL, 79, NULL);

方法一:UNION ALL大法(最通用)

sql

-- 把宽表变成长表,便于分析
SELECT student_id, student_name, '数学' as subject, math_score as score
FROM student_grades_wide
WHERE math_score IS NOT NULL

UNION ALL

SELECT student_id, student_name, '英语' as subject, english_score as score  
FROM student_grades_wide
WHERE english_score IS NOT NULL

UNION ALL

SELECT student_id, student_name, '语文' as subject, chinese_score as score
FROM student_grades_wide  
WHERE chinese_score IS NOT NULL

UNION ALL

SELECT student_id, student_name, '物理' as subject, physics_score as score
FROM student_grades_wide
WHERE physics_score IS NOT NULL

UNION ALL

SELECT student_id, student_name, '化学' as subject, chemistry_score as score
FROM student_grades_wide
WHERE chemistry_score IS NOT NULL

ORDER BY student_id, subject;

结果:

student_id | student_name | subject | score
-----------|--------------|---------|-------
1001       | 张三         | 化学    | 90
1001       | 张三         | 数学    | 85
1001       | 张三         | 物理    | 88
1001       | 张三         | 英语    | 92
1001       | 张三         | 语文    | 78
...

方法二:SQL Server的UNPIVOT函数

sql

-- SQL Server原生UNPIVOT语法
SELECT 
    student_id, 
    student_name, 
    subject,
    score
FROM (
    SELECT 
        student_id, 
        student_name, 
        math_score, 
        english_score, 
        chinese_score,
        physics_score,
        chemistry_score
    FROM student_grades_wide
) AS source_data
UNPIVOT (
    score FOR subject IN (
        math_score, 
        english_score, 
        chinese_score,
        physics_score,
        chemistry_score
    )
) AS unpivot_result
ORDER BY student_id, subject;

方法三:PostgreSQL的UNNEST方法

sql

-- PostgreSQL的优雅解法
SELECT 
    student_id,
    student_name,
    unnest(array['数学', '英语', '语文', '物理', '化学']) as subject,
    unnest(array[math_score, english_score, chinese_score, physics_score, chemistry_score]) as score
FROM student_grades_wide;

实际应用:多字段转换

sql

-- 员工多个联系方式列转行
SELECT employee_id, name, '手机号' as contact_type, mobile as contact_value
FROM employees
WHERE mobile IS NOT NULL AND mobile != ''

UNION ALL

SELECT employee_id, name, '邮箱' as contact_type, email as contact_value
FROM employees  
WHERE email IS NOT NULL AND email != ''

UNION ALL

SELECT employee_id, name, '座机' as contact_type, phone as contact_value
FROM employees
WHERE phone IS NOT NULL AND phone != ''

UNION ALL

SELECT employee_id, name, '微信' as contact_type, wechat as contact_value
FROM employees
WHERE wechat IS NOT NULL AND wechat != ''

ORDER BY employee_id, contact_type;

高级应用:时间序列数据转换

sql

-- 季度销售数据列转行
SELECT 
    sales_person,
    product_category,
    '第一季度' as period,
    q1_sales as sales_amount
FROM quarterly_sales
WHERE q1_sales IS NOT NULL

UNION ALL

SELECT 
    sales_person,
    product_category,
    '第二季度' as period,
    q2_sales as sales_amount
FROM quarterly_sales
WHERE q2_sales IS NOT NULL

UNION ALL

SELECT 
    sales_person,
    product_category,
    '第三季度' as period,
    q3_sales as sales_amount
FROM quarterly_sales
WHERE q3_sales IS NOT NULL

UNION ALL

SELECT 
    sales_person,
    product_category,
    '第四季度' as period,
    q4_sales as sales_amount
FROM quarterly_sales
WHERE q4_sales IS NOT NULL

ORDER BY sales_person, product_category, period;

组合技巧:行转列 + 列转行

有时候需要先列转行,再行转列:

sql

-- 先把宽表变长表
WITH normalized_scores AS (
    SELECT student_id, student_name, '数学' as subject, math_score as score
    FROM student_grades_wide WHERE math_score IS NOT NULL
    UNION ALL
    SELECT student_id, student_name, '英语' as subject, english_score as score
    FROM student_grades_wide WHERE english_score IS NOT NULL
    UNION ALL
    SELECT student_id, student_name, '语文' as subject, chinese_score as score
    FROM student_grades_wide WHERE chinese_score IS NOT NULL
),
-- 再按班级行转列
class_performance AS (
    SELECT 
        CASE 
            WHEN student_id BETWEEN 1001 AND 1020 THEN '一班'
            WHEN student_id BETWEEN 1021 AND 1040 THEN '二班'
            ELSE '三班'
        END as class_name,
        subject,
        AVG(score) as avg_score
    FROM normalized_scores
    GROUP BY 
        CASE 
            WHEN student_id BETWEEN 1001 AND 1020 THEN '一班'
            WHEN student_id BETWEEN 1021 AND 1040 THEN '二班'
            ELSE '三班'
        END,
        subject
)
SELECT 
    class_name as 班级,
    ROUND(AVG(CASE WHEN subject = '数学' THEN avg_score END), 1) as 数学平均分,
    ROUND(AVG(CASE WHEN subject = '英语' THEN avg_score END), 1) as 英语平均分,
    ROUND(AVG(CASE WHEN subject = '语文' THEN avg_score END), 1) as 语文平均分,
    ROUND(AVG(avg_score), 1) as 总体平均分
FROM class_performance
GROUP BY class_name
ORDER BY 总体平均分 DESC;

不同数据库的实现差异

支持情况对比

数据库PIVOTUNPIVOTCASE WHENUNION ALL
MySQL
PostgreSQLcrosstab函数
SQL Server
Oracle

PostgreSQL特殊语法

sql

-- PostgreSQL的crosstab函数(需要tablefunc扩展)
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM crosstab(
    'SELECT student_id, subject, score FROM student_scores ORDER BY 1,2',
    'VALUES (''数学''), (''英语''), (''语文'')'
) AS final_result(student_id int, 数学 int, 英语 int, 语文 int);

性能优化技巧

1. 索引优化

sql

-- 为行转列查询建索引
CREATE INDEX idx_scores_student_subject ON student_scores(student_id, subject);

-- 为列转行查询建索引  
CREATE INDEX idx_grades_student ON student_grades_wide(student_id);

2. 避免不必要的UNION ALL

sql

-- ❌ 低效:每个UNION ALL都扫描全表
SELECT student_id, '数学' as subject, math_score as score FROM student_grades_wide WHERE math_score IS NOT NULL
UNION ALL  
SELECT student_id, '英语' as subject, english_score as score FROM student_grades_wide WHERE english_score IS NOT NULL;

-- ✅ 高效:一次扫描,多个输出
SELECT student_id, subject, score
FROM student_grades_wide
CROSS JOIN (
    SELECT '数学' as subject, 1 as col_order
    UNION ALL SELECT '英语', 2
    UNION ALL SELECT '语文', 3
) subjects
CROSS APPLY (
    VALUES 
        (CASE WHEN subjects.subject = '数学' THEN math_score END),
        (CASE WHEN subjects.subject = '英语' THEN english_score END),
        (CASE WHEN subjects.subject = '语文' THEN chinese_score END)
) scores(score)
WHERE scores.score IS NOT NULL
ORDER BY student_id, col_order;

3. 大数据量处理

sql

-- 分批处理大表的列转行
WITH batch_data AS (
    SELECT * FROM student_grades_wide 
    WHERE student_id BETWEEN 1000 AND 2000  -- 分批处理
)
SELECT student_id, '数学' as subject, math_score as score
FROM batch_data WHERE math_score IS NOT NULL
UNION ALL
SELECT student_id, '英语' as subject, english_score as score
FROM batch_data WHERE english_score IS NOT NULL;

常见面试问题

Q1: 行转列和列转行有什么实际应用场景? A:

  • 行转列:报表展示、数据透视分析、对比不同维度的指标
  • 列转行:数据规范化、便于统计分析、ETL数据清洗

Q2: PIVOT和CASE WHEN哪个性能更好? A: 通常CASE WHEN性能更好,因为只需要一次表扫描。PIVOT在某些数据库中可能需要多次扫描。

Q3: 如何处理动态列数的行转列? A: 需要使用动态SQL,先查询出所有可能的列值,然后拼接成CASE WHEN语句执行。

Q4: UNION ALL和UNPIVOT哪个效率高? A: UNPIVOT通常效率更高,因为它只扫描一次表。UNION ALL需要多次扫描,但兼容性更好。

总结

行转列和列转行是数据处理中的两大基本功:

行转列核心要点:

  1. CASE WHEN + GROUP BY:最通用的方法,所有数据库都支持
  2. PIVOT函数:代码简洁,但只有部分数据库支持
  3. 性能关键:合适的索引和避免重复扫描

列转行核心要点:

  1. UNION ALL:最通用,兼容性最好
  2. UNPIVOT函数:性能较好,但支持有限
  3. 注意NULL值:要正确处理空值情况

实际应用建议:

  • 报表开发:多用行转列,让数据更直观
  • 数据分析:多用列转行,便于统计计算
  • 性能优化:大数据量时要考虑分批处理
  • 兼容性:优先选择CASE WHEN和UNION ALL

这两个技巧看似简单,但在实际工作中用处很大。特别是做BI开发、数据分析师的同学,几乎天天都要用到。

建议大家多练习,可以找一些真实的业务数据来练手,比如学校的成绩数据、公司的销售数据等。记住,熟能生巧!

面试的时候,除了会写代码,还要能说出什么时候用哪种方法,这样才能体现你的技术深度!💪