Shockang 2022-01-27 05:02:57 Views: 488
order by performs a global sort, so only one reducer is used.
Use the order by clause to sort the whole result set.
The order by clause goes at the end of the select statement.
distribute by is similar to partitioning in MapReduce. On the map side it applies a hash algorithm to the distribute by field and sends rows with the same hash value to the same reducer, and therefore to the same output file. It is generally used together with sort by.
Note: Hive requires the distribute by clause to be written before the sort by clause.
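The behavior described above can be sketched as a small Python simulation. This is only an illustration of the semantics, not Hive's implementation: the function name `distribute_and_sort`, the sample rows, and the "integer key modulo reducer count" partitioner are all assumptions made for the example.

```python
from collections import defaultdict


def distribute_and_sort(rows, distribute_key, sort_key, num_reducers):
    """Simulate `distribute by <distribute_key> sort by <sort_key>`."""
    reducers = defaultdict(list)
    for row in rows:
        # Hypothetical partitioner: integer key modulo the reducer count.
        # Rows with the same key always land in the same reducer.
        reducers[row[distribute_key] % num_reducers].append(row)
    # Each reducer sorts only its own rows; there is no global ordering.
    return {
        reducer_id: sorted(part, key=lambda row: row[sort_key])
        for reducer_id, part in sorted(reducers.items())
    }


# Made-up sample data standing in for the student table.
students = [
    {"sid": 1, "score": 90},
    {"sid": 2, "score": 75},
    {"sid": 4, "score": 60},
    {"sid": 1, "score": 55},
    {"sid": 5, "score": 82},
    {"sid": 3, "score": 70},
]

result = distribute_and_sort(students, "sid", "score", num_reducers=3)
for reducer_id, part in result.items():
    print(reducer_id, part)
```

Both rows with sid 1 end up in the same reducer, and each reducer's output is ordered by score, which mirrors one output file per reducer.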
When distribute by and sort by use the same field, cluster by can be used instead.
In addition to doing what distribute by does, cluster by also sorts on that field, so cluster by = distribute by + sort by. Note that cluster by sorts in ascending order only; it does not accept asc or desc.
-- The following two statements are equivalent
insert overwrite local directory '/home/hadoop/hivedata/distribute_sort'
select * from student distribute by score sort by score;
insert overwrite local directory '/home/hadoop/hivedata/cluster'
select * from student cluster by score;
-- Sort all students by score in descending order
select * from student s order by score desc;
-- Sort by a column alias: average score per student, highest first
select s.sid,s.tname, avg(score) as score_avg from student s group by s.sid,s.tname order by score_avg desc;
-- Sort by multiple columns: score first, then age
select * from student s order by score,age;
sort by: sorts the rows inside each reducer; it does not sort the global result set.
-- Set the number of reducers
set mapreduce.job.reduces=3;
-- Check the current number of reducers
set mapreduce.job.reduces;
-- Sort by score within each reducer
select * from student s sort by s.score;
-- Write the result to a local directory (one output file per reducer)
insert overwrite local directory '/home/hadoop/hivedata/sort' select * from student s sort by s.score;
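To make the difference between sort by and order by concrete, here is a minimal Python sketch. It assumes rows are spread across reducers in some arbitrary way (round-robin here, purely for illustration); the function name `sort_by` and the sample scores are hypothetical.

```python
def sort_by(values, num_reducers):
    """Simulate `sort by`: split the rows across reducers, sort each part."""
    # Illustrative round-robin split; a real job may distribute rows differently.
    parts = [values[i::num_reducers] for i in range(num_reducers)]
    return [sorted(part) for part in parts]


scores = [90, 75, 60, 55, 82, 70]

# Three reducers: every part is sorted, but the combined output is not.
per_reducer = sort_by(scores, 3)
concatenated = [s for part in per_reducer for s in part]

# One reducer behaves like order by: the whole result is sorted.
single = sort_by(scores, 1)[0]

print(per_reducer)
print(concatenated)
print(single)
```

With three reducers each part is ordered on its own, yet reading the parts one after another does not give a globally sorted list; with a single reducer the output matches a global sort.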
set mapreduce.job.reduces=3;
-- Distribute rows to reducers by sid, then sort by score within each reducer
insert overwrite local directory '/home/hadoop/hivedata/distribute' select * from student distribute by sid sort by score;
Copyright: author [Shockang]. Please include the original link when reprinting, thank you. https://en.javamana.com/2022/01/202201270502509068.html