Hivesql requires the number of consecutive purchase days and the number of days between the last purchase

Big data learning monk 2022-02-13 07:16:32 阅读数:591

hivesql requires number consecutive purchase

HSQL- demand : The number of consecutive purchase days and the number of days between the last purchase
Step 1: Aggregate raw data / duplicate removal
Step 2: Group and sort users
Step 3: Subtract the date from the serial number
Step 4: Get the start date and the number of consecutive days
Step 5: Use Hive in lead function
Step 1: Aggregate data / duplicate removal

The key point is to understand the problem-solving ideas

1. Preliminary aggregation or de duplication of data on a daily basis ( Because a user may have multiple purchases on a certain day )

select id,order_date from TABLE_NAME where d='10' group by id,order_date;

 Insert picture description here

2. Use row_number()over() Function windowing , Sort by user and time

select
*,row_number() over(partition by id order by order_date) rm
from (select id,order_date from TABLE_NAME where d='10' group by id,order_date) business

 Insert picture description here

3. Use date_sub function Login time minus group sequence number

select
*,date_sub(orderdate,rm) diff
from
(select
*,row_number() over(partition by id order by order_date) rm
from (select id,order_date from TABLE_NAME where d='10' group by id,order_date) a ) b

 Insert picture description here

4. Find out the continuous dates and continuous days of customers arriving at the store

select
id,continuous_start_date,count(continuous_start_date) as continuous_date
from(
select
id,min(order_date)over(partition by id,diff) as continuous_start_date
from
(select
*,date_sub(order_date,rm) diff
from
(select
*,row_number() over(partition by id order by order_date) rm
from (select id,order_date from TABLE_NAME where d='10' group by id,order_date) a ) b )c )d
group by name,continuous_start_date

 Insert picture description here
5: Use Hive in lead Function to find the number of days of customer interruption

lead() over() Bring the next line to the current line to start another field
lead( Field name to be adjusted , Next few lines , What to add to the spare part )
lag() over() Bring the previous line to the current line to start another field
lag( Field name to be adjusted , The last few lines , What to add to the spare part )
select
id,
continuous_start_date,
continuous_next_date,
continuous_date,
datediff(a.continuous_next_date,date_add(a.continuous_start_date,a.continuous_date)) as discontinuity_date
from
(
select
id,
continuous_start_date,
lead(continuous_start_date,1,0)over(partition by id,continuous_start_date) as continuous_next_date,
continuous_date
from(
select
id,continuous_start_date,count(continuous_start_date) as continuous_date
from(
select
id,min(order_date)over(partition by id,diff) as continuous_start_date
from
(select
*,date_sub(order_date,rm) diff
from
(select
*,row_number() over(partition by id order by order_date) rm
from (select id,order_date from TABLE_NAME where d='10' group by id,order_date) a ) b )c )d
group by name,continuous_start_date)e)f

 Insert picture description here

copyright:author[Big data learning monk],Please bring the original link to reprint, thank you. https://en.javamana.com/2022/02/202202130716301742.html