ABOUT ME

-

Today
-
Yesterday
-
Total
-
  • 11주 3일차 TIL 정리
    웅진 STARTERS 부트캠프 2023. 4. 23. 20:14

    11주 3일차엔 해커톤 전 마지막으로 SQL 미니프로젝트를 진행했다. 


    데이터 파악


    1. 데이터 종류: 브라질 OLIST사의 전자상거래 데이터

     

    2. ERD

     

    3. 특이사항 정리

    - customer_id는 고객마다의 값이 아니다.

       count(customer_id) = count(order_id)

      → customer_unique_id를 식별자로 사용할 것.

     


    ※ 주제 선정 및 필요 컬럼 추출


    1. 주제: 배송기간 단축, 운임비 절감을 위한 물류창고 신설 및 배치물품 선정

     

    2. 필요컬럼 선정 및 cte절 생성

    WITH cte_joined AS (
    SELECT p.product_id, p.product_category_name, o.order_id,
    	o.order_purchase_timestamp 구매일시, o.order_delivered_customer_date 배송완료일시,
    	o.order_estimated_delivery_date 예상배송일시, oi.shipping_limit_date 배송목표일시,
    	oi.price, oi.freight_value, c.customer_unique_id, c.customer_zip_code, c.customer_city,
    	c.customer_state, r.review_score, op.payment_value, t.product_category_name_english 카테고리명,
    	o.order_delivered_customer_date - o.order_purchase_timestamp 전체배송소요기간,
    	o.order_delivered_customer_date - o.order_estimated_delivery_date 예상일차이,
    	o.order_delivered_customer_date - oi.shipping_limit_date AS 제한일경과,
    	case
    	when date_part('day', oi.shipping_limit_date - o.order_delivered_customer_date) < 0 then 1
    	WHEN date_part('day', oi.shipping_limit_date - o.order_delivered_customer_date) >=0 then 0
    	end as 제한일경과여부
    FROM order_items oi LEFT JOIN products p ON p.product_id = oi.product_id
    					LEFT JOIN orders o ON o.order_id = oi.order_id
    					LEFT JOIN customers c ON c.customer_id = o.customer_id
    					LEFT JOIN reviews r ON r.order_id = o.order_id
    					LEFT JOIN order_payments op ON op.order_id = o.order_id
    					LEFT JOIN product_category_name_translation t 
                             ON p.product_category_name = t.product_category_name

     

     

     

     

    3. 컬럼별 데이터 개수 파악

    SELECT count(*) 전체, count(product_id) 제품아이디,
        count(product_category_name) 카테고리명, count(order_id) 주문아이디,
        count(구매일시) 구매일시, count(배송완료일시) 배송완료일시,
        count(예상배송일시) 예상배송일시, count(배송목표일시) 배송목표일시,
        count(price) 가격, count(freight_value) 운임비, 
        count(customer_unique_id) customer_unique_id, 
        count(customer_zip_code) customer_zip_code, count(customer_city)customer_city,
        count(customer_state)customer_state, count(review_score)리뷰점수, 
        count(payment_value)운임비더하기가격, count(카테고리명)카테고리명,
        count(전체배송소요기간)전체배송소요기간,
        count(예상일차이)예상일차이,
        count(제한일경과)제한일경과
    FROM cte_joined

     

     

    4. 카테고리명이 존재하지 않는 제품명 탐색

    SELECT * 
    FROM product_category_name_translation t RIGHT join products p 
    on p.product_category_name  = t.product_category_name 
    WHERE p.product_category_name IS NULL

    총 610개 데이터 존재, 소거하고 카운트하기로 결정.

    댓글

Designed by Tistory.