Monday, February 5, 2018

Hive


I am newbie in Hive and Big Data. Writing some basic hive commands for quick reference-



--================Show Databases===============
hive> show databases;
--================Create Database==============
hive> create database hive_db;
--================Drop Database================
hive> drop database hive_db;

--================Create Table=================
hive> create table emp_hive (empid int, empname string);

hive> show tables;
OK
emp_hive Time taken:" 0.115 seconds Fetched: 2 row(s)
--================Rename A Table===============

hive> ALTER table emp_hive RENAME to emp_hive_New;

hive> show tables:
OK
emp_hive_New
--================Drop table======================
hive> drop table emp_hive_New;

--================Table types and usage=================
Hive has two types of structures like Internal and External tables depending on loading and design of schema in Hive

Internal Tables: 
- First we create the table then load the data.
- This is "data on schema"
- By dropping this table, both data and schema will be removed
- the stored location of this table will be at /user/hive/warehouse

When to choose Internal Table ->
- If the the processing data is available in local file system
- If we want Hive to manage the complete lifecycle of data including the deletion

Create Internal Table:
hive> CREATE TABLE hive_internal_table (id INT, Name STRING);
Row format delimited
Fields terminated by '\t';

Load the Data into internal table 
hive> LOAD DATA INPATH '/user/arora_hive/data.txt ' INTO table hive_internal_table;

Display t he content of the table

hive> select * from hive_internal_table;

Drop the internal table
hive> DROP TABLE hive_internal_table;

when we drop the table, its data is also deleted from Hive


External Tables:
- Data will be available in HDFS. The table is going to create on HDFS data.
- Its schema on Data.
- When we drop the table, we drop the schema only. The data still remains in HDFS.

When to Choose External Table:
- If the processing data is available in HDFS.
- When the files are being used outside of HIVE.

Create External Table:
hive> CREATE EXTERNAL TABLE hive_external_table (id INT, Name string)
Row format delimited
Fields terminated by '\t'
LOCATION '/user/arora_hive/hive_external_table';

If we do not specify the location at the time of table creation, we can load the data manually-

hive> LOAD DATA INPATH '/user/arora_hive/data.txt' INTO TABLE hive_external_table;

Display the content of the table
hive> select * from hive_external_table;

To drop the external table-
hive> DROP TABLE hive_external_table;

--=========Difference in Internal and External Tables
Feature Internal External

Schema Data on Schema Schema on Data
Storage Location /usr/hive/warehouse HDFS LOCATION
Data Availability Within Local File System Within HDFS



What is the difference in Order by and Sort by? Distributed by and cluster by?