I am newbie in Hive and Big Data. Writing some basic hive commands for quick reference-
--================Show Databases===============
hive> show databases;
--================Create Database==============
hive> create database hive_db;
--================Drop Database================
hive> drop database hive_db;
--================Create Table=================
hive> create table emp_hive (empid int, empname string);
hive> show tables;
OK
emp_hive Time taken:" 0.115 seconds Fetched: 2 row(s)
--================Rename A Table===============
hive> ALTER table emp_hive RENAME to emp_hive_New;
hive> show tables:
OK
emp_hive_New
--================Drop table======================
hive> drop table emp_hive_New;
--================Table types and usage=================
Hive has two types of structures like Internal and External tables depending on loading and design of schema in Hive
Internal Tables:
- First we create the table then load the data.
- This is "data on schema"
- By dropping this table, both data and schema will be removed
- the stored location of this table will be at /user/hive/warehouse
When to choose Internal Table ->
- If the the processing data is available in local file system
- If we want Hive to manage the complete lifecycle of data including the deletion
Create Internal Table:
hive> CREATE TABLE hive_internal_table (id INT, Name STRING);
Row format delimited
Fields terminated by '\t';
Load the Data into internal table
hive> LOAD DATA INPATH '/user/arora_hive/data.txt ' INTO table hive_internal_table;
Display t he content of the table
hive> select * from hive_internal_table;
Drop the internal table
hive> DROP TABLE hive_internal_table;
when we drop the table, its data is also deleted from Hive
External Tables:
- Data will be available in HDFS. The table is going to create on HDFS data.
- Its schema on Data.
- When we drop the table, we drop the schema only. The data still remains in HDFS.
When to Choose External Table:
- If the processing data is available in HDFS.
- When the files are being used outside of HIVE.
Create External Table:
hive> CREATE EXTERNAL TABLE hive_external_table (id INT, Name string)
Row format delimited
Fields terminated by '\t'
LOCATION '/user/arora_hive/hive_external_table';
If we do not specify the location at the time of table creation, we can load the data manually-
hive> LOAD DATA INPATH '/user/arora_hive/data.txt' INTO TABLE hive_external_table;
Display the content of the table
hive> select * from hive_external_table;
To drop the external table-
hive> DROP TABLE hive_external_table;
--=========Difference in Internal and External Tables
Feature Internal External
Schema Data on Schema Schema on Data
Storage Location /usr/hive/warehouse HDFS LOCATION
Data Availability Within Local File System Within HDFS
What is the difference in Order by and Sort by? Distributed by and cluster by?