Installation and Deployment
We use the CDH build of Sqoop here; the download page is:
http://archive.cloudera.com/cdh5/cdh/5/
- Download
wget http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.7.0.tar.gz
- Extract
tar -zxf sqoop-1.4.6-cdh5.7.0.tar.gz -C ../app/
- Rename the directory
mv sqoop-1.4.6-cdh5.7.0/ sqoop
Configure the Environment Variables
Append the following to ~/.bash_profile, then reload it:
export SQOOP_HOME=/home/hadoop/app/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
source ~/.bash_profile
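To confirm the variables took effect, a quick sanity check (plain shell, nothing Sqoop-specific):
echo $SQOOP_HOME      # should print /home/hadoop/app/sqoop
which sqoop           # should resolve to /home/hadoop/app/sqoop/bin/sqoop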
Modify the Configuration
- Enter the conf directory
cd $SQOOP_HOME/conf
- Copy the configuration template and rename it
cp sqoop-env-template.sh sqoop-env.sh
- Edit the environment settings
vi sqoop-env.sh
- Uncomment the Hadoop and Hive entries and point them at the correct directories
------------
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/app/hadoop
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/app/hadoop
#set the path to where bin/hbase is available
#export HBASE_HOME=
#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/app/hive
#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
------------
Add the JDBC Driver JAR
Download it from the Maven repository: https://mvnrepository.com/artifact/mysql/mysql-connector-java/5.1.27
- Enter the lib directory
cd $SQOOP_HOME/lib
wget http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.27/mysql-connector-java-5.1.27.jar
Add the JSON JAR
- Download
wget http://www.java2s.com/Code/JarDownload/java-json/java-json.jar.zip
- Extract
unzip java-json.jar.zip
Add hive-exec-*.jar
cp $HIVE_HOME/lib/hive-exec-*.jar $SQOOP_HOME/lib
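If you want to confirm that all three jars landed in place, an optional one-liner check:
ls $SQOOP_HOME/lib | grep -E 'mysql-connector|java-json|hive-exec'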
Run sqoop version to check that the installation succeeded
> sqoop version
Warning: /home/hadoop/app/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/app/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/app/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/app/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/03/19 03:09:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
Sqoop 1.4.6-cdh5.7.0
git commit id
Compiled by jenkins on Wed Mar 23 11:30:51 PDT 2016
Basic Operations
List the MySQL databases
sqoop list-databases \
--connect jdbc:mysql://localhost:3306 \
--username root --password 123456
List the tables in a MySQL database
sqoop list-tables \
--connect jdbc:mysql://localhost:3306/hiveDB \
--username root --password 123456
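Note that --password exposes the password in your shell history and process list; Sqoop also accepts -P, which prompts for it at the console instead. The same command with the prompt:
sqoop list-tables \
--connect jdbc:mysql://localhost:3306/hiveDB \
--username root -P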
Export Data from MySQL to HDFS
The Sqoop command
sqoop import --connect jdbc:mysql://hadoop001:3306/hiveDB \
--username root --password 123456 \
--target-dir /user/hadoop/TBLS \
--table TBLS -m 1
Explanation:
--table: the source table name
--target-dir: the target HDFS path
-m: the number of map tasks to launch
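sqoop import can also pull over just part of a table with --columns and --where. A minimal sketch against the same TBLS table (the target directory here is just illustrative; the column names are from the Hive metastore schema):
sqoop import --connect jdbc:mysql://hadoop001:3306/hiveDB \
--username root --password 123456 \
--table TBLS \
--columns "TBL_ID,TBL_NAME,TBL_TYPE" \
--where "TBL_TYPE = 'EXTERNAL_TABLE'" \
--target-dir /user/hadoop/TBLS_external -m 1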
After the job finishes, we can take a look at the exported data.
- Check the /user/hadoop directory: you can see that a TBLS directory has been created
[hadoop@hadoop001 app]$ hadoop fs -ls /user/hadoop/
19/03/19 03:31:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2019-03-19 03:28 /user/hadoop/TBLS
- View the data itself
[hadoop@hadoop001 app]$ hadoop fs -cat /user/hadoop/TBLS/part-m-00000
19/03/19 03:29:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1,1551466380,1,0,hadoop,0,1,d6_wc,MANAGED_TABLE,null,null
6,1551617903,6,0,hadoop,0,6,emp,MANAGED_TABLE,null,null
11,1551762781,1,0,hadoop,0,11,test_wc,MANAGED_TABLE,null,null
24,1551959859,11,0,hadoop,0,24,emp_ex,EXTERNAL_TABLE,null,null
25,1551959996,11,0,hadoop,0,25,emp,MANAGED_TABLE,null,null
26,1552137374,11,0,hadoop,0,26,test,MANAGED_TABLE,null,null
27,1552137466,11,0,hadoop,0,27,test1,MANAGED_TABLE,null,null
31,1552929882,11,0,hadoop,0,31,order_partition,MANAGED_TABLE,null,null
32,1552930859,11,0,hadoop,0,34,emp_static_partition,MANAGED_TABLE,null,null
36,1552987444,11,0,hadoop,0,42,emp_partition,MANAGED_TABLE,null,null
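One thing to watch: re-running the same import fails if the target directory already exists on HDFS. Sqoop 1.4.x provides --delete-target-dir to remove it before importing; a sketch of the re-runnable version:
sqoop import --connect jdbc:mysql://hadoop001:3306/hiveDB \
--username root --password 123456 \
--target-dir /user/hadoop/TBLS \
--delete-target-dir \
--table TBLS -m 1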
Export Data from MySQL to Hive
When the table does not exist in Hive yet
sqoop import \
--connect jdbc:mysql://localhost:3306/hiveDB \
--username root --password 123456 \
--table DBS \
--hive-import \
--hive-database myhive \
--hive-table hiveDBS \
--fields-terminated-by '\t' \
--lines-terminated-by '\n' \
-m 2
When the table already exists in Hive
- First create a table hive_table in Hive
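With the table pre-created, the same --hive-import appends to it, and --hive-overwrite replaces its contents instead. A minimal sketch, assuming hive_table was created in the myhive database with a layout matching DBS:
sqoop import \
--connect jdbc:mysql://localhost:3306/hiveDB \
--username root --password 123456 \
--table DBS \
--hive-import \
--hive-database myhive \
--hive-table hive_table \
--fields-terminated-by '\t' \
-m 2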
Export Data from HDFS to MySQL
Create a new table in Hive
create table city_info(
city_id int,
city_name string,
area string
)row format delimited fields terminated by '\t';
Sample data
city_info.txt
1 BEIJING NC
2 SHANGHAI EC
3 NANJING EC
4 GUANGZHOU SC
5 SANYA SC
Load the data
load data local inpath '/home/hadoop/data/city_info.txt' into table city_info;
Create a table in MySQL
CREATE TABLE `city_info` (
`city_id` int(11) DEFAULT NULL,
`city_name` varchar(255) DEFAULT NULL,
`area` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Export the data from HDFS into MySQL
sqoop export \
--connect jdbc:mysql://hadoop001:3306/test \
--username root \
--password 123456 \
--table city_info \
--export-dir /user/hive/warehouse/myhive.db/city_info \
--input-fields-terminated-by '\t'
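To verify the export, query the table back on the MySQL side (assuming the mysql client is available on this host):
mysql -uroot -p123456 -e 'select * from test.city_info;'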
Export Data from Hive to MySQL
- I did not manage to get this working in my tests; if you know how, please contact me. Many thanks!
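For reference, two things worth trying. Since a managed Hive table is just files under the warehouse directory, the sqoop export in the previous section already effectively exports Hive data. Alternatively, Sqoop's HCatalog integration can address the Hive table by name; a sketch, untested here, which also requires HCatalog to be set up (note the $HCAT_HOME warning in the sqoop version output above):
sqoop export \
--connect jdbc:mysql://hadoop001:3306/test \
--username root --password 123456 \
--table city_info \
--hcatalog-database myhive \
--hcatalog-table city_info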