Published: 2018-01-30 16:26:29
1. What is HBase
HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase you can build large-scale structured storage clusters on inexpensive PC servers.
HBase is designed to store and process large data sets; more concretely, it can handle tables made up of many thousands of rows and columns using nothing more than ordinary commodity hardware.
HBase is an open-source implementation of Google's Bigtable, with several differences. For example: Bigtable uses GFS as its file storage system, while HBase uses Hadoop HDFS; Google runs MapReduce to process the massive data in Bigtable, and HBase likewise uses Hadoop MapReduce to process its data; Bigtable uses Chubby as its coordination service, while HBase uses ZooKeeper for the same purpose.
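To make "column-oriented" concrete, here is a minimal HBase shell sketch of the data model: every cell is addressed by a row key, a column family, a column qualifier and a timestamp. The table name user and the families info and contact are made up purely for illustration:
create 'user', 'info', 'contact'                        # a table with two column families
put 'user', 'row1', 'info:name', 'Tom'                  # cell addressed by row key + family:qualifier
put 'user', 'row1', 'contact:email', 'tom@example.com'
get 'user', 'row1'                                      # returns the newest version of each cell in the row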

The figure above shows the layers of the Hadoop ecosystem. HBase sits in the structured-storage layer: Hadoop HDFS gives HBase highly reliable underlying storage, Hadoop MapReduce gives it high-performance computation, and ZooKeeper provides stable service and a failover mechanism. On top of that, Pig and Hive provide high-level language support for HBase, which makes statistical processing of HBase data very simple, and Sqoop provides convenient RDBMS data import, making it easy to migrate data from traditional databases into HBase.

2. Comparison with traditional databases
1. Problems with traditional databases:
1) Once the data volume becomes very large, the data can no longer be stored.
2) There is no good backup mechanism.
3) The database slows down once the data reaches a certain size, and with very large volumes it essentially cannot cope.
2. Advantages of HBase:
1) Linear scalability: as the data volume grows, capacity can be added by adding nodes.
2) Data is stored on HDFS, so the backup mechanism is sound.
3) Data lookup is coordinated through ZooKeeper, so access is fast.

3. Roles in an HBase cluster
1. One or more master nodes: HMaster
2. Multiple slave nodes: HRegionServer
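Once the cluster built in the next section is running, this role split can be seen directly from the HBase shell's status command. A sketch of typical output (the exact wording varies by HBase version):
hbase(main):001:0> status
1 active master, 1 backup masters, 2 servers, 0 dead, 1.0000 average load
The active master is the HMaster, and the 2 servers are the HRegionServer nodes.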
Installing and Deploying HBase

Install HBase on machines that already have Hadoop and ZooKeeper installed. This guide uses the three machines hdp08, hdp09 and hdp10 as the example.
1. Install the JDK, Hadoop and ZooKeeper on hdp08, hdp09 and hdp10, and set the environment variables.
2. Download a stable HBase release from the Apache website (http://hbase.apache.org/). This guide uses hbase-1.1.12-bin.tar.gz as the example; upload hbase-1.1.12-bin.tar.gz to hdp08.
3. Extract hbase-1.1.12-bin.tar.gz into the /home/hadoop/apps directory:
[hadoop@hdp10 ~]$ tar zxvf hbase-1.1.12-bin.tar.gz -C apps
4. Rename the extracted directory to hbase:
[hadoop@hdp10 apps]$ mv hbase-1.1.12 hbase
5. Set the environment variables:
[root@hdp10 apps]# vi /etc/profile
JAVA_HOME=/opt/jdk1.8.0_121
ZOOKEEPER_HOME=/home/hadoop/apps/zookeeper
HBASE_HOME=/home/hadoop/apps/hbase
PATH=$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HBASE_HOME ZOOKEEPER_HOME HADOOP_HOME JAVA_HOME PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL
[root@hdp10 apps]# source /etc/profile
[hadoop@hdp10 apps]$ hbase version
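Before continuing, it is worth a quick sanity check that the shell actually picks up the new variables; the expected output below assumes the directory layout used in this guide:
[hadoop@hdp10 apps]$ echo $HBASE_HOME
/home/hadoop/apps/hbase
[hadoop@hdp10 apps]$ which hbase
/home/hadoop/apps/hbase/bin/hbase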
7. Edit $HBASE_HOME/conf/hbase-env.sh:
[hadoop@hdp08 conf]$ vi hbase-env.sh
The file ships with a long Apache license header and many commented-out defaults; only the settings below need attention. First, point HBase at the Java installation:
# The java implementation to use.
export JAVA_HOME=/opt/jdk1.8.0_121/
The stock HBASE_CLASSPATH example line is wrong for this setup; comment it out and replace it as follows:
# Extra Java CLASSPATH elements. Optional.
#export HBASE_CLASSPATH=/home/hadoop/hbase/conf
export JAVA_CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Keep the default GC option:
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
Finally, tell HBase not to manage its own instance of ZooKeeper, since the cluster runs an external ensemble:
# Tell HBase whether it should manage its own instance of ZooKeeper or not.
export HBASE_MANAGES_ZK=false
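Because HBASE_MANAGES_ZK=false, HBase will not start ZooKeeper itself, so the external ensemble on hdp08, hdp09 and hdp10 must already be running before HBase is started. A quick check on each node (assuming zkServer.sh is on the PATH via ZOOKEEPER_HOME; output will look roughly like this):
[hadoop@hdp08 ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper/bin/../conf/zoo.cfg
Mode: follower
One node should report Mode: leader and the other two Mode: follower.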
8. Edit hbase-site.xml:
[hadoop@hdp08 conf]$ vi hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hdp08:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hdp08:2181,hdp09:2181,hdp10:2181</value>
  </property>
</configuration>
Note: the value of hbase.zookeeper.quorum must list the machines that run ZooKeeper.
9. Edit regionservers:
[hadoop@hdp08 conf]$ vi regionservers
hdp09
hdp10
10. Copy the hbase directory from hdp08 to hdp09 and hdp10:
[hadoop@hdp08 apps]$ scp -r hbase hadoop@hdp09:/home/hadoop/apps
[hadoop@hdp08 apps]$ scp -r hbase hadoop@hdp10:/home/hadoop/apps
11. Start HBase:
[hadoop@hdp08 apps]$ start-hbase.sh
To start the daemons on the local machine only:
[hadoop@hdp08 bin]$ hbase-daemon.sh start master
[hadoop@hdp08 bin]$ hbase-daemon.sh start regionserver
12. Verify the startup: run jps on each Hadoop node to check the process status.
13. View the status page at http://192.168.195.138:16010/

Configuring Multiple HMasters

Create a new file named backup-masters under the $HBASE_HOME/conf/ directory and add the hostnames of the nodes that should act as backup masters, for example:
[hadoop@hdp08 conf]$ vi backup-masters
hdp09
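Step 12 only says to run jps; for reference, a typical process layout for this guide's role assignment is sketched below. The PIDs are illustrative, and the exact set of Hadoop daemons on each node depends on how the HDFS roles were distributed:
[hadoop@hdp08 ~]$ jps
2781 NameNode
3125 QuorumPeerMain
3412 HMaster
3598 Jps
[hadoop@hdp09 ~]$ jps
2430 DataNode
2655 QuorumPeerMain
2907 HRegionServer
3101 Jps
hdp10 should look like hdp09. Once backup-masters contains hdp09 and the cluster is restarted, an HMaster process will also appear on hdp09.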