Redis Special Data Types Bitmap, HyperLogLog, and GEO: Usage and Use Cases

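1. Overview

The previous article covered how to use the five basic Redis data types and where they fit. This article looks at three special Redis data types (bitmap, HyperLogLog, and geo), which can noticeably improve processing and storage efficiency in specific scenarios.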

2. Data types in detail

2.1 bitmap

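A bitmap is a map made up of bits. In computer science a bit has only two states, 0 or 1, and is usually used to represent a boolean true or false. A Redis bitmap is implemented on top of the string type: each byte (8 bits) of the string holds 8 separate bits, which is how the string works as a bit array.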
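To make the byte layout concrete, here is a minimal in-memory sketch (no Redis connection involved) of how an offset maps to a byte and to a bit inside that byte. It assumes the Redis convention that offset 0 is the most significant bit of the first byte. The class and method names are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

/** In-memory sketch of a bit array stored in bytes; not a Redis API. */
class BitmapLayoutSketch {
    private byte[] bytes = new byte[0];

    /** Sets the bit at offset and returns its previous value, as setbit does. */
    boolean setBit(int offset, boolean value) {
        int byteIndex = offset / 8;
        int mask = 0x80 >> (offset % 8); // offset 0 is the most significant bit of its byte
        if (byteIndex >= bytes.length) {
            bytes = Arrays.copyOf(bytes, byteIndex + 1); // grow with zero-filled bytes
        }
        boolean previous = (bytes[byteIndex] & mask) != 0;
        if (value) {
            bytes[byteIndex] |= mask;
        } else {
            bytes[byteIndex] &= ~mask;
        }
        return previous;
    }

    /** Returns the bit at offset; bits beyond the current length read as 0. */
    boolean getBit(int offset) {
        int byteIndex = offset / 8;
        if (byteIndex >= bytes.length) {
            return false;
        }
        return (bytes[byteIndex] & (0x80 >> (offset % 8))) != 0;
    }

    /** Number of bytes currently needed to hold the highest offset written so far. */
    int lengthInBytes() {
        return bytes.length;
    }

    public static void main(String[] args) {
        BitmapLayoutSketch sign = new BitmapLayoutSketch();
        sign.setBit(5, true);
        sign.setBit(3, true);
        System.out.println(sign.getBit(3) + " " + sign.lengthInBytes());
    }
}
```

Offsets 0 through 7 all land in the first byte, so the two bits set in main still occupy a single byte.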

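2.1.1 Common bitmap commands

Command	Description
setbit key offset value	Sets the bit at offset to value (0 or 1) and returns the previous value
getbit key offset	Returns the bit at offset
bitcount key start end	Counts the bits set to 1 in the given range; start and end are byte indexes by default
bitpos key bit start end	Returns the position of the first bit equal to bit (0 or 1) within the given byte range
bitop operation destkey key1 key2 …	Performs a bitwise operation (and, or, xor, not) on the source keys and stores the result in destkey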

2.1.2 bitmap commands in practice

> setbit sign 5 1
0
> setbit sign 3 1
0
> getbit sign 0
0
> getbit sign 3
1
> bitcount sign 0 5
2
> bitpos sign 1 0 5
3
> getbit sign 3
1
> setbit sign1 3 1
0
> setbit sign1 4 1
0
> bitop and sign2 sign sign1
1
> getbit sign2 3
1
> getbit sign2 4
0
> getbit sign2 5
0
> bitop or sign3 sign sign1
1
> getbit sign3 4
1
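The and/or results in the session above can be reproduced locally as a sanity check. The sketch below uses java.util.BitSet as a stand-in for the Redis strings, which is an assumption made for illustration only: BitSet stores its bits differently from Redis, but which offsets are set after an AND or OR is the same. The class and helper names are hypothetical.

```java
import java.util.BitSet;

/** Mirrors the bitop and/or steps of the session with java.util.BitSet. */
class BitopSketch {
    /** Builds a BitSet with the given offsets set to 1. */
    static BitSet bits(int... offsets) {
        BitSet set = new BitSet();
        for (int offset : offsets) {
            set.set(offset);
        }
        return set;
    }

    /** Bitwise AND into a new BitSet, leaving both inputs untouched. */
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    /** Bitwise OR into a new BitSet, leaving both inputs untouched. */
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet sign = bits(3, 5);   // setbit sign 5 1, setbit sign 3 1
        BitSet sign1 = bits(3, 4);  // setbit sign1 3 1, setbit sign1 4 1
        System.out.println("count=" + sign.cardinality());     // like bitcount
        System.out.println("first=" + sign.nextSetBit(0));     // like bitpos sign 1
        System.out.println("and=" + and(sign, sign1));
        System.out.println("or=" + or(sign, sign1));
    }
}
```

Only offset 3 is set in both inputs, which is why getbit sign2 4 and getbit sign2 5 return 0 while getbit sign3 4 returns 1.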

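2.1.3 bitmap use cases

1. Active user statistics
For example, recording which users visited the site or logged in on a given day.
Use case: use the date as the key and the user id as the offset; if the user was active that day, set the bit to 1.
2. User behavior statistics
For example, tracking actions such as commenting on or liking an article.
Use case: use the article id as the key and the user id as the offset; if the user commented on or liked the article, set the bit to 1.
3. Implementing a Bloom filter
A Bloom filter is a space-efficient probabilistic data structure for testing whether an element is in a set. It is widely used in big data processing, cache penetration protection, spam filtering, and similar scenarios. A Bloom filter can report false positives (it may say an element is present when it is not, but never the reverse), and in exchange it answers queries efficiently with very little memory.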
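How large the bit array should be and how many hash functions to use follow from the standard Bloom filter formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2, where n is the expected number of elements and p the target false positive rate. The sketch below is a small calculator for those two formulas; the class and method names are hypothetical.

```java
/** Sizing helper based on the standard Bloom filter formulas. */
class BloomSizingSketch {
    /** Bits needed for expectedItems elements at the given false positive rate. */
    static long optimalBits(long expectedItems, double falsePositiveRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedItems * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    /** Number of hash functions that minimizes false positives for that bit count. */
    static int optimalHashFunctions(long expectedItems, long bits) {
        long rounded = Math.round((double) bits / expectedItems * Math.log(2));
        return (int) Math.max(1, rounded);
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        long m = optimalBits(n, 0.01);
        int k = optimalHashFunctions(n, m);
        System.out.println("bits=" + m + " (~" + m / 8 / 1024 + " KB), hashFunctions=" + k);
    }
}
```

For one million elements at a 1% false positive rate this comes to a little under 9.6 million bits, a bit more than 1 MB, with 7 hash functions.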

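2.2 HyperLogLog (cardinality estimation)

2.2.1 Common HyperLogLog commands

Redis introduced the HyperLogLog structure in version 2.8.9. Its advantage for counting is that, however many elements are added and however large they are, the space needed to compute the cardinality stays fixed and small.
HyperLogLog is a well-known probabilistic cardinality estimation algorithm and is not unique to Redis. Redis exposes it through a few general-purpose commands and optimizes its storage: while the count is small it uses a sparse representation that takes very little space, and only once the sparse representation grows past a threshold is it converted, in one step, to the dense representation, which takes 12 KB.
To save memory, a probabilistic cardinality algorithm does not store the elements themselves; it estimates the cardinality (the number of distinct elements in the set) statistically. A HyperLogLog count is therefore approximate rather than exact, with a standard error of 0.81%.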
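The 12 KB and 0.81% figures can be checked with a little arithmetic, assuming the dense representation uses 16384 registers of 6 bits each and the usual HyperLogLog standard error of about 1.04 / sqrt(m) for m registers. The class name and constants below are written for illustration only.

```java
/** Back-of-the-envelope numbers for a dense HyperLogLog. */
class HllNumbersSketch {
    static final int REGISTERS = 16_384;      // assumption: 2^14 registers
    static final int BITS_PER_REGISTER = 6;   // assumption: 6 bits per register

    /** Size of the register array in bytes. */
    static int denseBytes() {
        return REGISTERS * BITS_PER_REGISTER / 8;
    }

    /** Approximate standard error of the estimate, as a fraction. */
    static double standardError() {
        return 1.04 / Math.sqrt(REGISTERS);
    }

    public static void main(String[] args) {
        System.out.println(denseBytes() + " bytes, standard error "
                + standardError() * 100 + "%");
    }
}
```

16384 registers times 6 bits is 98304 bits, or 12288 bytes (12 KB), and 1.04 / 128 is 0.008125, which rounds to the 0.81% quoted above.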

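Command	Description
pfadd key element1 element2 …	Adds one or more elements to a HyperLogLog
pfcount key1 key2 …	Returns the approximate number of distinct elements in one HyperLogLog, or in the union of several
pfmerge destkey sourcekey1 sourcekey2 …	Merges several HyperLogLogs into destkey, which then reports the distinct count of the union of the sources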

2.2.2 HyperLogLog commands in practice

> pfadd chars a b c d e
1
> pfadd chars f g
1
> pfcount chars
7
> pfadd nums 1 2 3
1
> pfcount chars nums
10
> pfmerge destination chars nums
OK
> pfcount destination
10
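Both pfcount with several keys and pfmerge count the union of their inputs, so an element that appears in more than one source is counted once. The sketch below shows that semantics with exact java.util sets standing in for the HyperLogLogs, which is a simplification: the real structure returns an estimate and never stores the elements. The names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Exact-set stand-in for the union semantics of pfcount and pfmerge. */
class UnionCountSketch {
    /** Number of distinct elements across all the given sets. */
    static long unionCount(List<Set<String>> sets) {
        Set<String> union = new HashSet<>();
        for (Set<String> set : sets) {
            union.addAll(set);
        }
        return union.size();
    }

    public static void main(String[] args) {
        Set<String> chars = Set.of("a", "b", "c", "d", "e", "f", "g");
        Set<String> nums = Set.of("1", "2", "3");
        System.out.println(unionCount(List.of(chars, nums)));
    }
}
```

In the session above the two keys share no elements, which is why the merged count equals 7 + 3. With overlapping sources the result would be smaller than the sum of the individual counts.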

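2.2.3 HyperLogLog use cases

1. Active user statistics
For example, computing a site's daily, 7-day, and monthly active users.
Use case: build the key from a keyword plus the date (daylive+20251217) and add each active user's userid as an element. To get the active user count for a given day, run pfcount daylive+20251217. On the first day of each month, run pfmerge to merge all of the previous month's daily keys into one HyperLogLog (monthlive_202512), then run pfcount monthlive_202512 to get the monthly active users for December.
2. Counting distinct registration IPs, online users, or the distinct terms each user searches per day can all be done with HyperLogLog on the same principle.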
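The monthly merge needs the list of daily keys for the month. A minimal sketch of generating them with java.time follows; the key formats are taken from the examples above, while the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Builds the daily and monthly key names used in the active user example. */
class ActiveUserKeysSketch {
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;      // yyyyMMdd
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("uuuuMM");

    static String dailyKey(LocalDate day) {
        return "daylive+" + day.format(DAY);
    }

    static String monthlyKey(YearMonth month) {
        return "monthlive_" + month.format(MONTH);
    }

    /** One daily key per calendar day of the month, in date order. */
    static List<String> dailyKeysOf(YearMonth month) {
        List<String> keys = new ArrayList<>();
        for (int day = 1; day <= month.lengthOfMonth(); day++) {
            keys.add(dailyKey(month.atDay(day)));
        }
        return keys;
    }

    public static void main(String[] args) {
        YearMonth december = YearMonth.of(2025, 12);
        System.out.println(monthlyKey(december) + " <- " + dailyKeysOf(december));
    }
}
```

The resulting list can be passed as the source keys of pfmerge, with the monthly key as the destination.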

2.3 geo

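Redis geospatial support is built on sorted sets and provides an efficient way to store geospatial information, such as coordinates (longitude and latitude) together with the member they belong to. With geo it is easy to compute the distance between two positions or to fetch the elements near a given position.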
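A position can live in a sorted set because longitude and latitude are folded into a single number that serves as the score: each coordinate is quantized and the two bit strings are interleaved into a 52-bit geohash integer. The sketch below illustrates the idea only. The coordinate limits, the 26 bits per coordinate, and the bit order are assumptions made for this example and may not match Redis bit for bit; the class and method names are hypothetical.

```java
/** Illustrative encoding of longitude/latitude into one interleaved integer. */
class GeoScoreSketch {
    static final double LON_MIN = -180.0, LON_MAX = 180.0;
    static final double LAT_MIN = -85.05112878, LAT_MAX = 85.05112878; // assumed limits
    static final int STEP = 26; // assumed bits per coordinate, 52 bits in total

    /** Maps a value in [min, max) to an integer in [0, 2^STEP). */
    static long quantize(double value, double min, double max) {
        double offset = (value - min) / (max - min);
        return (long) (offset * (1L << STEP));
    }

    /** Interleaves two STEP-bit numbers: 'even' fills bits 0, 2, 4..., 'odd' fills bits 1, 3, 5... */
    static long interleave(long even, long odd) {
        long result = 0;
        for (int i = 0; i < STEP; i++) {
            result |= ((even >> i) & 1L) << (2 * i);
            result |= ((odd >> i) & 1L) << (2 * i + 1);
        }
        return result;
    }

    /** Combines both coordinates into one number usable as a sorted-set score. */
    static long encode(double longitude, double latitude) {
        return interleave(quantize(latitude, LAT_MIN, LAT_MAX),
                          quantize(longitude, LON_MIN, LON_MAX));
    }

    public static void main(String[] args) {
        System.out.println(encode(116.33, 39.89));
    }
}
```

Because the high bits of both coordinates end up in the high bits of the result, points in the same area tend to get nearby scores, which is what lets range queries on the sorted set narrow down the candidates for a radius search.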

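2.3.1 Common commands

Command	Description
geoadd key longitude latitude member …	Adds one or more members with their coordinates (longitude first, then latitude) to the given sorted set
geopos key member1 member2 …	Returns the longitude and latitude of the given members
geodist key member1 member2 m/km/ft/mi	Returns the distance between two members; m/km/ft/mi selects the unit of the result: meters (m), kilometers (km), feet (ft), or miles (mi)
georadius key longitude latitude radius m/km/ft/mi	Returns all members whose distance from the given longitude/latitude center does not exceed radius; supports options such as asc (nearest first), desc (farthest first), and count (limit)
georadiusbymember key member radius m/km/ft/mi	Same as georadius, except that the center is the position of an existing member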

2.3.2 Commands in practice

> geoadd location 116.33 39.89 user1 116.34 39.90 user2 116.35 39.88 user3 119.35 41.22 user4
3
> geopos location user1
116.3299986720085144
39.89000061669732844
> geodist location user1 user2 km
1.4018
> geodist location user1 user4 km
294.9606
> georadius location 116.33 39.87 3 km
user3
user1
> georadius location 116.33 39.87 5 km
user3
user1
user2
> georadiusbymember location user1 3 km
user3
user1
user2
> georadiusbymember location user1 2 km
user1
user2
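The geodist results above can be cross-checked with the haversine great-circle formula. The sketch below is a plain Java version; the earth radius constant is an assumption chosen to be close to what Redis uses, and because Redis stores coordinates with a small rounding error (visible in the geopos output), the numbers agree only approximately. The class and method names are hypothetical.

```java
/** Great-circle distance between two longitude/latitude points (haversine formula). */
class HaversineSketch {
    static final double EARTH_RADIUS_M = 6372797.560856; // assumed radius in meters

    static double distanceMeters(double lon1, double lat1, double lon2, double lat2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dPhi / 2), 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.pow(Math.sin(dLambda / 2), 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // user1 and user2 from the geoadd call above
        double meters = distanceMeters(116.33, 39.89, 116.34, 39.90);
        System.out.printf("%.1f m%n", meters);
    }
}
```

For user1 and user2 this gives roughly 1.4 km, in line with the geodist output.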

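2.3.3 Use cases

1. Scenarios that need to manage geographic positions
For example, finding nearby people.
Use case: use georadius to fetch the people within a given distance of the current user, as in the "people nearby" features of QQ and WeChat.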

3. Code implementation

package com.eckey.lab.service.util;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.data.geo.Circle;
import org.springframework.data.geo.Distance;
import org.springframework.data.geo.GeoResults;
import org.springframework.data.geo.Metric;
import org.springframework.data.geo.Point;
import org.springframework.data.redis.connection.RedisGeoCommands;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;
import javax.annotation.Resource;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

@Component
public class RedisUtil {
    private static final Logger log = LoggerFactory.getLogger(RedisUtil.class);
    @Resource
    private RedisTemplate<String, Object> redisTemplate;
    // Template whose keys and values are both strings; used by all methods below
    @Resource(name = "strredistemplate")
    private RedisTemplate<String, String> stringRedisTemplate;

    /**
     * Computes the bit positions of a value for the Bloom filter.
     *
     * @param value         the element value
     * @param hashFunctions number of hash functions
     * @param size          size of the bit array
     */
    private List<Long> getHashPositions(String value, int hashFunctions, int size) {
        List<Long> positions = new ArrayList<>(hashFunctions);
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] bytes = md.digest(value.getBytes(StandardCharsets.UTF_8));
            // Derive several hash positions from the same MD5 digest, 4 bytes per position.
            // The digest is 16 bytes long, so positions start repeating once hashFunctions exceeds 4.
            for (int i = 0; i < hashFunctions; i++) {
                long hashValue = 0;
                for (int j = i * 4; j < i * 4 + 4; j++) {
                    hashValue <<= 8;
                    int index = j % bytes.length;
                    hashValue |= (bytes[index] & 0xff);
                }
                // hashValue holds an unsigned 32-bit number, so the remainder is never negative
                positions.add(Math.abs(hashValue % size));
            }
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("MD5 algorithm not found", e);
        }
        return positions;
    }

    // Adds a value to the Bloom filter by setting every one of its hash positions to 1
    public void bloomFilterAdd(String key, String value, int hashFunctions, int size) {
        for (Long position : getHashPositions(value, hashFunctions, size)) {
            stringRedisTemplate.opsForValue().setBit(key, position, true);
        }
    }

    // Returns false if the value is definitely absent, true if it is possibly present
    public boolean bloomFilterContains(String key, String value, int hashFunctions, int size) {
        for (Long position : getHashPositions(value, hashFunctions, size)) {
            if (Boolean.FALSE.equals(stringRedisTemplate.opsForValue().getBit(key, position))) {
                return false;
            }
        }
        return true;
    }

    // pfadd with a single element
    public void hyperAdd(String key, String value) {
        stringRedisTemplate.opsForHyperLogLog().add(key, value);
    }

    // pfadd with several elements
    public void hyperAdd(String key, String... values) {
        stringRedisTemplate.opsForHyperLogLog().add(key, values);
    }

    // pfcount: approximate number of distinct elements
    public Long hyperSize(String key) {
        return stringRedisTemplate.opsForHyperLogLog().size(key);
    }

    public void hyperDel(String key) {
        stringRedisTemplate.opsForHyperLogLog().delete(key);
    }

    // pfmerge of two sources into destKey; returns the count of the merged result
    public Long hyperUnion(String destKey, String srcKey1, String srcKey2) {
        return stringRedisTemplate.opsForHyperLogLog().union(destKey, srcKey1, srcKey2);
    }

    // pfmerge of any number of sources into destKey
    public Long hyperUnion(String destKey, String... srcKeys) {
        return stringRedisTemplate.opsForHyperLogLog().union(destKey, srcKeys);
    }

    // geoadd; Point takes x = longitude and y = latitude, so lon goes first
    public Long geoAdd(String key, double lat, double lon, String member) {
        return stringRedisTemplate.opsForGeo().add(key, new Point(lon, lat), member);
    }

    // geopos
    public List<Point> geoGet(String key, String member) {
        return stringRedisTemplate.opsForGeo().position(key, member);
    }

    // geodist
    public Distance geoDistance(String key, String member1, String member2, Metric metric) {
        return stringRedisTemplate.opsForGeo().distance(key, member1, member2, metric);
    }

    // georadius around a coordinate, nearest first, with distance and coordinates included
    public GeoResults<RedisGeoCommands.GeoLocation<String>> geoRadius(String key, double longitude, double latitude, double radius, RedisGeoCommands.DistanceUnit unit) {
        Point point = new Point(longitude, latitude);
        Circle circle = new Circle(point, new Distance(radius, unit));
        RedisGeoCommands.GeoRadiusCommandArgs args = RedisGeoCommands.GeoRadiusCommandArgs.newGeoRadiusArgs().includeDistance().includeCoordinates().sortAscending();
        return stringRedisTemplate.opsForGeo().radius(key, circle, args);
    }

    // georadiusbymember around an existing member, nearest first
    public GeoResults<RedisGeoCommands.GeoLocation<String>> geoNearbyMember(String key, String member, double radius, RedisGeoCommands.DistanceUnit unit) {
        RedisGeoCommands.GeoRadiusCommandArgs args = RedisGeoCommands.GeoRadiusCommandArgs.newGeoRadiusArgs().includeDistance().includeCoordinates().sortAscending();
        return stringRedisTemplate.opsForGeo().radius(key, member, new Distance(radius, unit), args);
    }
}

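4. Summary

1. A bitmap is essentially an array of binary digits (0 and 1): a single bit represents the value or state of one element, and the bit's offset identifies the element. One byte holds 8 bits, so a bitmap stores eight flags per byte and saves a great deal of space.
2. A HyperLogLog occupies a fixed amount of space in Redis (at most 12 KB) however many elements are added, which suits counting the distinct elements of very large data sets. It does not store the elements themselves, so it cannot list them or test membership, and it returns only an approximation, so it is a poor fit where exact counts are required. A bitmap gives exact per-element state and works well for large, densely numbered id ranges, but it is inefficient for small or sparse data because the underlying string grows to cover the highest offset.
3. The geospatial index stores geographic positions and suits distance-based queries, statistics, and similar scenarios.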
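The space trade-off between a bitmap and a HyperLogLog can be put into numbers. A bitmap needs one bit per possible offset up to the highest one used, while a dense HyperLogLog stays at about 12 KB. The sketch below does that arithmetic; the class and method names are hypothetical.

```java
/** Rough memory comparison between a bitmap and a dense HyperLogLog. */
class MemoryEstimateSketch {
    static final long HLL_DENSE_BYTES = 12 * 1024; // the 12 KB dense size mentioned above

    /** Bytes a bitmap needs so that maxOffset is addressable (one bit per offset). */
    static long bitmapBytes(long maxOffset) {
        return maxOffset / 8 + 1;
    }

    public static void main(String[] args) {
        long maxUserId = 99_999_999L; // user ids 0 .. 99,999,999
        System.out.println("bitmap: " + bitmapBytes(maxUserId) + " bytes per day");
        System.out.println("hyperloglog: " + HLL_DENSE_BYTES + " bytes per day");
    }
}
```

With 100 million user ids a daily bitmap takes 12.5 million bytes even if only a handful of users were active, whereas the HyperLogLog stays at 12 KB but can only report an approximate count, not which users were active.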

