SQL分页性能优化

JDBC的分页是通过offset+limit实现的,数据库是通过抓取所有数据并跳过前面n行来数据来实现offset功能的,越到后面跳过的数据越多offset越耗时:

1
2
3
4
5
6
7
CREATE TABLE players AS
SELECT generate_series(1, 10000000) AS id,
uuid_in(md5(random()::TEXT || clock_timestamp()::TEXT)::CSTRING) AS username,
floor(random() * 100) AS score;

CREATE UNIQUE INDEX players_id_uindex
ON players (id);

OFFSET + LIMIT:

1
2
3
4
5
SELECT * FROM players ORDER BY id OFFSET 10 LIMIT 10; // Execution Time: 0.034 ms
SELECT * FROM players ORDER BY id OFFSET 500000 LIMIT 10; // Execution Time: 115.706 ms
SELECT * FROM players ORDER BY id OFFSET 1000000 LIMIT 10; // Execution Time: 235.941 ms
SELECT * FROM players ORDER BY id OFFSET 5000000 LIMIT 10; // Execution Time: 1212.642 ms
SELECT * FROM players ORDER BY id OFFSET 9000000 LIMIT 10; // Execution Time: 2181.703 ms

解决办法就是用keyset pagination,用表里面一些唯一的列用来分页:

1
2
3
4
5
SELECT * FROM players WHERE id > 10 ORDER BY id LIMIT 10; // Execution Time: 0.033 ms
SELECT * FROM players WHERE id > 500000 ORDER BY id LIMIT 10; // Execution Time: 0.043 ms
SELECT * FROM players WHERE id > 1000000 ORDER BY id LIMIT 10; // Execution Time: 0.041 ms
SELECT * FROM players WHERE id > 5000000 ORDER BY id LIMIT 10; // Execution Time: 0.037 ms
SELECT * FROM players WHERE id > 9000000 ORDER BY id LIMIT 10; // Execution Time: 0.069 ms

实际情况中可能会以score排序:

创建索引

1
CREATE INDEX players_score_username_index ON players (score DESC, username DESC);

OFFSET + LIMIT

1
SELECT * FROM players ORDER BY score DESC, username DESC OFFSET 500000 LIMIT 10; //Execution Time: 1508.534 ms

Keyset pagination

1
2
3
4
5
6
7
SELECT * FROM players WHERE (score, username) < (94, 'fdf62b1a-5b4e-c561-38ad-74ac4f099a46') ORDER BY score DESC, username DESC LIMIT 10; //Execution Time: 0.077 ms


(score, username) < (94, 'fdf62b1a-5b4e-c561-38ad-74ac4f099a46')

// 效果等同于
score < 94 OR (score = 94 AND username < 'fdf62b1a-5b4e-c561-38ad-74ac4f099a46')

进阶

找出分页边界,这里的关键点是你要找出一列或者多列组成的唯一的列,如果你只按score排序的话,score会有重复的值,会导致排序结果不一致。

1
2
3
4
5
6
7
8
9
10
SELECT p.id,
p.username,
p.score,
CASE row_number()
OVER (ORDER BY p.score DESC , p.username DESC ) % 10
WHEN 0 THEN 1
ELSE 0
END page_boundary
FROM players AS p
ORDER BY p.score DESC, p.username DESC

查询结果:

id username score page_boundary
6603631 ffffa267-7102-d2e9-0316-572c0044ca4f 99 0
3647464 ffff44d0-f54a-9139-afb0-f3d188c79a77 99 0
422584 ffff378c-b897-2498-8133-9d88183e52ec 99 0
6213135 fffe9b37-c685-4d5f-7781-31489e3b04ea 99 0
7930940 fffbde40-2009-8faf-4423-d28116f8dfee 99 0
1078914 fffb2654-23d1-e7f4-cf83-d4298b3c7208 99 0
9984059 fffb0de3-366c-5d42-ce61-9f7ee197707b 99 0
798388 fffac15d-a418-83eb-9597-487aa0a9136b 99 0
9247069 fff8cea2-378d-7bb8-1954-3bc6e6971eee 99 0
1711860 fff79537-86b8-5bd1-a474-717f151e0448 99 1
4319563 fff786e7-75b4-5cd7-09c1-eb4dc690dbbc 99 0
6021293 fff774fb-cf17-d02e-96b0-15555cb36b39 99 0
987629 fff756f7-e76b-8e8b-9941-c39061dac542 99 0
149766 fff75386-890f-af96-1c4c-64df0f6f4546 99 0
5546133 fff6ebcb-f3fb-d4b3-6b33-b56c718d44e4 99 0
5167740 fff6ce5a-b4c1-5363-554b-a285b6917054 99 0
5931868 fff5d7a6-1b0d-cf87-e0df-2440a53a7e05 99 0
4577984 fff55361-9b07-dfc6-9d1a-5fc08cdec7a6 99 0
4013672 fff4c37a-7ddf-6042-6b8f-eb9d7f8ea691 99 0
7863996 fff47c1f-18d8-9980-72fb-e51213ff0f17 99 1

找出每一页的最后一条数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
SELECT p.id,
p.username,
p.score,
page_boundary,
row_number()
OVER (ORDER BY p.score DESC , p.username DESC) + 1 page_number
FROM (
SELECT p.id,
p.username,
p.score,
CASE row_number()
OVER (ORDER BY p.score DESC , p.username DESC ) % 10
WHEN 0 THEN 1
ELSE 0
END page_boundary
FROM players AS p
ORDER BY p.score DESC, p.username DESC
) AS p
WHERE p.page_boundary = 1;

查询结果:

id username score page_boundary page_number
1711860 fff79537-86b8-5bd1-a474-717f151e0448 99 1 2
7863996 fff47c1f-18d8-9980-72fb-e51213ff0f17 99 1 3
7016430 ffe9a2c3-f257-24f2-e6f1-f7219c3581ea 99 1 4
994965 ffe61629-a620-7fcb-af3f-4c2c15821b77 99 1 5
9267095 ffe1b897-24ac-61d0-a5c2-9103a43b4e77 99 1 6
936851 ffda271d-5c69-1a4b-061f-077d75b77f49 99 1 7
9772528 ffd521da-c0bf-9909-77a1-9bce8e2c39a6 99 1 8
3913178 ffccfdeb-5608-c02e-6dd8-fe372245ee2b 99 1 9
3757222 ffc54f77-1f21-d63c-d660-a35910625bfe 99 1 10
9150019 ffbe0ffe-db67-44e2-2e84-41d0b54467b6 99 1 11

所以如果我们要查询第二页的数据,执行下面的sql即可:

1
SELECT * FROM players WHERE (score, username) < (99, 'fff79537-86b8-5bd1-a474-717f151e0448') ORDER BY score DESC, username DESC;

查询结果:

id username score
4319563 fff786e7-75b4-5cd7-09c1-eb4dc690dbbc 99
6021293 fff774fb-cf17-d02e-96b0-15555cb36b39 99
987629 fff756f7-e76b-8e8b-9941-c39061dac542 99
149766 fff75386-890f-af96-1c4c-64df0f6f4546 99
5546133 fff6ebcb-f3fb-d4b3-6b33-b56c718d44e4 99
5167740 fff6ce5a-b4c1-5363-554b-a285b6917054 99
5931868 fff5d7a6-1b0d-cf87-e0df-2440a53a7e05 99
4577984 fff55361-9b07-dfc6-9d1a-5fc08cdec7a6 99
4013672 fff4c37a-7ddf-6042-6b8f-eb9d7f8ea691 99
7863996 fff47c1f-18d8-9980-72fb-e51213ff0f17 99

OFFSET + LIMIT:

1
SELECT * FROM players ORDER BY score DESC, username DESC OFFSET 10 LIMIT 10;

查询结果:

id username score
4319563 fff786e7-75b4-5cd7-09c1-eb4dc690dbbc 99
6021293 fff774fb-cf17-d02e-96b0-15555cb36b39 99
987629 fff756f7-e76b-8e8b-9941-c39061dac542 99
149766 fff75386-890f-af96-1c4c-64df0f6f4546 99
5546133 fff6ebcb-f3fb-d4b3-6b33-b56c718d44e4 99
5167740 fff6ce5a-b4c1-5363-554b-a285b6917054 99
5931868 fff5d7a6-1b0d-cf87-e0df-2440a53a7e05 99
4577984 fff55361-9b07-dfc6-9d1a-5fc08cdec7a6 99
4013672 fff4c37a-7ddf-6042-6b8f-eb9d7f8ea691 99
7863996 fff47c1f-18d8-9980-72fb-e51213ff0f17 99

参考

Python3 logging用法

1
2
3
4
5
6
7
8
9
10
11
12
13
import logging


log = logging.getLogger()
log.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)
formatter = logging.Formatter('[%(threadName)s] %(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
log.addHandler(handler)

log.info("Hello %s", "world)

PostGIS快速入门

安装

1
CREATE EXTENSION postgis

添加列

1
2
ALTER TABLE your_table ADD COLUMN geom geometry(Point, 4326);
UPDATE your_table SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);

半径查找

1
2
3
4
5
6
7
8
9
10
11
12
SELECT lat,
lon,
location,
cnt
FROM ggg
WHERE 1 = 1
AND cnt > 1
AND ST_DWithin(
ggg.geom,
ST_MakePoint(42.9764, 47.5024)::GEOGRAPHY,
200000
);

参考:

https://live.osgeo.org/en/quickstart/postgis_quickstart.html
https://gis.stackexchange.com/questions/145007/creating-geometry-from-lat-lon-in-table-using-postgis

Openwrt环境下某2ray `too many open files`问题解决

too many open files大部分情况下是由于配置错误引起的。如果配置没有问题,可以在某2ray的启动脚本里加上这一句:

1
2
ulimit -SHn 65535
/usr/bin/2ray -config config.json

启动后可以用这个命令来看是否生效:

1
cat /proc/{2AY进程id}/limits

参考:

Kubernetes Create User

Create User

kubectl apply -f eks-admin-service-account.yaml

1
2
3
4
5
apiVersion: v1
kind: ServiceAccount
metadata:
name: eks-admin
namespace: kube-system

kubectl apply -f eks-admin-cluster-role-binding.yaml

1
2
3
4
5
6
7
8
9
10
11
12
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: eks-admin
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: eks-admin
namespace: kube-system

Get certificate

1
kubectl get secret default-token-cvn2d -o jsonpath="{['data']['ca\.crt']}" | base64 --decode

Get Token

1
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')

Test If the role works

Setup credential

1
2
3
4
kubectl config set-cluster kubernetes --certificate-authority=ca.crt --server=$K8S_SERVER_URL
kubectl config set-credentials $K8S_USERNAME --token=$K8S_USER_TOKEN
kubectl config set-context aws --cluster=kubernetes --namespace=$K8S_NAMESPACE --user=$K8S_USERNAME
kubectl config use-context aws --user=$K8S_USERNAME

Test command

1
kubectl get pod

Ref

Openwrt下dnsmasq配置

配置文件

dnsmasq.conf

1
2
3
4
5
6
7
conf-dir=/root/dnsmasq.d/
//设置DNS缓存时间
min-cache-ttl=3600
//缓存数量
cache-size=1024
//重新加载后清空缓存
clear-on-reload

配置说明

转发域名解析

解析海外域名需要将dns解析请求转发到上游的无污染的dns服务器,配置文件 /root/dnsmas.d/not_china.conf

1
2
server=/google.com/127.0.0.1#5353
...

国内域名直接用国内的dns服务器,china.conf

1
2
server=/baidu.com/221.6.4.66
...

缓存域名解析

海外域名的解析时间一般会比较长,频繁的去请求上游DNS服务器既浪费时间又无意义,可以通过设置dnsmasq缓存来加快解析,减少请求上游DNS服务器的频率。

1
2
3
4
//设置DNS缓存时间
min-cache-ttl=3600
//缓存数量,最多10000
cache-size=9999

如果你不知道缓存数量应该设置为多少,可以通过下面命令查看dnsmasq的域名请求数作为参考:

killall -s USR1 dnsmasq

关于 no-resolv 配置

在不打开no-resolv的情况下,dnsmasq会使用ISP提供的dns服务器作为默认的服务器,比如 xx.com域名既不在 not_china.conf又不在china.conf中,dnsmasq就会用ISP的dns服务器来解析这个域名。

如果打开了no-resolv,同时又不设置resolv-file的话,dnsmasq就会找不到默认的dns服务器来解析xx.com域名,如果你的代理服务器正好属于这类域名,将导致你无法连接到你的服务器。

Openwrt下无法访问medium.com的解决办法

家里路由器刷了Openwrt系统,在访问medium.com ,nytimes.com等网站时,会时不时出现连接重置错误,今天花时间研究了下,终于解决了这个困扰我很久的问题,在此记录下。

问题描述

  • Chrome访问这些网站是会提示ERR_CONNECTION_RESET错误,反复刷新多次后又能打开网站。
  • wget -v命令显示返回了ipv6地址并且会提示无法创建SSL连接错误。

问题原因

问题就出在ipv6上,openwrt默认会打开ipv6地址分配,这会导致电脑被分配到了一个ipv6地址,而chrome,safari在本机有ipv6的情况下,会优先访问这些网站的ipv6的地址,而路由器上的iptables只会对ipv4包进行转发,故访问这些网站的ipv6地址是无法被代理的。

解决办法

方法1

关闭路由器ipv6地址分配,这样在没有ipv6地址的情况下会访问这些网站的ipv4地址,这样就可以被代理到了。

方法2:

添加ip6tables规则,对ipv6也进行转发(没有尝试过)

参考