2020-02-09

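Fixing intermittent failures to reach medium.com behind OpenWrt

My home router runs OpenWrt, and sites such as medium.com and nytimes.com would intermittently fail with a connection reset error. I finally spent some time on it today and solved a problem that had bothered me for a long time, so I am writing it down here.

Problem description

Chrome shows ERR_CONNECTION_RESET when opening these sites, yet after several refreshes the page loads.
Running wget -v shows that the name resolved to an IPv6 address, followed by an error saying the SSL connection could not be established.

Cause

The problem is IPv6. OpenWrt enables IPv6 address assignment by default, so the computer gets an IPv6 address. When the machine has an IPv6 address, Chrome and Safari prefer the IPv6 addresses of these sites. The iptables rules on the router, however, only redirect IPv4 packets, so connections to the IPv6 addresses never go through the proxy.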
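
To see whether a given site is likely to be affected, it can help to check which address families its name resolves to on the client. The following is a minimal sketch, assuming Python 3 is available on the client machine; `split_by_family` and `resolve` are hypothetical helper names introduced here for illustration and are not part of OpenWrt or of any proxy tool.

```python
import ipaddress
import socket


def split_by_family(addresses):
    """Split textual IP addresses into (ipv4, ipv6) lists."""
    v4, v6 = [], []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        (v6 if ip.version == 6 else v4).append(addr)
    return v4, v6


def resolve(host, port=443):
    """Return the distinct addresses the local resolver gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})
```

Calling `split_by_family(resolve("medium.com"))` on the client returns the two lists; a non-empty IPv6 list means a browser on an IPv6-enabled machine may try those addresses first.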

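Solutions

Option 1

Disable IPv6 address assignment on the router. Without an IPv6 address, clients connect to the sites' IPv4 addresses, which the proxy does handle.

Option 2

Add ip6tables rules so that IPv6 traffic is redirected as well (I have not tried this).

References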

https://superuser.com/questions/1131112/ipv4-only-wifi-in-an-ipv6-enabled-openwrt-router

2019-12-10

linux

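Passing file contents as a variable in Bitbucket Pipelines

Bitbucket Pipelines can pass values through Repository variables, but it does not handle variables containing special characters such as newlines well. In that case, base64-encode the value before storing it and decode it inside the pipeline.

# on your machine: encode the file, then paste the output into a repository variable
# (GNU base64 wraps long output at 76 columns; base64 -w 0 disables wrapping there)
cat file.txt | base64

# in the pipeline file
echo "${YOUR_ENV}" | base64 -d > file.txt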
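
The round trip can also be sketched in Python to show why the encoding helps: the encoded value is a single line of ASCII, and decoding restores the newlines exactly. This is only an illustration; `encode_for_variable` and `decode_from_variable` are hypothetical names, and the sample content stands in for file.txt.

```python
import base64


def encode_for_variable(content: bytes) -> str:
    """Encode file content as single-line base64, safe to paste into a variable."""
    return base64.b64encode(content).decode("ascii")


def decode_from_variable(value: str) -> bytes:
    """Restore the original bytes, as the pipeline step would."""
    return base64.b64decode(value)


original = b"line one\nline two\n"      # stand-in for the content of file.txt
encoded = encode_for_variable(original)  # contains no newline characters
restored = decode_from_variable(encoded)
```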

2019-12-06

java

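Java BufferedImage OutOfMemoryError

I recently ran into an OutOfMemoryError involving BufferedImage and am recording it here.

It started with a feature that merges several images into one. The flow is:

There are three image files: File A, File B and File C.
Convert A, B and C to BufferedImage with ImageIO.read(inputStream).
Use java.awt.image.BufferedImage#getWidth() to find the largest width among the three images, maxWidth.
Compute the height of the other two images after scaling them proportionally to maxWidth, then add the three heights to get maxHeight.
Create an image of maxWidth * maxHeight.
Draw the three images onto it through a Graphics2D canvas.

The main code: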

...
                val images = mutableListOf<BufferedImage>()
                ...
                images.add(ImageIO.read(inputStream))
                ...
                joinImages(*images.toTypedArray())
...

    fun joinImages(vararg imgs: BufferedImage): BufferedImage {
        val offset = 20

        val aggregateWidth = imgs.maxBy { it.width }!!.width
        val aggregateHeight = imgs.sumBy {
            (it.height * aggregateWidth) / it.width
        } + (imgs.size - 1) * offset

        val newImage = BufferedImage(aggregateWidth, aggregateHeight, BufferedImage.TYPE_INT_ARGB)
        val g2 = newImage.createGraphics()
        val oldColor = g2.color

        g2.paint = Color.white
        g2.fillRect(0, 0, aggregateWidth, aggregateHeight)
        g2.color = oldColor

        var y = 0

        imgs.forEach {
            val height = (aggregateWidth * it.height) / it.width
            val scaled = it.getScaledInstance(aggregateWidth, height, Image.SCALE_DEFAULT)

            g2.drawImage(toBufferedImage(scaled), null, 0, y)
            y += height + offset
        }
        g2.dispose()

        return newImage
    }

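At runtime, lines 4 and 17 of the listing above (the ImageIO.read(inputStream) call and the BufferedImage(...) allocation) occasionally threw OutOfMemoryError. Looking into it, I found two problems:

1. A JPEG is held in memory uncompressed once Java has decoded it: each pixel takes 3 bytes, so a large image uses a lot of memory. For example, the image below is 84 KB on disk but about 11 MB in memory: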

val file = File("/Downloads/fff.jpeg")
val image = ImageIO.read(FileInputStream(file))

println(image.width)  //2000
println(image.height) //2000
println(ObjectSizeCalculator.getObjectSize(image)) //12000928  -> 11mb
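
The measured size follows from simple arithmetic: width times height times bytes per pixel. A rough sketch of that estimate, assuming 3 bytes per pixel for a decoded JPEG as stated above; the function names and the memory budget are hypothetical, and the small object overhead seen in the measurement (the extra 928 bytes) is ignored.

```python
def estimated_decoded_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Approximate size of the raw pixel data once the image is decoded."""
    return width * height * bytes_per_pixel


def fits_in_budget(width: int, height: int, budget_bytes: int,
                   bytes_per_pixel: int = 3) -> bool:
    """True when the decoded image is expected to stay within the budget."""
    return estimated_decoded_bytes(width, height, bytes_per_pixel) <= budget_bytes


# The 2000 x 2000 JPEG from the example: 12,000,000 bytes of pixel data.
size = estimated_decoded_bytes(2000, 2000)
```

For the merged canvas, which is created as TYPE_INT_ARGB, 4 bytes per pixel would be the matching assumption.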

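2. Before merging, the code loads all three images into memory at once, which also takes a lot of memory.

For problem 1, the only option so far is to avoid it: before loading an image, estimate how much memory it will need, and skip merging JPEGs that would exceed the limit. In addition, the final code caps the merged width (the min(..., 800) when computing aggregateWidth), so the merged image cannot grow large enough to exceed the memory limit.

For problem 2, the three images can be loaded one at a time, only at the moment each is drawn, rather than all at once. The code that reads the image size can also be improved so that it no longer decodes the whole image just to get its dimensions.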
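
The idea of reading dimensions without decoding pixels is not specific to Java. As an illustration only, here is a sketch in Python for the PNG format, where the width and height sit in the IHDR chunk within the first 24 bytes of the file. This is a different technique from the ImageIO reader used in this post, and `png_dimensions` and `read_png_dimensions` are hypothetical names.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes):
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # IHDR data starts at offset 16: two big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def read_png_dimensions(path):
    """Read only the header bytes from disk, never the pixel data."""
    with open(path, "rb") as f:
        return png_dimensions(f.read(24))
```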

The final code:

fun getImageSize(file: File): Dimension {
    ImageIO.createImageInputStream(file).use { `in` ->
        val readers = ImageIO.getImageReaders(`in`)
        if (readers.hasNext()) {
            val reader = readers.next()
            try {
                reader.input = `in`
                return Dimension(reader.getWidth(0), reader.getHeight(0))
            } finally {
                reader.dispose()
            }
        }
    }

    return Dimension(0, 0)
}


fun joinImages(vararg files: File): File {
    ...
    val offset = 20
    val dimensions = files.map { getImageSize(it) }
    val aggregateWidth = min(dimensions.maxBy { it.width }!!.width, 800)
    val aggregateHeight = dimensions.sumBy { (it.height * aggregateWidth) / it.width } + (files.size - 1) * offset
    val newImage = try {
        BufferedImage(aggregateWidth, aggregateHeight, BufferedImage.TYPE_INT_ARGB)
    } catch (e: OutOfMemoryError) {
        ...
        throw e
    }
    val g2 = newImage.createGraphics()
    val oldColor = g2.color

    g2.paint = Color.white
    g2.fillRect(0, 0, aggregateWidth, aggregateHeight)
    g2.color = oldColor

    var y = 0

    files.forEach {
        var image = try {
            ImageIO.read(it)
        } catch (e: OutOfMemoryError) {
            ...
            throw e
        }
        val height = (aggregateWidth * image.height) / image.width
        val scaled = image.getScaledInstance(aggregateWidth, height, Image.SCALE_DEFAULT)

        g2.drawImage(toBufferedImage(scaled), null, 0, y)

        image.flush()
        image = null

        y += height + offset
    }
    g2.dispose()

    val joinImageFile = File.createTempFile(UUID.randomUUID().toString(), ".png")

    ImageIO.write(newImage, "png", joinImageFile)

    return joinImageFile
}

After these changes the OutOfMemoryError has not occurred again. For problem 1, Apache Commons Imaging looks like it might help; I will try it when I have time and update this post.

References:

https://coderanch.com/t/416485/java/Java-BufferedImage-OutOfMemoryError

2019-11-28

linux

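Configuring JVM memory on Kubernetes

Older JVMs, including the JDK 8 build used here, do not by default detect the memory actually available to them inside a Docker container. I recently hit a problem caused by this and am recording it here.

The affected project runs JDK 8 + Spring Boot + Tomcat and is deployed in Docker containers on Kubernetes. At runtime it threw java.lang.OutOfMemoryError: Java heap space. The deployment file at the time was: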

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: api-deployment
  labels:
    app: api
spec:
  serviceName: api-app
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - image: ...
          imagePullPolicy: "Always"
          name: api
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 300
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 5
          securityContext:
            capabilities:
              add:
                - SYS_PTRACE 
          envFrom:
            - secretRef:
                name: secret

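The problem appeared to lie between the Kubernetes memory settings and the JVM configuration. After some searching I found that Tomcat reads JVM options from the CATALINA_OPTS environment variable. UseCGroupMemoryLimitForHeap lets the JVM detect the memory available to the container, and MaxRAMFraction is the ratio of container memory to heap: with 2G of container memory and MaxRAMFraction=2, the maximum heap is 2G/2 = 1G. Setting MaxRAMFraction to 2 is a fairly safe choice. With both options set, the JVM sizes the heap from the container's memory and the heap size no longer has to be set explicitly.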
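
The arithmetic behind MaxRAMFraction can be written out as a small sketch. It only reproduces the division described above and deliberately ignores any other adjustments the JVM may make; `max_heap_bytes` is a hypothetical helper name.

```python
GIB = 1024 ** 3


def max_heap_bytes(container_limit_bytes: int, max_ram_fraction: int) -> int:
    """Maximum heap as container memory divided by MaxRAMFraction."""
    if max_ram_fraction < 1:
        raise ValueError("MaxRAMFraction must be at least 1")
    return container_limit_bytes // max_ram_fraction


# 2G container with MaxRAMFraction=2 gives a 1G heap.
heap = max_heap_bytes(2 * GIB, 2)
```

Because the fraction is an integer divisor, the available steps are coarse: 1 hands the whole container limit to the heap, 2 gives half, 3 gives a third, and so on.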

The updated configuration file gained the following:

...
          env:
              - name: CATALINA_OPTS
                value: "-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2"
          resources:
            requests:
              memory: "512Mi"

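After running for a while the problem was still there. Looking into UseCGroupMemoryLimitForHeap, I found that it detects memory by reading /sys/fs/cgroup/memory/memory.limit_in_bytes. Inside the container that file held a huge value, 9223372036854771712, which effectively means no memory limit. It turns out this value is set through resources -> limits in the Kubernetes manifest, so I updated the file with:

            limits:
              memory: "2048Mi"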
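
A quick way to interpret the content of memory.limit_in_bytes is sketched below. The huge number seen in the container equals 2^63 minus 4096, so the sketch treats any value above a large threshold as "no limit"; that threshold is an assumption made for this illustration, and `parse_memory_limit` is a hypothetical name.

```python
UNLIMITED_THRESHOLD = 1 << 60  # assumption: anything this large means "no limit"


def parse_memory_limit(raw: str):
    """Return the limit in bytes, or None when the value means unlimited."""
    value = int(raw.strip())
    return None if value >= UNLIMITED_THRESHOLD else value


# Value seen before resources.limits was set:
before = parse_memory_limit("9223372036854771712\n")
# Value expected for a 2048Mi limit (2048 * 1024 * 1024 bytes):
after = parse_memory_limit("2147483648\n")
```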

After watching it for a while, memory no longer overflowed. The final, complete configuration file:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: api-deployment
  labels:
    app: api
spec:
  serviceName: api-app
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - image: ...
          imagePullPolicy: "Always"
          name: api
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 300
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 5
          securityContext:
            capabilities:
              add:
                - SYS_PTRACE 
          env:
              - name: CATALINA_OPTS
                value: "-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2"
          envFrom:
            - secretRef:
                name: secret
          resources:
            requests:
              memory: "512Mi"
            limits:
              memory: "2048Mi"

2019-11-11

linux

Some notes on Google Cloud Pub/Sub

Multiple clients can share one subscription; each client receives only part of the messages:

Multiple subscribers can make pull calls to the same “shared” subscription. Each subscriber will receive a subset of the messages
Source: https://cloud.google.com/pubsub/docs/subscriber#push-subscription

For messages that take a long time to process, the client library runs a background task that automatically extends the ack deadline; see the extendDeadlines method for details.

About message retention duration

By default, a message that cannot be delivered within the maximum retention time of 7 days is deleted and is no longer accessible. This typically happens when subscribers do not keep up with the flow of messages. Note that you can configure message retention duration (the range is from 10 minutes to 7 days).

In other words, if a message is not processed within the message retention duration, it is deleted and can no longer be accessed.

For example, if a bug keeps a message from ever being processed successfully, the message is gone once the message retention duration has passed, and that data is lost.
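
The retention rule can be modelled in a few lines to make the data-loss scenario concrete. This sketch does not call the Pub/Sub API; it only encodes the 10 minute to 7 day range quoted above, and `is_expired` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

MIN_RETENTION = timedelta(minutes=10)
MAX_RETENTION = timedelta(days=7)


def is_expired(published_at: datetime, now: datetime,
               retention: timedelta = MAX_RETENTION) -> bool:
    """True when an unprocessed message is older than the retention duration."""
    if not MIN_RETENTION <= retention <= MAX_RETENTION:
        raise ValueError("retention must be between 10 minutes and 7 days")
    return now - published_at > retention


published = datetime(2019, 11, 1, tzinfo=timezone.utc)
still_there = not is_expired(published, published + timedelta(days=6))
lost = is_expired(published, published + timedelta(days=8))
```

A bug that blocks processing for longer than the configured duration therefore has to be fixed within that window, or the affected messages need to be recoverable from another source.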

2019-10-23

linux

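Fixing "Safari can't establish a secure connection"

Our QA team tests with Safari 6.x. During testing the project site would not open and Safari reported "Safari can't establish a secure connection". Trying other sites, only some of them opened.

At first I suspected the system configuration and tried fixes found online, such as changing the DNS server and trusting the certificate, but the site still would not open. It turned out that Safari 6.x does not support TLS 1.2, and our project happened to accept only TLS 1.2.

Solutions:

Upgrade Safari
Have the project accept an older TLS version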
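
The failure is a plain version mismatch, which can be sketched with Python's ssl module. The context below accepts nothing older than TLS 1.2, similar in spirit to the project's server; the client version used in the comparison (TLS 1.0) is an assumption made for illustration, and `can_negotiate` is a hypothetical helper.

```python
import ssl


def make_tls12_only_context() -> ssl.SSLContext:
    """Client context that refuses anything older than TLS 1.2."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def can_negotiate(client_max: ssl.TLSVersion, server_min: ssl.TLSVersion) -> bool:
    """A handshake is only possible if the client reaches the server's minimum."""
    return client_max >= server_min


ctx = make_tls12_only_context()
# Assumed old client that tops out at TLS 1.0 against a TLS 1.2-only server:
old_client_ok = can_negotiate(ssl.TLSVersion.TLSv1, ctx.minimum_version)
```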

2019-09-19

linux

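Configuring authentication for HTTP-triggered Google Cloud Functions

A recent project uses Google Cloud, and in one module a Cloud Function has to call another function that is triggered over HTTP. I spent quite a while working out how to configure authentication for the HTTP function, so here are my notes.

By default, creating an HTTP-triggered function leaves Allow unauthenticated invocations checked, so the function can be called without authentication. That is clearly unsafe: once the URL leaks, anyone can call it.

The Google Cloud documentation recommends Cloud Endpoints for authentication, but I could never get an endpoint created in the Cloud Console, so I gave up on that.

My first, temporary workaround was to keep Allow unauthenticated invocations checked and verify a token inside the function code, returning forbidden when the token does not match. This shortens the execution time of malicious calls, but each call still incurs some cost.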
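
That temporary workaround might look roughly like the sketch below. It is not the code from the project: the header name X-Auth-Token, the EXPECTED_TOKEN value and the `check_token` function are all hypothetical, and a plain dictionary stands in for the request headers so the example runs without any framework.

```python
import hmac

EXPECTED_TOKEN = "replace-me"  # hypothetical shared secret, e.g. from an env variable


def check_token(headers: dict):
    """Return (body, status) after comparing the supplied token in constant time."""
    supplied = headers.get("X-Auth-Token", "")
    if not hmac.compare_digest(supplied.encode(), EXPECTED_TOKEN.encode()):
        return "forbidden", 403  # reject early to keep execution time short
    return "ok", 200


rejected = check_token({})
accepted = check_token({"X-Auth-Token": "replace-me"})
```

The function still starts for every request, which is why this approach reduces but does not remove the cost of unwanted calls.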

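Today I took another look and finally found a solution.

Assume the calling function is xxx-master and the called function is xxx-slave.

Use the command below to grant the calling function's service account the roles/cloudfunctions.invoker role on xxx-slave, which allows that identity to invoke xxx-slave: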

gcloud beta functions add-iam-policy-binding RECEIVING_FUNCTION \
  --member='serviceAccount:CALLING_FUNCTION_IDENTITY' \
  --role='roles/cloudfunctions.invoker'

RECEIVING_FUNCTION -> name of the function being called
CALLING_FUNCTION_IDENTITY -> usually PROJECT_ID@appspot.gserviceaccount.com

For example:

gcloud beta functions add-iam-policy-binding xxx-slave \
  --member='serviceAccount:YOUR_PROJECT_ID@appspot.gserviceaccount.com' \
  --role='roles/cloudfunctions.invoker'

In xxx-master, add code that fetches a token, and send the token in the Authorization header of the request. A Python example:

# Requests is already installed, no need to add it to requirements.txt
import requests

def calling_function(request):
  # Make sure to replace variables with appropriate values
  receiving_function_url = 'https://us-central1-graphical-bus-248617.cloudfunctions.net/xxx-slave'

  # Set up metadata server request
  # See https://cloud.google.com/compute/docs/instances/verifying-instance-identity#request_signature
  metadata_server_token_url = 'http://metadata/computeMetadata/v1/instance/service-accounts/default/identity?audience='

  token_request_url = metadata_server_token_url + receiving_function_url
  token_request_headers = {'Metadata-Flavor': 'Google'}

  # Fetch the token
  token_response = requests.get(token_request_url, headers=token_request_headers)
  jwt = token_response.content.decode("utf-8")

  # Provide the token in the request to the receiving function
  receiving_function_headers = {'Authorization': f'bearer {jwt}'}
  function_response = requests.get(receiving_function_url, headers=receiving_function_headers)

  return function_response.content

After these two steps, xxx-slave can be called with authentication.

Note that allUsers must also be removed from Cloud Functions Invoker, otherwise xxx-slave is still public. Steps:

Google Cloud Console -> Cloud Functions
Select the xxx-slave function
Click the PERMISSIONS tab on the left
Expand Cloud Functions Invoker
Remove allUsers

References:

https://cloud.google.com/functions/docs/securing/authenticating#function-to-function

2019-08-24

linux

Deploying a web app on Google Cloud with Kubernetes, Let's Encrypt and nginx-ingress

Environment setup

Connect to the cluster

gcloud container clusters get-credentials CLUSTER_NAME --zone us-central1-a --project PROJECT_NAME

Install helm and tiller

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller

Install nginx-ingress

helm install --name nginx-ingress stable/nginx-ingress --set rbac.create=true --set controller.publishService.enabled=true

Install cert-manager for Let's Encrypt

kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.8/deploy/manifests/00-crds.yaml

helm repo add jetstack https://charts.jetstack.io
helm install --name cert-manager --namespace cert-manager jetstack/cert-manager

kubectl create -f issuer.yaml

issuer.yaml

apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: youremail@yourdomain.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod
    # Enable the HTTP-01 challenge provider
    http01: {}

Configure DNS

Get the IP of nginx-ingress-controller

kubectl get svc -n default

Output:

NAME                            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                      AGE
kubernetes                      ClusterIP      10.64.0.1      <none>          443/TCP                      64m
nginx-ingress-controller        LoadBalancer   10.64.11.170   35.188.93.188   80:30393/TCP,443:30515/TCP   59m
nginx-ingress-default-backend   ClusterIP      10.64.5.162    <none>          80/TCP                       59m

Point your DNS record at the EXTERNAL-IP, here 35.188.93.188.

Deploy the web project

Create a namespace

kubectl create ns demo
kubectl config set-context $(kubectl config current-context) --namespace=demo

Deploy the project

Deployment

kubectl apply -f deployment.yaml

deployment.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: api-deployment
  labels:
    app: api
spec:
  serviceName: api-app
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - image: tomcat:8.5.45-jdk8-openjdk-slim
          imagePullPolicy: "Always"
          name: api
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 150
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 150
            periodSeconds: 5

Service

kubectl apply -f service.yaml

service.yaml

apiVersion: v1
kind: Service
metadata:
  name: api-service
  labels:
    app: api
spec:
  type: ClusterIP
  selector:
    app: api
  ports:
   - protocol: TCP
     port: 80
     targetPort: 8080

Ingress

kubectl apply -f ingress.yaml

ingress.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: name-virtual-host-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls: # < placing a host in the TLS config will indicate a cert should be created
  - hosts:
    - demo.w2x.me
    secretName: letsencrypt-prod
  rules:

  - host: demo.w2x.me
    http:
      paths:
      - backend:
          serviceName: api-service
          servicePort: 80

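Once the deployment's pod is ready, open the domain you configured and you should see the Tomcat home page served over HTTPS.

Update

2019-09-08: added the installation steps that were missing from the Let's Encrypt part.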

2019-07-25

linux

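A zsh plugin for syncing command-line history

I use different computers at work and at home, and I often need a command I have used before but cannot remember, which happens to be in the history of the other machine. So I went looking for a plugin that syncs shell history across machines.

I tried symlinking .zsh_history in my home directory into Dropbox, but that had problems: syncing between several machines produces conflicting copies, and in some cases zsh recreates .zsh_history, losing data. I also tried the history-sync plugin and hit a bug that wipes the history file.

Then I came across zsh-histdb, which does exactly what I need: it syncs history between several computers and merges conflicting files automatically. Combined with zsh-autosuggestions it is just about perfect.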

2019-06-30

linux

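Infinite redirects when accessing fontello.com from Docker

A while ago I had to migrate several rather old front-end projects to a new Jenkins environment. Their builds depend on old library versions, so setting up the environment takes time and would have to be repeated on the next migration. I therefore decided to build the projects with Docker.

With Docker set up locally, test builds kept failing while downloading fonts from fontello.com, with an exception saying the maximum number of redirects had been exceeded. From a shell inside the container, wget -vdS fontello.com always returned status 302, while my own machine and the server had no such problem, and wget to other URLs from the container worked too. At first I blamed fontello's site configuration. Today it occurred to me that the proxy configured for my Docker environment might be the cause, so I added fontello.com to the proxy whitelist. The redirect error disappeared and the font files downloaded normally.

The root cause is most likely an interaction between Docker's proxy feature and fontello.com. I am noting it here in case I run into something similar again.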
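
For reference, a proxy whitelist of this kind is usually a comma-separated list of host names, and a request bypasses the proxy when its host matches an entry or is a subdomain of one. The sketch below shows that matching rule under those assumptions; exact conventions differ between tools, and `bypasses_proxy` is a hypothetical helper, not something Docker provides.

```python
def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """True when host matches an entry of a comma-separated whitelist."""
    host = host.lower().rstrip(".")
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if entry == "*":
            return True  # wildcard: never use the proxy
        if host == entry or host.endswith("." + entry):
            return True  # exact match or subdomain of the entry
    return False


whitelist = "localhost,fontello.com"
direct = bypasses_proxy("fontello.com", whitelist)
proxied = not bypasses_proxy("example.com", whitelist)
```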