From: Robert Haas
Date: Mon, 15 Dec 2025 12:46:15 -0500
Subject: Re: BUG #19354: JOHAB rejects valid byte sequences
To: jtvjtv@gmail.com, pgsql-bugs@lists.postgresql.org

On Sat, Dec 13, 2025 at 2:12 PM PG Bug reporting form wrote:
> Calling libpq, connecting to a UTF8 database and successfully setting client
> encoding to JOHAB, this statement:
>
> PQexec(connection, "SELECT '\x8a\x5c'");
>
> Returned an empty result with this error message:
>
> ERROR: invalid byte sequence for encoding "JOHAB": 0x8a 0x5c
>
> AFAICT, 0x8a 0x5c is a valid JOHAB sequence making up Hangul character "굎".
> Easily verified in Python:
>
> print(b'\x8a\x5c'.decode('johab'))
>
> It's the same story for some other valid sequences I tried, including this
> character's "neighbours" 0x8a 0x5b and 0x8a 0x5d.

My reading of pg_johab_verifystr() is that it accepts any character without the high bit set as a single-byte character. Otherwise, it calls pg_johab_mblen() to determine the length of the character, and that in turn calls pg_euc_mblen(), which returns 3 if the first byte is 0x8f and 2 otherwise. Whatever the length, it then wants each byte to pass IS_EUC_RANGE_VALID(), which accepts only bytes from 0xa1 to 0xfe. Your second byte, 0x5c, is outside that range, so it makes sense that it fails.

What confuses me is that https://en.wikipedia.org/wiki/KS_X_1001#Johab_encoding seems to say that the encoding is always a 2-byte encoding and that any 2-byte sequence with the high bit set on the first byte is a valid character. So the rules we're implementing don't seem to match that at all.

Unfortunately, the intent behind the current code is not clear. It was introduced by Bruce in 2002 in commit a8bd7e1c6e026678019b2f25cffc0a94ce62b24b, but I don't see comments there or elsewhere explaining the thinking behind the way the code works. So I don't know whether this is some weird variant of JOHAB that intentionally works differently or whether it was simply never correct.

-- 
Robert Haas
EDB: http://www.enterprisedb.com