From: Tom Lane
To: Jeroen Vermeulen
Cc: Robert Haas, pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #19354: JOHAB rejects valid byte sequences
Date: Mon, 15 Dec 2025 19:56:09 -0500
Message-ID: <2292889.1765846569@sss.pgh.pa.us>
References: <19354-eefe6d8b3e84f9f2@postgresql.org>

Jeroen Vermeulen writes:
> This bit worries me: "Other, vendor-defined, Johab variants also exist",
> such as an EBCDIC-based one and a stateful one!

Yeah.  So what we have here is:

1. Our JOHAB implementation has apparently been wrong since day one.

2. Wrongness may be in the eye of the beholder, since there are
multiple versions of JOHAB.

3. Yours is the first complaint about it, AFAIR.

4. That Wikipedia page says "Following the introduction of Unified
Hangul Code by Microsoft in Windows 95, and Hangul Word Processor
abandoning Johab in favour of Unicode in 2000, Johab ceased to be
commonly used."

Given these things, I wonder if we shouldn't desupport JOHAB rather
than attempt to fix it.  Fixing would likely be a significant amount
of work: if we don't even have the character lengths right, how likely
is it that our conversions to other character sets are correct?
I also worry that if different PG versions have different ideas of
the mapping, there could be room for dump/reload problems, and maybe
even security problems related to the backslash issue.

			regards, tom lane
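To make the "character lengths" and "backslash issue" points concrete, here is a minimal sketch of structural JOHAB validation. It assumes the byte ranges commonly documented for KS X 1001 Johab (single bytes 0x00-0x7F; Hangul lead 0x84-0xD3 with trail 0x41-0x7E or 0x81-0xFE; other lead 0xD8-0xDE or 0xE0-0xF9 with trail 0x31-0x7E or 0x91-0xFE); these ranges should be checked against the standard and are not taken from PostgreSQL's code. The function names are hypothetical. It only checks byte ranges, not whether a code point is actually assigned. The point it illustrates is that legal trail bytes include 0x5C, the ASCII backslash, so any code that scans for backslashes byte by byte, without knowing where JOHAB characters begin and end, can misfire.

```python
# Minimal sketch of structural JOHAB validation (hypothetical helper names).
# ASSUMPTION: byte ranges as commonly documented for KS X 1001 Johab:
#   single byte: 0x00-0x7F
#   Hangul:      lead 0x84-0xD3, trail 0x41-0x7E or 0x81-0xFE
#   other:       lead 0xD8-0xDE or 0xE0-0xF9, trail 0x31-0x7E or 0x91-0xFE
# Only byte ranges are checked; code point assignment is not.

HANGUL_LEAD = range(0x84, 0xD4)
OTHER_LEAD = [range(0xD8, 0xDF), range(0xE0, 0xFA)]
HANGUL_TRAIL = [range(0x41, 0x7F), range(0x81, 0xFF)]
OTHER_TRAIL = [range(0x31, 0x7F), range(0x91, 0xFF)]


def _in_any(b: int, ranges) -> bool:
    return any(b in r for r in ranges)


def johab_char_len(data: bytes, i: int) -> int:
    """Length of the JOHAB character starting at data[i]; ValueError if invalid."""
    lead = data[i]
    if lead < 0x80:
        return 1
    if lead in HANGUL_LEAD:
        trails = HANGUL_TRAIL
    elif _in_any(lead, OTHER_LEAD):
        trails = OTHER_TRAIL
    else:
        raise ValueError(f"invalid JOHAB lead byte 0x{lead:02X} at offset {i}")
    if i + 1 >= len(data):
        raise ValueError(f"truncated JOHAB character at offset {i}")
    trail = data[i + 1]
    if not _in_any(trail, trails):
        raise ValueError(f"invalid JOHAB trail byte 0x{trail:02X} at offset {i + 1}")
    return 2


def split_johab(data: bytes) -> list[bytes]:
    """Split a JOHAB byte string into characters."""
    chars = []
    i = 0
    while i < len(data):
        n = johab_char_len(data, i)
        chars.append(data[i:i + n])
        i += n
    return chars


def backslash_offsets(data: bytes) -> list[int]:
    """Offsets of real backslash characters, skipping 0x5C trail bytes."""
    offsets = []
    i = 0
    while i < len(data):
        n = johab_char_len(data, i)
        if n == 1 and data[i] == 0x5C:
            offsets.append(i)
        i += n
    return offsets


if __name__ == "__main__":
    # Hypothetical input: a two-byte character whose trail byte is 0x5C,
    # then an ASCII 'x', then a real backslash.
    sample = b"\x8a\x5cx\\"
    print(split_johab(sample))                              # [b'\x8a\\', b'x', b'\\']
    print(backslash_offsets(sample))                        # [3]
    print([i for i, b in enumerate(sample) if b == 0x5C])   # naive scan: [1, 3]
```

The naive byte scan reports a "backslash" at offset 1 that is really the second half of a double-byte character; a character-aware scan reports only offset 3. If two implementations disagree about where characters start, they also disagree about which 0x5C bytes are escapes, which is the kind of mismatch that can turn into a dump/reload or security problem.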