From: Jeroen Vermeulen
Date: Tue, 16 Dec 2025 08:42:09 +0100
Subject: Re: BUG #19354: JOHAB rejects valid byte sequences
To: VASUKI M
Cc: Tom Lane, Robert Haas, pgsql-bugs@lists.postgresql.org
References: <19354-eefe6d8b3e84f9f2@postgresql.org> <2292889.1765846569@sss.pgh.pa.us>

My one worry is that perhaps Johab is on the list because one important user
needed it. But even then, that requirement may have gone away?

Jeroen

On Tue, Dec 16, 2025, 07:23 VASUKI M <vasukianand0119@gmail.com> wrote:

> Thanks all. That analysis makes a lot of sense.
>
> Given the lack of a clear spec, the existence of multiple JOHAB
> variants, and how long this has apparently been "working" without anyone
> noticing, IMHO desupporting it does seem like the least risky option. At
> this point, trying to fix JOHAB variants feels like opening a pretty big
> can of worms, especially with the potential for dump/reload surprises or
> subtle parsing/security issues.
>
> I don't have additional data to add, but +1 on removal or deprecation
> being a reasonable outcome here, given how obscure and effectively dead
> the encoding is nowadays.
>
> Thanks for digging into this.
>
> Cheers,
> Vasuki M
>
> On Tue, Dec 16, 2025 at 11:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Jeroen Vermeulen <jtvjtv@gmail.com> writes:
>> > This bit worries me: "Other, vendor-defined, Johab variants also
>> > exist" -- such as an EBCDIC-based one and a stateful one!
>>
>> Yeah.  So what we have here is:
>>
>> 1. Our JOHAB implementation has apparently been wrong since day one.
>>
>> 2. Wrongness may be in the eye of the beholder, since there are
>> multiple versions of JOHAB.
>>
>> 3. Your complaint is the first, AFAIR.
>>
>> 4. That Wikipedia page says "Following the introduction of Unified
>> Hangul Code by Microsoft in Windows 95, and Hangul Word Processor
>> abandoning Johab in favour of Unicode in 2000, Johab ceased to be
>> commonly used."
>>
>> Given these things, I wonder if we shouldn't desupport JOHAB
>> rather than attempt to fix it.  Fixing would likely be a significant
>> amount of work: if we don't even have the character lengths right,
>> how likely is it that our conversions to other character sets are
>> correct?  I also worry that if different PG versions have different
>> ideas of the mapping, there could be room for dump/reload problems,
>> and maybe even security problems related to the backslash issue.
>>
>>                         regards, tom lane