From: VASUKI M
Date: Tue, 16 Dec 2025 11:53:48 +0530
Subject: Re: BUG #19354: JOHAB rejects valid byte sequences
To: Tom Lane
Cc: Jeroen Vermeulen, Robert Haas, pgsql-bugs@lists.postgresql.org
In-Reply-To: <2292889.1765846569@sss.pgh.pa.us>
References: <19354-eefe6d8b3e84f9f2@postgresql.org> <2292889.1765846569@sss.pgh.pa.us>

Thanks all. That analysis makes a lot of sense.

Given the lack of a clear spec, the existence of multiple JOHAB variants, and how long this has apparently been "working" without anyone noticing, IMHO desupporting it does seem like the least risky option. At this point, trying to fix JOHAB variants feels like opening a pretty big can of worms, especially with the potential for dump/reload surprises or subtle parsing/security issues.

I don't have additional data to add, but +1 on removal or deprecation being a reasonable outcome here, given how obscure and effectively dead the encoding is nowadays.

Thanks for digging into this.

Cheers,
Vasuki M

On Tue, Dec 16, 2025 at 11:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jeroen Vermeulen <jtvjtv@gmail.com> writes:
> > This bit worries me: "Other, vendor-defined, Johab variants also exist" --
> > such as an EBCDIC-based one and a stateful one!
>
> Yeah.  So what we have here is:
>
> 1. Our JOHAB implementation has apparently been wrong since day one.
>
> 2. Wrongness may be in the eye of the beholder, since there are
> multiple versions of JOHAB.
>
> 3. Your complaint is the first, AFAIR.
>
> 4. That wikipedia page says "Following the introduction of Unified
> Hangul Code by Microsoft in Windows 95, and Hangul Word Processor
> abandoning Johab in favour of Unicode in 2000, Johab ceased to be
> commonly used."
>
> Given these things, I wonder if we shouldn't desupport JOHAB
> rather than attempt to fix it.  Fixing would likely be a significant
> amount of work: if we don't even have the character lengths right,
> how likely is it that our conversions to other character sets are
> correct?  I also worry that if different PG versions have different
> ideas of the mapping, there could be room for dump/reload problems,
> and maybe even security problems related to the backslash issue.
>
>                         regards, tom lane
