Re: BUG #19354: JOHAB rejects valid byte sequences

public inbox for pgsql-bugs@postgresql.org

From: Henson Choi <assam258@gmail.com>
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Heikki Linnakangas <hlinnaka@iki.fi>
Cc: Robert Haas <robertmhaas@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Jeroen Vermeulen <jtvjtv@gmail.com>
Cc: VASUKI M <vasukianand0119@gmail.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #19354: JOHAB rejects valid byte sequences
Date: Wed, 15 Apr 2026 14:57:50 +0900
Message-ID: <CAAAe_zAFz1v-3b7Je4L+=wZM3UGAczXV47YVZfZi9wbJxspxeA@mail.gmail.com>
In-Reply-To: <CAAAe_zCwaccH7h+GOtHbo_docCY-o0c5NMRuYkdz15f=KL4f0g@mail.gmail.com>
References: <19354-eefe6d8b3e84f9f2@postgresql.org>
	<CA+TgmoaRGSezRaA7x00X495Qho8WGTzggbDSUt-JsruXceZWug@mail.gmail.com>
	<CA+zULE4L4rA2DLAcfy=eQL7w_ZexV4P5zpQRbP=_qrhJBEOzjg@mail.gmail.com>
	<2292889.1765846569@sss.pgh.pa.us>
	<CAE2r8H5vaSyaC_t1FcpHBo-BB_=SrFj7GFnOC-SxC6WDf5c9VA@mail.gmail.com>
	<CA+zULE47EXZOp7qKYODd+mjSgDiR-WX5ZNBkwdKnj-Zc0FT58w@mail.gmail.com>
	<CA+TgmoZaoc37ohnhF5inoPxWzfoznV483xQw8Fmw+ELFScv47g@mail.gmail.com>
	<2393116.1765899706@sss.pgh.pa.us>
	<CA+TgmoaoW4F2rRzYcQQim9ddT4-6H3oi0UYV9Ucw-rRQ5MdHsg@mail.gmail.com>
	<CA+hUKGKy-ViGBXdOjcPownBM=OdWiULO8H1RyH1r_8qNp=U4CA@mail.gmail.com>
	<6a8122ac-123d-4e93-9269-0b3be1e4a5a4@iki.fi>
	<CAAAe_zCLVunjt1u+2E86shwc3hk1x4bzUyU86nY1fq-nAVYN0Q@mail.gmail.com>
	<CA+hUKGJMrcS=hBkqVk=5pjM4w8edG=_ArASC82RqB6HQro-v-g@mail.gmail.com>
	<CAAAe_zCwaccH7h+GOtHbo_docCY-o0c5NMRuYkdz15f=KL4f0g@mail.gmail.com>

Subject: Fix and expand comments for Korean encodings in encnames.c

Hi hackers,

While reading through the encoding alias table in src/common/encnames.c,
I noticed a few long-standing inaccuracies and omissions in the per-entry
comments for the three Korean encodings.

The most visible issue is the JOHAB entry, whose comment describes it as
"Extended Unix Code for simplified Chinese" -- apparently a copy/paste
slip from a neighboring EUC entry.  JOHAB is in fact the Korean
combining-style encoding defined in KS X 1001 annex 3.

The attached 0002 patch makes comment-only adjustments to the three
Korean encodings:

  * JOHAB: replace the incorrect "simplified Chinese" description with
    a correct one that identifies it as the Korean combining (Johab)
    encoding standardized in KS X 1001 annex 3.

  * EUC_KR: drop a stray space before the comma in the existing
    comment, and note that the encoding covers the KS X 1001
    precomposed (Wansung) form.

  * UHC: spell out "Unified Hangul Code", clarify that it is
    Microsoft Windows CodePage 949, and describe its relationship to
    EUC-KR (superset covering all 11,172 precomposed Hangul syllables).
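
For anyone who wants to sanity-check the 11,172 figure used in the UHC
comment, here is a minimal sketch (not part of the patch).  It relies
only on the Unicode Hangul Syllables layout: 19 initial consonants, 21
medial vowels, and 28 final positions (27 final consonants plus "no
final"), starting at U+AC00.  The function and macro names are
hypothetical, chosen for illustration, and do not exist in the
PostgreSQL tree.

```c
#include <assert.h>
#include <stdint.h>

/* Unicode Hangul Syllables block layout (illustrative names, not from PostgreSQL) */
#define HANGUL_BASE 0xAC00u		/* first precomposed syllable, U+AC00 */
#define N_INITIAL	19u			/* initial consonants (choseong) */
#define N_MEDIAL	21u			/* medial vowels (jungseong) */
#define N_FINAL		28u			/* 27 final consonants plus "no final" */

/* Total number of precomposed Hangul syllables: 19 * 21 * 28 */
static uint32_t
hangul_syllable_count(void)
{
	return N_INITIAL * N_MEDIAL * N_FINAL;
}

/* Compose a precomposed syllable code point from zero-based jamo indexes */
static uint32_t
hangul_compose(uint32_t initial, uint32_t medial, uint32_t final)
{
	assert(initial < N_INITIAL && medial < N_MEDIAL && final < N_FINAL);
	return HANGUL_BASE + (initial * N_MEDIAL + medial) * N_FINAL + final;
}
```

With these definitions, hangul_syllable_count() evaluates to 11172, and
hangul_compose(18, 20, 27) yields U+D7A3, the last code point of the
Hangul Syllables block.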

No behavior change, no catalog change, no pg_wchar.h change -- this
touches comments in src/common/encnames.c only.  pgindent is clean.

Thanks,
Henson Choi

From c7a7335d2cf5a2881b25d9091fd020a2d62f7661 Mon Sep 17 00:00:00 2001
From: Henson Choi <assam258@gmail.com>
Date: Wed, 15 Apr 2026 14:52:35 +0900
Subject: [PATCH v1] Fix and expand comments for Korean encodings in encnames.c

---
 src/common/encnames.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/common/encnames.c b/src/common/encnames.c
index 9085dbecce1..959b991dde4 100644
--- a/src/common/encnames.c
+++ b/src/common/encnames.c
@@ -61,8 +61,9 @@ static const pg_encname pg_encname_tbl[] =
 								 * Japanese, standard OSF */
 	{
 		"euckr", PG_EUC_KR
-	},							/* EUC-KR; Extended Unix Code for Korean , KS
-								 * X 1001 standard */
+	},							/* EUC-KR; Extended Unix Code for Korean
+								 * precomposed (Wansung) encoding, standard KS
+								 * X 1001 */
 	{
 		"euctw", PG_EUC_TW
 	},							/* EUC-TW; Extended Unix Code for
@@ -119,8 +120,8 @@ static const pg_encname pg_encname_tbl[] =
 	},							/* ISO-8859-9; RFC1345,KXS2 */
 	{
 		"johab", PG_JOHAB
-	},							/* JOHAB; Extended Unix Code for simplified
-								 * Chinese */
+	},							/* JOHAB; Korean combining (Johab) encoding,
+								 * standard KS X 1001 annex 3 */
 	{
 		"koi8", PG_KOI8R
 	},							/* _dirty_ alias for KOI8-R (backward
@@ -186,7 +187,9 @@ static const pg_encname pg_encname_tbl[] =
 	},							/* alias for WIN1258 */
 	{
 		"uhc", PG_UHC
-	},							/* UHC; Korean Windows CodePage 949 */
+	},							/* UHC; Unified Hangul Code, Microsoft Windows
+								 * CodePage 949; superset of EUC-KR covering
+								 * all 11,172 precomposed Hangul syllables */
 	{
 		"unicode", PG_UTF8
 	},							/* alias for UTF8 */
-- 
2.50.1 (Apple Git-155)



Attachments:

  [text/plain] 0002-Fix-and-expand-comments-for-Korean-encodings.txt (1.7K, 3-0002-Fix-and-expand-comments-for-Korean-encodings.txt)
