From: "Daniel Verite"
To: "Laurenz Albe"
Cc: Ron Johnson, pgsql-general@lists.postgresql.org
Subject: Re: Choosing default collation/ctype
In-Reply-To: <63e4b5165442ada9f187a0e14bbfe04795088bcd.camel@cybertec.at>
Date: Mon, 04 May 2026 21:34:00 +0200
Message-Id: <627add7e-94df-49ca-aa12-ae3900b7945f@manitou-mail.org>

Laurenz Albe wrote:

> > Then choose UTF8.
>
> Right! And I recommend "C" for the collation.

Yet the "C" collation is unsuitable for handling character types
beyond ASCII.
For instance, it considers that accented letters are not letters,
so upper('été') is 'éTé' instead of 'ÉTÉ', and 'é' ~ '\w' is false.

C.UTF-8 solves that, and since Postgres 17, it's available on all
operating systems with the builtin provider.

So if you target Postgres 17+, C.UTF-8 from the builtin provider is
a better choice for UTF-8 databases than "C".

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
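
The difference described in this message can be checked with a short psql session. This is only a sketch, assuming a UTF8-encoded database on Postgres 17 or later, where pg_c_utf8 is the predefined collation backed by the builtin provider's C.UTF-8 locale. The database name appdb is hypothetical.

```sql
-- Sketch only: assumes a UTF8-encoded database on Postgres 17+.

-- Case mapping. Per the message above, "C" leaves accented letters
-- unchanged while C.UTF-8 uppercases them.
SELECT upper('été' COLLATE "C")       AS upper_c,
       upper('été' COLLATE pg_c_utf8) AS upper_c_utf8;

-- Character classification in regular expressions. Per the message above,
-- 'é' does not match \w under "C".
SELECT ('é' COLLATE "C") ~ '\w'       AS word_char_c,
       ('é' COLLATE pg_c_utf8) ~ '\w' AS word_char_c_utf8;

-- Creating a database that defaults to C.UTF-8 from the builtin provider.
-- appdb is a hypothetical name; template0 is used so that the new locale
-- settings do not conflict with those of template1.
CREATE DATABASE appdb
    TEMPLATE template0
    ENCODING 'UTF8'
    LOCALE_PROVIDER builtin
    BUILTIN_LOCALE 'C.UTF-8';
```

The explicit COLLATE clauses make it possible to compare both behaviors in one session without changing the database default.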