CCSID sanity
CCSID problems are the most-reported, most-misdiagnosed, and most-googled class of IBM i open-source bug. The good news is that there are only about six concepts to learn, and once you’ve internalized them, you can work through most CCSID issues in a few minutes.
This chapter gives you the concepts, in order, and ends with a copy-pasteable diagnostic checklist.
What a CCSID actually is
A CCSID (Coded Character Set Identifier) is IBM’s name for what the rest of the world calls a “character encoding.” Every byte on an IBM i has, in principle, a CCSID associated with it that tells the OS how to interpret it as text.
The four you’ll encounter every day:
| CCSID | What it is | Where it shows up |
|---|---|---|
| 37 | US English EBCDIC | Default for ILE / native source members; default JOBCCSID for many users |
| 819 | ISO 8859-1 (Latin-1) | Common for IFS files; what bootstrap.sh writes its log in |
| 1208 | UTF-8 | What /QOpenSys/pkgs/* uses; what you want for source code, configs, anything modern |
| 65535 | “Binary / no conversion / leave alone” | The CCSID that means “I refuse to declare a CCSID” |
A fifth, less common one matters in some shops:
| CCSID | What it is | Where it shows up |
|---|---|---|
| 1252 | Windows-1252 (Latin-1 superset) | Files that came in via Windows tools |
That’s most of what you need. Yes, there are hundreds of CCSIDs; in 25 years of running open source on IBM i you’ll see maybe a dozen.
CCSID 65535: the one that ruins your day
A CCSID 65535 tag means “treat as binary; don’t translate.” This is fine for ZIP files, JPEGs, and executables. It’s a disaster for text files.
When PASE writes a text file into the IFS outside /QOpenSys, it sometimes ends up tagged 65535. When you then try to read that file from ILE, or from a PASE program running in a job with JOBCCSID(65535), IBM i refuses to translate, and you get bytes pretending to be characters.
Check the CCSID of any IFS file:
attr /home/jesse/somefile.txt CCSID
If it says 65535 and the file is actually text, fix it:
setccsid 1208 /home/jesse/somefile.txt
Or for a whole tree:
find /home/jesse/projects -type f -name "*.php" -exec setccsid 1208 {} \;
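Before re-tagging a whole tree, it can help to preview which files are actually tagged 65535. The following is a minimal, read-only sketch rather than an established tool: list_binary_tagged and the ATTR_CMD override are hypothetical names, and it assumes that attr FILE CCSID prints only the numeric CCSID.

```shell
# Command used to read a file's CCSID tag. Assumed to be PASE's attr;
# overridable (hypothetical ATTR_CMD variable) so the function can be
# exercised on a machine that is not IBM i.
ATTR_CMD=${ATTR_CMD:-attr}

# list_binary_tagged DIR PATTERN
# Print every regular file under DIR whose name matches PATTERN and whose
# CCSID tag is 65535. Nothing is re-tagged.
list_binary_tagged() {
    find "$1" -type f -name "$2" | while IFS= read -r f; do
        # Treat a failing attr call as "no tag known" instead of aborting.
        ccsid=$("$ATTR_CMD" "$f" CCSID 2>/dev/null) || ccsid=
        if [ "$ccsid" = "65535" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, list_binary_tagged /home/jesse/projects '*.php' would print only the PHP files that still carry the 65535 tag, which you can review before running the bulk setccsid.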
setccsid only re-tags the file. It does not convert the bytes. If the file was actually written as Latin-1 but tagged 65535, setccsid 1208 will tag it as UTF-8 but the bytes are still Latin-1, and you’ll see the same garbage. To actually convert, use iconv:
iconv -f ISO-8859-1 -t UTF-8 < old.txt > new.txt
setccsid 1208 new.txt
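One risk with the conversion above is running it twice: feeding a file that is already UTF-8 through a Latin-1 to UTF-8 conversion double-encodes every accented character. A minimal sketch of a guard follows. The function names is_valid_utf8 and latin1_to_utf8 are hypothetical, and it assumes an iconv that accepts the names ISO-8859-1 and UTF-8 and exits non-zero on invalid input (GNU iconv does).

```shell
# is_valid_utf8 FILE
# Succeeds when FILE decodes cleanly as UTF-8 (plain ASCII also passes).
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# latin1_to_utf8 SRC DST
# Copy SRC to DST as UTF-8. If SRC already decodes as UTF-8 it is copied
# unchanged; otherwise it is assumed to be Latin-1 and converted.
latin1_to_utf8() {
    src=$1
    dst=$2
    if is_valid_utf8 "$src"; then
        cp "$src" "$dst"
    else
        iconv -f ISO-8859-1 -t UTF-8 "$src" > "$dst"
    fi
    # Tag the result as 1208 when setccsid exists (IBM i PASE only).
    if command -v setccsid >/dev/null 2>&1; then
        setccsid 1208 "$dst"
    fi
}
```

The check is a heuristic: Latin-1 text that happens to form valid UTF-8 sequences would be copied without conversion, so spot-check the result on files with unusual content.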
/QOpenSys is the easy region
/QOpenSys is special: it’s a UNIX-style filesystem that always uses CCSID 1208 (UTF-8). Files written there are tagged 1208 automatically. PASE programs that operate inside /QOpenSys see and write UTF-8 by default.
This is why most of your yum-installed software Just Works without CCSID intervention: it lives in /QOpenSys, it reads UTF-8, it writes UTF-8, IBM i agrees with it.
The CCSID complications start the moment your PHP / Python / Node code reads or writes a file outside /QOpenSys — in /home, /tmp, /www, /etc. Those are normal IFS, where the CCSID of newly-created files depends on the parent directory’s CCSID, the job’s JOBCCSID, and the program’s environment.
The K3S convention: keep application source code under /home/<user>/ or a project-root directory, and explicitly set those directories to CCSID 1208 at creation:
mkdir /home/jesse/projects
setccsid 1208 /home/jesse/projects
Files created in there by PASE programs running with JOBCCSID(1208) will inherit 1208.
JOBCCSID: the job’s notion of text
Every IBM i job has a JOBCCSID. It’s set from (in order):
- The job description’s CCSID parameter, if not *USRPRF.
- The user profile’s CCSID parameter, if not *SYSVAL.
- The system value QCCSID.
Most older systems have QCCSID = 65535. Most user profiles have CCSID(*SYSVAL). So the inherited JOBCCSID is often 65535, which causes the problems described above.
You don’t have to change the system value (and probably shouldn’t without thinking carefully). You can change individual user profiles, which is the right granularity for open-source work:
CHGUSRPRF USRPRF(JESSE) CCSID(1208)
For SSH-only service accounts (the profiles your web server and batch jobs run as), do the same:
CHGUSRPRF USRPRF(K3SWEB) CCSID(1208)
CHGUSRPRF USRPRF(K3SAPP) CCSID(1208)
Verify on a running job:
WRKJOB JOB(0/JESSE/QPADEV0001) OPTION(*DFNATR)
Look for Coded character set identifier. (From PASE: system 'DSPJOB OPTION(*DFNATR)'.)
PASE_LANG: PASE’s notion of text
PASE has its own concept of locale, separate from JOBCCSID. The environment variable is PASE_LANG, and the value you almost always want is:
export PASE_LANG=EN_US.UTF-8
Set this at three levels:
- System-wide for all PASE jobs: append to /QOpenSys/etc/profile.
- Per-user: in ~/.profile or ~/.bash_profile.
- Per-script: at the top of any shell script that runs in a context where the others might not apply (e.g., cron jobs, SBMJOB-launched shells).
Plus the GNU userland variables that most modern utilities also check:
export LANG=EN_US.UTF-8
export LC_ALL=EN_US.UTF-8
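For the per-script case, one option is to fill in the three variables only when the caller has not set them, so a script behaves the same under cron as in an interactive session without overriding a deliberate choice. This is a sketch under that assumption; the function name set_utf8_locale_defaults is hypothetical.

```shell
# set_utf8_locale_defaults
# Give PASE_LANG, LANG and LC_ALL the UTF-8 value from this chapter, but
# only where the variable is currently unset or empty, then export them.
set_utf8_locale_defaults() {
    : "${PASE_LANG:=EN_US.UTF-8}"
    : "${LANG:=EN_US.UTF-8}"
    : "${LC_ALL:=EN_US.UTF-8}"
    export PASE_LANG LANG LC_ALL
}
```

Call set_utf8_locale_defaults once near the top of the script. If you would rather force UTF-8 regardless of what the caller set, use the plain export lines instead.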
The complete environment fragment
A canonical ~/.profile for an IBM i open-source user:
# Path
export PATH=/QOpenSys/pkgs/bin:$PATH
# Locale / CCSID for PASE
export PASE_LANG=EN_US.UTF-8
export LANG=EN_US.UTF-8
export LC_ALL=EN_US.UTF-8
# Editor preferences
export EDITOR=vim
export VISUAL=vim
# Common aliases
alias ll='ls -la'
alias l.='ls -d .*'
# Re-exec into bash if we landed in something else
if [ -x /QOpenSys/pkgs/bin/bash ] && [ -z "$BASH_VERSION" ]; then
    exec bash -l
fi
Pair that with CHGUSRPRF USRPRF(JESSE) CCSID(1208) from CL, and the user is set up.
The VS Code complication
VS Code over SSH runs whatever bash you’ve configured. If your SSH-side environment is correct (UTF-8 everywhere), VS Code’s terminal will be too, and the editor will save files as UTF-8 by default. The Code for IBM i extension respects your locale settings.
Two things to double-check:
- In VS Code settings: "files.encoding": "utf8" (the default).
- The PASE-side JOBCCSID of the SSH job. From a VS Code terminal: system "DSPJOB OPTION(*DFNATR)" | grep -i ccsid (you want 1208).
Diagnostic checklist
When something looks wrong with characters — bad accents, garbage in logs, “why does my PHP write UTF-8 to the database but read it back as Latin-1” — work this list in order:
- What CCSID is the source file? attr myfile.php CCSID
- What’s the job’s JOBCCSID? system "DSPJOB OPTION(*DFNATR)" | grep -i ccsid
- What’s PASE_LANG? echo $PASE_LANG
- What’s LANG and LC_ALL? echo $LANG $LC_ALL
- For files: was the file written with the current settings, or by something earlier with different settings? Use find with -newer and check creation timestamps.
- For DB2: what’s the column’s CCSID and what’s the connection’s CCSID negotiation?
  - For columns: SELECT CCSID FROM SYSCOLUMNS WHERE TABLE_NAME = 'MYTABLE' AND COLUMN_NAME = 'MYCOL'.
  - For ODBC connections: see the ODBC chapter and the PHP/PDO/ODBC toolkit guide.
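The first four checks can be gathered into one report so the output is easy to paste into a ticket. This is a minimal sketch, not a K3S-provided tool: ccsid_report and on_ibmi are hypothetical names, it assumes uname -s prints OS400 inside PASE, and the grep pattern is widened to match either "CCSID" or the "Coded character set identifier" label.

```shell
# on_ibmi: true when running inside PASE (assumes uname -s reports OS400).
on_ibmi() {
    [ "$(uname -s 2>/dev/null)" = "OS400" ]
}

# ccsid_report FILE
# Print the file's CCSID tag, the job's CCSID lines, and the locale
# variables. Off IBM i the first two are reported as unavailable.
ccsid_report() {
    file=$1
    if on_ibmi; then
        printf 'file_ccsid=%s\n' "$(attr "$file" CCSID 2>/dev/null)"
        printf 'job_ccsid:\n'
        # grep finding nothing should not abort the report.
        system "DSPJOB OPTION(*DFNATR)" 2>/dev/null |
            grep -i -E 'ccsid|coded character set' || true
    else
        printf 'file_ccsid=unavailable\n'
        printf 'job_ccsid=unavailable\n'
    fi
    printf 'PASE_LANG=%s\n' "${PASE_LANG:-<unset>}"
    printf 'LANG=%s\n' "${LANG:-<unset>}"
    printf 'LC_ALL=%s\n' "${LC_ALL:-<unset>}"
}
```

Run it as ccsid_report myfile.php from the same shell, and ideally the same user, that shows the problem, since the job CCSID and locale variables differ between SSH sessions, cron, and web-server-spawned jobs.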
In K3S experience, 90% of CCSID complaints resolve to one of:
- A file tagged 65535 that should be 1208.
- A user profile with CCSID(*SYSVAL) chained back to QCCSID = 65535.
- An ODBC connection that doesn’t have UNICODESQL=1 set.
- A web-server-spawned PASE job that doesn’t read /QOpenSys/etc/profile and so has PASE_LANG unset.
Where next
CCSID is handled. Service users and authority is the next foundation chapter — it covers running daemons as something other than QSECOFR, IFS authority, and the conventions that keep a multi-user IBM i partition manageable.