Why do garbled characters occur? Explaining the cause/reason, how to fix it, and how to restore it

Anyone who uses a computer or email for work has probably experienced the phenomenon of “garbled characters” at least once. Knowing the causes of garbled characters and how to fix them will definitely be useful in your work.

In this article, we explain why garbled characters occur and what to do when they appear in emails. We also introduce restoration tools, so please refer to them.



What are garbled characters? A definition


First, let us define garbled characters. Garbled characters (known in Japanese as “mojibake”) are characters that appear as different, unintended characters when a computer reads them. The usual cause is an encoding mismatch: the text was written with one character code but is read with another.

Garbled characters occur not only when loading websites, but also when receiving emails and downloading files. The term usually refers to characters that are displayed in a different form from the intended one, but in some cases the characters are not displayed at all.
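
As a minimal sketch of this definition, the Python snippet below writes text as UTF-8 bytes and then reads the same bytes as Latin-1. The sample string, the function names, and the choice of Latin-1 as the “wrong” encoding are assumptions made for illustration; other mismatched pairs behave in a similar way.

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the same bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def ungarble(garbled: str) -> str:
    """Undo garble(): recover the original bytes and decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "文字化け"
broken = garble(original)

print(broken)            # unreadable accented letters and control characters
print(ungarble(broken))  # the original text again
```

Latin-1 assigns a character to every byte value, so nothing is lost in this particular mismatch and the text can be recovered. That is not true of every pair of encodings.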

Naturally, if the characters are garbled, the characters cannot be read and there is a high possibility that business operations will be disrupted. In order to prevent these problems, if garbled characters occur, it is important to promptly investigate the cause and know how to deal with them appropriately.



Reasons why garbled characters occur


The previous section gave an overview of garbled characters. But why do garbled characters occur? There are various reasons, but the most common ones are listed below.

  1. Character code cannot be read accurately
  2. Different formats and types
  3. Loading is not done properly

Let’s look at each in turn.



Character code cannot be read accurately


A character code is a number assigned to a character. The premise is that although computers can display kanji and English letters, they do not understand them as such.

Computers can only handle numbers. So, behind the scenes, the hiragana, kanji, and English characters entered on the keyboard are replaced with numbers. For example, if you type “a”, your computer stores it as the number 97 in ASCII and in Unicode.
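
The mapping between characters and numbers can be inspected directly. The following is a small Python sketch; the helper name `describe` and the sample characters are assumptions chosen for illustration.

```python
def describe(ch: str) -> str:
    """Show the number (Unicode code point) that stands for one character."""
    return f"{ch!r} -> code point {ord(ch)} (U+{ord(ch):04X})"


for ch in ("a", "あ", "漢"):
    print(describe(ch))

# chr() goes the other way: from the number back to the character.
print(chr(97))
```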

The character code is the rule that says “assign this number to this character,” as described above. Two typical character codes are:

  1. Shift_JIS
  2. Unicode

I will explain each in turn.



Shift_JIS


Shift_JIS is a character code based on the JIS (Japanese Industrial Standards) character sets and designed specifically for Japanese. Although it is specialized for Japanese, it also supports the basic English alphabet. However, it does not support many complex symbols or less widely used languages. Therefore, if the text contains characters that Shift_JIS does not cover, they cannot be represented in that character code, and “?” is often displayed in their place.
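
The “?” substitution can be reproduced with Python's built-in `shift_jis` codec. This is only a sketch: the function name and the sample text are assumptions, and real applications may substitute a different placeholder.

```python
def to_shift_jis_lossy(text: str) -> str:
    """Round-trip text through Shift_JIS, replacing unsupported characters with '?'."""
    raw = text.encode("shift_jis", errors="replace")  # unsupported -> b"?"
    return raw.decode("shift_jis")


# "é" is not part of Shift_JIS, so it comes back as "?".
print(to_shift_jis_lossy("日本語 café"))
```

Without `errors="replace"`, the same call raises `UnicodeEncodeError`, which is a safer default when silent data loss is unacceptable.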



Unicode


While Shift_JIS is a character code specific to Japanese, Unicode is a character code designed for worldwide use. Some characters are still not supported, but the goal is to cover all the characters in the world. Unicode also includes “hentaigana” (variant kana forms found in old texts) and other characters that are no longer used in modern writing.

The number of characters registered in Unicode continues to increase with each new version. More than 140,000 characters are registered, and the repertoire even includes some ligatures.



Different formats and types


I explained that there are character codes such as Shift_JIS and Unicode, but when text is actually stored or loaded on a computer, the encoding format also comes into play. Even if the character codes match, garbled characters will occur if the formats differ.

There are three typical formats:

  1. UTF-〇〇
  2. BE・LE
  3. BOM

I will explain each in turn.



UTF-〇〇


Typical examples of UTF-〇〇 are UTF-8 and UTF-16. UTF (Unicode Transformation Format) is a format that changes the way Unicode numbers are represented as bytes to make them easier to process within software. The number in the 〇〇 part is the size, in bits, of the basic unit used to store each number: 8 bits for UTF-8, 16 bits for UTF-16. The most mainstream is UTF-8.
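
To make the difference concrete, the sketch below encodes the same character in three UTF formats and prints the resulting bytes in hexadecimal. The helper name `encoded_forms` is an assumption for illustration, and `bytes.hex(" ")` requires Python 3.8 or later.

```python
def encoded_forms(ch: str) -> dict:
    """Return the bytes of one character in several UTF formats, as hex strings."""
    return {
        name: ch.encode(name).hex(" ")
        for name in ("utf-8", "utf-16-be", "utf-32-be")
    }


# The same Unicode number (U+3042) becomes three different byte sequences.
for name, hex_bytes in encoded_forms("あ").items():
    print(f"{name:10} {hex_bytes}")
```

Because the byte sequences differ, reading a UTF-16 file as UTF-8 (or the reverse) produces garbled characters even though both sides are “Unicode”.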



BE・LE


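BE and LE stand for “Big Endian” and “Little Endian.” They specify the order in which the bytes of a multi-byte unit are stored in formats such as UTF-16 and UTF-32: most significant byte first (BE) or least significant byte first (LE). Which one is used depends on the software you are using and what you are using it for.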
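
A short Python sketch of the byte-order difference follows. The variable names are assumptions for illustration; the codec names `utf-16-be` and `utf-16-le` are part of Python's standard library.

```python
text = "あ"  # U+3042

be = text.encode("utf-16-be")  # bytes 30 42: most significant byte first
le = text.encode("utf-16-le")  # bytes 42 30: least significant byte first

# Reading big-endian bytes as little-endian swaps the two bytes,
# so U+3042 is misread as U+4230, a completely different character.
misread = be.decode("utf-16-le")

print(be.hex(" "), le.hex(" "), f"U+{ord(misread):04X}")
```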



BOM


BOM is an abbreviation for Byte Order Mark. It is a short marker placed at the very beginning of a text to tell the reader how the content is encoded, for example, “The characters written here are UTF-16 BE.”

You need to be careful with the BOM: software that does not expect one often displays the BOM itself as garbled characters. When saving text data, many editors let you choose whether or not to include a BOM.
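
The following sketch shows what happens to a UTF-8 BOM under three different readers. The sample text and variable names are assumptions for illustration; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of Python's standard library.

```python
import codecs

# A UTF-8 file saved "with BOM" starts with the three bytes EF BB BF.
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

kept = raw.decode("utf-8")          # BOM survives as an invisible U+FEFF
stripped = raw.decode("utf-8-sig")  # BOM is recognized and removed
as_cp1252 = raw.decode("cp1252")    # BOM bytes show up as visible garbage

print(repr(kept))
print(repr(stripped))
print(repr(as_cp1252))
```

The third case is the typical “BOM became garbled” symptom: three stray characters at the very start of a file or web page.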



Loading is not done correctly


Another reason for garbled characters is that the page did not load properly. In that case, you may be able to resolve the issue simply by pressing F5 on your keyboard or clicking the browser's reload button. Reloading is also useful when images are not displayed properly or a blank website is displayed.



Why do garbled characters occur in emails?


Garbled characters do not occur only on websites. They can also occur in emails, for the three reasons listed below. In most cases, correcting these will resolve the garbled characters.

  1. Adopts HTML format
  2. Using environment-dependent characters
  3. Encoding conversion is not performed



Adopts HTML format


By using HTML format, you can add colors and decorations to the text. Many people use HTML emails when distributing e-mail newsletters and the like, because the message is no longer limited to plain black text.

However, the HTML format is prone to garbled characters. In many cases, the fix is for the sender to specify the character code “ISO-2022-JP” and for the recipient to change the display settings to Unicode (UTF-8).
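
On the sending side, the key point is that the message declares the character code it actually uses. Below is a minimal sketch using Python's standard `email` package. The function name `build_html_mail`, the subject, and the body are assumptions for illustration, and UTF-8 is used as the default here rather than ISO-2022-JP.

```python
from email.message import EmailMessage


def build_html_mail(subject: str, html_body: str, charset: str = "utf-8") -> EmailMessage:
    """Build an HTML email whose Content-Type header declares its charset."""
    msg = EmailMessage()
    msg["Subject"] = subject
    # Sets "Content-Type: text/html" with a charset parameter and
    # encodes the body with that same charset.
    msg.set_content(html_body, subtype="html", charset=charset)
    return msg


msg = build_html_mail("お知らせ", "<p>こんにちは</p>")
print(msg.get_content_type())     # MIME type declared in the header
print(msg.get_content_charset())  # charset declared in the header
```

When the declared charset and the real encoding of the body agree, the recipient's mail software does not have to guess.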



Using environment-dependent characters


Next, environment-dependent characters are characters that are displayed correctly only on specific devices, so they become garbled on any other device. In these cases, the email recipient cannot fix the problem. Specific examples are introduced later under machine-dependent characters.



Encoding conversion is not performed


Finally, garbled characters also occur if the encoding is not converted to one that the recipient's software expects. Try changing the encoding setting in your email software and see if the garbled characters are resolved.
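
Trying several encodings by hand can also be done in a few lines of code. This is only a sketch: the function name `decode_candidates` and the list of candidate encodings are assumptions, and a successful decode does not prove that the result is the intended text.

```python
def decode_candidates(raw: bytes,
                      encodings=("utf-8", "shift_jis", "euc_jp", "iso2022_jp")) -> dict:
    """Try each candidate encoding and keep the ones that decode without error."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this encoding
    return results


# Bytes as they might arrive from a sender who used Shift_JIS.
raw = "文字化け".encode("shift_jis")

for name, text in decode_candidates(raw).items():
    print(f"{name}: {text}")
```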



Steps to fix garbled characters


Next, I will show you how to fix garbled characters.

  1. Find out the cause
  2. Use a garbled character decoding tool or restoration tool
  3. If characters are garbled when printing

Let’s look at each in turn.



Find out the cause


The causes and solutions for garbled characters vary depending on the website, email, or other case. Therefore, the first thing to do is to check what is causing the garbled characters.

In most cases, you should be able to resolve the issue by implementing the workarounds introduced in this article. Therefore, we recommend that you first refer to the contents of this article, depending on each case.



Use garbled character decoding tools and restoration tools


There may be cases where you have tried solutions on websites, emails, and other cases, but for some reason the issue still does not resolve. In such cases, try using a garbled character decoding tool or a restoration tool.

There are many garbled text decoding tools available for free on the Internet. If you cannot resolve the issue using the solutions in this article, please consider using such tools. We will introduce specific decoding tools later.



If characters are garbled when printing


So far, this article has explained the causes of garbled characters on your computer and what to do about them. However, there are also many cases where the characters are displayed correctly on the computer but are garbled when printed.

In this case, the problem may be resolved by reinstalling the printer driver. Many printers have a test print function, so after reinstalling the driver, try using the test print function to check.

In addition, just as on a computer, you should refrain from using environment-dependent characters in documents you print. If the printer does not support them, the characters will naturally be garbled, so we recommend checking this basic point as well.



What are machine-dependent characters?


Machine-dependent characters (also called model-dependent, device-dependent, or environment-dependent characters) are characters in electronic text that can become garbled due to differences in the OS and environment of devices such as computers and smartphones.

In particular, garbled characters in emails can make the content almost unreadable, so you must be especially careful not to use machine-dependent characters in emails. Typical examples include the following.

  1. Enclosed alphanumeric characters/enclosed characters
  2. Roman numerals
  3. Abbreviation characters
  4. Year/unit symbols
  5. Windows/Mac specific kanji



Enclosed alphanumeric characters/enclosed characters


Enclosed alphanumeric characters are a Unicode block that refers to alphanumeric characters enclosed in circles or parentheses. Enclosed characters also refer to characters that are enclosed in circles, parentheses, etc.

Model-dependent characters that are displayed correctly only on Windows
enclosed alphanumeric characters
Image: Boxed alphanumeric characters
enclosed text
Image: Boxed text
Model-dependent characters that are displayed correctly only on Macintosh
enclosed alphanumeric characters
Image: Boxed alphanumeric characters
enclosed text
Image: Boxed text



Roman numerals


Roman numerals are symbols that use Latin letters to represent numbers. Windows can display only 1 to 10 correctly, whereas Macintosh can display 1 to 15. Both cover uppercase and lowercase forms.

Model-dependent characters that are displayed correctly only on Windows
Image: Roman numerals
Model-dependent characters that are displayed correctly only on Macintosh
Image: Roman numerals



Abbreviation characters


Abbreviation characters come in many varieties. The machine-dependent ones that are displayed correctly on Windows are parenthesized characters such as ㈱ (“stock”), ㈲ (“yu”), and ㈹ (“dai”). As with Roman numerals, Macintosh has a wider variety of characters.

Model-dependent characters that are displayed correctly only on Windows
Image: Abbreviation character
Model-dependent characters that are displayed correctly only on Macintosh
Image: Abbreviation character



Year/unit symbol


Year names (era names) and unit symbols are mainly written as “kumimoji,” in which multiple characters are set in the space of one full-width character.

These symbols have long been used in media such as newspapers, where the number of printed characters is limited. The year symbols that display correctly are the same on Windows and Macintosh, but as with the items above, Macintosh supports more unit symbols, such as millimeters, kilos, and liters.

Model-dependent characters that are displayed correctly only on Windows
Year number
Image: Year number
unit symbol
Image: unit symbol
Model-dependent characters that are displayed correctly only on Macintosh
Year number
Image: Year number
unit symbol
Image: unit symbol



Windows/Mac specific kanji


On Windows, a large number of kanji are machine-dependent characters. Many of them are unlikely to be used in everyday writing, but it does not hurt to be aware of them. On Macintosh, by contrast, almost none of the machine-dependent characters are kanji; they are mostly symbols.

Model-dependent characters that are displayed correctly only on Windows
Image: Windows/Mac specific kanji
Model-dependent characters that are displayed correctly only on Macintosh
Image: Windows/Mac specific kanji



List of garbled character conversion/deciphering/restoration tools


Now, I will introduce tools for converting, decoding, and restoring garbled characters.

  1. Garbled text tester
  2. Garbled text restoration tool – Ikunaga Tools
  3. Garbled text deciphering tool “Moji Bakeratta”



Garbled text tester


Garbled text tester is a tool that generates “artificial garbled characters” from normal sentences. You might wonder why anyone would need to generate garbled characters artificially. According to its author, the tool was originally created for developers and was not intended to be made available to the general public.

However, because non-developers often used it to play with garbled characters, the author created a separate page for the general public. For general use, it is limited to conversion between the character codes “UTF-8” and “Shift_JIS”, which are frequently used and are prone to garbled characters.

Here is how to use it.

1. Enter normal text on the top page of “Character Corruption Tester” (https://tools.m-bsys.com/dev_tools/char_corruption.php) and click “Character Corruption”. The garbled text is then output.

Screenshot: Garbled tester

2. Further below, there is a “Restore garbled characters” field. The restored text appears there automatically, without you having to click a button.

Screenshot: Garbled tester

In this example, the original text was not fully restored and was still displayed corrupted.

In fact, once text is garbled, it is unlikely to be completely restored to its original state; according to the tool's page, only about 60% to 80% of the text can be restored.

Screenshot: Garbled tester

Source: Garbled Tester

There are two types of garbled characters: those in which information is lost and those in which it is not. The parts replaced by “?” (question mark) in the figure above are garbled characters in which information has been lost, and it is basically impossible to restore them.

Which characters lose information is predictable. For example, as in the image above, the full stop at the end of a sentence is unlikely to be recovered. For this reason, it is important to avoid garbled characters as much as possible, especially in emails.
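
The difference between lossless and lossy garbling can be sketched in Python. This is not how the tools above are implemented; the function names are assumptions, and the example only covers UTF-8 text misread as Shift_JIS. In that case bytes are consumed in pairs, so a leftover byte at the end of the text has no partner and is replaced, which is one way the last character of a sentence can be lost.

```python
def garble(text: str) -> str:
    """UTF-8 bytes misread as Shift_JIS; undecodable bytes become U+FFFD."""
    return text.encode("utf-8").decode("shift_jis", errors="replace")


def restore(garbled: str) -> str:
    """Reverse the misreading; anything that cannot be mapped back is replaced."""
    raw = garbled.encode("shift_jis", errors="replace")
    return raw.decode("utf-8", errors="replace")


# Two characters = 6 bytes = 3 complete Shift_JIS pairs: nothing is lost.
print(restore(garble("あい")) == "あい")

# One character = 3 bytes: the third byte has no partner and is lost.
print(restore(garble("あ")) == "あ")
```

Once a byte has been replaced by a placeholder, the original value is gone, so no restoration tool can bring that character back.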



Garbled text restoration tool – Ikunaga Tools


Ikunaga Tools is a tool that restores garbled text. Like the “Garbled text tester” above, it supports the character codes “UTF-8” and “Shift_JIS”, which are prone to garbled characters. It clearly states that the recovery rate for garbled characters is between 60% and 80%, and characters replaced with “?” (question mark) are likewise impossible to recover.

It is easy to use: just paste the garbled text into the box and click “Restore garbled text!” I tried a garbled sentence that contained no question marks to see whether it could be restored to the original.

Screenshot: Garbled text restoration tool - Ikunaga Tools

The restoration was successful.

Screenshot: Garbled text restoration tool - Ikunaga Tools



Garbled text deciphering tool “Moji Bakeratta”


The garbled text decoding tool “Moji Bakeratta” is also a tool for restoring garbled text. It supports many character codes and lets you check the result for each of them. As above, I entered a garbled version of “Albert Einstein” and tried to restore it.

Screenshot: Garbled text decoding tool “Moji Bakeratta”

For some reason, conversion failed for all character codes.

Screenshot: Garbled text decoding tool “Moji Bakeratta”



List of machine-dependent character checking tools


Next, we introduce tools for checking machine-dependent characters.

  1. Model-dependent character checker | Submit! JAPAN
  2. [WEB Tool] Model-dependent character conversion tool
  3. Environment-dependent character check tool│Digital Daishogun



Model-dependent character checker | Submit! JAPAN


The model-dependent character checker is a tool that highlights machine-dependent characters in red when they appear in a sentence. It is easy to use: just enter the text and click “Check for model-dependent characters.”

Screenshot: Model-dependent character checker | Submit! JAPAN

The machine-dependent characters are then flagged as shown below.

Screenshot: Model-dependent character checker | Submit! JAPAN
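
A rough version of such a check can be sketched in Python by looking for characters in a few Unicode blocks that contain many of the symbols described earlier. The block list, the names `RISKY_RANGES` and `find_risky`, and the sample text are assumptions for illustration; this is not the method used by the tool above, and it does not cover Windows extended kanji.

```python
# Unicode ranges that hold many circled, parenthesized, and squared symbols.
RISKY_RANGES = {
    "Enclosed Alphanumerics": (0x2460, 0x24FF),
    "Roman numerals (Number Forms)": (0x2160, 0x217F),
    "Enclosed CJK Letters and Months": (0x3200, 0x32FF),
    "CJK Compatibility": (0x3300, 0x33FF),
}


def find_risky(text: str) -> list:
    """Return (position, character, block name) for each suspicious character."""
    hits = []
    for i, ch in enumerate(text):
        for label, (lo, hi) in RISKY_RANGES.items():
            if lo <= ord(ch) <= hi:
                hits.append((i, ch, label))
    return hits


for position, ch, label in find_risky("①番 ㈱テスト"):
    print(position, ch, label)
```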



[WEB Tool] Model-dependent character conversion tool


The [WEB Tool] model-dependent character conversion tool automatically converts machine-dependent characters into characters that can be displayed anywhere. However, it can convert only the characters listed in its correspondence table, and Windows extended kanji are not supported. The correspondence table of convertible characters is shown below.

Screenshot: [WEB Tool] Model-dependent character conversion tool

To use, paste the text in the “Input text” field and click “Check”.

Screenshot: [WEB Tool] Model-dependent character conversion tool

The device-dependent characters will then be automatically converted to displayable characters.

Screenshot: [WEB Tool] Model-dependent character conversion tool
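
A comparable conversion can be approximated with Unicode compatibility normalization (NFKC), which is available in Python's standard `unicodedata` module. This is a substitute technique shown for illustration, not the tool's own correspondence table, and the function name `to_plain` is an assumption.

```python
import unicodedata


def to_plain(text: str) -> str:
    """Replace compatibility symbols with their plain-character equivalents."""
    return unicodedata.normalize("NFKC", text)


for symbol in ("①", "Ⅲ", "㈱", "㎜", "㍻"):
    print(symbol, "->", to_plain(symbol))
```

Note that NFKC also changes other characters, for example full-width letters and half-width katakana, so it should be applied only where that side effect is acceptable.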



Environment-dependent character check tool│Digital Daishogun


The environment-dependent character check tool is a tool that detects environment-dependent characters.

To use it, simply enter the text in the “Enter text” field and click “Detect environment-dependent characters.”

Screenshot: Environment-dependent character check tool│Digital Taishogun

However, like the “[WEB Tool] Model-dependent character conversion tool”, this tool does not seem to support Windows extended kanji, so it could not detect the kanji.

Screenshot: Environment-dependent character check tool│Digital Taishogun



Summary: Use decoding and restoration tools when needed


In this article, we explained garbled characters. There are various reasons why they occur, such as a page not loading correctly or character codes not being read accurately.

Additionally, garbled characters can occur not only on websites, but also in emails and other environments. When garbled characters occur, be sure to investigate the cause and take appropriate measures. We also recommend using a garbled character decoding tool or restoration tool if necessary.