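In Java, the number of bytes a string occupies depends directly on the character encoding used to encode it.
Encoding in Java touches on three things: the IDE (source file) encoding, the operating system's default encoding, and the encoding applied to Java strings.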
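For example, when writing Java in Eclipse you can set the encoding in the project properties; if you do not, it defaults to the operating system's encoding. The encoding set there is the encoding of the source files. Likewise, when writing Java in vim you can set the LANG environment variable, for example zh_CN.UTF-8 or zh_CN.GB18030, and the source file is then saved in the encoding LANG specifies (unless vim's own fileencoding setting says otherwise). This is the IDE encoding, and it matters: if a Java source file is encoded in UTF-8 but the IDE is set to GB18030, the file is displayed as garbled text.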
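The mismatch described above can be reproduced without any IDE. The following is a minimal sketch, not anything the tools themselves run: the class name `MojibakeDemo` and the helper `decodeAs` are made up for illustration. It encodes the string as UTF-8, the way a UTF-8 source file would store it, and then decodes those bytes as GB18030, the way a misconfigured editor would.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Decode raw file bytes the way an editor configured for viewerCharset would.
    // (Hypothetical helper, named for this example only.)
    static String decodeAs(byte[] fileBytes, String viewerCharset) {
        return new String(fileBytes, Charset.forName(viewerCharset));
    }

    public static void main(String[] args) {
        // "中国abc", spelled with Unicode escapes so this file's own encoding cannot interfere
        String original = "\u4e2d\u56fdabc";
        byte[] utf8File = original.getBytes(StandardCharsets.UTF_8);

        // Same bytes, wrong charset: the text no longer matches.
        System.out.println(original.equals(decodeAs(utf8File, "GB18030"))); // false
        // Same bytes, matching charset: the text survives.
        System.out.println(original.equals(decodeAs(utf8File, "UTF-8")));   // true
    }
}
```

The bytes on disk never change in this sketch; only the charset used to interpret them does, which is the whole of the "garbled display" problem.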
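The encoding of Java strings here means the encoding applied when a string is converted to bytes. For example, the following program, run on Windows 7, computes the number of bytes a string occupies: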
public class Charset {
    public static void main(String[] args) {
        String msg = "中国abc";
        System.out.println(msg);
        int len = msg.getBytes().length; // encoded with the platform default charset
        System.out.println(len);
        try {
            len = msg.getBytes("GB2312").length; // prints 7
            System.out.println("GB2312: " + len);
            len = msg.getBytes("GBK").length; // prints 7
            System.out.println("GBK: " + len);
            len = msg.getBytes("GB18030").length; // prints 7 = 2*2+3: 2 bytes per Chinese character here, 1 per English letter
            System.out.println("GB18030: " + len);
            len = msg.getBytes("UTF-8").length; // prints 9 = 2*3+3: 3 bytes per Chinese character here, 1 per English letter
            System.out.println("UTF-8: " + len);
            len = msg.getBytes("UTF-16").length; // prints 12
            System.out.println("UTF-16: " + len);
            len = msg.getBytes("UTF-32").length; // prints 20
            System.out.println("UTF-32: " + len);
            len = msg.getBytes("Unicode").length; // prints 12
            System.out.println("Unicode: " + len);
        } catch (java.io.UnsupportedEncodingException e) {
            // Thrown by getBytes(String) when the named charset is not available
            System.out.println(e.getMessage());
        }
    }
}
The program prints:
中国abc
7
GB2312: 7
GBK: 7
GB18030: 7
UTF-8: 9
UTF-16: 12
UTF-32: 20
Unicode: 12
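The UTF-16 figure can be accounted for with the same kind of arithmetic as the comments in the program. The sketch below (the class name `BomDemo` and its helper are made up for illustration) relies on the documented behaviour of Java's `UTF-16` charset: when encoding, it writes a 2-byte big-endian byte order mark (FE FF) first, so 5 characters come to 2 + 5*2 = 12 bytes, while `UTF-16BE` and `UTF-16LE` write no mark and come to 10. The 20 reported for UTF-32 above is simply 5*4, with no mark.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {
    // True if the encoded bytes begin with the big-endian byte order mark FE FF.
    // (Hypothetical helper, named for this example only.)
    static boolean startsWithBigEndianBom(byte[] encoded) {
        return encoded.length >= 2
                && (encoded[0] & 0xFF) == 0xFE
                && (encoded[1] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        String msg = "\u4e2d\u56fdabc"; // "中国abc": 5 characters
        byte[] utf16 = msg.getBytes(StandardCharsets.UTF_16);

        System.out.println(startsWithBigEndianBom(utf16));                    // true
        System.out.println(utf16.length);                                     // 12 = 2 (BOM) + 5*2
        System.out.println(msg.getBytes(StandardCharsets.UTF_16BE).length);   // 10 = 5*2
        System.out.println(msg.getBytes(StandardCharsets.UTF_16LE).length);   // 10 = 5*2
    }
}
```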
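The value of len = msg.getBytes().length is 7 because the default encoding on Simplified Chinese Windows 7 is GBK (GB2312, GBK and GB18030 all give the same result for this string). Called with no argument, getBytes() encodes with the platform default charset, so the string occupies 7 bytes.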
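Since the no-argument `getBytes()` depends on the environment, it can help to print the default charset and to pass a charset explicitly wherever the byte count matters. This is a minimal sketch with a made-up class name (`DefaultCharsetDemo`) and helper (`byteCount`). Note as an aside that on JDK 18 and later the default charset is UTF-8 unless overridden, so the first number printed by the original program would be 9 there even on Windows.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // Number of bytes s occupies when encoded with an explicitly chosen charset.
    // (Hypothetical helper, named for this example only.)
    static int byteCount(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String msg = "\u4e2d\u56fdabc"; // "中国abc"

        // What getBytes() with no argument will use on this machine.
        Charset platformDefault = Charset.defaultCharset();
        System.out.println("default charset: " + platformDefault);

        // The no-argument call is just the explicit call with the default charset.
        System.out.println(msg.getBytes().length == byteCount(msg, platformDefault)); // true

        // Explicit charsets give the same answer on every machine.
        System.out.println(byteCount(msg, StandardCharsets.UTF_8));       // 9
        System.out.println(byteCount(msg, Charset.forName("GB18030")));   // 7
    }
}
```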
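If the same program is run on a Linux machine instead (here through an XShell terminal session), the result depends on LANG: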
[zhankunlin@IctHTC javatest]$ export LANG=zh_CN.GB18030
[zhankunlin@IctHTC javatest]$ vim Charset.java   (the source file is written with LANG=zh_CN.GB18030, so the file itself is encoded in GB18030)
[zhankunlin@IctHTC javatest]$ javac Charset.java
[zhankunlin@IctHTC javatest]$ java Charset   (LANG=zh_CN.GB18030, so the system default encoding is GB18030)
中国abc
7   (the system default encoding is GB18030, so the string occupies 7 bytes)
GB2312: 7
GBK: 7
GB18030: 7
UTF-8: 9
UTF-16: 12
UTF-32: 20
Unicode: 12
[zhankunlin@IctHTC javatest]$ export LANG=zh_CN.UTF-8   (change the system encoding to UTF-8)
[zhankunlin@IctHTC javatest]$ java Charset
涓..abc   (the XShell terminal encoding has not been set to UTF-8, so the output is garbled)
9   (the system encoding is now UTF-8, so the string occupies 9 bytes)
GB2312: 7
GBK: 7
GB18030: 7
UTF-8: 9
UTF-16: 12
UTF-32: 20
Unicode: 12
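The garbled first line of the second run comes from a mismatch between the bytes the JVM writes to standard output and the charset the terminal uses to decode them. The sketch below is only an illustration under that assumption: the class name `StdoutEncodingDemo` and the helper `printedBytes` are made up, and a `ByteArrayOutputStream` stands in for the terminal so the bytes can be counted.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class StdoutEncodingDemo {
    // Bytes a PrintStream emits for text when it encodes with charsetName.
    // (Hypothetical helper, named for this example only.)
    static byte[] printedBytes(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charsetName)) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String msg = "\u4e2d\u56fdabc"; // "中国abc"
        // With a GB18030 default the terminal receives 7 bytes, with UTF-8 it receives 9.
        System.out.println(printedBytes(msg, "GB18030").length); // 7
        System.out.println(printedBytes(msg, "UTF-8").length);   // 9
    }
}
```

A terminal that still expects GB18030 will misread the 9 UTF-8 bytes, which matches what the transcript shows. One option, if the terminal is known to be UTF-8, is to wrap standard output once with `System.setOut(new PrintStream(System.out, true, "UTF-8"))` so the program no longer depends on LANG for its output.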
(Set the XShell terminal encoding to UTF-8.)
[zhankunlin@IctHTC javatest]$ java Charset
中国abc   (prints correctly)
9
GB2312: 7
GBK: 7
GB18030: 7
UTF-8: 9
UTF-16: 12
UTF-32: 20
Unicode: 12
[zhankunlin@IctHTC javatest]$ vim Charset.java
[zhankunlin@IctHTC javatest]$ javac Charset.java   (the source file is encoded in GB18030 but the system encoding at compile time is UTF-8; with no encoding specified, the compiler reads the source file using the system encoding, hence the warnings. The warning text is itself garbled in this capture; it appears to be javac's "unmappable character for encoding UTF8" message.)
Charset.java:6: 璀?.锛.??.UTF8 ?.??..灏..绗
        String msg = "锟叫癸拷abc";
                      ^
Charset.java:6: 璀?.锛.??.UTF8 ?.??..灏..绗
        String msg = "锟叫癸拷abc";
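The warning can be understood as a strict decoding check failing. The following sketch is not how javac is implemented; it is an illustration with made-up names (`SourceEncodingCheck`, `decodesCleanly`) that uses a `CharsetDecoder` configured to report errors instead of silently replacing bad bytes. The GB18030 bytes of the string literal are not valid UTF-8, so decoding them as UTF-8 fails.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class SourceEncodingCheck {
    // True if bytes form valid text in the assumed charset, false on the first bad sequence.
    // (Hypothetical helper, named for this example only.)
    static boolean decodesCleanly(byte[] bytes, Charset assumed) {
        try {
            assumed.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String msg = "\u4e2d\u56fdabc"; // "中国abc"
        Charset gb18030 = Charset.forName("GB18030");
        byte[] sourceBytes = msg.getBytes(gb18030); // how the literal is stored in a GB18030 file

        System.out.println(decodesCleanly(sourceBytes, StandardCharsets.UTF_8)); // false
        System.out.println(decodesCleanly(sourceBytes, gb18030));                // true
    }
}
```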
[zhankunlin@IctHTC javatest]$ javac -encoding gb18030 Charset.java   (the -encoding option tells the compiler the encoding of the source file, so compilation succeeds without warnings)
[zhankunlin@IctHTC javatest]$ java Charset   (prints correctly, because the XShell terminal encoding has already been set to UTF-8)
中国abc
9
GB2312: 7
GBK: 7
GB18030: 7
UTF-8: 9
UTF-16: 12
UTF-32: 20
Unicode: 12
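Besides passing -encoding, a string literal can be made independent of the source file encoding by writing the non-ASCII characters as Unicode escapes, so the file contains only ASCII. This is an alternative the original walkthrough does not use; the sketch below shows it under that caveat, with a made-up class name (`EscapedLiteralDemo`).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EscapedLiteralDemo {
    // Pure-ASCII spelling of the literal: U+4E2D, U+56FD, then "abc".
    static final String MSG = "\u4e2d\u56fdabc";

    public static void main(String[] args) {
        // The compiler sees the same five characters whatever charset it reads the file with.
        System.out.println(MSG.length());                                    // 5
        System.out.println(MSG.getBytes(StandardCharsets.UTF_8).length);     // 9
        System.out.println(MSG.getBytes(Charset.forName("GB18030")).length); // 7
    }
}
```

The trade-off is readability: escapes are robust but hard to read, so for files with a lot of non-ASCII text, fixing the encoding once with -encoding (or keeping every tool on UTF-8) is usually the more practical choice.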