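Bytes of a String in Java
In Java, if I have a String x,
how can I calculate the number of bytes in that string?
A string is a list of characters (i.e. code points). The number of bytes taken to represent the string depends entirely on which encoding you use to turn it into bytes.
That said, you can turn the string into a byte array and then look at its size as follows: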
// The input string for this test
final String string = "Hello World";
// Check length, in characters
System.out.println(string.length()); // prints "11"
// Check encoded sizes
final byte[] utf8Bytes = string.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "11"
final byte[] utf16Bytes = string.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "24"
final byte[] utf32Bytes = string.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "44"
final byte[] isoBytes = string.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "11"
final byte[] winBytes = string.getBytes("CP1252");
System.out.println(winBytes.length); // prints "11"
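So you see, even a simple "ASCII" string can have a different number of bytes in its representation, depending on which encoding is used. Use whichever character set you're interested in for your case as the argument to getBytes(). And don't fall into the trap of assuming that UTF-8 represents every character as a single byte, as that is not true either: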
// (Separate snippet from the one above; the variable names are reused.)
final String interesting = "\uF93D\uF936\uF949\uF942"; // Chinese ideograms
// Check length, in characters
System.out.println(interesting.length()); // prints "4"
// Check encoded sizes
final byte[] utf8Bytes = interesting.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "12"
final byte[] utf16Bytes = interesting.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "10"
final byte[] utf32Bytes = interesting.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "16"
final byte[] isoBytes = interesting.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "4" (probably encoded "????")
final byte[] winBytes = interesting.getBytes("CP1252");
System.out.println(winBytes.length); // prints "4" (probably encoded "????")
(Note that if you don't provide a character set argument, the platform's default character set is used. This might be useful in some contexts, but in general you should avoid depending on defaults, and always use an explicit character set when encoding/decoding is required.)
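As a rough sketch of that advice: the getBytes overload that takes a java.nio.charset.Charset lets you name the encoding explicitly and also avoids the checked UnsupportedEncodingException thrown by the String-name overload. The class and method names here (EncodedLength, byteLength) are made up for illustration and are not part of the original answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only to illustrate explicit charsets.
class EncodedLength {
    // Number of bytes needed to encode s in the given charset.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "Hello World";
        System.out.println(byteLength(s, StandardCharsets.UTF_8));      // prints "11"
        System.out.println(byteLength(s, StandardCharsets.UTF_16));     // prints "24" (includes a 2-byte BOM)
        System.out.println(byteLength(s, StandardCharsets.ISO_8859_1)); // prints "11"
        // Platform-dependent: the result varies with the JVM's default charset.
        System.out.println(byteLength(s, Charset.defaultCharset()));
    }
}
```

StandardCharsets only covers the charsets every Java platform must support; for something like CP1252 you would still need Charset.forName("windows-1252") or the String-name overload.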
If you're running with 64-bit references (this layout matches older JDKs, where String still had offset and count fields):
sizeof(string) =
8 + // object header used by the VM
8 + // 64-bit reference to char array (value)
8 + string.length() * 2 + // character array itself (object header + 16-bit chars)
4 + // offset integer
4 + // count integer
4 // cached hash code
In other words:
sizeof(string) = 36 + string.length() * 2
On a 32-bit VM or a 64-bit VM with compressed OOPs (-XX:+UseCompressedOops), the references are 4 bytes. So the total would be:
sizeof(string) = 32 + string.length() * 2
This does not take into account the references to the string object.
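The arithmetic above can be written out as a small helper. This is only a sketch that evaluates the formula as stated in this answer; it does not measure anything on a real JVM, and the class and method names (StringSizeEstimate, estimate) are hypothetical.

```java
// Hypothetical helper: evaluates the formula from this answer, nothing more.
class StringSizeEstimate {
    // compressedOops = true models 4-byte references (32-bit VM or -XX:+UseCompressedOops).
    static long estimate(String s, boolean compressedOops) {
        long objectHeader = 8;                       // object header used by the VM
        long valueReference = compressedOops ? 4 : 8; // reference to the char array
        long charArray = 8 + 2L * s.length();        // array header + 16-bit chars
        long intFields = 4 + 4 + 4;                  // offset, count, cached hash code
        return objectHeader + valueReference + charArray + intFields;
    }

    public static void main(String[] args) {
        System.out.println(estimate("Hello World", false)); // prints "58" (36 + 11 * 2)
        System.out.println(estimate("Hello World", true));  // prints "54" (32 + 11 * 2)
    }
}
```

Actual footprints also depend on object alignment and on the JDK version, so treat these numbers as an estimate under the answer's assumptions.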
The pedantic answer (though not necessarily the most useful one, depending on what you want to do with the result) is:
string.length() * 2
Java strings are logically sequences of UTF-16 code units, which take 2 bytes each (historically they were stored in a char[]; since Java 9, compact strings may store Latin-1-only text in one byte per character), and String.length() measures the length in UTF-16 code units, so this is equivalent to:
final byte[] utf16Bytes = string.getBytes("UTF-16BE");
System.out.println(utf16Bytes.length);
And this will tell you the size of the internal char array, in bytes, when the string is stored as 16-bit chars.
Note: "UTF-16" will give a different result from "UTF-16BE", as the former encoding will insert a BOM, adding 2 bytes to the length of the array.
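A short sketch of the BOM difference, and of how length() counts code units and not characters. The class and method names (Utf16Lengths, utf16beLength, utf16Length) are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class comparing UTF-16 (with BOM) and UTF-16BE (without).
class Utf16Lengths {
    static int utf16beLength(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16).length;
    }

    public static void main(String[] args) {
        String s = "Hello World";
        System.out.println(s.length() * 2);   // prints "22"
        System.out.println(utf16beLength(s)); // prints "22" (no BOM)
        System.out.println(utf16Length(s));   // prints "24" (2-byte BOM + 22)

        // U+1F600 lies outside the BMP, so it is stored as a surrogate pair.
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length());                            // prints "2" (code units)
        System.out.println(emoji.codePointCount(0, emoji.length()));   // prints "1" (code points)
        System.out.println(utf16beLength(emoji));                      // prints "4"
    }
}
```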
According to How to convert Strings to and from UTF8 byte arrays in Java:
String s = "some text here";
byte[] b = s.getBytes("UTF-8");
System.out.println(b.length);
A String instance allocates a certain number of bytes in memory. Maybe you're looking for something like sizeof("Hello World"), which would return the number of bytes allocated by the data structure itself?
In Java, there's usually no need for a sizeof function, because we never explicitly allocate memory to store a data structure. We can have a look at the String.java file for a rough estimate, and we see some int fields, some references and a char[]. The Java Language Specification defines that a char ranges from 0 to 65535, so two bytes are sufficient to keep a single char in memory. But a JVM does not have to store one char in 2 bytes; it only has to guarantee that the implementation of char can hold values of the defined range.
So sizeof really does not make much sense in Java. But, assuming that we have a large String and one char occupies two bytes, the memory footprint of a String object is at least 2 * str.length() bytes.
There's a method called getBytes(). Use it wisely.
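Try this:
Bytes.toBytes(x).length
This assumes you declared and initialized x beforehand. (Bytes is not a JDK class; the answer presumably refers to a third-party utility such as HBase's org.apache.hadoop.hbase.util.Bytes.)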
Reference URL: https://stackoverflow.com/questions/4385623/bytes-of-a-string-in-java