|
2 | 2 |
|
3 | 3 | # Introduction
|
4 | 4 |
|
5 |
| -Because Rust is a systems programming language, one of its goals is to |
6 |
| -interoperate well with C code. |
| 5 | +This tutorial will use the [snappy](https://code.google.com/p/snappy/) |
| 6 | +compression/decompression library as an introduction to writing bindings for |
| 7 | +foreign code. Rust is currently unable to call directly into a C++ library, but |
| 8 | +snappy includes a C interface (documented in |
| 9 | +[`snappy-c.h`](https://code.google.com/p/snappy/source/browse/trunk/snappy-c.h)). |
7 | 10 |
|
8 |
| -We'll start with an example, which is a bit bigger than usual. We'll |
9 |
| -go over it one piece at a time. This is a program that uses OpenSSL's |
10 |
| -`SHA1` function to compute the hash of its first command-line |
11 |
| -argument, which it then converts to a hexadecimal string and prints to |
12 |
| -standard output. If you have the OpenSSL libraries installed, it |
13 |
| -should compile and run without any extra effort. |
| 11 | +The following is a minimal example of calling a foreign function which will compile if snappy is |
| 12 | +installed: |
14 | 13 |
|
15 | 14 | ~~~~ {.xfail-test}
|
16 |
| -extern mod std; |
17 |
| -use core::libc::c_uint; |
| 15 | +use core::libc::size_t; |
18 | 16 |
|
19 |
| -extern mod crypto { |
20 |
| - fn SHA1(src: *u8, sz: c_uint, out: *u8) -> *u8; |
21 |
| -} |
22 |
| -
|
23 |
| -fn as_hex(data: ~[u8]) -> ~str { |
24 |
| - let mut acc = ~""; |
25 |
| - for data.each |&byte| { acc += fmt!("%02x", byte as uint); } |
26 |
| - return acc; |
27 |
| -} |
28 |
| -
|
29 |
| -fn sha1(data: ~str) -> ~str { |
30 |
| - unsafe { |
31 |
| - let bytes = str::to_bytes(data); |
32 |
| - let hash = crypto::SHA1(vec::raw::to_ptr(bytes), |
33 |
| - vec::len(bytes) as c_uint, |
34 |
| - ptr::null()); |
35 |
| - return as_hex(vec::from_buf(hash, 20)); |
36 |
| - } |
| 17 | +#[link_args = "-lsnappy"] |
| 18 | +extern { |
| 19 | + fn snappy_max_compressed_length(source_length: size_t) -> size_t; |
37 | 20 | }
|
38 | 21 |
|
39 | 22 | fn main() {
|
40 |
| - io::println(sha1(core::os::args()[1])); |
| 23 | + let x = unsafe { snappy_max_compressed_length(100) }; |
| 24 | + println(fmt!("max compressed length of a 100 byte buffer: %?", x)); |
41 | 25 | }
|
42 | 26 | ~~~~
|
43 | 27 |
|
44 |
| -# Foreign modules |
45 |
| - |
46 |
| -Before we can call the `SHA1` function defined in the OpenSSL library, we have |
47 |
| -to declare it. That is what this part of the program does: |
| 28 | +The `extern` block is a list of function signatures in a foreign library, in this case with the |
| 29 | +platform's C ABI. The `#[link_args]` attribute is used to instruct the linker to link against the |
| 30 | +snappy library so the symbols are resolved. |
48 | 31 |
|
49 |
| -~~~~ {.xfail-test} |
50 |
| -extern mod crypto { |
51 |
| - fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8; } |
52 |
| -~~~~ |
| 32 | +Foreign functions are assumed to be unsafe so calls to them need to be wrapped with `unsafe {}` as a |
| 33 | +promise to the compiler that everything contained within truly is safe. C libraries often expose |
| 34 | +interfaces that aren't thread-safe, and almost any function that takes a pointer argument isn't |
| 35 | +valid for all possible inputs since the pointer could be dangling, and raw pointers fall outside of |
| 36 | +Rust's safe memory model. |
53 | 37 |
|
54 |
| -An `extern` module declaration containing function signatures introduces the |
55 |
| -functions listed as _foreign functions_. Foreign functions differ from regular |
56 |
| -Rust functions in that they are implemented in some other language (usually C) |
57 |
| -and called through Rust's foreign function interface (FFI). An extern module |
58 |
| -like this is called a foreign module, and implicitly tells the compiler to |
59 |
| -link with a library that contains the listed foreign functions, and has the |
60 |
| -same name as the module. |
| 38 | +When declaring the argument types to a foreign function, the Rust compiler will not check if the |
| 39 | +declaration is correct, so specifying it correctly is part of keeping the binding correct at |
| 40 | +runtime. |
61 | 41 |
|
62 |
| -In this case, the Rust compiler changes the name `crypto` to a shared library |
63 |
| -name in a platform-specific way (`libcrypto.so` on Linux, for example), |
64 |
| -searches for the shared library with that name, and links the library into the |
65 |
| -program. If you want the module to have a different name from the actual |
66 |
| -library, you can use the `"link_name"` attribute, like: |
| 42 | +The `extern` block can be extended to cover the entire snappy API: |
67 | 43 |
|
68 | 44 | ~~~~ {.xfail-test}
|
69 |
| -#[link_name = "crypto"] |
70 |
| -extern mod something { |
71 |
| - fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8; |
| 45 | +use core::libc::{c_int, size_t}; |
| 46 | +
|
| 47 | +#[link_args = "-lsnappy"] |
| 48 | +extern { |
| 49 | + fn snappy_compress(input: *u8, |
| 50 | + input_length: size_t, |
| 51 | + compressed: *mut u8, |
| 52 | + compressed_length: *mut size_t) -> c_int; |
| 53 | + fn snappy_uncompress(compressed: *u8, |
| 54 | + compressed_length: size_t, |
| 55 | + uncompressed: *mut u8, |
| 56 | + uncompressed_length: *mut size_t) -> c_int; |
| 57 | + fn snappy_max_compressed_length(source_length: size_t) -> size_t; |
| 58 | + fn snappy_uncompressed_length(compressed: *u8, |
| 59 | + compressed_length: size_t, |
| 60 | + result: *mut size_t) -> c_int; |
| 61 | + fn snappy_validate_compressed_buffer(compressed: *u8, |
| 62 | + compressed_length: size_t) -> c_int; |
72 | 63 | }
|
73 | 64 | ~~~~
|
74 | 65 |
|
75 |
| -# Foreign calling conventions |
| 66 | +# Creating a safe interface |
76 | 67 |
|
77 |
| -Most foreign code is C code, which usually uses the `cdecl` calling |
78 |
| -convention, so that is what Rust uses by default when calling foreign |
79 |
| -functions. Some foreign functions, most notably the Windows API, use other |
80 |
| -calling conventions. Rust provides the `"abi"` attribute as a way to hint to |
81 |
| -the compiler which calling convention to use: |
| 68 | +The raw C API needs to be wrapped to provide memory safety and make use of higher-level concepts like |
| 69 | +vectors. A library can choose to expose only the safe, high-level interface and hide the unsafe |
| 70 | +internal details. |
82 | 71 |
|
83 |
| -~~~~ |
84 |
| -#[cfg(target_os = "win32")] |
85 |
| -#[abi = "stdcall"] |
86 |
| -extern mod kernel32 { |
87 |
| - fn SetEnvironmentVariableA(n: *u8, v: *u8) -> int; |
| 72 | +Wrapping the functions which expect buffers involves using the `vec::raw` module to manipulate Rust |
| 73 | +vectors as pointers to memory. Rust's vectors are guaranteed to be a contiguous block of memory. The |
| 74 | +length is the number of elements currently contained, and the capacity is the total size in elements of |
| 75 | +the allocated memory. The length is less than or equal to the capacity. |
| 76 | + |
| 77 | +~~~~ {.xfail-test} |
| 78 | +pub fn validate_compressed_buffer(src: &[u8]) -> bool { |
| 79 | + unsafe { |
| 80 | + snappy_validate_compressed_buffer(vec::raw::to_ptr(src), src.len() as size_t) == 0 |
| 81 | + } |
88 | 82 | }
|
89 | 83 | ~~~~
|
90 | 84 |
|
91 |
| -The `"abi"` attribute applies to a foreign module (it cannot be applied |
92 |
| -to a single function within a module), and must be either `"cdecl"` |
93 |
| -or `"stdcall"`. We may extend the compiler in the future to support other |
94 |
| -calling conventions. |
| 85 | +The `validate_compressed_buffer` wrapper above makes use of an `unsafe` block, but it makes the |
| 86 | +guarantee that calling it is safe for all inputs by leaving off `unsafe` from the function |
| 87 | +signature. |
95 | 88 |
|
96 |
| -# Unsafe pointers |
| 89 | +The `snappy_compress` and `snappy_uncompress` functions are more complex, since a buffer has to be |
| 90 | +allocated to hold the output too. |
97 | 91 |
|
98 |
| -The foreign `SHA1` function takes three arguments, and returns a pointer. |
| 92 | +The `snappy_max_compressed_length` function can be used to allocate a vector with the maximum |
| 93 | +required capacity to hold the compressed output. The vector can then be passed to the |
| 94 | +`snappy_compress` function as an output parameter. A second output parameter retrieves the true |
| 95 | +length after compression, which is then used to set the length of the vector. |
99 | 96 |
|
100 | 97 | ~~~~ {.xfail-test}
|
101 |
| -# extern mod crypto { |
102 |
| -fn SHA1(src: *u8, sz: libc::c_uint, out: *u8) -> *u8; |
103 |
| -# } |
104 |
| -~~~~ |
| 98 | +pub fn compress(src: &[u8]) -> ~[u8] { |
| 99 | + unsafe { |
| 100 | + let srclen = src.len() as size_t; |
| 101 | + let psrc = vec::raw::to_ptr(src); |
105 | 102 |
|
106 |
| -When declaring the argument types to a foreign function, the Rust |
107 |
| -compiler has no way to check whether your declaration is correct, so |
108 |
| -you have to be careful. If you get the number or types of the |
109 |
| -arguments wrong, you're likely to cause a segmentation fault. Or, |
110 |
| -probably even worse, your code will work on one platform, but break on |
111 |
| -another. |
| 103 | + let mut dstlen = snappy_max_compressed_length(srclen); |
| 104 | + let mut dst = vec::with_capacity(dstlen as uint); |
| 105 | + let pdst = vec::raw::to_mut_ptr(dst); |
112 | 106 |
|
113 |
| -In this case, we declare that `SHA1` takes two `unsigned char*` |
114 |
| -arguments and one `unsigned long`. The Rust equivalents are `*u8` |
115 |
| -unsafe pointers and an `uint` (which, like `unsigned long`, is a |
116 |
| -machine-word-sized type). |
| 107 | + snappy_compress(psrc, srclen, pdst, &mut dstlen); |
| 108 | + vec::raw::set_len(&mut dst, dstlen as uint); |
| 109 | + dst |
| 110 | + } |
| 111 | +} |
| 112 | +~~~~ |
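The same output-parameter pattern can be sketched in present-day Rust syntax, which differs from the 2013 syntax in this diff. This is only an illustration under stated assumptions: `fake_transform` is a hypothetical stand-in for a C function such as `snappy_compress`, written in Rust with the C ABI so the sketch builds without snappy installed, and `transform` is a hypothetical wrapper name. Unlike the `compress` wrapper above, the sketch also checks the status code before setting the length.

```rust
use std::os::raw::c_int;

/// Hypothetical stand-in for a foreign function (NOT part of snappy).
/// It upper-cases ASCII bytes into `output` and reports the number of
/// bytes written through `output_length`, returning 0 on success.
///
/// Safety: `input` must be readable for `input_length` bytes, `output`
/// must be writable for `*output_length` bytes, and `output_length`
/// must point to a valid `usize`.
unsafe extern "C" fn fake_transform(
    input: *const u8,
    input_length: usize,
    output: *mut u8,
    output_length: *mut usize,
) -> c_int {
    unsafe {
        if *output_length < input_length {
            return 1; // output buffer too small
        }
        for i in 0..input_length {
            *output.add(i) = (*input.add(i)).to_ascii_uppercase();
        }
        *output_length = input_length;
    }
    0
}

/// Safe wrapper: allocates the output buffer, passes raw pointers to the
/// callee, and only exposes the bytes the callee reported as written.
pub fn transform(src: &[u8]) -> Option<Vec<u8>> {
    // In: capacity of the buffer. Out: number of bytes actually written.
    let mut dst_len = src.len();
    let mut dst: Vec<u8> = Vec::with_capacity(dst_len);

    let status = unsafe {
        fake_transform(src.as_ptr(), src.len(), dst.as_mut_ptr(), &mut dst_len)
    };
    if status != 0 {
        return None;
    }

    // The callee initialised the first `dst_len` bytes, and `dst_len`
    // cannot exceed the capacity, so setting the length is sound.
    unsafe { dst.set_len(dst_len) };
    Some(dst)
}
```

Callers only see `transform(&[u8]) -> Option<Vec<u8>>`; the raw pointers and the `set_len` call stay inside the wrapper.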
117 | 113 |
|
118 |
| -The standard library provides various functions to create unsafe pointers, |
119 |
| -such as those in `core::cast`. Most of these functions have `unsafe` in their |
120 |
| -name. You can dereference an unsafe pointer with the `*` operator, but use |
121 |
| -caution: unlike Rust's other pointer types, unsafe pointers are completely |
122 |
| -unmanaged, so they might point at invalid memory, or be null pointers. |
| 114 | +Decompression is similar, because snappy stores the uncompressed size as part of the compression |
| 115 | +format and `snappy_uncompressed_length` will retrieve the exact buffer size required. |
123 | 116 |
|
124 |
| -# Unsafe blocks |
| 117 | +~~~~ {.xfail-test} |
| 118 | +pub fn uncompress(src: &[u8]) -> Option<~[u8]> { |
| 119 | + unsafe { |
| 120 | + let srclen = src.len() as size_t; |
| 121 | + let psrc = vec::raw::to_ptr(src); |
125 | 122 |
|
126 |
| -The `sha1` function is the most obscure part of the program. |
| 123 | + let mut dstlen: size_t = 0; |
| 124 | + snappy_uncompressed_length(psrc, srclen, &mut dstlen); |
127 | 125 |
|
128 |
| -~~~~ |
129 |
| -# pub mod crypto { |
130 |
| -# pub fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8 { out } |
131 |
| -# } |
132 |
| -# fn as_hex(data: ~[u8]) -> ~str { ~"hi" } |
133 |
| -fn sha1(data: ~str) -> ~str { |
134 |
| - unsafe { |
135 |
| - let bytes = str::to_bytes(data); |
136 |
| - let hash = crypto::SHA1(vec::raw::to_ptr(bytes), |
137 |
| - vec::len(bytes), ptr::null()); |
138 |
| - return as_hex(vec::from_buf(hash, 20)); |
| 126 | + let mut dst = vec::with_capacity(dstlen as uint); |
| 127 | + let pdst = vec::raw::to_mut_ptr(dst); |
| 128 | +
|
| 129 | + if snappy_uncompress(psrc, srclen, pdst, &mut dstlen) == 0 { |
| 130 | + vec::raw::set_len(&mut dst, dstlen as uint); |
| 131 | + Some(dst) |
| 132 | + } else { |
| 133 | + None // SNAPPY_INVALID_INPUT |
| 134 | + } |
139 | 135 | }
|
140 | 136 | }
|
141 | 137 | ~~~~
|
142 | 138 |
|
143 |
| -First, what does the `unsafe` keyword at the top of the function |
144 |
| -mean? `unsafe` is a block modifier—it declares the block following it |
145 |
| -to be known to be unsafe. |
| 139 | +For reference, the examples used here are also available as a [library on |
| 140 | +GitHub](https://github.com/thestinger/rust-snappy). |
146 | 141 |
|
147 |
| -Some operations, like dereferencing unsafe pointers or calling |
148 |
| -functions that have been marked unsafe, are only allowed inside unsafe |
149 |
| -blocks. With the `unsafe` keyword, you're telling the compiler 'I know |
150 |
| -what I'm doing'. The main motivation for such an annotation is that |
151 |
| -when you have a memory error (and you will, if you're using unsafe |
152 |
| -constructs), you have some idea where to look—it will most likely be |
153 |
| -caused by some unsafe code. |
| 142 | +# Linking |
154 | 143 |
|
155 |
| -Unsafe blocks isolate unsafety. Unsafe functions, on the other hand, |
156 |
| -advertise it to the world. An unsafe function is written like this: |
157 |
| - |
158 |
| -~~~~ |
159 |
| -unsafe fn kaboom() { ~"I'm harmless!"; } |
160 |
| -~~~~ |
| 144 | +In addition to the `#[link_args]` attribute for explicitly passing arguments to the linker, an |
| 145 | +`extern mod` block will pass `-lmodname` to the linker by default unless it has a `#[nolink]` |
| 146 | +attribute applied. |
161 | 147 |
|
162 |
| -This function can only be called from an `unsafe` block or another |
163 |
| -`unsafe` function. |
164 |
| - |
165 |
| -# Pointer fiddling |
| 148 | +# Unsafe blocks |
166 | 149 |
|
167 |
| -The standard library defines a number of helper functions for dealing |
168 |
| -with unsafe data, casting between types, and generally subverting |
169 |
| -Rust's safety mechanisms. |
| 150 | +Some operations, like dereferencing unsafe pointers or calling functions that have been marked |
| 151 | +unsafe, are only allowed inside unsafe blocks. Unsafe blocks isolate unsafety and are a promise to |
| 152 | +the compiler that the unsafety does not leak out of the block. |
170 | 153 |
|
171 |
| -Let's look at our `sha1` function again. |
| 154 | +Unsafe functions, on the other hand, advertise it to the world. An unsafe function is written like |
| 155 | +this: |
172 | 156 |
|
173 | 157 | ~~~~
|
174 |
| -# pub mod crypto { |
175 |
| -# pub fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8 { out } |
176 |
| -# } |
177 |
| -# fn as_hex(data: ~[u8]) -> ~str { ~"hi" } |
178 |
| -# fn x(data: ~str) -> ~str { |
179 |
| -# unsafe { |
180 |
| -let bytes = str::to_bytes(data); |
181 |
| -let hash = crypto::SHA1(vec::raw::to_ptr(bytes), |
182 |
| - vec::len(bytes), ptr::null()); |
183 |
| -return as_hex(vec::from_buf(hash, 20)); |
184 |
| -# } |
185 |
| -# } |
| 158 | +unsafe fn kaboom(ptr: *int) -> int { *ptr } |
186 | 159 | ~~~~
|
187 | 160 |
|
188 |
| -The `str::to_bytes` function is perfectly safe: it converts a string to a |
189 |
| -`~[u8]`. The program then feeds this byte array to `vec::raw::to_ptr`, which |
190 |
| -returns an unsafe pointer to its contents. |
191 |
| - |
192 |
| -This pointer will become invalid at the end of the scope in which the vector |
193 |
| -it points to (`bytes`) is valid, so you should be very careful how you use |
194 |
| -it. In this case, the local variable `bytes` outlives the pointer, so we're |
195 |
| -good. |
196 |
| - |
197 |
| -Passing a null pointer as the third argument to `SHA1` makes it use a |
198 |
| -static buffer, and thus save us the effort of allocating memory |
199 |
| -ourselves. `ptr::null` is a generic function that, in this case, returns an |
200 |
| -unsafe null pointer of type `*u8`. (Rust generics are awesome |
201 |
| -like that: they can take the right form depending on the type that they |
202 |
| -are expected to return.) |
203 |
| - |
204 |
| -Finally, `vec::from_buf` builds up a new `~[u8]` from the |
205 |
| -unsafe pointer that `SHA1` returned. SHA1 digests are always |
206 |
| -twenty bytes long, so we can pass `20` for the length of the new |
207 |
| -vector. |
208 |
| - |
209 |
| -# Passing structures |
| 161 | +This function can only be called from an `unsafe` block or another `unsafe` function. |
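The difference between an unsafe function and a safe function that contains an unsafe block can be sketched in present-day Rust syntax (not the 2013 syntax shown in this diff). The names `read_raw` and `read_ref` are hypothetical and chosen only for illustration.

```rust
/// Advertises its unsafety: the caller must guarantee that `ptr` is
/// non-null, aligned, and points to a live `i32`.
pub unsafe fn read_raw(ptr: *const i32) -> i32 {
    unsafe { *ptr }
}

/// Isolates the unsafety: a reference is always valid and aligned, so the
/// contract of `read_raw` holds for every possible argument and the
/// function can be exposed without `unsafe` in its signature.
pub fn read_ref(value: &i32) -> i32 {
    unsafe { read_raw(value as *const i32) }
}
```

`read_ref(&7)` compiles in safe code, while a direct call to `read_raw` is rejected unless it appears inside an `unsafe` block or another `unsafe` function.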
210 | 162 |
|
211 |
| -C functions often take pointers to structs as arguments. Since Rust |
212 |
| -`struct`s are binary-compatible with C structs, Rust programs can call |
213 |
| -such functions directly. |
| 163 | +# Foreign calling conventions |
214 | 164 |
|
215 |
| -This program uses the POSIX function `gettimeofday` to get a |
216 |
| -microsecond-resolution timer. |
| 165 | +Most foreign code exposes a C ABI, and Rust uses the platform's C calling convention by default when |
| 166 | +calling foreign functions. Some foreign functions, most notably the Windows API, use other calling |
| 167 | +conventions. Rust provides the `abi` attribute as a way to hint to the compiler which calling |
| 168 | +convention to use: |
217 | 169 |
|
218 | 170 | ~~~~
|
219 |
| -extern mod std; |
220 |
| -use core::libc::c_ulonglong; |
221 |
| -
|
222 |
| -struct timeval { |
223 |
| - tv_sec: c_ulonglong, |
224 |
| - tv_usec: c_ulonglong |
| 171 | +#[cfg(target_os = "win32")] |
| 172 | +#[abi = "stdcall"] |
| 173 | +extern mod kernel32 { |
| 174 | + fn SetEnvironmentVariableA(n: *u8, v: *u8) -> int; |
225 | 175 | }
|
| 176 | +~~~~ |
226 | 177 |
|
227 |
| -#[nolink] |
228 |
| -extern mod lib_c { |
229 |
| - fn gettimeofday(tv: *mut timeval, tz: *()) -> i32; |
230 |
| -} |
231 |
| -fn unix_time_in_microseconds() -> u64 { |
232 |
| - unsafe { |
233 |
| - let mut x = timeval { |
234 |
| - tv_sec: 0 as c_ulonglong, |
235 |
| - tv_usec: 0 as c_ulonglong |
236 |
| - }; |
237 |
| - lib_c::gettimeofday(&mut x, ptr::null()); |
238 |
| - return (x.tv_sec as u64) * 1000_000_u64 + (x.tv_usec as u64); |
239 |
| - } |
240 |
| -} |
| 178 | +The `abi` attribute applies to a foreign module (it cannot be applied to a single function within a |
| 179 | +module), and must be either `"cdecl"` or `"stdcall"`. The compiler may eventually support other |
| 180 | +calling conventions. |
241 | 181 |
|
242 |
| -# fn main() { assert!(fmt!("%?", unix_time_in_microseconds()) != ~""); } |
243 |
| -~~~~ |
| 182 | +# Interoperability with foreign code |
244 | 183 |
|
245 |
| -The `#[nolink]` attribute indicates that there's no foreign library to |
246 |
| -link in. The standard C library is already linked with Rust programs. |
| 184 | +Rust guarantees that the layout of a `struct` is compatible with the platform's representation in C. |
| 185 | +A `#[packed]` attribute is available, which will lay out the struct members without padding. |
| 186 | +However, there are currently no guarantees about the layout of an `enum`. |
247 | 187 |
|
248 |
| -In C, a `timeval` is a struct with two 32-bit integer fields. Thus, we |
249 |
| -define a `struct` type with the same contents, and declare |
250 |
| -`gettimeofday` to take a pointer to such a `struct`. |
| 188 | +Rust's owned and managed boxes use non-nullable pointers as handles which point to the contained |
| 189 | +object. However, they should not be manually created because they are managed by internal allocators. |
| 190 | +Borrowed pointers can safely be assumed to be non-nullable pointers directly to the type. However, |
| 191 | +breaking the borrow checking or mutability rules is not guaranteed to be safe, so prefer using raw |
| 192 | +pointers (`*`) if that's needed because the compiler can't make as many assumptions about them. |
251 | 193 |
|
252 |
| -This program does not use the second argument to `gettimeofday` (the time |
253 |
| - zone), so the `extern mod` declaration for it simply declares this argument |
254 |
| - to be a pointer to the unit type (written `()`). Since all null pointers have |
255 |
| - the same representation regardless of their referent type, this is safe. |
| 194 | +Vectors and strings share the same basic memory layout, and utilities are available in the `vec` and |
| 195 | +`str` modules for working with C APIs. Strings are terminated with `\0` for interoperability with C, |
| 196 | +but this termination should not be assumed, because a slice will not always be nul-terminated. Instead, the |
| 197 | +`str::as_c_str` function should be used. |
256 | 198 |
|
| 199 | +The standard library includes type aliases and function definitions for the C standard library in |
| 200 | +the `libc` module, and Rust links against `libc` and `libm` by default. |
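For comparison, the layout and string points above can be sketched in present-day Rust, where several details differ from the 2013 text in this diff: C-compatible struct layout is requested explicitly with `#[repr(C)]`, packing is written `#[repr(C, packed)]`, and nul-terminated strings are built with `std::ffi::CString`. The type and function names are hypothetical, and the sizes in the comments assume a platform where `u32` has 4-byte alignment.

```rust
use std::ffi::CString;
use std::mem::size_of;

/// Laid out like the equivalent C struct, including padding after `tag`.
#[repr(C)]
pub struct Pair {
    pub tag: u8,
    pub value: u32,
}

/// Same fields, laid out without padding between members.
#[repr(C, packed)]
pub struct PackedPair {
    pub tag: u8,
    pub value: u32,
}

/// Sizes of the padded and packed structs, in bytes.
/// Assumes 4-byte alignment for `u32`: 1 + 3 (padding) + 4 versus 1 + 4.
pub fn sizes() -> (usize, usize) {
    (size_of::<Pair>(), size_of::<PackedPair>())
}

/// Builds an owned, nul-terminated copy of `s` for passing to C.
/// Returns `None` if `s` contains an interior nul byte, which a C string
/// cannot represent.
pub fn to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}
```

The pointer returned by `CString::as_ptr` is valid only while the `CString` is alive, so the owned value has to outlive the foreign call that uses it.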