Commit 5d5e1a6

auto merge of #5849 : thestinger/rust/ffi, r=brson
The code samples are xfail'ed because the buildbot won't have `libsnappy.so`, and I don't want to add boilerplate to all the snippets (but they're directly from my snappy bindings, so I'll just send a pull request whenever they break). It works really well as an example because it's a tiny library and lets the caller manage all the buffers. I think everything stated in the `Interoperability with foreign code` section is accurate, but it deserves a thorough double-check :).
2 parents 82a8815 + 1faa359 commit 5d5e1a6

1 file changed: +139 -195 lines changed

doc/tutorial-ffi.md

+139-195
@@ -2,255 +2,199 @@
22

33
# Introduction
44

5-
Because Rust is a systems programming language, one of its goals is to
6-
interoperate well with C code.
5+
This tutorial will use the [snappy](https://code.google.com/p/snappy/)
6+
compression/decompression library as an introduction to writing bindings for
7+
foreign code. Rust is currently unable to call directly into a C++ library, but
8+
snappy includes a C interface (documented in
9+
[`snappy-c.h`](https://code.google.com/p/snappy/source/browse/trunk/snappy-c.h)).
710

8-
We'll start with an example, which is a bit bigger than usual. We'll
9-
go over it one piece at a time. This is a program that uses OpenSSL's
10-
`SHA1` function to compute the hash of its first command-line
11-
argument, which it then converts to a hexadecimal string and prints to
12-
standard output. If you have the OpenSSL libraries installed, it
13-
should compile and run without any extra effort.
11+
The following is a minimal example of calling a foreign function which will compile if snappy is
12+
installed:
1413

1514
~~~~ {.xfail-test}
16-
extern mod std;
17-
use core::libc::c_uint;
15+
use core::libc::size_t;
1816
19-
extern mod crypto {
20-
fn SHA1(src: *u8, sz: c_uint, out: *u8) -> *u8;
21-
}
22-
23-
fn as_hex(data: ~[u8]) -> ~str {
24-
let mut acc = ~"";
25-
for data.each |&byte| { acc += fmt!("%02x", byte as uint); }
26-
return acc;
27-
}
28-
29-
fn sha1(data: ~str) -> ~str {
30-
unsafe {
31-
let bytes = str::to_bytes(data);
32-
let hash = crypto::SHA1(vec::raw::to_ptr(bytes),
33-
vec::len(bytes) as c_uint,
34-
ptr::null());
35-
return as_hex(vec::from_buf(hash, 20));
36-
}
17+
#[link_args = "-lsnappy"]
18+
extern {
19+
fn snappy_max_compressed_length(source_length: size_t) -> size_t;
3720
}
3821
3922
fn main() {
40-
io::println(sha1(core::os::args()[1]));
23+
let x = unsafe { snappy_max_compressed_length(100) };
24+
println(fmt!("max compressed length of a 100 byte buffer: %?", x));
4125
}
4226
~~~~
4327

44-
# Foreign modules
45-
46-
Before we can call the `SHA1` function defined in the OpenSSL library, we have
47-
to declare it. That is what this part of the program does:
28+
The `extern` block is a list of function signatures in a foreign library, in this case with the
29+
platform's C ABI. The `#[link_args]` attribute is used to instruct the linker to link against the
30+
snappy library so the symbols are resolved.
4831

49-
~~~~ {.xfail-test}
50-
extern mod crypto {
51-
fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8; }
52-
~~~~
32+
Foreign functions are assumed to be unsafe, so calls to them need to be wrapped with `unsafe {}` as a
33+
promise to the compiler that everything contained within truly is safe. C libraries often expose
34+
interfaces that aren't thread-safe, and almost any function that takes a pointer argument isn't
35+
valid for all possible inputs since the pointer could be dangling, and raw pointers fall outside of
36+
Rust's safe memory model.
5337

54-
An `extern` module declaration containing function signatures introduces the
55-
functions listed as _foreign functions_. Foreign functions differ from regular
56-
Rust functions in that they are implemented in some other language (usually C)
57-
and called through Rust's foreign function interface (FFI). An extern module
58-
like this is called a foreign module, and implicitly tells the compiler to
59-
link with a library that contains the listed foreign functions, and has the
60-
same name as the module.
38+
When declaring the argument types to a foreign function, the Rust compiler will not check if the
39+
declaration is correct, so specifying it correctly is part of keeping the binding correct at
40+
runtime.
6141

62-
In this case, the Rust compiler changes the name `crypto` to a shared library
63-
name in a platform-specific way (`libcrypto.so` on Linux, for example),
64-
searches for the shared library with that name, and links the library into the
65-
program. If you want the module to have a different name from the actual
66-
library, you can use the `"link_name"` attribute, like:
42+
The `extern` block can be extended to cover the entire snappy API:
6743

6844
~~~~ {.xfail-test}
69-
#[link_name = "crypto"]
70-
extern mod something {
71-
fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
45+
use core::libc::{c_int, size_t};
46+
47+
#[link_args = "-lsnappy"]
48+
extern {
49+
fn snappy_compress(input: *u8,
50+
input_length: size_t,
51+
compressed: *mut u8,
52+
compressed_length: *mut size_t) -> c_int;
53+
fn snappy_uncompress(compressed: *u8,
54+
compressed_length: size_t,
55+
uncompressed: *mut u8,
56+
uncompressed_length: *mut size_t) -> c_int;
57+
fn snappy_max_compressed_length(source_length: size_t) -> size_t;
58+
fn snappy_uncompressed_length(compressed: *u8,
59+
compressed_length: size_t,
60+
result: *mut size_t) -> c_int;
61+
fn snappy_validate_compressed_buffer(compressed: *u8,
62+
compressed_length: size_t) -> c_int;
7263
}
7364
~~~~
7465

75-
# Foreign calling conventions
66+
# Creating a safe interface
7667

77-
Most foreign code is C code, which usually uses the `cdecl` calling
78-
convention, so that is what Rust uses by default when calling foreign
79-
functions. Some foreign functions, most notably the Windows API, use other
80-
calling conventions. Rust provides the `"abi"` attribute as a way to hint to
81-
the compiler which calling convention to use:
68+
The raw C API needs to be wrapped to provide memory safety and make use of higher-level concepts like
69+
vectors. A library can choose to expose only the safe, high-level interface and hide the unsafe
70+
internal details.
8271

83-
~~~~
84-
#[cfg(target_os = "win32")]
85-
#[abi = "stdcall"]
86-
extern mod kernel32 {
87-
fn SetEnvironmentVariableA(n: *u8, v: *u8) -> int;
72+
Wrapping the functions which expect buffers involves using the `vec::raw` module to manipulate Rust
73+
vectors as pointers to memory. Rust's vectors are guaranteed to be a contiguous block of memory. The
74+
length is the number of elements currently contained, and the capacity is the total size in elements of
75+
the allocated memory. The length is less than or equal to the capacity.
76+
77+
~~~~ {.xfail-test}
78+
pub fn validate_compressed_buffer(src: &[u8]) -> bool {
79+
unsafe {
80+
snappy_validate_compressed_buffer(vec::raw::to_ptr(src), src.len() as size_t) == 0
81+
}
8882
}
8983
~~~~
9084

91-
The `"abi"` attribute applies to a foreign module (it cannot be applied
92-
to a single function within a module), and must be either `"cdecl"`
93-
or `"stdcall"`. We may extend the compiler in the future to support other
94-
calling conventions.
85+
The `validate_compressed_buffer` wrapper above makes use of an `unsafe` block, but it makes the
86+
guarantee that calling it is safe for all inputs by leaving off `unsafe` from the function
87+
signature.
9588

96-
# Unsafe pointers
89+
The `snappy_compress` and `snappy_uncompress` functions are more complex, since a buffer has to be
90+
allocated to hold the output too.
9791

98-
The foreign `SHA1` function takes three arguments, and returns a pointer.
92+
The `snappy_max_compressed_length` function can be used to allocate a vector with the maximum
93+
required capacity to hold the compressed output. The vector can then be passed to the
94+
`snappy_compress` function as an output parameter. An output parameter is also passed to retrieve
95+
the true length after compression, which is then used to set the vector's length.
9996

10097
~~~~ {.xfail-test}
101-
# extern mod crypto {
102-
fn SHA1(src: *u8, sz: libc::c_uint, out: *u8) -> *u8;
103-
# }
104-
~~~~
98+
pub fn compress(src: &[u8]) -> ~[u8] {
99+
unsafe {
100+
let srclen = src.len() as size_t;
101+
let psrc = vec::raw::to_ptr(src);
105102
106-
When declaring the argument types to a foreign function, the Rust
107-
compiler has no way to check whether your declaration is correct, so
108-
you have to be careful. If you get the number or types of the
109-
arguments wrong, you're likely to cause a segmentation fault. Or,
110-
probably even worse, your code will work on one platform, but break on
111-
another.
103+
let mut dstlen = snappy_max_compressed_length(srclen);
104+
let mut dst = vec::with_capacity(dstlen as uint);
105+
let pdst = vec::raw::to_mut_ptr(dst);
112106
113-
In this case, we declare that `SHA1` takes two `unsigned char*`
114-
arguments and one `unsigned long`. The Rust equivalents are `*u8`
115-
unsafe pointers and an `uint` (which, like `unsigned long`, is a
116-
machine-word-sized type).
107+
snappy_compress(psrc, srclen, pdst, &mut dstlen);
108+
vec::raw::set_len(&mut dst, dstlen as uint);
109+
dst
110+
}
111+
}
112+
~~~~
117113

118-
The standard library provides various functions to create unsafe pointers,
119-
such as those in `core::cast`. Most of these functions have `unsafe` in their
120-
name. You can dereference an unsafe pointer with the `*` operator, but use
121-
caution: unlike Rust's other pointer types, unsafe pointers are completely
122-
unmanaged, so they might point at invalid memory, or be null pointers.
114+
Decompression is similar, because snappy stores the uncompressed size as part of the compression
115+
format and `snappy_uncompressed_length` will retrieve the exact buffer size required.
123116

124-
# Unsafe blocks
117+
~~~~ {.xfail-test}
118+
pub fn uncompress(src: &[u8]) -> Option<~[u8]> {
119+
unsafe {
120+
let srclen = src.len() as size_t;
121+
let psrc = vec::raw::to_ptr(src);
125122
126-
The `sha1` function is the most obscure part of the program.
123+
let mut dstlen: size_t = 0;
124+
snappy_uncompressed_length(psrc, srclen, &mut dstlen);
127125
128-
~~~~
129-
# pub mod crypto {
130-
# pub fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8 { out }
131-
# }
132-
# fn as_hex(data: ~[u8]) -> ~str { ~"hi" }
133-
fn sha1(data: ~str) -> ~str {
134-
unsafe {
135-
let bytes = str::to_bytes(data);
136-
let hash = crypto::SHA1(vec::raw::to_ptr(bytes),
137-
vec::len(bytes), ptr::null());
138-
return as_hex(vec::from_buf(hash, 20));
126+
let mut dst = vec::with_capacity(dstlen as uint);
127+
let pdst = vec::raw::to_mut_ptr(dst);
128+
129+
if snappy_uncompress(psrc, srclen, pdst, &mut dstlen) == 0 {
130+
vec::raw::set_len(&mut dst, dstlen as uint);
131+
Some(dst)
132+
} else {
133+
None // SNAPPY_INVALID_INPUT
134+
}
139135
}
140136
}
141137
~~~~
142138

143-
First, what does the `unsafe` keyword at the top of the function
144-
mean? `unsafe` is a block modifier—it declares the block following it
145-
to be known to be unsafe.
139+
For reference, the examples used here are also available as a [library on
140+
GitHub](https://github.com/thestinger/rust-snappy).
146141

147-
Some operations, like dereferencing unsafe pointers or calling
148-
functions that have been marked unsafe, are only allowed inside unsafe
149-
blocks. With the `unsafe` keyword, you're telling the compiler 'I know
150-
what I'm doing'. The main motivation for such an annotation is that
151-
when you have a memory error (and you will, if you're using unsafe
152-
constructs), you have some idea where to look—it will most likely be
153-
caused by some unsafe code.
142+
# Linking
154143

155-
Unsafe blocks isolate unsafety. Unsafe functions, on the other hand,
156-
advertise it to the world. An unsafe function is written like this:
157-
158-
~~~~
159-
unsafe fn kaboom() { ~"I'm harmless!"; }
160-
~~~~
144+
In addition to the `#[link_args]` attribute for explicitly passing arguments to the linker, an
145+
`extern mod` block will pass `-lmodname` to the linker by default unless it has a `#[nolink]`
146+
attribute applied.
161147

162-
This function can only be called from an `unsafe` block or another
163-
`unsafe` function.
164-
165-
# Pointer fiddling
148+
# Unsafe blocks
166149

167-
The standard library defines a number of helper functions for dealing
168-
with unsafe data, casting between types, and generally subverting
169-
Rust's safety mechanisms.
150+
Some operations, like dereferencing unsafe pointers or calling functions that have been marked
151+
unsafe, are only allowed inside unsafe blocks. Unsafe blocks isolate unsafety and are a promise to
152+
the compiler that the unsafety does not leak out of the block.
170153

171-
Let's look at our `sha1` function again.
154+
Unsafe functions, on the other hand, advertise their unsafety to the world. An unsafe function is written like
155+
this:
172156

173157
~~~~
174-
# pub mod crypto {
175-
# pub fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8 { out }
176-
# }
177-
# fn as_hex(data: ~[u8]) -> ~str { ~"hi" }
178-
# fn x(data: ~str) -> ~str {
179-
# unsafe {
180-
let bytes = str::to_bytes(data);
181-
let hash = crypto::SHA1(vec::raw::to_ptr(bytes),
182-
vec::len(bytes), ptr::null());
183-
return as_hex(vec::from_buf(hash, 20));
184-
# }
185-
# }
158+
unsafe fn kaboom(ptr: *int) -> int { *ptr }
186159
~~~~
187160

188-
The `str::to_bytes` function is perfectly safe: it converts a string to a
189-
`~[u8]`. The program then feeds this byte array to `vec::raw::to_ptr`, which
190-
returns an unsafe pointer to its contents.
191-
192-
This pointer will become invalid at the end of the scope in which the vector
193-
it points to (`bytes`) is valid, so you should be very careful how you use
194-
it. In this case, the local variable `bytes` outlives the pointer, so we're
195-
good.
196-
197-
Passing a null pointer as the third argument to `SHA1` makes it use a
198-
static buffer, and thus save us the effort of allocating memory
199-
ourselves. `ptr::null` is a generic function that, in this case, returns an
200-
unsafe null pointer of type `*u8`. (Rust generics are awesome
201-
like that: they can take the right form depending on the type that they
202-
are expected to return.)
203-
204-
Finally, `vec::from_buf` builds up a new `~[u8]` from the
205-
unsafe pointer that `SHA1` returned. SHA1 digests are always
206-
twenty bytes long, so we can pass `20` for the length of the new
207-
vector.
208-
209-
# Passing structures
161+
This function can only be called from an `unsafe` block or another `unsafe` function.
210162

211-
C functions often take pointers to structs as arguments. Since Rust
212-
`struct`s are binary-compatible with C structs, Rust programs can call
213-
such functions directly.
163+
# Foreign calling conventions
214164

215-
This program uses the POSIX function `gettimeofday` to get a
216-
microsecond-resolution timer.
165+
Most foreign code exposes a C ABI, and Rust uses the platform's C calling convention by default when
166+
calling foreign functions. Some foreign functions, most notably the Windows API, use other calling
167+
conventions. Rust provides the `abi` attribute as a way to hint to the compiler which calling
168+
convention to use:
217169

218170
~~~~
219-
extern mod std;
220-
use core::libc::c_ulonglong;
221-
222-
struct timeval {
223-
tv_sec: c_ulonglong,
224-
tv_usec: c_ulonglong
171+
#[cfg(target_os = "win32")]
172+
#[abi = "stdcall"]
173+
extern mod kernel32 {
174+
fn SetEnvironmentVariableA(n: *u8, v: *u8) -> int;
225175
}
176+
~~~~
226177

227-
#[nolink]
228-
extern mod lib_c {
229-
fn gettimeofday(tv: *mut timeval, tz: *()) -> i32;
230-
}
231-
fn unix_time_in_microseconds() -> u64 {
232-
unsafe {
233-
let mut x = timeval {
234-
tv_sec: 0 as c_ulonglong,
235-
tv_usec: 0 as c_ulonglong
236-
};
237-
lib_c::gettimeofday(&mut x, ptr::null());
238-
return (x.tv_sec as u64) * 1000_000_u64 + (x.tv_usec as u64);
239-
}
240-
}
178+
The `abi` attribute applies to a foreign module (it cannot be applied to a single function within a
179+
module), and must be either `"cdecl"` or `"stdcall"`. The compiler may eventually support other
180+
calling conventions.
241181

242-
# fn main() { assert!(fmt!("%?", unix_time_in_microseconds()) != ~""); }
243-
~~~~
182+
# Interoperability with foreign code
244183

245-
The `#[nolink]` attribute indicates that there's no foreign library to
246-
link in. The standard C library is already linked with Rust programs.
184+
Rust guarantees that the layout of a `struct` is compatible with the platform's representation in C.
185+
A `#[packed]` attribute is available, which will lay out the struct members without padding.
186+
However, there are currently no guarantees about the layout of an `enum`.
247187

248-
In C, a `timeval` is a struct with two 32-bit integer fields. Thus, we
249-
define a `struct` type with the same contents, and declare
250-
`gettimeofday` to take a pointer to such a `struct`.
188+
Rust's owned and managed boxes use non-nullable pointers as handles which point to the contained
189+
object. However, they should not be manually created because they are managed by internal allocators.
190+
Borrowed pointers can safely be assumed to be non-nullable pointers directly to the type. However,
191+
breaking the borrow checking or mutability rules is not guaranteed to be safe, so prefer using raw
192+
pointers (`*`) if that's needed because the compiler can't make as many assumptions about them.
251193

252-
This program does not use the second argument to `gettimeofday` (the time
253-
zone), so the `extern mod` declaration for it simply declares this argument
254-
to be a pointer to the unit type (written `()`). Since all null pointers have
255-
the same representation regardless of their referent type, this is safe.
194+
Vectors and strings share the same basic memory layout, and utilities are available in the `vec` and
195+
`str` modules for working with C APIs. Strings are terminated with `\0` for interoperability with C,
196+
but this should not be relied upon, because a slice will not always be nul-terminated. Instead, the
197+
`str::as_c_str` function should be used.
256198

199+
The standard library includes type aliases and function definitions for the C standard library in
200+
the `libc` module, and Rust links against `libc` and `libm` by default.
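
The `compress` wrapper added in the diff does three things. It sizes its output with `snappy_max_compressed_length`, lets the C function write into the vector, and then sets the length that the function reports. Below is a minimal sketch of that caller-managed buffer pattern in present-day Rust. The diff's 2013 syntax, such as `~[u8]` and `vec::raw`, no longer compiles. So that the sketch runs without `libsnappy`, `stand_in_compress` and `stand_in_max_length` are hypothetical Rust functions, not the snappy API. The first one only copies bytes. Treating its length parameter as the buffer capacity on entry is an assumption of this sketch.

```rust
use std::os::raw::c_int;

// Hypothetical status codes for this sketch (not taken from snappy-c.h).
const OK: c_int = 0;
const BUFFER_TOO_SMALL: c_int = 2;

// Hypothetical stand-in for a C-ABI function shaped like `snappy_compress`.
// It copies `input` into the caller's buffer instead of compressing it.
// On entry `*output_length` is the buffer capacity (an assumption of this
// sketch); on success it is overwritten with the number of bytes written.
unsafe extern "C" fn stand_in_compress(
    input: *const u8,
    input_length: usize,
    output: *mut u8,
    output_length: *mut usize,
) -> c_int {
    unsafe {
        if *output_length < input_length {
            return BUFFER_TOO_SMALL;
        }
        std::ptr::copy_nonoverlapping(input, output, input_length);
        *output_length = input_length;
    }
    OK
}

// Hypothetical stand-in for `snappy_max_compressed_length`: the worst case
// for a plain copy is the input length itself.
fn stand_in_max_length(source_length: usize) -> usize {
    source_length
}

/// Safe wrapper: Rust allocates the output buffer, the C-ABI function
/// fills it, and the vector is trimmed to the length the callee reports.
pub fn compress(src: &[u8]) -> Option<Vec<u8>> {
    let mut dst: Vec<u8> = Vec::with_capacity(stand_in_max_length(src.len()));
    let mut dst_len = dst.capacity();
    let status = unsafe {
        stand_in_compress(src.as_ptr(), src.len(), dst.as_mut_ptr(), &mut dst_len)
    };
    if status != OK {
        return None;
    }
    // SAFETY: the callee initialized `dst_len` bytes, and `dst_len` never
    // exceeds the capacity it was given.
    unsafe { dst.set_len(dst_len) };
    Some(dst)
}
```

Because the stand-in copies rather than compresses, `compress(b"hello")` returns `Some` of the same five bytes. The status is checked before calling `set_len`, so the vector never claims bytes that the callee did not write.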

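The struct-layout paragraph in the diff describes Rust as it stood in 2013. In present-day Rust, C-compatible layout is only guaranteed for types marked `#[repr(C)]`. Packing is requested with `#[repr(packed)]`, combined here with `C` as `#[repr(C, packed)]`. The sketch below checks both layouts and passes a struct pointer to a C-ABI function. `Timeval`, `Padded`, `Packed`, and `stand_in_fill` are hypothetical names chosen for illustration. `stand_in_fill` is a Rust stand-in, not a real foreign function.

```rust
use std::mem::{align_of, size_of};
use std::os::raw::c_long;

// Hypothetical C-compatible struct, loosely shaped like POSIX `struct timeval`
// (field types simplified to `c_long` for this sketch).
#[repr(C)]
#[derive(Debug, Default)]
pub struct Timeval {
    pub tv_sec: c_long,
    pub tv_usec: c_long,
}

// Hypothetical C-ABI function taking a struct pointer, standing in for a
// foreign call such as `gettimeofday`; it writes fixed values.
pub unsafe extern "C" fn stand_in_fill(tv: *mut Timeval) -> i32 {
    if tv.is_null() {
        return -1;
    }
    unsafe {
        (*tv).tv_sec = 1;
        (*tv).tv_usec = 500_000;
    }
    0
}

// With C layout, a u8 followed by a u32 gets 3 bytes of padding.
#[repr(C)]
pub struct Padded {
    pub a: u8,
    pub b: u32,
}

// Packed layout removes the padding (fields may then be unaligned).
#[repr(C, packed)]
pub struct Packed {
    pub a: u8,
    pub b: u32,
}

// Returns (size of Padded, size of Packed, alignment of Packed).
pub fn layout_report() -> (usize, usize, usize) {
    (size_of::<Padded>(), size_of::<Packed>(), align_of::<Packed>())
}
```

`layout_report()` returns `(8, 5, 1)`. The C layout pads the `u32` field out to a 4-byte boundary. The packed version drops that padding and has an alignment of 1.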
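The diff's advice on nul-termination still applies in present-day Rust. There, `std::ffi::CString` (owned and terminated) and `std::ffi::CStr` (borrowed) fill the role that `str::as_c_str` played in 2013. In the sketch below, `stand_in_strlen` is a hypothetical Rust function with the C ABI that behaves like C's `strlen`. `c_length` and `from_c_bytes` are illustrative wrapper names, not library functions.

```rust
use std::ffi::{CStr, CString, NulError};
use std::os::raw::c_char;

// Hypothetical stand-in for C's `strlen`: counts bytes up to the first `\0`.
// Like the real function, it reads past the end of any buffer that lacks a
// terminator, so callers must only pass nul-terminated data.
pub unsafe extern "C" fn stand_in_strlen(s: *const c_char) -> usize {
    let mut n = 0;
    unsafe {
        while *s.add(n) != 0 {
            n += 1;
        }
    }
    n
}

// Safe wrapper: copy the (possibly non-terminated) &str into a CString,
// which appends the terminator, then pass its pointer to the C-ABI function.
// Returns an error if the input contains an interior `\0`.
pub fn c_length(s: &str) -> Result<usize, NulError> {
    let owned = CString::new(s)?;
    // `owned` lives until the end of this function, so the pointer stays valid.
    Ok(unsafe { stand_in_strlen(owned.as_ptr()) })
}

// The other direction: view a nul-terminated byte buffer as a &str,
// returning None if the terminator is missing or the bytes are not UTF-8.
pub fn from_c_bytes(bytes: &[u8]) -> Option<&str> {
    CStr::from_bytes_with_nul(bytes).ok()?.to_str().ok()
}
```

`CString::new` copies its input and appends `\0`. As a result, `c_length` is safe to call on any `&str`, including a slice of a longer string. It reports an error instead of silently truncating when the input contains an interior nul.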