|
1 | 1 | # Interacting with foreign code
|
2 | 2 |
|
3 |
| -FIXME to be written |
| 3 | +One of Rust's aims, as a systems programming language, is to |
| 4 | +interoperate well with C code. |
| 5 | + |
| 6 | +We'll start with an example. It's a bit bigger than usual, and |
| 7 | +contains a number of new concepts. We'll go over it one piece at a |
| 8 | +time. |
| 9 | + |
| 10 | +This is a program that uses OpenSSL's `SHA1` function to compute the |
| 11 | +hash of its first command-line argument, which it then converts to a |
| 12 | +hexadecimal string and prints to standard output. If you have the |
| 13 | +OpenSSL libraries installed, it should 'just work'. |
| 14 | + |
| 15 | + use std; |
| 16 | + import std::{vec, str}; |
| 17 | + |
| 18 | + native "cdecl" mod ssl { |
| 19 | + fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8; |
| 20 | + } |
| 21 | + |
| 22 | + fn as_hex(data: [u8]) -> str { |
| 23 | + let acc = ""; |
| 24 | + for byte in data { acc += #fmt("%02x", byte as uint); } |
| 25 | + ret acc; |
| 26 | + } |
| 27 | + |
| 28 | + fn sha1(data: str) -> str unsafe { |
| 29 | + let bytes = str::bytes(data); |
| 30 | + let hash = ssl::SHA1(vec::unsafe::to_ptr(bytes), |
| 31 | + vec::len(bytes), std::ptr::null()); |
| 32 | + ret as_hex(vec::unsafe::from_buf(hash, 20u)); |
| 33 | + } |
| 34 | + |
| 35 | + fn main(args: [str]) { |
| 36 | + std::io::println(sha1(args[1])); |
| 37 | + } |
| 38 | + |
| 39 | +## Native modules |
| 40 | + |
| 41 | +Before we can call `SHA1`, we have to declare it. That is what this |
| 42 | +part of the program is responsible for: |
| 43 | + |
| 44 | + native "cdecl" mod ssl { |
| 45 | + fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8; |
| 46 | + } |
| 47 | + |
| 48 | +A `native` module declaration tells the compiler that the program |
| 49 | +should be linked with a library of that name, and that the listed |
| 50 | +functions are available in that library. |
| 51 | + |
| 52 | +In this case, the compiler maps the name `ssl` to a shared library name in |
| 53 | +a platform-specific way (`libssl.so` on Linux, for example), and link |
| 54 | +that in. If you want the module to have a different name from the |
| 55 | +actual library, you can say `native "cdecl" mod something = "ssl" { |
| 56 | +... }`. |
| 57 | + |
| 58 | +The `"cdecl"` string indicates the calling convention to use for |
| 59 | +functions in this module. Most C libraries use cdecl as their calling |
| 60 | +convention. You can also specify `"x86stdcall"` to use stdcall |
| 61 | +instead. |
| 62 | + |
| 63 | +FIXME: Mention c-stack variants? Are they going to change? |
| 64 | + |
| 65 | +## Unsafe pointers |
| 66 | + |
| 67 | +The native `SHA1` function is declared to take three arguments, and |
| 68 | +return a pointer. |
| 69 | + |
| 70 | + fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8; |
| 71 | + |
| 72 | +When declaring the argument types to a foreign function, the Rust |
| 73 | +compiler has no way to check whether your declaration is correct, so |
| 74 | +you have to be careful. If you get the number or types of the |
| 75 | +arguments wrong, you're likely to get a segmentation fault. Or, |
| 76 | +probably even worse, your code will work on one platform, but break on |
| 77 | +another. |
| 78 | + |
| 79 | +In this case, `SHA1` is defined as taking two `unsigned char*` |
| 80 | +arguments and one `unsigned long`. The Rust equivalents are `*u8` |
| 81 | +unsafe pointers and a `uint` (which, like `unsigned long`, is a |
| 82 | +machine-word-sized type). |
| 83 | + |
| 84 | +Unsafe pointers can be created through various functions in the |
| 85 | +standard library, usually with `unsafe` somewhere in their name. You can |
| 86 | +dereference an unsafe pointer with the `*` operator, but use |
| 87 | +caution: unlike Rust's other pointer types, unsafe pointers are |
| 88 | +completely unmanaged, so they might point at invalid memory, or be |
| 89 | +null pointers. |
| 90 | + |
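To make the caution above concrete, here is a minimal sketch in present-day Rust syntax, which differs from the syntax used elsewhere in this tutorial (`*const u8` plays the role of `*u8`). The helper name `read_or_default` is hypothetical, not a standard library function. It checks for a null pointer before dereferencing, because nothing else will do so.

```rust
/// Returns the byte that `p` points at, or `default` if `p` is null.
///
/// Hypothetical helper for illustration only. The null check is the only
/// validation performed: a dangling, non-null pointer would still cause
/// undefined behavior, which is why the function is marked `unsafe`.
unsafe fn read_or_default(p: *const u8, default: u8) -> u8 {
    if p.is_null() {
        default
    } else {
        // Dereferencing a raw pointer is only allowed in unsafe code.
        unsafe { *p }
    }
}
```

Called with a pointer to a live `u8`, the helper returns that byte; called with `std::ptr::null()`, it returns the supplied default.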
| 91 | +## Unsafe blocks |
| 92 | + |
| 93 | +The `sha1` function is the most obscure part of the program. |
| 94 | + |
| 95 | + fn sha1(data: str) -> str unsafe { |
| 96 | + let bytes = str::bytes(data); |
| 97 | + let hash = ssl::SHA1(vec::unsafe::to_ptr(bytes), |
| 98 | + vec::len(bytes), std::ptr::null()); |
| 99 | + ret as_hex(vec::unsafe::from_buf(hash, 20u)); |
| 100 | + } |
| 101 | + |
| 102 | +First, what does the `unsafe` keyword at the top of the function |
| 103 | +mean? `unsafe` is a block modifier: it declares that the block following |
| 104 | +it is known to contain unsafe operations. |
| 105 | + |
| 106 | +Some operations, like dereferencing unsafe pointers or calling |
| 107 | +functions that have been marked unsafe, are only allowed inside unsafe |
| 108 | +blocks. With the `unsafe` keyword, you're telling the compiler 'I know |
| 109 | +what I'm doing'. The main motivation for such an annotation is that |
| 110 | +when you have a memory error (and you will, if you're using unsafe |
| 111 | +constructs), you have some idea where to look—it will most likely be |
| 112 | +caused by some unsafe code. |
| 113 | + |
| 114 | +Unsafe blocks isolate unsafety. Unsafe functions, on the other hand, |
| 115 | +advertise it to the world. An unsafe function is written like this: |
| 116 | + |
| 117 | + unsafe fn kaboom() { log "I'm harmless!"; } |
| 118 | + |
| 119 | +This function can only be called from an unsafe block or another |
| 120 | +unsafe function. |
| 121 | + |
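The split between unsafe functions and unsafe blocks can be sketched as follows. This is written in present-day Rust syntax, which differs from the syntax used in this tutorial, and the names `increment` and `bump` are hypothetical. The unsafe function advertises a contract to its callers, while the safe wrapper confines the risky call to one small block.

```rust
/// Unsafe function: advertises that the caller must uphold a contract,
/// namely that `p` points at a valid, writable `i32`.
unsafe fn increment(p: *mut i32) {
    unsafe { *p += 1 };
}

/// Safe wrapper: the unsafe block isolates the one risky call.
fn bump(value: &mut i32) {
    // A pointer derived from a live `&mut` reference is valid for this call.
    unsafe { increment(value as *mut i32) };
}
```

Code that calls `bump` needs no `unsafe` of its own, so a memory error would have to originate inside the two marked regions.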
| 122 | +## Pointer fiddling |
| 123 | + |
| 124 | +The standard library defines a number of helper functions for dealing |
| 125 | +with unsafe data, casting between types, and generally subverting |
| 126 | +Rust's safety mechanisms. |
| 127 | + |
| 128 | +Let's look at our `sha1` function again. |
| 129 | + |
| 130 | + let bytes = str::bytes(data); |
| 131 | + let hash = ssl::SHA1(vec::unsafe::to_ptr(bytes), |
| 132 | + vec::len(bytes), std::ptr::null()); |
| 133 | + ret as_hex(vec::unsafe::from_buf(hash, 20u)); |
| 134 | + |
| 135 | +The `str::bytes` function is perfectly safe; it converts a string to |
| 136 | +a `[u8]`. This byte vector is then fed to `vec::unsafe::to_ptr`, which |
| 137 | +returns an unsafe pointer to its contents. |
| 138 | + |
| 139 | +This pointer will become invalid as soon as the vector it points into |
| 140 | +is cleaned up, so you should be very careful how you use it. In this |
| 141 | +case, the local variable `bytes` outlives the pointer, so we're good. |
| 142 | + |
| 143 | +Passing a null pointer as the third argument to `SHA1` causes it to use a |
| 144 | +static buffer, which saves us the effort of allocating memory |
| 145 | +ourselves. `ptr::null` is a generic function that will return an |
| 146 | +unsafe null pointer of the correct type (Rust generics are awesome |
| 147 | +like that—they can take the right form depending on the type that they |
| 148 | +are expected to return). |
| 149 | + |
| 150 | +Finally, `vec::unsafe::from_buf` builds up a new `[u8]` from the |
| 151 | +unsafe pointer that was returned by `SHA1`. SHA1 digests are always |
| 152 | +twenty bytes long, so we can pass `20u` for the length of the new |
| 153 | +vector. |
| 154 | + |
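The last two steps (copying a fixed number of bytes out of a foreign buffer, then hex-encoding them) might look like this in present-day Rust syntax, which differs from the syntax used in this tutorial. This is a sketch under assumptions: `copy_from_buf` is a hypothetical stand-in for `vec::unsafe::from_buf`, built on `std::slice::from_raw_parts`, and `as_hex` mirrors the tutorial's function of the same name.

```rust
/// Copies `len` bytes starting at `ptr` into a freshly allocated vector.
///
/// Hypothetical stand-in for `vec::unsafe::from_buf`. The caller must
/// guarantee that `ptr` is valid for reads of `len` bytes.
unsafe fn copy_from_buf(ptr: *const u8, len: usize) -> Vec<u8> {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    bytes.to_vec()
}

/// Formats each byte as two lowercase hexadecimal digits.
fn as_hex(data: &[u8]) -> String {
    let mut acc = String::new();
    for byte in data {
        acc.push_str(&format!("{:02x}", byte));
    }
    acc
}
```

Because the bytes are copied, the resulting vector stays valid even if the foreign buffer is later overwritten, which matters when the source is a static buffer like the one `SHA1` uses.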
| 155 | +## Passing structures |
| 156 | + |
| 157 | +C functions often take pointers to structs as arguments. Since Rust |
| 158 | +records are binary-compatible with C structs, Rust programs can call |
| 159 | +such functions directly. |
| 160 | + |
| 161 | +This program uses the POSIX function `gettimeofday` to get a |
| 162 | +microsecond-resolution timer. |
| 163 | + |
| 164 | + use std; |
| 165 | + type timeval = {tv_sec: u32, tv_usec: u32}; |
| 166 | + native "cdecl" mod libc = "" { |
| 167 | + fn gettimeofday(tv: *mutable timeval, tz: *()) -> i32; |
| 168 | + } |
| 169 | + fn unix_time_in_microseconds() -> u64 unsafe { |
| 170 | + let x = {tv_sec: 0u32, tv_usec: 0u32}; |
| 171 | + libc::gettimeofday(std::ptr::addr_of(x), std::ptr::null()); |
| 172 | + ret (x.tv_sec as u64) * 1000_000_u64 + (x.tv_usec as u64); |
| 173 | + } |
| 174 | + |
| 175 | +The `libc = ""` sets the library name of the native module to the empty |
| 176 | +string, which prevents the Rust compiler from trying to link it. The standard C |
| 177 | +library is already linked with Rust programs. |
| 178 | + |
| 179 | +A `timeval`, in C, is a struct with two integer fields (32 bits each on |
| 180 | +a typical 32-bit platform; the widths vary by platform). Thus, we define a |
| 181 | +record type with the same contents, and declare `gettimeofday` to take a pointer to such a record. |
| 182 | + |
| 183 | +The second argument to `gettimeofday` (the time zone) is not used by |
| 184 | +this program, so it simply declares it to be a pointer to the nil |
| 185 | +type. Since null pointers look the same, no matter which type they are |
| 186 | +supposed to point at, this is safe. |
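
The out-parameter pattern used with `gettimeofday` can be sketched in present-day Rust syntax, which differs from the syntax used in this tutorial. Everything here is an assumption made for illustration: `fake_gettimeofday` is a hypothetical stand-in written in Rust rather than the real C function, and the `u32` field widths simply mirror the example above and are not a portable definition of `timeval`.

```rust
/// Record with a C-compatible layout. The field widths mirror the
/// tutorial's example and are an assumption, not a portable definition.
#[repr(C)]
struct Timeval {
    tv_sec: u32,
    tv_usec: u32,
}

/// Hypothetical stand-in for a C function that fills in a struct
/// through a pointer, as `gettimeofday` does. Returns 0 on success.
unsafe extern "C" fn fake_gettimeofday(tv: *mut Timeval) -> i32 {
    if tv.is_null() {
        return -1;
    }
    unsafe {
        (*tv).tv_sec = 12;
        (*tv).tv_usec = 345_678;
    }
    0
}

fn time_in_microseconds() -> u64 {
    let mut x = Timeval { tv_sec: 0, tv_usec: 0 };
    // `&mut x` coerces to `*mut Timeval`; `x` outlives the call.
    let status = unsafe { fake_gettimeofday(&mut x) };
    assert_eq!(status, 0);
    (x.tv_sec as u64) * 1_000_000 + (x.tv_usec as u64)
}
```

The `#[repr(C)]` attribute is what requests a C-compatible field layout in present-day Rust, so the callee and the caller agree on where each field lives.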