Commit 2f41a85

auto merge of #13431 : lifthrasiir/rust/rustdoc-smaller-index, r=alexcrichton
This is a series of inter-related commits that depend on #13402 (Prune the paths that do not appear in the index). Please consider this an early review request; I'll rebase once the parent PR is merged, if a rebase is required.

----

This PR aims to reduce the size of the search index without removing any actual information. In my measurements with both the library and compiler docs, the search index is 52% smaller before gzip and 16% smaller after gzip:

```
1719473 search-index-old.js
1503299 search-index.js        (after #13402, 13% gain)
 724955 search-index-new.js    (after this PR, 52% gain w.r.t. #13402)

 262711 search-index-old.js.gz
 214205 search-index.js.gz     (after #13402, 18.5% gain)
 179396 search-index-new.js.gz (after this PR, 16% gain w.r.t. #13402)
```

Both the uncompressed and compressed sizes of the search index have been taken into account. While the former will be less relevant once #12597 (Web site should be transferring data compressed) is resolved, the uncompressed index will be around for a while anyway and directly affects the UX of the docs. Moreover, LZ77 (and thus gzip) can only remove *some* repeated strings, since its search window is limited in size, so optimizing for the uncompressed size often has a positive effect on the compressed size as well.

Each commit represents one of the following incremental improvements, in order:

1. Parent paths were referred to by their AST `NodeId`, which tends to be large. We don't need the actual node ID, so we remap them to smaller sequential numbers. This also means that the list of paths can be a flat array instead of an object.
2. We remap each item type to a small predefined number. This is strictly intended to reduce the uncompressed size of the search index.
3. We use arrays instead of objects and reconstruct the original objects in the JavaScript code. Since this removes a lot of boilerplate, it affects both the uncompressed and compressed sizes.
4. (I've found that a centralized `searchIndex` is easier to handle in JS, so I eliminated one global variable.)
5. Finally, repeated paths in consecutive items are omitted (replaced by an empty string). This also greatly affects both the uncompressed and compressed sizes.

There had been several unsuccessful attempts to reduce the search index. In particular, I explicitly avoided complex optimizations such as encoding paths in a compressed form, and only applied an optimization when its gain was substantial relative to the size of the change. Also, while I've tried to be careful, the lack of proper (non-smoke) tests worries me a bit; any advice on testing the search indices would be appreciated.
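The omission of repeated paths in consecutive items can be illustrated with a minimal sketch. It is written in present-day Rust rather than the 2014 dialect used in this commit, and the function names `elide_repeated_paths` and `restore_paths` are hypothetical, not part of rustdoc. It assumes every real path is non-empty, so that an empty string can unambiguously mean "same path as the previous item".

```rust
/// Replace each path that equals the previous item's path with an empty string.
/// Assumption: real paths are never empty.
pub fn elide_repeated_paths(paths: &[&str]) -> Vec<String> {
    let mut out = Vec::with_capacity(paths.len());
    let mut last = "";
    for &p in paths {
        if p == last {
            // Same path as the prior item: emit an empty placeholder.
            out.push(String::new());
        } else {
            out.push(p.to_string());
            last = p;
        }
    }
    out
}

/// Inverse operation, as a reader of the index would perform it:
/// an empty string means "reuse the most recent non-empty path".
pub fn restore_paths(encoded: &[String]) -> Vec<String> {
    let mut out = Vec::with_capacity(encoded.len());
    let mut last = String::new();
    for p in encoded {
        if !p.is_empty() {
            last = p.clone();
        }
        out.push(last.clone());
    }
    out
}
```

Because the index lists items from the same module next to each other, long runs of identical paths collapse to empty strings, which is why this step helps both the uncompressed and the gzipped size.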
2 parents e2e7548 + 8f5d71c commit 2f41a85

5 files changed, +229 -76 lines changed
src/librustdoc/html/format.rs

+10 -8
@@ -1,4 +1,4 @@
-// Copyright 2013 The Rust Project Developers. See the COPYRIGHT
+// Copyright 2013-2014 The Rust Project Developers. See the COPYRIGHT
 // file at the top-level directory of this distribution and at
 // http://rust-lang.org/COPYRIGHT.
 //
@@ -24,6 +24,8 @@ use syntax::ast;
 use syntax::ast_util;
 
 use clean;
+use html::item_type;
+use html::item_type::ItemType;
 use html::render;
 use html::render::{cache_key, current_location_key};
 
@@ -172,17 +174,17 @@ fn external_path(w: &mut io::Writer, p: &clean::Path, print_all: bool,
         },
         |_cache| {
             Some((Vec::from_slice(fqn), match kind {
-                clean::TypeStruct => "struct",
-                clean::TypeEnum => "enum",
-                clean::TypeFunction => "fn",
-                clean::TypeTrait => "trait",
+                clean::TypeStruct => item_type::Struct,
+                clean::TypeEnum => item_type::Enum,
+                clean::TypeFunction => item_type::Function,
+                clean::TypeTrait => item_type::Trait,
             }))
         })
 }
 
 fn path(w: &mut io::Writer, path: &clean::Path, print_all: bool,
         root: |&render::Cache, &[~str]| -> Option<~str>,
-        info: |&render::Cache| -> Option<(Vec<~str> , &'static str)>)
+        info: |&render::Cache| -> Option<(Vec<~str> , ItemType)>)
     -> fmt::Result
 {
     // The generics will get written to both the title and link
@@ -252,12 +254,12 @@ fn path(w: &mut io::Writer, path: &clean::Path, print_all: bool,
     url.push_str("/");
 }
 match shortty {
-    "mod" => {
+    item_type::Module => {
         url.push_str(*fqp.last().unwrap());
         url.push_str("/index.html");
     }
     _ => {
-        url.push_str(shortty);
+        url.push_str(shortty.to_static_str());
         url.push_str(".");
         url.push_str(*fqp.last().unwrap());
         url.push_str(".html");

src/librustdoc/html/item_type.rs

+97
@@ -0,0 +1,97 @@
+// Copyright 2014 The Rust Project Developers. See the COPYRIGHT
+// file at the top-level directory of this distribution and at
+// http://rust-lang.org/COPYRIGHT.
+//
+// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
+// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
+// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
+// option. This file may not be copied, modified, or distributed
+// except according to those terms.
+
+//! Item types.
+
+use std::fmt;
+use clean;
+
+/// Item type. Corresponds to `clean::ItemEnum` variants.
+///
+/// The search index uses item types encoded as smaller numbers which equal to
+/// discriminants. JavaScript then is used to decode them into the original value.
+/// Consequently, every change to this type should be synchronized to
+/// the `itemTypes` mapping table in `static/main.js`.
+#[deriving(Eq, Clone)]
+pub enum ItemType {
+    Module = 0,
+    Struct = 1,
+    Enum = 2,
+    Function = 3,
+    Typedef = 4,
+    Static = 5,
+    Trait = 6,
+    Impl = 7,
+    ViewItem = 8,
+    TyMethod = 9,
+    Method = 10,
+    StructField = 11,
+    Variant = 12,
+    ForeignFunction = 13,
+    ForeignStatic = 14,
+    Macro = 15,
+}
+
+impl ItemType {
+    pub fn to_static_str(&self) -> &'static str {
+        match *self {
+            Module => "mod",
+            Struct => "struct",
+            Enum => "enum",
+            Function => "fn",
+            Typedef => "typedef",
+            Static => "static",
+            Trait => "trait",
+            Impl => "impl",
+            ViewItem => "viewitem",
+            TyMethod => "tymethod",
+            Method => "method",
+            StructField => "structfield",
+            Variant => "variant",
+            ForeignFunction => "ffi",
+            ForeignStatic => "ffs",
+            Macro => "macro",
+        }
+    }
+}
+
+impl fmt::Show for ItemType {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        self.to_static_str().fmt(f)
+    }
+}
+
+impl fmt::Unsigned for ItemType {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        (*self as uint).fmt(f)
+    }
+}
+
+pub fn shortty(item: &clean::Item) -> ItemType {
+    match item.inner {
+        clean::ModuleItem(..) => Module,
+        clean::StructItem(..) => Struct,
+        clean::EnumItem(..) => Enum,
+        clean::FunctionItem(..) => Function,
+        clean::TypedefItem(..) => Typedef,
+        clean::StaticItem(..) => Static,
+        clean::TraitItem(..) => Trait,
+        clean::ImplItem(..) => Impl,
+        clean::ViewItemItem(..) => ViewItem,
+        clean::TyMethodItem(..) => TyMethod,
+        clean::MethodItem(..) => Method,
+        clean::StructFieldItem(..) => StructField,
+        clean::VariantItem(..) => Variant,
+        clean::ForeignFunctionItem(..) => ForeignFunction,
+        clean::ForeignStaticItem(..) => ForeignStatic,
+        clean::MacroItem(..) => Macro,
+    }
+}
+
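As a rough illustration of the encoding that `item_type.rs` introduces, the following sketch in present-day Rust (not the 2014 dialect above) shows an enum with explicit discriminants being written as a small number and decoded back to its short name. It is abbreviated to four variants. The `ITEM_TYPE_NAMES` table and the `code` and `decode_name` helpers are hypothetical; the table only stands in for the `itemTypes` mapping in `static/main.js`, which is not shown in this diff.

```rust
/// Abbreviated item type; the discriminant is the number stored in the index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ItemType {
    Module = 0,
    Struct = 1,
    Enum = 2,
    Function = 3,
}

/// Hypothetical decoder table, indexed by discriminant.
/// It must be kept in sync with the enum above.
const ITEM_TYPE_NAMES: [&str; 4] = ["mod", "struct", "enum", "fn"];

impl ItemType {
    /// Numeric code written to the index instead of the name.
    pub fn code(self) -> usize {
        self as usize
    }

    /// Short name used in URLs and CSS classes.
    pub fn to_static_str(self) -> &'static str {
        ITEM_TYPE_NAMES[self.code()]
    }
}

/// Decode a numeric code back to its short name, as the reader side would.
/// Returns `None` for a code outside the table.
pub fn decode_name(code: usize) -> Option<&'static str> {
    ITEM_TYPE_NAMES.get(code).copied()
}
```

Writing `3` instead of `"fn"` saves only a few bytes per item, but the saving repeats for every entry, which matches the commit message's note that this step targets the uncompressed size.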

src/librustdoc/html/render.rs

+58 -53
@@ -52,6 +52,8 @@ use rustc::util::nodemap::NodeSet;
 use clean;
 use doctree;
 use fold::DocFolder;
+use html::item_type;
+use html::item_type::{ItemType, shortty};
 use html::format::{VisSpace, Method, FnStyleSpace};
 use html::layout;
 use html::markdown;
@@ -138,7 +140,7 @@ pub struct Cache {
     /// URLs when a type is being linked to. External paths are not located in
     /// this map because the `External` type itself has all the information
     /// necessary.
-    pub paths: HashMap<ast::NodeId, (Vec<~str> , &'static str)>,
+    pub paths: HashMap<ast::NodeId, (Vec<~str> , ItemType)>,
 
     /// This map contains information about all known traits of this crate.
     /// Implementations of a crate should inherit the documentation of the
@@ -193,7 +195,7 @@ struct Sidebar<'a> { cx: &'a Context, item: &'a clean::Item, }
 /// Struct representing one entry in the JS search index. These are all emitted
 /// by hand to a large JS file at the end of cache-creation.
 struct IndexItem {
-    ty: &'static str,
+    ty: ItemType,
     name: ~str,
     path: ~str,
     desc: ~str,
@@ -262,6 +264,9 @@ pub fn run(mut krate: clean::Crate, dst: Path) -> io::IoResult<()> {
     });
     cache.stack.push(krate.name.clone());
     krate = cache.fold_crate(krate);
+
+    let mut nodeid_to_pathid = HashMap::new();
+    let mut pathid_to_nodeid = Vec::new();
     {
         let Cache { search_index: ref mut index,
                     orphan_methods: ref meths, paths: ref mut paths, ..} = cache;
@@ -283,48 +288,67 @@ pub fn run(mut krate: clean::Crate, dst: Path) -> io::IoResult<()> {
             }
         };
 
-        // Prune the paths that do not appear in the index.
-        let mut unseen: HashSet<ast::NodeId> = paths.keys().map(|&id| id).collect();
+        // Reduce `NodeId` in paths into smaller sequential numbers,
+        // and prune the paths that do not appear in the index.
         for item in index.iter() {
             match item.parent {
-                Some(ref pid) => { unseen.remove(pid); }
+                Some(nodeid) => {
+                    if !nodeid_to_pathid.contains_key(&nodeid) {
+                        let pathid = pathid_to_nodeid.len();
+                        nodeid_to_pathid.insert(nodeid, pathid);
+                        pathid_to_nodeid.push(nodeid);
+                    }
+                }
                 None => {}
             }
         }
-        for pid in unseen.iter() {
-            paths.remove(pid);
-        }
+        assert_eq!(nodeid_to_pathid.len(), pathid_to_nodeid.len());
     }
 
     // Publish the search index
     let index = {
         let mut w = MemWriter::new();
-        try!(write!(&mut w, "searchIndex['{}'] = [", krate.name));
+        try!(write!(&mut w, r#"searchIndex['{}'] = \{"items":["#, krate.name));
+
+        let mut lastpath = ~"";
         for (i, item) in cache.search_index.iter().enumerate() {
+            // Omit the path if it is same to that of the prior item.
+            let path;
+            if lastpath == item.path {
+                path = "";
+            } else {
+                lastpath = item.path.clone();
+                path = item.path.as_slice();
+            };
+
             if i > 0 {
                 try!(write!(&mut w, ","));
             }
-            try!(write!(&mut w, "\\{ty:\"{}\",name:\"{}\",path:\"{}\",desc:{}",
-                        item.ty, item.name, item.path,
+            try!(write!(&mut w, r#"[{:u},"{}","{}",{}"#,
+                        item.ty, item.name, path,
                         item.desc.to_json().to_str()));
             match item.parent {
-                Some(id) => {
-                    try!(write!(&mut w, ",parent:'{}'", id));
+                Some(nodeid) => {
+                    let pathid = *nodeid_to_pathid.find(&nodeid).unwrap();
+                    try!(write!(&mut w, ",{}", pathid));
                 }
                 None => {}
             }
-            try!(write!(&mut w, "\\}"));
+            try!(write!(&mut w, "]"));
         }
-        try!(write!(&mut w, "];"));
-        try!(write!(&mut w, "allPaths['{}'] = \\{", krate.name));
-        for (i, (&id, &(ref fqp, short))) in cache.paths.iter().enumerate() {
+
+        try!(write!(&mut w, r#"],"paths":["#));
+
+        for (i, &nodeid) in pathid_to_nodeid.iter().enumerate() {
+            let &(ref fqp, short) = cache.paths.find(&nodeid).unwrap();
             if i > 0 {
                 try!(write!(&mut w, ","));
             }
-            try!(write!(&mut w, "'{}':\\{type:'{}',name:'{}'\\}",
-                   id, short, *fqp.last().unwrap()));
+            try!(write!(&mut w, r#"[{:u},"{}"]"#,
+                        short, *fqp.last().unwrap()));
         }
-        try!(write!(&mut w, "\\};"));
+
+        try!(write!(&mut w, r"]\};"));
 
         str::from_utf8(w.unwrap().as_slice()).unwrap().to_owned()
     };
@@ -360,7 +384,7 @@ pub fn run(mut krate: clean::Crate, dst: Path) -> io::IoResult<()> {
             }
         }
         let mut w = try!(File::create(&dst));
-        try!(writeln!(&mut w, r"var searchIndex = \{\}; var allPaths = \{\};"));
+        try!(writeln!(&mut w, r"var searchIndex = \{\};"));
         for index in all_indexes.iter() {
             try!(writeln!(&mut w, "{}", *index));
         }
@@ -613,12 +637,13 @@ impl DocFolder for Cache {
 } else {
     let last = self.parent_stack.last().unwrap();
     let path = match self.paths.find(last) {
-        Some(&(_, "trait")) =>
+        Some(&(_, item_type::Trait)) =>
             Some(self.stack.slice_to(self.stack.len() - 1)),
         // The current stack not necessarily has correlation for
         // where the type was defined. On the other hand,
         // `paths` always has the right information if present.
-        Some(&(ref fqp, "struct")) | Some(&(ref fqp, "enum")) =>
+        Some(&(ref fqp, item_type::Struct)) |
+        Some(&(ref fqp, item_type::Enum)) =>
             Some(fqp.slice_to(fqp.len() - 1)),
         Some(..) => Some(self.stack.as_slice()),
         None => None
@@ -678,7 +703,7 @@ impl DocFolder for Cache {
     clean::VariantItem(..) => {
         let mut stack = self.stack.clone();
         stack.pop();
-        self.paths.insert(item.id, (stack, "enum"));
+        self.paths.insert(item.id, (stack, item_type::Enum));
     }
     _ => {}
 }
@@ -836,7 +861,7 @@ impl Context {
         }
         title.push_str(" - Rust");
         let page = layout::Page {
-            ty: shortty(it),
+            ty: shortty(it).to_static_str(),
             root_path: cx.root_path.as_slice(),
             title: title.as_slice(),
         };
@@ -890,27 +915,6 @@ impl Context {
     }
 }
 
-fn shortty(item: &clean::Item) -> &'static str {
-    match item.inner {
-        clean::ModuleItem(..) => "mod",
-        clean::StructItem(..) => "struct",
-        clean::EnumItem(..) => "enum",
-        clean::FunctionItem(..) => "fn",
-        clean::TypedefItem(..) => "typedef",
-        clean::StaticItem(..) => "static",
-        clean::TraitItem(..) => "trait",
-        clean::ImplItem(..) => "impl",
-        clean::ViewItemItem(..) => "viewitem",
-        clean::TyMethodItem(..) => "tymethod",
-        clean::MethodItem(..) => "method",
-        clean::StructFieldItem(..) => "structfield",
-        clean::VariantItem(..) => "variant",
-        clean::ForeignFunctionItem(..) => "ffi",
-        clean::ForeignStaticItem(..) => "ffs",
-        clean::MacroItem(..) => "macro",
-    }
-}
-
 impl<'a> Item<'a> {
     fn ismodule(&self) -> bool {
         match self.item.inner {
@@ -1000,7 +1004,7 @@ impl<'a> fmt::Show for Item<'a> {
 fn item_path(item: &clean::Item) -> ~str {
     match item.inner {
         clean::ModuleItem(..) => *item.name.get_ref() + "/index.html",
-        _ => shortty(item) + "." + *item.name.get_ref() + ".html"
+        _ => shortty(item).to_static_str() + "." + *item.name.get_ref() + ".html"
     }
 }
 
@@ -1086,13 +1090,13 @@ fn item_module(w: &mut Writer, cx: &Context,
     indices.sort_by(|&i1, &i2| cmp(&items[i1], &items[i2], i1, i2));
 
     debug!("{:?}", indices);
-    let mut curty = "";
+    let mut curty = None;
     for &idx in indices.iter() {
         let myitem = &items[idx];
 
-        let myty = shortty(myitem);
+        let myty = Some(shortty(myitem));
         if myty != curty {
-            if curty != "" {
+            if curty.is_some() {
                 try!(write!(w, "</table>"));
             }
             curty = myty;
@@ -1695,8 +1699,9 @@ impl<'a> fmt::Show for Sidebar<'a> {
             };
             try!(write!(w, "<div class='block {}'><h2>{}</h2>", short, longty));
             for item in items.iter() {
+                let curty = shortty(cur).to_static_str();
                 let class = if cur.name.get_ref() == item &&
-                               short == shortty(cur) { "current" } else { "" };
+                               short == curty { "current" } else { "" };
                 try!(write!(w, "<a class='{ty} {class}' href='{curty, select,
                                 mod{../}
                                 other{}
@@ -1707,7 +1712,7 @@ impl<'a> fmt::Show for Sidebar<'a> {
                 ty = short,
                 tysel = short,
                 class = class,
-                curty = shortty(cur),
+                curty = curty,
                 name = item.as_slice()));
             }
             try!(write!(w, "</div>"));
@@ -1726,7 +1731,7 @@ impl<'a> fmt::Show for Sidebar<'a> {
 fn build_sidebar(m: &clean::Module) -> HashMap<~str, Vec<~str> > {
     let mut map = HashMap::new();
     for item in m.items.iter() {
-        let short = shortty(item);
+        let short = shortty(item).to_static_str();
         let myname = match item.name {
             None => continue,
             Some(ref s) => s.to_owned(),