Skip to content

Check for either escaped or unescaped wiki filenames #8408

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 9, 2019
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion routers/repo/wiki.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ package repo
import (
"fmt"
"io/ioutil"
"net/url"
"path/filepath"
"strings"

Expand Down Expand Up @@ -65,11 +66,12 @@ type PageMeta struct {
// findEntryForFile finds the tree entry for a target filepath.
func findEntryForFile(commit *git.Commit, target string) (*git.TreeEntry, error) {
entries, err := commit.ListEntries()
unescapedTarget, _ := url.QueryUnescape(target)
if err != nil {
return nil, err
}
for _, entry := range entries {
if entry.IsRegular() && entry.Name() == target {
if entry.IsRegular() && (entry.Name() == target || entry.Name() == unescapedTarget) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm puzzled. In Theory this should break made-up cases where both c%C3%A1mara and cámara exist, but I couldn't make it fail.

To handle this properly we should probably use only unescapedTarget (like in your previous PR) and add a migration step. However I don't think a migration is a very good idea because many things can go wrong for such an edge case.

All in all, If possible, I'd change this to check if there are any entries that matches target. Only if there is none then I'd resort to the one matching unescapedTarget.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, I'm very much still learning Go, but if I understand correctly, a case where both c%C3%A1mara and cámara exists wouldn't fail because the logic stops executing after it gets a true for the comparison against target. The check against unescapedTarget would never even happen in that case. So it should already be performing the way you describe. I think...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a for loop in a range (array), it will sequencially check values like:

target := "cámara"
unscapedTarget := "cámara" // "cámara" unescaped == "cámara"

aimara      ---> matches target? no; matches unscapedTarget? no
bamara      ---> matches target? no; matches unscapedTarget? no
c%C3%A1mara ---> matches target? no; matches unscapedTarget? yes --> exit from function
cámara      ---> never checked; loop stopped in previous value

Copy link
Contributor Author

@Tekaoh Tekaoh Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I follow your example... Wouldn't it be more like this?

target := "c%C3%A1mara"
unscapedTarget := "cámara" // "c%C3%A1mara" unescaped == "cámara"

aimara        ---> matches target? no; matches unscapedTarget? no
bamara        ---> matches target? no; matches unscapedTarget? no
c%C3%A1mara   ---> matches target? yes! if statement is true. exit from function; matches unscapedTarget? never checked. The if statement already resolved true
cámara        ---> never checked; loop stopped in previous value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the original (imported) wiki has two pages cámara and c%C3%A1mara, what are the URLs for those? According to online encoders those are:

page1 :="c%C3%A1mara"
page2 :="c%25C3%25A1mara"

But then you have:

page1 :="c%C3%A1mara"
unescaped1 :="cámara"
page2 :="c%25C3%25A1mara"
unescaped2 :="c%C3%A1mara"

As you can see, unescaped2 == page1. So they look the same and you have a name collision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Now I know why I couldn't reproduce the problem: I had to import a wiki, not edit the wiki from Gitea's editor)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in that case, one of those pages would be inaccessible depending on which one comes up last in the for loop. But no matter how you write your logic, I'm pretty sure that would always be the case as long as you're effectively checking against two possible filenames. I suppose I would just assume that users aren't saving their wiki pages with machine-readable sequences of characters. Sure, doing so would cause another bug. But it seems to me like that's a lot less likely to ever be an issue than the current bug.

Copy link
Member

@guillep2k guillep2k Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All in all, If possible, I'd change this to check if there are any entries that matches target. Only if there is none then I'd resort to the one matching unescapedTarget.

With a logic like this, you leave the window open for the user to fix things in case of name collision (e.g. renaming the file or changing the links).

It's actually pretty easy:

	entries, err := commit.ListEntries()
	if err != nil {
		return nil, err
	}
	// The longest name should be checked first
	for _, entry := range entries {
		if entry.IsRegular() && entry.Name() == target {
			return entry, nil
		}
	}
	// Then the unescaped, shortest alternative
	var unescapedTarget string
	if unescapedTarget, err = url.QueryUnescape(target); err != nil {
		return nil, err
	}
	for _, entry := range entries {
		if entry.IsRegular() && entry.Name() == unescapedTarget {
			return entry, nil
		}
	}
	return nil, nil

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the error from QueryUnescape() should not be ignored.

return entry, nil
}
}
Expand Down