
Re: advice for the undergrad (OL)

On Sun, Jul 06, 2008 at 09:52:31AM -0700, T. Michael Keesey scripsit:
> On Sun, Jul 6, 2008 at 6:53 AM, Graydon <oak@uniserve.com> wrote:
> > Having taught them [regular expressions], I'm not sure I'll go with
> > 'relatively short amount of time'; this is one of those 'simple to
> > learn, very hard to master' sorts of things.
> Yes, but we're talking about fluency, not mastery.
[snip chess example] 
> The syntax of regular expressions can be fit onto a few pages. 

Only if you already know what it's talking about.

Just explaining grouping and back references takes a couple of pages.
Zero width or possessive matches are worth another couple-three each,
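
To give a sense of why grouping, back references, and zero-width matches
need that much explaining, here is a minimal sketch using Python's `re`
module. The function names and sample strings are made up for
illustration; nothing here comes from the thread.

```python
import re

# Grouping plus a back reference: \1 must repeat exactly what group 1 matched.
DOUBLED_WORD = re.compile(r"\b(\w+) \1\b")

# Zero-width lookahead: (?= ...) checks what follows without consuming it,
# so only the capitalised word is returned, not the epithet after it.
GENUS = re.compile(r"[A-Z][a-z]+(?= [a-z]+)")


def find_doubled_word(text):
    """Return the first word that is immediately repeated, or None."""
    match = DOUBLED_WORD.search(text)
    return match.group(1) if match else None


def genus_names(text):
    """Return capitalised words that are followed by a lowercase word."""
    return GENUS.findall(text)


print(find_doubled_word("the the dinosaur"))
print(genus_names("Heterodontosaurus tucki and Herrerasaurus ischigualastensis"))
```

The back reference makes the second half of the pattern depend on what the
first half happened to match, and the lookahead constrains a match without
being part of it; neither idea is obvious from a syntax summary alone.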

> After thoroughly digesting those pages, someone should be able to look
> at a regular expression and figure out what it means (although of
> course complex expressions may take longer -- true for experts as
> well) and create some expressions on their own. Does that make them a
> regex master? No, that takes actual experience using regex. But are
> they fluent? Sure, why not?

That's -- to my mind -- a fairly low bar for 'fluent'.  That's more like
'how many moneys for meat?'

> > Data representation <> programming language, at least to my possibly
> > pedantic way of thinking.  So I kvetched. :)
> But I was specifically talking about computer languages *other* than
> programming languages. Sorry if that was unclear.

That was clear; I just wouldn't call these languages, as such; they're
data formats.

> (And actually, some of those, like XML and some extended forms of SQL,
> can be used for programming. The line's a bit blurry sometimes.)

Well... whether you can implement what kind of state machine isn't,
generally, blurry, though that doesn't help with recognizing how the
thing is actually used.

Some languages like XSL (which happens to be expressed in XML) are
Turing complete, but that doesn't mean that XML itself is.

SQL itself isn't Turing complete; some of its extensions may be.

> >> I dunno, knowing the basics of XML, for example, can help an awful lot
> >> for many tasks.
> >
> > XML has no basics.
> <?xml version="1.0"?>
> <sentence>
>    <subject>
>       <pronoun person="first" number="singular">I</pronoun>
>    </subject>
>    <predicate>
>        <verb transitive="false">disagree</verb>
>    </predicate>
>    <terminator>.</terminator>
> </sentence>

If you haven't got a DTD or schema, that's decorated text using some
syntax conventions; that's not actually valid XML.  (Well formed,

> > (and I so wish I'd realized that three years ago when I started
> > explaining an XML content management system to the folks who use
> > it....)
> Ouch. Yeah, never underestimate the incompetence of someone who isn't
> genuinely interested in what they're doing.

To be fair, they're interested in what they are doing; they don't view
what they are doing as using an information quality state machine, is
all. :)

> > Either you're using it as a markup language, in which case you don't
> > need to understand what you're doing beyond (possibly) the aspect of
> > semantic labelling, or you're using it in a 'create a vocabulary'
> > sense, and that's not basic at all; you get to start at the Unicode
> > collation algorithm and go from there.
> Someone can quite happily use XML for years on end without ever
> hearing the term "Unicode collation". We're talking about using XML,
> not building a parser.

I thought we were talking about knowing computer languages and data
representation mechanisms?

If you want to really understand -- and you do need to really understand
-- what you're doing so you can create that XML vocabulary for
cladograms or phylogenetic relationships, you're going to have to
understand what happens to the parsed character data, and how you're
going to represent things like binomial names.  Especially since those
could now legitimately (or could soon legitimately) include non-latin

> > This is potentially cool stuff, and XSL is fundamentally a tree
> > manipulation language, and it's not at all a bad thing to know, but
> > if you're going for a data representation XML vocabulary for some
> > specialized purpose, you will need to understand how the whole stack
> > works from Unicode up through DTDs or schemas through all the
> > various XML rules.
> That's like saying you have to read an unabridged dictionary from
> cover to cover before you can write a short essay. It's simply not
> true. (BTW, since when is XML restricted to Unicode? Did I dream up
> the "encoding" attribute?)

Any encoding beyond utf-8 and utf-16 is an optional implementation
detail; the standard only requires parsers to handle those two. (By
default, utf-8.)  So if you use something that isn't one of those two,
depending on the parser, you don't know what's going to happen.
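
As a small illustration of the two required encodings, the sketch below
feeds the same document to Python's standard-library parser as UTF-8 and
as UTF-16 bytes. The sample text and the helper name `parse_text` are
assumptions for the example; it says nothing about how other parsers
treat other encodings.

```python
import xml.etree.ElementTree as ET


def parse_text(document: bytes) -> str:
    """Parse XML given as bytes and return the root element's text."""
    return ET.fromstring(document).text


sample = "café"  # arbitrary sample text containing a non-ASCII character

# The same document serialised in the two encodings every parser must accept.
utf8_doc = f'<?xml version="1.0" encoding="UTF-8"?><w>{sample}</w>'.encode("utf-8")
# Python's "utf-16" codec writes a byte order mark at the start.
utf16_doc = f'<?xml version="1.0" encoding="UTF-16"?><w>{sample}</w>'.encode("utf-16")

print(parse_text(utf8_doc) == sample)
print(parse_text(utf16_doc) == sample)
```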

> > So while an XML vocabulary for cladograms (and some sort of renderer for
> > it) would be of great potential use, allowing simple exchange of large,
> > complex trees, actually building it in a robust way would require you to
> > understand what you're doing.
> <?xml version="1.0"?>
> <clade name="Dinosauria">
>    <clade name="Ornithischia">
>       <species name="Heterodontosaurus tucki"/>
>       <clade name="Genasauria">
>            <clade name="Sauropodomorpha"/>
>            <clade name="Theropoda"/>
>       </clade>
>    </clade>
>    <clade name="Saurischia">
>       <species name="Herrerasaurus ischigualastensis"/>
>       <clade name="Eusaurischia">
>            <clade name="Sauropodomorpha"/>
>            <clade name="Theropoda"/>
>       </clade>
>    </clade>
> </clade>

With no DTD (or schema of some flavour or other), that's decorated text.
There's no way to do anything with it that isn't dependent on human
understanding, which is just what one does not want when trying to build
a rendering system for a data representation.
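
A rough sketch of the difference, in Python: the standard-library parser
will confirm that a cut-down version of the tree above is well formed,
but it does not validate against a DTD, so the rules a DTD or schema
would state have to live somewhere else. The rules coded below (only
`clade` and `species` elements, `name` required, `species` empty) are my
guess at what such a vocabulary would say, not anything defined in the
thread.

```python
import xml.etree.ElementTree as ET

CLADOGRAM = """<?xml version="1.0"?>
<clade name="Dinosauria">
   <clade name="Ornithischia">
      <species name="Heterodontosaurus tucki"/>
   </clade>
   <clade name="Saurischia">
      <species name="Herrerasaurus ischigualastensis"/>
   </clade>
</clade>"""


def structure_errors(element, path=""):
    """Return a list of violations of the assumed vocabulary rules."""
    here = f"{path}/{element.tag}"
    errors = []
    if element.tag not in ("clade", "species"):
        errors.append(f"{here}: unknown element")
    if not element.get("name"):
        errors.append(f"{here}: missing name attribute")
    if element.tag == "species" and len(element) > 0:
        errors.append(f"{here}: species must be empty")
    for child in element:
        errors.extend(structure_errors(child, here))
    return errors


# Parsing succeeds for any well-formed document, whatever its element names.
root = ET.fromstring(CLADOGRAM)
print(structure_errors(root))

# Equally well formed, but nothing a cladogram renderer could rely on.
stray = ET.fromstring('<clade><taxon name="x"/></clade>')
print(structure_errors(stray))
```

Both documents parse; only the explicit rules tell them apart, which is
the job a DTD or schema does declaratively.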

> >  http://www.w3.org/XML/ gives (I think) a better overview, but then
> >  again I actively use XML in some fairly complex ways so I'm
> >  doubtless biased.
> Well, of course it's a better overview--it's the actual specification
> in all of its detailed, tedious glory. I really don't see that it's a
> better introduction for the uninitiated, though. I was trying to
> provide a Dick & Jane book, not Strunk & White.

I don't think the actual spec is difficult; tedious, I will grant you.
(Well, perhaps _dry_ is better than 'tedious'.)

Anyone going into paleontology as a career seems to have some need of a
capacity to handle 'dry', though, so I wouldn't really expect it to be
too much trouble.

> > I think, rather than going for specifics, someone interested in what
> > language to learn for a paleo career might do well to:
> >        - find out what they're using where you want to go to grad school
> >          and learn that
> >        - take a guess at what you want to do and find out what's being used
> >          for that
> >        - stick to open, public file formats for absolutely everything; it
> >          can be an initial hassle but it pays off over time
> Now that I can agree to!

Oh good!

Would hate to drift off into some equivalent of vi vs emacs. :)

-- Graydon