
Australian Stats Agency Goes Open Source

timothy posted more than 3 years ago | from the as-it-should-be dept.

Australia 51

jimboh2k writes "The Australian Bureau of Statistics will use the 2011 Census of Population and Housing as a dry run for XML-based open source standards DDI and SDMX in a bid to make machine-to-machine data exchange easier, allowing users to better search for and access census datasets. The census will become the first time the open standards are used by an Australian Federal Government agency."


Your mother. (-1, Troll)

Anonymous Coward | more than 3 years ago | (#34584158)

I hope she gets fucking raped.

XML? that's so 1990 (4, Insightful)

goombah99 (560566) | more than 3 years ago | (#34584190)

I'm perplexed why people continue to use XML when there is YAML. What is it that makes XML so attractive as a durable format? It's not human readable in a practical sense, and YAML very much is. Since its delimiters are complicated and variable, it's harder to parse in ad hoc ways than YAML (line and white space), which means that for rapidly extracting things there are no shortcuts to instantiating a whole document. It's hard to grep. And both formats can fully do the other one's job so they are interchangeable.

Re:XML? that's so 1990 (4, Informative)

goombah99 (560566) | more than 3 years ago | (#34584242)

To see how clean YAML is for humans to read and for machines to parse, look at a Sample Document [wikipedia.org] . And here's something truly impressive, a YAML Quick Reference Card [yaml.org] written entirely in YAML itself. Not only is it a marvelously short card, it's human and machine readable. YAML is a superset of JSON too.

Re:XML? that's so 1990 (1)

martin-boundary (547041) | more than 3 years ago | (#34584292)

Interesting. How does YAML handle validation and user defined grammars?

Re:XML? that's so 1990 (2, Informative)

Anonymous Coward | more than 3 years ago | (#34584390)

Interesting. How does YAML handle validation and user defined grammars?

Multiple ways of varying stringency. For the simple case you can define types (e.g. floats, ints, or user defined types). For the vast majority of uses that's all you need for validation. Now if you want to define a schema there are several different ones that are used. Kwalify and Rx are two. Finally, there are YAML-to-XML converters. So you can just convert the YAML to XML and use your favorite XML validator. Thus validation, other than the types, is not baked into the definition, and people have overlaid flexible approaches to this.

Re:XML? that's so 1990 (0)

tabrisnet (722816) | more than 3 years ago | (#34584296)

Great for human readability. Terrible (due to some python-like indent rules) for humans to add content to.

Meanwhile, XML might not be quite as nice as YAML for reading, but it is easier to figure out where you made a mistake, assuming you're pretty printing it (but the best thing is that pretty printing it is unnecessary).

Re:XML? that's so 1990 (4, Insightful)

goombah99 (560566) | more than 3 years ago | (#34584410)

Great for human readability. Terrible (due to some python-like indent rules) for humans to add content to.

Oh come on man. This is like the ancient discarded whitespace lament about python. I was once like you before I started writing python. Then I saw the huge huge light of why white space indenting is so great. I could explain but I'm not sure I could have convinced even myself before trying it.

Bottom line: it's freakin' easy to get the white space right, and any decent editor with context sensitive tabs does it for you. emacs, vim, bbedit, eclipse. Are there any that don't?

This is a NON ISSUE

Meanwhile, XML might not be quite as nice as YAML for reading, but it is easier to figure out where you made a mistake, assuming you're pretty printing it (but the best thing is that pretty printing it is unnecessary).

Ha! You make me laugh. So now we need special editors and printers for XML reading. Were we not just complaining about white space? Now you pretty print to put perfect white space in XML?

Re:XML? that's so 1990 (1)

tabrisnet (722816) | more than 3 years ago | (#34584674)

Notepad, which is so often used by the technically non-clueful. Of which, I seem to work with a few.

Of course, you should use a real editor. This somehow doesn't prevent people from using notepad b/c they don't know better, or using vim but not knowing HOW to use vim and still we lose all indenting.

And I never said you needed a special editor for XML. Not even that you need one for JSON or YAML.

Pretty printing isn't MANDATORY for XML... which is really the point. Since it's NOT necessary, you can fuck up the whitespace and NOT break the data. That's what I need.
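
The point about whitespace can be checked with a small sketch. The record and the `to_dict` helper below are hypothetical, made up for illustration, and only Python's standard `xml.etree.ElementTree` is assumed. Note that whitespace inside an element's text is still preserved by the parser; the helper strips it explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, once pretty-printed and once with mangled whitespace.
PRETTY = """\
<person>
    <name>Alice</name>
    <age>30</age>
</person>
"""

MANGLED = "<person><name>Alice</name>\n\n            <age>30</age>\n</person>"


def to_dict(xml_text):
    """Map each child tag to its stripped text; indentation between elements is irrelevant."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


pretty_data = to_dict(PRETTY)
mangled_data = to_dict(MANGLED)
print(pretty_data == mangled_data)
```

Both strings parse to the same tag/value pairs, which is the property being relied on when a file may be edited in Notepad.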

Python: YMMV (0)

Anonymous Coward | more than 3 years ago | (#34585070)

Then I saw the huge huge light of why white space indenting is so great. I could explain but I'm not sure I could have convinced even myself before trying it.

I tried. And the whitespace syntax still itches me. When the idea was original (I think back in the eighties, anyone remember Occam?), I thought "cool". Once Python arrived, I thought "meh". Since then, I've written a couple of K lines in Python, and still, I think it's nice to have *two* channels: block structure for the compiler (i.e. (..), {..}, begin..end, whatever) *and* indenting for the humans.

I do appreciate the extra flexibility this gives me.

For me, THIS IS AN ISSUE.

Still, it's a comparatively small issue. There are many things in Python which itch me far more than this.

Not that I think XML is a good idea. I think it's broken beyond repair.

YMMV, as always.

Re:Python: YMMV (2)

Raumkraut (518382) | more than 3 years ago | (#34585194)

When is it ever desirable for indentation to not match the logical structure of a program?
The only possible reason I can come up with is if you're intentionally attempting to obfuscate your code.

Re:Python: YMMV (1)

samjam (256347) | more than 3 years ago | (#34585606)

You're getting close, it's definitely to allow an intentional expression and it's going to be a bug for all people who use white space to express more than just the {}'ness.

I wonder why a language has to enforce something that could have been enforced by the editor for those that value it.

Strictness on this is what kept back so many perl coders and stopped python from ruling the world.

But... I don't mind... and python-ites prefer white space to world domination, so that's good too!

Re:Python: YMMV (1)

Anonymous Coward | more than 3 years ago | (#34585984)

bullshit, idiot. indentation is simple.

Re:Python: YMMV (1)

Rysc (136391) | more than 3 years ago | (#34585622)

I'm with you on the python whitespace thing, but for YAML it's different. We're not talking about writing code here. It can be tricky to get the whitespace right but it's a damn sight easier than learning and reading XML syntax. Remember that 99% of the time machines process these files and we only care to make reading easy (where YAML whitespace is a non-issue) and human editing easy, where it isn't too bad. Composing from scratch by hand isn't really something you're going to be doing with YAML (or XML).

Re:XML? that's so 1990 (1)

Anonymous Coward | more than 3 years ago | (#34584438)

Great for human readability. Terrible (due to some python-like indent rules) for humans to add content to.

Apparently you are not aware that YAML, being a superset of JSON, can be written entirely in JSON, or a mix of the two. In JSON you don't need to use white space. So you use the white space in YAML when it makes sense (nearly always) and when you get into absurd edge cases then you toss in a little JSON syntax when apropos.

So sorry, you just don't have a case to make here unless you want to say something bad about JSON as well.

Re:XML? that's so 1990 (1)

tabrisnet (722816) | more than 3 years ago | (#34584648)

I use JSON (and occasionally YAML), but only for data interchange formats where I don't expect a human to need to modify it.

Yes, I am aware that JSON and YAML are largely related. And a few times I tried to write up files in JSON, just as a mockup of my intended data structure. Yes, I used a real editor with proper tab indenting. It still got to be pretty unreadable. I use Data::Dumper whenever I want the data format to be as explicit as possible, but only for debugging.

But it's so much worse than that. XML doesn't NEED the indenting, so if some tard uses wordpad or notepad or something equally stupid to modify the file, he CANNOT mess it up.

Yes, you want to assume that only Real Programmers will be modifying your data. I had to unlearn that theory, and I'm working for a very well known internet company (I'd rather not say, albeit I may have left clues [or even spilt the beans] in other comments).

In particular, I had used XML for a human writable/readable data file (the same project I had tried to use JSON for), and was told a month ago that I had to write a GUI editor for it, or it just wouldn't get used. That and I've watched a few QA contractors and simply how little actual programming they know.

XML also gets a lot of flak b/c it is typically 'too hard to parse' whereas YAML and JSON are intended to be trivially parsed into a natural tree format... On the other hand, I found a perl module (XML::TreePP) that makes XML just as simple to manipulate.

Re:XML? that's so 1990 (1)

downundarob (184525) | more than 3 years ago | (#34584498)

Does anyone else feel like they just looked at some COBOL source when looking at the YAML example?

Re:XML? that's so 1990 (1)

Paua Fritter (448250) | more than 3 years ago | (#34584538)

I'm perplexed why people continue to use XML when there is YAML. What is it that makes XML so attractive as a durable format? It's not human readable in a practical sense, and YAML very much is. Since its delimiters are complicated and variable, it's harder to parse in ad hoc ways than YAML (line and white space), which means that for rapidly extracting things there are no shortcuts to instantiating a whole document. It's hard to grep. And both formats can fully do the other one's job so they are interchangeable.

I would actually dispute all of your comments, but picking up on the last point in bold, one of XML's key features is "mixed content [w3schools.com] ", which is apparently (according to http://yaml.org/xml.html [yaml.org] ) not possible in YAML.
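
For readers unfamiliar with the term, "mixed content" means text and child elements interleaved inside one parent. A minimal sketch, assuming only Python's standard `xml.etree.ElementTree`; the snippet and the `flatten` helper are hypothetical and exist just to show how the parser exposes the interleaving through `text` and `tail`:

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content element: running text with inline child elements.
SNIPPET = "<p>The census uses <b>DDI</b> and <b>SDMX</b> formats.</p>"


def flatten(elem):
    """Return (kind, value) pieces of a mixed-content element in document order."""
    pieces = []
    if elem.text:
        # Text that appears before the first child element.
        pieces.append(("text", elem.text))
    for child in elem:
        pieces.append(("element", child.tag + ":" + (child.text or "")))
        if child.tail:
            # Text that follows this child, up to the next child or the closing tag.
            pieces.append(("text", child.tail))
    return pieces


pieces = flatten(ET.fromstring(SNIPPET))
for kind, value in pieces:
    print(kind, repr(value))
```

A plain mapping-and-sequence data model has no direct slot for the text between the two `b` elements, which is the gap being pointed at here.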

Re:XML? that's so 1990 (0)

Anonymous Coward | more than 3 years ago | (#34584736)

Mixed content I believe is just basically a struct declaration. In YAML you can have a type actually be a local class in the document reader. So when the document is read the class gets its attributes from the data present. Which is, I'd suggest, more functional than the dead XML definition. But if all you want is validation then look at any of several schema validators for YAML.

YAML is not for standards (1)

Nicolas MONNET (4727) | more than 3 years ago | (#34584912)

XML is perfectly suitable for long term data storage and exchange. You have namespaces, schemas, and millions of tools to handle it.

YAML is OK for storing configuration data. It's not even that good for anything else.

Also anyone who "parses in ad hoc ways" deserves to be slapped in the face.

Re:YAML is not for standards (0)

Anonymous Coward | more than 3 years ago | (#34587858)

First off, YAML has intrinsic type labels on its data, so in many cases you don't need an external schema to specify it. When you do, there are multiple schema protocols available.
Rx [codesimply.com] or Doctrine [doctrine-project.org] or Kwalify [kuwata-lab.com] to name 3.

Re:XML? that's so 1990 (1)

c0lo (1497653) | more than 3 years ago | (#34585116)

I'm perplexed why people continue to use XML when there is YAML.

Can you point me, please, to a reference on how one can define in YAML the equivalent of a schema?
You know, to act as the "contract" for the data exchange protocol... extensions (to allow 3rd party custom data sections) and namespaces (to isolate the 3rd party extensions that I'm not interested in) would be a real bonus.
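
To make the namespace part of the question concrete, here is a rough sketch of a consumer skipping third-party extensions it does not care about. The namespace URIs, element names, and the `core_fields` helper are all hypothetical; only Python's standard `xml.etree.ElementTree` is assumed, which reports qualified tags in the form `{uri}localname`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: core fields in a default namespace, plus a vendor extension.
DOC = """\
<record xmlns="http://example.org/core" xmlns:ext="http://example.org/vendor">
  <population>1200</population>
  <ext:note>vendor-specific data</ext:note>
  <households>450</households>
</record>
"""

CORE = "{http://example.org/core}"


def core_fields(xml_text):
    """Keep only children in the core namespace; extension elements are ignored."""
    root = ET.fromstring(xml_text)
    return {
        child.tag[len(CORE):]: child.text
        for child in root
        if child.tag.startswith(CORE)
    }


fields = core_fields(DOC)
print(fields)
```

The extension element is carried in the same document without colliding with, or being mistaken for, the core fields.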

Re:XML? that's so 1990 (0)

Anonymous Coward | more than 3 years ago | (#34587358)

I'm perplexed why people continue to use XML when there is YAML.

Can you point me, please, to a reference on how one can define in YAML the equivalent of a schema?
You know, to act as the "contract" for the data exchange protocol... extensions (to allow 3rd party custom data sections) and namespaces (to isolate the 3rd party extensions that I'm not interested in) would be a real bonus.

Here ya go:
  Kwalify [kuwata-lab.com] or Doctrine [doctrine-project.org] or Rx [codesimply.com]

Re:XML? that's so 1990 (1)

kwerle (39371) | more than 3 years ago | (#34603872)

I'm perplexed why people continue to use XML when there is YAML...

The real answer is: who cares? They're both easy [enough] to parse data formats. It's about as interesting as arguing about what your favorite editor is and why. Or your favorite database. Everyone knows the ins and outs, and nobody cares (except maybe you and the person you're arguing with). We all have libraries. We all have parsers. It really doesn't matter.

The trivial answer to your question is: because YAML is very new in the grand scheme of things. And it's not so different that it's really interesting.

Sample DDI files (0)

Anonymous Coward | more than 3 years ago | (#34584210)

If you want to see some example DDI xml files, check out http://www.colectica.com/ddi [colectica.com] . They have documented several public datasets. It is neat how they can document the survey as well as the data for the 2010 US Census.

The First Time? (2)

digipres (877201) | more than 3 years ago | (#34584224)

"The census will become the first time the open standards are used by an Australian Federal Government agency."

Really?
http://xena.sourceforge.net/ [sourceforge.net]

Re:The First Time? (1)

deniable (76198) | more than 3 years ago | (#34584672)

Next year we'll try to get them to switch to metric and TCP/IP.

Re:The First Time? (1)

wylf (657051) | more than 3 years ago | (#34584812)

I think you'll find the NAA isn't a Federal Government Agency; I think it might be a cultural institution. The distinction is important for things like legislation and accountability.

Re:The First Time? (1)

digipres (877201) | more than 3 years ago | (#34584908)

Um, yeah we are. We're an accountability agency, administered under the FMA and an executive agency under Prime Minister and Cabinet.

http://naa.gov.au/about-us/director-general/index.aspx [naa.gov.au]

Re:The First Time? (1)

wylf (657051) | more than 3 years ago | (#34585682)

I stand corrected - thanks :)

Re:The First Time? (1)

digipres (877201) | more than 3 years ago | (#34585726)

Sanity on Slashdot? What do you think you're doing?

British Commonwealth Apples & Oranges (2)

the_other_one (178565) | more than 3 years ago | (#34584270)

Australia is openly embracing census data and enhancing its availability.
Canada's government is going out of its way to prevent census data collection.

Re:British Commonwealth Apples & Oranges (1)

Jamz (89107) | more than 3 years ago | (#34584376)

Seems logical - as a Tax Payer, the data should be available to me.
Although I hope it's not leveraged too heavily by the commercial sector.

Re:British Commonwealth Apples & Oranges (0)

Anonymous Coward | more than 3 years ago | (#34585630)

Julian, is that you?
--
The CIA

Re:British Commonwealth Apples & Oranges (1)

bryxal (933863) | more than 3 years ago | (#34584572)

Take action to change that!

http://www.liberal.ca/open/ [liberal.ca]

The Liberal Open Government Initiative will:

        * Immediately restore the long-form census;
        * Make as many government datasets as possible available to the public online free of
            charge at opendata.gc.ca in an open and searchable format, starting with Statistics
            Canada data, including data from the long-form census;
        * Post all Access to Information requests, responses, and response times online at
            accesstoinformation.gc.ca; and
        * Make information on government grants, contributions and contracts available through
            a searchable, online database at accountablespending.gc.ca.

For complete details, please read the PDF policy brochure. http://lpc.ca/opengov [lpc.ca]

(Disclaimer: I work as a developer for the party)

Re:British Commonwealth Apples & Oranges (1)

Noughmad (1044096) | more than 3 years ago | (#34584650)

Australia is openly embracing census data and extending its availability.
Canada's government is going out of its way to extinguish census data collection.

FTFY

but the govt still thinks sharing is bad (1)

chronoss2010 (1825454) | more than 3 years ago | (#34584398)

and open source you share your code freely to help everyone....WOW isn't this an oxymoron, gov't

Re:but the govt still thinks sharing is bad (1)

c0lo (1497653) | more than 3 years ago | (#34585156)

but the govt still thinks sharing is bad

This is why an Australian invented Wikileaks... I mean... "information wants to be free" and such...

and open source you share your code freely to help everyone

Hey, where does it say that they'll share the code? TFA quote:

with the ABS directing software developer Space-Time Research to utilise the standards for both input and output of all data collected next year.

So:
1. it is the data that will be shared (govt takes preemptive - still legal - actions against Wikileaks? ;) )
2. the guys doing the software are Space-Time Research [spacetimeresearch.com] - as far as I know, a bit far from an open source establishment (note: I have no affiliation with them)

How anonymised will the data be? (0)

Anonymous Coward | more than 3 years ago | (#34584426)

The best part of having statistics is the ability to find correlations between different sets of data. For instance, do people living in suburbs with greater access to parklands live healthier lives? What is the household income for people who are over the age of 65, own 3 or more properties and use public transport on a regular basis?

I am guessing that the data they plan to release will be anonymised to a level that makes finding correlations very hard or impossible to accomplish? You can obtain some high level correlations by looking at data on a suburb by suburb basis however for much of the data they collect, suburb of residence isn't an important factor.

Re:How anonymised will the data be? (1)

sjwt (161428) | more than 3 years ago | (#34584728)

it looks like they want all data up, the only data not collected is names and addresses, you can use any of the questions to define your sets.

"DDI and SDMX are good at describing things, and we're testing the very notion that you can actually consume this stuff and make it discoverable metadata for your search engines."

"We definitely want to see who's keen, who's interested in statistics and metadata, open data, data linking and what people can do with it as well."

Meanwhile, in other agencies and private (1)

dbIII (701233) | more than 3 years ago | (#34584440)

Meanwhile, in other government agencies and private enterprise there are open file formats such as the geophysical SEGD and SEGY formats that have been used since at least the 1980s. That means you can read data files from 1982 on current software.
Closed file formats are an "innovation" of Microsoft and similar companies. It's not really any different from the bastards that write unreadable code in an attempt to provide job security.
Hopefully in the future some of the practices of elements of Microsoft and many others will be remembered like the claim salters and others with "sharp" business practices in the old west.

74% of people don't believe in statistics anyway (1)

canatech (982314) | more than 3 years ago | (#34584534)

We should find out what percentage of the population thinks that this is a good idea....

Hope they've studied Munich's woes... (1)

bogaboga (793279) | more than 3 years ago | (#34584616)

...and here's why:

It's official - Munich Linux migration is "dead - abandoned in all but name." - Linux

Yes, you read right: "Dead - abandoned in all but name". [fixunix.com]

Actually open source seems fine in Munich (2)

perpenso (1613749) | more than 3 years ago | (#34584764)

Munich Linux migration is "dead - abandoned in all but name."

Last I heard it was a migration to open source and they were successfully using open source desktop applications. The operating system may be Windows rather than Linux but this still seems to be a victory for open source. On the desktop the applications are far more important than the operating system.

Re:Hope they've studied Munich's woes... (1)

icebraining (1313345) | more than 3 years ago | (#34585400)

Open source standards, not open source code. Very different issue.

Open source, or open standards? (1)

nOw2 (1531357) | more than 3 years ago | (#34584784)

There is some difference. I'm not clear from the summary exactly what's going on.

Re:Open source, or open standards? (1)

c0lo (1497653) | more than 3 years ago | (#34585186)

TFA mentions "open standards" in the opening and only once. I reckon the reporter (or the proof-readers? or editor?) had a slip-of-fingers on the keyboard. 'Tis clear they speak of Open Standards rather.

I'm curious to see.. (1)

outsider007 (115534) | more than 3 years ago | (#34585030)

How many Jedi's currently live in Australia.

Re:I'm curious to see.. (1)

c0lo (1497653) | more than 3 years ago | (#34585202)

How many Jedi's currently live in Australia.

None: for the moment, Assange is retained by the dark side of the force and too dry Australia is for master Yoda.

YAML is not the answer (1)

Anonymous Coward | more than 3 years ago | (#34585274)

As the author of the Perl module YAML::Tiny, and the current maintainer of the original YAML.pm I call troll on the parent.

YAML as a specification is way more complex than XML and it's way harder to implement.

And who in their right mind is going to read the raw census statistical quads directly? The point is moot.

XML is ideal for machine to machine communication. It's easily machine readable, and easily debuggable by nerds (which is the bit of "readable" that really matters here). And machine readable is what the ABS has in mind here as their goal.

Adam K (too lazy to log in)

Incorrect Summary (1)

Anonymous Coward | more than 3 years ago | (#34585566)

The census will become the first time the open standards are used by an Australian Federal Government agency.

What the hell are you talking about? We use a variety of open standards every minute of every day across every department with any modern IT assets. I think what you meant to say was the first time that open standards are being used by an Australian Federal Government agency to communicate with the general public. Even then, it's not exactly news; it was going to happen eventually.

Australian Federal Govt. is still closed-source (0)

Anonymous Coward | more than 3 years ago | (#34602266)

Many Australian federal agencies that I know still use closed-source products. All databases run Oracle; accounting: SAP; all desktops: Windows XP, and they use that XP to manage hundreds of Redhat machines (via Putty). Every product they install has to be "Enterprise" edition or "corporate". If it's not a proprietary product coming from a large company they are not going to deploy it. That's why companies like Cisco, Juniper, HP, IBM, Oracle/Sun, Redhat, SAP are big in Canberra (home of Australian federal government). It's basically a license to print money.
