Revision of Getting started as cheaply as possible from Thu, 2008-05-08 21:50

Eliot Kimber suggested some low-cost (if not free) setups for DITA on the dita-users mailing list, and others commented.

A zero-cost option that requires nothing to install, and that will get you familiar with editing DITA topics and DITA maps, is to join DITA Users. You can use the browser-based DITA Storm editor, or the desktop <oXygen/> XML editor with WebDAV access, to author structured content in your own online workspace folder.

You can have multiple projects in your personal workspace. Each project includes source files, build files, and output files. Process your files to HTML, PDF, Help, and other publishing formats with the DITA Open Toolkit on the server. DITA Users is SaaS (Software as a Service). Download your deliverables, or advertise links to your folder to exhibit your work online.

Eliot wrote:

I've been thinking a lot lately about one of the singular aspects of
DITA compared to other XML approaches for document creation: it
significantly lowers both the cost of entry and the cost of
ownership of very sophisticated systems, systems that even two years ago
were prohibitively expensive for all but the wealthiest enterprises
(that is, enterprises who had sufficient cash or credit to make the
investment necessary to realize the ROI that the use of XML represents).
[By "sophisticated" I mean the information representation and processing
features (e.g., linking, use-by-reference, conditional processing,
etc.), not necessarily the sophistication of the supporting tools. One
of the aspects of XML is that content sophistication is usually much
more important than tool sophistication, as DITA demonstrates.]

Even if you were an early DITA adopter, trying to use DITA 1.0, things
were too expensive and DITA was still not quite sufficiently cooked.

But now, at the beginning of 2008, we have a number of things that,
taken together, make the use of DITA as inexpensive as it could possibly
be, bordering on free (for a certain value of "free"). At a minimum, it
lowers the up-front cash outlay required to get started, although you
still have to do some implementation work to get a useful system. But
the cost of the implementation is a function of the skills and resources
you have available. If you've got somebody on staff who can do the
implementation, then it is truly free. If you don't, you're not spending
any money you wouldn't have had to spend with any other XML approach you
could have chosen, and will likely be able to spend much less than you
would otherwise have had to spend.

The cost-lowering artifacts available in 2008 that weren't there in 2006
include:

- DITA 1.1 fills in most of the critical holes in DITA 1.0, providing a
sufficiently complete solution able to meet most documentation
requirements, certainly in the domain of technical documentation (but
beyond that as well).

- Version 1.4.1 of the DITA Open Toolkit adds some important
functionality (better handling of output organization, support for
chunk=, etc.) as well as providing improved documentation for how to use it.

- New low-cost DITA-aware XML editors, including Syntext Serna and
OxygenXML, provide excellent value for graphical authoring of DITA content.

- A deeper body of community and published knowledge makes it
easier to learn and apply DITA generally.

- More third-party support providing various helpful bits any system
would need.

So, given all that, the question arises of what a low-cost,
production-capable DITA environment might look like. Obviously there are
a number of choices and those choices change day to day as new products
and tools are introduced and existing products are improved.

So here's my question to those who care to offer an opinion: what would
you recommend as a low-cost or lowest-cost system? Let's assume a
writing team of 10 people or fewer, whose operating budget is "as
little as you can spend and still get your work done".

Here is my recommendation as of 20 Feb 2008, based on my practical
experience with the tools involved and my knowledge of what's available
generally:

Authoring:

Syntext Serna 3.5. Version 3.5 of Serna offers almost as much
functionality as XMetaL and Arbortext Editor but at a significantly
lower per-seat cost. It's relatively easy to configure for the use of
local shells and specializations, easier than XMetaL or Arbortext. It
still has a few fit-and-finish issues, but it's reliable enough.

OxygenXML is a close second, but its graphical editing features,
especially for maps, are not as good as Serna's, and it's not enough
cheaper to make it the better value.

Content Management:


Subversion. Subversion is an open-source code control system that
functionally replaces CVS but offers several important new features,
including full support for versioning of binary objects (including UTF-8
and UTF-16-encoded XML), versioning of directories (very important for
DITA where you need flexibility to change how your topics are organized
as you refine your practices), HTTP-based access (avoids issues with
corporate firewalls), easy scripting, and arbitrary per-file metadata
(enables potentially quite sophisticated management features). There are
a number of good open-source and commercial Subversion clients,
including TortoiseSVN, Oxygen's Subversion client, and the Subclipse
plug-in for Eclipse.

Coupled with good file organization and naming discipline, Subversion can
get you a long way.
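Naming discipline is easiest to keep when a script checks it. As a minimal sketch, assume a hypothetical house convention in which topic file names start with a prefix matching the topic type (`c_` for concept, `t_` for task, `r_` for reference). The Python function `check_topic_names` below (also hypothetical) reports files whose prefix disagrees with their root element; neither DITA nor Subversion defines this convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical naming convention: file-name prefix -> expected root element.
PREFIXES = {"c_": "concept", "t_": "task", "r_": "reference"}


def check_topic_names(files: dict[str, str]) -> list[str]:
    """Return one message per file whose name prefix does not match the
    root element of its content. `files` maps file name -> XML text."""
    problems = []
    for name, text in sorted(files.items()):
        root_tag = ET.fromstring(text).tag
        expected = next(
            (topic for prefix, topic in PREFIXES.items() if name.startswith(prefix)),
            None,
        )
        if expected is None:
            problems.append(f"{name}: no recognized type prefix")
        elif expected != root_tag:
            problems.append(f"{name}: prefix implies <{expected}>, found <{root_tag}>")
    return problems


if __name__ == "__main__":
    sample = {
        "c_overview.dita": '<concept id="overview"><title>Overview</title></concept>',
        "t_install.dita": '<concept id="install"><title>Install</title></concept>',
        "notes.dita": '<reference id="notes"><title>Notes</title></reference>',
    }
    for line in check_topic_names(sample):
        print(line)
```

The sketch works on in-memory strings to stay self-contained; a real check would walk the working copy and could run before each commit.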

Production of Published Output:

DITA Open Toolkit. This is a no-brainer of course. The biggest question
here is whether or not to step up to a commercial XSL-FO implementation
if you're producing PDF. Both XEP and XSL Formatter provide better
results than FOP but either would represent a significant cost relative
to the total cost of the authoring tools, essentially doubling or
tripling the total system dollar cost. But this is where hard
requirements for print quality or features carry sufficient weight to
justify the expense.

Given the above, what would you need to do in order to have something
that could produce production-quality output, assuming you are not using
any non-standard specializations, only local shells?

1. Configure the editor to use your local shells. This requires just
setting up your entity resolution catalogs and creating Serna-specific
templates, which is mostly an exercise in copying. Should take 1/2 day
at most if you know what to do.

2. Set up branding for the HTML output. This involves creating
appropriate CSS style sheets and headers and footers, as well as
creating the appropriate scripts or Ant tasks to use them with the base
Toolkit transform. The time depends on the complexity of the styling you
want: say two days max to implement and deploy, and 1/2 day minimum for
simple style changes.

3. Set up branding for the PDF output. Assuming you're using the PDF2
plug-in, as for HTML, it depends on the complexity of your style
changes, but 1/2 day to 2 days would be typical. The main challenge here
is the lack of documentation on how to do this. It's not hard if you're
already familiar with XSLT and XSL-FO, but it would be next to impossible
if you're not.

4. Set up convenience scripts or GUIs by which authors can produce
output. This can be as easy as some simple command line utilities that
just take the directory for a given publication as input or could be
more involved, like a server-based system accessed through a Web-based
front-end. 1/2 day for scripts, more for more sophisticated stuff.
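Step 1 rests on an OASIS XML catalog that maps the public identifiers in your local shells' DOCTYPE declarations to DTD files on disk. The Python sketch below writes a minimal catalog and performs the exact-match lookup a resolver would do. The public identifiers, the `shells/` paths, and the function names are hypothetical; editors such as Serna read the catalog themselves, so code like this is only useful for generating or checking it.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical local shells: public identifier -> relative path of the shell DTD.
LOCAL_SHELLS = {
    "-//ACME//DTD ACME Concept//EN": "shells/acme-concept.dtd",
    "-//ACME//DTD ACME Task//EN": "shells/acme-task.dtd",
    "-//ACME//DTD ACME Map//EN": "shells/acme-map.dtd",
}


def build_catalog(shells: dict[str, str]) -> str:
    """Return an OASIS XML catalog document with one <public> entry per shell."""
    ET.register_namespace("", CATALOG_NS)  # serialize with a default xmlns
    catalog = ET.Element(f"{{{CATALOG_NS}}}catalog", {"prefer": "public"})
    for public_id, uri in shells.items():
        ET.SubElement(catalog, f"{{{CATALOG_NS}}}public",
                      {"publicId": public_id, "uri": uri})
    return ET.tostring(catalog, encoding="unicode")


def resolve_public(catalog_xml: str, public_id: str) -> Optional[str]:
    """Exact-match lookup of a public identifier; None if it is not mapped."""
    root = ET.fromstring(catalog_xml)
    for entry in root.iter(f"{{{CATALOG_NS}}}public"):
        if entry.get("publicId") == public_id:
            return entry.get("uri")
    return None


if __name__ == "__main__":
    xml_text = build_catalog(LOCAL_SHELLS)
    print(xml_text)
    print(resolve_public(xml_text, "-//ACME//DTD ACME Task//EN"))
```

Real catalog resolvers also handle `system` entries, `nextCatalog` chaining, and relative URI resolution, which this sketch leaves out.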

The above is essentially what we are using at Really Strategies for the
product documentation for our RSuite CMS product and it's working fine
so far with a team distributed between the U.S. and China. It's not
ideal but it was cheap and easy to set up.
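The convenience scripts in step 4 of Eliot's list can be thin wrappers around the Toolkit's Ant build. Below is a minimal Python sketch that assembles, but does not run, one `ant` command per output type for a publication directory. It assumes the DITA Open Toolkit Ant parameters `args.input`, `output.dir`, `transtype`, `args.css`, `args.copycss`, `args.hdr` and `args.ftr`; check them against the documentation for your Toolkit version. The directory layout (`<pub>/<pub>.ditamap`, a `branding/` folder), the toolkit path, and the name `build_commands` are hypothetical.

```python
from pathlib import Path

# Hypothetical HTML branding files, relative to the publication directory.
BRANDING = {
    "args.css": "branding/corporate.css",
    "args.hdr": "branding/header.xml",
    "args.ftr": "branding/footer.xml",
}


def build_commands(toolkit_dir: str, pub_dir: str,
                   transtypes: tuple[str, ...] = ("xhtml", "pdf")) -> list[list[str]]:
    """Return one `ant` argument list per output type for a publication.

    Assumes the publication's map is <pub_dir>/<dir name>.ditamap. The
    transtype name for PDF2 output varies by Toolkit version; pass yours.
    """
    pub = Path(pub_dir)
    ditamap = pub / f"{pub.name}.ditamap"
    commands = []
    for transtype in transtypes:
        cmd = [
            "ant", "-f", str(Path(toolkit_dir) / "build.xml"),
            f"-Dargs.input={ditamap}",
            f"-Doutput.dir={pub / 'out' / transtype}",
            f"-Dtranstype={transtype}",
        ]
        if transtype == "xhtml":
            # HTML branding: custom CSS copied into the output, plus header/footer.
            cmd += [f"-D{name}={pub / rel}" for name, rel in BRANDING.items()]
            cmd.append("-Dargs.copycss=yes")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands; to build, pass each list to subprocess.run(cmd, check=True).
    for cmd in build_commands("/opt/DITA-OT1.4.1", "docs/userguide"):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes the wrapper easy to test and lets the same function back a command-line script or a server-side front end.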

What would others suggest?

Jim Cain replied:

Authoring tool aside, this is exactly what we are doing to produce
project documentation. We were already using Subversion for our source
code, so it was an easy decision to also store our topics and maps for
the project documentation in Subversion and treat it as any other
development project.

As for authoring, the system we are currently building is deploying
XMetaL, so we decided to use XMetaL ourselves to share the same
authoring experience as our client. In this case, we wanted to gain
more insight into the tool that we are asking our client to use in
their system. Beyond this project, we may consider a cheaper authoring
tool, but we have not evaluated any others at this point.

Wrightsell Hughes commented:

We are using CVS (with TortoiseCVS) as a DITA repository and XMLSpy as
our authoring tool. The reason we chose these tools is that they were
already being used in-house and didn't cost us anything. So far, we
are pretty happy with our setup.

 

Steve Andersen said:

I agree with everything you said, with one caveat. If you have access
to an SCMS already (say, as part of the development team you work with),
you should use it instead of installing Subversion. Although you
don't have to pay for Subversion itself, there are costs to set up and
manage it, even if you are using a hosted system. They're not as high
as with a CMS, but it's not free. If you don't have an SCMS set up and
you aren't familiar with Subversion, I think one of the hosting
solutions should be investigated.

Which version of Serna do you think is the minimum required for
authoring in DITA? I think it's the Professional version, but that's
more than double the cost of the Personal edition.

RenderX XEP can be purchased for $300, so I think it's a no-brainer
for PDF generation if that is required. XSL Formatter was, last time
I checked, $1k more. And although FOP has made big strides, I'm not
sure it can produce high-enough-quality output with the current OT.

In addition to everything you listed, I think you need an XSLT
development tool. I find it very unlikely that anyone is going to be
satisfied for long using the default stylesheets in the OT. They are
very good, but they are a bit generic for most uses. I prefer oXygen
in that role, and I think that's your preferred tool, also, but
Eclipse does have some nice plug-ins (like OrangeVolt) for XSLT
development that may be good enough.

So, here's the total cost I see:

Serna: $200
DITA OT: $0
XEP: $300
oXygen: $300
Subversion: $0
Total: $800

and what do you have? A WYSIWYG editor, professional quality HTML and
PDF output, version management, and the ability to customize both your
outputs and your inputs.

That's not bad. The nicest part is that, as your team grows, the only
cost increase is the authoring tool.
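Steve's point about growth can be made concrete with his own figures. The Python sketch below assumes, as he does, that only the Serna license is bought per author, while XEP, the oXygen copy used for XSLT work, and the free tools are one-off costs. The function name is illustrative, and the prices are his 2008 numbers, not current ones.

```python
# Steve Andersen's 2008 prices in US dollars.
PER_AUTHOR = {"Serna": 200}
ONE_OFF = {"DITA OT": 0, "XEP": 300, "oXygen": 300, "Subversion": 0}


def total_cost(authors: int) -> int:
    """Total tool cost when only the authoring tool scales with team size."""
    return authors * sum(PER_AUTHOR.values()) + sum(ONE_OFF.values())


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(f"{n:>2} authors: ${total_cost(n):,}")
```

With one author this reproduces his $800 total; each additional author adds $200, so a 10-person team comes to $2,600.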

You want cheap, though?

Eclipse with XMLBuddy and OrangeVolt gives you editing and development
tools. Add in FOP and Subversion for PDFs and version control, and
you have a completely free solution.

 

Hedley Finger suggested other tools:

I have been playing with Serna, oXygen, XMLmind XML Editor (XXE), and
FrameMaker 8 with Scott Prentice's DITA-FMx plug-in (which replaces the
Adobe DITA plug-in that comes with it). oXygen is great for all the
other stuff around DITA: XSL, XSLT, XSL-FO conversion, etc. But of
all the editors, FM8 is the best (though oXygen is always open to do
those source-code jobs).

The tools are cheap. I mean, just multiply your hourly labour rate by
the number of writers and by the hours in a week, month or year, and
the capital costs of the tools are minuscule.
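Hedley's rule of thumb is simple arithmetic. In the Python sketch below, the $50 hourly rate, the five writers, and the 40-hour week are hypothetical inputs chosen only to illustrate the comparison; the $800 tool figure is Steve's total from earlier in the thread.

```python
def labour_cost(hourly_rate: float, writers: int, hours: float) -> float:
    """Labour cost of a writing team over a period of `hours` per writer."""
    return hourly_rate * writers * hours


def tool_share(tool_cost: float, hourly_rate: float, writers: int, hours: float) -> float:
    """Tool cost as a fraction of the labour cost over the same period."""
    return tool_cost / labour_cost(hourly_rate, writers, hours)


if __name__ == "__main__":
    # Hypothetical inputs: $50/hour, 5 writers, one 40-hour week.
    week = labour_cost(50, 5, 40)
    print(f"one week of labour: ${week:,.0f}")
    print(f"$800 of tools is {tool_share(800, 50, 5, 40):.0%} of one week")
```

Under these assumptions a single week of labour costs $10,000, so the tools amount to 8% of one week; over a year of about 2,000 hours per writer ($500,000), they are well under one percent.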

It's the running costs that are huge. If you have a smart staff
member who can do the XSLT stuff and other tweaks (think a clone of
Deborah, Don or yourself), they are not free. And while they are
trying to implement your organisation's branding and document
standards, they are not doing something else productive.

For my money, FrameMaker is both your editor and your PDF formatter for
print. If you already have FM and years of skills with it, then
getting from DITA to your standard document look and feel is a doddle
that makes the XSL-FO route not worth considering, especially when
you are a one-man band like me and just don't have the time to get up
to speed on Ant scripts, XSLT conversion steps, Subversion, and all
the rest of the technology. You can use the standard DITA OT for all
your other output.

So the cheapest startup for those currently using Word, FrameMaker,
RoboHelp, etc. might be to just fork out for an integrated package
from one of the vendors, because you can outsource your formatting and
scripting to them and, if done well, you will have tools in place that
need not be changed for years.

 

Troy Klukewich challenged the savings:

Cheap and easy are not always the same thing. When using open source solutions, it is helpful to have the necessary technical talent on staff. Assuming you have people who are willing to put some time under the hood, inexpensive or free solutions are more readily available. On a previous project, my team used as many open source tools as we could for a structured XML solution similar to DITA. The idea was to own our own sources with complete independence from proprietary tools and vendor lock-in. Even if we did resort to a commercial tool at points, we wanted to be able to freely swap them out (which we did in one case with an XML editor and a DB for tracking statuses).

Like others, we found Subversion with Tortoise to be a great solution both for storing XML content and for setting custom statuses on files. We were able to jettison a cumbersome, commercial database and report off Subversion itself to track file milestones. I ultimately liked Subversion for the simple reason that the writers found the Tortoise integration more intuitive than traditional source control interfaces. Training was easy.
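Troy's status tracking relies on Subversion's versioned properties. A minimal sketch, assuming each topic carries a custom property set with something like `svn propset dita:status draft topics/t_install.dita` (the property name and paths are hypothetical), and that the report reads the XML printed by `svn propget --xml -R dita:status .`. The embedded sample imitates that output's `<properties>/<target>/<property>` shape; it is not taken from a real repository, so verify the format against your Subversion version.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Abbreviated sample in the shape of `svn propget --xml -R dita:status .` output.
SAMPLE = """<properties>
<target path="topics/c_overview.dita">
<property name="dita:status">review</property>
</target>
<target path="topics/t_install.dita">
<property name="dita:status">draft</property>
</target>
<target path="topics/r_options.dita">
<property name="dita:status">draft</property>
</target>
</properties>"""


def status_by_file(propget_xml: str, prop: str = "dita:status") -> dict[str, str]:
    """Map each target path to the (whitespace-trimmed) value of `prop`."""
    root = ET.fromstring(propget_xml)
    result = {}
    for target in root.iter("target"):
        for p in target.iter("property"):
            if p.get("name") == prop:
                result[target.get("path")] = (p.text or "").strip()
    return result


def status_summary(propget_xml: str) -> Counter:
    """Count how many files are in each status."""
    return Counter(status_by_file(propget_xml).values())


if __name__ == "__main__":
    for status, count in sorted(status_summary(SAMPLE).items()):
        print(f"{status}: {count}")
```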

We did buy a commercial WYSIWYG tool for editing, Arbortext Epic. Binaries and intermediary formats (like MIF) were absolutely out, so Structured FrameMaker was not an option. One writer insisted on using free Emacs. As long as the content validated against the schemas, we didn't really care what people used to edit the files. For training purposes, though, it is best to standardize on one editor and include pre-built templates for each content type. At various times, we used a number of different tools on the same source, including XMetaL, XMLSpy, and oXygen.

We could not get away from a commercial solution for robust PDF production. We tried everything that was freely available and found serious problems with scale (into the thousands of pages). We used the Antenna House XSL-FO processor to generate PDFs directly from XML. It was robust and perfectly reliable. The license was cheap considering all the time we saved debugging problems in free tools.

Once we automated the PDF production, we were extremely happy. We were able to jettison ancient FrameMaker sources, intermediary files, and manual futzes and never looked back. Though there was an upfront cost to the XSL-FO expertise we developed, we easily recouped costs many times over with extremely fast PDF production and full compliance with requirements for simultaneous localizations. We pumped the localized XML through the same XSL-FO process, so the localized PDFs were essentially free. We no longer required manual adjustments for numerous localized PDFs, which had been extremely expensive and slowed time to market.

We used Saxon for the XSLT-to-HTML transforms and the many other help formats based on HTML. We used Python as a kind of glue script to run everything; I would use Ant now. For a recent DITA project, I am using some already licensed tools within our department, plus we can use our own Oracle products. I still hold to the ideal of freely open XML source with swappable components. I do not want lock-in with any vendor tool.

- Epic for editing (the DITA integration is worth the price of admission)
- Saxon for XML processing for Help
- Oracle XML Publisher for PDF (I've heard it works great and scales)
- Ant for driving builds
- DITA OT
- Perforce for source control (maybe)
- Oracle UCM for content management down the road, but I would prefer a DITA-aware CMS
- Perl (free) and PowerGREP (commercial) for miscellaneous regular expression exercises

In most cases, a full XML shop will probably use a mix of commercial and open source solutions, weighing what is already available in-house and whether it is easier to buy a tool or to configure one yourself.

Hedley Finger defended FrameMaker:

FrameMaker has been able to open and save XML files directly since
version 7.2. It can also use XSLT and read/write rules to
transform to/from FM's internal format. Leximation DITA-FMx has more
features than the Adobe DITA plug-in that comes with FM8, and
DITA-FMx works with both 7.2 and 8.0. FrameMaker has the best
structure editor/view bar none, and it is much easier to use. And FM8
now supports Unicode.

If you have existing in-house FrameMaker expertise and licences, it
might be cheaper to upgrade to FM8 and purchase DITA-FMx
licences. You can easily disable the Adobe DITA plug-in. This is
likely to be cheaper than replacing your investment with another
proprietary editor such as XMetaL or Arbortext Editor plus an Antenna
House or RenderX XSL-FO processor. And, instead of having to get
your head around FO, you can use your existing FM skills to format
PDF for print or online presentation. In particular, you can
continue to use the much smarter FM cross-reference formats, which
will round-trip to DITA <xref>s, and even FM variables instead of
@conrefs acting as variables, although this is deprecated. The
other outputs can use the DITA OT.
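For readers new to DITA, this is roughly what @conref "variables" look like and how a processor resolves them. The Python sketch below handles only the simplest cross-file case, `conref="file.dita#topicid/elementid"` pulling a phrase from a shared topic. The file names, ids, product name, and the function `resolve_conrefs` are hypothetical, and the DITA Open Toolkit's real conref processing also covers same-file references, nested topics, maps, and more.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical in-memory files: a shared "variables" topic and a topic that reuses it.
FILES = {
    "vars.dita": '<topic id="vars"><title>Variables</title><body>'
                 '<p><ph id="prodname">Acme Writer</ph></p></body></topic>',
    "intro.dita": '<topic id="intro"><title>Intro</title><body>'
                  '<p>Welcome to <ph conref="vars.dita#vars/prodname"/>.</p></body></topic>',
}


def resolve_conrefs(name: str, files: dict[str, str]) -> ET.Element:
    """Parse `name` and replace each conref="file#topicid/elementid" with the
    referenced element's content (simplest case only: no nesting, no maps)."""
    root = ET.fromstring(files[name])
    for elem in list(root.iter()):  # snapshot, since we modify the tree
        ref = elem.get("conref")
        if not ref:
            continue
        target_file, _, fragment = ref.partition("#")
        topic_id, _, element_id = fragment.partition("/")
        source = ET.fromstring(files[target_file])
        if source.get("id") != topic_id:
            raise ValueError(f"topic {topic_id!r} not found in {target_file}")
        match = next((e for e in source.iter() if e.get("id") == element_id), None)
        if match is None:
            raise ValueError(f"element {element_id!r} not found in {target_file}")
        # Take the referenced text and children; keep this element's own tail text.
        elem.text = match.text
        elem[:] = [copy.deepcopy(child) for child in match]
        del elem.attrib["conref"]
    return root


if __name__ == "__main__":
    topic = resolve_conrefs("intro.dita", FILES)
    print(ET.tostring(topic, encoding="unicode"))
```

Changing the product name in `vars.dita` then changes every topic that pulls it in, which is the reuse that FM variables give FrameMaker users.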

FrameMaker also has functionality to assist in converting legacy
unstructured FM content into DITA XML, but it will still require hand
tweaking, as it would with any other converter.

Subversion is a good cheap option, and it would be even better if
someone with greater knowledge than me developed an XML-aware diff/merge tool.
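Line-based diff tools report re-indented or reflowed XML as changed even when the content is identical, which is the gap Hedley points to. One minimal workaround, sketched in Python under the assumption that you can read both revisions as text (for example with `svn cat`): flatten each document into one line per element path with sorted attributes and whitespace-normalized text, then diff those lines with `difflib`. The function names are illustrative, and the sketch does not attempt merging.

```python
import difflib
import xml.etree.ElementTree as ET


def flatten(xml_text: str) -> list[str]:
    """One line per element: its path, sorted attributes, text, and tail text.

    Whitespace runs are collapsed, so re-indented or reflowed content compares
    equal. (That is wrong for whitespace-sensitive elements such as <codeblock>.)
    """
    lines: list[str] = []

    def walk(elem: ET.Element, path: str) -> None:
        attrs = " ".join(f'{k}="{v}"' for k, v in sorted(elem.attrib.items()))
        text = " ".join((elem.text or "").split())
        tail = " ".join((elem.tail or "").split())
        line = f"{path} [{attrs}] {text}".rstrip()
        if tail:
            line += f" +tail: {tail}"
        lines.append(line)
        seen: dict[str, int] = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    root = ET.fromstring(xml_text)
    walk(root, f"/{root.tag}")
    return lines


def xml_diff(old: str, new: str) -> list[str]:
    """Unified diff of the flattened forms; empty if only formatting differs."""
    return list(difflib.unified_diff(flatten(old), flatten(new),
                                     "old", "new", lineterm=""))


if __name__ == "__main__":
    old = '<p id="x" audience="admin">Click   OK to\n  continue.</p>'
    new = '<p audience="admin" id="x">Click OK to continue.</p>'
    print(xml_diff(old, new) or "no content change")
    print("\n".join(xml_diff(old, new.replace("OK", "Apply"))))
```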

I am not decrying other solutions, only suggesting that if you
already have FM it might be the cheaper way to go, in both upfront
and ongoing costs.

And Troy replied to Hedley:

Great response and I found the information you provided about Frame's more recent capabilities useful.

It is also worth emphasizing, as you point out, that the cost of tools is minimal compared to other costs. That said, when you're on a shoestring, it can be fun to see how far you can get by spending time instead of cash.

I find that free is usually not all that free when the cost of time is figured in. Even just setting up the DITA OT for the first time on a fresh Windows machine can take some time. Of course, once it is set up, I find the OT is a reliable, powerful processing factory.

When looking at free tools, I consider if the cost of time is worth the investment versus a commercial option. It is also worth quantifying the commercial value-add versus a free option so we know why we are going commercial.

On a recent DITA conversion project that needed some serious regex processing, I ended up paying for a commercial tool, PowerGREP, mainly because it provides a dynamic preview mode with drill-down for mass changes. It is a killer feature and worth the time it saves. Otherwise I would use Perl.
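For small jobs, the preview-before-replace workflow Troy values can be approximated in a scripting language. This Python sketch, with a hypothetical `preview_then_replace` helper and an illustrative pattern that turns `<b>` markup into DITA's `<uicontrol>`, lists every match with its replacement and changes the text only when asked. It is nowhere near PowerGREP's interactive drill-down, and regex edits of XML stay fragile, which is exactly why the preview matters.

```python
import re


def preview_then_replace(text: str, pattern: str, repl: str,
                         apply: bool = False) -> tuple[str, list[str]]:
    """Return (text, previews); the text is changed only when apply=True.

    Each preview reads 'line N: <matched text> -> <replacement>'.
    """
    regex = re.compile(pattern)
    previews = []
    for match in regex.finditer(text):
        line_no = text.count("\n", 0, match.start()) + 1
        previews.append(f"line {line_no}: {match.group(0)} -> {match.expand(repl)}")
    new_text = regex.sub(repl, text) if apply else text
    return new_text, previews


if __name__ == "__main__":
    source = "<p>Click <b>OK</b>.</p>\n<p>Then click <b>Close</b>.</p>"
    pattern, repl = r"<b>(.*?)</b>", r"<uicontrol>\1</uicontrol>"
    _, previews = preview_then_replace(source, pattern, repl)
    print("\n".join(previews))  # review the matches first...
    changed, _ = preview_then_replace(source, pattern, repl, apply=True)
    print(changed)  # ...then apply the same substitution
```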

I'm also a fan of Epic's Resource Manager and its integration with DITA. I'm happy looking at raw XML, but day-to-day I'd rather use a unified dialog to build topic paths, conrefs, and links to graphics.

With an open architecture, we can use the right mix of free and commercial tools on the same sources, the best tools for the best purposes.

 
