I am going through the Lamport Diffie One-time signature.
As a side note; this work is an early effort that culminated in what is known as "Stateful Hash Based Signature Methods"; modern designs include XMSS and LMS, and are surprisingly close to what's in the Merkle paper.  You may find the Wiki link more approachable...
In any case, to address your questions:
- How can long messages (greater than 100 bits) can be mapped into short messages (100 bits) by a one way function and only the short message signed?
Not to sound overly simplistic:
- We take the long message to sign 
- We use that as input to a 'one way function' (modern terminology: a hash function), which generates a fixed sized output (in this example, 100 bits) 
- We take that 100 bit output, and apply the signature algorithm to it. 
This is not specific to hash based signatures; essentially all signature algorithms (at least, any I've heard of) use a variant of this as a first step.  The only difference is that 100 bits would not be considered sufficiently long; we generally insist on much longer hash outputs (because the attacker's capabilities have grown greatly since 1979, hence we need to make our systems stronger).
The nonobvious question is "why is this secure"?  Well, if we assume that the attacker cannot find another message that the one-way function maps to the same 100 bits, and the attacker cannot generate a signature for that different set of 100 bits that the attacker's message maps to, then it is safe.
Actually, in practice, we worry about collision attacks, where the attacker has control over the valid message being signed (but he still needs to produce a forgery, that is, a valid signature for a message that was not signed); Merkle does consider that, however lets keep things simple for now.
- "the message can be encrypted with a newly generated random key ... "  Can someone explain this to me?
Well, this part can probably be skipped - this is one section of the paper that modern crypto doesn't follow.
Merkle tries to define a 'certified encryption function', and uses specific assumptions to do so, including a randomized input; message being signed are not randomized, and so he inserts an encryption function (with a random key) to make it match his assumptions.
In modern crypto, we don't do this - we generally use an off-the-shelf hash function (such as SHA-2 or SHAKE) and use that - those have been designed (and cryptanalyzed) to meet the various assumptions for cryptographic hash functions; hence we (as a signature designer) don't have to worry about that.
To be fair to Merkle, this may be the very first time anyone has tried to work out what a 'collision-resistant hash function' would look like, and he made a fairly decent attempt - it's just not the direction later cryptographers took it.
- "The method as described thus far suffers from the defect that B can alter m by changing bits that are 1's into 0's..."  Can someone explain how this fixes the problem.
Well, let us consider the simplest case; signing a single bit.  In the "method described thus far", to generate a private key, we'd pick a random value $x$, and apply a random function $F$ to it, generating the public value $F(x)$.  Then, to sign a '1' bit, we would produce as a signature $x$ (which the the verifier would be able to validate that $F$ maps that to the public key).  However, to sign a '0' bit, we wouldn't reveal anything.  It is obvious that someone who sees the valid signature of '1' could convert it into a signature of '0' (or, for that matter, generate a signature of '0' without seeing anything).
In the Lamport scheme, to sign a single bit, we generate two random values $x$ and $x'$, and publish as a public key the values $F(x)$ and $F(x')$.  To sign a 0, we produce $x$; to sign a 1, we produce $x'$; someone with the signature of 1 cannot generate a signature of 0 (because they'd have to produce $x$, and they don't know that).
Now, this works, however it is quite expensive (you end up needing a public key that has twice as many hash outputs as bits your hash function produces - for example, if your hash function outputs 100 bits, this has a public key that consists of 200 hashes).  The paper does go on to show ways to reduce it, as mentioned in sections 4 and 5 of the paper (which are what is used in practice).